Uncertainty and the lognormal distribution in ecoinvent

Introduction

The lognormal is the distribution most commonly chosen to describe uncertainty in ecoinvent. It is defined only for positive values, so a sampled amount can never change sign and accidentally become a credit during a Monte Carlo simulation.

The lognormal is less intuitive than the normal distribution and often confuses new users. As a primer, we recommend “Log-normal Distributions across the Sciences: Keys and Clues” by Eckhard Limpert et al., BioScience, May 2001, Vol. 51, No. 5.

Definition and basic properties of the lognormal distribution

A variable is lognormally distributed when its logarithm is normally distributed. The probability density function (PDF) of the lognormal is:

f(x) = 1 / (x sigma sqrt(2 pi)) exp(-(ln(x) - mu)^2 / (2 sigma^2)), for x > 0

where x is the random variable, and mu and sigma are the mean (which is also the median) and the standard deviation of the distribution of ln(x) (sometimes called “the underlying normal distribution”). The median and the multiplicative (geometric) standard deviation of x, noted mu* and sigma*, can be obtained through the following equations:

mu* = exp(mu)

sigma* = exp(sigma)
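
As a rough illustration of these definitions, the sketch below evaluates the standard lognormal density and converts mu and sigma into mu* and sigma*. The function names `lognormal_pdf` and `median_and_gsd` and the input values are assumptions made for this example; they do not come from ecoinvent or its tools.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal variable at x; zero outside the positive domain."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def median_and_gsd(mu, sigma):
    """Return (mu*, sigma*) = (exp(mu), exp(sigma))."""
    return math.exp(mu), math.exp(sigma)


# Illustrative values only (hypothetical, not taken from a dataset).
mu, sigma = math.log(8.5), 0.1
mu_star, sigma_star = median_and_gsd(mu, sigma)
print("mu* =", mu_star, "sigma* =", sigma_star)
print("density at the median:", lognormal_pdf(mu_star, mu, sigma))
```

Note that sigma* is a dimensionless factor: it multiplies or divides the median rather than being added to or subtracted from it.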

The quantity sigma* is useful for calculating confidence intervals, which are obtained by dividing and multiplying the median by powers of sigma*:

Confidence interval / lower boundary / upper boundary
68.3% / mu*/(sigma*) / mu*(sigma*)
95.5% / mu*/(sigma*)^2 / mu*(sigma*)^2
99.7% / mu*/(sigma*)^3 / mu*(sigma*)^3
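
The table above can be checked with a few lines of Python. This is a minimal sketch: the helper names `interval` and `coverage` and the values of mu* and sigma* are assumptions chosen for illustration. The coverage is computed on the underlying normal distribution, where the bounds correspond to 1, 2 and 3 standard deviations around mu.

```python
import math


def interval(mu_star, sigma_star, k):
    """Bounds mu*/(sigma*)^k and mu*(sigma*)^k."""
    factor = sigma_star ** k
    return mu_star / factor, mu_star * factor


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


# Illustrative values only (hypothetical).
mu_star, sigma_star = 8.5, 1.12
for k in (1, 2, 3):
    low, high = interval(mu_star, sigma_star, k)
    print(f"k={k}: coverage {coverage(k):.4f}, interval [{low:.3f}, {high:.3f}]")
```

The intervals are symmetric around the median only in a multiplicative sense: the product of the lower and upper bounds always equals (mu*)^2.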

In the lognormal distribution, the median corresponds to the geometric mean, and is found at exp(mu). The arithmetic mean lies slightly higher than the geometric mean, at exp(mu + sigma^2/2). The mode (the most likely value) lies lower, at exp(mu - sigma^2). The larger the standard deviation, the greater the skewness and the further apart these three quantities are.
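
To see how the three quantities drift apart as sigma grows, the following sketch evaluates them for a few values of sigma. The function name `lognormal_summary` and the chosen values of mu and sigma are assumptions for illustration only.

```python
import math


def lognormal_summary(mu, sigma):
    """Mode, median and arithmetic mean of a lognormal with parameters mu and sigma."""
    return {
        "mode": math.exp(mu - sigma ** 2),
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2.0),
    }


# With mu = 0 the median is always 1, which makes the spread easy to read.
for sigma in (0.1, 0.5, 1.0):
    summary = lognormal_summary(0.0, sigma)
    print(sigma, {name: round(value, 4) for name, value in summary.items()})
```

For any sigma greater than zero, the ordering is always mode < median < mean.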

From ecoinvent to the lognormal PDF

Three inputs are necessary from the data provider to determine the parameters of the lognormal distribution: the deterministic value, the basic uncertainty and the pedigree matrix.

Going from the deterministic value to mu is straightforward: this value is taken as equal to mu*. In ecoEditor and ecoQuery, mu is called “Arithmetic mean of log-transformed data”. The deterministic value is also called “Geometric mean” in those tools.

mu = ln(deterministic value)

Then, the basic uncertainty is chosen. This value reflects the fact that even “perfect” data are uncertain: there are fluctuations over time, errors in measurement, etc. Table 10.3 of the data quality guidelines provides values, depending on the type of exchange and process modelled. In ecoEditor and ecoQuery, this value is called “Variance of log-transformed data”. The field “Standard deviation (SD95)” is equal to exp((Variance of log-transformed data)^0.5)^2, a value that is not used anywhere in the rest of the calculation.

Then, a score from 1 to 5 is selected for each of five indicators: reliability, completeness, temporal correlation, geographical correlation and further technological correlation. These scores are transformed into additional uncertainty, to reflect that the amount of an exchange might come from sources that are not as reliable as primary data collection. The values can be older, from a different technology, from another part of the world, or based on estimates rather than calculation or measurement. Table 10.5 of the data quality guidelines shows the relationship between the pedigree scores and the additional uncertainty.

The basic uncertainty is added to the five additional contributions to the uncertainty. This sum is called “Variance of data with pedigree”. Finally, the “CI/2wP, half range of confidence interval” is calculated as

exp((Variance of data with pedigree)^0.5)^2, corresponding to the square of sigma*.
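
The chain of calculations described in this section can be sketched in Python as follows. The function name `lognormal_parameters` and the dictionary keys are hypothetical; the inputs used in the call are those of the numeric example in this document, with the additional uncertainties entered directly rather than looked up from Table 10.5.

```python
import math


def lognormal_parameters(deterministic_value, basic_variance, additional_variances):
    """Derive the lognormal parameters from the data provider's inputs."""
    # "Variance of data with pedigree": basic uncertainty plus the five additions.
    variance_with_pedigree = basic_variance + sum(additional_variances)
    sigma = math.sqrt(variance_with_pedigree)
    sigma_star = math.exp(sigma)
    return {
        "mu": math.log(deterministic_value),
        "sd95": math.exp(math.sqrt(basic_variance)) ** 2,  # not used further
        "variance_with_pedigree": variance_with_pedigree,
        "sigma": sigma,
        "sigma_star": sigma_star,
        "ci_2wp": sigma_star ** 2,  # "CI/2wP", the square of sigma*
    }


params = lognormal_parameters(
    deterministic_value=8.5,
    basic_variance=0.004,
    # Reliability, completeness, temporal, geographical, further technological.
    additional_variances=[0.0, 0.0001, 0.008, 0.0001, 0.0006],
)
for name, value in params.items():
    print(f"{name}: {value:.6f}")
```

Because the contributions are summed as variances of the log-transformed data, a single large pedigree penalty tends to dominate the final sigma.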

A numeric example

Mathematical name / ecoEditor/ecoQuery name / formula / value
deterministic value / Geometric mean / (Input by data provider) / 8.5
mu / mu / ln(deterministic value) / 2.140066
basic uncertainty / Variance of log-transformed data / (Input by data provider) / 0.004
- / Standard deviation (SD95) / exp((Variance of log-transformed data)^0.5)^2 / 1.134839
- / Reliability / (Input by data provider) / 1
- / Completeness / (Input by data provider) / 2
- / Temporal correlation / (Input by data provider) / 4
- / Geographical correlation / (Input by data provider) / 3
- / Further technological correlation / (Input by data provider) / 2
Additional uncertainty, Reliability / - / from DQG table 10.5 / 0
Additional uncertainty, Completeness / - / from DQG table 10.5 / 0.0001
Additional uncertainty, Temporal correlation / - / from DQG table 10.5 / 0.008
Additional uncertainty, Geographical correlation / - / from DQG table 10.5 / 0.0001
Additional uncertainty, Further technological correlation / - / from DQG table 10.5 / 0.0006
- / Variance of data with pedigree / Variance of log-transformed data + additional uncertainties / 0.0128
- / CI/2wP, half range of confidence interval / exp((Variance of data with pedigree)^0.5)^2 / 1.253919
sigma* / - / (CI/2wP)^0.5 / 1.119785
sigma / - / ln(sigma*) / 0.113137
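
To illustrate what these parameters mean in a Monte Carlo simulation, the sketch below draws samples with Python's standard library, using the mu and sigma of the example above. The sample size and the seed are arbitrary choices for this illustration, not settings used by ecoinvent or any LCA software.

```python
import math
import random
import statistics

mu = math.log(8.5)         # ln(deterministic value)
sigma = math.sqrt(0.0128)  # square root of "Variance of data with pedigree"

rng = random.Random(42)    # fixed seed so the run is repeatable
samples = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

sample_median = statistics.median(samples)
sample_mean = statistics.fmean(samples)

# Bounds mu*/(sigma*)^2 and mu*(sigma*)^2, i.e. the 95.5% interval.
low = math.exp(mu - 2.0 * sigma)
high = math.exp(mu + 2.0 * sigma)
share_inside = sum(low <= x <= high for x in samples) / len(samples)

print("smallest sample:", min(samples))
print("median:", sample_median, "mean:", sample_mean)
print("share of samples inside the interval:", share_inside)
```

The sample median should land close to the deterministic value of 8.5, the sample mean slightly above it, and no sample is ever negative.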

Corresponding ecoEditor uncertainty window