Error calculation

Measurement and Uncertainty

Measurement is always accompanied by uncertainty. We distinguish between two main types:

Systematic uncertainties:

Systematic uncertainties do influence under the same measurement conditions all measurements in a similar way. Systematic uncertainties are not reducible by repeated measurements.
Statistical uncertainties

Statistical uncertainties do manifest themselves through random fluctuations around an average value. Statistical uncertainties are reducible by repeated measurements.

Empirical mean

The empirical mean of $n$ values $z_i$ of a sample is:

$$ \bar{z} =\frac{1}{n} \sum_{i} z_{i} $$

The empirical mean is best estimate for true value $\left< z \right>$ which remains unknown.

Empirical standard deviation

The empirical standard deviation is the best estimate for true standard deviation $\sigma_z$. It is a measurement of the dispersion of individual measurements around the mean : the smaller the value of $\sigma_z$ , the closer the measurements are to the mean measurement of sample quality

$$ s_{z}=\sqrt{\frac{\sum_{i}^{} \left( \bar{z} -z_{i} \right)^{2}}{n-1}} \simeq \sigma_z $$

Standard error of the mean

$$ s_{\bar{z}} \approx \frac{s_z}{\sqrt{n}} $$

Gaussian probability density

Gaussian probability density with expectation $\mu$ and standard deviation $\sigma$:

$$ f\left( z \right) =\frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{\left( z-\mu \right)^{2}}{2\sigma^{2}}} $$

The expectation $\mu$ corresponds to the true value, the mean $\left< z \right >$.

Graph

The gaussian probability density has the following curve:

Central limit theorem

The central limit theorem states that if a large number of independent and identically distributed random variables are summed, their normalized sum tends toward a Gaussian distribution, regardless of the original distribution of the variables.

Interpretation of standard deviation

Standard deviation:

68,3% of individual values are in the interval $\left< z \right > \pm \sigma_z $
95,5% of individual values are in the interval $\left< z \right > \pm 2 \sigma_z $
99,7% of individual values are in the interval $\left< z \right > \pm 3 \sigma_z $

Standard error:

With a probability 68,3% the true value is in the interval $\left< z \right > \pm \sigma_{\left< z \right >} $
With a probability 95,5% the true value is in the interval $\left< z \right > \pm 2 \sigma_{\left< z \right >} $
With a probability 99,7% the true value is in the interval $\left< z \right > \pm 3 \sigma_{\left< z \right >} $

Error propagation

Let $R= R(a, b, \cdots)$ be a physical quantity that cannot be measured directly, but which can be calculated from quantities $a, b, \cdots $ measured directly.

Gaussian error propagation law

Definition

For a function $f ( x_1 , x_2 , \cdots , x_n )$ where each variable $x_i$ has an associated uncertainty $\sigma_{x_i}$, the standard deviation or uncertainty of $f$, denoted as $\sigma_f$, can be approximated using a first-order Taylor expansion. The Gaussian error propagation formula is given by:

$$ \sigma_f = \sqrt{ \left( \frac{\partial f}{\partial x_1} \right)^2 \sigma_{x_1}^2 + \left( \frac{\partial f}{\partial x_2} \right)^2 \sigma_{x_2}^2 + \cdots + \left( \frac{\partial f}{\partial x_n} \right)^2 \sigma_{x_n}^2 } $$

Unweighted linear regression

In order to fir a line $y = a + b x$ to minimize the sum of squared deviations between observed and predicted values we use:

\[ \sum (y_i - (a + b x_i))^2 \]

Remark

This method is known as least squares regression. It assumes all uncertainties are equal or negligible.

Weighted linear regression

When measurement uncertainties vary between data points, a weighted least squares approach is used:

\[ \sum \left( \frac{y_i - (a + b x_i)}{\Delta y_i} \right)^2 \]

The weights $1/\Delta y_i^2$ ensure that points with smaller uncertainties contribute more to the fit.

Coefficient of determination $R^2$

The coefficient of determination, often denoted as $R^2$, is a statistical measure that represents the proportion of the variance for a dependent variable that is explained by one or more independent variables in a regression model. It provides an indication of the goodness-of-fit of the model.

Formula

$$ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} $$

where $ y_i $ are the observed values, $ \hat{y}_i $ are the predicted values from the model, and $ \bar{y} $ is the mean of the observed data.

Strength and direction of the coefficient of determination

The correlation coefficient quantifies both the strength and direction, which can be positive or negative, of a linear relationship between an independent variable and a dependent variable.

Range of the coefficient of determination

Range $r$ ranges from -1 to 1:

An $r$ value of -1 indicates a perfect negative linear relationship: as one variable increases, the other decreases proportionally.
An $r$ value of 0 suggests no linear relationship between the variables.
An $r$ value of 1 indicates a perfect positive linear relationship: as one variable increases, the other increases proportionally.

Minimum chi-square estimation

The minimum chi-square estimation involves finding parameter values that minimize the chi-square statistic, which measures the discrepancy between observed and expected data under the model.

Formulas

\[ \chi^2 = \sum \left( \frac{y_i - y_{\text{theory}, i}}{\Delta y_i} \right)^2 \]

The reduced chi-square is:

\[ \chi^2/\text{d.o.f.} = \frac{\chi^2}{\text{degrees of freedom}} \]

A value near 1 suggests that the model describes the data within measurement uncertainties.