Distribution Statistics

Most statistical work is based on the normal distribution, which is fully defined by its mean and standard deviation. This allows classical statistical techniques to be used to make predictions about a population from a set of sample data.

The normal distribution approach works well with data that is not highly skewed, that is, data that is not concentrated towards one end or the other of a histogram. Even where data is moderately skewed, the means of repeated samples drawn from the population will be approximately normally distributed, provided the samples are reasonably large (the central limit theorem). This property allows classical statistical tests to be applied to such data.
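The behaviour of sample means can be illustrated with a short simulation. The following is a minimal sketch only: the gamma-distributed population, the sample size of 50 and the number of repeats are all assumptions chosen for illustration, not values taken from any real data set. It compares the skewness of the raw values with the skewness of the means of repeated samples.

```python
import random
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical, moderately skewed population (gamma distribution, shape 2).
population = [rng.gammavariate(2.0, 1.0) for _ in range(20_000)]

# Means of 2,000 repeated samples of 50 values each (assumed sizes).
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)]

raw_skew = skewness(population)
mean_skew = skewness(sample_means)

print(f"skewness of raw values:   {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
```

A skewness near zero indicates a roughly symmetric distribution, so the sample means should show a much smaller skewness than the raw values.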

Much of the data that is collected and subjected to statistical analysis is normally distributed. However, geological data, particularly ore grades, frequently show distributions that depart significantly from the normal distribution. In many cases the natural logarithm (Ln) of the values has been found to be normally distributed. Such data is said to follow a Log-normal distribution.
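As a rough illustration of the log transformation, the sketch below generates hypothetical grades (synthetic values drawn from a log-normal distribution with assumed parameters, not real assay data) and compares the raw values with their natural logarithms. It uses the gap between mean and median, scaled by the standard deviation, as a simple symmetry check; the helper name mean_median_gap is introduced here for illustration only.

```python
import math
import random
import statistics


def mean_median_gap(values):
    """(mean - median) / standard deviation; close to zero for symmetric data."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    return (mean - median) / statistics.pstdev(values)


rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical grades in g/t, generated as log-normal purely for illustration.
grades = [rng.lognormvariate(0.0, 1.2) for _ in range(10_000)]

# Natural logarithm (Ln) of every grade.
log_grades = [math.log(g) for g in grades]

for label, values in (("raw grades", grades), ("Ln of grades", log_grades)):
    print(
        f"{label:>12}: mean={statistics.fmean(values):.3f} "
        f"median={statistics.median(values):.3f} "
        f"gap={mean_median_gap(values):.3f}"
    )
```

For the raw grades the mean sits well above the median because of the long tail of high values, while for the transformed values the two should nearly coincide.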

Data likely to be Log-normally distributed is characterised by a large number of observations clustered at low values and a long tail of higher values. The highest values may be orders of magnitude higher than the low values. For example, the average gold grade in a mineralised zone may be 2.5 g/t, but there may be a continuous range of values up to grades in the order of 1,000 g/t.

Data that is nearly log-normal can often be brought closer to a log-normal distribution by the use of an additive constant. The constant is added to every value before the natural logarithm (Ln) is taken, and it has a much greater relative effect on low values than on high values. For example, if the additive constant is 2, adding it to a value of 0.05 produces a far larger relative increase than adding it to a value of 555. An additive constant is particularly useful when handling data such as gold grades, which may have many values near or below the detection limit. These values tend to distort the lower end of the data distribution.
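The arithmetic behind the example above can be sketched as follows. The constant of 2 and the values 0.05 and 555 come from the text; the function name log_transform is hypothetical, and in practice the constant would have to be chosen to suit the data. Adding 2 to 0.05 multiplies the value by about 41, whereas adding it to 555 changes the value by well under one percent.

```python
import math


def log_transform(value, constant=0.0):
    """Return Ln(value + constant), where constant is the additive constant."""
    return math.log(value + constant)


ADDITIVE_CONSTANT = 2.0  # the example constant used in the text

for value in (0.05, 555.0):
    plain = log_transform(value)
    shifted = log_transform(value, ADDITIVE_CONSTANT)
    ratio = (value + ADDITIVE_CONSTANT) / value  # relative increase before taking Ln
    print(
        f"value={value:>7}: Ln(value)={plain:8.4f}  "
        f"Ln(value + c)={shifted:8.4f}  "
        f"shift={shifted - plain:7.4f}  ratio={ratio:.4f}"
    )
```

The shift column shows how far each value moves on the log scale: the low value moves substantially, while the high value is almost unchanged, which is why the constant mainly reshapes the lower end of the distribution.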