Standard Deviation

Standard deviation is not a simple concept, but it's very important, so don't give up. The formula the PMBOK uses for standard deviation is simple: (P-O)/6. That is, the pessimistic activity estimate minus the optimistic activity estimate, divided by six. The problem is that this in no way, shape, or form produces a measure of standard deviation. So if this isn't really standard deviation, what is it?
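As a minimal sketch, the PMBOK shortcut can be written as a one-line Python function. The name pm_sd and its parameter names are my own, chosen for illustration; they do not come from the PMBOK.

```python
def pm_sd(optimistic, pessimistic):
    """PMBOK-style shortcut: (P - O) / 6. Name and signature are illustrative."""
    return (pessimistic - optimistic) / 6


# Example B from this article: O = 900 hours, P = 1100 hours.
print(round(pm_sd(900, 1100), 2))  # 33.33
```

Note that only the two estimates go in, and nothing else.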

The dictionary definition of standard deviation is something like "a quantity calculated to indicate the extent of deviation from the mean or expected value for a group as a whole." Expected value in this case covers distributions other than a bell curve, e.g. a chi-squared distribution. Sometimes it's more succinctly rendered as "the mean of the mean." More precisely, it is the square root of the average of the squared differences from the mean.
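For comparison, here is a rough sketch of the textbook calculation: take each value's difference from the mean, square it, average those squares, and take the square root. The function name population_sd is my own. The result is checked against statistics.pstdev from Python's standard library.

```python
import math
import statistics


def population_sd(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))


data = [900, 1100]  # Example B: optimistic and pessimistic estimates
print(population_sd(data))      # 100.0
print(statistics.pstdev(data))  # 100.0
```

Unlike the PMBOK shortcut, the mean appears explicitly in this calculation.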

Bell Curve
The first problem is that this formula doesn't use the mean or an expected value as a variable at all. If you calculate the deviation from the mean, you need to use the mean. If you have two values, that's easy to determine: it's the sum of the two figures divided by two. That does not occur in this formula.

Because our distribution has only two points, it exhibits the qualities of neither a Gaussian curve, a Beta distribution, a chi-squared distribution, a Poisson distribution, a Bernoulli distribution, a binomial distribution, a geometric distribution, nor any other common function. If you only have two data points you only have a line, not a curve of any kind. Using this formula we have only two values: optimistic and pessimistic. This part of the formula is similar to one used in investing called HML: high minus low. Since these are the high and low figures from a distribution, the difference between them is called an interval. By some definitions it's also the range. Dividing that by six just produces a figure equal to one sixth (about 16.7%) of the interval.

So what does this figure actually represent? In the examples below I describe a linear increase in the spread between the optimistic and pessimistic figures across a set of 11 examples. This illustrates the relationship between the PM formula SD and the optimistic and pessimistic variables.

Example A: If O=1000 hours, P=1000 hours.
SD=(1000 - 1000 = 0) ÷ 6 or 0
Calculated SD = 0
Population Standard Deviation = 0

Example B: If O=900 hours, P=1100 hours.
SD=(1100 - 900 = 200) ÷ 6 or 33.33
Calculated SD = 141.42136
Population Standard Deviation = 100

Example C: If O=800 hours, P=1200 hours.
SD=(1200 - 800 = 400) ÷ 6 or 66.67
Calculated SD = 282.84271
Population Standard Deviation = 200

Example D: If O=700 hours, P=1300 hours.
SD=(1300 - 700 = 600) ÷ 6 or 100
Calculated SD = 424.26407
Population Standard Deviation = 300

Example E: If O=600 hours, P=1400 hours.
SD=(1400 - 600 = 800) ÷ 6 or 133.33
Calculated SD = 565.68542
Population Standard Deviation = 400

Example F: If O=500 hours, P=1500 hours.
SD=(1500 - 500 = 1000) ÷ 6 or 166.67
Calculated SD = 707.10678
Population Standard Deviation = 500

Example G: If O=400 hours, P=1600 hours.
SD=(1600 - 400 = 1200) ÷ 6 or 200
Calculated SD = 848.52814
Population Standard Deviation = 600

Example H: If O=300 hours, P=1700 hours.
SD=(1700 - 300 = 1400) ÷ 6 or 233.33
Calculated SD = 989.94949
Population Standard Deviation = 700

Example I: If O=200 hours, P=1800 hours.
SD=(1800 - 200 = 1600) ÷ 6 or 266.67
Calculated SD = 1131.37085
Population Standard Deviation = 800

Example J: If O=100 hours, P=1900 hours.
SD=(1900 - 100 = 1800) ÷ 6 or 300
Calculated SD = 1272.79221
Population Standard Deviation = 900

Example K: If O=0 hours, P=2000 hours.
SD=(2000 - 0 = 2000) ÷ 6 or 333.33
Calculated SD = 1414.21356
Population Standard Deviation = 1000
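The figures in Examples A through K can be regenerated with a short script. This is only a sketch: the function name example_rows is mine, and I am assuming that the "Calculated SD" lines correspond to the sample standard deviation (dividing by n - 1), since the values match Python's statistics.stdev.

```python
import statistics


def example_rows():
    """Rebuild Examples A-K: O falls and P rises by 100 hours per step."""
    rows = []
    for i, label in enumerate("ABCDEFGHIJK"):
        o = 1000 - 100 * i
        p = 1000 + 100 * i
        pm = (p - o) / 6                  # PM formula SD
        sample = statistics.stdev([o, p])  # assumed "Calculated SD" (n - 1)
        pop = statistics.pstdev([o, p])    # population SD (n)
        rows.append((label, o, p, pm, sample, pop))
    return rows


print(f"{'Ex':<3}{'O':>6}{'P':>6}{'PM SD':>9}{'Sample SD':>12}{'Pop SD':>9}")
for label, o, p, pm, sample, pop in example_rows():
    print(f"{label:<3}{o:>6}{p:>6}{pm:>9.2f}{sample:>12.5f}{pop:>9.2f}")
```

Laid out side by side, the PM formula SD is always one third of the population SD for these two-point examples.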

I've graphed the relationships below.

[Chart: sd_chart]
You can see that the PM formula SD trends away from the pessimistic estimate as the difference between the optimistic and pessimistic estimates grows. If I rerun those numbers and add the calculated standard deviation and the population standard deviation, you get a very different (and more cluttered) data set.

[Chart: mega_sd_chart]

I have left the mean in this data set deliberately, to show that the range was changing but the mean and sum were not. Because those figures are constant, you might expect either the calculated SD or the PM SD figures to produce a line parallel to the mean. They don't. Standard deviation isn't a constant. It measures spread around the mean, so a high standard deviation indicates only that the data points are spread out over a wide range of values.
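A quick check of that claim, as a sketch over the same 11 optimistic and pessimistic pairs used in Examples A through K (the variable names are mine):

```python
# Rebuild the (O, P) pairs from Examples A-K.
pairs = [(1000 - 100 * i, 1000 + 100 * i) for i in range(11)]

sums = {o + p for o, p in pairs}          # distinct sums across all examples
means = {(o + p) / 2 for o, p in pairs}   # distinct means across all examples
ranges = [p - o for o, p in pairs]        # range of each example

print(sums)    # {2000}
print(means)   # {1000.0}
print(ranges)  # [0, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
```

The sum and mean collapse to a single value each, while the range grows by 200 hours per example.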

DEDUCTIONS:

The reason that the PM SD figure trends upward is that it's a fixed percentage of the range. So as the range widens, any fixed percentage of that number will also increase. Correspondingly, the PM SD and the traditional SD both rise along with the pessimistic estimate. This means that with wider ranges both of these will exhibit bias toward pessimistic estimates.

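The population SD trend line remains parallel to the pessimistic estimate curve. This is an unexpected proportionality that may have other consequences. It arises because, with only two points, the population SD is exactly half the range, so in these examples it rises by the same 100 hours per step as the pessimistic estimate.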

This pessimistic trend is a constant irrespective of which figure or figures are adjusted; it merely tacks toward the lower figures. The cause of this behavior isn't complicated. It's because the difference is divided by six. If we had divided by two, the result would equal the population SD for two points and would have exhibited a linear relationship plotting a course parallel to the pessimistic estimate.
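To make the effect of the divisor concrete, here is a small sketch that measures how much each line moves per example across Examples A through K. The helper step_change is a name of my own. With a divisor of two the figure rises 100 hours per step, matching the rise of the pessimistic line, while the optimistic line falls by the same amount.

```python
def step_change(series):
    """Average change between consecutive values in a series."""
    return (series[-1] - series[0]) / (len(series) - 1)


# Rebuild the (O, P) pairs from Examples A-K.
pairs = [(1000 - 100 * i, 1000 + 100 * i) for i in range(11)]
optimistic = [o for o, _ in pairs]
pessimistic = [p for _, p in pairs]
by_six = [(p - o) / 6 for o, p in pairs]  # PM formula SD
by_two = [(p - o) / 2 for o, p in pairs]  # same difference, divided by two

print(step_change(optimistic))         # -100.0
print(step_change(pessimistic))        # 100.0
print(round(step_change(by_six), 2))   # 33.33
print(step_change(by_two))             # 100.0
```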

CONCLUSION:

This is an industry standard, and while arbitrary it is not egregious. It merely favors an estimate: a math scenario we call bias. Its result set should not be considered sacrosanct. The shortcomings of this formula are corrected by another method in the PMBOK: progressive elaboration. Over the course of a project, the estimates will become increasingly accurate with the inclusion of additional real-world data. Furthermore, the SD formula produces a figure for the Range of Activity Duration calculation, which has other uses.