During my early career as an engineer, I was optimizing, tuning, and benchmarking a real-time process. Beyond minimum and maximum values, I also needed to know the average and standard deviation of various code execution elements.

The average for a collection of data points is the sum of the data points divided by the number of observations. So for n data points, the equation for the average is:

The standard deviation is the square root of the variance, where the variance of a population is the sum of the square differences between the data points and their average divided by the number of observations. So for n data points, the variance is:

Thus in order to make these calculations, I need to either record the data for later analysis, or calculate it on the fly. Unfortunately, the data was thousands of data points and the proprietary real-time hardware did not have enough memory to hold that many raw data points. nor any way to offload or log the data. In addition, the process executed on the order of 10 milliseconds, so as the number of data points increased, so would the execution time of the calculations; to the point that the calculations would have taken as much execution time as the main process in question. Thus I needed a method for calculating the average and standard deviation on-the-fly. It needed to be done in as few steps as possible; preferable using only the previous statistics and the newly observed value.

The following is the how I calculated the average and standard deviation on the fly, and the math behind it.

Running Average

Updating the average of a stream of data points is straight forward. When a new data point is recorded, the new average for the n+1 data points is given by (see equation 1):

The sum of the n+1 data points can be rewritten as the sum of the first n points plus the new data point.

Rearranging (1), the sum of the first n data points can be expressed as the product of the average and number of data points.

Substituting (4) into (3) gives:

Rearranging gives (5), which is the average for the first n+1 data points, using just the new data point, the previous average, and the number of data points.

Running Standard Deviation (Variance)

At first, calculating the standard deviation on-the-fly didn’t seem possible. Primarily because it uses the difference of the newly calculated average and each of the original data points. But after some analysis, it can be.

The first step is to expanding the sum of square differences in the variance formula (2).

When the square differences are multiplied out, it can be written as the sum of 3 separate summations, where

  • The 1st term is the sum of the squares of the data points,
  • The 2nd term is the sum of twice the product of the data points with their average, and
  • The 3rd term is the sum of the square of the average.

Starting with the 3rd term, since the square of the average is the same for every data point, its sum n times is simply n times the squared average, so this can be rewritten as:

For the 2nd term, the 2 and the average are constants, so they can be pulled out in front of the summation.

The remaining summation in the 2nd term (sum of the data points) is the same as equation (4), so the number of data point times the average can be substituted in.

Combining like terms and rearranging gives:

Therefore, the equation for the svariance (2) can be rewritten as:

As was the case with the average, an expression for the summation involving the individual data points will be needed when the next data point is observed. So, rearranging (6) to solve for the sum of the square data points (multiplying both sides by n and then moving the n times the square of the average term to the other side) and then factoring out the n results in:

Going back to (6), the standard deviation when a new data point is observed is:

The summation in this equation (sum of the squares of the n+1 data points) can be rewritten as the sum of the squares of the first n points plus the square of new data point.

This summation (sum of the squares of the first n data points) is given by (7), so after substitution

After rearranging, this equation can be used to calculate the variance for the first n+1 data points using just the previous variance, the previous average, the new data point, the new average, and the number of data points.

And thus the standard deviation is:

Example

As shown in the following example, only a few values need to be saved and the calculations are the same for each observed data point, which results with a consistent load on the system. The first received data point uses (1) and (2), which for only a single data is trivial. All subsequent data points use (5) and (8).

Data point 1 = 10.

Data point 2 = 5.

Data point 3 = 15.

Data point 4 = 8.

Data point 5 = 12.

Leave a Reply