View Full Version : Calculating a standard deviation


Pompounette
Aug 3, 2009, 07:54 AM
Hi there,

I have monthly mean wind speeds and monthly standard deviation of the wind speed, and I am trying to figure out how to calculate the annual standard deviation. I do not know the population size for each month, I just know the data are recorded every 6 hours, any idea how I can do it?

Many thanks!!

ebaines
Aug 4, 2009, 11:18 AM
It is possible, BUT you must know the number of data points in each month's sample set. You say that readings are taken every 6 hours - so doesn't that mean that the number of data points for each month is 4 times the number of days in that month? If this is true, then you can find standard deviation for the entire year's worth of data like this:

[EDIT - to add formulas. Variables with subscript "T" mean "total for the year," and variables with subscript "i" mean "for a given month."]

1. Calculate an average for the year's data. Do this by multiplying the average for each month by the number of data points in that month, summing this up for all 12 months, then dividing this total by the total number of data points for the whole year.


\mu_T =\frac { \sum _{i=1} ^{12} (N_i \cdot \mu _i)} {N_T}


2. For each month, square the std dev number, and multiply by N-1, where N is the number of data points in that month.

3. For each month, multiply N by (month's average - year's average)^2

4. For each month, add the results of steps 2 and 3:


\sigma _i ^2 \cdot (N_i-1) + N_i (\mu_i - \mu _T)^2


5. Sum the results of step 4 for all 12 months.

6. Divide the step 5 result by the total number of data points for the year minus 1. This gives the variance for the year.

7. Take the square root of the result from step 6: this gives you the standard deviation for the year.

The result of steps 2 - 7 is this formula:


\sigma _T = \sqrt { \frac { \sum _{i=1} ^{12} [ \sigma _i ^2 \cdot (N_i-1) + N_i (\mu_i - \mu _T)^2] } {N_T - 1} }
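
The seven steps above can be sketched in Python roughly as follows. This is only an illustration: the function names annual_mean_and_std and readings_per_month are made up for this example, the monthly standard deviations are assumed to be sample values (divisor N-1), and the count of 4 readings per day assumes no missing records. The two small groups at the end are invented numbers used purely as a sanity check against a direct calculation.

```python
import calendar
import math
import statistics


def annual_mean_and_std(monthly_means, monthly_stds, counts):
    """Combine per-month means and sample std devs into yearly values."""
    n_total = sum(counts)
    # Step 1: weighted mean for the whole year.
    mu_total = sum(n * m for n, m in zip(counts, monthly_means)) / n_total
    # Steps 2-5: within-month sum of squares, plus the shift of each
    # month's mean to the yearly mean, summed over all months.
    sum_sq = sum(
        (n - 1) * s ** 2 + n * (m - mu_total) ** 2
        for n, m, s in zip(counts, monthly_means, monthly_stds)
    )
    # Steps 6-7: divide by N_T - 1 and take the square root.
    return mu_total, math.sqrt(sum_sq / (n_total - 1))


def readings_per_month(year, per_day=4):
    """Number of readings per month, assuming per_day readings and no gaps."""
    return [per_day * calendar.monthrange(year, m)[1] for m in range(1, 13)]


# Sanity check with two small made-up groups standing in for "months".
groups = [[8.0, 9.0, 13.0], [1.0, 2.0, 3.0, 4.0]]
means = [statistics.mean(g) for g in groups]
stds = [statistics.stdev(g) for g in groups]
counts = [len(g) for g in groups]
mu_year, sigma_year = annual_mean_and_std(means, stds, counts)

pooled = groups[0] + groups[1]
print(mu_year, statistics.mean(pooled))       # the two means should agree
print(sigma_year, statistics.stdev(pooled))   # the two std devs should agree
```

For real data you would pass the 12 monthly means, the 12 monthly standard deviations, and readings_per_month(year) as the counts.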


Hope this helps!

Pompounette
Aug 5, 2009, 03:57 PM
Thanks ebaines, much appreciated. Now I have run my t-test but can't really understand the conclusions:

I am trying to see whether 2 ways of assessing the annual wind speeds at the same location are "equivalent": one uses met mast data, the other uses satellite data for the location of the met mast.


OK, so I have run my t-test for every year. For some years the t value is lower than the critical value for the corresponding degrees of freedom, and for other years it is not. Hence for some years I can say the 2 methods give similar results, and for other years they give different results. However, I noticed that in some years where the t-test concludes the 2 means are significantly different (95% confidence), the difference between the 2 means is only 2%, so I don't really understand my conclusion.

I hope this is clear, hope somebody can help me!

Thanks a lot anyway

Pompounette
Aug 7, 2009, 08:15 AM
Hi again,


Could you explain to me where this formula comes from? I have spent the last 2 days trying to work it out, and it is still confusing for me.

Thanks again

ebaines
Aug 9, 2009, 01:29 PM
Sorry that I haven't responded sooner, but I've been out of town recently.

I actually didn't find this in a book, but rather derived it from the definition of the variance. It's based on the same mathematics as mechanical or civil engineers use to determine the moment of inertia of various shapes, which like the variance involves the square of differences to a mean.

The basic quantity that one works with here is the variance, which involves the sum of the squares of the differences of each data point from the mean. If you want the sum of the squares of the differences of those data points from a different reference value, you can start with the original sum of squares and add the square of the difference between the new value and the old mean, times the number of points involved. The proof goes like this:

Suppose you have N points that have the mean \mu_0 , and suppose the distance from each point to that mean is d_i . For example if the data points are 8, 9, and 13, the mean is 10 and the values for the d’s are -2, -1 and 3. Note that the sum of the distances is 0; this is the definition of the mean. Stated another way:


\sum d_i = 0.


Leaving aside for now the division by N or N-1, the variance of this data set is found from


V_1 = \sum (d_i)^2


Now suppose you want to find the variance of this data set from a new number, call it \mu_1 . Call the distance from the new number \mu_1 to the original mean \mu_0 \Delta , that is, \Delta = \mu_0 - \mu_1 . You then proceed by finding the square of the differences of each point from the new number:


V_2 = (\Delta + d_1)^2 + (\Delta + d_2)^2 + \dots + (\Delta + d_N)^2 \\
= ( \Delta^2 + 2 \Delta d_1 + d_1^2 ) + ( \Delta^2 + 2 \Delta d_2 + d_2^2 ) + \dots + ( \Delta^2 + 2 \Delta d_N + d_N^2 ) \\
= N \Delta^2 + 2 \Delta (d_1 + d_2 + \dots + d_N) + (d_1^2 + d_2^2 + \dots + d_N^2)


Now we know that: d_1 + d_2 + \dots + d_N = \sum d_i = 0 . So the middle term vanishes and this becomes:


V_2 = N \Delta ^2 + \sum d_i^2


But \sum d_i^2 = V_1

So:

V_2 = V_1 + N \Delta^2
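
As a quick numerical check of this identity, here is a small Python sketch using the example points 8, 9 and 13 from earlier in this post. The new reference value of 7 is an arbitrary choice made only for illustration; any other number should work the same way.

```python
data = [8, 9, 13]
n = len(data)

mu_0 = sum(data) / n                  # original mean: 10.0
d = [x - mu_0 for x in data]          # distances from the mean: -2, -1, 3
v_1 = sum(di ** 2 for di in d)        # sum of squares about the mean

mu_1 = 7                              # new reference value (arbitrary, for illustration)
delta = mu_0 - mu_1                   # shift between the old mean and the new value

# Sum of squares about the new value, computed two ways.
v_2_direct = sum((x - mu_1) ** 2 for x in data)
v_2_formula = v_1 + n * delta ** 2

print(v_1, v_2_direct, v_2_formula)   # 14.0 41 41.0
```

Both routes give 41, that is 14 + 3 x 3^2.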

This is the basic idea. The rest of it involves turning the variance into standard deviation, by multiplying or dividing by N or N-1 as appropriate, and then taking the square root.
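
For anyone who wants those remaining steps written out, here is one way they might go, applying V_2 = V_1 + N \Delta^2 to each month in turn. It assumes each monthly standard deviation \sigma_i was computed with the N_i - 1 divisor; if your monthly values used N_i instead, replace (N_i - 1) with N_i in the first line. The symbols V_{1,i} and V_{2,i} are just the V_1 and V_2 from above applied to month i.

```latex
V_{1,i} = (N_i - 1) \sigma_i^2 \\
V_{2,i} = V_{1,i} + N_i (\mu_i - \mu_T)^2 = (N_i - 1) \sigma_i^2 + N_i (\mu_i - \mu_T)^2 \\
\sigma_T^2 = \frac{ \sum_{i=1}^{12} V_{2,i} }{ N_T - 1 } \\
\sigma_T = \sqrt{ \frac{ \sum_{i=1}^{12} [ \sigma_i^2 (N_i - 1) + N_i (\mu_i - \mu_T)^2 ] }{ N_T - 1 } }
```

The first line undoes the division that produced each monthly variance, the second shifts the reference point from the monthly mean to the yearly mean, and the last two add up the months and divide by N_T - 1 before taking the square root.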

Hope this helps. Post back if you're still having difficulty.