Statistical Errors


Purpose

To understand statistical distributions and their appropriate errors by calculating a binomial distribution and comparing it to the Poisson and Gaussian (normal) distributions.


Introduction

Probability distributions are widely used in experiments that involve counting. The sampling errors that occur in counting experiments are called statistical errors. Statistical errors are one kind of random error. You will find that what you learn in this laboratory is relevant not only in the natural and social sciences, but also in everyday life. Please read the theory section that follows, and then the file on Error Analysis, before proceeding to do the prelab. Bring the completed error analysis prelab with you.

This section will help the student with the prelab homework. You are probably familiar with polls conducted before a presidential election. If the sample of people who are polled is carefully chosen to represent the general population, then the error in the prediction depends on the number of people in the sample. The larger the number of people, the smaller the error. If the sample is not properly chosen, it would result in a bias (i.e. an additional systematic error).

If a fraction p of the population will vote Democratic and a fraction q = (1 - p) will vote Republican, then one expects that in a sample of N people, one will find on average $\bar{x} = Np$ people who say that they will vote Democratic, and $N - \bar{x} = Nq = N(1 - p)$ who say that they will vote Republican. If this poll is taken many times for different samples, one will find that the distribution of the results for x (the number of people who say they will vote Democratic) follows a binomial distribution with mean $\bar{x} = Np$. The probability distribution B(x) for finding x in a sample of N is a function of the probabilities p and q, and is given by the binomial distribution as follows:

$B(x) = \frac{N!}{x!\,(N-x)!}\, p^{x} q^{N-x}$     (1.1)

where x = 0, 1, 2, ..., N and N! = N(N-1)(N-2)...1, with 0! = 1 by definition. Here $\frac{N!}{x!\,(N-x)!}$ is the number of combinations of x objects taken from a sample of N, $p^{x}$ is the probability of getting x Democratic voters, and $q^{N-x}$ is the probability that the remaining N-x voters are Republican. The above equation acts as a model that provides the probability of finding a particular value of x. Often, the most needed information provided by this distribution is the mean of x and its standard deviation,

$\bar{x} = Np, \qquad \sigma = \sqrt{Npq}$.     (1.2)
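As a rough numerical check of the binomial distribution and its mean and standard deviation, the sketch below sums x B(x) and (x - mean)^2 B(x) directly. The values N = 10 and p = 0.25 are chosen only for illustration, and the helper name binomial_pmf is not part of the lab.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Illustrative values only (an assumption for this sketch).
n, p = 10, 0.25
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Moments computed directly from the distribution.
total = sum(probs)
mean = sum(x * b for x, b in enumerate(probs))
variance = sum((x - mean) ** 2 * b for x, b in enumerate(probs))

print(f"sum of B(x) = {total:.6f}")
print(f"mean = {mean:.4f}   Np = {n * p:.4f}")
print(f"std  = {sqrt(variance):.4f}   sqrt(Npq) = {sqrt(n * p * (1 - p)):.4f}")
```

The directly summed moments should agree with Np and sqrt(Npq) to within floating-point rounding.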

For example, if p = 0.51, q = 0.49 and N = 900, one expects that the poll will indicate a number close to 51% for the fraction who say that they will vote Democratic. The pollster will find a number x close to Np = 900 × 0.51 (i.e. around 459) with a standard deviation expected to be

$\sigma = \sqrt{Npq} = \sqrt{900 \times 0.51 \times 0.49} \approx 15$.

If in a particular poll the pollster finds x = 450, the pollster will claim that the poll indicates that 50% (450/900) will vote Democratic with a margin of error of 1.7% (15/900) (another way to calculate the margin of error is given in an optional section at the end of this section). It is not likely that the pollster will find a number such as 40%. This is because 900 × 0.4 = 360, which is 99 away from the expected number of 459. It is possible but very unlikely that the result will be more than six (99/15 ≈ 6.6) standard deviations away from the expected value.
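The arithmetic of the polling example can be reproduced in a few lines. This is only a sketch using the numbers quoted above (N = 900, p = 0.51, and a hypothetical poll result of 40%); the variable names are illustrative.

```python
from math import sqrt

N, p = 900, 0.51
q = 1 - p

expected = N * p              # expected number of Democratic answers
sigma = sqrt(N * p * q)       # binomial standard deviation
margin = sigma / N            # margin of error as a fraction of the sample

# How far is a 40% result from the expectation, in standard deviations?
x_unlikely = 0.40 * N
n_sigma = (expected - x_unlikely) / sigma

print(f"expected x  = {expected:.0f}")
print(f"sigma       = {sigma:.2f}")
print(f"margin      = {100 * margin:.2f}%")
print(f"40% result is {n_sigma:.1f} standard deviations low")
```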

For large N and small p, the binomial distribution approaches a Poisson distribution. The Poisson distribution is more commonly applied to phenomena which occur randomly at a fixed average rate. For example, suppose you stand outside and count the number of people walking by. You stand for 1 hour and count n = 900. If you repeated the experiment many times, you would find that the mean of the number of people passing by in one hour is M. The standard deviation of the Poisson distribution is given by

$\sigma = \sqrt{M}$.     (1.3)

The Poisson distribution for measuring n = x when the expected mean is M is given by

$P(x) = \frac{M^{x} e^{-M}}{x!}$,     (1.4)

where e = 2.71828..., and x = 0, 1, 2, 3, .... Note that the mean M does not need to be an integer.
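The statement that the binomial distribution approaches the Poisson distribution for large N and small p can be illustrated numerically. The sketch below holds the mean M = Np fixed at 2.5 (an illustrative choice) and increases N; the function names and the list of N values are assumptions made for this example.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """Poisson probability of counting x when the mean is m."""
    return m**x * exp(-m) / factorial(x)

M = 2.5                      # fixed mean, so p = M / N shrinks as N grows
diffs = []
for n in (10, 100, 10000):
    p = M / n
    # Largest disagreement between the two distributions for x = 0..10.
    max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, M))
                   for x in range(11))
    diffs.append(max_diff)
    print(f"N = {n:5d}, p = {p:.5f}: max |B(x) - P(x)| = {max_diff:.6f}")
```

The largest difference shrinks as N grows, which is the sense in which the Poisson distribution is a limit of the binomial distribution.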

For large values of N (the total number in the case of the binomial distribution), and also for large values of M in the case of the Poisson distribution (say M greater than about 10 to 30), both the binomial and the Poisson distributions approach a Gaussian (normal or bell curve) distribution. The normal distribution has a mean M and a standard deviation $\sigma$, which are independent parameters. It is a continuous probability distribution G(x) given by

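$G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-M)^{2}}{2\sigma^{2}}\right]$,     (1.6)

where M is the mean and $\sigma$ is the standard deviation.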

If you take the point in the normal distribution that is one standard deviation below the mean and the point that is one standard deviation above the mean, the area under the curve between the two points is 0.6827, or 68.27%. That is, the probability of a single measurement falling within one standard deviation of the mean is 68.27%. The probabilities that it falls within ±2 and ±3 standard deviations are 0.9545 and 0.9973, respectively. To the extent that the binomial and Poisson distributions can be approximated by a normal distribution, these probabilities indicate how likely or unlikely it is for a measurement to fall outside one, two or three standard deviations of the mean.
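The quoted areas can be checked with the error function, since for a normal distribution the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). The helper name prob_within below is illustrative.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    inside = prob_within(k)
    print(f"within +/-{k} sigma: {inside:.4f}   outside: {1 - inside:.4f}")
```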

Distribution       Mean    Standard Deviation  
-------------------------------------------------------------
binomial           Np      $\sqrt{Npq}$
Poisson            M       $\sqrt{M}$
normal(Gaussian)   M       $\sigma$
-------------------------------------------------------------

Table 1.1


Prelab Homework

Before you do this prelab, read this lab, and the file on Error Analysis. The prelab homework must be done at home and handed to the lab TA before you start the lab.

In order to do this prelab you need to understand the concept of a standard deviation for a binomial distribution.

Questions

It is the month of August and a group of students are having dinner. They are discussing a recent Campus Times article on a medical study which reported that 1 in 10 of the general population suffers from allergies to ragweed pollen. Two of the students were sneezing and rubbing their eyes during the dinner. They lamented the fact that this was ragweed season and they were really suffering.


The Experiment

You will need to bring to this lab:

Procedure

The experiment consists of measuring the fraction of galvanized (silver or nickel color) washers in a mixture of galvanized and non-galvanized (yellow brass color) 1/4-20 washers. The 10"x17" plastic bucket contains 24 lb (about 4500) of yellow brass washers and 8 lb (about 1500) of galvanized (silver color) washers. The washers have been mixed, so the probability of getting a galvanized washer is about 1500/6000 = 0.25. The TA should give each student a small 6" metal bucket containing a random sample of about 100 washers from the large mixed bucket (measured out by weight). The TA should do the experiment as one of the students.
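To get a feeling for the spread to expect, the sampling step can be simulated. The sketch below is not the lab procedure: it assumes each sample contains exactly 100 washers drawn without replacement from a bucket of exactly 1500 galvanized and 4500 brass washers, and it uses a hypothetical "class" of 2000 students so that the statistics are smooth.

```python
import random
from math import sqrt
from statistics import mean, pstdev

# Assumed bucket contents, taken from the approximate counts in the text.
N_GALVANIZED = 1500
N_BRASS = 4500
SAMPLE_SIZE = 100
N_STUDENTS = 2000            # hypothetical; far more than a real class

rng = random.Random(12345)   # fixed seed so the run is repeatable
bucket = [True] * N_GALVANIZED + [False] * N_BRASS   # True = galvanized

# Each "student" draws 100 washers and counts the galvanized ones.
counts = [sum(rng.sample(bucket, SAMPLE_SIZE)) for _ in range(N_STUDENTS)]

p = N_GALVANIZED / (N_GALVANIZED + N_BRASS)
print(f"simulated mean = {mean(counts):.2f}   Np = {SAMPLE_SIZE * p:.2f}")
print(f"simulated std  = {pstdev(counts):.2f}   "
      f"sqrt(Npq) = {sqrt(SAMPLE_SIZE * p * (1 - p)):.2f}")
```

Because the sample is small compared with the bucket, drawing without replacement changes the spread only slightly, so the binomial values Np = 25 and sqrt(Npq) of about 4.3 are a reasonable guide.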

Check List

Each student (including the TA) is given:

A. Setting up the Data Sample:

B. Obtaining Data for a Binomial Distribution with n=10:

C. Obtaining Data for a Binomial Distribution with n=100:


Data Analysis

The following data analysis is to be done in the lab after the experiment is completed. You need the data from the other students in order to complete the analysis. The lab report is to be handed in within one week. The entire laboratory is expected to take one hour, with one additional hour for the data analysis.


Lab Homework (Due one week after the lab)

Write a complete lab report for this experiment. Follow the example given in the file Writing a Lab Report. In addition, hand in the following lab homework:




DATA SHEET FOR RECORDING CLASS SAMPLE

(record number of combinations for each student in the class)

Combination: (Silver-Color/Yellow-Color)

             0/10,  1/9,  2/8,  3/7,  4/6,  5/5,  6/4,  7/3,  8/2,  9/1, 10/0,(Σ=10) Silver

Student 1: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 2: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 3: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 4: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 5: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 6: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 7: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 8: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 9: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 10: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 11: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 12: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 13: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 14: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 15: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 16: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 17: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 18: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 19: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 20: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 21: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 22: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 23: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 24: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 25: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

Student 26: ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

TOTAL : ____, ____, ____, ____, ____, ____, ____, ____, ____, ____, ____:____,: ____

* Check that the sum total is 10 × Ns, where Ns is the number of students.

* Copy the results of the Total to Section B.


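Examples of various distributions with a mean of 2.5

Binomial Distribution:

$B(x) = \frac{N!}{x!\,(N-x)!}\, p^{x} q^{N-x}$

where

N=10

p=0.25

q=0.75

Poisson Distribution:

$P(x) = \frac{M^{x} e^{-M}}{x!}$

where

M=2.5

Gaussian Distribution:

$G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-M)^{2}}{2\sigma^{2}}\right]$

where

M=2.5

$\sigma = \sqrt{M} \approx 1.58$ (the value that reproduces the Gaussian column in the data points below)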


Binomial, Poisson, and Gaussian Distributions with mean=2.5

Data Points

  x        binomial     Poisson         Gaussian        
  0        0.056314     0.082085        0.072289        
  1        0.187712     0.205212        0.160882        
  2        0.281568     0.256516        0.240008        
  3        0.250282     0.213763        0.240008        
  4        0.145998     0.133602        0.160882        
  5        0.058399     0.066801        0.072289        
  6        0.016222     0.027834        0.021773        
  7        0.003090     0.009941        0.004396        
  8        0.000386     0.003106        0.000595        
  9        0.000029     0.000863        0.000054        
  10       0.000001     0.000216        0.000003        
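The data points above can be regenerated with a short script. This is a sketch under stated assumptions: N = 10 and p = 0.25 for the binomial column, M = 2.5 for the Poisson column, and a Gaussian with mean 2.5 and standard deviation sqrt(2.5), which is the value that matches the tabulated Gaussian numbers. The function names are illustrative.

```python
from math import comb, exp, factorial, pi, sqrt

N, P = 10, 0.25
M = N * P                    # common mean of 2.5
SIGMA = sqrt(M)              # assumed; inferred from the tabulated values

def binomial(x):
    return comb(N, x) * P**x * (1 - P)**(N - x)

def poisson(x):
    return M**x * exp(-M) / factorial(x)

def gaussian(x):
    return exp(-(x - M) ** 2 / (2 * SIGMA**2)) / (SIGMA * sqrt(2 * pi))

rows = [(x, binomial(x), poisson(x), gaussian(x)) for x in range(N + 1)]

print(f"{'x':>3} {'binomial':>10} {'Poisson':>10} {'Gaussian':>10}")
for x, b, po, g in rows:
    print(f"{x:>3} {b:>10.6f} {po:>10.6f} {g:>10.6f}")
```

Note that the Gaussian column is symmetric about 2.5, while the binomial and Poisson columns are skewed toward larger x; at a mean this small the Gaussian is only a rough approximation.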


Gaussian Distribution (mean = 25, $\sigma = 5$, dx = 1)

   x            y             x             y          
   0         0.000000        26         0.078209       
   1         0.000001        27         0.073654       
   2         0.000002        28         0.066645       
   3         0.000005        29         0.057938       
   4         0.000012        30         0.048394       
   5         0.000027        31         0.038837       
   6         0.000058        32         0.029945       
   7         0.000122        33         0.022184       
   8         0.000246        34         0.015790       
   9         0.000477        35         0.010798       
  10         0.000886        36         0.007095       
  11         0.001583        37         0.004479       
  12         0.002717        38         0.002717       
  13         0.004479        39         0.001583       
  14         0.007095        40         0.000886       
  15         0.010798        41         0.000477       
  16         0.015790        42         0.000246       
  17         0.022184        43         0.000122       
  18         0.029945        44         0.000058       
  19         0.038837        45         0.000027       
  20         0.048394        46         0.000012       
  21         0.057938        47         0.000005       
  22         0.066645        48         0.000002       
  23         0.073654        49         0.000001       
  24         0.078209        50         0.000000       
  25         0.079788                                  
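The tabulated values are consistent with a Gaussian of mean 25 and standard deviation 5, evaluated at integer x. The sketch below regenerates them under that assumption and also checks that the values, each multiplied by a step dx = 1, add up to nearly 1, as expected for a probability distribution.

```python
from math import exp, pi, sqrt

MEAN = 25.0
SIGMA = 5.0    # assumed; this value reproduces the tabulated peak of 0.079788
DX = 1.0

def gaussian(x):
    """Normal probability density with the mean and sigma assumed above."""
    return exp(-(x - MEAN) ** 2 / (2 * SIGMA**2)) / (SIGMA * sqrt(2 * pi))

values = [gaussian(x) for x in range(51)]   # x = 0, 1, ..., 50
area = sum(v * DX for v in values)          # simple rectangle-rule sum

for x in (20, 24, 25, 26, 30):
    print(f"G({x}) = {values[x]:.6f}")
print(f"sum of G(x) dx for x = 0..50: {area:.6f}")
```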


Send comments, questions and/or suggestions via email to wolfs@nsrl.rochester.edu.