## Description

Project: Confidence Intervals

Probability and Statistics with Applications to Computing

Laboratory Projects

Confidence Intervals

0. Introduction and Background Material

0.1. Sample size and confidence intervals

Assume that you are measuring a quantity in a large population of size $N$. The quantity has mean $\mu$ and standard deviation $\sigma$. Drawing a sample of size $n$ from the population produces a distribution for the sample mean $\bar{X}$ with:

$$\mu_{\bar{X}} = E[\bar{X}] = \mu$$

and

$$\sigma_{\bar{X}}^2 = E[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n} \;\Rightarrow\; \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

In this project we will explore the relation of $\bar{X}$ to the population mean $\mu$.
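The relation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ can be checked numerically. The following is a minimal sketch, not part of the assignment: it assumes weights drawn directly from a normal distribution with $\mu = 100$ and $\sigma = 12$ (the values of the ball-bearing example), and the sample size, number of trials, seed, and function name are illustrative choices.

```python
import math
import random
import statistics


def sample_mean_spread(mu, sigma, n, trials, rng):
    """Empirical standard deviation of the sample mean over many samples of size n."""
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


rng = random.Random(0)           # arbitrary fixed seed, for repeatability only
mu, sigma, n = 100.0, 12.0, 25   # n = 25 is an illustrative sample size
empirical = sample_mean_spread(mu, sigma, n, trials=5000, rng=rng)
theoretical = sigma / math.sqrt(n)   # sigma / sqrt(n)
print(f"empirical {empirical:.3f} vs theoretical {theoretical:.3f}")
```

The two printed values should agree closely, and the agreement improves as the number of trials grows.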

• As a first example consider a barrel of a million ball bearings (i.e. population size

N = 1,000,000 ) where someone has actually weighed all one million of them and

found the exact mean to be µ = 100 grams and the exact standard deviation to be

σ = 12 grams. This is obviously an unrealistic assumption, but assume for the time

being that these parameters have been measured exactly.

• Now pick a sample of size $n$ (for example $n = 5$) of bearings from the barrel, weigh them, and find the mean of the sample, $\bar{X}_5 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$.

• Next take a larger sample (for example $n = 10$) and find the new mean $\bar{X}_{10} = \frac{X_1 + X_2 + \cdots + X_{10}}{10}$.

• Continue this process for larger and larger $n$, until $n = 100$.

• Plot the points $(n, \bar{X}_n)$ using a point marker (for example a blue ‘x’) as shown in Figure 1.

EE 381 Project: Confidence Intervals Dr. Chassiakos – Page 2

• Next, for each value of $n$, calculate the standard deviation of the sample mean from $\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}}$ and plot:

(i) The values of $\mu \pm 1.96\,\frac{\sigma}{\sqrt{n}}$ as a function of $n$, shown as the red curves in Figure 1a. These curves define the 95% confidence interval, which means that approximately 95% of the sample means will fall within the two red curves in Figure 1a. This can also be visually confirmed by looking at how many of the sample means fall outside of the red curves (approx. 5%).

(ii) The values of $\mu \pm 2.58\,\frac{\sigma}{\sqrt{n}}$ as a function of $n$, shown as the green curves in Figure 1b. These curves define the 99% confidence interval, which means that approximately 99% of the sample means fall within the green curves in Figure 1b.

Figure 1. Sample mean as a function of the sample size: (a) with the 95% confidence interval curves; (b) with the 99% confidence interval curves.
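The data behind Figure 1a can be sketched as follows. This is only an illustration under stated assumptions: the barrel is simulated with normally distributed weights, it is made smaller than one million to keep the run short, and the function name, seed, and row layout are hypothetical. Plotting is left out; each row holds one point $(n, \bar{X}_n)$ together with the two curve values $\mu \pm z\,\sigma/\sqrt{n}$.

```python
import math
import random
import statistics


def sample_means_with_bands(population, mu, sigma, max_n, z, rng):
    """Return rows (n, sample mean, lower curve, upper curve) for n = 1..max_n."""
    rows = []
    for n in range(1, max_n + 1):
        sample = rng.sample(population, n)       # a fresh sample of n bearings
        x_bar = statistics.fmean(sample)
        half_width = z * sigma / math.sqrt(n)    # z * sigma / sqrt(n)
        rows.append((n, x_bar, mu - half_width, mu + half_width))
    return rows


rng = random.Random(1)                           # arbitrary seed
mu, sigma = 100.0, 12.0
# Assumption: 100,000 bearings instead of 1,000,000, to keep the sketch fast.
population = [rng.gauss(mu, sigma) for _ in range(100_000)]
rows = sample_means_with_bands(population, mu, sigma, max_n=100, z=1.96, rng=rng)
inside = sum(lo <= x_bar <= hi for _, x_bar, lo, hi in rows)
print(f"{inside} of {len(rows)} sample means fall inside the 95% curves")
```

Passing `z=2.58` instead gives the curves of Figure 1b.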


0.3. Using the sample mean to estimate the population mean

In reference to the previous section, it is obviously unrealistic to think that anyone

actually measured the exact mean and standard deviation of all one million ball bearings.

More realistically, you would not have any idea what the mean or standard deviation was,

and you would need to weigh random samples of different sizes (for example

n = 5, 35, or 100 bearings) and then draw reasonable conclusions about the weight

distribution of all one million bearings.

To simulate this problem, generate a barrel of a million ball bearings with weights

normally distributed, with a mean µ and a standard deviation σ .
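A minimal sketch of this simulation step is shown below. The function name `make_barrel`, the seed, and the values $\mu = 100$, $\sigma = 12$ are assumptions carried over from the earlier example, and the barrel is reduced to 100,000 bearings so that the sketch runs quickly; the assignment itself specifies its own $N$, $\mu$, and $\sigma$.

```python
import math
import random
import statistics


def make_barrel(num_bearings, mu, sigma, rng):
    """Simulate a barrel: a list of normally distributed bearing weights."""
    return [rng.gauss(mu, sigma) for _ in range(num_bearings)]


rng = random.Random(2)                           # arbitrary seed
barrel = make_barrel(100_000, mu=100.0, sigma=12.0, rng=rng)

# Actual mean and standard deviation of the simulated population.
pop_mean = statistics.fmean(barrel)
pop_sd = math.sqrt(statistics.fmean([(w - pop_mean) ** 2 for w in barrel]))

sample = rng.sample(barrel, 35)                  # one random sample, n = 35
print(f"population: mean {pop_mean:.2f}, sd {pop_sd:.2f}")
print(f"sample mean (n = 35): {statistics.fmean(sample):.2f}")
```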

As an example, take a sample of $n$ bearings from the population of $N = 1{,}000{,}000$. Then calculate the mean of the sample:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

The standard deviation of the sample mean can be calculated by:

$$\sigma_n = \frac{\hat{S}}{\sqrt{n}},$$

where

$$\hat{S} = \left[\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}\right]^{1/2}$$
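As a small worked example, the three quantities $\bar{X}$, $\hat{S}$, and $\sigma_n$ can be computed with Python's standard library. The five weights below are made-up values chosen only for illustration, and the function name is hypothetical. Note that `statistics.stdev` divides by $n - 1$, which matches the definition of $\hat{S}$.

```python
import math
import statistics


def sample_statistics(sample):
    """Return (sample mean, sample standard deviation, std. dev. of the sample mean)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s_hat = statistics.stdev(sample)     # uses the n - 1 denominator
    sigma_n = s_hat / math.sqrt(n)       # S_hat / sqrt(n)
    return x_bar, s_hat, sigma_n


weights = [98.0, 102.0, 100.0, 97.0, 103.0]      # made-up weights, in grams
x_bar, s_hat, sigma_n = sample_statistics(weights)
print(f"mean {x_bar:.3f}, S_hat {s_hat:.3f}, sigma_n {sigma_n:.3f}")
```

For these values the squared deviations sum to 26, so $\hat{S} = \sqrt{26/4}$ and $\sigma_n = \sqrt{6.5/5}$.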

The question is: Can the value of $\bar{X}$, which is calculated for a sample of size $n$, be used to estimate the mean $\mu$ of the population of $N = 1{,}000{,}000$ bearings?

The answer is given in terms of confidence intervals, typically the 95% and 99%

confidence intervals.

Large samples ( n ≥ 30 ).

Consider the standardized variable $z = \frac{\bar{X} - \mu}{\sigma_n}$. Based on the Central Limit Theorem, it is known that for large samples the distribution of the standardized variable $z$ approaches the standard normal distribution.

The 95% confidence interval for large samples. The 95% confidence interval is determined by the critical values $[-z_c, z_c]$ such that

$$P\{-z_c < z < z_c\} = 0.95$$

From the tables of the normal distribution it is seen that these critical values correspond to $z_c = 1.96$ and $-z_c = -1.96$. Hence:

$$P\{-z_c < z < z_c\} = 0.95 \;\Rightarrow\; P\left\{-1.96 < \frac{\bar{X} - \mu}{\sigma_n} < 1.96\right\} = 0.95 \;\Rightarrow\; P\{\bar{X} - 1.96\,\sigma_n < \mu < \bar{X} + 1.96\,\sigma_n\} = 0.95$$

which is written as

$$P\left\{\bar{X} - 1.96\,\frac{\hat{S}}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\hat{S}}{\sqrt{n}}\right\} = 0.95$$

If you define $\mu_{\text{Lower}} = \bar{X} - 1.96\,\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{\text{Upper}} = \bar{X} + 1.96\,\frac{\hat{S}}{\sqrt{n}}$, then the above equation is written as:

$$P\{\mu_{\text{Lower}} < \mu < \mu_{\text{Upper}}\} = 0.95$$

This equation can be interpreted as follows:

Based on a sample of size $n$, we are 95% confident that the mean $\mu$ of the population lies in the interval $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$.

Another way to interpret this statement is:

(i) Obtain a large number of different samples, of size $n$ each

(ii) For each of these samples calculate $\bar{X}$ and $\sigma_n = \frac{\hat{S}}{\sqrt{n}}$

(iii) For each of these samples generate the corresponding interval $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$

(iv) Then the mean $\mu$ of the population will be contained in approximately 95% of these intervals

For example, if you obtain 500 samples of size $n$ each and create the corresponding 500 intervals $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$, then about 475 of these intervals (95% of 500) will contain the population mean $\mu$.
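The repeated-sampling interpretation can be tried out directly. The sketch below is illustrative rather than a prescribed method: it assumes samples of size $n = 40$ drawn from a normal distribution with $\mu = 100$ and $\sigma = 12$, builds 500 intervals $\bar{X} \pm 1.96\,\hat{S}/\sqrt{n}$, and counts how many contain $\mu$. The function name and seed are hypothetical.

```python
import math
import random
import statistics


def interval_contains_mean(sample, mu, z):
    """True if x_bar +/- z * S_hat / sqrt(n) contains the population mean mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    sigma_n = statistics.stdev(sample) / math.sqrt(n)
    return x_bar - z * sigma_n < mu < x_bar + z * sigma_n


rng = random.Random(3)                           # arbitrary seed
mu, sigma, n, num_samples = 100.0, 12.0, 40, 500
hits = 0
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    if interval_contains_mean(sample, mu, z=1.96):
        hits += 1
print(f"{hits} of {num_samples} intervals contain the population mean")
```

The count fluctuates from run to run around 95% of 500, which is why the statement above is only approximate for any finite number of samples.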

The 99% confidence interval for large samples. Similarly, the 99% confidence interval is determined by the critical values $[-z_c, z_c]$ such that

$$P\{-z_c < z < z_c\} = 0.99$$

From the tables of the normal distribution it is seen that these critical values correspond to $z_c = 2.58$ and $-z_c = -2.58$.

Hence, if you define $\mu_{\text{Lower}} = \bar{X} - 2.58\,\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{\text{Upper}} = \bar{X} + 2.58\,\frac{\hat{S}}{\sqrt{n}}$, then you can write:

$$P\{\mu_{\text{Lower}} < \mu < \mu_{\text{Upper}}\} = 0.99$$

which can be interpreted as follows:

Based on a sample of size $n$, we are 99% confident that the mean $\mu$ of the population lies in the interval $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$.
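The critical values 1.96 and 2.58 quoted from the normal tables can also be obtained numerically. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_critical` is an illustrative choice. A two-sided interval with confidence $c$ leaves $(1 - c)/2$ in each tail, so $z_c$ is the quantile at $0.5 + c/2$.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_c with P{-z_c < z < z_c} = confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(f"z_c for 95%: {z95:.2f}, z_c for 99%: {z99:.2f}")
```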


Small samples ( n < 30 ).

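When the sample size is small, the standardized variable $\frac{\bar{X} - \mu}{\sigma_n}$, with $\sigma_n = \frac{\hat{S}}{\sqrt{n}}$ estimated from the sample, is not well approximated by the normal distribution, since the Central Limit Theorem describes the behavior for large $n$ only.

In this case, when the population itself is normally distributed, the standardized variable $T = \frac{\bar{X} - \mu}{\sigma_n}$ follows the Student’s t distribution with $\nu = n - 1$ degrees of freedom.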

The 95% confidence interval for small samples. The 95% confidence interval is determined by the critical values $[-t_c, t_c]$ such that

$$P\{-t_c < T < t_c\} = 0.95$$

Note that the critical values $[-t_c, t_c]$ depend on two values: (i) the probability value (0.95) and (ii) the degrees of freedom ($\nu$).

Example: sample size $n = 5$. In that case there are $\nu = n - 1 = 4$ degrees of freedom. For the 95% confidence interval, the critical value is $t_c = t_{0.975} = 2.78$, as seen from the Student’s t distribution tables.

Hence, if you define $\mu_{\text{Lower}} = \bar{X} - 2.78\,\frac{\hat{S}}{\sqrt{5}}$ and $\mu_{\text{Upper}} = \bar{X} + 2.78\,\frac{\hat{S}}{\sqrt{5}}$, then the above equation is written as:

$$P\{\mu_{\text{Lower}} < \mu < \mu_{\text{Upper}}\} = 0.95$$

which can be interpreted as follows:

Based on a sample of size $n = 5$, we are 95% confident that the mean $\mu$ of the population lies in the interval $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$.

The 99% confidence interval for small samples. Similarly, the 99% confidence interval is determined by the critical values $[-t_c, t_c]$ such that

$$P\{-t_c < T < t_c\} = 0.99$$

Example: sample size $n = 5$. In that case there are $\nu = n - 1 = 4$ degrees of freedom. For the 99% confidence interval, the critical value is $t_c = t_{0.995} = 4.60$, as seen from the Student’s t distribution tables.

Hence, if you define $\mu_{\text{Lower}} = \bar{X} - 4.60\,\frac{\hat{S}}{\sqrt{5}}$ and $\mu_{\text{Upper}} = \bar{X} + 4.60\,\frac{\hat{S}}{\sqrt{5}}$, then the above equation is written as:

$$P\{\mu_{\text{Lower}} < \mu < \mu_{\text{Upper}}\} = 0.99$$

which can be interpreted as follows:

Based on a sample of size $n = 5$, we are 99% confident that the mean $\mu$ of the population lies in the interval $[\mu_{\text{Lower}}, \mu_{\text{Upper}}]$.
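A short worked example for $n = 5$ is sketched below. It hard-codes only the two table values quoted above for $\nu = 4$ (2.78 and 4.60); the five weights are made-up numbers, and the names `T_CRITICAL_NU4` and `t_interval_n5` are hypothetical. For any other sample size the critical values would have to be looked up in the t-distribution tables.

```python
import math
import statistics

# Critical values for nu = 4 degrees of freedom, as quoted from the t tables.
T_CRITICAL_NU4 = {0.95: 2.78, 0.99: 4.60}


def t_interval_n5(sample, confidence):
    """Confidence interval for the mean, valid for a sample of exactly 5 values."""
    if len(sample) != 5:
        raise ValueError("the tabulated critical values apply only to n = 5")
    x_bar = statistics.fmean(sample)
    sigma_n = statistics.stdev(sample) / math.sqrt(5)    # S_hat / sqrt(5)
    half_width = T_CRITICAL_NU4[confidence] * sigma_n
    return x_bar - half_width, x_bar + half_width


weights = [98.0, 102.0, 100.0, 97.0, 103.0]      # made-up weights, in grams
lower95, upper95 = t_interval_n5(weights, 0.95)
lower99, upper99 = t_interval_n5(weights, 0.99)
print(f"95%: [{lower95:.2f}, {upper95:.2f}]")
print(f"99%: [{lower99:.2f}, {upper99:.2f}]")
```

The 99% interval is wider than the 95% interval, as expected from the larger critical value.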


1. Effect of sample size on confidence intervals

Create two plots as in Figure 1a and Figure 1b, showing the effect of sample size on the confidence intervals.

The values of the following parameters have been provided to you:

• Total number of bearings: $N$;

• Population mean: $\mu$ (grams);

• Population standard deviation: $\sigma$ (grams);

• Sample sizes: $n = 1, 2, \ldots, 200$

SUBMIT a report which includes the results and your code. You must follow the

guidelines given in the syllabus regarding the structure of the report. Points will be

taken off if you do not follow the guidelines.

2. Using the sample mean to estimate the population mean

(A) Perform the following simulation experiment. Use Table 1 to tabulate the results.

1. Choose a random sample of $n = 5$ bearings from the $N$ bearings you created in the previous problem. Calculate the sample mean and the sample standard deviation:

$$\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j \quad\text{and}\quad \hat{S} = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(X_j - \bar{X}\right)^2}$$

2. Create the 95% confidence interval using the normal distribution to fill in the first two entries in the top row. You realize, however, that this is not an appropriate distribution to use because you have a small sample, $n = 5 < 30$:

$$[\mu_{\text{Lower}}, \mu_{\text{Upper}}] = \left[\bar{X} - 1.96\,\frac{\hat{S}}{\sqrt{n}},\; \bar{X} + 1.96\,\frac{\hat{S}}{\sqrt{n}}\right]$$

3. Check if the confidence interval includes the actual mean µ of the population of N

bearings. If it does, then Step 2 is considered a success.

4. The appropriate distribution for small samples ($n < 30$) is the t-distribution. Create the 95% confidence interval using the t-distribution with $\nu = n - 1 = 4$ degrees of freedom:

$$[\mu_{\text{Lower}}, \mu_{\text{Upper}}] = \left[\bar{X} - t_{0.975}\,\frac{\hat{S}}{\sqrt{n}},\; \bar{X} + t_{0.975}\,\frac{\hat{S}}{\sqrt{n}}\right]$$

At the 95% confidence level with $\nu = 4$ degrees of freedom, the value of $t_{0.975}$ can be found from the tables, and it is seen to be $t_{0.975} = 2.78$. This is the value that will be used to determine the 95% confidence interval:

$$[\mu_{\text{Lower}}, \mu_{\text{Upper}}] = \left[\bar{X} - 2.78\,\frac{\hat{S}}{\sqrt{n}},\; \bar{X} + 2.78\,\frac{\hat{S}}{\sqrt{n}}\right]$$

For a different sample size, the value of $t_{0.975}$ will be different from the one above. You should find these values from the t-distribution tables, and you should modify the confidence intervals accordingly.


5. Check if the confidence interval includes the actual mean µ of the population. If it

does, then Step 4 is considered a success.

6. Repeat the experiment $M = 10{,}000$ times and count the number of successes.

7. Enter the percentage of successful outcomes in Table 1.

8. Repeat steps 1-7 above with $n = 5$ and the 99% confidence interval.

9. After completing all of the above steps you will have filled out the first row of the

table.

(B) Repeat part (A) with n = 40 using the normal distribution and the t-distribution to

complete the second row of the table.

(C) Repeat part (A) with $n = 120$ using the normal distribution and the t-distribution to complete the third row of the table. You realize, however, that for a large sample ($n \ge 30$) the t-distribution will be very close to normal, so the differences between Student’s t and normal will be minimal.
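The structure of the experiment in part (A) can be sketched as follows for the 95% level with $n = 5$. This is an outline under stated assumptions, not a reference solution: the barrel is simulated with $\mu = 100$, $\sigma = 12$ and only 100,000 bearings, the number of repetitions is reduced from $M = 10{,}000$ to 2,000 to keep the run short, and the function name `success_rate` and the seed are hypothetical. The critical values 1.96 and 2.78 are the ones given in the text; values for other sample sizes must come from the tables.

```python
import math
import random
import statistics


def success_rate(population, pop_mean, n, critical_value, trials, rng):
    """Percentage of trials whose interval contains the actual population mean."""
    successes = 0
    for _ in range(trials):
        sample = rng.sample(population, n)
        x_bar = statistics.fmean(sample)
        sigma_n = statistics.stdev(sample) / math.sqrt(n)
        lower = x_bar - critical_value * sigma_n
        upper = x_bar + critical_value * sigma_n
        if lower <= pop_mean <= upper:
            successes += 1
    return 100.0 * successes / trials


rng = random.Random(4)                           # arbitrary seed
# Assumptions: smaller barrel and fewer repetitions than the assignment asks for.
population = [rng.gauss(100.0, 12.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)          # actual mean of the simulated barrel

normal_rate = success_rate(population, pop_mean, 5, 1.96, 2000, rng)
t_rate = success_rate(population, pop_mean, 5, 2.78, 2000, rng)
print(f"n = 5, 95% level: normal {normal_rate:.1f}%, Student's t {t_rate:.1f}%")
```

With a sample this small, the normal-based interval typically succeeds noticeably less often than the nominal 95%, while the t-based interval stays close to it.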

SUBMIT a report with the results and your code. You must use Table 1 to report the results, and you must follow the guidelines given in the syllabus regarding the structure of the report. Points will be taken off if you do not follow the guidelines or if you do not use Table 1 to report the results.

| Sample size (n) | 95% Confidence (Using Normal distribution) | 99% Confidence (Using Normal distribution) | 95% Confidence (Using Student’s t distribution) | 99% Confidence (Using Student’s t distribution) |
|---|---|---|---|---|
| 5 | | | | |
| 40 | | | | |
| 120 | | | | |

Table 1. Success rate (percentage) for different sample sizes