The requirements are included in the following files:
project_directions_2_.doc

20170409040812math5100_byb_chapter_2_part_3_covariance_and_correlation.ppt

20170409040810math5100_byb_chapter_2_part_2_other_means.ppt

20170409040808math5100_byb_chapter_2_part_1_another_type_of_ogive.ppt

20170409040807math5100_byb_chapter_2_part_2_medians_of_distributions.ppt

Unformatted Attachment Preview

Project Part 1 – worth 10 of the 20% Project grade – DUE SUNDAY, April 2
In this part of the project, you will be calculating measures on your data set using both the raw-data
and grouped-data formulas in Chapter 2. For the set of data sent to you,
(a) find the mean, median and range of the data, treating it as raw data.
(b) make a grouped frequency distribution table consisting of 6 – 8 classes (recall that the instructions
are found on p. 38)
(c) find the mean, modal class, variance and standard deviation from the grouped table – include all
relevant columns of information you need ($x$, $xf$, $x - \bar{x}$, $(x - \bar{x})^2$, $(x - \bar{x})^2 f$). Show work (if you
make an error and I can’t see where you went wrong, then I can’t find your error!). After doing this,
compare the mean you found from the table to the raw-data mean. Remember that since you use
class midpoints as approximate data values, the grouped mean probably won’t equal the raw one, but I
want to see how close the two measures are (in other words, how good the approximation is).
(d) write THREE interpretations of the data from the table. In other words, what do you learn about the
set of data in grouped (organized) form that wasn’t apparent from the raw (unorganized) form? I
DON’T want you to write general comments that describe the benefits of organizing data; rather, I
want you to tell me what you learn about your particular set of data after having organized it. For
example, if your set of data was a list of people’s heights, don’t tell me (for example) that organizing
the data in a table helps you to see the distribution of heights; rather, tell me something (for example)
like, “From the table I learned that the majority of people in the list had a height over 58 inches,” a
discovery that is not as obvious from an unorganized list of numbers!
Save your work (please use a filename that begins with your last name and that indicates what the file
is, e.g. DePriter PART 1.doc) and email it to me (there is NO drop box in ulearn). I also accept PDFs
and scans (JPGs).
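As a rough sketch of the grouped-data calculations in part (c), the Python snippet below builds the $xf$ and $(x - \bar{x})^2 f$ sums from class midpoints and frequencies. The midpoints and frequencies are illustrative values only (not anyone's assigned data set), the function name is hypothetical, and the n - 1 divisor assumes the data are treated as a sample; check the Chapter 2 formulas for the version the course uses.

```python
import math

# Illustrative class midpoints and frequencies (hypothetical, not assigned data)
midpoints = [5, 14, 23, 32, 41, 50]
freqs = [16, 23, 11, 5, 3, 2]

def grouped_stats(xs, fs):
    """Return (mean, variance, std dev) from class midpoints and frequencies.

    Assumption: the data are a sample, so the variance divides by n - 1.
    """
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n           # sum of xf, divided by n
    ss = sum((x - mean) ** 2 * f for x, f in zip(xs, fs))   # sum of (x - mean)^2 f
    variance = ss / (n - 1)
    return mean, variance, math.sqrt(variance)

mean, variance, std_dev = grouped_stats(midpoints, freqs)
modal_class_index = freqs.index(max(freqs))  # position of the class with the largest frequency
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```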
Project Part 2 – worth 4 of the 20% Project grade – DUE SUNDAY, April 30
In this part of the Project, you will be converting your frequency distribution from Part 1 into a discrete
probability distribution.
Using your class midpoints as X values, determine the “relative frequency” of each class by dividing its
frequency by n. These will serve as the P(x)’s. Make a probability distribution table with all necessary
columns ($xP(x)$, $x - \mu$, $(x - \mu)^2$, $P(x)(x - \mu)^2$) and calculate the mean, variance and standard
deviation. SHOW WORK.
How do the values of these measures compare to those you calculated in Part 1?
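The conversion described in Part 2 might look like the following minimal Python sketch. The midpoints and frequencies are illustrative values only (not an assigned data set), and the variable names are hypothetical.

```python
import math

midpoints = [5, 14, 23, 32, 41, 50]   # hypothetical X values (class midpoints)
freqs = [16, 23, 11, 5, 3, 2]         # hypothetical class frequencies

n = sum(freqs)
probs = [f / n for f in freqs]        # relative frequencies serve as P(x)

# Mean: sum of the x P(x) column
mu = sum(x * p for x, p in zip(midpoints, probs))
# Variance: sum of the P(x)(x - mu)^2 column
sigma2 = sum(p * (x - mu) ** 2 for x, p in zip(midpoints, probs))
sigma = math.sqrt(sigma2)

print(round(mu, 2), round(sigma2, 2), round(sigma, 2))
```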
Project Part 3 – worth 6 of the 20% Project grade – DUE SATURDAY, May 18
In the final part of the project, you will conduct two hypothesis tests for the population mean based on
your data.
BEYOND THE BASICS…
Chapter 2, Part 3
Covariance and Correlation
College of Arts & Sciences
Department of Mathematics
MATH5100: Statistical Methods
Covariance and Correlation
Covariance and correlation are measures used to determine the
association between two variables, and how strong that
association is.
Given two samples of data represented by x and y,
$$\text{covariance} = \operatorname{cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
$$\text{correlation coefficient} = \operatorname{corr}(x, y) = \frac{\operatorname{cov}(x, y)}{s_x s_y}$$
where $s_x$ and $s_y$ are the standard deviations of the x and y values,
respectively.
A stock broker studies closing returns for two stocks over a 5-day period. The data is summarized in the following table:

Day   Stock X (%)   Stock Y (%)
1     0.9           1.6
2     2.1           2.4
3     3.2           2.7
4     1.6           2.0
5     2.2           2.3
The statistics for these samples are
X: $\bar{x} = 2.0$, $s_x = 0.85$
Y: $\bar{y} = 2.2$, $s_y = 0.42$

$$\begin{aligned}
\operatorname{cov}(x, y) &= \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \\
&= \frac{(0.9 - 2.0)(1.6 - 2.2) + (2.1 - 2.0)(2.4 - 2.2) + (3.2 - 2.0)(2.7 - 2.2) + (1.6 - 2.0)(2.0 - 2.2) + (2.2 - 2.0)(2.3 - 2.2)}{5 - 1} \\
&= \frac{(-1.1)(-0.6) + (0.1)(0.2) + (1.2)(0.5) + (-0.4)(-0.2) + (0.2)(0.1)}{4} \\
&= \frac{0.66 + 0.02 + 0.60 + 0.08 + 0.02}{4} = \frac{1.38}{4} = 0.345
\end{aligned}$$
A positive covariance of 0.345 tells
us that the two stocks “move” in
the same way: if the return for one
increases, then the other’s return
probably will as well.
$$\text{correlation coefficient} = \operatorname{corr}(x, y) = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{0.345}{(0.85)(0.42)} = \frac{0.345}{0.357} \approx 0.97$$
The correlation coefficient, which always equals a value between -1
and 1, tells us that the relationship between Stocks X and Y is very
strong and “positive”: as one increases so will the other. A value
close to 1 indicates this; a negative value indicates that they move in
opposite directions; a value close to 0 indicates random,
non-corresponding movement.
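The covariance and correlation for the two stocks can be checked with a short Python sketch. The function names are hypothetical, and the data are the five daily returns for Stock X and Stock Y given in the example.

```python
import math

stock_x = [0.9, 2.1, 3.2, 1.6, 2.2]   # Stock X returns (%), days 1 to 5
stock_y = [1.6, 2.4, 2.7, 2.0, 2.3]   # Stock Y returns (%), days 1 to 5

def mean(values):
    return sum(values) / len(values)

def sample_cov(xs, ys):
    """Sample covariance: sum of (x - xbar)(y - ybar), divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_std(values):
    """Sample standard deviation, i.e. the square root of cov(v, v)."""
    return math.sqrt(sample_cov(values, values))

cov_xy = sample_cov(stock_x, stock_y)
corr_xy = cov_xy / (sample_std(stock_x) * sample_std(stock_y))
print(round(cov_xy, 3), round(corr_xy, 3))
```

Using unrounded standard deviations, the ratio comes out near 0.975, which agrees with the hand calculation of 0.97 up to the rounding of $s_x$ and $s_y$.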
If the X-Y pairs are plotted on a coordinate system, the possible
relationships for various values of the correlation coefficient
are sketched below.
BEYOND THE BASICS…
Chapter 2, Part 2
Other Types of Means
College of Arts & Sciences
Department of Mathematics
MATH5100: Statistical Methods
Two other types of means that can be calculated on sets of
quantitative data are the geometric mean and the harmonic
mean.
The geometric mean of two numbers, a and b, is the square root
of ab,
$$\text{geometric mean} = \sqrt{ab}$$
The geometric mean of three numbers, a, b, and c, is the cube
root of abc,
$$\text{geometric mean} = \sqrt[3]{abc}$$
and so on.
Geometric means are useful when you want to compare sets of
data with more than one quantitative attribute that you want
to consider.
For example, suppose you want to see an action movie this
weekend and there’s a choice of two: one that had an opening
attendance of 28,000,000 and an audience rating of 7.2 out of
10, and another with an opening attendance of 19,000,000 and
a rating of 9.1. Taking this information into account, which
movie appears to be the better choice to see?
Despite the large differences in attendance and ratings, the
geometric mean will provide a good comparison.
The geometric means for the two movies are
$$\sqrt{(28{,}000{,}000)(7.2)} = \sqrt{201{,}600{,}000} \approx 14{,}198.59$$
$$\sqrt{(19{,}000{,}000)(9.1)} = \sqrt{172{,}900{,}000} \approx 13{,}149.14$$
Taking into consideration both attendance and rating, the first
movie with the larger geometric mean appears to be the
better choice.
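The movie comparison can be reproduced with a small Python sketch. The helper name `geometric_mean` is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Attendance and audience rating for each movie, as given in the example
movie_1 = geometric_mean([28_000_000, 7.2])
movie_2 = geometric_mean([19_000_000, 9.1])

better = "first" if movie_1 > movie_2 else "second"
print(round(movie_1, 2), round(movie_2, 2), better)
```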
Geometric means are also better to find when quantitative
values are not independent. This is the case with percentage
figures over successive periods of time like annual investment
returns. One year’s return affects the next because it
determines how much can be invested the next year.
Suppose returns for a certain investor over the past four years
were 50%, 10%, -70% and 40%. Determine the average return.
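We need to add 1 to each return in order to avoid having a
negative value within the root; this converts each return to a growth
factor (for example, 50% becomes 1.50 and -70% becomes 0.30). We
compensate by subtracting 1 from the root afterward. The geometric mean is
$$\sqrt[4]{(1.50)(1.10)(0.30)(1.40)} - 1 = \sqrt[4]{0.693} - 1 \approx 0.912 - 1 = -0.088$$
or about -8.8% per year.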
Note that if we simply added these returns and divided by four, the
mean would equal +7.5%! In reality, the rate of -70% has a much
more profound effect on the “average”.
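To see the gap between the two averages numerically, here is a minimal Python sketch using the four returns from the example; the variable names are hypothetical.

```python
import math

returns = [0.50, 0.10, -0.70, 0.40]   # annual returns as decimals

# Add 1 to each return to get growth factors, take the 4th root, subtract 1
growth_factors = [1 + r for r in returns]
geometric_avg = math.prod(growth_factors) ** (1 / len(growth_factors)) - 1

# Plain arithmetic mean, for comparison
arithmetic_avg = sum(returns) / len(returns)

print(round(geometric_avg, 3), round(arithmetic_avg, 3))
```

The product of the growth factors is 0.693, so an investment ends the four years at about 69% of its starting value. That is consistent with the negative geometric average rather than the +7.5% arithmetic mean.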
The harmonic mean of n quantities, $x_1, x_2, x_3, \ldots, x_n$, is
$$\text{harmonic mean} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \cdots + \dfrac{1}{x_n}}$$
It is used to find the average of rates or ratios.
For example, if you drive a certain distance at a velocity of 60
mph, and then drive the same distance at a velocity of 40 mph,
then your average velocity is 48 mph.
$$\frac{2}{\dfrac{1}{60} + \dfrac{1}{40}} = \frac{2}{\dfrac{2}{120} + \dfrac{3}{120}} = \frac{2}{\dfrac{5}{120}} = \frac{2(120)}{5} = 48$$
Why is this correct? Suppose you drove for 240 miles at each
velocity. Then you drove for 4 hours at 60 mph and for 6 hours
at 40 mph. Therefore, you drove 480 miles in 10 hours, at an
average velocity of 48 mph. Note that simply averaging 60
and 40 and getting 50 mph is not correct.
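Both the formula and the 240-mile check can be sketched in Python. The function name `harmonic_mean` here is a hypothetical helper written out for illustration.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)

avg_speed = harmonic_mean([60, 40])

# Cross-check: drive 240 miles at each speed, then divide total distance by total time
distance = 240
total_time = distance / 60 + distance / 40   # 4 hours + 6 hours
check_speed = 2 * distance / total_time

print(avg_speed, check_speed)
```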
In finance, the harmonic mean is used, for example, to find
averages of interest rates and price-earnings ratios.
BEYOND THE BASICS…
Chapter 2, Part 1
Another Type of Ogive
College of Arts & Sciences
Department of Mathematics
MATH5100: Statistical Methods
Another column of information that can be calculated from a
grouped frequency distribution is cumulative relative
frequency.
For each class,
$$\text{cumulative relative frequency} = \frac{\text{cumulative frequency}}{\text{sample size}}$$
In the following distribution, this formula has been used to fill
the last column.
Class Boundaries   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
0.5 – 9.5          16          16/60 = 0.267        16                     16/60 = 0.267
9.5 – 18.5         23          23/60 = 0.383        16 + 23 = 39           39/60 = 0.65
18.5 – 27.5        11          11/60 = 0.183        39 + 11 = 50           50/60 = 0.833
27.5 – 36.5        5           5/60 = 0.083         50 + 5 = 55            55/60 = 0.917
36.5 – 45.5        3           3/60 = 0.050         55 + 3 = 58            58/60 = 0.967
45.5 – 54.5        2           2/60 = 0.033         58 + 2 = 60            60/60 = 1.000
n = 60
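The cumulative columns of this table can be rebuilt with a few lines of Python. This is a sketch only; the variable names are hypothetical, and the frequencies are those of the distribution above.

```python
from itertools import accumulate

freqs = [16, 23, 11, 5, 3, 2]   # class frequencies from the table
n = sum(freqs)                  # sample size

cum_freqs = list(accumulate(freqs))                 # running totals of the frequencies
rel_freqs = [round(f / n, 3) for f in freqs]        # frequency / sample size
cum_rel_freqs = [round(cf / n, 3) for cf in cum_freqs]  # cumulative frequency / sample size

print(cum_freqs)
print(cum_rel_freqs)
```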
One use of this information is to produce a cumulative relative
frequency ogive, depicted below from this distribution.
[Figure: ogive with cumulative relative frequency on the vertical axis and class boundaries on the horizontal axis]
This ogive has the same shape as one that was made with
cumulative frequencies, but interpretations using percentages
can be drawn from this version.
For example, reading from the ogive, approximately 75% of the data is 25 or less,
and approximately 46% of the data is 15 or less.
BEYOND THE BASICS…
Chapter 2, Part 2
Medians of Distributions
College of Arts & Sciences
Department of Mathematics
MATH5100: Statistical Methods
Median of a data set (ungrouped frequency
distribution)
The median of an ungrouped frequency distribution may be
difficult to find if the sample or population size is large.
If so, the following aid can be used to simplify the process:
The median is the value in the $\frac{n+1}{2}$th (sample) or $\frac{N+1}{2}$th (population) position of the ordered list.
x    f
3    16
4    24
5    11
6    26
n = 77

In this sample, the median is the $\frac{n+1}{2}\text{th} = \frac{77+1}{2}\text{th} = 39\text{th}$ value.
Finding the cumulative frequencies can help
identify the median: they are, in order,
16, 40, 51 and 77, so the 39th value is a 4.
x      f
213    2016
214    524
215    3481
216    127
N = 6148

In this population, the median is the $\frac{N+1}{2}\text{th} = \frac{6148+1}{2}\text{th} = 3074.5\text{th}$ value.
Finding the cumulative frequencies can help
identify the median: they are, in order,
2016, 2540, 6021 and 6148, so the 3074.5th
value is a 215 (halfway between the 3074th and
3075th).
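The position rule and the cumulative-frequency lookup can be combined in a short Python sketch. The function names are hypothetical, and the sketch assumes the x values are listed in ascending order.

```python
from itertools import accumulate

def value_at(position, values, cum_freqs):
    """Return the value sitting at a 1-based position of the ordered list."""
    for value, cf in zip(values, cum_freqs):
        if position <= cf:
            return value
    raise ValueError("position exceeds the total frequency")

def median_from_table(values, freqs):
    """Median of an ungrouped frequency distribution (values in ascending order)."""
    n = sum(freqs)
    cum_freqs = list(accumulate(freqs))
    low = (n + 1) // 2    # equals (n + 1)/2 when n is odd
    high = (n + 2) // 2   # the next position when (n + 1)/2 ends in .5
    return (value_at(low, values, cum_freqs) + value_at(high, values, cum_freqs)) / 2

sample_median = median_from_table([3, 4, 5, 6], [16, 24, 11, 26])
population_median = median_from_table([213, 214, 215, 216], [2016, 524, 3481, 127])
print(sample_median, population_median)
```

When the position is a whole number, `low` and `high` coincide; when it ends in .5, the two neighboring values are averaged.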
Median of a data set (grouped frequency
distribution)
The median of a grouped frequency distribution is fairly
complicated to find. The formula is
$$\text{median} = L + \frac{i}{f_m}\left(50\% \text{ of } n - cf_b\right)$$
(note that for a population, N replaces n).
L is the left boundary of the class containing the median
i is the class width
$f_m$ is the frequency of the class containing the median
$cf_b$ is the cumulative frequency of the class before the class
containing the median
Class Boundaries   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
0.5 – 9.5          16          16/60 = 0.267        16                     16/60 = 0.267
9.5 – 18.5         23          23/60 = 0.383        16 + 23 = 39           39/60 = 0.65      <- median in here
18.5 – 27.5        11          11/60 = 0.183        39 + 11 = 50           50/60 = 0.833
27.5 – 36.5        5           5/60 = 0.083         50 + 5 = 55            55/60 = 0.917
36.5 – 45.5        3           3/60 = 0.050         55 + 3 = 58            58/60 = 0.967
45.5 – 54.5        2           2/60 = 0.033         58 + 2 = 60            60/60 = 1.000
n = 60
Let us assume this distribution represents a sample. The first
thing we need to determine is which class contains the
median. Look for 0.500 in the cumulative relative frequency
column (this would represent the halfway point into the
distribution). IF IT IS NOT THERE, the median is in the class
with the NEXT HIGHEST CUMULATIVE RELATIVE FREQUENCY. Here 0.500
does not appear; the smallest value above it is 0.65, so the median
is in the class 9.5 – 18.5.
For this distribution (n = 60), with the median in the class 9.5 – 18.5, the quantities in the formula are
L = 9.5 (left boundary of the class containing the median)
i = 9 (class width)
$f_m$ = 23 (frequency of the class containing the median)
$cf_b$ = 16 (cumulative frequency of the class before the class containing the median)
$$\text{median} = 9.5 + \frac{9}{23}\left(50\% \text{ of } 60 - 16\right) = 9.5 + \frac{9}{23}(30 - 16) = 9.5 + \frac{9}{23}(14) \approx 9.5 + 5.48 = 14.98$$
What if 0.500 IS in the cumulative relative frequency column?
Then the median is the right boundary of the class with the 0.500.
This is illustrated in the following example.
Class Boundaries   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
0.5 – 9.5          16          16/60 = 0.267        16                     16/60 = 0.267
9.5 – 18.5         14          14/60 = 0.233        16 + 14 = 30           30/60 = 0.500     <- median in here
18.5 – 27.5        11          11/60 = 0.183        30 + 11 = 41           41/60 = 0.683
27.5 – 36.5        5           5/60 = 0.083         41 + 5 = 46            46/60 = 0.767
36.5 – 45.5        13          13/60 = 0.217        46 + 13 = 59           59/60 = 0.983
45.5 – 54.5        1           1/60 = 0.017         59 + 1 = 60            60/60 = 1.000
n = 60

The median is in the class 9.5 – 18.5, whose cumulative relative frequency is 0.500, so
L = 9.5, i = 9, $f_m$ = 14 and $cf_b$ = 16. The median is the right boundary of that class, 18.5,
as the formula confirms:
$$\text{median} = 9.5 + \frac{9}{14}\left(50\% \text{ of } 60 - 16\right) = 9.5 + \frac{9}{14}(30 - 16) = 9.5 + \frac{9}{14}(14) = 9.5 + 9 = 18.5$$
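Both grouped-median examples can be checked with the Python sketch below. The function name and the representation of classes as (left, right) boundary pairs are assumptions made for illustration. The sketch picks the first class whose cumulative frequency reaches 50% of n, which also covers the case where 0.500 appears in the cumulative relative frequency column.

```python
from itertools import accumulate

def grouped_median(boundaries, freqs):
    """Median of a grouped distribution: L + (i / fm) * (50% of n - cfb).

    boundaries: list of (left, right) class boundaries in ascending order.
    """
    n = sum(freqs)
    half = 0.5 * n
    cum_freqs = list(accumulate(freqs))
    for k, cf in enumerate(cum_freqs):
        if cf >= half:                       # first class reaching 50% of n
            left, right = boundaries[k]      # L and the right boundary
            width = right - left             # i
            cf_before = cum_freqs[k - 1] if k > 0 else 0   # cfb
            return left + width * (half - cf_before) / freqs[k]
    raise ValueError("empty distribution")

classes = [(0.5, 9.5), (9.5, 18.5), (18.5, 27.5),
           (27.5, 36.5), (36.5, 45.5), (45.5, 54.5)]

median_1 = grouped_median(classes, [16, 23, 11, 5, 3, 2])    # first example
median_2 = grouped_median(classes, [16, 14, 11, 5, 13, 1])   # second example
print(round(median_1, 2), round(median_2, 2))
```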