Intro to Stats, Test 1

R. Sinn, Spring 2007                                                                             Student Solutions & Points Scoring Guide

 

Test is out of 90 Points Total

Directions

*       Answer each question in full. In some cases in statistics, more than one answer may be counted correct.  Please justify your statements if in doubt.

*       You may use a graphing calculator on any problem. 

*       Please mark “CALC” in the margin next to any steps performed on the calculator. 

Questions 1 and 2 relate to Histogram 1 shown at right.

 

1.        [5 Points] Check all the statements that apply:

☐       The distribution is approximately normal.

☐       The distribution is skewed left.

☐       The distribution is skewed right.

☐       The mean and median are approximately equal.

 

2.        [5 Points] Check all the statements that apply:

☐       The mean is likely to be less than the median.

☐       The mean is likely to be greater than the median.

☐       The data set is not likely to contain outliers.

☐       Outliers likely exist, and the majority are on the left.

☐       Outliers likely exist, and the majority are on the right.

 

Questions 3 and 4 relate to Histogram 2 shown at right.

 

3.        [5 Points] Check all the statements that apply:

☐       The distribution is approximately normal.

☐       The distribution is skewed left.

☐       The distribution is skewed right.

☐       The mean and median are approximately equal.

 

4.        [5 Points] Check all the statements that apply:

☐       The mean is likely to be less than the median.

☐       The mean is likely to be greater than the median.

☐       The data set is not likely to contain outliers.

☐       Outliers likely exist, and the majority are on the left.

☐       Outliers likely exist, and the majority are on the right.

 

[Grading for Q’s 1 - 4: each question starts at 5 Pts; deduct 1 Pt for each correct option not marked and 1 Pt for each incorrect option marked.]
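The deduction rule above can be sketched as a small function. This is only an illustration: the function name, the option labels and the example answer key are hypothetical, and the guide does not say whether a score can drop below zero, so no floor is applied here.

```python
def checkbox_score(correct, marked, max_points=5):
    """Score a check-all-that-apply question.

    One point is deducted for each correct option left unmarked and
    one point for each incorrect option that was marked.
    """
    correct, marked = set(correct), set(marked)
    missed = len(correct - marked)   # correct options not marked
    wrong = len(marked - correct)    # incorrect options marked
    return max_points - missed - wrong


# Hypothetical example: the key is {B, D}; the student marks {B, C}.
example = checkbox_score({"B", "D"}, {"B", "C"})
print(example)  # one missed (D) and one wrong (C): 5 - 1 - 1 = 3
```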

 

Regression Statistics          Line of Best Fit: y = a x + b

r      -0.4827                 a      -0.372
R²     0.2230                  b      7.483

5.         [10 Points] Essay: During the Fall semester 2006, a team of NGCSU students found a correlation between age (x-variable, measured in years) and interest in rock climbing (y-variable, measured on a scale from 1 – 10, with 10 indicating high interest and 1 indicating no interest).  Write 2 – 5 complete sentences explaining everything you know and can infer based upon the output shown to the right.  Be sure to focus your analysis upon real-world implications.

 

The researchers found a negative correlation between age and interest in rock climbing that is strong by real-world standards (moderate by textbook standards) [2 Pts].  About 22% of the variance in interest in rock climbing is accounted for by age [2 Pts], with older people less interested than younger people [1 Pt].  The slope of the prediction equation indicates that each additional year of age is associated with a decrease of about 0.37 points in predicted interest in rock climbing [2 Pts].  Following directions [3 Pts]: awarded for writing in complete sentences and for connecting the statistical output to the real-world research scenario.


6.        [5 Points] A statistician computes a regression comparing a father’s height (in inches) to his daughter’s height (in inches).  She finds that R² = 0.39.  Interpret this finding.

 

This means that 39% of the variance in a daughter’s height is accounted for by father’s height.

7.        [5 Points] Given Scatter Plot 1 shown at right, check all the statements that apply:

☐       Linear Regression is appropriate.

☐       Linear Regression is not appropriate.

☐       The (linear) correlation is positive.

☐       The (linear) correlation is negative.

☐       The (linear) correlation is strong.

☐       The (linear) correlation is weak.

☐       No linear correlation exists.

 

[Grading for Q’s 7 & 8: each question starts at 5 Pts; deduct 1 Pt for each correct option not marked and 1 Pt for each incorrect option marked.]

8.        [5 Points] Given Scatter Plot 2 shown at right, check all the statements that apply:  

☐       Linear Regression is appropriate.

☐       Linear Regression is not appropriate.

☐       The (linear) correlation is positive.

☐       The (linear) correlation is negative.

☐       The (linear) correlation is strong.

☐       The (linear) correlation is weak.

☐       No linear correlation exists.

 

 

9.        [10 Points] A tallish female college student is studying dating patterns.  She wonders whether tall women tend to date taller men than short women do.  She measures the heights of several women in her dorm; then she measures the height of the next man each woman dates.  Here are the data (heights in inches); assume linear regression is appropriate:

 

Women (x)     66     64     66     65     70     65
Men (y)       72     68     70     68     71     65

 

See calculator screen shots at end of test for correct calculator steps.

 

a.        Find the correlation coefficient and analyze it.

 

r = 0.565 which indicates a strong, positive relationship (real-world) [3 Pts]

 

b.       If a woman is 5’5” tall, estimate the height of her boyfriend.

 

Plug in x = 65 in the line of best fit, y = 0.68 x + 24: y = 0.68(65) + 24 = 68.2, so the boyfriend will be approximately 68.2” tall [2 Pts]

 

c.        Analyze R² in the context of this problem.

 

R² = 0.32, meaning that 32% of the variance in the boyfriend’s height is accounted for by the woman’s height [2 Pts]

 

d.       Analyze the slope of the prediction equation.  How meaningful is this relationship (the one between the variables, not the dating relationships)?  Hint: refer to your answer in part (c).

 

For every 1 inch taller the woman is, her boyfriend is about 0.68 inches taller.  This is moderately meaningful, since a third of the variance is accounted for [3 Pts]
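The numbers in parts (a) through (d) can be checked away from the calculator. The sketch below computes the least-squares line and correlation directly from the six data pairs; the helper name `linear_fit` is made up for illustration and is not the calculator's procedure.

```python
import math

women = [66, 64, 66, 65, 70, 65]  # x, heights in inches
men = [72, 68, 70, 68, 71, 65]    # y, heights in inches


def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


a, b, r = linear_fit(women, men)
r_squared = r ** 2
predicted = a * 65 + b  # part (b): woman who is 5'5" = 65 inches

print(f"r = {r:.3f}, R^2 = {r_squared:.2f}")
print(f"y = {a:.2f} x + {b:.0f}")
print(f"predicted height at x = 65: {predicted:.1f} inches")
```

Using the unrounded slope (15/22, about 0.682) gives a prediction of about 68.3 inches; the 68.2 in the key comes from rounding the slope to 0.68 before plugging in.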

Record year (x-variable)     Time in seconds (y-variable)
1967                         2286.4
1970                         2130.5
1975                         2100.4
1975                         2041.4
1977                         1995.1
1979                         1972.5
1981                         1950.8
1981                         1937.2
1982                         1895.3
1983                         1895.0
1983                         1887.6
1984                         1873.8
1985                         1859.4
1986                         1813.7
1993                         1771.8

10.     [10 Points] The table to the right shows the progress of women’s world record times (in seconds) for the 10,000-meter run.  Answer the following:

 

See calculator screen shots at end of test for correct calculator steps.

 

a.        Find and analyze the correlation between record time and year.

 

r = -0.97 indicating a strong, negative relationship (real-world) [3 Pts].

 

b.       Analyze R² in the context of this problem.

 

R² = 0.94, meaning that 94% of the variance in world record times is accounted for by year [3 Pts].

 

c.        In what year (according to your prediction equation) will the women’s 10,000-meter world record time be 5 minutes?

 

Set y = 5 x 60 = 300 seconds in the line of best fit, y = -19.9 x + 41373, and solve for x: x = (41373 - 300) / 19.9 ≈ 2064.  The model predicts the world record will be 5 minutes in the year 2064 [3 Pts].

 

d.       Explain why the correct answer to part (c) makes very little sense in a real-world context.

 

The problem is the scope of the model [1 Pt].  The linear relationship is valid for the very narrow range of years covered in the data set (see scatter plot), but obviously there will come a time when the records reach some limit beyond which significant gains are not humanly possible.
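For anyone checking problem 10 without the calculator, the sketch below fits the line to the 15 records and then solves the prediction equation for the year at which the predicted time is 300 seconds. The variable names are illustrative only; the last step repeats the calculation with the rounded coefficients quoted in the key.

```python
import math

years = [1967, 1970, 1975, 1975, 1977, 1979, 1981, 1981,
         1982, 1983, 1983, 1984, 1985, 1986, 1993]
times = [2286.4, 2130.5, 2100.4, 2041.4, 1995.1, 1972.5, 1950.8, 1937.2,
         1895.3, 1895.0, 1887.6, 1873.8, 1859.4, 1813.7, 1771.8]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(times) / n
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in times)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, times))

slope = sxy / sxx                       # seconds per year
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

target = 5 * 60                         # 5 minutes in seconds
year_at_target = (target - intercept) / slope

# Same calculation with the rounded coefficients used in the key.
year_rounded = (target - 41373) / -19.9

print(f"r = {r:.2f}, R^2 = {r ** 2:.2f}")
print(f"y = {slope:.1f} x + {intercept:.0f}")
print(f"year at 300 s: {year_at_target:.1f} (rounded coefficients: {year_rounded:.1f})")
```

The extrapolated year lies about 70 years past the last record in the table, which is the scope problem described in part (d).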

 

11.     [5 Points] Five hundred drivers were asked about the age of the car they drive (in years).  The boxplot below shows the data collected on the ages of the 500 cars.  For parts (a) through (c), select the one best answer choice.


a.        The median age of cars in the study is: [2 Pts]

☐         4

☐         8

☐       12

☐       not available in the information provided

 


b.       The mean age of cars in the study is: [2 Pts]

☐         4

☐         8

☐       12

☐       not available in the information provided


c.        The percentage of cars reported to be more than 12 years old was: [1 Pt]

☐         0

☐       25

☐       50

☐       75

 

 


 

                         Cause of Death
               Cancer     Heart Disease     Other
Smoker         135        310               205
Non-Smoker     55         155               140

12.      [5 Points] Is smoking related to cause of death?  Use the Chi-Square test on the following 2-way table.  The mortality rates are for 1000 males in the 45 – 64 year old age category.  Test at the 0.10 level (just as we did in all 3 class examples).

 

See calculator screen shots at end of test for correct calculator steps.

 

Since p = 0.0154 is less than the 0.10 significance level, we have evidence of a relationship between smoking and cause of death.
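As a cross-check on the calculator result, the sketch below computes the chi-square statistic from the 2-way table by hand. It relies on the fact that for 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so it needs no statistics library; that shortcut applies only to a 2 x 3 (or 3 x 2) table like this one.

```python
import math

observed = [
    [135, 310, 205],  # smokers: cancer, heart disease, other
    [55, 155, 140],   # non-smokers: cancer, heart disease, other
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if smoking and cause of death were unrelated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 2, "the closed form below is valid only for df = 2"
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
print("evidence of a relationship" if p_value < 0.10 else "no evidence of a relationship")
```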

13.      [10 Points] The data given below are the ages of students in a class:

 

        31    17    21    40    19    21    26    37    24    18    29    25    37    48    21    28    34    23    18    22    32

 

See calculator screen shots at end of test for correct calculator steps.

 


a.        Provide a standard data table for this data set, i.e. mean (x̄), standard deviation (s) and sample size (n).  [3 Pts]

x̄       27.2
s       8.35
n       21

b.       Provide the 5 number summary for this data set. [5 Pts]

Min     17
Q1      21
Med     25
Q3      33
Max     48

 

c.        Basing your conclusion only on the three numbers mean, median and standard deviation, do you think the distribution is skewed?  If so, explain why you think so and in which direction it is skewed (left or right)?

 

[2 Pts] The difference between the mean and median is 27.2 – 25  = 2.2.  The standard deviation is 8.35, and one tenth of it is about 0.84.  Since the difference of 2.2 > 0.84, we determine the data set is likely to be skewed.  Since the mean is greater than the median, we anticipate a skewed-right data set.
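The summary values and the skew rule of thumb in problem 13 can be reproduced with Python's standard library. In this sketch the quartiles are taken as the medians of the lower and upper halves of the sorted data (excluding the middle value when n is odd); that convention is an assumption, chosen because it reproduces the Q1 and Q3 in the key. The function name is illustrative.

```python
import statistics

ages = [31, 17, 21, 40, 19, 21, 26, 37, 24, 18, 29,
        25, 37, 48, 21, 28, 34, 23, 18, 22, 32]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    s = sorted(data)
    half = len(s) // 2           # middle value is left out when n is odd
    lower, upper = s[:half], s[-half:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


mean = statistics.mean(ages)
s = statistics.stdev(ages)       # sample standard deviation
n = len(ages)
summary = five_number_summary(ages)
median = summary[2]

# Rule of thumb from part (c): likely skewed if |mean - median| > s / 10.
skewed = abs(mean - median) > s / 10
direction = "right" if mean > median else "left"

print(f"mean = {mean:.1f}, s = {s:.2f}, n = {n}")
print("five-number summary:", summary)
print(f"skewed: {skewed}, direction: {direction}")
```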

 

14.     [5 Points] You are given the data set below.

 

        6    8    10    8    10    18    12    5    7    9    12    7

 

 

See calculator screen shots at end of test for correct calculator steps.

 

a.        Identify any outliers, and state how you did so.

 

Make a box plot with outlier identification turned on.  An outlier is indicated on the right, so we suspect the maximum entry ( x = 18 ) of being an outlier [4 Pts]
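A box plot with outliers marked typically flags points that lie more than 1.5 x IQR beyond the quartiles. The sketch below applies that rule to the data set; both the 1.5 x IQR rule and the quartile convention (medians of the lower and upper halves) are assumptions about what the calculator does, not something stated in the key.

```python
import statistics

data = [6, 8, 10, 8, 10, 18, 12, 5, 7, 9, 12, 7]

s = sorted(data)
half = len(s) // 2
q1 = statistics.median(s[:half])    # median of the lower half
q3 = statistics.median(s[-half:])   # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in s if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("outliers:", outliers)
```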

 

b.       Compute the z-score for x = 18.

 

We first need the mean and standard deviation to use in the formula, so we run 1VarStats, which gives x̄ ≈ 9.33 and s ≈ 3.50.  Then, we plug x = 18 into the formula: z = (x - x̄) / s, and get z = (18 - 9.33) / 3.50 ≈ 2.48.
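The z-score calculation can be sketched as follows. It assumes the sample standard deviation (the value the calculator reports as Sx) is the one intended in the formula.

```python
import statistics

data = [6, 8, 10, 8, 10, 18, 12, 5, 7, 9, 12, 7]

mean = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation (assumed to be the intended s)
x = 18
z = (x - mean) / s

print(f"mean = {mean:.2f}, s = {s:.2f}, z = {z:.2f}")
```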

 

 

 

[Calculator screen shots for problems 9a & 9c, 9b, 10a & 10b, 10c, 10d, 12, 13 and 14]