North Georgia College and State University        National Science Foundation

National Science Foundation Grant 0633264: Authentic, Career-Based, Discovery Learning Projects in Introductory Statistics.
Partnering with Georgia Perimeter College and Forsyth Central High School. Contact Robb Sinn or Dianna Spence for details.


Authentic Discovery-Learning Projects in Statistics
Instructor Guide



Annotated Table of Contents

These materials were designed to help mathematics instructors implement survey-based statistics projects into an introductory statistics course. Major hurdles for those without experience doing survey-based statistics research will likely include the survey design, participant rights (IRB) and grading. These topics are covered in detail, along with hints for collaborative group work and other topics.

Instructional Materials Home Page

Links to all student guides and instructor guides.

Overview of Grant Project

Read this brief Project Overview and glance through the appendices at some of the Research Literature relevant to the project together with the Preliminary Findings during the grant proposal stage.

Course Design Considerations

We plan for a regression project completed about midterm and a group comparison (t-test) project at the end of the semester. Project teams (regression) are instructor-assigned and grouped by career and/or major. Group comparison project teams have more flexibility in numbers (1-, 2-, or 3-person teams).

Student Help Guides

Students can use the Project Help Guide which covers both the regression and t-test projects, a help guide for building and analyzing Data Sets in Excel, and an extensive list of Variables and Constructs. This narrative Vignette tells the story of a team working on the regression project that will give insights for both instructors and students.

Instructor Resources

Extensive materials for both the Regression and Group Comparison projects are available. The Regression Project Materials include Hypothesis, Proposal, and Report Guidelines documents (in Word format to be tailored to your class) together with a grading Assignment and Rubric. The Group Comparison (t-test) Project Materials include a Proposal document together with an Assignment and Rubric sheet.

Grading and Rubrics

Does project grading take forever? Not if you have the right rubrics and score sheets. This section contains helpful hints and links to several different choices for rubrics you can use or alter for your needs.

Collaborative Groups

Hints for assigning and directing teams. This guide presumes instructor-assigned teams based on career interest and/or majors for the regression project. More freedom is offered for the t-test project.

Research Techniques

Guidelines for helping with survey design, sampling and reporting requirements for regression analysis and t-tests. Good surveys make good projects. View the Variables page for ready-made ideas grouped by topic. Also includes discussion of the IRB and participant rights.

Technology

The projects require only MS Office (Word, Excel, PowerPoint) and a graphing calculator. There is an extensive Excel Help Guide for creating and analyzing data sets in Excel (both 2003 and 2007 versions). We do not use the statistics add-on. Students write simple formulas, make frequency tables, construct their own charts and graphs, and perform a regression analysis. For t-tests, most students use their graphing calculators.

Reporting Statistical Conclusions

This section gives details on the analysis we expect for each oral and written presentation of statistical findings, both for regression and t-tests.



Overview

Principal investigators Robb Sinn and Dianna Spence have utilized funding from the National Science Foundation to study the use of authentic discovery-learning projects in introductory statistics courses. The purpose of this guide is to help instructors with no prior background in social science research or statistics-based survey research to facilitate students’ authentic discovery projects.

The scientific method suggests that we hypothesize, design an experiment, collect data, analyze the data, and draw conclusions. The regression project demonstrates to students how they can design a survey instrument (experiment) and collect and analyze their data using statistics. Because they formulate the research idea themselves, they become invested in the outcomes and analysis. The goal for these projects is to guide them through the scientific method in an authentic way where they use survey-based instruments and statistical analysis. The Group Comparison project expands their knowledge of research techniques.

A typical semester begins with a regression project ( n > 100 ) spread over about six weeks, starting about the third week of class meetings. Students present their findings orally in class and turn in a written research report. In the final three weeks of the semester, students complete a t-test project ( n > 25 ). The variables studied in the regression project can serve as estimates of population variables for the t-test project. We suggest teams of three for the regression project. The t-test project can be conducted so individuals or 2- or 3-person teams may work together with reasonably equivalent tasks to complete. We suggest instructor-assigned groups for the regression project based upon majors and career interest.

The two project units are designed as stand-alone, textbook-independent assignments. Data analysis is accomplished with standard Microsoft Excel functions (and not the statistics add-on package). Class presentations use PowerPoint. Project write-ups are submitted in Word. Graphing calculators are most typically used for the t-test calculations on the second project. The low technology requirements suit most students and today’s classroom technology environment.

A comprehensive student guide leads students through the process, offering answers to typical questions and suggestions for avoiding the main hurdles students often face. This guide will help lay the foundation for directing these projects, give examples of rubrics and grading outlines, and offer suggestions.

This guide is drafted to be readable, yet thorough. Routinely you will find FAQ’s within the text that link to more complete and comprehensive information. We also will link liberally to the Student Guide and other materials students will use for their projects (coming soon!!).

FAQ’s

What does the research say about the likely results of using this approach?
What are the preliminary findings from the pilot study?
Why do the projects ignore by-hand calculations?



Course Organization

These instructional materials presume regression topics are covered before hypothesis tests. We suggest a Regression Project with four check-in points and deadlines spaced across four to six weeks, due about midterm, followed by a late-semester Group Comparison Project (t-tests). These guides are written so that most of the research topics students must learn (how to develop a survey, representative sampling, how to write good questions) are taught in the regression project assignment.

The literature indicates that overtly connecting research experiences to future career topics may be beneficial. For the Regression Project, we suggest instructor-assigned teams of three with teammates chosen based upon similar majors and/or career interests. We ask each team to choose at least one study variable related to their major and/or career. The data sets are large ( n > 100 ) and generally require teamwork to analyze and report upon. We allow self-selected 1-, 2- or 3-person teams to complete the group comparison project, with each person responsible for collecting a data set for a specific sub-population ( n > 25 ).

We generally begin the semester with descriptive statistics and by the third week of a typical 16-week semester begin discussing regression. We (Dianna and Robb) both have technology projects during the first couple weeks of class that help students learn to use their graphing calculators and Microsoft Excel to make various graphs and charts and to calculate descriptive statistics.

We suggest the two projects account for at least 20% of each student’s final course grade. For team projects, we assign each person the same grade for at least 80% of the overall project grade. The amount of course credit earned on the projects will determine how hard students will work on them. You are the expert in your class, so we leave this issue wide open. But students will generally get more learning from the project if it is worth at least a letter grade (10%). I (Robb) generally have each project count as a test grade, with projects counting as much as 40% of a student’s final course grade.


Class Time Required

Total project-related instruction is about 10% of a typical semester, or 4.5 instructional hours across 45 class hours. The project presentations take one class period (85 minutes at our university). I (Robb) also use one class period to help students construct and analyze their data sets in Excel. The rest of the class time used is for short help sessions of 5 – 15 minutes during which we answer questions and give teams time to organize their efforts. The total classroom time required for the Group Comparison project is less than half an hour. We typically talk about the project at assignment time (10 minutes) and offer one or two short help sessions (5 minutes) for questions. All project work is completed outside of class, and no presentations are given.

We do not generally provide much direct instruction in how to develop representative samples, how to develop good surveys or how to write good questions. We find the students have a great deal of informal knowledge about these topics and can generally learn “as they go” during the regression projects. We find that the group comparison project requires very little direct instruction.

This guide provides a generic set of project assignments and check-in points together with suggestions for assessing them. We also offer several different versions of the assignments and grading rubrics so that you can explore and find one that suits your teaching style.

Time Line: Linear Regression Project

To use our authentic discovery approach, instructors must ensure regression topics are covered early in the semester, preferably within the first three or four weeks of a typical 16-week semester. A suggested time line for the regression project is as follows:

I (Dianna) use a more abbreviated schedule of 4 weeks due, in part, to having access to a computer lab one day per week. This allows my classes to receive more instruction in data analysis in Excel prior to beginning the linear regression project. I (Robb) spend one entire class period working with teams to structure their data sets in Excel, after they have completed data collection. We both allow nearly 3 weeks for the initial brainstorming, project idea development and survey drafting.

Giving students plenty of time to develop a solid research idea and survey is essential. High quality surveys generate much more useable data. Experience suggests that too little survey development time reduces the quality of the projects more than any other factor.

Note that we (Robb and Dianna) both have at least one technology project or assignment prior to the linear regression project. This assignment helps students familiarize themselves with data collection, data entry and basic formulas to use to tabulate and summarize data. Generally, we use the same project teams for the early projects that we intend to use for the regression project. The teams for the t-test projects are more flexible, and we allow students to pick their groups or to work by themselves.

Time Line: Group Comparison Project

The major hurdle here is assigning the project early enough that grading them can be completed prior to exams. We generally give students about two weeks to work on the assignment. Both of us cover Chi-square and give a brief introduction to ANOVA in the final two weeks of the semester which provides time for the students to complete their projects after several weeks of instruction in hypothesis testing and t-test procedures. I (Robb) do not focus heavily on z-tests and do not allow my students to use them in their group comparison projects. Instead, I cover all hypothesis testing using t-test procedures with an overview of group comparisons (including ANOVA) to begin. This allows students to start working on their projects while they are learning about t-tests.

We both use the same typical time line. Assign the projects. One week later, proposals are due. A week after the proposals are returned and approved, the Project Report is due. The proposal deadline forces teams to form – it’s surprising how much time they can waste choosing teams, if allowed to. They also are required to brainstorm an idea and generally get to work choosing sub-populations to study and a research variable. These tasks do not require more than an hour (generally) since the Regression Project has taught them the process. The Proposal deadline simply forces them to get organized.

Data collection is much simpler in the Group Comparison Project and generally requires less than an hour of effort. Also, since the survey is much shorter (2 – 3 variables compared to 7 – 8 for regression) and because they use their graphing calculators to perform their tests, the analysis step requires only a few minutes. The total time required to complete a reasonably elegant Group Comparison Project is 5 hours or less for most teams.


Project Deadlines and Tests

We generally schedule a unit test or midterm covering regression topics before the Presentations and Final Report. We have found the students generally make better presentations if they have studied regression topics prior to their project analysis and presentation. The straightforward regression questions used for in-class assessments typically help them grasp the big ideas needed for their Regression Project analysis. For the Group Comparison Project, we generally do the opposite, making the project due before the final unit test of the semester. Because of the difficulty students have with hypothesis testing structure and analysis, completing the project generally improves their performance on in-class assessments. There is no hard and fast rule, and we have done various rearrangements of this schedule, depending upon the semester.

The Regression Project requires a lot of work to complete the Presentation and Report tasks. Having a unit test or midterm near these deadlines generally stresses the class out. We suggest having a test at least a week before or after these deadlines. I (Dianna) typically schedule a unit test just prior to presentation day, with approximately a week after the test for presentations and reports.

The Group Comparison Project is less time-intensive. We generally try to have them due prior to the unit test covering hypothesis testing for learning purposes – students perform better on the hypothesis tests if they have completed their projects.

Fall semester with its Thanksgiving Break right before finals is tricky. I (Robb) tend to have the Group Comparison Project due before Thanksgiving and a test right after. This allows time for grading both before exam week.

FAQ’s
Why do you not allow z-tests for the group comparison projects?



Instructor Resources


The Linear Regression Project

Part of the NSF research we are conducting tests whether or not career-specific survey projects will improve understanding and attitudes toward statistics. The instructor should therefore consider how to develop teams that have similar majors and/or career interests. Of course, it is impossible to have every team comprised of a single major without extreme serendipity. We often group similar majors on a team, for example, putting psychology and sociology majors together, nursing majors with biology majors, or accounting majors with marketing majors. Even with this leeway, it is often difficult to match each team appropriately. When we cannot, we simply ask teams to be creative and try to include ideas from as many team members’ majors as they can.

We suggest four check-in and/or assessment points. We typically give the survey rough draft a good deal of attention informally. We feel it is vital to formally assess the project proposal prior to allowing students to collect data. We check-in with teams at the following points:

  1. Brainstorming & Team Formation
  2. Survey Draft (informal)
  3. Project Proposal & Survey Final Draft
  4. Oral Presentation
  5. Written Report of Findings

We assign all grades at the end of the project. I (Dianna) formally assess the project proposal which counts about 10% of the final project grade. We both provide informal feedback at the hypothesis stage but not a formal grade. I (Robb) return the project proposals with extensive comments but do not assign a grade.

We include a team evaluation that is kept private between each student and the professor. This allows the instructor to determine any problems with teammate shirking duties or behaving in an unprofessional manner during the project. I (Robb) also include a peer-evaluation for the classroom presentations which has three advantages. First, it keeps the class focused, quiet and awake during presentations. Second, it allows for the “team evaluation” to be completed. Third, I have the students write a sentence justifying their evaluation marks, and their comments often help refresh my memory and add to my own notes when I see dozens of student presentations in the same day.



Regression Assignment Sheets and Rubrics

We have created generic assignments and grading rubrics that match up well with the Student Guide, but we have also provided several options. We have developed various methods over time. We also use slightly different approaches as instructors. For example, I (Robb) started using a template with prompt questions to help students with their written regression reports. Team members were assigned to jobs, and each job had a one-page section in the report. I created versions for both 4-person teams and 3-person teams.

I (Dianna) use an outline format for both project reports which include the section headings I expect to see in their report. Each section heading is followed by a description of the needed content. Teams can divide up the work as desired. The report sections roughly match the rubric I use to grade the reports.

We felt this instructor’s guide would be most beneficial if several options were available. Please feel free to scan the various documents and choose one that best fits your own style and instructional goals. Each has been provided in Word format so that you can alter it for your class and needs.

Project Assignment

  • Generic. Assignment with extensive rubric based on Dianna’s “section heading” report format.
  • Team Tasks (4). Project design for 4-person teams with team tasks assigned (Robb).

Hypothesis

  • Generic. Based on Dianna’s preliminary project proposal.

Project Proposal

Report Guidelines

  • Generic. Based on Dianna’s assignment.
  • Team Task (4). Robb’s earliest attempt (2004) with team task and template for report.
  • Team Task (3). Robb’s revised attempt for 3-person teams, tasks – still a template format.
  • Update ’08. Robb’s latest attempt (3-person teams), includes Chi-Square 2-way test option.

Peer Review for Presentation Day

  • Generic. Robb’s Peer Review and Team Assessment. Each 3-person team performs 6 total peer evaluations (2 each). They also must evaluate their teams.

Rubrics

  • Generic. This is included in the Project Assignment.
  • Short. Robb’s shortened version. Less comprehensive, not matched to section headings.

Team Evaluations

  • Dianna’s Team Evaluation (similar to Robb’s team assessment in Peer Review for Presentation Day above). Excellent for ferreting out problems within teams.

The Group Comparison Project (t-tests)

Most details of this project match the Regression Project: good survey design, good variables and constructs, representative sampling, and so forth. The overall structure is rather different. Each student determines a subpopulation they wish to study. For teams with two or more members, we often have to remind them they need subgroups that cannot overlap. Comparing Greeks vs. Varsity Athletes is not appropriate. We need to sample Non-Greek Varsity Athletes and non-Athlete Greeks.
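The non-overlap requirement can be checked mechanically before any comparison is run. The following is a minimal sketch only: the records and the field names "greek" and "athlete" are hypothetical and are not taken from any project data set.

```python
# Hypothetical survey records; the keys "greek" and "athlete" are assumed names.
responses = [
    {"id": 1, "greek": True,  "athlete": False},
    {"id": 2, "greek": True,  "athlete": True},   # in both groups, so excluded
    {"id": 3, "greek": False, "athlete": True},
    {"id": 4, "greek": False, "athlete": False},  # in neither group
]

# Keep only respondents who fall in exactly one of the two subpopulations.
non_athlete_greeks = [r["id"] for r in responses if r["greek"] and not r["athlete"]]
non_greek_athletes = [r["id"] for r in responses if r["athlete"] and not r["greek"]]

# An empty intersection confirms the two samples share no respondents.
overlap = set(non_athlete_greeks) & set(non_greek_athletes)
print(non_athlete_greeks, non_greek_athletes, overlap)
```

In practice the same effect is usually achieved on the survey itself, by asking both demographic questions and discarding responses that belong to both groups.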

The resources we have provided for Group Comparisons are less extensive simply because the project is much easier for students (and instructors!).

Project Assignment

  • Generic. Assignment with extensive rubric based on Dianna’s “section heading” report format.
  • Alternate. Robb’s version.

Project Proposal

Rubrics

  • Generic. Based on Dianna’s assignment.
  • New Format. Robb’s version shows an interesting way to format a rubric. Worth a look.

Teams for Group Comparison Project

We allow students to choose their own teams, and we allow teams of any size up to 4 members. Two person teams are easiest, and you may wish to only allow 2-person teams to make your first experience simpler. The project details obviously differ depending upon number of team members.

Individuals. One person teams need either estimates of population variables or access to another data set. I (Robb) post a list of recently used survey questions from the Regression Projects on my website, together with an estimated average (often aggregated over several surveys and several semesters). I only use variables from teams that seemed to do a solid job on survey design and sampling. This provides a useful set of variables for individuals to use, and they perform one-sample t-tests using whatever sub-population they wish to study.
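For instructors who want to check a student's calculator result, the one-sample t statistic can be computed directly. This is a rough sketch in Python (not part of the course materials, which use graphing calculators); the data and the posted population estimate of 12 are made up, and the sample is kept tiny for readability even though a real project needs n > 25.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - population_mean) / standard_error, n - 1

# Hypothetical sub-population sample compared against a hypothetical posted average.
hours = [10, 12, 14, 16, 18]
t_stat, df = one_sample_t(hours, population_mean=12)
print(round(t_stat, 3), df)
```

The t statistic and degrees of freedom would then be compared against a t table or the calculator's reported p-value.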

2-Person Teams. The independent samples t-test is perfect. Each person collects a sample ( n > 25 ) from a sub-population. They then perform the t-test.
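As a cross-check on calculator output, the independent samples t statistic can be sketched as follows. The version shown is the unpooled (Welch) form, which is an assumption on our part; the guide does not say whether students use the pooled or unpooled option. The two samples are hypothetical and far smaller than the n > 25 the project requires.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for the unpooled t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each sample variance divided by its sample size.
    va = statistics.variance(sample_a) / n_a
    vb = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Hypothetical responses from two non-overlapping sub-populations.
group_a = [2, 4, 6, 8, 10]
group_b = [1, 2, 3, 4, 5]
t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 2))
```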

3- and 4-Person Teams. Each person chooses a subpopulation and collects a sample ( n > 25 ). It is vital that all teammates use exactly the same survey, so they need to consider how they will demonstrate “representative” sampling before they begin. We generally have the team conduct all possible independent samples t-tests despite the flawed study design.

We point out to our students that t-test procedures are definitely not supposed to be used for 3-group and 4-group comparisons. ANOVA is the correct test. However, we don’t have time to cover post hoc tests, so having students use ANOVA can be a problem. There are two very simple approaches to avoid teaching bad habits: (1) do not allow teams of 3 or more, and (2) use the multiple t-test approach but explain why it’s being done and what’s wrong with it. The third slightly more complex approach that I (Robb) favor is to teach them enough about ANOVA and Tukey’s HSD post hoc procedure to do an ANOVA.
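One common way to explain the problem with multiple t-tests is the inflated chance of at least one false positive. The sketch below assumes a significance level of 0.05 and treats the pairwise tests as independent, which is only an approximation since the tests share samples; the function names are illustrative.

```python
from math import comb

def pairwise_count(groups):
    """Number of two-group comparisons among the given number of groups."""
    return comb(groups, 2)

def familywise_error(groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairwise tests,
    assuming (as a simplification) that the tests are independent."""
    return 1 - (1 - alpha) ** pairwise_count(groups)

for g in (2, 3, 4):
    print(g, pairwise_count(g), round(familywise_error(g), 3))
```

Under these assumptions a 3-person team runs 3 tests with roughly a 14% chance of at least one false positive, and a 4-person team runs 6 tests with roughly a 26% chance, compared with 5% for a single test.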

FAQ’s
What’s wrong with multiple t-tests?
What do you mean by conceptual problems with ANOVA post hoc procedures?



Grading and Scoring Rubrics

We often hear folks say that “Grading projects takes forever.” While this can be true, the key to efficient scoring is having a rubric with point totals and clear descriptions of the work product required. Developing a good scoring rubric is time consuming. We have provided several ready-made rubrics so that you can find one that matches your style and preferences, then edit as required.

Linear Regression Rubrics

  • Generic. This is included in the Project Assignment above.
  • Short. Robb’s shortened version. Less comprehensive, not matched to section headings.

Group Comparison Rubrics

  • Generic. Based on Dianna’s assignment.
  • New Format. Robb’s version shows an interesting way to format a rubric. Worth a look.

The projects and grading go much more smoothly if the students have access to the scoring rubric before they start the project, provided the scoring rubric clearly describes the work product required for each component. We both provide the scoring rubric we intend to use with the Project Assignment document. We also suggest including both the “presentation” scores and the “write-up” scores on the same sheet. This requires a bit of paper shuffling on presentation day but makes the final scoring much easier. All of our ready-made rubrics meet these criteria.


Efficient Grading

We both estimate that grading tests for a 35-person class takes about 2 hours. I (Robb) can generally grade 3-person projects in about the same time it takes me to grade a stack of tests. I (Dianna) can generally grade them in 3 – 4 hours. We both grade the presentations while they are being made, so this requires no additional time outside of class. I (Robb) grade each project in about 8 – 10 minutes. This equates to taking about 3 minutes per test.
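The arithmetic behind that comparison can be laid out explicitly. This is a back-of-the-envelope sketch using the figures quoted above; the one assumption added here is that a class of 35 is split into 3-person teams with one smaller leftover team.

```python
import math

class_size = 35
team_size = 3
minutes_per_project = (8, 10)   # range quoted in the text
minutes_per_test = 3            # figure quoted in the text

# Assumption: students left over after forming 3-person teams make one smaller team.
teams = math.ceil(class_size / team_size)

project_minutes = tuple(teams * m for m in minutes_per_project)
test_minutes = class_size * minutes_per_test

print(teams, project_minutes, test_minutes)
```

With 12 teams, project grading comes to roughly 96 to 120 minutes, against 105 minutes for a stack of 35 tests, which matches the two-hour estimate.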

We do not grade the data file. We tell students (and wrote in the Help Guide) to always include their data files for partial credit. It is usually easy to detect calculation errors from the text, and we find ourselves opening the data file only to see what train wreck occurred. We generally ask for Project Reports to be emailed. Emails can make deadlines fuzzy and create other problems, but we find the grading easier with digital copies. This also allows the Excel data file to be accessed easily if needed.

I (Robb) read each written report once, and then mark all scores on the rubric. I (Dianna) tend to use the rubric as a checklist, marking scores as I see each item completed. I (Robb) tend to use shorter, less comprehensive rubrics since my “all-at-once” scoring requires extreme familiarity with the rubric. Keeping interruptions to a minimum also helps, since I have to start over even if I’m 5 minutes into a paper when the phone rings.

However you choose to grade the papers, you will find it gets much easier after only a few projects. The downfall of typed mathematics is that mistakes are easily made. The beauty of typed mathematics is that mistakes are easily caught. You will develop a system that works, and you will quickly learn what point values you feel each mistake is worth.

Feedback

I (Dianna) typically provide written feedback on the report document, either typing within the digital document itself (emailing the updated version to the team) or writing longhand on a printed version. I (Robb) simply write a numeric score for each component and circle or highlight words in the rubric to indicate what was lacking. We generally make copies of the score sheet for each member of the class so that everyone understands why they received the grade they did.



Working with Collaborative Teams


Assigning Groups: Regression Project

The research we are conducting attempts to “force” students to study variables of interest to their major and/or future career. Thus, it is important to assign project teams so that similar majors are working together, especially on the regression project. Here are some considerations we use when assigning project teams.

FAQ’s
Do teams have to include similar majors?
Can individuals complete a regression project?
What if a team “melts down”?

How many members per team?

We have both used groups of 3 and 4, but we strongly recommend teams of 3. I (Robb) use teams of 2 when the 3-person teams do not work out evenly. I (Dianna) use teams of 4. With teams of 4, we have found a much greater chance one team member will do next to nothing on the project. Teams of 3 typically have an easy democratic feel, allowing a majority of 2 to win any votes or discussions. Early on, brainstorming and then consensus is important which is much easier with smaller groups. And finally, dividing up by majors means several small subgroups already exist, and it’s easier to go with smaller teams to fit that requirement. I (Robb) offer a few bonus points to each team of 2 since the same amount of work is required. We need a large sample ( n > 100 ) for regression analysis.

How to assign teams?

There is no perfect way to select teams. The regression teams must have similar majors. Other concerns include democracy (allow student choice), competency (ensure basic skills across teams) and student schedules.

Democracy. Often student-selected teams have fewer problems “getting along” than instructor-assigned teams. This depends upon the class, obviously. If you wish for students to have input on their teammates, consider separating class by major, with a different part of the room for each one. Let individuals pick their own 3-person teams within their subgroups. Instructors must match up stragglers, but this is generally not too time-consuming.

Competency. I (Robb) often give my students a “competency test” to determine the statistical research skills of each person. After determining skill sets, the professor divides the roster by majors and tries to ensure complementary skill sets for each team. To determine competencies, I give each student the following survey:

I try to mix and match competencies. When the class size requires me to form 2-person teams, I generally try to have at least one team member with strong competencies across the board and the other with at least 3’s in each category.

Creating teams in this fashion takes about half an hour for a class of 35. I make a spreadsheet with each student’s major and 3 competency scores and start forming teams. I make a formula that totals each team’s overall competencies and check the individual competencies as I move students into different groups. Some majors tend to have overall lower competencies. While this can be time-consuming, it also seems to encourage groups to think that they were matched up with others whose strengths will complement their weaknesses.
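The spreadsheet bookkeeping described here can be sketched in a few lines. Everything in the example is hypothetical: the student names, majors and scores are invented, and the three competency categories are left unnamed because they depend on the survey the instructor uses.

```python
# Hypothetical roster: name -> (major, three self-rated competency scores on a 1-5 scale).
roster = {
    "Student A": ("Nursing", (5, 3, 4)),
    "Student B": ("Biology", (2, 5, 3)),
    "Student C": ("Nursing", (3, 2, 5)),
    "Student D": ("Accounting", (4, 4, 2)),
    "Student E": ("Marketing", (3, 3, 3)),
    "Student F": ("Accounting", (2, 3, 3)),
}

# Hypothetical draft assignment, grouped by similar majors.
teams = {
    "Team 1": ["Student A", "Student B", "Student C"],
    "Team 2": ["Student D", "Student E", "Student F"],
}

def team_total(members):
    """Sum of all competency scores on the team (the spreadsheet 'total' formula)."""
    return sum(sum(roster[name][1]) for name in members)

def best_per_category(members):
    """Highest score on the team in each category, to spot an uncovered skill."""
    return [max(scores) for scores in zip(*(roster[name][1] for name in members))]

for team, members in teams.items():
    print(team, team_total(members), best_per_category(members))
```

A team whose best score in some category is low is a signal to swap a member before finalizing the groups.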

No formal part of the Regression Project specifically requires the graphing calculator. What I’ve found is that those who rate themselves as very good with the graphing calculator tend to learn regression topics more quickly and easily. They tend to be the ones in the group with the best feel for the analysis steps.

Student Schedules. Another way to select groups is to take their schedules into account. Split the room into four sectors based on “most campus free time available.” Typically, I use MWF mornings, MWF afternoons, TR mornings and TR afternoons as categories. I (Robb) send the class to the sector where they have the most availability and ask them to find folks with similar majors. This presents obvious problems with stragglers who have completely incompatible schedules (or say they do), but it does provide a way for the majority of the teams to have times they can meet together outside of class.

This research is based on the premise that having students study topics and variables related to their major and/or career will help them see more value in statistics as a discipline. However you assign teams, consider “forcing” teams to have similar majors and/or careers.


Assigning Groups: Group Comparison Project

Pairs are the perfect group for completing this project. We allow students to choose their own groups, and we allow teams of any size from individuals up to 4-person teams. You will have to decide what to do with 3-person teams and larger since the correct research design is ANOVA. The easiest method is to allow only 1-person and 2-person teams, but this requires more grading time. I (Dianna) prefer that 3-person and 4-person teams collect larger samples and include two independent variables so that they can perform two highly accurate comparisons. I (Robb) work with these teams to teach them ANOVA methods and Tukey’s HSD post hoc procedure.

For individuals who wish to work alone, we suggest posting the means from the numeric variables studied in the regression project. Generally, we have 11 or 12 regression projects with 3 numeric variables each. There is some overlap, but this provides 20 – 25 numeric variables whose means can be assumed to estimate the population averages. Individuals can select one of the variables and a subpopulation to compare against it.



Research Techniques

Survey Design

For the Regression Project, we require our students to use at least 3 demographics variables (but no more than 5). We also suggest requiring 3 numeric variables that all should be correlated. This allows for 3 possible regressions, one of which generally “works out.” Negative results (no correlation found) are fine, but they make completing all of the regression analysis steps more difficult (and much less meaningful).

For the Group Comparison Project, we require only 1 or 2 demographic variables and only 1 research variable. They learn the essentials of survey design during the Regression Project, so this section will focus on that. Teams generally have little trouble with survey design on the final project.

A good survey is short, typically not more than half a page, since we require the good will of campus to allow us to collect data. The more concise the survey, the more likely it is that participants will carefully read and respond to each question.

Responses must be anonymous to ensure participant confidentiality. Do not allow students to put a “name” blank on their surveys.

We allow hierarchical variables and ordinal scales (responses to Likert scale opinion questions and ratings), as well as variables which are structurally numeric. An example of a hierarchical variable is class status: “freshman, sophomore, …” This variable often works well (seniors consume more caffeine than freshmen, it seems). Consider its numerical equivalent: “what number of credit hours have you completed?” Both have their place as regression variables. The most obvious examples of Likert scale questions are the opinion questions to which we respond with anything from “strongly disagree” to “strongly agree.” We can also use self-ratings, for example, “On a scale of 1 to 10, 10 being perfect, I would rate my study habits as a ___ .”

There is debate in the research community about the legitimacy of ordinal scales and ratings. The most useable ordinal scales arise from variable constructs, where several questions are asked and the overall scores totaled for a single metric. Here is an example:

Self-Esteem (5-point Likert scale e.g. SD=1, D=2, N=3, A=4, SA=5):
   I have a lot of respect for myself.
   I feel like I am just as good as anyone else.
   I am pretty satisfied with myself as a person.
   I feel like I do have more good qualities than bad qualities.
   I believe my parents should be proud of some of the things I have done.
   I rarely get down on myself.
   I am pretty happy with the type of person I have become.
   A lot of my friends have more weaknesses than I do.
   I believe that most people enjoy being around me.
   I like myself.

We sum the response totals (an easy formula in Excel). The range is 10 to 50, with larger numbers indicating higher self-esteem. Constructs like this one generally provide very useable data, even when students create them from scratch.
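For instructors who want to check a team's totals outside of Excel, here is a minimal sketch of the same summation in Python. The response rows are made up for illustration, and the function name and its checks are our own assumptions, not part of the student materials.

```python
# Hypothetical responses: one row per participant, ten items coded
# SD=1, D=2, N=3, A=4, SA=5 (the coding given for the construct above).
responses = [
    [4, 5, 4, 4, 5, 3, 4, 2, 4, 5],
    [2, 3, 2, 3, 4, 1, 2, 3, 3, 2],
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
]


def construct_score(items, n_items=10, low=1, high=5):
    """Sum one participant's item responses after basic validity checks."""
    if len(items) != n_items:
        raise ValueError("expected %d responses, got %d" % (n_items, len(items)))
    if any(not (low <= x <= high) for x in items):
        raise ValueError("response outside the %d-%d scale" % (low, high))
    return sum(items)


# One total per participant; possible totals run from 10 to 50.
scores = [construct_score(row) for row in responses]
print(scores)
```

The validity checks catch the most common data-entry slips (a skipped item or a mistyped value) before they distort a total.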

FAQ’s
What is the issue with Interval vs. Ordinal Data?

Writing questions that yield useable data is an art form. Below is a sample survey, typical for our students’ projects. Following the survey are the important teaching points we generally emphasize during class and in our feedback on the Project Proposal:

Demographics

We generally suggest using a “check all that apply” or “circle all that apply” format for the demographics variables. This is quick, easy to read and also limits who should respond (when necessary). For example, some teams do not include graduate or post-bac students. Leaving this option off the survey generally stops participants who are graduate students from wasting their time when their surveys would not be used.

Good Numeric Variables

Numeric variables that work best have at least 7 – 10 options. This generally allows a sample of 100 responses to form an approximately normal distribution. When the choices are limited, for example using 1 = freshman to 4 = senior, the scatterplot is just a grid of points all lying on top of one another. And of course, the underlying distribution isn’t likely to be normal. Credit hours completed would, in general, be a better numeric variable.

Likert scale questions (1 = strongly disagree, 2 = disagree, 3 = neutral / no opinion, 4 = agree, and 5 = strongly agree) typically turn out well provided answer choices spread out across all five categories. We can make larger scales. The above example is a 5-point scale. The Likert scale question in the sample survey is a 6-point scale. The best way to get a good opinion variable is to group 3 or more opinion questions together and sum the responses. These constructs are covered in more detail below. We can also ask participants to rank themselves on a scale of 1 to 10. These questions usually turn out rather well, depending on what is being rated.

We have provided a long list of Variables and Constructs. Please advise your students to develop their own variables and constructs rather than blindly using only the suggestions. They will learn more about real-world research by developing a construct themselves. And every team we’ve worked with that has developed their own scale or construct has generated very useable data. Their presentations are interesting and often provide the class with a great learning opportunity.

The list of suggestions is a fall-back mostly intended to help students brainstorm ideas related to their major or career interest and to help them avoid many common pitfalls. The answer choices provided are not the only ones that could be used, and many of the questions could be stated differently. The constructs, however, were provided by faculty experienced with survey-based research and the constructs in their disciplines. Students can and should use one of the constructs if it interests them, especially if it is related to their career interests.

We are working to shorten the various constructs into 5 – 8 questions, in some cases using faculty experience and expertise to narrow down the questions, in some cases using our students to help collect large data sets ( n > 300 ). We are also interested in any student-developed constructs your students produce. Please let us know if you come across interesting ideas.

Answer Formats

We suggest students use closed-response answer formats for most variables. Open-response formats invite problems, since some participants will give completely impossible responses. For numeric data, we simply list as many reasonable alternatives as possible. For age and GPA, an open-response blank generally works well. For “number of hours studied in the last 3 days,” I (Robb) suggest closed-response choices with enough latitude for all reasonable answers. I (Dianna) generally prefer open responses for numeric variables with more than 10 possible answer choices.

We also suggest not using “ranges” for closed response answer choices. Here are two examples of the “hours studied” variable:

In the last three days (not including today), how many total hours have you studied?
   0-2   3-5   6-8   9-11   12-14   15 or more

In the last three days (not including today), how many total hours have you studied?
   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21+

Several problems exist with the first example: unequal category or bin widths, a possibility of not having a large enough “top end” category and (almost) too few hierarchical categories for a meaningful numeric scale. The second example allows for a much greater freedom of answers and does not take any more space on the survey.

Category Responses

We generally avoid using ranges in answer choices. However, when we use closed response formats, we are actually creating a hierarchical variable where respondents choose the category closest to the correct numeric value. The “hours studied” variable above requires someone who knows that they have studied exactly 40 minutes in the last 3 days to circle “1.” This is typically not much of a problem, and we generally suggest listing category choices as a single number, not a range. The responses tend to be more accurate and much easier to enter into the data set. For many numeric variables (like hours studied), responses mainly will be estimates anyway. Forcing them into pre-selected categories is not likely to erode accuracy.

For distributions like income that will be strongly skewed, having exactly the same “bin width” or category ranges is not necessary. Here is an example:

What was the total combined annual salary of the household in which you lived during high school (circle the best answer):
   $15,000  $30,000  $45,000  $60,000  $75,000  $100,000  $150,000  $200,000+

This question is asking about socioeconomic status with the lower two categories indicating “poor” and the top two indicating “wealthy.” As long as teams understand they are creating a hierarchical category question, not a numeric one, they can proceed. Note that means and standard deviations will not make sense in terms of dollar units, and care is needed with interpretation. This question is woeful (in our experience) simply because teenagers rarely have any idea how much money their parents make. It also does not differentiate for one vs. two income households. However, if each participant actually knew the correct answer, it would generate a useable regression variable since there are plenty of options near the mean.

Opinion Questions and Ratings

Asking participants to rank themselves on a scale of 1 to 10 or asking them their level of agreement to a statement (strongly disagree to strongly agree) generally provides very useable data. If enough answer choices are provided, Likert scale opinion questions are very good. We can also gauge opinion pretty accurately by combining 3 or more Likert scale questions. See the Self-Esteem construct above. As stated earlier, the best numeric variables have at least 5 or 6 likely responses. The key is likely. Age turns out to be a rather good numeric variable since many of our students are older than 22. The responses cluster around 19 – 21 but still have several participants in their 30’s. Age often correlates better with other variables than class status, especially for political views, for example.

The following example might measure how extroverted a person is:

On a scale of 1 to 10, 1 being total introvert and 10 being total extrovert, rank yourself:
   1 2 3 4 5 6 7 8 9 10

Generally speaking, our students have found that responses range from about 3 to 10 and are approximately normal (apparently there is a stigma against being introverted – rarely do we find students rating themselves as “1” on this scale, but 10’s are reasonably common). Another way to get at the same variable is to use several opinion questions with the typical strongly agree / strongly disagree format. Students might develop the following three statements:

   People think I’m very outgoing.
   I much prefer spending time with my friends to being alone.
   I go out to lots of parties and social events.

If we use the typical 1 = strongly disagree to 5 = strongly agree for each question, we will have a scale variable ranging from 3 = very introverted to 15 = very extroverted.

Student-developed constructs work well when the questions are simple, clear, topically related and different. They also tend to measure something very specific which is clear to the instructor. For example, the above “extrovert” construct ignores time with family. In a college population, this might not be problematic, but married people who are extroverts might score lower than singles.

Students also sometimes have trouble with reverse-coding. Imagine adding this question to the “extrovert” construct:
   I love spending time alone.

In this case, we would reverse the scores in our data set, either by entering them “backwards” or using a formula (we math professors prefer the formula, but students – well, you have them, too). We code strongly disagree = 5 and strongly agree = 1 (and so forth) so that large positive values for each question indicate “extroverted” behavior and attitudes. Reverse-coded questions are a very good research technique, since humans tend to bias toward “agreeableness.” They do confuse our students a bit with data entry and scoring.
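The reversal is a single subtraction: on a 1-to-5 scale, the reversed score is 6 minus the response. Here is a minimal sketch in Python with one hypothetical participant. The function and variable names are ours, for illustration only.

```python
def reverse_code(score, low=1, high=5):
    """Flip a response so that 1 becomes 5, 2 becomes 4, and so on."""
    return low + high - score


# Hypothetical participant: the three regular "extrovert" items,
# plus the reverse-worded item "I love spending time alone."
regular_items = [4, 5, 4]
alone_item = 2  # participant circled "disagree"

# Reverse the one item, then sum as usual.
# With four items the construct now ranges from 4 to 20.
extrovert_score = sum(regular_items) + reverse_code(alone_item)
print(extrovert_score)
```

The same arithmetic works as a spreadsheet formula, so teams can keep the responses exactly as circled in one column and compute the reversed value in another. This tends to cause fewer data-entry errors than entering scores "backwards" by hand.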

FAQ’s
How do you help students develop a good research topic?
How do you verify the surveys are good enough?
How much time is needed to brainstorm a topic and draft a survey?
What major problems do students have drafting a survey?
How does this extensive project time line fit with course topics?


Sampling

We ask students to collect convenient data and to use their demographic variables to determine if their samples are representative. Our university keeps a “Fall Facts” document, updated yearly, that tracks a multitude of demographic variables. The students’ goal is to adequately represent the campus population. We do not attempt to simulate “simple random samples.” In real-world research involving human subjects, random samples rarely exist.

There are five basic ways to collect survey data that we list in the Student Guide.

  1. Convenience Sampling. A sample of volunteers. Since participation in our projects is voluntary, all samples will be partly convenience samples.
  2. Cluster Sampling. The researcher tries to get a 100% response from an entire group: everyone in a psychology class, everyone in a sorority, etc.
  3. Index Sampling. The researcher chooses every nth person to participate. For example, asking every 5th person who walks through the door of the chemistry building to take the survey.
  4. Targeted Sampling. Identify the demographic characteristics you need in your sample, and then go to places where those types of participants are likely to hang out.
  5. Stratified Sampling. You divide the population into groups, and attempt to sample a certain number from each group. Technically, stratified sampling includes randomly sampling from the strata or groups, but we won’t do much random sampling in these projects. Instead, we often use stratified sampling with targeted sampling to identify demographics groups we need more data from.

Most of our project groups use two or more of the above sampling types together. First, they gather the easy data by sampling their dorm hallway, a French class and twenty friendly-looking people in the food court (cluster). They now have 70 of their needed 100 surveys, so they quickly tabulate how many males and females have responded, how many freshmen, how many Greek-affiliated, how many commuters. They compare their partial sample to the overall campus demographics (stratified) and find out how they’re doing. Oops. Too many sophomores and not enough seniors. Too few commuters. And too few males. So they target senior commuters who are male to balance their sample demographics. Of course, everyone in their sample is a volunteer, so they also used convenience sampling. Is this completely representative? No, but it’s the best we can do without the project taking 5 semesters.

Our students generally use combinations of convenience, cluster, targeted and stratified sampling to compare their sample to the overall campus population. Suppose the population is 60% female, 40% male. We suggest that a sample within 5 percentage points of the population figure is reasonably representative. For example, having 65% females would be fine.
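The "within 5%" comparison is easy to automate once a team has tabulated a partial sample. This is a rough sketch in Python. The counts are invented, and we assume "within 5%" means within 5 percentage points of the campus figure.

```python
def within_tolerance(sample_counts, population_pct, tolerance=5.0):
    """For each group, return (sample percentage, close enough?)."""
    total = sum(sample_counts.values())
    report = {}
    for group, pop_pct in population_pct.items():
        sample_pct = 100.0 * sample_counts.get(group, 0) / total
        gap = sample_pct - pop_pct
        report[group] = (round(sample_pct, 1), abs(gap) <= tolerance)
    return report


# Campus figures from the example above; the counts are hypothetical.
population = {"female": 60.0, "male": 40.0}
partial_sample = {"female": 52, "male": 18}  # 70 surveys collected so far

print(within_tolerance(partial_sample, population))
```

A team whose partial sample fails the check knows which group to target for the remaining surveys.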

Often, our students “fail” to achieve a representative sample. Do they fail the project? No, this is not a big problem, as long as two things occur. First, they must demonstrate they understand what sampling error(s) happened and develop a lucid plan to correct it. Second, they need to narrow their findings. For example, suppose a team collected data from 90% females. They should admit their findings only truly apply to the subpopulation of females, not campus as a whole.



Reporting Statistical Conclusions


Reporting Regression Findings

We feel the following steps are important in each regression analysis and expect each step to be completed and explained for both oral presentations and written project reports:

  1. Scatter Plot
  2. Regression (generate r, R2, and coefficients for line of best fit)
  3. Correlation (strong or weak, positive or negative relationship)
  4. Determination (R2, and percentage of variance accounted for)
  5. Slope Coefficient (change in y-variable due to a unit increase in x-variable)
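Our students generate these statistics in Excel. For instructors who want to spot-check a team's numbers, the sketch below computes r, R2, and the line of best fit from the usual sums of squares. It uses plain Python, the data are hypothetical, and the function name is our own.

```python
import math


def simple_regression(x, y):
    """Return r, R-squared, slope and intercept for a bivariate regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # change in y per unit increase in x
    intercept = mean_y - slope * mean_x    # line passes through the two means
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    return {"r": r, "r_squared": r * r, "slope": slope, "intercept": intercept}


# Hypothetical data: nights out per week (x) and GPA (y).
nights_out = [0, 1, 2, 3, 4, 5]
gpa = [3.6, 3.4, 3.5, 3.0, 3.1, 2.8]

result = simple_regression(nights_out, gpa)
print(result)
```

Multiplying r_squared by 100 gives the percentage of variance accounted for (step 4), and the slope is the number teams should interpret in step 5.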

In the case where students find multiple significant correlations, we suggest the following. The team should choose their “best” result and present it in detail, explaining and interpreting each analysis step. Other significant correlations can be presented in sentence form with a correlation value in parentheses at the end. An example of this short-hand analysis might be, “We found a weak, negative correlation ( r = - .23 ) between ‘nights out partying’ and GPA. ‘Nights out partying’ accounted for 5.3% of the variance in GPA.”

We generally use the following ranges for rating the “strength” of a correlation.

   Rating            Range
   Strong            | r | > 0.5
   Moderate          0.5 > | r | > 0.3
   Weak              0.3 > | r | > 0.15
   Little or None    | r | < 0.15
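These ranges translate directly into a small lookup, sketched here in Python. The table does not say which rating applies when | r | is exactly 0.5, 0.3 or 0.15, so the handling of those boundary values below is our assumption.

```python
def correlation_strength(r):
    """Rate a correlation using the ranges above; returns (rating, direction)."""
    size = abs(r)
    if size > 0.5:
        rating = "strong"
    elif size > 0.3:
        rating = "moderate"
    elif size > 0.15:
        rating = "weak"
    else:
        rating = "little or none"
    direction = "negative" if r < 0 else "positive"
    return rating, direction


def variance_explained_pct(r):
    """Percentage of variance accounted for: R-squared times 100."""
    return round(100 * r * r, 1)


# The short-hand example above: r = -.23 between nights out partying and GPA.
print(correlation_strength(-0.23), variance_explained_pct(-0.23))
```

For r = -.23 this reproduces the wording of the short-hand example: a weak, negative correlation accounting for 5.3% of the variance.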

I (Robb) will usually allow teams to report | r | ~ 0.1 as “very weak,” especially when they would otherwise have no significant correlations to report. Real-world regression studies in the social sciences often have significant predictors with these tiny bivariate correlations.

Often students propose a fine study topic, develop a great survey, collect the data and analyze it only to find that … r ~ 0.00423. Most of our students immediately feel like failures. But this “negative finding” is not a problem. They have successfully demonstrated there is evidence the two variables are simply not related. I (Robb) use this example: “What if you found there was no correlation between hours spent partying per week vs. GPA? Wouldn’t that be great if it were true?” That’s an example of a negative finding that we might like to see. Another example is that high school GPA and IQ have much lower correlation than most folks would think. This negative finding provides incentive to “work hard” since IQ is not academically deterministic. Negative results are solid research results.

Negative results in regression studies (meaning lack of significance, not a negative correlation, e.g. | r | < .1 ) do cause a procedural difficulty with the project tasks, but they are not problematic in the broader scope of real-world research. For the project, since teams are supposed to generate a scatterplot, line of best fit and the regression statistics, we have a technical difficulty. I generally suggest the team do their best, report the regression statistics including line of best fit and analysis of the slope (even though it’s basically 0), but acknowledge that their research conclusion was “no relationship at all between the study variables.”

FAQ’s
Why analyze both R2 and r?

Why do you consider such tiny correlations significant?


Reporting t-test Findings

We feel the following steps are important in each t-test analysis and expect each step to be completed and explained for written project reports:

  1. Statistical Test correctly chosen and identified
  2. Hypothesis (null and alternate) set up correctly and with correct mathematical symbols
  3. Error analysis: correct statements for Type I and Type II error and implications stated in terms of the real-world implications
  4. Choose an appropriate value for alpha
  5. Run test and report test statistic and p-value
  6. Correctly Reject or Fail to Reject the null
  7. State the conclusion in terms of the real-world investigation conducted

We deduct points for hypothesis statements in words only. For error analysis, they must correctly identify Type I and Type II error in terms of their project. For example, comparing GPA’s for males and females, they might say, “Type I error (falsely rejecting the null) would mean finding a difference between males and females that does not truly exist.”

The decision about the level of significance is almost made for us in these projects (alpha = .1 is extremely common). Generally, we use high values for alpha when sample sizes are small ( n < 30 ) and exploratory studies are being conducted. Generally, we use low values for alpha when sample sizes are large ( n > 100 ) and a confirmatory study is being conducted. We also adjust alpha based on Type I and Type II error, whenever one is very problematic. However, for these projects, we have small samples and investigatory formats, which argue for high values for alpha.
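Steps 4 through 6 can be sketched in Python for instructors who want to double-check a team's spreadsheet output. Several things here are our assumptions rather than project requirements: the GPA values are invented, we use the unequal-variance (Welch) form of the two-sample t-test, and the two-sided p-value is approximated by numerically integrating the t density so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # Simpson's rule on [0, |t|]; steps must be even
    area = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical GPA samples for the males vs. females comparison.
males = [2.9, 3.1, 2.7, 3.4, 3.0, 2.8, 3.3, 2.6]
females = [3.2, 3.5, 3.0, 3.6, 3.1, 3.4, 2.9, 3.7]

alpha = 0.10  # a high alpha, as argued above for small exploratory studies
t, df, p = welch_t_test(males, females)
decision = "reject the null" if p < alpha else "fail to reject the null"
print(round(t, 3), round(df, 1), round(p, 4), decision)
```

The decision line mirrors step 6. Teams still need to restate the conclusion in terms of their real-world investigation (step 7).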



Frequently Asked Questions

What is a Significant Correlation?
What is the issue with Interval vs. Ordinal Data?
Why do you not allow z-tests for the group comparison projects?
Why do the projects ignore by-hand calculations?
How do you help students develop a good research topic?
How do you verify the surveys are good enough?
How much time is needed to brainstorm a topic and draft a survey?
What major problems do students have drafting a survey?
How does this extensive project time line fit with course topics?
Why analyze both R2 and r?
Why analyze the slope of the regression line?
What are lurking variables?
Why do students struggle with causation when they analyze correlation?
What is the IRB?
Do teams have to include similar majors?
Can individuals complete a regression project?
What if the Group “Melts Down”?
What’s wrong with multiple t-tests?
What do you mean by conceptual problems with ANOVA post hoc procedures?

What is a Significant Correlation?

We use a very liberal cutoff for “significance,” typically suggesting to students that correlations of | r | > 0.15 are significant. Generally, in social science settings, | r | > 0.2 indicates a “significant enough” finding. We often allow students to present their findings as significant even when | r | = 0.1 or higher.

Most mathematicians gasp at this point. “What? What kind of crazy researcher would ever think of | r | = 0.12 as significant? That’s no correlation at all!” Several considerations influence our view.

  1. Social science variables are influenced by a multitude of predictor variables, so the bivariate correlation between two related variables is often “small.” An example is GPA, a dependent variable which correlates positively with IQ, high school GPA, hours studied per week and many others. It also is negatively correlated with partying, drinking, and other destructive student behaviors. In behavioral research, rarely does a single predictor account for more than a small percentage of the variance ( R2 ) in the dependent variable.
  2. Multivariate regression is most often reported in the behavioral research literature, not bivariate regression. In the social sciences, we construct a “predictor model” of several variables with an overall R2 for the model reported. A single variable can easily have an | r | = 0.12 bivariate correlation with the dependent variable and still contribute significantly to the model.
  3. A bivariate correlation is the cosine of the angle between the deviation vectors for the x- and y-variables. Suppose that two variables are related, but the angle between their deviation vectors is oblique (close to 90 degrees). The relationship exists, but the cosine of the angle will be small. In social science and human subjects research, small correlations in the sample may in fact be accurately measuring relationships which exist but are oblique in nature. A physical science equivalent would be Boyle’s Law, with students trying to correlate pressure and volume in an environment where temperature is unknown and fluctuates wildly. A correlation study would likely point to a connection between pressure and volume, but the deviation vectors would not point in exactly the “correct” directions. A low correlation is not necessarily evidence the two are unrelated. Rather, it is often the case that the two variables are related but that another variable also influences the dependent variable.
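The cosine interpretation in the third point can be checked numerically. The Python sketch below uses a small made-up data set. It computes r the usual way (covariance over the product of standard deviations) and again as the cosine of the angle between the two deviation vectors. The helper names are ours.

```python
import math
from statistics import mean, stdev


def deviations(values):
    """Deviation vector: each value minus the sample mean."""
    m = mean(values)
    return [v - m for v in values]


def cosine_between(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(x, y):
    """Correlation as sample covariance over the product of standard deviations."""
    dx, dy = deviations(x), deviations(y)
    cov = sum(a * b for a, b in zip(dx, dy)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
cos_angle = cosine_between(deviations(x), deviations(y))
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
print(r, cos_angle, angle_deg)
```

The two values agree up to rounding. An angle near 90 degrees gives an r near 0, which is the "oblique" situation described above.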

What is the issue with Interval vs. Ordinal Data?

Vigorous debate among researchers has developed about the use of ordinal data types in regression and t-test analyses. Interval data derives from variables like an IQ test where each interval measures an equivalent underlying change. Ordinal data derives from rankings as in a Likert scale opinion question where unit changes may not denote equivalent changes. We allow ordinal data to be used in these student projects for three main reasons:

  1. Researchers in social science settings routinely use ordinal scales for their research. As Thorne and Giesen state (pp. 17 – 18), “We will take the position that rating scales can cautiously be assumed to be interval-level measurement and recommend using common sense in making interpretations. This position seems most consistent with how data from rating scales are frequently treated in the research literature in psychology and other behavioral sciences.”
  2. The appropriate development of research constructs is an advanced research topic suitable for advanced statistics and research methods courses.
  3. Pedagogically, ordinal data types allow a broader range of research topics and generally enhance learning outcomes.

Not all of the rating scales our students have developed would be appropriate for research published in journals. However, when carefully worded and allowed a large enough range of answer choices, these ordinal scales often prove valuable and consistent.

These projects will introduce students to well-constructed rating scales from the research literature. However, we feel students ought to learn more correct techniques for developing and analyzing these data types in future research methods courses. Because the allowable data types vary depending upon discipline, we feel wide latitude should be offered during these projects. The students will learn the basic concepts of survey development and analytic techniques and postpone deeper discussions for later courses.

   Reference: Thorne, M. and Giesen, J. (2003). "Statistics for the Behavioral Sciences," 4th ed. McGraw Hill, Boston, MA.

Why do you not allow z-tests for the group comparison projects?

The group comparison projects include small sample sizes ( n ~ 25 ) generally inappropriate for z-test procedures. Even when we assume that we “know” population parameters, they are rough estimates at best. T-tests provide the most robust and accurate approach for virtually all of the studies our students conduct. Most real-world research uses a multivariate approach (ANOVA) which is derived from the t-test, so these procedures offer insight into the more advanced statistical topics students will encounter in later courses and when reading reports of research in journal articles or the popular media. We do cover z-tests in class and assess those topics with in-class assignments, tests and quizzes.

Why do the projects ignore by-hand calculations?

We do teach students various traditional statistics methods. There is value in understanding how a statistic is computed. However, regression studies particularly require large data sets ( n > 100 ), which make by-hand calculations extremely time-consuming. We prefer to focus on a technological approach with emphasis placed on analysis of the findings. This allows students to interact with larger data sets more similar to the ones used in real-world research settings. We generally use in-class assignments, tests and quizzes to assess computational understanding. The projects help develop analytical understanding and an appreciation of real-world value.

How do you help students develop a good research topic?

The key is to help them find three numeric variables that are related. We generally ask the students what they wish to study. Often we hear responses like, “We want to compare men and women drivers.” This would require a t-test, so we point that out. Usually, they want to study driving habits. For example, one of the boys in the group has been teasing one of the girls about putting on makeup in the car. Once we know that, we suggest they might study “dangerous driver activities.” They can start to think about variables like cell phone usage, eating, and listening to loud music while driving, along with getting information about how often the person speeds and how many speeding tickets they’ve gotten lately. Generally, after throwing several ideas at them, they are able to select some variables and generate a coherent topic statement for their research.

How do you verify the surveys are good enough?

We suggest instructors commit to three check-in points to verify quality surveys: study idea, survey rough draft and survey final draft, with comments and possibly a grade for each. The topic the team settles upon is generally better when they have brainstormed at least three different ideas. We collect a draft survey and provide extensive feedback, in writing. I (Robb) work with teams in class and assign a teammate to email their rough draft by 5 PM. I reply within the email prior to the next class. I (Dianna) provide written feedback on the hard copy they turn in. Before they go on to collect data, we both require that they submit their final draft for approval, including a student IRB form. The final drafts are quickly approved, and I (Robb) do it in 10 minutes at the end of a class period. As you would guess, about 20% of the final drafts still need work. I offer my suggestions and require them to email their updated survey to me (and receive my response) before collecting data. High quality surveys generally produce analyzable data sets.

How much time is needed to brainstorm a topic and draft a survey?

Surprisingly, many teams find selecting an appropriate topic the most difficult part of the project. We expect a coherent topic within about a week after teams are formed. About 20% of teams have difficulty doing so, and we work with them after class or during office hours to assist. Once a topic has been chosen, drafting a survey is not difficult. But good surveys have been redrafted and edited several times, so you will want to build plenty of time (at least two weeks) into your survey development phase. The more coherent the research idea, the better the quality of the surveys. So feel free to allow students to think and churn through some ideas.

What major problems do students have drafting a survey?

The major issues are poorly worded questions, too frequent use of open-response answer formats, and inadequate closed response answer choices. These issues are dealt with extensively in the Student Guide. This sample survey is similar to what most of our top performing teams typically end up with. Students routinely confuse demographics variables and study variables, but “age” for example might truly be both. We suggest using closed response answer choices as much as possible for ease of use and to avoid “junk” responses. The poor range of answer choices generally derives from a failure to consider outliers on variables like “hours of sleep” or “hours spent studying.”

Can individuals complete a regression project?

Not really. This is difficult mainly because of the size of the data set ( n = 100 ) and the number of variables. I (Robb) opt for 2-person teams when the class roster is not divisible by 3, but the task can be a bit daunting. I (Dianna) opt for 4-person teams when the class roster is not divisible by 3. We both agree that a single person cannot adequately handle the work load of generating a data set large enough to be meaningful.

If two-person teams have been assigned, but one person drops the course, we typically reassign the individual to another team. This is possible up until data collection has occurred. If data collection is complete, we generally ask the student to analyze the data and then either turn in a written report or do an oral presentation. Every student offered this option has chosen the oral presentation. They were told that, if the presentation bombed, they would be able to correct their mistakes on a written report to earn a better grade. Each student offered this choice simply visited office hours enough prior to the presentation to ensure a high quality product. These aren't great solutions, but the situation has come up.

If a three-person team loses a member, we simply expect them to complete the project as assigned. I (Robb) offer 2-person teams extra credit for working "short" one member. The extra credit is simply consideration at the end of the semester if they're on the borderline between letter grades.

How does this extensive project time line fit with course topics?

The project teams can be assigned and brainstorming can begin before regression is covered in class. Typical textbook examples introduced in class do help somewhat but are often very different from the types of data students will end up collecting themselves. By the time they proceed to entering their data into a spreadsheet, all relevant regression topics need to have been fairly well covered. Generally the project presentations become a review of regression topics and occur after the test or midterm on which the regression topics are tested. Students often comment that, “I wish I had known this (about regression) when we took that test two weeks ago!” As unfortunate as this might seem, the projects reinforce the class-taught concepts and help students prepare for the exam. The t-test project is shorter, and its major learning outcomes are much more synchronized with the test on that material.

Why analyze both R² and r?

The correlation coefficient ( r ) is a quick overview of the strength and direction of the relationship between two variables. A more detailed analysis uses the slope coefficient to describe how the dependent variable changes with the predictor variable. The coefficient of determination ( R² ) describes the percent of variance in the dependent variable that is accounted for by the predictor. The reason we capitalize R² relates to multiple regression, where we use several independent variables in a prediction equation. Then R² tells us the overall predictive value of the regression model, which can include dozens of independent variables. We tend to focus on R² and treat the correlation as less important, since most regression studies reported in the research literature and the popular media are referring to a multiple regression approach.

Why analyze the slope of the regression line?

Most media accounts of regression studies quote the slope analysis: “For every additional five thousand cars on an interstate during rush hour, average speed drops by 2.4 mph.” These statements generally arise from the researcher’s analysis of the “unstandardized betas,” which are estimates of the “slope” coefficients. The prediction equation for a multiple regression has the following form:

Multiple Regression Equation:

   predicted y = b0 + b1 * x1 + b2 * x2 + ... + bk * xk

where k is the number of significant predictors. Standardized betas generally estimate the relative strength of each predictor leading to analysis statements like this one: “The three primary risk factors for predicting adult onset diabetes are…” Bivariate correlations are not useful for this purpose since the interaction of the predictors in a given model can be extremely complex. The betas give information about the predictors that is specific to the model.

Performing a slope analysis allows students to recognize statements from media reports that come from regression studies. Generally speaking, the “slope analysis” tells us how the variables are related in terms of their individual units. The coefficient of determination ( R² ) tells the strength of that relationship.
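The two kinds of statement can be seen side by side in a short calculation. The following is a minimal sketch in Python, not part of the project materials; the function name simple_regression and the hours-studied and GPA values are hypothetical, made up purely for illustration.

```python
from math import sqrt

def simple_regression(x, y):
    """Return (slope, intercept, r, r_squared) for the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)             # correlation coefficient
    return slope, intercept, r, r * r     # r squared is R^2 for one predictor

# Hypothetical survey responses (illustration only, not project data).
hours_studied = [2, 4, 5, 7, 8, 10, 12, 15]
gpa = [2.1, 2.4, 2.9, 2.8, 3.1, 3.3, 3.4, 3.8]

slope, intercept, r, r_squared = simple_regression(hours_studied, gpa)
print(f"Slope: {slope:.3f} GPA points per additional hour studied")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

The slope line reads like a media statement ("for every additional hour studied..."), while R² summarizes how much of the variation in GPA the predictor accounts for.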

What are lurking variables?

Our students often find it odd that two variables can be correlated, yet not be related to one another at all. The “Pirates vs. Global Warming” correlation farce is one humorous example. Also, the “number of Baptist churches vs. number of bars in a given zip code” shows strong correlation. The lurking variable, most students guess after a minute or so of thought, is population. Lots of establishments correlate with population: grocery stores, gas stations, movie theaters, and so on.

The story is not so clear in social science research and, especially, the behavioral sciences. Which variables “lurk” or, in research nomenclature, “mediate” the effects of other variables is often unclear without extensive studies and sophisticated statistical analysis. Generally, we allow our students to study any two variables that would appear to be related as long as no obvious lurking variable is present. Since we suggest 3 variables for the regression project, we can often ask them to “test” a lurking variable or another related variable to see which is more highly correlated with the dependent variable.
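The churches-and-bars example can be reproduced with simulated data. The sketch below is only an illustration under stated assumptions: the zip-code populations, the rates, and the noise levels are invented, and the counts are left as unrounded simulated values. It also uses a partial correlation, a technique that goes beyond what the project asks of students, to show the correlation largely disappearing once population is accounted for.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for a third variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(1)  # fixed seed so the simulation is repeatable

# Simulated zip codes: both counts depend on population, not on each other.
population = [random.uniform(1_000, 50_000) for _ in range(200)]
churches = [p / 1_000 + random.gauss(0, 3) for p in population]
bars = [p / 800 + random.gauss(0, 4) for p in population]

r_cb = pearson_r(churches, bars)
r_cp = pearson_r(churches, population)
r_bp = pearson_r(bars, population)
r_cb_given_pop = partial_r(r_cb, r_cp, r_bp)

print(f"churches vs. bars:               r = {r_cb:.2f}")
print(f"churches vs. population:         r = {r_cp:.2f}")
print(f"bars vs. population:             r = {r_bp:.2f}")
print(f"churches vs. bars | population:  r = {r_cb_given_pop:.2f}")
```

Students who include population as their third variable can make the simpler comparison suggested above: which predictor is more highly correlated with the dependent variable.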

Why do students struggle with causation when they analyze correlation?

Correlation does not imply causation. Worse, correlation doesn’t imply any relationship between the variables at all (see lurking variables). Unfortunately, regression models implicitly presume causality. We use terms like dependent variable. The coefficient of determination ( R² ) measures “percent of variance in the y-variable accounted for by the x-variable.” Analysis of the slope of the regression line leads to statements such as “for every unit change in the x-variable, the y-variable decreases by half a unit.”

Having a correct choice for the x-variable is important. Our students must understand that when comparing “hours studied” vs. GPA, one must predict the other. Presumably, hours studied would predict GPA and thus be the independent variable. In our quest to have linear regression models make sense, we must think in terms of causation. In order to analyze our results, we use terms that imply causation. It should not be surprising to find students fail to understand.

Finally, students are intrigued by the causality. They like to think in terms of GPA points “lost” per serving of alcohol.

What is the IRB?

All universities have an Institutional Review Board (IRB) to oversee human subjects research. Human subjects research cannot be conducted on a university campus without IRB approval. Our experience at NGCSU is this: students may conduct research on campus with volunteers under the direction of a faculty member without submitting formal IRB requests to the committee. It is important to verify the guidelines at your institution. Typically, undergraduate researchers have their own abbreviated form to fill out and face a low level of scrutiny for their projects.

We suggest not allowing certain research topics (rape, incest, drug abuse, suicide, and so on) because these research topics generally subject researchers to more intense IRB scrutiny. These topics are more likely to upset participants and lead to complaints. We recommend only using college students who are 18 years of age or older. Research with minors generally requires parental consent, as well as participant consent.

IRB approval is simple when the research is about mundane topics and when “protected” classes are not being studied. Protected classes include minors, pregnant women, those with disabilities and so forth. Researchers can study issues like rape and can include subjects who are pregnant, but such studies draw the more intense scrutiny described above. The IRB process verifies that all rights of all participants are protected, that privacy is ensured, that no harmful effects are anticipated, and that participants are properly notified of all their rights, including the right to withdraw voluntarily from the research study at any time. Also, each participant must sign a consent form which the IRB must screen in advance. Typical consent forms discuss the aforementioned issues and rights.

Because students need to do basic research to learn how to conduct research and evaluate research findings, most universities have a streamlined process for facilitating undergraduate research on campus. Usually a quick look at the university’s IRB web page will reveal the relevant information for professor-guided student research.

Do teams have to include similar majors?

This research presumes the Regression Project teams will have similar majors and/or career interests. We allow student-selected teams for the Group Comparison project. Of course, not every class will break down easily into 3-person teams with similar majors and/or career interests. We generally can get about 90% of teams into some reasonable combination of similar majors. For the others, we ask that they focus on one team member’s major or career interest.

What if the Group “Melts Down”?

In a regular but (thankfully) rare pattern, teams implode. They begin bickering, visiting office hours to complain about each other and generally acting like four-year-olds. Here are some suggestions for prevention and, failing that, correction.

  • Prevention. I (Robb) tell students on the first day of class that they will be working in professor-assigned teams on projects for which the entire team earns the same grade. If they do not like the approach, I tell them, they can enroll in the stats class across the hall. When the teams are assigned, I give them time to form up and exchange email addresses and cell phone numbers. I also have each group write down and turn in two hour-long periods during the week when they could meet outside of class. I routinely give 5 – 15 minutes of “team time” at the end of class during the regression project when they can coordinate their efforts. I also repeatedly remind them to work professionally at setting and meeting team deadlines. Finally, I have them complete a survey about their team which gathers data about any problems they experienced. Typically, with 3-person teams, I can figure out from their combined responses what the problem was. As an aside, I will simply say that rarely is the “melt down” caused by a single person.
  • Correction. Despite all efforts, some teams melt down and start the dreaded infighting that kills a project. If I can “assign” blame, I have the person(s) at fault complete the final project alone, and assign them a “double” project, requiring that they collect and analyze more data for their t-tests than other individuals ( n > 50 instead of n > 25 , for example). I find this easier than splitting up the project grade and assigning different credit to different team members. When talking one-on-one with a student I am requiring this of, I generally find that extenuating circumstances like illness, a deteriorating work situation or some other event caused much of the problem. I tell them that, if they do a commendable job on the final project, I will allow them to keep their “group” grade from the team project. If they turn in a miserable final project, I deduct some pre-decided amount from the earlier grade. I make these agreements after the team project is complete, generally advising team members who complain about teammates to “work it out amongst themselves.”

What’s wrong with multiple t-tests?

The overall Type I error rate rapidly grows out of control when too many statistical tests are conducted. Suppose we are comparing 4 subpopulations. We could use ANOVA and a post hoc procedure, or we could conduct all 6 possible pairwise independent samples t-tests. If each of the 6 t-tests had an alpha = .1 level of significance, then:

   Overall alpha = 1 – Probability of Making No Type I Errors = 1 – (1 – alpha )^6

So the overall alpha is about 0.47, a level of significance that is quite meaningless statistically speaking. Using multiple tests requires the researcher to set alpha lower for each individual test, which makes ferreting out real differences more difficult.
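The arithmetic is easy to check or to rerun for a different number of groups. The following Python sketch is an illustration only; the function name familywise_alpha is our own. The formula treats the tests as independent, which pairwise tests that share groups are not, so the result is best read as an approximation. The Bonferroni adjustment shown at the end is one common way (not the only one) to set alpha lower for each individual test.

```python
from math import comb

def familywise_alpha(alpha, num_tests):
    """Chance of at least one Type I error across num_tests independent tests."""
    return 1 - (1 - alpha) ** num_tests

groups = 4
num_tests = comb(groups, 2)          # number of pairwise comparisons: 6
overall = familywise_alpha(0.10, num_tests)
print(f"{num_tests} tests at alpha = 0.10 give an overall alpha of {overall:.2f}")

# Bonferroni adjustment: divide the desired overall alpha by the number of tests.
per_test_alpha = 0.10 / num_tests
adjusted = familywise_alpha(per_test_alpha, num_tests)
print(f"Per-test alpha = {per_test_alpha:.4f} keeps the overall alpha at {adjusted:.3f}")
```

The first line printed reproduces the 0.47 figure; the second shows how small each individual alpha must become to hold the overall rate near 0.10.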

There is another error that is slightly deeper. Differences between 4 groups can be more complex than just pairwise. For example, two of the groups might be virtually identical while the remaining two are significantly different. ANOVA is the correct test and has a wide range of post hoc procedures to ferret out the correct differences while simultaneously keeping the overall level of significance reasonable.

What do you mean by conceptual problems with ANOVA post hoc procedures?

Differences between several groups can be more complex than just pairwise. This means it is possible (and not uncommon) to find an overall indication of significant differences (p-value from ANOVA less than alpha) while simultaneously not finding significant pairwise differences with the Tukey HSD, a common pairwise post hoc procedure. This is not problematic for researchers, but conceptually it really bothers students. The difficulty is that the groups might in fact have a more complex relationship than simple pairwise differences can demonstrate. For example, two of the groups might be virtually identical while the remaining two are significantly different.

The problem is generally solved by using a different post hoc procedure. But different procedures have differing amounts of conservative or liberal bias. Conservative tests find significance too rarely, while liberal tests find significance too often. In sum, post hoc procedures are complicated conceptually, and detailed discussions of them are not generally appropriate in introductory statistics courses.

This does not mean that we never talk about ANOVA and post hoc procedures. I (Robb) often ask teams of talented students to perform ANOVA and show them how to calculate Tukey’s HSD value. This is easily done if (a) the sample sizes are identical and (b) they have the MSW or “mean square within” value from their graphing calculators. I find that a half hour of direct instruction during office hours is often an effective way for a small group to learn some valuable statistics. I am more likely to require this of social science majors like psychology students or sociology students who will, in future courses, need to understand ANOVA.

HSD stands for “honestly significant difference.” For equal group sizes, we compute a single HSD value, and any pair of sample means that differ by more than HSD is considered significantly different. We use N to denote the overall sample size, k to denote the number of groups, and n to denote the common group size. Then degrees of freedom between groups is k – 1, and degrees of freedom within groups is N – k. We look up the studentized range statistic ( q ) in a table using the number of groups ( k ) and the degrees of freedom within groups ( N – k ). Once we find q, then

   HSD = q * sqrt ( MSW / n )

The process is not as difficult as this description makes it sound, and most q-tables are descriptive enough that you can find the value without an in-depth knowledge of degrees of freedom or even understanding the differences between “within” group variation and “between” group variation.
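For instructors who would rather check the calculation by computer than by calculator, the HSD formula can be sketched in a few lines of Python. Everything here is illustrative: the three groups of “hours of sleep” values are made up, the function names are our own, and the value q = 3.77 is assumed to be what a typical printed q-table gives for alpha = .05 with k = 3 groups and N – k = 12 degrees of freedom. Verify q against your own table before relying on it.

```python
from itertools import combinations
from math import sqrt

def mean_square_within(groups):
    """MSW: within-group sum of squares divided by (N - k)."""
    total_n = sum(len(g) for g in groups)
    k = len(groups)
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((value - group_mean) ** 2 for value in g)
    return ss_within / (total_n - k)

def tukey_hsd(q, msw, n):
    """HSD = q * sqrt(MSW / n), valid for equal group sizes n."""
    return q * sqrt(msw / n)

# Hypothetical data: hours of sleep for three groups of five students each.
data = {
    "A": [6, 7, 7, 8, 7],
    "B": [5, 6, 6, 7, 6],
    "C": [8, 9, 8, 9, 11],
}

q = 3.77  # ASSUMED table value for alpha = .05, k = 3, df within = 12
msw = mean_square_within(list(data.values()))
hsd = tukey_hsd(q, msw, n=5)

means = {name: sum(values) / len(values) for name, values in data.items()}
significant = {
    (a, b): abs(means[a] - means[b]) > hsd
    for a, b in combinations(means, 2)
}

print(f"MSW = {msw:.3f}, HSD = {hsd:.3f}")
for (a, b), flag in significant.items():
    diff = abs(means[a] - means[b])
    print(f"{a} vs. {b}: difference = {diff:.2f}, significant = {flag}")
```

In this made-up data set, groups A and B differ by less than HSD while group C differs from both by more than HSD, which is the kind of pattern students are asked to interpret.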






Appendix


The Research Literature and Preliminary Results

Research Literature

Statistics educators have repeatedly suggested improvements, especially ones that focus on implementing the scientific method utilizing authentic statistical experiences, but these calls for improvement have not been widely heeded (Bryce, 2005). When best-practice pedagogies have been implemented in statistics courses, the results have been positive for achievement and for improved attitudes toward statistics. There is a strong indication that apprentice learning, a modality wherein students complete real-world mathematics in authentic settings, develops better conceptual understanding as well as better transference of knowledge to non-mathematical and non-school settings (Boaler, 1998). Researchers found statistics courses based on more constructivist models improved student attitudes toward statistics and that personal relevance is important for successful learning in statistics (Mvududu, 2003). A researcher used case-study methodology to evaluate a real-world, project-based approach to learning statistics and found that students learned more from the project than from any other instructional component of the course. The researcher further reported improvements in student motivation (Yesilcay, 2000).

Evidence connects positive student outcomes to attitudes toward mathematics and motivation for learning mathematics. Researchers performing a meta-analysis of 113 mathematics education studies found a significant influence of attitude toward mathematics upon achievement in mathematics (Ma & Kishor, 1997). Results from an exploratory study using an activity theory model based on Vygotsky’s work suggest instructors in statistics courses would do well to consider variables from the affective domain as an integral, not peripheral, part of the statistical learning process (Gordon, 1995). Additionally, interest in mathematics is an increasingly important factor in course selection as students finish secondary school and move into college (Köller, Baumert, & Schnabel, 2001).

A review of the literature on using computer simulations in statistics courses found little current evidence that simulations improve student outcomes (Mills, 2002). Mathematicians may be the most abstract thinkers of all STEM disciplines (Science, Technology, Engineering and Mathematics) and are likely entranced by thought experiments or simulations, but students in liberal arts service courses often prefer more concrete activities. Hence, the co-PIs offer the following definition: throughout this document, “authentic statistics activities” means “the collection of real data pursuant to a student-developed hypothesis and connected to a prior interest.”

Middleton and Spanias (1999) conducted a review of the literature surrounding motivation to learn mathematics. They reported that careful design of instruction can strongly influence student motivation for mathematics achievement which increases the likelihood students will choose to take future mathematics courses. They provide a beautiful summation of the goals of the proposed initiative: “Students must understand that the mathematics instruction they receive is useful, both in immediate terms and in preparing them to learn more in the fields of mathematics and in areas in which mathematics can be applied (e.g., physics, business, etc.). Use of ill-structured, real-life problem situations in which the use of mathematics facilitates uncovering important and interesting knowledge promotes this understanding” (p.81).

References

Boaler, J. (1998). Open and closed mathematics: Student experiences and understandings. Journal for Research in Mathematics Education, 29(1), 41–62. URL: http://my.nctm.org/eresources/article_summary.asp?URI=JRME1998-01-41a&from=B

Bryce, G. R. (2005). Developing tomorrow's statistician. Journal of Statistics Education, 13(1). URL: http://www.amstat.org/publications/jse/v13n1/bryce.html

Gordon, S. (1995). A theoretical approach to understanding learners of statistics. Journal of Statistics Education, 3(3). URL: http://www.amstat.org/publications/jse/v3n3/gordon.html

Köller, O., Baumert, J., & Schnabel, K. (2001). Does interest matter? The relationship between academic interest and achievement in mathematics. Journal for Research in Mathematics Education, 32(5), 448–470. URL: http://my.nctm.org/eresources/article_summary.asp?URI=JRME2001-11-448a&from=B

Ma, X., & Kishor, N. (1997). Assessing the relationship between attitude toward mathematics and achievement in mathematics: A meta-analysis. Journal for Research in Mathematics Education, 28(1), 26–47. URL: http://my.nctm.org/eresources/article_summary.asp?URI=JRME1997-01-26a&from=B

Middleton, J. A., & Spanias, P. A. (1999). Motivation for achievement in mathematics: Findings, generalizations, and criticisms of the research. Journal for Research in Mathematics Education, 30(1), 65–88. URL: http://my.nctm.org/eresources/article_summary.asp?URI=JRME1999-01-65a&from=B

Mills, J. D. (2002). Using computer simulation methods to teach statistics: A review of the literature. Journal of Statistics Education, 10(1). URL: http://www.amstat.org/publications/jse/v10n1/mills.html

Mvududu, N. (2003). A cross-cultural study of the connection between students' attitudes toward statistics and the use of constructivist strategies in the course. Journal of Statistics Education, 11(3). URL: http://www.amstat.org/publications/jse/v11n3/mvududu.html

Yesilcay, Y. (2000). Research project in statistics: Implications of a case study for the undergraduate statistics curriculum. Journal of Statistics Education, 8(2). URL: http://www.amstat.org/publications/jse/secure/v8n2/yesilcay.cfm


Preliminary Results

A key objective for this model of instruction is to improve affective outcomes. In a treatment-control design, a six-question survey of attitudes toward the usefulness of statistics in real-world settings and in one’s future career was used to compare Sinn’s and Spence’s sections to a control group at NGCSU. The Cronbach alpha coefficient obtained for the instrument was 0.85, indicating good internal consistency. An independent samples t-test was significant at the 0.10 level and offers preliminary evidence that students who are exposed to this instructional model benefit by connecting course concepts to real-world ideas and their future careers.

                   N     Mean   SD
Treatment Group    88    4.38   0.99
Control Group      53    4.13   0.88
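Readers who want to see how a t statistic follows from summary statistics like these can use the sketch below. It is an illustration only and makes assumptions the text does not confirm: it uses the unequal-variance (Welch) form of the independent samples t-test, and it approximates the p-value with the normal distribution, which is reasonable with roughly 120 degrees of freedom but not exact. The guide does not state which form of the test was run or whether it was one- or two-tailed.

```python
from math import sqrt
from statistics import NormalDist

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unequal-variance independent samples t-test."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the table above.
t, df = welch_t(4.38, 0.99, 88, 4.13, 0.88, 53)

# ASSUMPTION: normal approximation to the t distribution (df is large).
p_one_sided = 1 - NormalDist().cdf(t)
p_two_sided = 2 * p_one_sided

print(f"t = {t:.2f}, df = {df:.0f}")
print(f"approximate one-sided p = {p_one_sided:.3f}")
print(f"approximate two-sided p = {p_two_sided:.3f}")
```

Under these assumptions t comes out near 1.56, which falls below the 0.10 threshold for a one-tailed test but not for a two-tailed one.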

A qualitative analysis was conducted on the text of the final project write-ups in Sinn’s section (n = 33). The final paragraph of each student’s project report was a reflection guided by the two questions “What did you learn from this class?” and “What did you learn from doing this project?” Each student’s paragraph was coded by the research team according to the following scale:

Category   Attributes Used for Coding                                               Number of Students
IV         Explicit evidence of both real-world and career-specific connections     2
III        Explicit evidence of either real-world or career-specific connections    15
II         Implicit evidence of either real-world or career-specific connections    5
I          No evidence of any real-world or career-specific connections             11

Students in Categories III and IV gave specific evidence of how they had used statistics already. For example, one student wrote that “the concepts taught in class I actually feel I will be able to take with me in the real world” and mentioned having already used course knowledge in another class. One student suggested further interest: “I will probably take another statistics class because I do think it will help me out with real life situations.” Two students mentioned how the projects enhance learning: “From this class we have learned that we can apply stats to everyday life to prove or disprove common theories.” Students repeatedly mentioned the real-world value of statistics, for example: “I like this class better than other math courses because it is more relevant to real life situations. This project is the perfect example of relating course material to real life.”

About half the students (17 of 33, Categories III and IV) provided explicit evidence they realized the important contributions statistical analysis makes to real-world knowledge. The results of both the survey and the text analysis indicate the benefits one would expect from the research literature are likely to be realized during this project.




Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

This page is not a publication of North Georgia College & State University, and NGCSU has not edited or examined the content.

The author of the page is solely responsible for the content.