Lab 9 Parametric Hypothesis Testing

9.1 Introduction

Researchers should begin a project with a hypothesis and then gather data to see whether the evidence supports it, a process commonly called the scientific method. Continuous data gathered as part of a research project are analyzed using parametric techniques, and two of the most commonly used tests are described in this lab. The lab starts with a discussion of the term “hypothesis” since that concept is key to both parametric and nonparametric testing.


9.2 Hypothesis

A hypothesis is an attempted explanation for some observation and is often used as a starting point for further investigation. For example, imagine that a physician notices that babies born to women who smoke seem to weigh less than babies born to women who do not smoke. That could lead to a hypothesis like “smoking during pregnancy is linked to lighter birth-weights.” As another example, imagine that a restaurant owner notices that tipping seems to be higher on weekends than through the week. That might lead to a hypothesis that “the size of tips is higher on weekends than weekdays.” After creating a hypothesis, a researcher would gather data and then statistically analyze those data to determine whether they support the hypothesis. Additional investigation may be needed to explain why the observation holds. In a research project, there are usually two related competing hypotheses: the Null Hypothesis and the Alternate Hypothesis.

  • Null Hypothesis (abbreviated H0). This is sometimes described as the “skeptical” view; that is, the proposed explanation for some observed phenomenon is mistaken. For example, the null hypothesis for the smoking mother observation mentioned above would be “smoking has no effect on a baby’s weight” and for the tipping observation would be “there is no difference in tipping on the weekend.”
  • Alternate Hypothesis (abbreviated Ha). This is the hypothesis that is being suggested as an explanation for the observed phenomenon. In the case of the smoking mothers mentioned above, the alternate hypothesis would be that smoking causes a decrease in birth weight. This is called the “alternate” because it is different from the status quo, which is encapsulated in the null hypothesis.

One commonly-used example of the difference between the null and alternate hypothesis comes from the trial court system. When a jury deliberates about the guilt of a defendant they start from a position of “innocent until proven guilty,” which would be the null hypothesis. The prosecutor is asking the jury to accept the alternate hypothesis, or “the defendant committed the crime.”

For the most part, researchers will never conclude that the alternate hypothesis is true. There are always confounding variables that are not considered but could be the cause of some observation. For example, in the smoking mothers example mentioned above, even if the evidence indicates that babies born to smokers weigh less, the researcher could not state conclusively that smoking caused that observation. Perhaps non-smoking mothers had better health care, perhaps they had better diets, perhaps they exercised more, or there may be any number of other reasonable explanations not related to smoking. For that reason, the result of a research project is normally reported with one of two phrases:

  • The null hypothesis is rejected. If the evidence indicates that there is a significant difference between the status quo and whatever was observed then the null hypothesis would be rejected. For the “tipping” example above, if the researcher found a significant difference in the amount of money tipped on weekends compared to weekdays then the null hypothesis (that is, tipping is the same on weekdays and weekends) would be rejected.

  • The null hypothesis cannot be rejected. If the evidence indicates that there is no significant difference between the status quo and whatever was observed then the researcher would report that the null hypothesis could not be rejected. For example, if there was no significant difference in the birth weights of babies born to smokers and non-smokers then the researcher would fail to reject the null hypothesis.
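
The choice between those two phrases can be sketched as a small R helper. This is only an illustration: the function name report_decision is hypothetical, the 0.05 significance level is an assumed convention rather than something this lab prescribes, and the two p-values passed in are made up.

```r
# Hypothetical helper: turn a p-value into one of the two standard phrases.
# alpha = 0.05 is an assumed significance level, not one set by this lab.
report_decision <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "The null hypothesis is rejected."
  } else {
    "The null hypothesis cannot be rejected."
  }
}

# Two made-up p-values, one below and one above the assumed threshold.
report_decision(0.003)
report_decision(0.20)
```

Note that neither phrase claims the alternate hypothesis is true, which matches the caution about confounding variables above.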

Often, a research hypothesis is based on a prediction rather than an observation and that hypothesis can be tested. Imagine a hypothesis like “walking one mile a day for one month decreases blood pressure.” A researcher could test this by measuring the blood pressure of a group of volunteers, have them walk a mile every day for a month, and then measure their blood pressure at the end of the experiment to see if there was any significant difference.


9.3 ANOVA

An Analysis of Variance (ANOVA) is used to analyze the difference among more than two groups of samples that are normally distributed. Notice that an ANOVA is used when there are more than two groups being analyzed, which distinguishes an ANOVA from a t-test (which is covered in the next section). For example, imagine that a professor is testing a hypothesis that tutoring improves students’ grades. A class is split into three groups such that one group is not required to attend tutoring, a second group is required to attend tutoring once a week, and a third group is required to attend tutoring two or more times a week. The null hypothesis (H0) is “The amount of tutoring does not significantly change students’ scores on the final exam.” The alternate hypothesis (Ha) is “More frequent tutoring significantly changes students’ scores on the final exam.” After the final exam is graded, an ANOVA could be calculated, and if it showed a significant difference in the test scores for those three groups of students then the null hypothesis would be rejected in favor of the alternate hypothesis.

9.3.1 Demonstration: ANOVA

The R anova function requires the two variables being compared to be supplied as a linear model (lm) formula in the form of y ~ x, where y is the dependent variable (measured outcomes) and x is the independent variable (the groups used to divide the measured outcomes). The data source is specified with a data = parameter. In the case of the professor’s Tutoring Efficacy hypothesis described above, the students’ final exam scores would be the dependent variable, the measured outcome, while the amount of tutoring, the three tutoring groups, would be the independent variable.
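
As a rough sketch of how the y ~ x formula maps onto the tutoring example, the following builds a small simulated data frame and runs an ANOVA on it. Everything about the data is an assumption made for illustration: the data frame name tutoring, the column names score and sessions, the group sizes, and the simulated means are all hypothetical and are not taken from any real class.

```r
# Simulated data only: 10 hypothetical students in each of three tutoring groups.
set.seed(42)  # make the simulated scores reproducible
tutoring <- data.frame(
  sessions = factor(rep(c("none", "weekly", "twice_plus"), each = 10)),
  score = c(rnorm(10, mean = 70, sd = 8),   # no tutoring
            rnorm(10, mean = 74, sd = 8),   # tutoring once a week
            rnorm(10, mean = 78, sd = 8))   # tutoring two or more times a week
)

# score is the dependent variable (y); sessions is the independent variable (x).
result <- anova(lm(score ~ sessions, data = tutoring))
result
```

Because sessions is a factor with three levels, the ANOVA table reports two degrees of freedom for it (three groups minus one).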

The following one-line script generates an ANOVA from the morley data frame. The speed of light was measured twenty times in each of five different experiments. To see if there is any significant difference among the groups of experiments, an ANOVA is calculated where Speed is the measured outcome and Expt identifies the experiment group.

  # ANOVA
  anova(lm(Speed ~ Expt, data = morley))

The ANOVA function returns a lot of information, most of which is beyond the scope of this lab:

  > # ANOVA
  > anova(lm(Speed ~ Expt, data = morley))
  Analysis of Variance Table

  Response: Speed
            Df Sum Sq Mean Sq F value    Pr(>F)
  Expt       1  72581   72581  13.041 0.0004827 ***
  Residuals 98 545444    5566
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

While the anova function returns information that would be useful in a more thorough statistical analysis, this lab is concerned only with the p-value, 0.0004827, which is labeled Pr(>F) and appears at the end of the Expt row. Following that p-value, R prints a code to aid in judging the significance of the result, three asterisks in this case. The last line of the output lists the meaning of the various codes. P-values that fall between 0 and 0.001 are marked with three asterisks, as in this case, so the result is significant at the 0.1% level (0.001), the most stringent level that R flags. (Note: the interpretation of the p-value was discussed in Correlation and Regression.)
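
Rather than reading the p-value off the printed table, it can also be pulled out of the result directly. The sketch below is one way to do that; the variable names tbl, p_numeric, tbl_factor, and p_factor are arbitrary. It also illustrates a point worth checking: morley stores Expt as an integer, so the demonstration above treats it as a single numeric predictor (Df = 1). Wrapping it in factor() is a common way to make R compare the five experiments as separate groups; this variant is offered as an assumption about what a five-group comparison would look like, not as part of the original demonstration.

```r
# Same model as the demonstration; the ANOVA table behaves like a data frame.
tbl <- anova(lm(Speed ~ Expt, data = morley))
p_numeric <- tbl[["Pr(>F)"]][1]   # p-value from the Expt row
p_numeric

# Variant (an assumption, not part of the original demonstration):
# treat Expt as a grouping factor so the five experiments are compared as groups.
tbl_factor <- anova(lm(Speed ~ factor(Expt), data = morley))
tbl_factor$Df[1]                  # 4: five groups minus one
p_factor <- tbl_factor[["Pr(>F)"]][1]
p_factor
```

The two p-values differ because the two models ask different questions: the first asks whether Speed trends with the experiment number, while the second asks whether any of the five experiment means differ.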

9.3.2 Guided Practice: ANOVA

Using the npk data frame, calculate an ANOVA for yield when grouped by block.

  # Calculate the ANOVA for yield when grouped by block in the npk data frame.

Hint: remember that 'yield' is the dependent variable (comes first) and 'block' is the independent variable (comes second). The data frame must be specified as 'npk'.

9.3.3 Activity: ANOVA

Using the cafe data, conduct an ANOVA and report the p-value for each of the following pairs of variables. Note: in the document submitted for this lab, the Activity should be a simple listing like the one illustrated here. (These are not the correct answers to the listed tests. To indicate tiny values, use scientific notation in a form like 1.6e-05 since that is easier to type.) Record the p-values in the deliverable document for this lab.

  1. 6.352e-09
  2. 0.0059
  3. 4.028e-06
  4. 0.3275

Here are the variables to test:

Table 9.1: ANOVA Lab

  Num  y (dependent)  x (independent)
  1    Length         Day
  2    Miles          Party Size (ptysize)
  3    Age            Food
  4    Tip            Service (svc)

  cafe <- read.csv('https://labs.basv316.com/cafe.csv')

  # Using the cafe data frame, conduct an ANOVA on the following variables.

  # Length vs. Day

  # Miles vs. Party Size (ptysize)

  # Age vs. Food

  # Tip vs. Service (svc)

9.4 T-test

A t-test is used to analyze the difference between two groups of samples that are normally distributed. Compare this with ANOVA (presented in the previous section), where more than two groups are compared. As an example, imagine that the spending habits of two similar groups of people are compared: do the residents of Tucson spend a different amount on dining out than the residents of Phoenix? The null hypothesis (H0) is “People in Tucson and Phoenix spend the same amount of money when dining out.” The alternate hypothesis (Ha) is “People in Tucson and Phoenix spend different amounts of money when dining out.” Imagine that the dining bills of 100 people from each city were recorded and it was discovered that the mean bill in Phoenix is $15.13 and in Tucson is $12.47. If a t-test determines that there is a significant difference between those two means, the null hypothesis would be rejected.

9.4.1 Demonstration: T-test

The t.test function requires the two variables being compared to be supplied as a formula in the form of y ~ x, where y is the dependent variable (measured outcomes) and x is the independent variable (the groups used to divide the measured outcomes). The data source is specified with a data = parameter. In the case of the Dining Spending hypothesis described above, the size of the dining bills would be the dependent variable, the measured outcome, while the groups of diners, the two cities, would be the independent variable.
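
To make the formula concrete, here is a minimal sketch that simulates the Dining Spending example. The data are entirely made up: the data frame name dining, the column names bill and city, and the standard deviation of 4 are assumptions chosen for illustration, and only the two means ($15.13 and $12.47) and the group size of 100 come from the example above.

```r
# Simulated data only: 100 hypothetical dining bills for each city.
set.seed(1)  # make the simulated bills reproducible
dining <- data.frame(
  city = factor(rep(c("Phoenix", "Tucson"), each = 100)),
  bill = c(rnorm(100, mean = 15.13, sd = 4),   # Phoenix
           rnorm(100, mean = 12.47, sd = 4))   # Tucson
)

# bill is the dependent variable (y); city is the independent variable (x).
result <- t.test(bill ~ city, data = dining)
result
```

Because city has exactly two levels, t.test can split the bills into the two groups being compared; a grouping variable with three or more levels would call for an ANOVA instead.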

The following one-line script generates a t-test from the sleep data frame. In this experiment, two soporific drugs (drugs that increase the hours of sleep) were each given to ten patients, yielding twenty observations. To see if there is any significant difference between the two groups, a t-test is calculated where extra (the increase in hours of sleep) is the measured outcome and group identifies the drug given.

  # T-Test (Independent)
  t.test(extra ~ group, data = sleep)

The t.test function returns a lot of information, most of which is beyond the scope of this lab:

    Welch Two Sample t-test

  data:  extra by group
  t = -1.8608, df = 17.776, p-value = 0.07939
  alternative hypothesis: true difference in means is not equal to 0
  95 percent confidence interval:
   -3.3654832  0.2054832
  sample estimates:
  mean in group 1 mean in group 2 
            0.75            2.33 

While the t.test function returns information that would be useful in a more thorough statistical analysis, this lab is concerned only with the p-value, 0.07939, which is found at the end of the line that begins with t =.
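
The p-value can also be read from the result object instead of the printed output. The sketch below stores the test in a variable (the name result is arbitrary) and compares the p-value against 0.05, which is an assumed significance level rather than one specified by this lab.

```r
# Store the t-test result so individual pieces can be extracted.
result <- t.test(extra ~ group, data = sleep)

# The p-value reported in the printed output above.
result$p.value

# Compare against an assumed significance level of 0.05.
result$p.value < 0.05
```

With the p-value of 0.07939 shown above, the comparison is FALSE, so at that assumed level the null hypothesis cannot be rejected.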

9.4.2 Guided Practice: T-test

Using the npk data frame, calculate a t-test for yield when grouped by N.

  # Calculate a t-test for yield when grouped by N in the npk data frame.

Hint: remember that 'yield' is the dependent variable (comes first) and 'N' is the independent variable (comes second). Also, 'N' is a capital letter. The data frame must be specified as 'npk'.

9.4.3 Activity: T-test

Using the cafe data, conduct a t-test and report the p-value for each of the following pairs of variables. Note: in the document submitted for this lab, the Activity should be a simple listing like the one illustrated here. (These are not the correct answers to the listed tests. To indicate tiny values, use scientific notation in a form like 1.6e-05 since that is easier to type.) Record the p-values in the deliverable document for this lab.

  1. 6.352e-09
  2. 0.0059
  3. 4.028e-06
  4. 0.3275

Here are the variables to test:

Table 9.2: T-test Lab

  Num  y (dependent)  x (independent)
  1    Miles          Recommend (recmd)
  2    Length         Preference (pref)
  3    Bill           Recommend (recmd)
  4    Tip            Preference (pref)

  cafe <- read.csv('https://labs.basv316.com/cafe.csv')

  # Using the cafe data frame, conduct a T-test on the following variables.

  # Miles vs. Recommend (recmd)

  # Length vs. Preference (pref)

  # Bill vs. Recommend (recmd)

  # Tip vs. Preference (pref)

9.5 Deliverable

Complete the activities in this lab and consolidate the responses into a single document. Name the document with your name and “Lab 9,” like “George Self Lab 9,” and submit that document for grading.