Lecture 3

Hypothesis testing for a simple case

Andrew Pua

2024-09-14

Plan for these slides

Taking stock of what happened in Lecture 02
More examples for you to try
The language used in hypothesis testing
Moving on to understand all the pieces

When should you use the argument developed in Lecture 02?

There is an argument involved in determining whether the data support “Harley cannot understand human gestures” or “Harley can understand human gestures”.
The argument you have seen is typically used for discovery, that is, for figuring out whether “something is out of the ordinary”.
It is not used to prove any specific claim. Why?
Still, it is not surprising that people think they have proven something, so there is huge potential for abuse.

The class of testing problems our example belongs to

The argument you have seen is called significance testing or hypothesis testing.
There are many procedures for conducting hypothesis tests, depending on the hypothesis being tested and the kind of data observed.
In our case, the technical phrase is “Inference for a Single Population Proportion”.

Ingredients of a hypothesis test

A model
A statement about the model which represents the status quo
A statement about the model which represents something out of the ordinary
A statistic which can be computed using data
A way to generate simulated datasets or at least imagine hypothetical datasets under the status quo
A way to decide which statement the data would favor
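
As a rough illustration of how these ingredients could map onto code, here is a minimal R sketch for a coin-tossing setting. The data (60 heads in 100 tosses), the seed, the number of simulations, and all variable names are assumptions made up for this sketch; they do not come from the lecture.

```r
# Hypothetical data (an assumption for illustration): 60 heads in 100 tosses.
n_tosses <- 100
observed_heads <- 60

# Model: tosses are independent, each landing heads with probability theta.
# Status quo: theta = 0.5 (fair coin).
# Out of the ordinary: theta is not 0.5.
theta_null <- 0.5

# Statistic: the number of heads in the dataset.
# Simulated datasets under the status quo: each draw from rbinom() is the
# number of heads in n_tosses tosses of a fair coin.
set.seed(2024)   # arbitrary seed, only for reproducibility
n_sims <- 10000  # arbitrary number of simulated datasets
simulated_heads <- rbinom(n_sims, size = n_tosses, prob = theta_null)

# A way to decide: how often does a simulated dataset land at least as far
# from the expected count as the observed dataset did?
expected_heads <- n_tosses * theta_null
proportion_as_extreme <- mean(
  abs(simulated_heads - expected_heads) >= abs(observed_heads - expected_heads)
)
proportion_as_extreme
```

A small proportion would suggest that the observed data are unusual under the status quo. How small is "small" is a separate choice that the sketch does not make.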

Your task

Lay out these ingredients for the examples so far:

Is the coin you are using fair?
Does Harley understand human gestures?

Definitions and connections

Definition 1 (Chihara and Hesterberg (2018, p. 48) Definition 3.1) The null hypothesis, denoted \(H_0\), is a statement that corresponds to no real effect. This is the status quo, in the absence of the data providing convincing evidence to the contrary.

Definitions and connections

Definition 2 (Chihara and Hesterberg (2018, p. 48) Definition 3.1) The alternative hypothesis, denoted \(H_A\), is a statement that there is a real effect. The data may provide convincing evidence that this hypothesis is true.

Definitions and connections

Definition 3 (Chihara and Hesterberg (2018, p. 48) Definition 3.1) A hypothesis should involve a statement about a population parameter or parameters, commonly referred to as \(\theta\); the null hypothesis is \(H_0: \theta = \theta_0\) for some \(\theta_0\). A one-sided alternative hypothesis is of the form \(H_A: \theta > \theta_0\) or \(H_A: \theta < \theta_0\); a two-sided alternative hypothesis is \(H_A: \theta \neq \theta_0\).

NOTE: You have to be explicit about what \(\theta\) means when you use hypothesis testing.

Definitions and connections

Definition 4 (Chihara and Hesterberg (2018, p. 49) Definition 3.2) A test statistic is a numerical function of the data whose value determines the result of the test. The function itself is generally denoted \(T =T(\mathbf{X})\) where \(\mathbf{X}\) represents the data. After being evaluated for the sample data \(\mathbf{x}\), the result is called an observed test statistic and is written in lowercase, \(t = T(\mathbf{x})\).

Definitions and connections

Definition 5 (Chihara and Hesterberg (2018, p. 49) Definition 3.3) The \(p\)-value is the probability that chance alone would produce a test statistic as extreme as the observed test statistic if the null hypothesis were true. For example, if large values of the test statistic are in the direction of the alternative hypothesis, the \(p\)-value is the probability \(\mathbb{P}(T \geq t)\) calculated under \(H_0\).

NOTE: The italicized parts are my modifications to the definition, hopefully to make things clearer.
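
To make the probability in Definition 5 concrete, here is one worked form. It assumes, as in the single-proportion setting of these slides, that the test statistic \(T\) counts successes in \(n\) independent trials, so that under \(H_0: \theta = \theta_0\) it has a Binomial\((n, \theta_0)\) distribution. The symbols \(n\) and \(k\) are introduced here only for this illustration. If large values of \(T\) are in the direction of the alternative hypothesis, then

\[
\mathbb{P}(T \geq t) = \sum_{k=t}^{n} \binom{n}{k} \theta_0^{k} (1-\theta_0)^{n-k},
\]

and if small values are in the direction of the alternative hypothesis, then

\[
\mathbb{P}(T \leq t) = \sum_{k=0}^{t} \binom{n}{k} \theta_0^{k} (1-\theta_0)^{n-k}.
\]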

Definitions and connections

Definition 6 (Chihara and Hesterberg (2018, p. 49) Definition 3.4) A result is statistically significant if it would rarely occur by chance.

NOTE: Statistically significant is a technical phrase. Clearly, it does not mean important!

Definitions and connections

Definition 7 (Chihara and Hesterberg (2018, p. 50) Definition 3.5) The null distribution is the distribution of the test statistic if the null hypothesis is true.

Complications from organ transplants (Diez, Barr, and Cetinkaya-Rundel)

People providing an organ for donation sometimes seek the help of a special medical consultant. These consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery. Patients might choose a consultant based in part on the historical complication rate of the consultant’s clients.

One consultant tried to attract patients by noting the historical complication rate for liver donor surgeries in the US is about 10%, but her clients have had only 3 complications in the 62 liver donor surgeries she has facilitated. She claims this is strong evidence that her work meaningfully contributes to reducing complications.

Your tasks

How is the example similar to what was discussed in Lecture 02?
What are the competing claims?
Write R code which will determine which claim would be favored by the data.
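
One possible starting point for the last task is sketched below. It is not the only way to set things up. It assumes that the number of complications is the statistic, that the 62 surgeries behave like independent trials, and that the status quo is a complication rate of 10%. The variable names, the seed, and the number of simulations are choices made for this sketch.

```r
# Numbers taken from the example: 3 complications in 62 surgeries,
# with a historical US complication rate of about 10%.
n_surgeries <- 62
observed_complications <- 3
rate_null <- 0.10  # status quo: her clients are no different from the US rate

# Simulate the number of complications in 62 surgeries under the status quo.
set.seed(2024)   # arbitrary seed, only for reproducibility
n_sims <- 10000  # arbitrary number of simulated datasets
simulated_complications <- rbinom(n_sims, size = n_surgeries, prob = rate_null)

# Small counts are in the direction of the consultant's claim, so look at
# the left tail: simulated counts at or below the observed count.
simulated_p_value <- mean(simulated_complications <= observed_complications)

# The same left-tail probability computed directly from the binomial
# distribution, as a check on the simulation.
exact_p_value <- pbinom(observed_complications, size = n_surgeries, prob = rate_null)

c(simulated = simulated_p_value, exact = exact_p_value)
```

The two numbers should be close to each other. Which claim they favor depends on how rare an outcome has to be before you regard it as out of the ordinary.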

Reading auras

Start at 13:41. One of the ECONSTA students forwarded this video.

Repeat the tasks for this example.

Reading guide

No actual textbook reading is required, because the approach so far is not very traditional.
If you really want to read a textbook, Sections 3.1 and 3.2 of Chihara and Hesterberg (2018) would be an option.
The difference is that their setting compares two samples, but I think the underlying ideas are similar.
Think about what their data would look like compared to what we worked on in class.

Moving forward

We need to know more about

probabilities
distributions
why the hypothesis testing argument is a justifiable approach for discovery purposes

If you want to read ahead: Chapter 1 of Wasserman (2004), Chapters 2 and 3 of Dekking et al. (2005), Chapters 1 and 2 of Arias-Castro, Chapter 1 of Evans and Rosenthal