Connections across Lectures 1 to 4
2024-10-07
Sample spaces and probabilities are idealizations. In our context, an idealization is something that arises in the long run or in the population.
We are now going to study another idealization. You have already observed quantities in the real world that are subject to randomness.
Examples include:
Two students are selected at random (with replacement) from a college in which 60% of the students are male. Let \(X\) be the number of males in the sample.
Define \(M\) to be a label for male, and \(F\) for female.
Here we have \[\Omega=\{MM, MF, FM, FF\}.\]
Here \(X\left(MM\right)=2\). What are the values of \(X\) at the other elements of \(\Omega\)?
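To make the mapping concrete, here is a short sketch (in Python; the labels and the 60% figure are from the example above) that lists each outcome \(\omega\), its probability under sampling with replacement, and the value \(X(\omega)\):

```python
# Two students drawn with replacement; 60% of the students are male.
p_male = 0.6
outcomes = ["MM", "MF", "FM", "FF"]

# X maps each outcome omega to the number of males in it.
def X(omega):
    return omega.count("M")

for omega in outcomes:
    # Probability of omega: product of the per-draw probabilities.
    prob = p_male ** omega.count("M") * (1 - p_male) ** omega.count("F")
    print(omega, X(omega), prob)
```

For instance, \(X(MM)=2\) with probability \(0.6^2=0.36\), while \(X(MF)=X(FM)=1\), each with probability \(0.6\times 0.4\).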
Definition 1 (Wasserman (2004, p. 19) Definition 2.1) A random variable is a mapping \(X:\Omega\to \mathbb{R}\) that assigns a real number \(X(\omega)\) to each outcome \(\omega\).
A list of the values of \(X\), together with the corresponding probabilities, is called the distribution of \(X\).
In effect, we “distribute” 1 to the possible values that a random variable \(X\) can take.
For example, \[\mathbb{P}\left(X=2\right)= \mathbb{P}\left(\{MM\}\right)=0.6^2.\]
Now finish constructing the distribution of \(X\), where \(X\) is the number of males in the sample.
Given the distribution of \(X\), you can also find \(\mathbb{P}\left(X\leq 1\right)\). You can also generalize to \(\mathbb{P}\left(X\leq x\right)\) for any \(x\in\mathbb{R}\).
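Putting these pieces together, the full distribution of \(X\) and the probability \(\mathbb{P}(X\leq 1)\) can be tabulated directly. A sketch (Python, using the same 60% figure; values are exact up to floating point):

```python
from itertools import product

p = {"M": 0.6, "F": 0.4}

# "Distribute" the total probability of 1 across the possible values of X.
dist = {}
for omega in product("MF", repeat=2):
    x = omega.count("M")                      # X(omega) = number of males
    dist[x] = dist.get(x, 0.0) + p[omega[0]] * p[omega[1]]

print(dict(sorted(dist.items())))             # {0: 0.16, 1: 0.48, 2: 0.36} up to rounding

# P(X <= 1) = P(X = 0) + P(X = 1) = 0.16 + 0.48 = 0.64
print(sum(prob for x, prob in dist.items() if x <= 1))
```

Note that the probabilities sum to 1, as a distribution must.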
Definition 2 (Wasserman (2004, p. 20) Definition 2.2) The cumulative distribution function or cdf is the function \(F_X:\mathbb{R}\to [0,1]\) defined by \[F_X\left(x\right) = \mathbb{P}\left(X\leq x\right).\]
A fair coin is tossed independently ten times.
Outcome \(\omega\) | Probability | \(X(\omega)\) |
---|---|---|
TTTTTTTTTT | \((0.5)^{10}\) | 0 |
TTTTTTTTTH | \((0.5)^9 (0.5)\) | 1 |
TTTTTTTTHT | \((0.5)^9 (0.5)\) | 1 |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
HTTHHTHHHT | \((0.5)^4 (0.5)^6\) | 6 |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
HHHHHHHHHH | \((0.5)^{10}\) | 10 |
\[\mathbb{P}\left(X=k\right)=\binom{10}{k} (0.5)^k (0.5)^{10-k}, \ k=0,1,\ldots, 10\]
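This pmf can be checked by brute-force enumeration of all \(2^{10}\) outcomes: the binomial coefficient counts the arrangements with exactly \(k\) heads, and each outcome has probability \((0.5)^{10}\). A sketch in Python:

```python
from itertools import product
from math import comb

n = 10
# Count, for each k, the number of length-10 outcomes with exactly k heads.
counts = [0] * (n + 1)
for omega in product("HT", repeat=n):
    counts[omega.count("H")] += 1

for k in range(n + 1):
    # The count of arrangements equals the binomial coefficient C(n, k).
    assert counts[k] == comb(n, k)

# P(X = 6): the table row with 6 heads is one of C(10, 6) = 210 such outcomes.
print(comb(n, 6) * 0.5 ** n)  # 210/1024 = 0.205078125
```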
This setup parallels the dogs example: the 10 independent tosses of a coin correspond to 10 independent trials in which Harley chooses between two cups, one of which is correct.
Before we collect data:
We wanted to test the null hypothesis that \(\theta=0.5\) against the alternative that \(\theta>0.5\).
We used simulation to compute a \(p\)-value which is given by \[\mathbb{P}_{\theta=0.5}\left(\overline{X}_{10} \geq 0.9 \right)\] or \[\mathbb{P}_{\theta=0.5}\left(10\overline{X}_{10} \geq 9 \right)=\mathbb{P}_{\theta=0.5}\left(X_1+X_2+\ldots+X_{10} \geq 9 \right).\]
What distribution springs to mind?
Can you now compute (by hand and using R) the exact \(p\)-value for Harley’s case?
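As a check on your answer, here is a hedged sketch of the exact computation in Python; in R, the analogous call would be `sum(dbinom(9:10, 10, 0.5))`. Under the null, the number of correct choices is binomial with \(n=10\) and \(\theta=0.5\), so the \(p\)-value sums the pmf at 9 and 10:

```python
from math import comb

n, theta = 10, 0.5
# Exact p-value: P(X1 + ... + X10 >= 9) under the null theta = 0.5.
p_value = sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in (9, 10))
print(p_value)  # 11/1024 = 0.0107421875
```

Since the \(p\)-value is about 1%, the data are quite unlikely under the null hypothesis that Harley is choosing at random.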