Lecture 5d

Distributions involving more than one random variable

Andrew Pua

2024-11-04

Which is real, which is fake?

Sequence A: 0011100011/0010000100/0010001000/1000000001/0011001010

1100001111/1100110001/0101100100/1000100000/0011111001

Sequence B: 0100010100/1100010100/1110100110/0011110100/0111010001

1000110111/1000100101/1011011100/0110010001/0010000100

What is happening in the long run?

As a starting point, flip a fair coin 4 times. Define

  • \(Y\) to be the length of the longest run (a run is a maximal sequence of consecutive flips of the same type)

  • \(X\) to be the number of switches from heads to tails or tails to heads

What is the joint distribution of \((X, Y)\)? Create a table for this special case of 4 tosses. What happens when you have more than 4 tosses – say 100 tosses?
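
Before simulating, note that with only \(2^4 = 16\) equally likely outcomes the exact joint table can be obtained by brute-force enumeration. A minimal R sketch (the helper function mirrors the one used in the simulation below):

# Exact joint distribution of (X, Y) for 4 fair tosses via enumeration
outcomes <- as.matrix(expand.grid(rep(list(0:1), 4)))  # all 16 sequences, one per row
run.char <- function(x)
{
  temp <- rle(x)$lengths                # run lengths of one sequence
  return(c(length(temp)-1, max(temp)))  # (number of switches, longest run)
}
runs <- apply(outcomes, 1, run.char)
table(runs[1,], runs[2,])/nrow(outcomes)  # exact joint probabilities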

Visualization of the joint distribution of \((X, Y)\)

Simulating the joint distribution

Can we do a simulation to “recover” the joint distribution of \((X,Y)\)?

Algorithm:

  1. Flip a coin 4 times. Record the result.
  2. Compute \(X\) and \(Y\) for the result.
  3. Repeat Steps 1 and 2 a large number of times, say 10000.
  4. Create a frequency table for \(X\) and \(Y\).
  5. Create a plot with \(X\) on the horizontal axis and \(Y\) on the vertical axis.

R code for simulation

nsim <- 10^4                              # number of simulated sequences
a <- replicate(nsim, rbinom(4, 1, 0.5))   # each column: one sequence of 4 fair coin flips (0/1)
run.char <- function(x)
{
  temp <- rle(x)$lengths                  # lengths of the runs in one sequence
  return(c(length(temp)-1, max(temp)))    # (number of switches X, longest run Y)
}
runs <- apply(a, 2, run.char)             # compute (X, Y) for every simulated sequence
table(runs[1,], runs[2,])/nsim            # relative frequencies: rows = X, columns = Y
   
         1      2      3      4
  0 0.0000 0.0000 0.0000 0.1238
  1 0.0000 0.1258 0.2539 0.0000
  2 0.0000 0.3757 0.0000 0.0000
  3 0.1208 0.0000 0.0000 0.0000

Plot of simulated joint distribution

plot(runs[1,], runs[2,], xlab="Number of switches", ylab="Length of longest run", cex.lab=1.5, cex.axis=1.5)

What if we have 100 flips?

nsim <- 10^4
a <- replicate(nsim, rbinom(100, 1, 0.5))
run.char <- function(x)
{
  temp <- rle(x)$lengths
  return(c(length(temp)-1, max(temp)))
}
runs <- apply(a, 2, run.char)
table(runs[1,], runs[2,])/nsim
    
          3      4      5      6      7      8      9     10     11     12
  31 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
  33 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0001 0.0000
  34 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0001 0.0001
  35 0.0000 0.0000 0.0000 0.0002 0.0001 0.0003 0.0005 0.0001 0.0001 0.0001
  36 0.0000 0.0000 0.0001 0.0001 0.0001 0.0003 0.0006 0.0004 0.0006 0.0005
  37 0.0000 0.0000 0.0000 0.0002 0.0005 0.0008 0.0005 0.0005 0.0001 0.0001
  38 0.0000 0.0000 0.0001 0.0001 0.0011 0.0011 0.0006 0.0008 0.0005 0.0004
  39 0.0000 0.0000 0.0000 0.0011 0.0015 0.0023 0.0024 0.0011 0.0008 0.0004
  40 0.0000 0.0000 0.0001 0.0016 0.0023 0.0029 0.0028 0.0019 0.0007 0.0002
  41 0.0000 0.0000 0.0001 0.0024 0.0042 0.0042 0.0035 0.0013 0.0012 0.0004
  42 0.0000 0.0000 0.0008 0.0039 0.0064 0.0062 0.0030 0.0023 0.0017 0.0006
  43 0.0000 0.0000 0.0012 0.0060 0.0072 0.0089 0.0056 0.0030 0.0020 0.0010
  44 0.0000 0.0000 0.0020 0.0073 0.0119 0.0105 0.0064 0.0026 0.0023 0.0007
  45 0.0000 0.0001 0.0034 0.0110 0.0135 0.0105 0.0069 0.0039 0.0020 0.0010
  46 0.0000 0.0003 0.0050 0.0160 0.0159 0.0114 0.0075 0.0039 0.0018 0.0008
  47 0.0000 0.0005 0.0061 0.0203 0.0180 0.0129 0.0070 0.0044 0.0016 0.0010
  48 0.0000 0.0003 0.0101 0.0208 0.0245 0.0137 0.0084 0.0026 0.0013 0.0010
  49 0.0000 0.0009 0.0103 0.0227 0.0194 0.0126 0.0052 0.0029 0.0016 0.0009
  50 0.0000 0.0008 0.0122 0.0250 0.0175 0.0113 0.0059 0.0030 0.0011 0.0002
  51 0.0000 0.0014 0.0110 0.0260 0.0187 0.0089 0.0052 0.0021 0.0006 0.0004
  52 0.0000 0.0011 0.0151 0.0211 0.0152 0.0079 0.0034 0.0013 0.0009 0.0002
  53 0.0000 0.0018 0.0157 0.0216 0.0117 0.0065 0.0023 0.0009 0.0007 0.0002
  54 0.0000 0.0024 0.0129 0.0171 0.0101 0.0046 0.0020 0.0005 0.0002 0.0003
  55 0.0001 0.0024 0.0114 0.0127 0.0089 0.0036 0.0014 0.0003 0.0003 0.0001
  56 0.0000 0.0027 0.0109 0.0122 0.0057 0.0020 0.0007 0.0005 0.0001 0.0000
  57 0.0000 0.0027 0.0071 0.0086 0.0033 0.0019 0.0005 0.0004 0.0001 0.0000
  58 0.0001 0.0031 0.0078 0.0070 0.0029 0.0013 0.0005 0.0002 0.0000 0.0000
  59 0.0000 0.0018 0.0047 0.0049 0.0013 0.0006 0.0002 0.0000 0.0000 0.0000
  60 0.0001 0.0015 0.0037 0.0027 0.0007 0.0003 0.0004 0.0000 0.0000 0.0000
  61 0.0000 0.0011 0.0026 0.0016 0.0008 0.0002 0.0002 0.0000 0.0000 0.0000
  62 0.0001 0.0009 0.0016 0.0008 0.0003 0.0001 0.0001 0.0000 0.0000 0.0000
  63 0.0001 0.0001 0.0006 0.0008 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000
  64 0.0000 0.0001 0.0004 0.0002 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000
  65 0.0000 0.0007 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
  66 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
  69 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
    
         13     14     15     16     17
  31 0.0000 0.0000 0.0000 0.0000 0.0000
  33 0.0000 0.0000 0.0000 0.0000 0.0000
  34 0.0001 0.0001 0.0001 0.0001 0.0000
  35 0.0000 0.0000 0.0000 0.0000 0.0000
  36 0.0001 0.0000 0.0001 0.0000 0.0000
  37 0.0000 0.0002 0.0001 0.0000 0.0000
  38 0.0002 0.0000 0.0000 0.0000 0.0000
  39 0.0004 0.0001 0.0000 0.0000 0.0000
  40 0.0005 0.0002 0.0000 0.0001 0.0000
  41 0.0002 0.0000 0.0003 0.0000 0.0000
  42 0.0003 0.0004 0.0001 0.0000 0.0001
  43 0.0007 0.0003 0.0002 0.0001 0.0001
  44 0.0001 0.0003 0.0001 0.0000 0.0000
  45 0.0003 0.0002 0.0001 0.0001 0.0000
  46 0.0002 0.0001 0.0002 0.0000 0.0000
  47 0.0002 0.0000 0.0000 0.0001 0.0000
  48 0.0004 0.0000 0.0000 0.0000 0.0001
  49 0.0003 0.0003 0.0000 0.0000 0.0000
  50 0.0003 0.0002 0.0000 0.0000 0.0001
  51 0.0000 0.0000 0.0000 0.0000 0.0000
  52 0.0001 0.0002 0.0000 0.0000 0.0000
  53 0.0001 0.0001 0.0000 0.0000 0.0000
  54 0.0000 0.0000 0.0000 0.0000 0.0000
  55 0.0000 0.0000 0.0000 0.0000 0.0000
  56 0.0000 0.0000 0.0000 0.0000 0.0000
  57 0.0001 0.0000 0.0000 0.0000 0.0000
  58 0.0000 0.0000 0.0000 0.0000 0.0000
  59 0.0000 0.0000 0.0000 0.0000 0.0000
  60 0.0000 0.0000 0.0000 0.0000 0.0000
  61 0.0000 0.0000 0.0000 0.0000 0.0000
  62 0.0000 0.0000 0.0000 0.0000 0.0000
  63 0.0000 0.0000 0.0000 0.0000 0.0000
  64 0.0000 0.0000 0.0000 0.0000 0.0000
  65 0.0000 0.0000 0.0000 0.0000 0.0000
  66 0.0000 0.0000 0.0000 0.0000 0.0000
  69 0.0000 0.0000 0.0000 0.0000 0.0000

What if we have 100 flips? A plot

A better plot

Joint distributions

Definition 1 (Dekking et al (2005, p. 116)) The joint probability mass function \(f\) of two discrete random variables \(X\) and \(Y\) is the function \(f : \mathbb{R}^2 \to [0, 1]\), defined by \[f_{X,Y}(a, b) = P(X = a \ \mathsf{and}\ Y = b)\] for \(-\infty < a, b < \infty\).

  1. Compare with Wasserman (2004, p. 31): \[f_{X,Y}(x, y) = P(X = x \ \mathsf{and}\ Y = y)\]
  2. Joint probability mass functions should satisfy conditions similar to those for probability mass functions. What would they be?

  3. One of the four words in the sentence “I SEE THE MOUSE” will be selected at random. Let \(Y\) be the number of letters in the word and \(X\) be the number of E’s in the word. Construct the joint probability mass function of \((X,Y)\).
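
A sketch of this construction in R (each word is selected with probability 1/4, so the joint pmf is a table of relative frequencies over the four words):

# Joint pmf of (X, Y) for "I SEE THE MOUSE"
words <- c("I", "SEE", "THE", "MOUSE")
Y <- nchar(words)                                            # letters per word
X <- sapply(strsplit(words, ""), function(w) sum(w == "E"))  # E's per word
f <- table(factor(X, levels = 0:2), factor(Y, levels = c(1, 3, 5)))/length(words)
f   # rows: values of X, columns: values of Y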

Definition 2 (Dekking et al (2005, p. 118)) The joint cumulative distribution function \(F\) of two discrete random variables \(X\) and \(Y\) is the function \(F : \mathbb{R}^2 \to [0, 1]\), defined by \[F_{X,Y}(a, b) = P(X \leq a \ \mathsf{and}\ Y \leq b)\] for \(-\infty < a, b < \infty\).

  1. But this concept is not used as often as the joint probability mass function.

  2. At this level, it would be rare to see the joint cdf in the discrete case. In fact, it really is there to accommodate the case of continuous random variables.

  3. Talking about “two-dimensional” quantiles can be difficult. Research on this aspect is ongoing.

Definition 3 (Wasserman (2004, p. 33 Definition 2.23)) If \((X,Y)\) have joint distribution with mass function \(f_{X,Y}\), then the marginal mass function for \(X\) is defined by \[\begin{eqnarray*} f_X(x) &=& \mathbb{P}\left(X=x\right) \\ &=& \sum_{y} \mathbb{P}(X = x \ \mathsf{and}\ Y = y) \\ &=&\sum_{y}f_{X,Y}(x,y). \end{eqnarray*}\]

  1. Adjust the definition to obtain the marginal mass function for \(Y\).
  2. Compute the marginal mass functions for \(X\) and \(Y\) for I SEE THE MOUSE.
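
For the second exercise, the marginals are obtained by summing the joint table over the other variable. A short self-contained sketch (the joint pmf derived above is entered directly as a matrix, rows indexing \(X\) and columns indexing \(Y\)):

# Marginal pmfs for I SEE THE MOUSE: sum the joint pmf over the other variable
f <- matrix(c(1/4,   0,   0,
                0, 1/4, 1/4,
                0, 1/4,   0),
            nrow = 3, byrow = TRUE, dimnames = list(X = 0:2, Y = c(1, 3, 5)))
rowSums(f)   # marginal pmf of X
colSums(f)   # marginal pmf of Y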

Independence of random variables

Definition 4 (Wasserman (2004, p. 34 Definition 2.29)) Two random variables \(X\) and \(Y\) are independent if for every \(A\subseteq\mathbb{R}\) and \(B\subseteq\mathbb{R}\), \[\mathbb{P}\left(X\in A\ \mathsf{and}\ Y\in B\right)=\mathbb{P}\left(X\in A\right)\mathbb{P}\left(Y\in B\right).\]

  1. The definition can be quite tedious to check, but the idea is simple: for independent random variables, joint probabilities are equal to the product of marginal probabilities.
  2. We can express this in terms of probability mass functions as in Wasserman (2004, p. 35) Theorem 2.30, i.e., \[\mathbb{P}\left(X=x\ \mathsf{and}\ Y=y\right)=\mathbb{P}\left(X=x\right)\mathbb{P}\left(Y=y\right)\] or \[f_{X,Y}(x,y)=f_X(x)f_Y(y).\]

  3. Are \(X\) and \(Y\) independent in I SEE THE MOUSE?
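
One way to check this numerically is to compare the joint pmf with the outer product of its marginals. A sketch (the joint table is re-entered so the snippet runs on its own):

# Independence check: is f(x, y) = fX(x) * fY(y) for every cell?
f <- matrix(c(1/4,   0,   0,
                0, 1/4, 1/4,
                0, 1/4,   0),
            nrow = 3, byrow = TRUE, dimnames = list(X = 0:2, Y = c(1, 3, 5)))
outer(rowSums(f), colSums(f))            # product of the marginal pmfs
all(f == outer(rowSums(f), colSums(f)))  # FALSE here, so X and Y are not independent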

Expected values

  1. Extending the idea of expected values to the case of two discrete random variables is relatively easy, as the following definition shows.

Definition 5 (Dekking et al (2005, p. 136 Two-dimensional change-of-variable formula)) Let \(X\) and \(Y\) be random variables, and let \(g: \mathbb{R}^2 \to \mathbb{R}\) be a function. If \(X\) and \(Y\) are discrete random variables with values \(a_1 , a_2, \ldots\) and \(b_1 , b_2 , \ldots\), respectively, then \[\mathbb{E}\left[g(X,Y)\right]=\sum_{i}\sum_j g(a_i,b_j)\mathbb{P}\left(X=a_i\ \mathsf{and}\ Y=b_j\right).\]
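
A sketch of the formula in R, using the I SEE THE MOUSE joint pmf and \(g(x,y) = xy\), so that the double sum computes \(\mathbb{E}(XY)\):

# E[g(X, Y)] = sum over (x, y) of g(x, y) * f(x, y), illustrated with g(x, y) = x*y
f <- matrix(c(1/4,   0,   0,
                0, 1/4, 1/4,
                0, 1/4,   0),
            nrow = 3, byrow = TRUE, dimnames = list(X = 0:2, Y = c(1, 3, 5)))
x <- as.numeric(rownames(f))
y <- as.numeric(colnames(f))
g <- outer(x, y, function(a, b) a*b)   # g evaluated on the grid of (x, y) values
sum(g*f)                               # E(XY), which equals 3.5 here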

  1. We can build properties of expected values, which will be repeatedly used.

    1. Dekking et al (2005, p. 137) Linearity of Expectations: For all numbers \(r\), \(s\), and \(t\) and random variables \(X\) and \(Y\), one has \[\mathbb{E} (rX + sY + t) = r\mathbb{E} (X) + s\mathbb{E} (Y) + t\]

    2. General version found in Wasserman (2004, p. 50) Theorem 3.11

  1. Now, consider the variance of a sum of two random variables.

  2. We can write \[\begin{eqnarray*} \mathsf{Var}\left(X+Y\right) &=& \mathbb{E}\left[\left(X+Y-\mathbb{E}\left(X+Y\right)\right)^2\right] \\ &=& \mathbb{E}\left[\left(X+Y-\mathbb{E}\left(X\right)-\mathbb{E}\left(Y\right)\right)^2\right] \\ &=& \underbrace{\mathbb{E}\left[\left(X-\mathbb{E}\left(X\right)\right)^2\right]}_{\mathsf{Var}\left(X\right)} + \underbrace{\mathbb{E}\left[\left(Y-\mathbb{E}\left(Y\right)\right)^2\right]}_{\mathsf{Var}\left(Y\right)} \\ &&+2\,\mathbb{E}\left[\left(X-\mathbb{E}\left(X\right)\right)\left(Y-\mathbb{E}\left(Y\right)\right)\right] \end{eqnarray*}\]

  1. We can simplify the term \[\mathbb{E}\left[\left(X-\mathbb{E}\left(X\right)\right)\left(Y-\mathbb{E}\left(Y\right)\right)\right]=\mathbb{E}\left(XY\right)-\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right).\]

  2. This leads to the covariance between two random variables \(X\) and \(Y\), denoted as \(\mathsf{Cov}\left(X,Y\right)\). Refer to Wasserman (2004, p. 52) Definition 3.18 and Theorem 3.19.

  3. If, in addition, \(X\) and \(Y\) are independent, we have \[\mathbb{E}\left(XY\right)=\mathbb{E}\left(X\right)\mathbb{E}\left(Y\right).\]

\[\begin{eqnarray*} \mathbb{E}\left(XY\right) &=& \sum_{i}\sum_{j} a_ib_j \mathbb{P}\left(X=a_i\ \mathsf{and}\ Y=b_j\right) \\ &=& \sum_{i}\sum_{j} a_ib_j \mathbb{P}\left(X=a_i\right) \mathbb{P}\left(Y=b_j\right) \\ &=& \sum_i \left(a_i\mathbb{P}\left(X=a_i\right)\sum_{j}b_j \mathbb{P}\left(Y=b_j\right)\right) \\ &=& \left(\sum_i a_i\mathbb{P}\left(X=a_i\right)\right) \left(\sum_{j}b_j \mathbb{P}\left(Y=b_j\right)\right) \\ &=& \mathbb{E}(X)\mathbb{E}(Y) \end{eqnarray*}\]
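
This chain of equalities is easy to check by simulation. A sketch with two independently generated variables (the specific distributions below are chosen purely for illustration):

# For independent X and Y: E(XY) is close to E(X)E(Y) and Cov(X, Y) is close to 0
set.seed(1)
x <- rbinom(10^5, 10, 0.5)    # X ~ Binomial(10, 0.5)
y <- rpois(10^5, 3)           # Y ~ Poisson(3), drawn independently of X
mean(x*y) - mean(x)*mean(y)   # close to 0
cov(x, y)                     # close to 0
var(x + y) - var(x) - var(y)  # close to 0: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)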

  1. Therefore, independent random variables \(X\) and \(Y\) will have zero covariance.

  2. We can now demonstrate one of the key results for a statistics course.

    1. Refer to Wasserman (2004, p. 52) Theorem 3.17.
    2. It is a result which provides the foundation for why the sample mean is such a useful procedure.
    3. It can partially rationalize why we conduct surveys in real life.

Theorem 1 (Wasserman (2004, p. 52 Theorem 3.17)) Let \(X_1, \ldots, X_n\) be random variables. Define \[\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i,\ S_n^2=\frac{1}{n-1}\sum_{i=1}^n \left(X_i-\overline{X}_n\right)^2\] as the sample mean and sample variance, respectively. If \(X_1, \ldots, X_n\) happen to be independent and identically distributed with \(\mu=\mathbb{E}\left(X_i\right)\) and \(\sigma^2=\mathsf{Var}\left(X_i\right)\), then, \[\mathbb{E}\left(\overline{X}_n\right)=\mu,\ \mathsf{Var}\left(\overline{X}_n\right)=\frac{\sigma^2}{n}, \ \mathbb{E}\left(S^2_n\right)=\sigma^2\]
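
A simulation sketch of the theorem, assuming (purely for illustration) that the \(X_i\) are iid Exponential(1), so that \(\mu = 1\) and \(\sigma^2 = 1\):

# Check E(xbar) = mu, Var(xbar) = sigma^2/n, E(S^2) = sigma^2 by simulation
set.seed(1)
n <- 20
nsim <- 10^4
samples <- replicate(nsim, rexp(n, rate = 1))  # each column is one sample of size n
xbar <- colMeans(samples)                      # sample means
s2 <- apply(samples, 2, var)                   # sample variances (n - 1 denominator)
mean(xbar)   # close to mu = 1
var(xbar)    # close to sigma^2/n = 0.05
mean(s2)     # close to sigma^2 = 1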

The value of the previous theorem

  1. What you saw is an extremely powerful result!

  2. Specifically, it tells you that the sample average and the sample variance each have their own distributions. But you may not necessarily have a special name for the resulting distributions.

  3. Sometimes, the distribution of the sample average is called the sampling distribution of the sample average.

  1. It also tells you about some of the moments of that sampling distribution.

    • Some refer to the result that \(\mathbb{E}\left(\overline{X}_n\right)=\mu\) as “\(\overline{X}_n\) is an unbiased estimator of \(\mu\)”.
    • Some refer to the result that \(\mathbb{E}\left(S^2_n\right)=\sigma^2\) as “\(S^2_n\) is an unbiased estimator of \(\sigma^2\)”.
  2. The result \(\mathsf{Var}\left(\overline{X}_n\right)=\sigma^2/n\) is also very important.

    • The quantity \(\sigma/\sqrt{n}\) is usually called the standard error of \(\overline{X}_n\).
  1. Notice that you do not see an expression for \(\mathsf{Var}\left(S^2_n\right)\). It can be computed, but the expression is quite messy.

  2. More importantly, we can use Chebyshev’s inequality to conclude that, for example, \[\mathbb{P}\left(\left|\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}}\right|>2\right)\leq \frac{1}{4}.\]

  1. The event that the absolute value of the standardized sample mean is less than or equal to 2 could be written as \[\begin{eqnarray*} \left|\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}}\right|\leq 2 &\Leftrightarrow& -2\leq \frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}} \leq 2 \\ &\Leftrightarrow& -2\frac{\sigma}{\sqrt{n}} \leq \overline{X}_n-\mu \leq 2\frac{\sigma}{\sqrt{n}} \\ &\Leftrightarrow& \overline{X}_n-2\frac{\sigma}{\sqrt{n}}\leq \mu \leq \overline{X}_n+2\frac{\sigma}{\sqrt{n}} \end{eqnarray*}\]
  2. Therefore, \[\mathbb{P}\left(\overline{X}_n-2\frac{\sigma}{\sqrt{n}}\leq \mu \leq \overline{X}_n+2\frac{\sigma}{\sqrt{n}}\right)\geq \frac{3}{4}\]

  3. Similarly, \[\mathbb{P}\left(\overline{X}_n-3\frac{\sigma}{\sqrt{n}}\leq \mu \leq \overline{X}_n+3\frac{\sigma}{\sqrt{n}}\right)\geq \frac{8}{9}\]

  4. The results are true regardless of the size of \(n\).
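
A simulation sketch of the first of these interval statements, again assuming (for illustration only) iid Exponential(1) draws so that \(\mu = \sigma = 1\). The empirical coverage should be at least roughly 3/4, and is typically much higher because Chebyshev’s inequality is conservative:

# Empirical coverage of the interval [xbar - 2*sigma/sqrt(n), xbar + 2*sigma/sqrt(n)]
set.seed(1)
n <- 25
nsim <- 10^4
xbar <- colMeans(replicate(nsim, rexp(n, rate = 1)))       # simulated sample means
covered <- (xbar - 2/sqrt(n) <= 1) & (1 <= xbar + 2/sqrt(n))
mean(covered)   # proportion of simulated intervals containing mu; at least about 3/4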