Lecture 4c

More on probabilities

Andrew Pua

2024-09-30

Plan for these slides

  1. Probability calculations are not always intuitive.
  2. Monte Carlo simulation of the birthday problem
  3. Issues with infinite sample spaces
  4. Specifying the sample spaces correctly
  5. Moving on to the notions of dependence and independence
  6. Updating probabilities

Probability calculations

Jane is 28 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice. Which is more likely?

  • Jane works at a museum.
  • Jane works at a museum and is a member of a feminist group.
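A short way to compare the two statements, using event names introduced here for illustration: let \(A\) be the event that Jane works at a museum and \(B\) the event that she is a member of a feminist group. The second statement describes \(A\cap B\), which is contained in \(A\), so

\[
A\cap B \subseteq A \quad\Longrightarrow\quad P(A\cap B) \le P(A).
\]

Whatever the description of Jane suggests, the second statement cannot be more likely than the first.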

Classroom activity: Matching numbers

  1. There are \(n\) students in a classroom.
  2. Each student chooses a whole number between 1 and \((n/2)^2\).
  3. Goal: Choose a number which is different from everyone else's.

A birthday problem in disguise: How many people? How many birthdays?
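One possible reading is that the \(n\) students play the role of the people and the \((n/2)^2\) numbers play the role of the birthdays. The sketch below computes the chance that all choices are distinct. It assumes a hypothetical class size of 30 and assumes that students choose independently and uniformly at random, which real students may not do.

```r
n <- 30                      # hypothetical class size (assumed even)
N <- (n / 2)^2               # number of available "birthdays"

# Exact probability that all n choices are distinct
p_distinct <- prod((N - 0:(n - 1)) / N)

# Monte Carlo check of the same probability
nsim <- 10^4
sims <- replicate(nsim, length(unique(sample(1:N, n, replace = TRUE))) == n)

c(exact = p_distinct, simulated = mean(sims))
```

The exact and simulated values should be close to each other, up to simulation error.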

How do you use the computer to help you calculate probabilities?

  • Set up the experiment. Put a group of \(k\) people into each of \(m\) rooms.

  • Set the group size: k <- 23

  • Set the number of rooms: m <- 10^4

  • Set storage to “summarize” a relevant aspect of the room: x <- numeric(m)

  • Repeat the following commands for each room i:

    1. Assign to \(k\) people any of the 365 possible birthdays with replacement: b <- sample(1:365, k, replace = TRUE)

    2. Record the number of people minus the number of distinct birthdays, so that 0 means nobody in the room shares a birthday: x[i] <- k - length(unique(b))

  • Present the results: table(x)

Results of simulation

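k <- 23                      # group size
m <- 10^4                    # number of simulated rooms
x <- numeric(m)              # x[i]: people minus distinct birthdays in room i
for (i in 1:m)
{
  b <- sample(1:365, k, replace = TRUE)   # k birthdays, drawn with replacement
  x[i] <- k - length(unique(b))           # 0 means no shared birthday
}
table(x)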
x
   0    1    2    3    4    5    6 
4870 3704 1153  240   27    5    1 
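For comparison, the share of rooms with x equal to 0 (0.487 in the run above) can be checked against an exact calculation. The sketch below assumes 365 equally likely birthdays, as in the simulation. The name p_none is introduced here for illustration, and pbirthday comes from R's stats package.

```r
k <- 23

# Exact probability that all k birthdays are distinct
p_none <- prod((365 - 0:(k - 1)) / 365)
p_none

# Probability of at least one shared birthday, by hand and via pbirthday
1 - p_none
pbirthday(k)
```

The simulated proportion differs from the exact value only by simulation error, which shrinks as m grows.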

You can also implement the simulation with the replicate command, but you need to define a function first.

nsim <- 10^4 
match <- function(k)
{
  b <- sample(1:365, k, replace = TRUE) 
  return(k - length(unique(b)))
}
x <- replicate(nsim, match(23))
table(x)
x
   0    1    2    3    4    5 
4920 3677 1175  198   26    4 

Yet another alternative: sort the birthdays and count the adjacent pairs that are equal.

nsim <- 10^4 
match <- function(k)
{
  b <- sample(1:365, k, replace = TRUE) 
  return(sum(diff(sort(b))==0))
}
x <- replicate(nsim, match(23))
table(x)
x
   0    1    2    3    4    5 
4986 3604 1192  198   19    1 

Modify the code to allow for “fuzzy” matches: say, birthdays that are at most one day apart are counted as a match.
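One possible modification is sketched below. It checks whether a room has at least one fuzzy match rather than counting matches. The helper has_near_match and its arguments tol and days are hypothetical names introduced here, and treating day 365 and day 1 as adjacent is an assumption.

```r
# Hypothetical helper: TRUE if some two birthdays in b are at most tol days
# apart, treating day 365 and day 1 as adjacent (an assumption).
has_near_match <- function(b, tol = 1, days = 365)
{
  b <- sort(b)
  gaps <- diff(c(b, b[1] + days))  # consecutive gaps plus the wrap-around gap
  return(any(gaps <= tol))
}

nsim <- 10^4
near <- replicate(nsim, has_near_match(sample(1:365, 23, replace = TRUE)))
mean(near)  # estimated probability of at least one fuzzy match
```

Sorting first means only neighboring birthdays need to be compared, the same idea as the diff(sort(b)) version above.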

What went wrong here?

Let us watch the following video:

https://www.cornell.edu/video/the-tonight-show-with-johnny-carson-feb-6-1980-excerpt

Were they talking about the birthday problem? Why or why not?

Wasserman Chapter 1 Exercise 6

Let \(\Omega=\{0,1,\ldots\}\). Prove that there does not exist a uniform distribution on \(\Omega\).

Things to note:

  1. This exercise highlights the need to be careful when you have an infinite sample space.
  2. A uniform distribution here means that every one-element subset of the sample space is assigned the same probability.
  3. You will be exposed to a proof technique called proof by contradiction.
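A sketch of the contradiction argument, assuming that probabilities are countably additive. The symbol \(c\) for the common probability is introduced here. Suppose a uniform distribution exists, so that \(P(\{\omega\})=c\) for every \(\omega\in\Omega\). Then

\[
1 = P(\Omega) = \sum_{\omega=0}^{\infty} P(\{\omega\}) = \sum_{\omega=0}^{\infty} c =
\begin{cases}
0 & \text{if } c = 0,\\
\infty & \text{if } c > 0.
\end{cases}
\]

Neither case gives 1, so no such \(c\) exists.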

Wasserman Chapter 1 Exercise 10

A prize is placed at random behind one of three doors.

  • You pick a door. To be concrete, let’s suppose you always pick door 1.

  • Now Monty Hall chooses one of the other two doors, opens it and shows you that it is empty. He then gives you the opportunity to keep your door or switch to the other unopened door.

  • Should you stay or switch? Intuition suggests it doesn’t matter. The correct answer is that you should switch. Prove it.

Three options, using what we have so far:

  1. Let \(\Omega=\{(\omega_1,\omega_2): \omega_i\in\{1,2,3\}\}\), where \(\omega_1\) is the door hiding the prize and \(\omega_2\) is the door Monty opens.
  2. Let \(\Omega=\{a,b,c\} \times \{a, b, c\}\), where \(a\) is the door hiding the prize, while the other two doors are \(b\) and \(c\). The first entry is the contestant's choice and the second entry is Monty's choice.
  3. Design a Monte Carlo simulation.
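A minimal sketch of the third option follows. The helper switch_wins is a hypothetical name introduced here. The sketch assumes that the contestant always picks door 1, that Monty never opens the picked door or the prize door, and that he chooses uniformly at random when two empty doors are available.

```r
# Hypothetical helper: plays one round and returns TRUE if switching wins.
switch_wins <- function()
{
  prize <- sample(1:3, 1)    # prize placed at random
  pick <- 1                  # contestant always picks door 1
  # Doors Monty may open: not the picked door, not the prize door
  candidates <- setdiff(1:3, c(pick, prize))
  # Guard: sample(x, 1) with a single number x would draw from 1:x
  opened <- if (length(candidates) == 1) candidates else sample(candidates, 1)
  switched <- setdiff(1:3, c(pick, opened))
  return(switched == prize)
}

nsim <- 10^4
mean(replicate(nsim, switch_wins()))  # estimated probability that switching wins
```

The proportion of rounds won by staying is one minus this estimate, so a single run compares both strategies.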