Welcome to the course webpage of ECONSTA (Term 1 AY 2024-2025 version)!

Frontmatter

In a nutshell

This is a course webpage for an undergraduate major course called Economic Statistics (ECONSTA), which might be better called Statistical Methods for Economists. Materials for a similar course I have taught before can be found here.

Information about using the materials

If you want to use my slides or the materials in this webpage, please abide by the license:

Lecture Slides on Statistics for Economists (2024 Version) © 2024 by Andrew Adrian Yu Pua is licensed under CC BY-NC-SA 4.0

To cite the slides, please use

Pua, A. A. Y. (2024). Lecture Slides for Statistics for Economists [Quarto slides]. https://econsta.neocities.org

Finding typos or unclear portions

If you find typos or unclear portions in the notes, please let me know. I will be monitoring your contributions during the term and I will acknowledge you in these notes. If you make substantial contributions, I will treat you to some non-alcoholic drinks at Auro Cafe located near the Brother Andrew Gonzalez Hall of De La Salle University.

Resources on time management, learning to learn, and the illusion of learning

I encourage you to take this opportunity to reexamine how you learn and study. It does not matter whether your motivation is only to pass the exam or something greater, although it would be good for society if you studied for something much greater. I have found the following resources to be helpful to students I have taught in the past. Of course, I am not sure they will work for you, but do keep an open mind.

Main Body

Textbook examples and exercises for practice

Some of these examples and exercises were discussed in class or in pre-recorded videos. Some exercises are duplicated across the references. Finally, most of these already have readily accessible solutions: they can be found within the chapters, at the back of the book, or in a separate solutions file legitimately provided by the authors.

  • Related to Lecture 6 (as of 2024-11-28)

    • You may find it useful to refer to page 53 for a table of means and variances for random variables with special distributions
    • Wasserman Examples 5.7, 5.9, 6.17, 6.18, 7.15, 10.15
    • Wasserman Chapter 5: Exercises 6, 8
    • Wasserman Chapter 6 Exercise 6.1: Obtain the expected value and the standard error of \(\widehat{\lambda}\).
    • Wasserman Chapter 6 Exercise 6.3: Obtain the expected value and the standard error of \(\widehat{\theta}\).
    • Dekking et al: Quick Exercises 14.2, 23.1, 23.2, 23.3, 26.1
    • Dekking et al: Exercises 13.2, 13.4, 13.5, 13.6, 14.1, 14.3, 14.9, 19.1, 19.2, 20.2, 23.1 to 23.3 (feel free to drop the assumption that the IID random variables have a normal distribution and use a large-sample approximation instead), 23.11 (a and c), 25.1, 25.2, 26.4
    • Evans and Rosenthal: Examples 4.4.7, 4.4.10
    • Evans and Rosenthal: Exercises 4.4.12 to 4.4.14
  • Related to Lecture 5 (from 2024-10-10 onward, as of 2024-11-21)

    • Wasserman: Examples 2.24, 2.31, 3.2, 3.3, 3.12, 3.16

    • Wasserman: Exercise 14

    • Dekking et al: Quick Exercises 7.1, 7.3, 7.4, 8.4, 9.1, 9.2, 9.6, 10.1 to 10.3

    • Dekking et al: Exercises 7.1 to 7.5, 7.15, 8.1, 8.12, 8.14, 9.1 to 9.7, 9.8(a), 10.1, 10.2 (skip correlation), 10.3, 10.5, 10.6, 10.7(a), 10.8, 10.16(a), 10.19, 13.11

    • Evans and Rosenthal: Examples 2.7.1, 2.7.4, 2.7.5, 3.1.1 to 3.1.6, 3.1.8, 3.1.10 to 3.1.13, 3.1.16, 3.3.1 to 3.3.3, 3.3.8 to 3.3.10, 3.6.1 to 3.6.4, 3.6.9, 3.6.10, 4.1.1

    • Evans and Rosenthal (some of the exercises may be too routine; perhaps think about how to do them in R?):

      • 3.1.1 to 3.1.3, 3.1.8 to 3.1.14
      • 3.3.1, 3.3.2, 3.3.6, 3.3.10 to 3.3.13, 3.3.15 (skip correlations)
      • 3.6.8, 3.6.9
      • (these may be good exercises for R simulation as well) 4.1.1 to 4.1.4, 4.1.6 to 4.1.8, 4.1.10 (confirm the predictions of Theorem 1 of Lecture 5d), 4.1.11 (to draw \(10^3\) random numbers from an N(0,1) distribution, use rnorm(10^3))
    • Arias-Castro: Problems 6.26, 7.32, 7.38, 7.40, 7.87, 7.88

  • Related to Lecture 5 (as of 2024-10-09)

    • Wasserman: Examples 2.2, 2.4, 2.6, 2.10
    • Wasserman: Exercises 2, 6
    • Dekking et al: Quick Exercises 4.1 to 4.5
    • Dekking et al: Exercises 4.2, 4.3, 4.4, 4.6 to 4.10
    • Evans and Rosenthal: Examples 2.1.1 to 2.1.10, 2.2.1 to 2.2.3, 2.3.3, 2.5.1, 2.6.1, 2.6.2
    • Evans and Rosenthal: Exercises 2.1.1 to 2.1.9, 2.2.1 to 2.2.10, 2.3.1 to 2.3.5, 2.3.7, 2.3.11, 2.3.14, 2.3.22, 2.5.1, 2.5.2
  • Related to Lecture 4

    • Wasserman: Examples 1.1 to 1.3, 1.7, 1.10, 1.11, 1.13, 1.15, 1.19
    • Wasserman: Exercises 5, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23
    • Dekking et al: All Quick Exercises and Exercises of Chapters 2 and 3
    • Evans and Rosenthal: Examples 1.2.1 to 1.2.5, 1.4.1 to 1.4.9 (skip the examples about bridge), 1.5.1 to 1.5.3
    • Evans and Rosenthal: Exercises 1.2.1 to 1.2.12 (except 1.2.5), 1.3.1 to 1.3.9, 1.4.1 to 1.4.21 (except 1.4.5, 1.4.18), 1.5.1 to 1.5.18
    • Arias-Castro: Examples 1.8, 1.9, 1.12, 1.15, 1.16, 1.17, 1.33, 1.48
    • Arias-Castro: Problems 1.10, 1.11, 1.13, 1.18, 1.32, 1.35, 1.36, 1.37, 1.38, 1.39, 1.40, 1.41, 1.45, 1.46

Course Diary

Lecture 6c, 2024-12-02

Last class: bits and pieces we did not get to, and how everything connects to the next course and beyond

Lecture 6b continued, 2024-11-28

Hypothesis tests involving \(\mu\), why hypothesis tests are designed the way they are, performance of hypothesis tests based on the central limit theorem, and sample size calculations
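The performance of a test based on the central limit theorem can be checked by simulation. This is a minimal sketch, assuming a two-sided test of \(H_0:\mu=4.5\) at the 5% level with \(\sigma=\sqrt{1.05}\) treated as known, and data drawn from the distribution of \(X\) in Wasserman Chapter 2 Exercise 2. The sample size n = 50 and the alternative (adding 0.3 to every draw, so that \(\mu=4.8\)) are hypothetical choices, not necessarily those used in class.

```r
set.seed(2024)
n <- 50
mu0 <- 4.5                 # hypothesized value of mu
sigma <- sqrt(1.05)        # treated as known in this sketch
crit <- qnorm(0.975)       # about 1.96 for a two-sided 5% test
# Reject H0 when |Xbar - mu0| / (sigma / sqrt(n)) exceeds the critical value
reject <- function(x) abs(mean(x) - mu0) / (sigma / sqrt(n)) > crit
# Under H0: draw from X itself, whose mean is 4.5
size.hat <- mean(replicate(10^4,
  reject(sample(c(2, 3, 5), n, prob = c(0.1, 0.1, 0.8), replace = TRUE))))
# Under the hypothetical alternative: X + 0.3 has mean 4.8 and the same variance
power.hat <- mean(replicate(10^4,
  reject(sample(c(2, 3, 5), n, prob = c(0.1, 0.1, 0.8), replace = TRUE) + 0.3)))
c(size = size.hat, power = power.hat)
```

The rejection rate under \(H_0\) estimates the actual size of the test, which should be near 5%, while the rejection rate under the alternative estimates its power.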

Lecture 6b, 2024-11-25

Recap of parts of Lecture 6a, digging deeper into what confidence sets are and why they are useful, and simulation evidence documenting the performance of large-sample symmetric 95% confidence intervals for \(\mu\) both when \(\sigma\) is known and when \(\sigma\) is not known
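Simulation evidence of this kind can be sketched as follows. This is a minimal sketch, assuming the intervals are \(\overline{X}\pm z\,\sigma/\sqrt{n}\) when \(\sigma\) is known and \(\overline{X}\pm z\,S/\sqrt{n}\) when it is not, where \(z\) is qnorm(0.975) (about 1.96) and \(S\) is R's sd(), which divides by \(n-1\). The data come from the distribution of \(X\) in Wasserman Chapter 2 Exercise 2 (\(\mu=4.5\), \(\sigma^2=1.05\)); n = 30 is an illustrative choice.

```r
set.seed(2024)
n <- 30
mu <- 4.5
sigma <- sqrt(1.05)
z <- qnorm(0.975)
coverage <- replicate(10^4, {
  x <- sample(c(2, 3, 5), n, prob = c(0.1, 0.1, 0.8), replace = TRUE)
  xbar <- mean(x)
  # sigma known: Xbar +/- z * sigma / sqrt(n)
  known <- abs(xbar - mu) <= z * sigma / sqrt(n)
  # sigma unknown: replace sigma by the sample standard deviation
  unknown <- abs(xbar - mu) <= z * sd(x) / sqrt(n)
  c(known = known, unknown = unknown)
})
# Proportion of the 10^4 intervals that capture mu, for each version
cov.rates <- rowMeans(coverage)
cov.rates
```

Rerunning with a larger n is a quick way to see whether both coverage rates move toward 95%.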

Scribbles here

Lecture 6a, 2024-11-21

Moving on to the three typical tasks in statistical inference, introducing the two building blocks useful in statistical inference, a digression into continuous random variables focusing on the standard normal, and the usefulness of the central limit theorem

Lecture 5d continued, 2024-11-14

Wrapping up the biggest ideas and consequences related to Theorem 1 of Lecture 5d

Code used in the lecture, specifically for demonstrating the performance of a finite-sample confidence interval (note that we have not formally defined what a confidence interval is, but I demonstrated its performance as a procedure):

# Draw n=4 observations from the distribution of X found in Wasserman Chapter 2 Exercise 2
# Repeat 10^4 times
a <- replicate(10^4, sample(c(2, 3, 5), 4, prob = c(0.1, 0.1, 0.8), replace = TRUE))
# Calculate all the column means
sample.means <- colMeans(a)
# Calculate all the lower limits
lower <- sample.means - 2 * sqrt(1.05)/4
# Calculate all the upper limits
upper <- sample.means + 2 * sqrt(1.05)/4
# Determine how often the intervals "capture" the population mean of 4.5
mean(lower <= 4.5 & 4.5 <= upper)
# Determine which replications had an interval that did not "capture" the population mean
which(lower > 4.5 | upper < 4.5)
# Install plotrix to visualize the intervals (only needs to be run once)
install.packages("plotrix")
# Load plotrix
library(plotrix)
# Plot the first 50 intervals
# Change 1:50 to 51:100 if you want to see the next 50 intervals
plotCI(1:50, sample.means[1:50], li = lower[1:50], ui = upper[1:50], xlab = "", ylab = "")
# Add a dotted reference line at the population mean of 4.5
abline(h = 4.5, lty = 3, col = 2)
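Because \(X\) takes only three values, the capture rate of these intervals can also be computed exactly by enumerating all \(3^4=81\) ordered samples of size 4. This is a minimal sketch that reuses the half-width 2 * sqrt(1.05)/4 from the code above; the names half.width and exact.coverage are illustrative.

```r
values <- c(2, 3, 5)
probs <- c(0.1, 0.1, 0.8)
n <- 4
half.width <- 2 * sqrt(1.05)/4   # same half-width as in the lecture code
# All 3^4 = 81 ordered samples, stored as indices into values
idx <- as.matrix(expand.grid(rep(list(1:3), n)))
# Probability of each ordered sample under IID draws
outcome.prob <- apply(idx, 1, function(i) prod(probs[i]))
# Sample mean of each ordered sample
outcome.mean <- apply(idx, 1, function(i) mean(values[i]))
# Exact probability that the interval captures the population mean of 4.5
exact.coverage <- sum(outcome.prob[abs(outcome.mean - 4.5) <= half.width])
exact.coverage   # 0.8576
```

The simulated value from mean(lower <= 4.5 & 4.5 <= upper) should land close to this exact figure.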

Lecture 5d continued, 2024-11-11

Mostly recap of pre-recordings, extending the variance of a sum of random variables, explaining the key idea of Theorem 1 of Lecture 5d

Scribbles here

Meme used in class here

Code used in lecture:

# Generate artificial data according to the distribution in Wasserman Chapter 2 Exercise 2
# Try changing the sample size of 10 to 100.
a <- replicate(10^4, sample(c(2, 3, 5), 10, prob = c(0.1, 0.1, 0.8), replace = TRUE))
# Get the sample average of each column
sample.means <- colMeans(a)
# Display the simulated distribution of sample means
hist(sample.means)
# Display the table of simulated sample means
table(sample.means)
# Mean of all the sample means
# What should we expect to see here given Theorem 1 of Lecture 5d
mean(sample.means)
# Variance of all the sample means
# What should we expect to see here given Theorem 1 of Lecture 5d
var(sample.means)
# Exploring the procedure called a trimmed mean
# Create a function to apply the trimmed mean
# This is one way of doing the implementation
trimmed <- function(x)
{
  # The trimming fraction 0.3 here could be changed
  # trim = 0.3 throws away the lowest 30% and the highest 30% of observations before taking the sample average
  return(mean(x, trim = 0.3))
}
# Apply the trimmed function to every column of a
trimmed.means <- apply(a, 2, trimmed)
# Display the simulated distribution of trimmed means
hist(trimmed.means)
# Display the table of simulated trimmed means
table(trimmed.means)
# Mean of all the trimmed means
mean(trimmed.means)
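The comments above ask what Theorem 1 of Lecture 5d predicts. Assuming Theorem 1 is the standard result that the sample mean of \(n\) IID draws has expected value \(\mu\) and variance \(\sigma^2/n\) (check the slides for the exact statement), the benchmarks here are \(\mu=4.5\) and \(\sigma^2/n=1.05/10=0.105\). A minimal sketch of the comparison:

```r
set.seed(2024)
n <- 10
a <- replicate(10^4, sample(c(2, 3, 5), n, prob = c(0.1, 0.1, 0.8), replace = TRUE))
sample.means <- colMeans(a)
# Simulated mean and variance of the sample means next to the theoretical values
comparison <- rbind(simulated = c(mean = mean(sample.means), variance = var(sample.means)),
                    theory = c(mean = 4.5, variance = 1.05 / n))
comparison
```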

Lecture 5d continued, 2024-11-07

Joint distributions, marginal distributions, expected values and variances involving functions of two discrete random variables, covariances, independence
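These calculations can be organized around a matrix holding the joint pmf. This is a minimal sketch with a hypothetical joint pmf (the numbers are made up for illustration and are not from the lectures): marginals come from row and column sums, the covariance from \(\mathsf{E}(XY)-\mathsf{E}(X)\mathsf{E}(Y)\), and independence requires every cell to equal the product of the marginals.

```r
x <- c(0, 1)        # support of X (rows)
y <- c(0, 1, 2)     # support of Y (columns)
# Hypothetical joint pmf: p[i, j] = P(X = x[i], Y = y[j])
p <- matrix(c(0.1, 0.2, 0.1,
              0.2, 0.3, 0.1), nrow = 2, byrow = TRUE)
p.x <- rowSums(p)   # marginal pmf of X: 0.4 0.6
p.y <- colSums(p)   # marginal pmf of Y: 0.3 0.5 0.2
EX <- sum(x * p.x)
EY <- sum(y * p.y)
EXY <- sum(outer(x, y) * p)   # E[XY]: sum of x * y weighted by the joint pmf
cov.XY <- EXY - EX * EY       # -0.04
var.X <- sum(x^2 * p.x) - EX^2
var.Y <- sum(y^2 * p.y) - EY^2
# Variance of X + Y two ways: via the covariance formula and directly from the joint pmf
var.sum.formula <- var.X + var.Y + 2 * cov.XY
var.sum.direct <- sum(outer(x, y, "+")^2 * p) - (EX + EY)^2
c(var.sum.formula, var.sum.direct)   # both 0.65
# Independence check: does every cell equal the product of the marginals?
independent <- isTRUE(all.equal(p, outer(p.x, p.y)))
independent                          # FALSE for this joint pmf
```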

Lecture 5d, 2024-11-04

  1. Alternative ways to calculate the variance and the usefulness of the variance in Chebyshev’s inequality
  2. Skipped Lecture 5c slides 22 to 27; these will be covered some other time
  3. Distinguishing between real and fake coin sequences
  4. Start of joint distributions

Lecture 5c continued, 2024-10-24

Mostly explaining Jensen’s inequality and Chebyshev’s inequality
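Both inequalities can be checked numerically for \(X\) from Wasserman Chapter 2 Exercise 2. This is a minimal sketch, assuming the forms \(\mathsf{P}(|X-\mu|\geq k\sigma)\leq 1/k^2\) for Chebyshev's inequality and \(\mathsf{E}[g(X)]\geq g(\mathsf{E}[X])\) for a convex \(g\) in Jensen's inequality (here \(g(x)=x^2\)); the values of k are arbitrary.

```r
values <- c(2, 3, 5)
probs <- c(0.1, 0.1, 0.8)
mu <- sum(values * probs)                    # 4.5
sigma <- sqrt(sum((values - mu)^2 * probs))  # sqrt(1.05)
# Chebyshev: compare the exact tail probability with the bound 1 / k^2
k <- c(1, 1.5, 2, 3)
exact <- sapply(k, function(kk) sum(probs[abs(values - mu) >= kk * sigma]))
cbind(k = k, exact = exact, chebyshev.bound = 1 / k^2)
# Jensen with the convex function g(x) = x^2: E[X^2] >= (E[X])^2
c(E.X.squared = sum(values^2 * probs), square.of.E.X = mu^2)
```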

Lecture 5c continued, 2024-10-21

  1. Recap of expected value
  2. Apply expected values to decision making under uncertainty
  3. Why study the expected value?
  4. What is the variance? What is the standard deviation? These quantities are hard to interpret, but Chebyshev’s inequality is the key.
  5. Fix the code showing how the sequence of sample variances becomes “stable” around the population variance as you have more and more observations
  6. Scribbles

A demonstration that the way I set up the code cumsum((a-means)^2)/(1:length(a)) is wrong:

a <- c(2, 3, 5)
means <- cumsum(a)/(1:length(a))
a
[1] 2 3 5
means
[1] 2.000000 2.500000 3.333333
a-means
[1] 0.000000 0.500000 1.666667
(a-means)^2
[1] 0.000000 0.250000 2.777778
cumsum((a-means)^2)/(1:length(a))
[1] 0.000000 0.125000 1.009259

What is wrong? The sample variance of \(2\) alone is zero, and this is correctly captured. But the sample variance of \(2,3\) should be \(((2-2.5)^2+(3-2.5)^2)/2=0.25\): the second running mean, \(2.5\), should be subtracted from each observation. When I used a-means, what happened instead was \(((2-2)^2+(3-2.5)^2)/2=0.125\), because each observation had its own running mean subtracted. That is why my code is wrong. To fix it, I have to rewrite the formula for the sample variance into something that can be tracked as we add more and more observations without suffering from this problem.

The sample variance can be written in many forms. Let \(\{X_1,\ldots,X_n\}\) be some data representing \(n\) observations of some measurement. Let \(\overline{X}\) be the sample mean. One way to write the sample variance is as follows: \[\frac{(X_1-\overline{X})^2+(X_2-\overline{X})^2+\cdots+(X_n-\overline{X})^2}{n}.\] Another way requires some algebra. Observe that \[(X_1-\overline{X})^2 = X_1^2-2X_1 \overline{X} +\left(\overline{X}\right)^2\] and similar expansions hold for \((X_2-\overline{X})^2\), and so on. Therefore, \[\begin{eqnarray*} &&(X_1-\overline{X})^2+(X_2-\overline{X})^2+\cdots+(X_n-\overline{X})^2 \\ &=& \left[X_1^2-2X_1 \overline{X} +\left(\overline{X}\right)^2\right] + \left[X_2^2-2X_2 \overline{X} +\left(\overline{X}\right)^2\right] +\cdots+ \left[X_n^2-2X_n \overline{X} +\left(\overline{X}\right)^2\right] \\ &=& X_1^2+X_2^2+\cdots+X_n^2 -2\overline{X}\left(X_1+X_2+\cdots+X_n\right) +n\left(\overline{X}\right)^2 \\ &=& X_1^2+X_2^2+\cdots+X_n^2 - 2\overline{X} n \overline{X} +n\left(\overline{X}\right)^2 \\ &=& X_1^2+X_2^2+\cdots+X_n^2 - n\left(\overline{X}\right)^2 \end{eqnarray*}\]

The first equality expands each square. The second equality follows from grouping similar terms together. The third equality follows from the fact that the number of observations multiplied by the sample mean equals the sum of the observed values, i.e., \[X_1+X_2+\cdots+X_n=n\overline{X}.\] The last equality combines the two terms involving \(\left(\overline{X}\right)^2\). If you have trouble here, try \(n=2\) and \(n=3\) to get the hang of things.

In other words, when \(n=1\), we have \(X_1^2-1\left(X_1/1\right)^2=0\). When \(n=2\), we have \(X_1^2+X_2^2-2\left((X_1+X_2)/2\right)^2\). When \(n=3\), we have \(X_1^2+X_2^2+X_3^2-3\left((X_1+X_2+X_3)/3\right)^2\).

So, the code now has to involve the cumsum() of the squared observations, the number of observations (1:length()), and the running means, i.e., the cumsum() of the observations divided by the number of observations. Broken down step by step, the code is:

a <- c(2, 3, 5)
means <- cumsum(a)/(1:length(a))
a^2
[1]  4  9 25
means
[1] 2.000000 2.500000 3.333333
1:length(a)
[1] 1 2 3
means^2
[1]  4.00000  6.25000 11.11111
(1:length(a))*means^2
[1]  4.00000 12.50000 33.33333
cumsum(a^2)-(1:length(a))*means^2
[1] 0.000000 0.500000 4.666667
# Correct command
(cumsum(a^2)-(1:length(a))*means^2)/(1:length(a))
[1] 0.000000 0.250000 1.555556

You can check that everything is correct now.
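One way to do that check is to compare the running formula with a brute-force calculation that recomputes the divide-by-\(n\) sample variance from scratch for each prefix of the data. A minimal sketch (running and brute are illustrative names):

```r
a <- c(2, 3, 5)
means <- cumsum(a)/(1:length(a))
# Running sample variances from the corrected formula
running <- (cumsum(a^2) - (1:length(a)) * means^2)/(1:length(a))
# Brute force: for each k, the divide-by-n sample variance of a[1:k]
brute <- sapply(seq_along(a), function(k) mean((a[1:k] - mean(a[1:k]))^2))
all.equal(running, brute)   # TRUE
```

Note that R's built-in var() divides by \(n-1\) rather than \(n\), so var(a) * (length(a) - 1)/length(a) should match the last entry of brute.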

Recall the random variable \(X\) from Wasserman Chapter 2 Exercise 2. We are going to draw random numbers from that distribution. Recall also that we computed \(\mathsf{Var}\left(X\right)=1.05\).

# 10^4 draws from the distribution of X defined in Wasserman Chapter 2 Exercise 2
a <- sample(c(2, 3, 5), 10^4, prob = c(0.1, 0.1, 0.8), replace = TRUE)
# sequence of sample means as you draw more observations
means <- cumsum(a)/(1:length(a))
# sequence of sample variances as you draw more observations
sample.variances <- (cumsum(a^2)-(1:length(a))*means^2)/(1:length(a))
# Picture similar to the plot dealing with sample averages and the expected value
plot(1:length(sample.variances), sample.variances, 
     type = "l", xlab = "number of draws from X", ylab = "sample variance", ylim = c(0, 2),
     cex.lab=1.5, cex.axis=1.5)
lines(1:length(sample.variances), rep(1.05, length(sample.variances)), 
      lty = 3, col = 2, lwd = 3)

Lecture 5c, 2024-10-14, 2024-10-17

Mostly focused on the construction of a quantile function, simulating a quantile function in R, and seeing a business application
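For a discrete random variable, one common construction of the quantile function is \(Q(u)=\min\{x : F(x)\geq u\}\); plugging uniform random numbers into \(Q\) then produces draws from the distribution. This is a minimal sketch using \(X\) from Wasserman Chapter 2 Exercise 2; it may differ in detail from the construction on the slides, and the name quantile.X is illustrative.

```r
values <- c(2, 3, 5)
probs <- c(0.1, 0.1, 0.8)
cdf <- cumsum(probs)   # F(2) = 0.1, F(3) = 0.2, F(5) = 1
# Quantile function: smallest support point x with F(x) >= u
quantile.X <- function(u) sapply(u, function(p) values[which(cdf >= p)[1]])
quantile.X(c(0.05, 0.1, 0.15, 0.5, 0.99))   # 2 2 3 5 5
# Simulate X by feeding uniform draws into the quantile function
set.seed(2024)
draws <- quantile.X(runif(10^4))
prop.table(table(draws))   # should be close to 0.1, 0.1, 0.8
```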

Lecture 5b, 2024-10-10

  1. R commands related to binomial distributions
  2. More details and examples regarding discrete random variables
  3. Working on some exercises by hand and using R
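The main R functions for binomial distributions are dbinom() (pmf), pbinom() (cdf), qbinom() (quantiles), and rbinom() (random draws). As an illustration that may differ from the examples used in class, let \(Y\) be the number of 5s in 4 IID draws of \(X\) from Wasserman Chapter 2 Exercise 2, so that \(Y\) has a Binomial(4, 0.8) distribution.

```r
dbinom(4, size = 4, prob = 0.8)     # P(Y = 4) = 0.8^4 = 0.4096
pbinom(2, size = 4, prob = 0.8)     # P(Y <= 2) = 0.1808
qbinom(0.5, size = 4, prob = 0.8)   # smallest y with P(Y <= y) >= 0.5, which is 3
set.seed(2024)
y <- rbinom(10^4, size = 4, prob = 0.8)   # 10^4 simulated values of Y
mean(y)                                   # should be close to 4 * 0.8 = 3.2
```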

Lecture 5a, 2024-10-07

  1. Random variables, distributions, cumulative distribution functions
  2. Connection to past lectures
  3. Quiz 02

Lecture 4d, 2024-10-03, asynchronous

  1. Wrap up the Monty Hall problem, including a simulation
  2. Conditional probabilities and their uses
  3. Independence
  4. Exercises and more practice with simulations
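A Monty Hall simulation can be sketched as follows. It assumes the standard rules: the host knows where the car is, always opens a door that is neither the contestant's pick nor the car, and chooses at random when two such doors are available. The function play and its argument switch.doors are illustrative names, not the code used in class.

```r
set.seed(2024)
# One round of the game; switch.doors = TRUE means the contestant switches
play <- function(switch.doors) {
  car <- sample(1:3, 1)    # door hiding the car
  pick <- sample(1:3, 1)   # contestant's first pick
  # Host opens a door that is neither the contestant's pick nor the car
  can.open <- setdiff(1:3, c(pick, car))
  # Index into can.open: sample(can.open, 1) misbehaves when can.open has length 1
  opened <- can.open[sample(length(can.open), 1)]
  if (switch.doors) pick <- setdiff(1:3, c(pick, opened))
  pick == car
}
switch.rate <- mean(replicate(10^4, play(TRUE)))
stay.rate <- mean(replicate(10^4, play(FALSE)))
c(switching = switch.rate, staying = stay.rate)   # roughly 2/3 and 1/3
```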

Lecture 4c, 2024-09-30

  1. Using the infrastructure for calculating probabilities systematically

  2. Using R again to estimate probabilities, but for the birthday problem

  3. We cannot just assign probabilities to events of an infinite sample space willy-nilly. The idea of equally likely outcomes does not make sense in this context.

  4. Thinking through the Monty Hall problem

  5. Below you will find some code we used in class to break apart some of the commands and understand what they are doing.

group.size <- 2:30
# Approximation of the probability of at least 2 in a group sharing the same birthday
1-exp(-group.size^2/730)
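# Figure out which entries will have more than 50% probability
# Thank you to Clea Velina for spotting this! 2024-10-17
which(1-exp(-group.size^2/730) > 0.5)
# which() returns positions in group.size, so map those positions back to group sizes
group.size[which(1-exp(-group.size^2/730) > 0.5)]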
# Making sense of unique(b) through examples
# In the end, what matters is whether k-length(unique(b)) is equal to zero
b <- c(1, 2, 3, 4, 5)
unique(b)
5-length(unique(b))
b <- c(1, 1, 3, 4, 5)
unique(b)
5-length(unique(b))
b <- c(1, 1, 1, 4, 5)
unique(b)
5-length(unique(b))
# Making sense of sum(diff(sort(b))==0)
# In the end, what matters is whether sum(diff(sort(b))==0) is equal to zero
b <- c(1, 2, 3, 4, 5)
diff(sort(b))
diff(sort(b))==0
sum(diff(sort(b))==0)
b <- c(1, 1, 3, 4, 5)
diff(sort(b))
diff(sort(b))==0
sum(diff(sort(b))==0)
b <- c(1, 1, 1, 4, 5)
diff(sort(b))
diff(sort(b))==0
sum(diff(sort(b))==0)
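The approximation above can be compared with the exact probability and with a simulation built on the unique() idea. This is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; exact.birthday and sim.birthday are illustrative names.

```r
# Exact probability that at least 2 of k people share a birthday
exact.birthday <- function(k) 1 - prod((365 - 0:(k - 1)) / 365)
# Simulation estimate: a shared birthday means fewer than k unique values
sim.birthday <- function(k, reps = 10^4) {
  mean(replicate(reps, {
    b <- sample(1:365, k, replace = TRUE)
    k - length(unique(b)) > 0
  }))
}
set.seed(2024)
exact.birthday(23)   # about 0.507
sim.birthday(23)
# The approximation used in class, for comparison
1 - exp(-23^2/730)
# Smallest group size whose exact probability exceeds 0.5
min(which(sapply(1:60, exact.birthday) > 0.5))   # 23
```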

Lecture 4b, 2024-09-26, asynchronous

  1. Definition of probability distributions and their implications
  2. Demonstration of the constraints on probability assignments imposed by the definition of probability distributions
  3. Probability assignments based on the equally likely outcomes or equiprobable outcomes
  4. Mistakes made in the past arising from assuming equally likely outcomes but incorrectly specifying the sample space
  5. Digression into counting outcomes for subsets of a finite sample space
  6. Using R to estimate probabilities
  7. Making sense of the birthday problem and developing an approximation for the probability that a room with \(C\) people would have at least 2 people sharing the same birthday
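A classic example of a mis-specified sample space, often attributed to d'Alembert and not necessarily the one used in class, concerns the probability of at least one head in two fair coin tosses. Treating \(\{H, TH, TT\}\) as equally likely gives \(2/3\), while the correct sample space of four equally likely ordered outcomes gives \(3/4\). A minimal sketch that enumerates the correct sample space and checks it by simulation:

```r
# Correct sample space for two fair coin tosses: 4 equally likely ordered outcomes
omega <- expand.grid(first = c("H", "T"), second = c("H", "T"), stringsAsFactors = FALSE)
omega
# P(at least one head) under equally likely outcomes
p.correct <- mean(omega$first == "H" | omega$second == "H")
p.correct   # 0.75
# Simulating 10^4 pairs of tosses agrees with 3/4 rather than 2/3
set.seed(2024)
tosses <- matrix(sample(c("H", "T"), 2 * 10^4, replace = TRUE), ncol = 2)
p.sim <- mean(tosses[, 1] == "H" | tosses[, 2] == "H")
p.sim
```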

Lecture 4a, 2024-09-23, moved online

  1. Transportation strike pushed this lecture online.
  2. Class administration: What classroom activities and discussion boards we have so far
  3. Online quiz: Explain how quizzes work, explain how the “scoring” for feedback works
  4. Actual quiz: A copy of the quiz questions along with your answers was sent to your email. Solutions have already been sent.
  5. Classroom activity on matching numbers
  6. Set-theoretic language and building the infrastructure
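R's built-in set functions union(), intersect(), and setdiff() mirror the set-theoretic language. This is a minimal sketch with a fair six-sided die; the events A and B are hypothetical examples, used here to check the addition rule under equally likely outcomes.

```r
# Sample space for one roll of a fair six-sided die
omega <- 1:6
A <- c(2, 4, 6)   # event: the roll is even
B <- c(4, 5, 6)   # event: the roll is greater than 3
# Probability of an event under equally likely outcomes
P <- function(E) length(E) / length(omega)
union(A, B)         # A or B: 2 4 6 5
intersect(A, B)     # A and B: 4 6
setdiff(omega, A)   # complement of A: 1 3 5
# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
c(P(union(A, B)), P(A) + P(B) - P(intersect(A, B)))
```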

Lecture 3b, 2024-09-19, asynchronous

  1. Wrap up the slides of Lecture 03.

  2. Give a preview of Lecture 04.

    • RSS article on statistical literacy of MPs here
    • Ingredients which make up the definition of a probability function

Lecture 3a, 2024-09-16

  1. Recap and clarifying aspects of Lecture 02

  2. Card prediction activity

    • Why do you think this activity was conducted? What do you think is the phenomenon we are trying to investigate?
    • You collected data on whether you were able to predict correctly the card (suit and number) which will be drawn from a deck of 52 cards.
    • Go through the ingredients of a hypothesis test. Match it to the card activity you did.
    • Calculate a \(p\)-value. Depending on your observed data (specific to your case), it may not be necessary to use the computer to find this \(p\)-value (Why?).
    • But it would be useful to try modifying the code you have so far and use R to calculate a \(p\)-value.
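One way to compute such a \(p\)-value, assuming that under the null hypothesis each prediction is a pure guess that is correct with probability \(1/52\) independently of the others, and that the alternative is better-than-chance prediction. The numbers n = 10 predictions and x = 2 correct are hypothetical; replace them with your own data.

```r
# Hypothetical data: n predictions, x of them correct
n <- 10
x <- 2
p0 <- 1/52   # chance of a correct guess with no predictive ability
# Exact p-value: P(Y >= x) when Y ~ Binomial(n, 1/52)
p.exact <- 1 - pbinom(x - 1, size = n, prob = p0)
# Simulation-based p-value
set.seed(2024)
y.sim <- rbinom(10^4, size = n, prob = p0)
p.sim <- mean(y.sim >= x)
c(exact = p.exact, simulated = p.sim)
```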

Make-up for suspended classes, asynchronous

Classes were suspended on 2024-09-02 and 2024-09-05. We lost 3 hours of contact time.

  1. (about 2 hours) Introductory R tutorials using M&M data have been recorded. Links to these recordings are available at Canvas/Animospace. Exercises for you to try are part of the recordings.
  2. (about 1 hour) Continuation of Lecture 02

Lecture 2, 2024-09-12, asynchronous

  1. More details about the course syllabus
  2. Introductory R for simulation purposes
  3. Elaborating more on evaluating claims in a statistical manner

Lecture 1, 2024-09-09

  1. M&M activity
  2. What the course is going to be about
  3. Setting some expectations