Stat 200A Fall 2018

A. Adhikari

UC Berkeley

Lecture 3, Tuesday 8/30

These notes contain only the terminology and main calculations in the lectures. For the associated discussions, please come to lecture.

The lecture was about joint distributions and was drawn directly from Sections 3.1 and 5.2 of the text by Pitman. These notes include some comments and the lecture examples that are not in those sections.

Summary of Terminology and General Results

The joint distribution of the random pair $(X, Y)$ consists of the set of all possible values of the pair, along with a specification of how probability is distributed over those values.

The Discrete Case

If $X$ and $Y$ are discrete, the joint distribution can be described by the joint probability mass function $P(X=x, Y=y)$ for all possible pairs $(x, y)$.

The joint distribution allows you to calculate the probability of any event determined by the two variables. For any subset $B$ of the possible values,

$$ P((X, Y) \in B) ~ = ~ \sum_{\{(x,y): (x,y) \in B\}} P(X=x, Y=y) $$

In particular, the distribution of $X$ can be found by

$$ P(X = x) ~ = ~ \sum_{\text{all }y} P(X=x, Y=y) ~~~~ \text{for all }x $$

Because of a standard visualization of discrete joint distributions in tables, $P(X=x)$ is a sum along a column (or row) of the table, and the distribution of $X$ is sometimes displayed in the margins of the table. So another name for the distribution of $X$ is the marginal distribution of $X$.
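
As a quick sketch (not part of the lecture), the row and column sums described above can be computed directly; the joint distribution table below is made up purely for illustration, and Python is used only as a convenience.

```python
# Sketch: marginals of X and Y as row and column sums of a joint distribution table.
# The table is a made-up joint pmf, used only for illustration.
import numpy as np

x_vals = [0, 1, 2]
y_vals = [0, 1]
# joint[i, j] = P(X = x_vals[i], Y = y_vals[j])
joint = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])

marginal_X = joint.sum(axis=1)   # P(X = x): sum over all y for each fixed x
marginal_Y = joint.sum(axis=0)   # P(Y = y): sum over all x for each fixed y

print(dict(zip(x_vals, marginal_X.tolist())))   # approximately {0: 0.25, 1: 0.45, 2: 0.3}
print(dict(zip(y_vals, marginal_Y.tolist())))   # approximately {0: 0.35, 1: 0.65}
```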

For any fixed $x$, the conditional distribution of $Y$ given $X=x$ is the set of all possible values of $Y$ along with the conditional probabilities of those values given $X=x$.

$$ P(Y=y \mid X=x) ~ = ~ \frac{P(X=x, Y=y)}{P(X=x)} ~~~~ \text{for all }y $$

$X$ and $Y$ are independent if knowing the value of one of the variables doesn't affect the conditional chance of the other. So $X$ and $Y$ are independent if

$$ P(Y=y \mid X=x) ~ = ~ P(Y=y) ~~ \text{for all }(x,y), ~~ \text{that is, } ~~ P(X=x, Y=y) ~ = ~ P(X=x)P(Y=y) $$
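
Continuing the sketch above (again with a made-up table, not from the lecture), the conditional distribution of $Y$ given $X = x$ is the row for $x$ renormalized by $P(X = x)$, and independence can be checked by comparing the table to the outer product of its marginals.

```python
# Sketch: conditional pmf of Y given X = x, and an independence check,
# reusing the made-up joint pmf table from the previous sketch.
import numpy as np

x_vals = [0, 1, 2]
y_vals = [0, 1]
joint = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])

marginal_X = joint.sum(axis=1)
marginal_Y = joint.sum(axis=0)

# P(Y = y | X = 0): divide the row for x = 0 by P(X = 0)
cond_Y_given_X0 = joint[x_vals.index(0)] / marginal_X[x_vals.index(0)]
print(dict(zip(y_vals, cond_Y_given_X0.tolist())))   # approximately {0: 0.4, 1: 0.6}

# Independence holds iff every joint probability is the product of the marginals.
print(np.allclose(joint, np.outer(marginal_X, marginal_Y)))   # False for this table
```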

The Continuous Case

$X$ and $Y$ have joint density $f$ if $f(x, y)$ is non-negative for all $(x, y)$, integrates to 1 over the whole plane, and for every subset $B$ of the plane, the chance of $B$ is the volume under the joint density surface over the region $B$:

$$ P((X,Y) \in B) ~ = ~ \mathop{\int\int}_{B} f(x,y)dxdy $$

So $$ P(X \in dx, Y \in dy) ~ \sim ~ f(x,y)dxdy ~~~~~~ \text{for all } (x,y) $$

$X$ and $Y$ are independent if $$ f(x, y) ~ = ~ f_X(x)f_Y(y) ~~~~~~ \text{for all } (x, y) $$

where $f_X$ is the density of $X$ and $f_Y$ is the density of $Y$.
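
As a sketch of how such a probability can be computed numerically (not something done in the lecture), here the density of a pair of independent exponential (1) random variables is integrated over a rectangle; the density and the event are assumptions made only for this example.

```python
# Sketch: P((X, Y) in B) as a double integral, for the assumed example of
# independent exponential(1) variables, so f(x, y) = e^(-x) e^(-y) for x, y > 0.
import numpy as np
from scipy.integrate import dblquad

f = lambda y, x: np.exp(-x) * np.exp(-y)   # dblquad expects the integrand as f(y, x)

# P(X < 1, Y < 2): volume under f over the rectangle (0, 1) x (0, 2)
prob, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 2)

# Because the density factors, the answer is the product of two one-dimensional probabilities.
print(prob, (1 - np.exp(-1)) * (1 - np.exp(-2)))
```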

A Joint Distribution Table

Consider three tosses of a coin.

  • $X$: the number of heads on tosses 1 and 2
  • $Y$: the number of heads on tosses 2 and 3

The joint distribution table of the pair $(X, Y)$ is given by

|       | $X=0$ | $X=1$ | $X=2$ |
|-------|-------|-------|-------|
| $Y=2$ | 0     | 0.125 | 0.125 |
| $Y=1$ | 0.125 | 0.25  | 0.125 |
| $Y=0$ | 0.125 | 0.125 | 0     |

If you didn't know the description of $X$ and $Y$ in terms of coin tosses, the distribution table would still have told you immediately that the two random variables are dependent: the possible values of $Y$ change depending on the value of $X$. For example, $Y = 2$ is impossible given $X = 0$ but has positive probability given $X = 1$.
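
As a quick check (not part of the lecture), the table above can be reconstructed by listing the eight equally likely outcomes of the three tosses; Python is used here only for convenience.

```python
# Sketch: reconstructing the joint distribution table by listing the 8 equally
# likely outcomes of three coin tosses.
from itertools import product
from collections import defaultdict

joint = defaultdict(float)
for tosses in product('HT', repeat=3):
    x = sum(t == 'H' for t in tosses[:2])    # number of heads on tosses 1 and 2
    y = sum(t == 'H' for t in tosses[1:])    # number of heads on tosses 2 and 3
    joint[(x, y)] += 1 / 8

for x, y in sorted(joint):
    print(f'P(X={x}, Y={y}) = {joint[(x, y)]}')
# (0, 2) and (2, 0) never appear, matching the zeros in the table
```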

Minimum of Independent Geometrics

To find the distribution of a sample minimum, it is a good idea to work with the right hand tail probabilities:

$$ P(\text{minimum } > k) ~ = ~ P(\text{each element of the sample is } > k) $$

Let $X_1$ be geometric $(p_1)$ on $\{1, 2, 3, \ldots \}$ and let $X_2$ be geometric $(p_2)$ on $\{1, 2, 3, \ldots \}$. Suppose $X_1$ and $X_2$ are independent. Let $M = \min(X_1, X_2)$.

To find the distribution of $M$, first note that the possible values of $M$ are $\{1, 2, 3, \ldots \}$. Write $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$. By independence, the right tail probabilities (that is, the survival probabilities) factor:

$$ P(M > k) ~ = ~ P(X_1 > k, X_2 > k) ~ = ~ P(X_1 > k)P(X_2 > k) ~ = ~ q_1^kq_2^k ~ = ~ (q_1q_2)^k $$

This is the form of the tail probabilities of the geometric $(1 - q_1q_2)$ distribution on $\{1, 2, 3, \ldots \}$.

If you fail to notice that, you can always find the mass $P(M = k)$ by using

$$ P(M = k) ~ = ~ P(M > k-1) - P(M > k) $$
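
As a sanity check (not from the text), a short simulation can compare the empirical distribution of the minimum with the geometric $(1 - q_1q_2)$ pmf; the particular values of $p_1$ and $p_2$ below are arbitrary.

```python
# Sketch: simulation check that min(X1, X2) is geometric(1 - q1 q2) on {1, 2, 3, ...},
# for one arbitrary choice of p1 and p2.
import numpy as np

rng = np.random.default_rng(0)
p1, p2 = 0.3, 0.5                  # arbitrary parameters, chosen only for illustration
q1, q2 = 1 - p1, 1 - p2
p_min = 1 - q1 * q2                # claimed parameter of the minimum

n = 10**6
m = np.minimum(rng.geometric(p1, n), rng.geometric(p2, n))   # values in {1, 2, 3, ...}

for k in range(1, 6):
    empirical = np.mean(m == k)
    theoretical = (1 - p_min) ** (k - 1) * p_min              # geometric(p_min) pmf at k
    print(k, round(float(empirical), 4), round(theoretical, 4))
```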

A Joint Density

The function $$ f(x,y) ~ = ~ 120x(y-x)(1-y), ~~~~ 0 < x < y < 1 $$

and $f(x,y) = 0$ everywhere else is a joint density function. The set of possible values $(x,y)$ is the upper left triangle of the unit square. You can check that the function integrates to 1.
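
For those who would rather not do the integration by hand, here is a short symbolic check (not from the lecture) using sympy.

```python
# Sketch: symbolic check that the density integrates to 1 over the triangle 0 < x < y < 1.
import sympy as sp

x, y = sp.symbols('x y')
f = 120 * x * (y - x) * (1 - y)

# Integrate over y from x to 1 first, then over x from 0 to 1.
total = sp.integrate(f, (y, x, 1), (x, 0, 1))
print(total)   # 1
```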

To find $P(Y > X + 0.5)$ you have to integrate $f$ over the appropriate region:

$$ P(Y > X + 0.5) ~ = ~ \int_0^{0.5} \int_{x+0.5}^1 f(x,y)dydx ~ = ~ \int_{0.5}^1 \int_0^{y-0.5} f(x,y)dxdy $$
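
As a check (again not part of the lecture), both orders of integration can be evaluated symbolically and compared.

```python
# Sketch: evaluating P(Y > X + 0.5) in both orders of integration and checking agreement.
import sympy as sp

x, y = sp.symbols('x y')
f = 120 * x * (y - x) * (1 - y)
half = sp.Rational(1, 2)

p_dy_dx = sp.integrate(f, (y, x + half, 1), (x, 0, half))   # inner integral over y
p_dx_dy = sp.integrate(f, (x, 0, y - half), (y, half, 1))   # inner integral over x
print(p_dy_dx, p_dx_dy)   # both equal 3/16
```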