\documentclass{article}
\usepackage[margin=1.2in]{geometry}
\usepackage{graphicx}
\usepackage{amsmath,amssymb,amsthm,bm}
\usepackage{latexsym,xcolor,minipage-marginpar,caption,multirow,verbatim}
\usepackage[round]{natbib}
\usepackage{enumerate}
\usepackage{times}
\newcommand{\RR}{\mathbb{R}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\EE}{\mathbb{E}}
\newcommand{\ZZ}{\mathbb{Z}}
\newcommand{\cP}{\mathcal{P}}
\newcommand{\cQ}{\mathcal{Q}}
\newcommand{\cY}{\mathcal{Y}}
\newcommand{\cX}{\mathcal{X}}
\newcommand{\cT}{\mathcal{T}}
\newcommand{\cB}{\mathcal{B}}
\newcommand{\ep}{\varepsilon}
\newcommand{\widebar}{\overline}
\newcommand{\simiid}{\overset{\textrm{i.i.d.}}{\sim}}
\newcommand{\simind}{\overset{\textrm{ind.}}{\sim}}
\newcommand{\td}{\,\textrm{d}}
\newcommand{\red}{\color{red}}
\definecolor{darkblue}{rgb}{0.2, 0.2, 0.5}
\newcommand{\sol}{~\\\color{darkblue}{\bf Solution:~\\}}
\begin{document}
\title{Stats 210A, Fall 2023\\
Homework 3\\
{\large {\bf Due date}: Wednesday, Sep. 20}}
\date{}
\maketitle
\vspace{-5em}
You may disregard measure-theoretic niceties about conditioning on measure-zero sets, almost-sure equality vs. actual equality, ``all functions'' vs. ``all measurable functions,'' etc. (unless the problem is explicitly asking about such issues).
\begin{description}
\item[1. Interpretation of completeness]\hfill\\
The concept of {\em completeness} for a family of measures was introduced by \citet{lehmann1950completeness} as a precursor to their definition, in the same paper, of a complete statistic. The definition of a complete family did not stick, and lives on only in the (consequently confusingly named) idea of a complete statistic (in particular, it has nothing to do with the definition of a {\em complete measure} that you can find on Wikipedia).
If $\cP = \{P_\theta:\; \theta \in \Theta\}$ is a family of measures on $\cX$, we say that $\cP$ is {\em complete} if
\[
\int f(x)\td P_\theta(x) = 0, \;\forall \theta \quad\Rightarrow\quad
P_\theta(\{x:\; f(x) \neq 0\}) = 0, \;\forall \theta.
\]
The integral can be interpreted as an inner product $\langle f, P_\theta\rangle = \int f\td P_\theta$, with $f \perp P_\theta$ whenever $\langle f, P_\theta\rangle = 0$. In this language, the family is {\bf not} complete if and only if there is some nonzero function $f$ that is orthogonal to every $P_\theta$. We will try to gain some intuition for this definition and, thereby, for the definition of a complete statistic.
For the following parts, let $\cP = \{P_\theta:\; \theta \in \Theta\}$ be a family of probability measures on $\cX$, assume $T(X)$ is a statistic, and let $\cT = T(\cX)$ be the range of the statistic $T(X)$. Let $\cP^T = \{P_\theta^T:\; \theta\in \Theta\}$ denote the induced model of push-forward probability measures on $\cT$, describing the possible distributions of $T(X)$:
\[
P_\theta^T(B) = P_\theta(T^{-1}(B)) = \PP_\theta(T(X) \in B).
\]
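As a quick numerical illustration of the push-forward definition (a hypothetical toy example, not part of the problem), the sketch below computes the induced pmf $P^T$ for a statistic on a four-point sample space, assuming nothing beyond the displayed definition.

```python
# Toy push-forward: X = {0, 1, 2, 3} with pmf p, statistic T(x) = x mod 2.
# (Hypothetical numbers, chosen for illustration only.)
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def pushforward(pmf, T):
    """Induced pmf: P^T({t}) = P(T^{-1}({t})) = sum of pmf[x] over {x : T(x) = t}."""
    out = {}
    for x, px in pmf.items():
        t = T(x)
        out[t] = out.get(t, 0.0) + px
    return out

pT = pushforward(p, lambda x: x % 2)
# pT[0] = p[0] + p[2] = 0.4 and pT[1] = p[1] + p[3] = 0.6
```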
\begin{enumerate}[(a)]
\item Show that $T(X)$ is a complete statistic for the family $\cP$ if and only if $\cP^T$ is a complete family.
\item Assume (for this part only) that $\cX$ is a finite set, i.e. $\cX = \{x_1,\ldots, x_n\}$ for some $n<\infty$, and assume without loss of generality that every $x\in\cX$ has $P_\theta(\{x\}) > 0$ for at least one value of $\theta$ (otherwise we could truncate the sample space).
Let $p_\theta(x) = \PP_\theta(X = x) \geq 0$, and $v^\theta = (p_\theta(x_1),\ldots,p_\theta(x_n)) \in \RR^n$. Show that $\cP$ is complete if and only if $\text{Span}\{v^\theta:\;\theta\in\Theta\} = \RR^n$.
\item Let $X_1,\ldots,X_n \simiid \text{Pois}(\theta)$ for $\theta\in \Theta = \{\theta_1,\ldots, \theta_m\}$ with $2 \leq m < \infty$. Find a sufficient statistic that is minimal but not complete (prove both properties).
\item In the same scenario but with $\Theta = \pi\ZZ_+ = \{0, \pi, 2\pi, \ldots\}$, show that the same statistic is minimal but not complete.
{\bf Hint:} Recall the Taylor series
\[
\sin(\theta) = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots.
\]
\item {\bf Optional} (not graded, no extra points). Let $X_1,\ldots,X_n \simiid \text{Pois}(\theta)$ for $\theta\in \Theta$, and assume that $\Theta$ has an accumulation point at 0, i.e. $\Theta$ contains an infinite sequence of positive values $\theta_1,\theta_2,\ldots$ such that $\lim_{m\to\infty} \theta_m = 0$. Find a complete sufficient statistic and prove that it is both complete and sufficient.
{\bf Hint:} suppose $f$ is a counterexample function; what is $f(0)$? It may be helpful to recall that $\int f\td \mu$ is undefined unless either $\int \max(0,f(x))\td \mu(x)$ or $\int \max(0, -f(x))\td \mu(x)$ is finite; as a result $\int f\td \mu = 0 \Rightarrow \int|f|\td \mu < \infty$.
\end{enumerate}
{\bf Moral 1:} The definition of a complete statistic is easier to remember if we recall its interpretation as saying that the set of distributions $P_\theta^T$ ``spans'' a certain vector space, so that only the zero function is orthogonal to all $P_\theta^T$.
{\bf Moral 2:} If $\cP = \{P_\eta:\; \eta \in \Xi\}$ is a full-rank exponential family with natural parameter $\eta$, meaning $\Xi$ contains an open set, our result from class allows us to prove completeness of $T(X)$. But this condition is far from necessary: it is possible for $T$ to be complete even when $\Xi$ is discrete, or even finite.
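To see the span criterion from part (b) in numbers, here is a minimal sketch using a hypothetical Bernoulli family (not one of the problems above) with a two-element parameter set.

```python
# Hypothetical family: X = {0, 1}, X ~ Bernoulli(theta), Theta = {1/3, 2/3}.
# Part (b)'s criterion: the family is complete iff the vectors
# v^theta = (p_theta(x_1), ..., p_theta(x_n)) span R^n.
thetas = [1 / 3, 2 / 3]
vecs = [(1 - th, th) for th in thetas]  # v^theta = (P(X=0), P(X=1))

# Two vectors span R^2 iff their determinant is nonzero; here it is 1/3,
# so even this finite parameter set yields a complete family.
det = vecs[0][0] * vecs[1][1] - vecs[0][1] * vecs[1][0]
```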
\item[2. Ancillarity in location-scale families]\hfill\\
In a parameterized family where $\theta = (\zeta, \lambda)$, we say a statistic $T$ is {\em ancillary for $\zeta$} if its distribution does not depend on $\zeta$; that is, if $T(X)$ is ancillary in the subfamily where $\lambda$ is known, for each possible value of $\lambda$.
Suppose that $X_1,\ldots,X_n\in \cX=\RR$ are an i.i.d. sample from a {\em location-scale family}
\[
\cP = \{F_{a,b}(x) = F((x-a)/b): \; a\in \RR, b>0\},
\]
where $F(\cdot)$ is a known cumulative distribution function. The real numbers $a$ and $b$ are called the {\em location} and {\em scale} parameters respectively. ({\bf Note:} recall it is {\em not} enough to prove ancillarity of the coordinates.)
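The location-scale structure can be made concrete with a small simulation sketch (assuming, purely for illustration, that $F$ is the standard normal CDF): if $Z_i \simiid F$, then $X_i = a + bZ_i$ has CDF $F((x-a)/b)$, and coupling two values of $a$ through the same $Z_i$'s shows numerically that the differences $X_1 - X_i$ do not involve $a$ at all.

```python
import random

def sample_location_scale(n, a, b, seed):
    # If Z_i ~ F i.i.d. (illustration: standard normal), then
    # X_i = a + b * Z_i has CDF F((x - a) / b).
    rng = random.Random(seed)
    return [a + b * rng.gauss(0.0, 1.0) for _ in range(n)]

# Couple two location parameters through the same underlying Z_i's:
x_a0 = sample_location_scale(5, a=0.0, b=2.0, seed=42)
x_a7 = sample_location_scale(5, a=7.0, b=2.0, seed=42)

# The difference vectors (X_1 - X_i) coincide: the location a cancels exactly.
d_a0 = [x_a0[0] - xi for xi in x_a0[1:]]
d_a7 = [x_a7[0] - xi for xi in x_a7[1:]]
```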
\begin{enumerate}[(a)]
\item Show that the vector of differences $\left(X_1 - X_i\right)_{i = 2}^n$ is ancillary for $a$.
\item Show that the vector of ratios $\left(\frac{X_1 - a}{X_i - a}\right)_{i=2}^n$ is ancillary for $b$. (Note: this is only a statistic when $a$ is known).
\item Show that the vector of difference ratios $\left(\frac{X_1 - X_i}{X_2 - X_i}\right)_{i=3}^n$ is ancillary for $(a,b)$.
\item Let $X_1,\ldots,X_n$ be mutually independent with $X_i \sim \text{Gamma}(k_i, \theta)$. Show that $X_+ = \sum_{i=1}^n X_i$ is independent of $(X_1,\ldots,X_n)/X_+$.
\end{enumerate}
{\bf Moral:} Location-scale families have common structure that we can exploit in some problems.
\item[3. Unbiased estimation in replicated studies]\hfill\\
One focal issue in the ongoing scientific replication crisis is the ``file drawer problem,'' i.e. the tendency of researchers to report findings (or of journals to publish them) only if they have a $p$-value less than 0.05. Replication studies typically represent cleaner estimates of the results under study, since they are reported regardless of whether they are statistically significant. This is one of the reasons that replication studies often find much smaller effect size estimates than the original studies: if the original study had gotten a good estimate of the (small) true effect, we wouldn't have heard about it.
We can introduce a toy model for a replicated study where the original study is $X_1 \sim N(\mu, 1)$ and the replication study is $X_2 \sim N(\mu, 1)$, but we only observe the study pair given that $X_1 > c$ for some significance cutoff $c \in \RR$, e.g. $c=1.96$. In other words, the distribution for a study pair conditional on our observing it is
\begin{align*}
p_\mu(x_1,x_2) &= \PP_\mu(X_1=x_1,X_2=x_2 \mid X_1 > c)\\
&= \frac{\phi(x_1-\mu)1\{x_1 > c\}}{1-\Phi(c-\mu)} \phi(x_2-\mu),
\end{align*}
where $ \phi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ is the standard normal pdf and $\Phi(x) = \int_{-\infty}^x \phi(u)\td u$ is the standard normal cdf. We will consider the problem of estimating $\mu$ after observing a study pair.
Arguably, we should only care about the {\em conditional} bias or risk of an estimator, given that we actually get to see the data, since the conditional distribution more accurately describes the set of published results. Thus, all questions below about bias, admissibility, UMVU, etc., should be answered in terms of the conditional distribution given that $X_1>c$ (i.e., with densities $p_{\mu}(x_1,x_2)$ above), {\em not} in terms of the marginal distribution (whose densities would be $\phi(x_1-\mu)\phi(x_2-\mu)$). For example, in part (a) it would not be true to say that $\widebar X$ is marginally biased, but I want you to show it is conditionally biased given that it is observed.
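The conditional model is easy to simulate by rejection sampling, which also gives a quick Monte Carlo look at the upward bias claimed in part (a). This is an illustration only, with arbitrary values $\mu = 0.5$ and $c = 1.96$, not a proof.

```python
import random

def sample_study_pair(mu, c, rng):
    # Rejection sampling from p_mu: draw X1, X2 ~ N(mu, 1) independently,
    # but keep the pair only when X1 > c ("the study got published").
    while True:
        x1 = rng.gauss(mu, 1.0)
        if x1 > c:
            return x1, rng.gauss(mu, 1.0)

rng = random.Random(0)
mu, c, reps = 0.5, 1.96, 20000
total = 0.0
for _ in range(reps):
    x1, x2 = sample_study_pair(mu, c, rng)
    total += (x1 + x2) / 2.0
mean_xbar = total / reps
# Conditional on publication, the average of Xbar lands well above mu,
# illustrating the upward bias of the naive estimator in part (a).
```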
\begin{enumerate}[(a)]
\item Show that $\widebar X = (X_1 + X_2)/2$ is an upwardly biased estimator of $\mu$ (we can call this the {\em naive} estimator since it ignores the selection bias).
\item Show that $X_2$ is unbiased for $\mu$, but it is inadmissible under any strictly convex loss function (we can call this the {\em data splitting} estimator since we ignore $X_1$, which was used for selection, and use the fresh data $X_2$.)
\item Show that the UMVU estimator for $\mu$ is
\[
\delta(\widebar{X}) = \widebar{X} -
\frac{1}{\sqrt{2}}\;\zeta\left(\sqrt{2}(c-\widebar{X})\right),
\]
where
\[
\zeta(x) = \EE_{Z \sim N(0,1)}[Z \mid Z > x] = \frac{\int_x^\infty u\phi(u)\td u}{1-\Phi(x)}.
\]
{\bf Hint:} It may help to note that $X_1+X_2$ is marginally independent of $X_1 - X_2$ (but note they are {\bf not} conditionally independent given $X_1 > c$).
\item Show that
\[
\lim_{\widebar{X} \to \infty} \left[\delta(\widebar{X}) - \widebar{X}\right] = 0.
\]
In other words, if $\widebar{X} \gg c$, then $\delta(\widebar{X}) \approx \widebar{X}$, the naive estimator.
Can you give any intuition for why this limit makes sense?
\item {\bf Optional:} (not graded, no extra points). Show that
\[
\lim_{\widebar{X} \to -\infty} \left[\delta(\widebar{X}) - \left(X_2 + (X_1-c)\right)\right] = 0,
\]
and furthermore that for any $\ep > 0$, we have
\[
\lim_{\widebar{X} \to -\infty} \PP(X_1 - c > \ep \mid \widebar{X}, X_1 > c) = 0.
\]
In other words, if $\widebar{X} \ll c$, we have $\delta(\widebar{X}) \approx X_2 + (X_1-c) \approx X_2$, the data splitting estimator. Can you give any intuition for why this limit makes sense?
{\bf Hint:} It may be helpful to use the tail inequality
\[
\left(\frac{1}{x} - \frac{1}{x^3}\right)\phi(x) \leq 1-\Phi(x) \leq \frac{1}{x} \phi(x),
\]
for $x>0$.
\end{enumerate}
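The quantities in parts (c)--(e) are easy to evaluate numerically. The sketch below uses the identity $\int_x^\infty u\phi(u)\td u = \phi(x)$, so that $\zeta(x) = \phi(x)/(1-\Phi(x))$, to check the large-$\widebar{X}$ limit in part (d) and the tail inequality from the hint at a few test points. It is a sanity check, not a proof.

```python
import math

def Phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal pdf.
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def zeta(x):
    # zeta(x) = E[Z | Z > x]; the numerator integral of u*phi(u) over
    # (x, inf) equals phi(x), giving the inverse Mills ratio below.
    return phi(x) / (1.0 - Phi(x))

def delta(xbar, c):
    # The estimator displayed in part (c).
    return xbar - zeta(math.sqrt(2.0) * (c - xbar)) / math.sqrt(2.0)

c = 1.96
# Part (d): for xbar far above c, the correction term vanishes.
gap = delta(10.0, c) - 10.0  # numerically indistinguishable from 0

# Hint for part (e): the tail inequality holds at these test points.
mills_ok = all(
    (1.0 / x - 1.0 / x**3) * phi(x) <= 1.0 - Phi(x) <= phi(x) / x
    for x in (0.5, 1.0, 2.0, 4.0)
)
```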
{\bf Moral:} This is a nice estimator that transitions adaptively between the data splitting estimator (when $X_1$ is subject to extreme selection bias) and the unadjusted sample mean (when $X_1$ is nearly unaffected by selection bias). It manages to do this even though we don't know how bad the selection bias is, since that depends on $\mu$. It would be difficult to come up with an estimator like this without the theory of exponential families and UMVU estimators, specifically the idea of Rao-Blackwellization.
\item[4. Poisson UMVU estimation]\hfill\\
Let $X_1,\ldots,X_n \simiid \text{Pois}(\theta)$ and consider estimating
\[
g(\theta) = e^{-\theta} = \PP_\theta(X_1 = 0).
\]
\begin{enumerate}[(a)]
\item Find the UMVU estimator for $g(\theta)$ by Rao-Blackwellizing a simple unbiased estimator. You may use without proof the fact that $(X_1,\ldots,X_n) \sim \text{Multinom}(t, (n^{-1},\ldots,n^{-1}) )$ given $\sum_{i=1}^n X_i = t$.
\item Find the UMVU estimator for $g(\theta)$ directly, using the power series method from class.
\end{enumerate}
{\bf Moral:} This problem is for practice deriving UMVU estimators using the two methods from class.
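The conditioning fact quoted in part (a) can be sanity-checked by simulation. The sketch below (illustrative parameter values only) conditions i.i.d. Poisson draws on their sum and compares the conditional mean of $X_1$ to the multinomial value $t/n$.

```python
import math, random

rng = random.Random(2)

def rpois(lam, rng):
    # Knuth's Poisson sampler (fine for small lam).
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

# Illustrative values: n = 3 Poisson(1) draws, conditioned on sum = t = 4.
n, theta, t = 3, 1.0, 4
kept = []
for _ in range(200000):
    xs = [rpois(theta, rng) for _ in range(n)]
    if sum(xs) == t:
        kept.append(xs[0])

# Under Multinom(t, (1/n, ..., 1/n)), X_1 is Binomial(t, 1/n) with mean t/n.
mean_x1 = sum(kept) / len(kept)
```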
\item[5. Complete sufficient statistic for a nonparametric family]\hfill\\
Consider an i.i.d. sample from the nonparametric family of {\em all} distributions on $\RR$:
\[
X_1,\ldots,X_n \simiid P.
\]
Formally we can write this model as $\cP = \left\{P^n:\; P \text{ is a probability measure on } \RR\right\}$. Let $T(X) = (X_{(1)},\ldots,X_{(n)})$ denote the vector of order statistics.
\begin{enumerate}[(a)]
\item For a finite set of size $m$, $\cY = \{y_1,\ldots,y_m\} \subseteq \RR$, consider the subfamily $\cP_\cY$ of distributions supported on $\cY$:
\[
\cP_\cY = \{P^n:\; P(\cY) = 1\} \subseteq \cP.
\]
Show that $T(X)$ is complete sufficient for this family.
{\bf Hint:} It may help to review different ways to parameterize the multinomial family.
\item Show that the vector of order statistics $T(X) = (X_{(1)},\ldots,X_{(n)})$ is a complete sufficient statistic for $\cP$.
\item Next, consider the restricted subfamily
\[
\cQ_k = \{P^n:\; \EE_P[|X_1|^k] < \infty\} \subseteq \cP,
\]
and define the sample mean and variance respectively as
\[
\widebar X = \frac{1}{n}\sum_{i=1}^n X_i, \quad S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \widebar X)^2.
\]
Show that $\widebar X$ is the UMVU estimator of $\EE_P X_1$ in $\cQ_1$, and $S^2$ is the UMVU estimator of $\text{Var}_P(X_1)$ in $\cQ_2$.
\item In the original family $\cP$, find the UMVU estimator of the probability
\[
\pi_c = \PP_P(X_1 \leq c).
\]
{\bf Note:} If we come up with estimators for every $c$ we can ``assemble'' them all into an estimator for the CDF of $P$.
\end{enumerate}
{\bf Moral:} Without any restrictions on the family $\cP$, we can't do much better than estimating population quantities with sample quantities (when the sample quantities are unbiased). In the case of the mean, for example, $\widebar X$ is always available as an unbiased estimator of $\EE X$, but if we impose additional assumptions on the family then we might be able to do better.
\end{description}
\bibliography{biblio}
\bibliographystyle{plainnat}
\end{document}