
Friday, November 30, 2018

Maximum Likelihood Estimation

\newcommand{\THETA}{ \boldsymbol\theta } \newcommand{\x}{ \boldsymbol{x} }
\newcommand{\pdata}{ p_{\text{data}}(\x) } \newcommand{\pmodel}[2][;\THETA]{ p_{\text{model}}(#2#1) }
\DeclareMathOperator*{\argmax}{ arg\,max }

In statistics there are many ways to define good estimators for quantities of a model, such as its mean or variance. However, instead of relying on authority and tradition to provide such estimators, it would be useful to have a principled approach for deriving them.

Hence, we are going to have a look at maximum likelihood estimation, which is a very commonly used principle for deriving the sought-after estimators.

Let’s start by examining m independent samples \mathbb{X} = \{\x^{(1)}, \ldots, \x^{(m)}\} drawn from a distribution \pdata, which itself is unknown. We can then write down the model distribution as:

\begin{equation} p_{\text{model}}: (\x;\THETA) \mapsto \hat{p}\in\mathbb{R} \;\hat{=}\; \pdata \end{equation}

where \THETA parametrizes the family of probability distributions \pmodel{\x}, so that each choice of \THETA maps an input \x to an estimate of the true probability \pdata. The maximum likelihood estimator for \THETA is then defined as:

\begin{align} \THETA_{\text{ML}} &= \argmax_{\THETA} \pmodel{\mathbb{X}} \\ &= \argmax_{\THETA} \prod_{i=1}^{m} \pmodel{\x^{(i)}} \end{align}
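To make this concrete, here is a small worked example (the Gaussian model is purely illustrative and not part of the general argument): suppose each sample is a scalar x^{(i)} and the model is a Gaussian with unknown mean \theta and variance fixed to 1. The product above then becomes:

\begin{equation} \theta_{\text{ML}} = \argmax_{\theta} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{\left( x^{(i)} - \theta \right)^2}{2} \right) \end{equation}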

Please note that working with the product \prod is rather inconvenient in practice: multiplying many probabilities smaller than one quickly leads to numerical underflow, and products are cumbersome to differentiate. Hence, let’s transform it into a sum \sum by taking the logarithm:

\begin{equation} \THETA_{\text{ML}} = \argmax_{\THETA} \sum_{i=1}^{m} \log\pmodel{\x^{(i)}} \end{equation}
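Continuing the illustrative Gaussian example from above, the logarithm turns the product into a sum of quadratic terms:

\begin{align} \theta_{\text{ML}} &= \argmax_{\theta} \sum_{i=1}^{m} \left( -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\left( x^{(i)} - \theta \right)^2 \right) \\ &= \argmax_{\theta} \left( -\tfrac{1}{2} \sum_{i=1}^{m} \left( x^{(i)} - \theta \right)^2 \right) \end{align}

Setting the derivative with respect to \theta to zero gives \theta_{\text{ML}} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, i.e. the familiar sample mean.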

This works because the logarithm is a strictly increasing transformation and therefore does not change the \argmax over \THETA. We can continue transforming by dividing by m, which likewise has no effect on the \argmax over \THETA:

\begin{equation} \THETA_{\text{ML}} = \argmax_{\THETA} \frac{1}{m} \sum_{i=1}^{m} \log\pmodel{\x^{(i)}} \end{equation}

which we can then express as an expectation with respect to the empirical distribution \hat{p}_\text{data}:

\begin{equation} \THETA_{\text{ML}} = \argmax_{\THETA} \mathbb{E}_{{\x} \sim {\hat{p}_\text{data}}} \log\pmodel{\x} \end{equation}

To summarize: by maximizing the expectation \mathbb{E} of the logarithm of our model distribution p_\text{model} over samples {\x} \sim {\hat{p}_\text{data}}, with respect to the parameter \THETA, we obtain the maximum likelihood estimate, which under suitable regularity conditions enjoys desirable properties such as consistency.
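To see this numerically, here is a minimal sketch (assuming NumPy and SciPy are available; the Gaussian model with fixed unit variance and all names are chosen purely for illustration). It maximizes the average log-likelihood over \theta and compares the result to the sample mean:

import numpy as np
from scipy.optimize import minimize_scalar

# Draw m samples from the "unknown" data distribution p_data.
rng = np.random.default_rng(0)
m = 1000
samples = rng.normal(loc=2.5, scale=1.0, size=m)

def avg_neg_log_likelihood(theta):
    # Average negative log-likelihood of a Gaussian with mean theta and variance 1.
    return -np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (samples - theta) ** 2)

# Maximizing the average log-likelihood is the same as minimizing its negative.
result = minimize_scalar(avg_neg_log_likelihood)

print("theta_ML (numerical):", result.x)
print("sample mean         :", samples.mean())

Both printed values should agree up to the optimizer's tolerance, matching the closed-form result derived in the Gaussian example above.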

1 comment:

  1. This presentation largely follows Goodfellow, Bengio, and Courville (2016). Deep Learning. Cambridge, MA: MIT Press. p. 128.
