
Friday, November 30, 2018

Maximum Likelihood Estimation

There are many ways to define good estimators for a model in statistics, such as the sample mean or variance. However, instead of relying on authority and tradition to provide such estimators, it would be useful to have a principled approach for deriving them.

Hence, we are going to take a look at maximum likelihood estimation, a very commonly used principle for deriving the sought-after estimators.

Let’s start by examining $m$ independent samples $\mathbb{X} = \{x^{(1)}, \ldots, x^{(m)}\}$ drawn from the true data-generating distribution $p_{\text{data}}(x)$, which is itself not known. We then define a model distribution that approximates it:

$$p_{\text{model}}(x; \theta) \approx p_{\text{data}}(x)$$

where $\theta$ is a parameter indexing the family of probability distributions $p_{\text{model}}(x; \theta)$, which maps any configuration $x$ to a real number estimating the true probability $p_{\text{data}}(x)$. The maximum likelihood estimator for $\theta$ is then defined as:

$$\theta_{ML} = \arg\max_{\theta} \, p_{\text{model}}(\mathbb{X}; \theta) = \arg\max_{\theta} \prod_{i=1}^{m} p_{\text{model}}(x^{(i)}; \theta)$$
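As a quick sanity check, here is a minimal sketch of this product form. It assumes a Bernoulli model family and made-up binary samples (both are illustrative choices, not part of the derivation above), and maximizes the likelihood over a simple grid of candidate $\theta$ values:

```python
import numpy as np

# Hypothetical i.i.d. binary samples (stand-in for draws from p_data).
samples = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def likelihood(theta, x):
    """Product of p_model(x_i; theta) for a Bernoulli family."""
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

# Grid search over candidate parameters theta in (0, 1).
thetas = np.linspace(0.01, 0.99, 99)
theta_ml = thetas[np.argmax([likelihood(t, samples) for t in thetas])]
```

For a Bernoulli model the maximizer coincides with the fraction of ones in the sample, so the grid search lands on the sample mean (here 6/8 = 0.75).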

Please note that working with a product of many probabilities is rather inconvenient for a variety of reasons, such as numerical underflow. Hence, let’s transform it into a sum by taking the logarithm:

$$\theta_{ML} = \arg\max_{\theta} \sum_{i=1}^{m} \log p_{\text{model}}(x^{(i)}; \theta)$$
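To see concretely why the product form is inconvenient, the following sketch (assuming a unit-variance Gaussian model and synthetic samples, both illustrative choices) shows the raw likelihood product underflowing to zero in double precision, while the sum of log-densities stays finite:

```python
import numpy as np

# Synthetic samples from a unit-variance Gaussian (mean unknown to the model).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=2000)

def gaussian_pdf(x, mu):
    """Density of N(mu, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# The raw likelihood product of 2000 densities underflows to exactly 0.0...
product_likelihood = np.prod(gaussian_pdf(x, 2.0))

# ...while the sum of log-densities remains finite and usable.
sum_log_likelihood = np.sum(np.log(gaussian_pdf(x, 2.0)))
```

Each density here is on the order of 0.4, so the product shrinks like $0.4^{2000}$, far below the smallest representable double; the log transform turns this into a well-behaved sum.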

This works because the logarithm is strictly increasing, so applying it does not change the $\arg\max$ over $\theta$. We can continue transforming by dividing by $m$, which likewise has no effect on the $\arg\max$:

$$\theta_{ML} = \arg\max_{\theta} \frac{1}{m} \sum_{i=1}^{m} \log p_{\text{model}}(x^{(i)}; \theta)$$

which we can then express as an expectation with respect to the empirical distribution $\hat{p}_{\text{data}}$:

$$\theta_{ML} = \arg\max_{\theta} \, \mathbb{E}_{x \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(x; \theta)$$

Summarized: by maximizing the expectation $\mathbb{E}$ of the logarithm of our model distribution $p_{\text{model}}$ over samples $x \sim \hat{p}_{\text{data}}$ with respect to the parameter $\theta$, we tend to end up with good estimators.
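The expectation form above can be sketched numerically. Assuming a unit-variance Gaussian model and synthetic data (both illustrative assumptions), maximizing the mean log-likelihood over a grid of candidate means recovers the sample mean:

```python
import numpy as np

# Synthetic samples standing in for draws from p_data.
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=500)

def mean_log_likelihood(mu, x):
    """E_{x ~ p_hat_data}[log p_model(x; mu)] for a Gaussian with unit variance."""
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)

# Maximize the mean log-likelihood over a grid of candidate parameters.
mus = np.linspace(0.0, 6.0, 601)
mu_ml = mus[np.argmax([mean_log_likelihood(m, x) for m in mus])]
```

For this model the mean log-likelihood is a downward-opening quadratic in $\mu$, so the grid maximizer lands on the grid point closest to the sample mean.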


  1. This presentation largely follows Goodfellow et al. (2016). Deep Learning. MIT Press, p. 128.
