Maximum Likelihood for Classification

Let’s say we want to classify an input text \(y\) and give it a label \(x\). Formally, we want to find:

\[ \underset{x}{\textrm{argmax}} \; P(x | y) \]

By Bayes’ rule this is the same as

\[ \underset{x}{\textrm{argmax}} \; \frac{P(y|x)P(x)}{P(y)} \]

Suppose we have five documents as training data and one document as test input. Our objective is to assign a label to the test document.

text-example

Credit: Eunsol Choi

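Let’s define the probability of a class \(x\) as follows, where \(count(x)\) is the number of training documents labeled \(x\) and \(N\) is the total number of training documents: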

\[ p(x) = \frac{count(x)}{N} \]
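As a minimal sketch of this estimate, the snippet below computes class priors from a list of labels, taking \(N\) to be the number of training documents. The labels and the helper name `class_priors` are made up for illustration; they are not the documents from the example above.

```python
from collections import Counter

# Hypothetical training labels, one per document (illustrative only).
train_labels = ["sports", "sports", "politics", "sports", "politics"]


def class_priors(labels):
    """Return p(x) = count(x) / N for every class x, with N = len(labels)."""
    counts = Counter(labels)
    n_docs = len(labels)
    return {label: count / n_docs for label, count in counts.items()}


priors = class_priors(train_labels)
```

With these made-up labels, three of the five documents are `sports`, so its prior is 3/5 and the prior of `politics` is 2/5.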

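and the probability of a word \(w_i\) appearing given a class label \(x\), with add-one (Laplace) smoothing. Here \(count(w_i, x)\) is the number of times \(w_i\) occurs in documents of class \(x\), \(count(x)\) is the total number of word tokens in those documents, and \(|V|\) is the size of the vocabulary: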

\[ p(w_i|x) = \frac{count(w_i,x) + 1}{count(x) + |V|} \]
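The smoothed estimate can be sketched in a few lines of Python. This assumes the denominator counts all word tokens in documents of the class and that the vocabulary is every distinct word seen in training. The toy documents and the helper name `smoothed_likelihoods` are hypothetical, not the data from the example above.

```python
from collections import Counter

# Hypothetical tokenized training documents with labels (illustrative only).
train_docs = [
    (["great", "game", "great", "win"], "sports"),
    (["vote", "win"], "politics"),
]


def smoothed_likelihoods(docs):
    """Return p(w | x) = (count(w, x) + 1) / (tokens in class x + |V|)."""
    # Vocabulary: every distinct word seen anywhere in the training data.
    vocab = {word for tokens, _ in docs for word in tokens}

    # Per-class word counts, pooled over all documents of that class.
    word_counts = {}
    for tokens, label in docs:
        word_counts.setdefault(label, Counter()).update(tokens)

    likelihoods = {}
    for label, counts in word_counts.items():
        total_tokens = sum(counts.values())
        denominator = total_tokens + len(vocab)
        # A Counter returns 0 for unseen words, so every word gets at least 1.
        likelihoods[label] = {w: (counts[w] + 1) / denominator for w in vocab}
    return likelihoods


likelihoods = smoothed_likelihoods(train_docs)
```

Adding 1 to every count keeps a word that never occurs in a class from receiving probability zero, and adding \(|V|\) to the denominator keeps the probabilities for each class summing to one.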

The conditional probabilities \(p(w_i|x)\) are

conditional-probabilities

Now, we want to find out which...