
Beyond binary classification

Binary Classification

In binary classification, given:

  • an input space \(X\)
  • an unknown distribution \(\mathcal{D}\) over \(X \times \{ -1, +1 \}\)
  • a training set \(D\) sampled from \(\mathcal{D}\)

Compute: A function \(f\) minimizing \(\mathbb{E}_{(x, y)\sim \mathcal{D}}[f(x) \ne y]\)

Example: linear classification, which separates the two classes with a line.
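
As a concrete check of the objective, here is a minimal Python sketch that estimates the 0/1 risk of a classifier on a finite sample (the linear classifier and the data points are made-up illustrations, not from the notes):

import numpy as np

def zero_one_risk(f, X, y):
    # Empirical estimate of E_{(x,y)~D}[f(x) != y] over a sample.
    predictions = np.array([f(x) for x in X])
    return np.mean(predictions != y)

# Hypothetical linear classifier f(x) = sign(w . x).
w = np.array([1.0, -2.0])
f = lambda x: 1 if np.dot(w, x) >= 0 else -1

X = np.array([[0.5, 0.1], [-1.0, 0.3], [0.2, 0.4]])
y = np.array([1, -1, -1])
print(zero_one_risk(f, X, y))  # fraction of misclassified examples, here 0.0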

Multi-class classification

In multi-class classification, given:

  • An input space \(X\) and number of classes \(K\)
  • An unknown distribution \(\mathcal{D}\) over \(X \times [K]\)
  • A training set \(D\) sampled from \(\mathcal{D}\)

Compute: A function \(f\) minimizing \(\mathbb{E}_{(x, y)\sim \mathcal{D}}[f(x) \ne y]\)

Idea: a single line does not suffice, but we can combine several binary classifiers.

There are two approaches to achieve multi-class classification which we will discuss:

  • one vs all
  • all vs all

One vs All (OVA)

For each label \(k=1, ..., K\) define a binary problem where:

  • all examples with label \(k\) are positive
  • all other examples are negative

In practice, learn \(K\) different classification models.

To classify, we pick the most confident positive prediction; if no classifier votes positive, we pick the least confident negative.

OneVsAllTrain(D, BinaryTrain):
  for i=1 to K do
    D_i = relabel D so class i is positive and all other classes are negative
    f_i = BinaryTrain(D_i)
  end for
  return f_1,...,f_K

OneVsAllTest(f_1,...,f_K, x):
  score = (0,...,0)
  for i=1 to K do
    y = f_i(x)
    score[i] = score[i] + y
  end for
  return argmax(score)
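
A runnable Python version of the same procedure, as a sketch: binary_train is assumed to return a real-valued scoring function (positive means "positive class"), and the centroid-based trainer below is just a toy stand-in for a real binary learner.

import numpy as np

def centroid_binary_train(X, y):
    # Toy binary learner: score = distance to the negative centroid
    # minus distance to the positive centroid (positive = closer to class +1).
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    return lambda x: np.linalg.norm(x - mu_neg) - np.linalg.norm(x - mu_pos)

def one_vs_all_train(X, y, K, binary_train):
    # One binary problem per class: class i positive, all others negative.
    return [binary_train(X, np.where(y == i, 1, -1)) for i in range(K)]

def one_vs_all_predict(classifiers, x):
    # Pick the class whose classifier is most confident.
    scores = [f(x) for f in classifiers]
    return int(np.argmax(scores))

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8], [0.0, 5.0]])
y = np.array([0, 0, 1, 1, 2])
fs = one_vs_all_train(X, y, K=3, binary_train=centroid_binary_train)
print(one_vs_all_predict(fs, np.array([4.9, 5.1])))  # expected: 1

Note that because the scores are real-valued, a single argmax implements both rules at once: the most confident positive wins, and if every score is negative the least confident negative wins.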

All vs All (AVA)

All vs All, sometimes called all pairs, trains \(K(K-1)/2\) binary classifiers (a runnable sketch follows the list):

  • for each pair \((i, j)\) with \(i < j\), the classifier \(F_{ij}\) receives all the examples of class \(i\) as positive and all the examples of class \(j\) as negative
  • every time \(F_{ij}\) predicts positive, class \(i\) gets a vote; otherwise, class \(j\) gets a vote
  • after running all \(K(K-1)/2\) classifiers, the class with the most votes wins
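
A minimal AVA sketch in the same style (it reuses the toy centroid_binary_train from the OVA sketch above; itertools.combinations enumerates the pairs with \(i < j\)):

import numpy as np
from itertools import combinations

def all_vs_all_train(X, y, K, binary_train):
    # One classifier per pair (i, j), i < j:
    # class i is relabeled +1, class j is relabeled -1.
    classifiers = {}
    for i, j in combinations(range(K), 2):
        mask = (y == i) | (y == j)
        classifiers[(i, j)] = binary_train(X[mask], np.where(y[mask] == i, 1, -1))
    return classifiers

def all_vs_all_predict(classifiers, x, K):
    votes = np.zeros(K)
    for (i, j), f in classifiers.items():
        if f(x) > 0:
            votes[i] += 1  # F_ij predicted positive: class i gets the vote
        else:
            votes[j] += 1  # otherwise class j gets the vote
    return int(np.argmax(votes))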

Note: The teacher might ask you to explain the algorithm in more detail

OVA vs AVA

Train time:

  • AVA learns more classifiers; however, each one is trained on much less data, which tends to make training faster overall if the labels are roughly balanced (see the count sketch after this list)

Test time:

  • AVA must evaluate more classifiers, so it is often slower

Error:

  • AVA tests with more classifiers and therefore has more chances for errors
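
To make the train-time trade-off concrete, a back-of-the-envelope count (K = 10 classes and n = 10000 examples are hypothetical numbers; balanced labels assumed):

K, n = 10, 10000
ova_models = K                    # 10 classifiers, each trained on all n examples
ava_models = K * (K - 1) // 2     # 45 classifiers, each trained on ~2n/K examples
print(ova_models * n)             # total examples seen by OVA: 100000
print(ava_models * (2 * n // K))  # total examples seen by AVA: 90000

The totals are similar, so AVA's speed advantage comes mainly from trainers whose cost grows faster than linearly in the training-set size: many small problems are then cheaper than a few large ones.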

Macroaveraging vs Microaveraging

  • Microaveraging: pool all examples together and average over examples
  • Macroaveraging: calculate the evaluation score (e.g. accuracy) for each label, then average over labels (see the sketch below)
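
A small sketch of the difference, using accuracy as the per-label score (the data is an invented example):

import numpy as np

def micro_macro_accuracy(y_true, y_pred, K):
    correct = (y_true == y_pred)
    micro = correct.mean()  # one average over all examples
    # Macro: accuracy on each label first, then average over labels.
    per_label = [correct[y_true == k].mean() for k in range(K)]
    return micro, float(np.mean(per_label))

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 0, 2, 2, 0, 0])
print(micro_macro_accuracy(y_true, y_pred, K=3))  # (0.7, ~0.667)

Microaveraging lets frequent labels dominate the score; macroaveraging weights every label equally, so rare labels count as much as common ones.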
