
Linear Models

Some machine learning approaches make strong assumptions about the data. If the assumptions hold, this can lead to better performance; if they do not, the model can fail badly. Other approaches, such as k-NN, make few assumptions about the data, so they are very general but prone to overfitting.

Bias

The bias of a model is a measure of how strong its assumptions are:

  • low-bias classifiers make minimal assumptions about the data
  • high-bias classifiers make strong assumptions about the data

Linear Model (Perceptron)

A strong (high-bias) assumption is linear separability:

  • in 2 dimensions, the classes can be separated by a line
  • in higher dimensions, they can be separated by a hyperplane

Any hyperplane defines a partition of the space.

Any pair of values \((w_1, w_2)\), together with an offset \(b\), defines a line (using the Cartesian equation of a line); when \(b=0\) the line passes through the origin:

\[0=w_1f_1+w_2f_2+b\]

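  • \(b\) is called the bias (an offset term that shifts the line away from the origin, not to be confused with the model bias described above)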

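In \(n\) dimensions, a hyperplane is defined by:

\[0=b+\sum_{i=1}^{n}w_if_i\]

We classify an example with a linear model by checking the sign of \(b+\sum_{i=1}^{n}w_if_i\): it is positive on one side of the hyperplane and negative on the other.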
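The sign check can be sketched in a few lines of Python. This is only an illustration: the function name `predict`, the use of \(+1\) and \(-1\) as class labels, and the choice to assign points lying exactly on the hyperplane to the negative class are assumptions, not part of the notes.

```python
def predict(weights, bias, features):
    """Classify one example by the side of the hyperplane it falls on.

    Returns +1 or -1 (assumed label convention). Points exactly on the
    hyperplane (activation == 0) are assigned to -1 by assumption.
    """
    activation = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if activation > 0 else -1


# The line f1 + f2 - 1 = 0, i.e. w = (1, 1) and b = -1.
print(predict([1, 1], -1, [2, 2]))  # 1  (activation = 3)
print(predict([1, 1], -1, [0, 0]))  # -1 (activation = -1)
```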

How to train a linear model

Unlike k-NN, linear models such as the perceptron can be trained with online learning, where the model is updated one example at a time. Online learning is useful for:

  • data streams
  • large-scale datasets
  • privacy-preserving applications

To learn a linear model:

  • the algorithm receives an unlabeled example \(x_i\)
    • e.g. weights \(w = (1, 0)\), bias \(b = 0\), and example \(p = (-1, 1)\)
  • the algorithm predicts a classification of this example
    • \(1 \cdot (-1) + 0 \cdot 1 = -1\), so the predicted class is negative
  • the algorithm is then told the correct answer \(y_i\) and updates the model
    • idea: when a prediction is wrong, we look at which weights contributed most to the wrong answer and change those the most

In pseudocode:

repeat until convergence (or for a fixed number of iterations):
  for each training example (f1, f2, ..., fn, label), in random order:
    prediction = sign(b + w1*f1 + w2*f2 + ... + wn*fn)
    if prediction != label, update all the weights:
      for each wi:
        wi = wi + fi*label
      b = b + label
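The training loop above might look roughly like this in Python. This is a minimal sketch under several assumptions: labels are \(+1\) or \(-1\), weights and bias start at zero, and an example with activation exactly zero counts as a mistake. The names `train_perceptron`, `max_epochs`, and `seed`, and the toy `and_data` set are all made up for illustration.

```python
import random


def train_perceptron(examples, n_features, max_epochs=100, seed=0):
    """Train a perceptron on (features, label) pairs, label in {+1, -1}.

    Stops early after a full pass with no mistakes, otherwise after
    max_epochs passes. Returns (weights, bias).
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    bias = 0.0
    examples = list(examples)  # copy, so the caller's list is not shuffled

    for _ in range(max_epochs):
        rng.shuffle(examples)  # visit the examples in random order
        mistakes = 0
        for features, label in examples:
            activation = bias + sum(w * f for w, f in zip(weights, features))
            if activation * label <= 0:  # wrong side of (or on) the hyperplane
                # Each weight moves in proportion to its feature value.
                weights = [w + f * label for w, f in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # converged: every example is classified correctly
            break
    return weights, bias


# Toy linearly separable data (logical AND), made up for this example.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_data, n_features=2)
print(all(label * (b + sum(wi * fi for wi, fi in zip(w, f))) > 0
          for f, label in and_data))  # True
```

Because the update adds `f * label` to each weight, features with a larger magnitude change their weights more, which matches the idea of adjusting the weights that contributed most to the wrong answer.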

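The algorithm is guaranteed to converge only if the data is linearly separable. Otherwise it never completes a pass without mistakes, which is why the loop is also capped at a number of iterations.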
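To see the convergence condition in practice, the sketch below runs the same update rule on two toy datasets: logical AND (linearly separable) and XOR (not linearly separable). The function name `converges`, the `max_epochs` cap, and both datasets are assumptions made for illustration; examples are visited in a fixed order to keep the sketch short.

```python
def converges(examples, n_features, max_epochs=1000):
    """Return True if a full pass with no mistakes occurs within max_epochs."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            activation = bias + sum(w * f for w, f in zip(weights, features))
            if activation * label <= 0:  # misclassified (or on the boundary)
                weights = [w + f * label for w, f in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # gave up: the iteration cap was reached


and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
xor_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print(converges(and_data, 2))  # True
print(converges(xor_data, 2))  # False
```

No line separates the XOR classes, so every pass contains at least one mistake and only the iteration cap stops the loop.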

