Machine Learning 3 - Support Vector Machines¶
An SVM classifier builds a set of hyperplanes that separate the classes while maximizing the margin, i.e. the distance between each hyperplane and the nearest data points.

This separation is generally not achievable in the original data space. The first step of an SVM is therefore to (implicitly) project the data into a higher- or even infinite-dimensional space in which a linear separation becomes possible. This projection is done with a kernel: linear, polynomial, or, most commonly, the "RBF" (radial basis function) kernel.
from lab_tools import CIFAR10, evaluate_classifier, get_hog_image
dataset = CIFAR10('../../extern_data/CIFAR10/')
Pre-loading training data
Pre-loading test data
Build a simple SVM using the SVC (Support Vector Classification) class from sklearn. Train it on the CIFAR dataset.
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
# -- Your code here -- #
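A minimal sketch of this step, using synthetic data as a stand-in for the CIFAR-10 features (the `lab_tools` loader's attribute names are not shown in this notebook, so random arrays of the same shape are used here):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for CIFAR-10: 200 samples of 1024 grayscale pixel
# values (flattened 32x32 images), 3 classes. With the real dataset you
# would pass the training features and labels from `dataset` instead.
rng = np.random.default_rng(0)
X_train = rng.random((200, 1024))
y_train = rng.integers(0, 3, size=200)

# RBF kernel is the SVC default; C and gamma are tuned later.
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)
print("training accuracy:", clf.score(X_train, y_train))
```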
Explore the classifier. How many support vectors are there? What are support vectors?
#all_support_vectors = clf.support_vectors_  # each row = one support vector; 1024 columns forming a 32x32 image
#vectors_per_class = clf.n_support_  # number of support vectors per class
# -- Your code here -- #
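One way to inspect the fitted classifier, sketched on synthetic data of the same shape as the CIFAR images (support vectors are the training samples that lie on or inside the margin; only they determine the decision boundary):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((150, 1024))          # stand-in for flattened 32x32 images
y = rng.integers(0, 3, size=150)

clf = SVC(kernel='rbf').fit(X, y)

# Each row of support_vectors_ is one support vector (a 1024-pixel image).
print("total support vectors:", clf.support_vectors_.shape[0])
print("per class:", clf.n_support_)         # one count per class
print("indices into X:", clf.support_[:5])  # positions of the first few
```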
Try to find the best "C" (error penalty) and "gamma" parameters using cross-validation. What influence does "C" have on the number of support vectors?
# -- Your code here -- #
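A sketch of the cross-validated search with `GridSearchCV` and the `StratifiedKFold` imported above, again on synthetic data; the final loop illustrates the expected trend for "C" (a larger C penalizes margin violations more heavily, so fewer points remain inside the margin and the support-vector count tends to shrink):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.random((120, 64))
y = rng.integers(0, 2, size=120)

# Grid of candidate hyper-parameters; a log-spaced grid is typical.
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.001, 0.01, 0.1]}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=cv)
search.fit(X, y)
print("best parameters:", search.best_params_)

# Effect of C on the number of support vectors:
for C in [0.01, 1, 100]:
    n = SVC(kernel='rbf', C=C).fit(X, y).n_support_.sum()
    print(f"C={C}: {n} support vectors")
```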
Comparing algorithms¶
Using the best hyper-parameters that you found for each of the algorithms (kNN, Decision Trees, Random Forests, MLP, SVM):
- Re-train the models on the full training set.
- Compare their results on the test set.
# -- Your code here -- #
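The comparison can be sketched as below, on a synthetic dataset and with default-ish hyper-parameters standing in for the tuned values you found in the previous labs:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the CIFAR train/test split.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Replace these hyper-parameters with the best values you found.
models = {
    'kNN': KNeighborsClassifier(n_neighbors=5),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=0),
    'MLP': MLPClassifier(max_iter=1000, random_state=0),
    'SVM': SVC(kernel='rbf', C=1.0),
}

# Re-train each model on the full training set, evaluate on the test set.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```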