In [1]:
from lab_tools import CIFAR10, get_hog_image
dataset = CIFAR10('../../extern_data/CIFAR10/')
Pre-loading training data
Pre-loading test data
1. Nearest Neighbor
The following example uses the Nearest Neighbor algorithm on the Histogram of Oriented Gradients (HoG) descriptors in the dataset.
In [2]:
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(dataset.train['hog'], dataset.train['labels'])
Out[2]:
KNeighborsClassifier(n_neighbors=1)
- What is the descriptive performance of this classifier?
- Modify the code to estimate the predictive performance.
- Use cross-validation to find the best hyper-parameters for this method.
In [3]:
# -- Your code here -- #
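One possible sketch of the three tasks above. Since `lab_tools` and the CIFAR10 HoG arrays are not available here, scikit-learn's digits set stands in for the features and labels; on the lab data you would pass `dataset.train['hog']` and `dataset.train['labels']` instead (and whatever keys hold the test split, which this sketch only assumes).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for dataset.train['hog'] / dataset.train['labels'].
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)

# Descriptive performance: accuracy on the data the model was fit on.
# With n_neighbors=1 this is (nearly) perfect by construction, since
# every training point is its own nearest neighbour.
train_acc = clf.score(X_train, y_train)

# Predictive performance: accuracy on held-out data.
test_acc = clf.score(X_test, y_test)

# Cross-validation over the main hyper-parameter, the number of neighbours.
grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X_train, y_train)
print(train_acc, test_acc, grid.best_params_)
```

The gap between `train_acc` and `test_acc` is exactly why the descriptive score is a misleading estimate of how the classifier will behave on new images.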
2. Decision Trees
Decision Trees classify the data by splitting the feature space according to simple, single-feature rules. Scikit-learn uses the CART algorithm for its implementation of the classifier.
- Create a simple Decision Tree classifier using scikit-learn and train it on the HoG training set.
- Use cross-validation to find the best hyper-parameters for this method.
In [4]:
from sklearn import tree
# --- Your code here --- #
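A possible sketch for this exercise, again using scikit-learn's digits set as a stand-in for the HoG arrays. The hyper-parameter grid below (`max_depth`, `min_samples_leaf`) covers the usual CART knobs, but the specific values are only an illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn import tree

# Stand-in for dataset.train['hog'] / dataset.train['labels'].
X, y = load_digits(return_X_y=True)

# A simple Decision Tree classifier (scikit-learn's CART implementation).
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Cross-validate the tree's main regularisation hyper-parameters.
grid = GridSearchCV(
    tree.DecisionTreeClassifier(random_state=0),
    {'max_depth': [5, 10, 20, None],
     'min_samples_leaf': [1, 5, 10]},
    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Limiting `max_depth` or raising `min_samples_leaf` trades training accuracy for less over-fitting, which is what the cross-validated score measures.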
3. Random Forests
Random Forest classifiers use multiple decision trees trained on "weaker" datasets (fewer samples and/or fewer features), averaging the results so as to reduce over-fitting.
- Use scikit-learn to create a Random Forest classifier on the CIFAR data.
- Use cross-validation to find the best hyper-parameters for this method.
In [5]:
from sklearn import ensemble
# --- Your code here --- #
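A possible sketch for the Random Forest exercise, with the digits set standing in for the CIFAR HoG data as before. `n_estimators` (number of trees) and `max_features` (features considered per split) are the two hyper-parameters most directly tied to the "weaker datasets" idea described above; the grid values are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn import ensemble

# Stand-in for dataset.train['hog'] / dataset.train['labels'].
X, y = load_digits(return_X_y=True)

# A Random Forest: an ensemble of trees, each fit on a bootstrap sample
# and restricted to a random subset of features at every split.
clf = ensemble.RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Cross-validate the ensemble size and the per-split feature subset.
grid = GridSearchCV(
    ensemble.RandomForestClassifier(random_state=0),
    {'n_estimators': [50, 100],
     'max_features': ['sqrt', None]},
    cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

With `max_features=None` each split sees every feature, so the trees are more correlated; `'sqrt'` decorrelates them, which is usually what makes the averaging effective.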