Ensemble Learning - Decision Trees, Boosting (AdaBoost), Bagging (Random Forests)

Avisek Gupta, Senior Research Fellow, ECSU

Dr. Swagatam Das, Associate Professor, ECSU

A Short Course on Machine Learning for Practitioners

Organized by Centre for Artificial Intelligence and Machine Learning

Indian Statistical Institute, Kolkata.

November 22, 2019

1. Training Decision Trees:

(i) Generating random data

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

X = np.vstack((
    np.random.normal(loc=[0,0], scale=1, size=(100,2)),
    np.random.normal(loc=[10,0], scale=1, size=(100,2)),
    np.random.normal(loc=[5,6], scale=1, size=(100,2))
))
y = np.hstack((
    np.zeros((100)),
    np.zeros((100)) + 1,
    np.zeros((100)) + 2
))
n_classes = 3

plt.figure(dpi=200)
for j in range(n_classes):
    plt.scatter(X[y==j,0], X[y==j,1], marker='x')
plt.show()

(ii) Decision Tree Classification

import numpy as np
import matplotlib.pyplot as plt

from sklearn.tree import DecisionTreeClassifier

# Train a Decision Tree Classifier
clf = DecisionTreeClassifier().fit(X, y)

y_pred = clf.predict(X)
from sklearn.metrics import accuracy_score
print('Training Accuracy =', accuracy_score(y, y_pred))

# Plot the decision surface
plt.figure(dpi=200)
plot_colors = "ryb"
plot_step = 0.02
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
    np.arange(y_min, y_max, plot_step))
plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
for i, color in zip(range(n_classes), plot_colors):
    mask = (y == i)
    # A fixed color is passed through c, so no colormap is needed here
    plt.scatter(X[mask, 0], X[mask, 1], c=color,
        edgecolor='black', s=15)
plt.title("Decision surface of a decision tree")
plt.axis("tight")
plt.show()

Training Accuracy = 1.0
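A training accuracy of 1.0 is expected from a fully grown tree and says little about how the model generalizes. The sketch below regenerates similar data and scores the tree on a held-out split. The seed, the 70/30 split, and the names X_demo, y_demo are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Regenerate three Gaussian blobs like the ones above (fixed seed is an assumption)
rng = np.random.RandomState(0)
centers = [[0, 0], [10, 0], [5, 6]]
X_demo = np.vstack([rng.normal(loc=c, scale=1, size=(100, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2], 100)

# Hold out 30% of the points; stratify keeps the class proportions equal
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = tree.score(X_tr, y_tr)   # accuracy on the data the tree has seen
test_acc = tree.score(X_te, y_te)    # accuracy on unseen data

print('Training Accuracy =', train_acc)
print('Test Accuracy =', test_acc)
```

On well-separated blobs the two numbers will be close; on harder data the gap between them is a rough indicator of overfitting.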

help(DecisionTreeClassifier)

Help on class DecisionTreeClassifier in module sklearn.tree.tree:

class DecisionTreeClassifier(BaseDecisionTree, sklearn.base.ClassifierMixin)
 |  DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
 |  
 |  A decision tree classifier.
 |  
 |  Read more in the :ref:`User Guide <tree>`.
 |  
 |  Parameters
 |  ----------
 |  criterion : string, optional (default="gini")
 |      The function to measure the quality of a split. Supported criteria are
 |      "gini" for the Gini impurity and "entropy" for the information gain.
 |  
 |  splitter : string, optional (default="best")
 |      The strategy used to choose the split at each node. Supported
 |      strategies are "best" to choose the best split and "random" to choose
 |      the best random split.
 |  
 |  max_depth : int or None, optional (default=None)
 |      The maximum depth of the tree. If None, then nodes are expanded until
 |      all leaves are pure or until all leaves contain less than
 |      min_samples_split samples.
 |  
 |  min_samples_split : int, float, optional (default=2)
 |      The minimum number of samples required to split an internal node:
 |  
 |      - If int, then consider `min_samples_split` as the minimum number.
 |      - If float, then `min_samples_split` is a fraction and
 |        `ceil(min_samples_split * n_samples)` are the minimum
 |        number of samples for each split.
 |  
 |      .. versionchanged:: 0.18
 |         Added float values for fractions.
 |  
 |  min_samples_leaf : int, float, optional (default=1)
 |      The minimum number of samples required to be at a leaf node.
 |      A split point at any depth will only be considered if it leaves at
 |      least ``min_samples_leaf`` training samples in each of the left and
 |      right branches.  This may have the effect of smoothing the model,
 |      especially in regression.
 |  
 |      - If int, then consider `min_samples_leaf` as the minimum number.
 |      - If float, then `min_samples_leaf` is a fraction and
 |        `ceil(min_samples_leaf * n_samples)` are the minimum
 |        number of samples for each node.
 |  
 |      .. versionchanged:: 0.18
 |         Added float values for fractions.
 |  
 |  min_weight_fraction_leaf : float, optional (default=0.)
 |      The minimum weighted fraction of the sum total of weights (of all
 |      the input samples) required to be at a leaf node. Samples have
 |      equal weight when sample_weight is not provided.
 |  
 |  max_features : int, float, string or None, optional (default=None)
 |      The number of features to consider when looking for the best split:
 |  
 |          - If int, then consider `max_features` features at each split.
 |          - If float, then `max_features` is a fraction and
 |            `int(max_features * n_features)` features are considered at each
 |            split.
 |          - If "auto", then `max_features=sqrt(n_features)`.
 |          - If "sqrt", then `max_features=sqrt(n_features)`.
 |          - If "log2", then `max_features=log2(n_features)`.
 |          - If None, then `max_features=n_features`.
 |  
 |      Note: the search for a split does not stop until at least one
 |      valid partition of the node samples is found, even if it requires to
 |      effectively inspect more than ``max_features`` features.
 |  
 |  random_state : int, RandomState instance or None, optional (default=None)
 |      If int, random_state is the seed used by the random number generator;
 |      If RandomState instance, random_state is the random number generator;
 |      If None, the random number generator is the RandomState instance used
 |      by `np.random`.
 |  
 |  max_leaf_nodes : int or None, optional (default=None)
 |      Grow a tree with ``max_leaf_nodes`` in best-first fashion.
 |      Best nodes are defined as relative reduction in impurity.
 |      If None then unlimited number of leaf nodes.
 |  
 |  min_impurity_decrease : float, optional (default=0.)
 |      A node will be split if this split induces a decrease of the impurity
 |      greater than or equal to this value.
 |  
 |      The weighted impurity decrease equation is the following::
 |  
 |          N_t / N * (impurity - N_t_R / N_t * right_impurity
 |                              - N_t_L / N_t * left_impurity)
 |  
 |      where ``N`` is the total number of samples, ``N_t`` is the number of
 |      samples at the current node, ``N_t_L`` is the number of samples in the
 |      left child, and ``N_t_R`` is the number of samples in the right child.
 |  
 |      ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
 |      if ``sample_weight`` is passed.
 |  
 |      .. versionadded:: 0.19
 |  
 |  min_impurity_split : float, (default=1e-7)
 |      Threshold for early stopping in tree growth. A node will split
 |      if its impurity is above the threshold, otherwise it is a leaf.
 |  
 |      .. deprecated:: 0.19
 |         ``min_impurity_split`` has been deprecated in favor of
 |         ``min_impurity_decrease`` in 0.19. The default value of
 |         ``min_impurity_split`` will change from 1e-7 to 0 in 0.23 and it
 |         will be removed in 0.25. Use ``min_impurity_decrease`` instead.
 |  
 |  class_weight : dict, list of dicts, "balanced" or None, default=None
 |      Weights associated with classes in the form ``{class_label: weight}``.
 |      If not given, all classes are supposed to have weight one. For
 |      multi-output problems, a list of dicts can be provided in the same
 |      order as the columns of y.
 |  
 |      Note that for multioutput (including multilabel) weights should be
 |      defined for each class of every column in its own dict. For example,
 |      for four-class multilabel classification weights should be
 |      [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
 |      [{1:1}, {2:5}, {3:1}, {4:1}].
 |  
 |      The "balanced" mode uses the values of y to automatically adjust
 |      weights inversely proportional to class frequencies in the input data
 |      as ``n_samples / (n_classes * np.bincount(y))``
 |  
 |      For multi-output, the weights of each column of y will be multiplied.
 |  
 |      Note that these weights will be multiplied with sample_weight (passed
 |      through the fit method) if sample_weight is specified.
 |  
 |  presort : bool, optional (default=False)
 |      Whether to presort the data to speed up the finding of best splits in
 |      fitting. For the default settings of a decision tree on large
 |      datasets, setting this to true may slow down the training process.
 |      When using either a smaller dataset or a restricted depth, this may
 |      speed up the training.
 |  
 |  Attributes
 |  ----------
 |  classes_ : array of shape = [n_classes] or a list of such arrays
 |      The classes labels (single output problem),
 |      or a list of arrays of class labels (multi-output problem).
 |  
 |  feature_importances_ : array of shape = [n_features]
 |      The feature importances. The higher, the more important the
 |      feature. The importance of a feature is computed as the (normalized)
 |      total reduction of the criterion brought by that feature.  It is also
 |      known as the Gini importance [4]_.
 |  
 |  max_features_ : int,
 |      The inferred value of max_features.
 |  
 |  n_classes_ : int or list
 |      The number of classes (for single output problems),
 |      or a list containing the number of classes for each
 |      output (for multi-output problems).
 |  
 |  n_features_ : int
 |      The number of features when ``fit`` is performed.
 |  
 |  n_outputs_ : int
 |      The number of outputs when ``fit`` is performed.
 |  
 |  tree_ : Tree object
 |      The underlying Tree object. Please refer to
 |      ``help(sklearn.tree._tree.Tree)`` for attributes of Tree object and
 |      :ref:`sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py`
 |      for basic usage of these attributes.
 |  
 |  Notes
 |  -----
 |  The default values for the parameters controlling the size of the trees
 |  (e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
 |  unpruned trees which can potentially be very large on some data sets. To
 |  reduce memory consumption, the complexity and size of the trees should be
 |  controlled by setting those parameter values.
 |  
 |  The features are always randomly permuted at each split. Therefore,
 |  the best found split may vary, even with the same training data and
 |  ``max_features=n_features``, if the improvement of the criterion is
 |  identical for several splits enumerated during the search of the best
 |  split. To obtain a deterministic behaviour during fitting,
 |  ``random_state`` has to be fixed.
 |  
 |  See also
 |  --------
 |  DecisionTreeRegressor
 |  
 |  References
 |  ----------
 |  
 |  .. [1] https://en.wikipedia.org/wiki/Decision_tree_learning
 |  
 |  .. [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification
 |         and Regression Trees", Wadsworth, Belmont, CA, 1984.
 |  
 |  .. [3] T. Hastie, R. Tibshirani and J. Friedman. "Elements of Statistical
 |         Learning", Springer, 2009.
 |  
 |  .. [4] L. Breiman, and A. Cutler, "Random Forests",
 |         https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
 |  
 |  Examples
 |  --------
 |  >>> from sklearn.datasets import load_iris
 |  >>> from sklearn.model_selection import cross_val_score
 |  >>> from sklearn.tree import DecisionTreeClassifier
 |  >>> clf = DecisionTreeClassifier(random_state=0)
 |  >>> iris = load_iris()
 |  >>> cross_val_score(clf, iris.data, iris.target, cv=10)
 |  ...                             # doctest: +SKIP
 |  ...
 |  array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
 |          0.93...,  0.93...,  1.     ,  0.93...,  1.      ])
 |  
 |  Method resolution order:
 |      DecisionTreeClassifier
 |      BaseDecisionTree
 |      sklearn.base.BaseEstimator
 |      sklearn.base.MultiOutputMixin
 |      sklearn.base.ClassifierMixin
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  fit(self, X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
 |      Build a decision tree classifier from the training set (X, y).
 |      
 |      Parameters
 |      ----------
 |      X : array-like or sparse matrix, shape = [n_samples, n_features]
 |          The training input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csc_matrix``.
 |      
 |      y : array-like, shape = [n_samples] or [n_samples, n_outputs]
 |          The target values (class labels) as integers or strings.
 |      
 |      sample_weight : array-like, shape = [n_samples] or None
 |          Sample weights. If None, then samples are equally weighted. Splits
 |          that would create child nodes with net zero or negative weight are
 |          ignored while searching for a split in each node. Splits are also
 |          ignored if they would result in any single class carrying a
 |          negative weight in either child node.
 |      
 |      check_input : boolean, (default=True)
 |          Allow to bypass several input checking.
 |          Don't use this parameter unless you know what you do.
 |      
 |      X_idx_sorted : array-like, shape = [n_samples, n_features], optional
 |          The indexes of the sorted training input samples. If many tree
 |          are grown on the same dataset, this allows the ordering to be
 |          cached between trees. If None, the data will be sorted here.
 |          Don't use this parameter unless you know what to do.
 |      
 |      Returns
 |      -------
 |      self : object
 |  
 |  predict_log_proba(self, X)
 |      Predict class log-probabilities of the input samples X.
 |      
 |      Parameters
 |      ----------
 |      X : array-like or sparse matrix of shape = [n_samples, n_features]
 |          The input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csr_matrix``.
 |      
 |      Returns
 |      -------
 |      p : array of shape = [n_samples, n_classes], or a list of n_outputs
 |          such arrays if n_outputs > 1.
 |          The class log-probabilities of the input samples. The order of the
 |          classes corresponds to that in the attribute `classes_`.
 |  
 |  predict_proba(self, X, check_input=True)
 |      Predict class probabilities of the input samples X.
 |      
 |      The predicted class probability is the fraction of samples of the same
 |      class in a leaf.
 |      
 |      check_input : boolean, (default=True)
 |          Allow to bypass several input checking.
 |          Don't use this parameter unless you know what you do.
 |      
 |      Parameters
 |      ----------
 |      X : array-like or sparse matrix of shape = [n_samples, n_features]
 |          The input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csr_matrix``.
 |      
 |      check_input : bool
 |          Run check_array on X.
 |      
 |      Returns
 |      -------
 |      p : array of shape = [n_samples, n_classes], or a list of n_outputs
 |          such arrays if n_outputs > 1.
 |          The class probabilities of the input samples. The order of the
 |          classes corresponds to that in the attribute `classes_`.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __abstractmethods__ = frozenset()
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from BaseDecisionTree:
 |  
 |  apply(self, X, check_input=True)
 |      Returns the index of the leaf that each sample is predicted as.
 |      
 |      .. versionadded:: 0.17
 |      
 |      Parameters
 |      ----------
 |      X : array_like or sparse matrix, shape = [n_samples, n_features]
 |          The input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csr_matrix``.
 |      
 |      check_input : boolean, (default=True)
 |          Allow to bypass several input checking.
 |          Don't use this parameter unless you know what you do.
 |      
 |      Returns
 |      -------
 |      X_leaves : array_like, shape = [n_samples,]
 |          For each datapoint x in X, return the index of the leaf x
 |          ends up in. Leaves are numbered within
 |          ``[0; self.tree_.node_count)``, possibly with gaps in the
 |          numbering.
 |  
 |  decision_path(self, X, check_input=True)
 |      Return the decision path in the tree
 |      
 |      .. versionadded:: 0.18
 |      
 |      Parameters
 |      ----------
 |      X : array_like or sparse matrix, shape = [n_samples, n_features]
 |          The input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csr_matrix``.
 |      
 |      check_input : boolean, (default=True)
 |          Allow to bypass several input checking.
 |          Don't use this parameter unless you know what you do.
 |      
 |      Returns
 |      -------
 |      indicator : sparse csr array, shape = [n_samples, n_nodes]
 |          Return a node indicator matrix where non zero elements
 |          indicates that the samples goes through the nodes.
 |  
 |  get_depth(self)
 |      Returns the depth of the decision tree.
 |      
 |      The depth of a tree is the maximum distance between the root
 |      and any leaf.
 |  
 |  get_n_leaves(self)
 |      Returns the number of leaves of the decision tree.
 |  
 |  predict(self, X, check_input=True)
 |      Predict class or regression value for X.
 |      
 |      For a classification model, the predicted class for each sample in X is
 |      returned. For a regression model, the predicted value based on X is
 |      returned.
 |      
 |      Parameters
 |      ----------
 |      X : array-like or sparse matrix of shape = [n_samples, n_features]
 |          The input samples. Internally, it will be converted to
 |          ``dtype=np.float32`` and if a sparse matrix is provided
 |          to a sparse ``csr_matrix``.
 |      
 |      check_input : boolean, (default=True)
 |          Allow to bypass several input checking.
 |          Don't use this parameter unless you know what you do.
 |      
 |      Returns
 |      -------
 |      y : array of shape = [n_samples] or [n_samples, n_outputs]
 |          The predicted classes, or the predict values.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from BaseDecisionTree:
 |  
 |  feature_importances_
 |      Return the feature importances.
 |      
 |      The importance of a feature is computed as the (normalized) total
 |      reduction of the criterion brought by that feature.
 |      It is also known as the Gini importance.
 |      
 |      Returns
 |      -------
 |      feature_importances_ : array, shape = [n_features]
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.BaseEstimator:
 |  
 |  __getstate__(self)
 |  
 |  __repr__(self, N_CHAR_MAX=700)
 |      Return repr(self).
 |  
 |  __setstate__(self, state)
 |  
 |  get_params(self, deep=True)
 |      Get parameters for this estimator.
 |      
 |      Parameters
 |      ----------
 |      deep : boolean, optional
 |          If True, will return the parameters for this estimator and
 |          contained subobjects that are estimators.
 |      
 |      Returns
 |      -------
 |      params : mapping of string to any
 |          Parameter names mapped to their values.
 |  
 |  set_params(self, **params)
 |      Set the parameters of this estimator.
 |      
 |      The method works on simple estimators as well as on nested objects
 |      (such as pipelines). The latter have parameters of the form
 |      ``<component>__<parameter>`` so that it's possible to update each
 |      component of a nested object.
 |      
 |      Returns
 |      -------
 |      self
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from sklearn.base.BaseEstimator:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.ClassifierMixin:
 |  
 |  score(self, X, y, sample_weight=None)
 |      Returns the mean accuracy on the given test data and labels.
 |      
 |      In multi-label classification, this is the subset accuracy
 |      which is a harsh metric since you require for each sample that
 |      each label set be correctly predicted.
 |      
 |      Parameters
 |      ----------
 |      X : array-like, shape = (n_samples, n_features)
 |          Test samples.
 |      
 |      y : array-like, shape = (n_samples) or (n_samples, n_outputs)
 |          True labels for X.
 |      
 |      sample_weight : array-like, shape = [n_samples], optional
 |          Sample weights.
 |      
 |      Returns
 |      -------
 |      score : float
 |          Mean accuracy of self.predict(X) wrt. y.
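
The help text above mentions the Gini impurity (the default criterion) and gives the weighted impurity decrease formula used by min_impurity_decrease. A minimal sketch of both quantities for unweighted samples follows. The helper names gini and weighted_impurity_decrease are hypothetical and are not scikit-learn functions.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity_decrease(parent, left, right, n_total):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity
                           - N_t_L / N_t * left_impurity)"""
    n_t = len(parent)
    return n_t / n_total * (
        gini(parent)
        - len(right) / n_t * gini(right)
        - len(left) / n_t * gini(left)
    )

# A root node with two balanced classes, split perfectly into two pure children
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]

print(gini(parent))                                                   # 0.5
print(weighted_impurity_decrease(parent, left, right, len(parent)))  # 0.5
```

A perfect split removes all of the parent's impurity, so the decrease equals the parent's Gini value.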

from sklearn.datasets import load_iris

X = load_iris().data
print(X.shape)

y = load_iris().target
n_classes = len(np.unique(y))
print('#Classes =', n_classes)

(150, 4)
#Classes = 3

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the data once; the plotting code below uses
# iris.feature_names and iris.target_names
iris = load_iris()
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02
plt.figure(dpi=200)
for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # Only take two features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])
    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)
plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(borderpad=0, handletextpad=0, bbox_to_anchor=(1.05, 1.05))
plt.axis("tight")
plt.show()

2. Decision Trees are interpretable classification models

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load data
iris = load_iris()

plt.figure(dpi=200)
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, feature_names=load_iris().feature_names, filled=True)
plt.show()
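
Besides the plot, the learned rules can be printed as plain text, which is often easier to read and to paste into a report. The sketch below uses sklearn.tree.export_text. The depth limit of 2 and the fixed random_state are assumptions chosen only to keep the printout short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (assumed max_depth=2) gives a short, readable rule list
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(iris.data, iris.target)

# Each indented line is a threshold test; leaves report the predicted class
rules = export_text(small_tree, feature_names=list(iris.feature_names))
print(rules)
```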

3. Application of interpretable classification models: Medical Diagnosis

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree


from sklearn.datasets import load_breast_cancer
X = load_breast_cancer().data
y = load_breast_cancer().target


print('Data size and dimensions =', X.shape)
print('Number of classes =', len(np.unique(y)))


plt.figure(dpi=800)
clf = DecisionTreeClassifier().fit(X, y)
plot_tree(clf, feature_names=load_breast_cancer().feature_names, filled=True)
plt.show()

Data size and dimensions = (569, 30)
Number of classes = 2
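
For a diagnosis task it can also help to see which measurements the tree relies on most. A minimal sketch using the feature_importances_ attribute documented in the help text above follows. The fixed random_state and the choice to show five features are assumptions, and the ranking can change with a different seed or split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
clf_bc = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Normalized total impurity reduction per feature (the values sum to 1)
importances = clf_bc.feature_importances_

# Indices of the five most important features, largest first
top = np.argsort(importances)[::-1][:5]
for k in top:
    print(f"{data.feature_names[k]:<25s} {importances[k]:.3f}")
```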

4. Boosting: Training the AdaBoost Classifier

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
n_classes = len(np.unique(y))

print('data shape =', X.shape)
print('#Classes =', n_classes)

data shape = (12000, 10)
#Classes = 2

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import zero_one_loss
from sklearn.ensemble import AdaBoostClassifier


X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
X_test, y_test = X[2000:], y[2000:]
X_train, y_train = X[:2000], y[:2000]

dt_stump = DecisionTreeClassifier(max_depth=1)
dt_stump.fit(X_train, y_train)
dt_stump_err = 1.0 - dt_stump.score(X_test, y_test)

dt = DecisionTreeClassifier(max_depth=9)
dt.fit(X_train, y_train)
dt_err = 1.0 - dt.score(X_test, y_test)

n_estimators = 400
learning_rate = 1.

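# The base learner is passed positionally because its keyword is
# `base_estimator` in older scikit-learn releases and `estimator` in newer ones
ada_real = AdaBoostClassifier(
    dt_stump,
    learning_rate=learning_rate,
    n_estimators=n_estimators,
)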
ada_real.fit(X_train, y_train)

fig = plt.figure(dpi=200)
ax = fig.add_subplot(111)
ax.plot([1, n_estimators], [dt_stump_err] * 2, 'b-',
        label='Decision Stump Error')
ax.plot([1, n_estimators], [dt_err] * 2, 'b--',
        label='Decision Tree Error')
ada_real_err = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_real.staged_predict(X_test)):
    ada_real_err[i] = zero_one_loss(y_test, y_pred)
ada_real_err_train = np.zeros((n_estimators,))
for i, y_pred in enumerate(ada_real.staged_predict(X_train)):
    ada_real_err_train[i] = zero_one_loss(y_train, y_pred)
ax.plot(np.arange(n_estimators) + 1, ada_real_err,
        label='AdaBoost Test Error',
        color='orange')
ax.plot(np.arange(n_estimators) + 1, ada_real_err_train,
        label='AdaBoost Train Error',
        color='green')
ax.set_ylim((0.0, 0.5))
ax.set_xlabel('n_estimators')
ax.set_ylabel('error rate')
leg = ax.legend(loc='upper right', fancybox=True)
leg.get_frame().set_alpha(0.7)
plt.show()
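
To make the boosting idea concrete, here is a minimal sketch of discrete AdaBoost written by hand for labels in {-1, +1}, which is the label coding produced by make_hastie_10_2. It is an illustration under stated assumptions, not the exact algorithm inside AdaBoostClassifier: the number of rounds (50), the smaller sample size, and the helper boosted_predict are all choices made here for brevity.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.tree import DecisionTreeClassifier

X_h, y_h = make_hastie_10_2(n_samples=3000, random_state=1)  # labels are -1 / +1
X_tr, y_tr = X_h[:2000], y_h[:2000]
X_te, y_te = X_h[2000:], y_h[2000:]

n_rounds = 50                                  # assumed; kept small for speed
w = np.full(len(y_tr), 1.0 / len(y_tr))        # start from uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a stump on the currently weighted training set
    stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr, sample_weight=w)
    pred = stump.predict(X_tr)
    err = np.sum(w[pred != y_tr])              # weighted error (w sums to 1)
    err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against log(0)
    alpha = 0.5 * np.log((1 - err) / err)      # voting weight of this stump
    w *= np.exp(-alpha * y_tr * pred)          # raise weights of mistakes
    w /= w.sum()                               # renormalize to a distribution
    stumps.append(stump)
    alphas.append(alpha)

def boosted_predict(X):
    """Sign of the alpha-weighted sum of stump predictions."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)

stump_acc = stumps[0].score(X_te, y_te)
boosted_acc = np.mean(boosted_predict(X_te) == y_te)
print('Single stump test accuracy =', stump_acc)
print('Boosted stumps test accuracy =', boosted_acc)
```

Each round concentrates the weight on the points the previous stumps got wrong, which is why the combined vote does better than any single stump.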

5. Bagging: Training Random Forest of Decision Trees

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import plot_tree
from sklearn.metrics import accuracy_score

from sklearn.ensemble import RandomForestClassifier

# Load data
X = load_iris().data
y = load_iris().target

clf = RandomForestClassifier(n_estimators=10).fit(X, y)

print('Training Accuracy =', accuracy_score(y, clf.predict(X)))

i = 1
for dtree in clf.estimators_:
    print('Tree #'+str(i),':')
    plt.figure(dpi=200)
    plot_tree(dtree, feature_names=load_iris().feature_names,  filled=True)
    i = i + 1
    plt.show()

Training Accuracy = 1.0
Tree #1 :

Tree #2 :

Tree #3 :

Tree #4 :

Tree #5 :

Tree #6 :

Tree #7 :

Tree #8 :

Tree #9 :

Tree #10 :
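
The trees above differ because each one is trained on a different bootstrap sample of the data. A minimal hand-written sketch of that bagging step follows. It uses a hard majority vote as a simplification, whereas scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities. The seed, the 70/30 split, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_i, y_i = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_i, y_i, test_size=0.3, stratify=y_i, random_state=0)

rng = np.random.RandomState(0)
n_trees = 10
trees = []
for _ in range(n_trees):
    # Bootstrap: draw len(X_tr) row indices with replacement
    boot = rng.randint(0, len(X_tr), size=len(X_tr))
    # max_features="sqrt" also randomizes the features tried at each split
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=rng.randint(1_000_000))
    trees.append(tree.fit(X_tr[boot], y_tr[boot]))

# One row of predictions per tree, then a majority vote per test sample
all_preds = np.stack([t.predict(X_te) for t in trees]).astype(int)
vote = np.array([np.bincount(col, minlength=3).argmax() for col in all_preds.T])

bagged_acc = np.mean(vote == y_te)
print('Bagged trees test accuracy =', bagged_acc)
```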

6. Competing on the Digits data set:

(i) The Digits data set

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits

X = load_digits().data
y = load_digits().target

# Randomly select 10 images to be shown
rnd_idx = np.random.randint(0, X.shape[0], 10)

fig, ax = plt.subplots(2,5, dpi=150)
for i in range(2):
    for j in range(5):
        ax[i,j].imshow(X[rnd_idx[i*5+j]].reshape(8,8), cmap='gray')
        ax[i,j].set_xticks([],[])
        ax[i,j].set_yticks([],[])
        ax[i,j].set_title('Digit '+str(y[rnd_idx[i*5+j]]))
plt.show()

(ii) Visualizing the data set with TSNE

from sklearn.datasets import load_digits

X = load_digits().data
y = load_digits().target
n_classes = len(np.unique(y))

from sklearn.manifold import TSNE

projX = TSNE(n_components=2).fit_transform(X)

plt.figure(dpi=200)
for j in range(n_classes):
    plt.scatter(projX[y==j,0], projX[y==j,1], marker='x')
plt.title('TSNE projection of the digits data set')
plt.show()

(iii) Decision Trees vs. AdaBoost vs. Random Forest on the Digits data set

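import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier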

X = load_digits().data
y = load_digits().target
n_classes = len(np.unique(y))

X_train, X_test, y_train, y_test = train_test_split(X, y)

results = []
train_time = []
test_time = []

clf1 = DecisionTreeClassifier()
start = time.time()
clf1.fit(X_train, y_train)
train_time.append(time.time() - start)
start = time.time()  # restart the clock so test time excludes training
y_pred1 = clf1.predict(X_test)
test_time.append(time.time() - start)
results.append(accuracy_score(y_test, y_pred1))

# The base learner is passed positionally because its keyword is
# `base_estimator` in older scikit-learn releases and `estimator` in newer ones
clf2 = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=4),
    n_estimators=30,
)
start = time.time()
clf2.fit(X_train, y_train)
train_time.append(time.time() - start)
start = time.time()  # restart the clock so test time excludes training
y_pred2 = clf2.predict(X_test)
test_time.append(time.time() - start)
results.append(accuracy_score(y_test, y_pred2))

clf3 = RandomForestClassifier(n_estimators=30)
start = time.time()
clf3.fit(X_train, y_train)
train_time.append(time.time() - start)
start = time.time()  # restart the clock so test time excludes training
y_pred3 = clf3.predict(X_test)
test_time.append(time.time() - start)
results.append(accuracy_score(y_test, y_pred3))

train_time = np.array(train_time)
train_time = train_time / train_time.max()
test_time = np.array(test_time)
test_time = test_time / test_time.max()

indices = np.arange(len(results))
plt.figure(dpi=200)
plt.title("Score")
plt.barh(indices, results, .2, label="results", color='navy')
plt.barh(indices + .3, train_time, .2, label="train time",
         color='c')
plt.barh(indices + .6, test_time, .2, label="test time", color='darkorange')
plt.yticks(())
plt.legend(loc='best')
plt.subplots_adjust(left=.25)
plt.subplots_adjust(top=.95)
plt.subplots_adjust(bottom=.05)

clf_names = ['Decision Tree', 'AdaBoost', 'Random Forest']
for i, c in zip(indices, clf_names):
    plt.text(-.3, i, c)

plt.show()

test_idx = np.random.randint(0, X_test.shape[0], 10)

y_pred1 = clf1.predict(X_test[test_idx])
y_pred2 = clf2.predict(X_test[test_idx])
y_pred3 = clf3.predict(X_test[test_idx])

fig, ax = plt.subplots(2,5, dpi=150)
for i in range(2):
    for j in range(5):
        ax[i,j].imshow(X_test[test_idx[i*5+j]].reshape(8,8), cmap='gray')
        ax[i,j].set_xticks([],[])
        ax[i,j].set_yticks([],[])
        ax[i,j].set_xlabel('Digit '+str(y_test[test_idx[i*5+j]])+'\n'
            +'DT pred: '+str(y_pred1[i*5+j])+'\n'
            +'AdB pred: '+str(y_pred2[i*5+j])+'\n'
            +'RF pred: '+str(y_pred3[i*5+j]))
plt.show()
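
A single random train/test split can favor one model by chance. As a hedged complement to the comparison above, the sketch below scores the same three models with 5-fold cross-validation. The fold count, the fixed random_state values, and the names models and cv_scores are assumptions made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_d, y_d = load_digits(return_X_y=True)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # Base tree passed positionally: its keyword name differs across
    # scikit-learn versions (base_estimator vs. estimator)
    "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=4),
                                   n_estimators=30, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=30, random_state=0),
}

# One array of 5 fold accuracies per model
cv_scores = {name: cross_val_score(m, X_d, y_d, cv=5)
             for name, m in models.items()}

for name, s in cv_scores.items():
    print(f"{name:<14s} mean accuracy = {s.mean():.3f} (+/- {s.std():.3f})")
```

Reporting the mean together with the spread across folds makes it easier to judge whether a difference between two models is larger than the split-to-split noise.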

References:

1. The scikit-learn documentation: https://scikit-learn.org/stable/user_guide.html

For Queries: avisek003@gmail.com (Avisek Gupta)
