The science behind Ensemble Learning

Tooba Jamal
3 min read · Dec 10, 2021

Building multiple machine learning models and comparing their accuracies can be tiring and time-consuming if the problem is a bit complex. Luckily, we have ensemble learning algorithms that make this task easier for data scientists. As the name suggests, ensemble learning algorithms train a group of models on the same dataset and aggregate their predictions to return a more robust prediction that is less prone to errors.

Let’s look at different ensemble learning strategies that can be used to build models that perform better and give higher accuracy. The strategies we are going to discuss in this post are:

  1. Voting Classifier
  2. Bagging Classifier
  3. Boosting

Voting Classifier

A Voting Classifier trains multiple machine learning models on the same training data and predicts the output that receives the most votes. For example, suppose a Decision Tree Classifier, Logistic Regression, and K Nearest Neighbors are trained on a dataset to predict whether the gender of a service user is male or female. If Logistic Regression and the Decision Tree Classifier predict 1 and K Nearest Neighbors predicts 0 for a given input, then 1 will be the final prediction because it is what the majority predicted.
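
As a minimal sketch, here is how that could look with scikit-learn’s VotingClassifier. The training and test arrays (X_train, y_train, X_test) are assumed to already exist and are not part of the original example.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Combine three different models and let them vote on each prediction
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # "hard" = majority vote on the predicted labels
)

# X_train, y_train, X_test are assumed to be defined elsewhere
voting_clf.fit(X_train, y_train)
predictions = voting_clf.predict(X_test)
```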

Bagging (Bootstrap Aggregation)

Bagging, or Bootstrap Aggregation, is an ensemble method in which copies of the same model are trained on different subsets of the training dataset, sampled with replacement (bootstrap samples). The final prediction is an aggregate of the individual models’ predictions: averaging for regression problems and majority voting for classification problems. Bagging reduces the variance of the model and improves accuracy. The base estimator can be any model.
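
A minimal sketch with scikit-learn’s BaggingClassifier, using a decision tree as the base estimator; as before, X_train, y_train, and X_test are assumed to exist.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging_clf = BaggingClassifier(
    DecisionTreeClassifier(),  # base estimator; any model works here
    n_estimators=100,          # number of bootstrap samples / models
    bootstrap=True,            # draw each training subset with replacement
)

bagging_clf.fit(X_train, y_train)
predictions = bagging_clf.predict(X_test)  # majority vote of the 100 trees
```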

Boosting

This is a really interesting ensemble method in which every model learns from the errors of the previous model and tries to correct its predecessor. Two popular boosting methods are AdaBoost and Gradient Boosting. The default machine learning algorithm used in both AdaBoost and Gradient Boosting is a Decision Tree.

1. AdaBoost (Adaptive Boosting)

AdaBoost, or Adaptive Boosting, is a boosting method in which each new model pays attention to the instances wrongly predicted by its predecessor, assigning more weight to those misclassified instances so that the next model focuses on getting them right.
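
A minimal sketch with scikit-learn’s AdaBoostClassifier, which by default boosts shallow decision trees; X_train, y_train, and X_test are assumed to exist, and the parameter values below are illustrative only.

```python
from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(
    n_estimators=100,   # number of sequential weak learners
    learning_rate=0.5,  # shrinks each learner's contribution
)

ada_clf.fit(X_train, y_train)  # each new tree reweights the misclassified rows
predictions = ada_clf.predict(X_test)
```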

2. Gradient Boosting

Gradient Boosting turns weak learners into strong learners by fitting each new tree to the errors left by the previous trees, gradually minimizing the loss. However, Gradient Boosting involves an exhaustive search over split points and features, so the trees can end up looking very similar. The Stochastic Gradient Boosting technique addresses this by training each base learner (a decision tree) on a random subset of the data. In Stochastic Gradient Boosting, rows and features are sampled without replacement, which adds further diversity to the ensemble of trees.
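
A minimal sketch with scikit-learn’s GradientBoostingClassifier: setting subsample below 1.0 gives the stochastic variant (rows sampled without replacement), and max_features limits the features considered at each split. X_train, y_train, and X_test are assumed to exist, and the values shown are illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier

sgb_clf = GradientBoostingClassifier(
    n_estimators=200,   # number of boosting stages (trees)
    learning_rate=0.1,  # shrinks each tree's contribution
    subsample=0.8,      # train each tree on 80% of rows, sampled without replacement
    max_features=0.75,  # fraction of features considered per split
)

sgb_clf.fit(X_train, y_train)
predictions = sgb_clf.predict(X_test)
```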

Conclusion

Ensemble learning is a great way to improve machine learning model predictions using various methods in fewer lines of code and of course, less time. In this article, we have discussed voting classifiers, bagging, and boosting. If you have any questions, ask in the comments or share your feedback!
