Some important terms:
Variance is the spread of a model's predictions around their mean. It indicates whether the model generalises properly to new data; high variance indicates overfitting.
Bias is the error arising from the difference between the actual values and the model's predicted values. It occurs due to wrong assumptions made while forming the model's hypothesis; high bias indicates underfitting.
Types of Ensemble Learning Models:
● Bagging: The main goal is to reduce variance.
Steps for training:
- Several equal-sized subsets are drawn from the original dataset (bootstrap sampling with replacement).
- A base model is built on each subset; these models are trained independently and in parallel.
- The final prediction is obtained by combining the results of all the models (e.g. by voting or averaging).
Random forest is one example of a bagging ensemble learning model; a minimal sketch follows.
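To make this concrete, here is a minimal bagging sketch using scikit-learn; the synthetic dataset and hyperparameters are assumptions for illustration, not part of the notes above.

```python
# Minimal bagging sketch (assumed setup: synthetic data, scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# BaggingClassifier draws bootstrap samples (with replacement) and trains
# one base model per sample; the default base learner is a decision tree.
# The final prediction combines the base models by majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))

# Random forest: bagged decision trees plus random feature subsets per split.
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_train, y_train)
print("Random forest accuracy:", forest.score(X_test, y_test))
```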
● Boosting: This algorithm is mainly used to decrease bias.
Steps for training:
- The complete dataset is passed as input to the model, with an equal weight assigned to each training sample.
- Identify the wrongly classified data points in the result and increase their weights.
- If the required result is not obtained, repeat the process.
Some examples are AdaBoost, Gradient Boosting, and XGBoost; a minimal AdaBoost sketch follows.
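Here is a minimal boosting sketch using scikit-learn's AdaBoostClassifier; the toy dataset and settings are illustrative assumptions.

```python
# Minimal AdaBoost sketch (assumed setup: synthetic data, scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost trains weak learners sequentially; after each round it raises
# the weights of misclassified samples so the next learner focuses on them.
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", boost.score(X_test, y_test))
```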
● Stacking:
This is slightly different from the other forms of ensemble learning. Instead of combining similar models, it explores the space of different models for solving the same problem, and it is primarily used to combine multiple classification or regression models. The idea is to stack various models that are each capable of learning some aspect of the problem, but not the entire problem. We create several separate learners and use them to obtain intermediate predictions; then, using these intermediate predictions, we train a model that learns the same target.
The final model is said to be layered (stacked) on top of the others. This approach may boost overall performance and result in a model that outperforms any individual model used.
For example, my problem was to detect NSFW (Not Safe for Work) content in the provided input, but the models I found each covered only some of the categories, not all of them. When combined, however, the resulting set of categories was exactly what I wanted.
Here, using stacking to generate their combined result, while simultaneously training a classification model on the generated and verified results, would be a very good solution; a minimal sketch follows.
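To make the stacking workflow concrete, here is a minimal sketch with scikit-learn's StackingClassifier; the choice of base learners and meta-learner is an assumption for illustration (in practice, the NSFW detectors above would take the place of the base learners).

```python
# Minimal stacking sketch (assumed base learners and meta-learner).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Heterogeneous base learners, each able to capture different aspects
# of the problem.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
    ("svc", SVC(random_state=1)),
]

# The meta-learner (final_estimator) is trained on the base learners'
# out-of-fold predictions, i.e. the intermediate predictions described above.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```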