Precision-based boosting for regression

Faculty of Graduate Studies and Research, University of Regina

Regression is a type of predictive modeling problem that involves estimating a continuous numerical value from input variables. The goal of this research is to investigate whether incorporating the precision of regression models on specific target values can improve the performance of ensemble-based regression models. We begin by reviewing two existing ensemble methods for classification, AdaBoost and PrAdaBoost, which form the basis of our proposed ensemble method for regression. We also provide a formal analysis of the training error upper bounds for PrAdaBoost and AdaBoost. The mathematical proof shows that PrAdaBoost’s upper bound is always less than or equal to AdaBoost’s. This result is important because AdaBoost’s bound is known to decrease exponentially with the number of iterations, so it implies that PrAdaBoost’s training error upper bound also decreases exponentially as the number of iterations increases, assuming that each individual predictor in the ensemble is better than random guessing. We modify the PrAdaBoost algorithm and implement it in the context of regression, thus introducing a new regression algorithm called PrSAMME-R. To evaluate the performance of PrSAMME-R, we conduct several experiments on various regression datasets and compare the results to those obtained from other ensemble-based regression models. The results show that incorporating the precision of regression models on specific target values into their weights can significantly improve the performance of ensemble-based regression models. PrSAMME-R outperforms other ensemble-based regression models, such as Random Forest, Gradient-based Boosting, AdaBoost.R2 and AdaBoost.RT, in terms of mean absolute error.
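
To illustrate the exponential decrease mentioned above, the following is a minimal sketch of the standard training error upper bound for binary AdaBoost, not the PrAdaBoost bound derived in the thesis. It assumes the textbook form of the bound, the product over rounds of 2*sqrt(eps_t*(1 - eps_t)), where eps_t is the weighted error of the t-th weak learner, and compares it with the looser closed form exp(-2 * sum of gamma_t^2), with gamma_t = 0.5 - eps_t. The function names and the error values are hypothetical and chosen only for illustration.

```python
import math


def adaboost_bound(errors):
    """Product of 2*sqrt(eps*(1-eps)) over all rounds (textbook AdaBoost bound).

    `errors` holds the weighted error of each weak learner; every value must
    lie in [0, 0.5), i.e. each learner is better than random guessing.
    """
    bound = 1.0
    for eps in errors:
        if not 0.0 <= eps < 0.5:
            raise ValueError("each weak learner must beat random guessing")
        bound *= 2.0 * math.sqrt(eps * (1.0 - eps))
    return bound


def exponential_bound(errors):
    """Looser closed form exp(-2 * sum(gamma^2)), with gamma = 0.5 - eps."""
    return math.exp(-2.0 * sum((0.5 - eps) ** 2 for eps in errors))


# Hypothetical weak learners, each with a weighted error of 0.3.
for rounds in (5, 10, 20):
    errs = [0.3] * rounds
    print(rounds, adaboost_bound(errs), exponential_bound(errs))
```

Under these assumptions, both quantities shrink geometrically as rounds are added, which is the behaviour the abstract refers to. Any bound that is never larger than the AdaBoost bound inherits the same rate.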

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. xi, 75 p.