Self-training for cyberbully detection: Achieving high accuracy with a balanced multi-class dataset
Abstract
Cyberbullying has become an alarming problem in the digital era, causing significant harm to its victims, and automated methods for detecting it on social media are needed to protect vulnerable individuals. In this thesis, we propose an approach to cyberbullying detection on social media platforms based on Machine Learning (ML) and Deep Learning (DL) techniques. We curate a balanced multi-class dataset specifically for training the ML/DL models. To overcome the scarcity of labeled data, we employ a semi-supervised self-training algorithm that expands the labeled dataset. We train and test the models on real-world social media data and evaluate them using precision, recall, and F1-score. We also present our annotated dataset of 99,991 tweets, which we have made publicly available as a resource for developing and evaluating new cyberbullying detection techniques. Our results show near-perfect performance on this task, supporting the efficacy of ML and DL techniques for the problem. These findings offer insights for future research and have practical implications for building automated systems that detect and combat cyberbullying on social media, contributing to safer digital environments.
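The self-training idea and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base classifier, the confidence threshold, or the class labels, so everything below is an assumption made for illustration: a toy naive Bayes model stands in for the thesis's ML/DL models, the labels `insult`, `threat`, and `neutral` are hypothetical, and the function names (`train_nb`, `self_train`, `macro_prf`) are invented here.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Fit a tiny multinomial naive Bayes model on whitespace tokens."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(labels), set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_proba(model, text):
    """Return {label: probability} using Laplace-smoothed word likelihoods."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    peak = max(scores.values())
    exp = {c: math.exp(s - peak) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def predict(model, text):
    probs = predict_proba(model, text)
    return max(probs, key=probs.get)

def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Repeatedly add confidently pseudo-labeled texts to the training set."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(texts, labels)
        remaining, added = [], 0
        for text in pool:
            probs = predict_proba(model, text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:  # accept only confident predictions
                texts.append(text)
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0 or not pool:
            break
    return train_nb(texts, labels), list(zip(texts, labels))

def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = len(classes)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k

# Hypothetical toy data; the thesis's real classes and tweets are not shown here.
labeled = [("you are stupid and ugly", "insult"),
           ("i will hurt you tomorrow", "threat"),
           ("have a great day friend", "neutral")]
unlabeled = ["stupid ugly loser", "i will hurt him", "great day everyone"]
model, augmented = self_train(labeled, unlabeled, threshold=0.6)
```

Calling `macro_prf(true_labels, [predict(model, t) for t in test_texts])` on a held-out set would then give the three metrics. The threshold trades the amount of pseudo-labeled data against the risk of reinforcing the model's own mistakes, which is the main design choice in any self-training setup.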