Self-training for cyberbully detection: Achieving high accuracy with a balanced multi-class dataset

Ahmadinejad, Mohamad Hosein

Files

Ahmadinejad,MohamadHosein_MSc_CS_Thesis_2023Fall.pdf (1.52 MB)

Date

2023-08

Authors

Ahmadinejad, Mohamad Hosein

Publisher

Faculty of Graduate Studies and Research, University of Regina

Abstract

Cyberbullying has become an alarming issue in the digital era, causing significant harm to its victims. The development of automated methods for detecting cyberbullying in social media is of paramount importance to safeguard vulnerable individuals. In this thesis, we propose a robust approach based on Machine Learning (ML) and Deep Learning (DL) techniques for cyberbully detection on social media platforms. Our approach involves the meticulous curation of a balanced dataset specifically designed for training the ML/DL models. To overcome the challenge of limited labeled data, we employ a semi-supervised self-training algorithm, which effectively expands the size of the labeled dataset. By leveraging real-world social media data, we train and test the model, evaluating its performance using key metrics such as precision, recall, and F1-score. In addition, we present our meticulously annotated dataset comprising 99,991 tweets, which we have made publicly available for future scientific investigations. This dataset serves as a valuable resource for further research in this field, facilitating the development and evaluation of novel techniques for cyberbully detection. Our results underscore the near-perfect performance of the proposed approach in the context of cyberbully detection, reaffirming the efficacy of ML and DL techniques for addressing this pervasive problem. These findings offer crucial insights for future research endeavors in this domain and hold practical implications for the development of automated systems capable of detecting and combating cyberbullying on social media platforms. By continuously advancing our understanding of cyberbullying detection and developing sophisticated ML and DL models, we can foster safer digital environments and protect individuals from the detrimental effects of cyberbullying.

Description

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. xiii, 99.

URI

https://hdl.handle.net/10294/16190

Collections

Master’s and Doctoral Theses
