Doctoral Theses and Dissertations
Permanent URI for this collection: https://hdl.handle.net/10294/2900
Browsing Doctoral Theses and Dissertations by Author "Bui, Francis"
Item (Open Access)
Develop innovative methodology to optimally fill in missing values and predict progression on multiple sclerosis
(Faculty of Graduate Studies and Research, University of Regina, 2024-11)
Pilehvari, Shima; Peng, Wei; Shirif, Ezeddin; Khan, Sharfuddin; Fan, Lisa; Bui, Francis

Applying Machine Learning (ML) to predict and track Multiple Sclerosis (MS) progression is a significant advancement in medical research, with the potential to enhance patient outcomes. Accurate MS prediction enables personalized treatment, timely interventions, and improved quality of life by slowing disease progression and preventing complications. This research aims to deepen our understanding of MS by developing ML models and comprehensive risk assessments to support early prognosis, guide treatment strategies, and reduce disease impact. A major challenge in medical research, especially in predicting MS progression, is effectively managing missing data in MS datasets. This study introduces an innovative sequential Multi-Imputation (MI) bootstrapping method to address that challenge. Initially, several ML algorithms, including k-Nearest Neighbors (kNN), Random Forest (RF), and Multilayer Perceptron (MLP), are evaluated for imputation efficiency. RF and MLP perform best, achieving overall imputation accuracies of 92% and 91.5%, respectively. Given their effectiveness in capturing complex patterns in data, these two models are selected for further development. The next step applies MI bootstrapping sequentially, prioritizing features by the strength of their relationships as determined by Pearson correlation analysis. This analysis identifies the features with the highest correlations, ensuring that attributes with stronger relationships to other attributes are imputed first.
These imputed features then inform the next imputation in the sequence, which handles the next-ranked feature. Bootstrapping, a technique that resamples with replacement, creates multiple training datasets by repeatedly sampling from the original data, enhancing the robustness of the imputation process. The proposed sequential imputation method integrates bootstrapping with RF, achieving an accuracy of up to 97% on MS datasets. This iterative approach effectively imputes missing attributes while accounting for feature significance and relationships. The results also show that prioritizing normalization improves the impact of scaling, and that the significant features in the original dataset are crucial to the accuracy of MS missing-data estimates. These findings provide valuable insights into effective imputation techniques for MS prediction, offering a foundation for future improvements in handling missing data in specific datasets. In addition, this study addresses the common overfitting problem caused by data imbalance through a comprehensive method combining feature extraction, undersampling, the Synthetic Minority Oversampling Technique (SMOTE), and an optimal-threshold method. Support Vector Machine (SVM), Logistic Regression (LogR), Decision Tree (DT), RF, kNN, MLP, and Naive Bayes (NB) are used for prognostic modeling while examining risk factor associations. The results show that the proposed method prevents overfitting during model training and yields a robust MS progression prognosis model, achieving a prediction accuracy of 98%, particularly for SVM and MLP. The methods proposed in this dissertation can help develop more concise guidelines for the medical research community and improve its evaluation processes. These innovations not only advance prognostic analysis in MS but also pave the way for future research focused on optimizing patient outcomes and treatment strategies.
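The sequential imputation idea described in the abstract (rank features by Pearson correlation, impute the strongest first, let imputed values feed later steps, and average over bootstrap resamples) might be sketched roughly as follows. This is only an illustrative sketch, not the dissertation's implementation: the function names, the toy data, and the number of resamples are all hypothetical, features are assumed to be numeric, and a one-variable least-squares line is swapped in for the Random Forest model to keep the example dependency-free.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_features(data):
    """Order features by mean absolute correlation with the others, strongest first.
    Assumes at least two features."""
    names = list(data)
    def strength(f):
        others = [g for g in names if g != f]
        return sum(abs(pearson(data[f], data[g])) for g in others) / len(others)
    return sorted(names, key=strength, reverse=True)

def fit_line(xs, ys):
    """Least-squares (slope, intercept); a stand-in for the RF model (assumption).
    Returns None when x has no variance in this sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def sequential_impute(data, n_boot=50, seed=0):
    """Impute None entries feature by feature, in correlation-ranked order."""
    rng = random.Random(seed)
    filled = {f: list(col) for f, col in data.items()}  # copy; input is not mutated
    order = rank_features(data)
    for target in order:
        missing = [i for i, v in enumerate(filled[target]) if v is None]
        if not missing:
            continue
        # Predictor: the feature most correlated with the target, using values
        # filled so far, so earlier imputations inform later ones.
        predictor = max((g for g in order if g != target),
                        key=lambda g: abs(pearson(filled[target], filled[g])))
        train = [(x, y) for x, y in zip(filled[predictor], filled[target])
                 if x is not None and y is not None]
        if len({x for x, _ in train}) < 2:
            continue  # not enough information to fit anything
        models = []
        while len(models) < n_boot:
            sample = rng.choices(train, k=len(train))  # resample with replacement
            model = fit_line([x for x, _ in sample], [y for _, y in sample])
            if model is not None:  # redraw degenerate resamples
                models.append(model)
        for i in missing:
            x = filled[predictor][i]
            if x is not None:
                # Average the predictions of all bootstrap models
                filled[target][i] = sum(a * x + b for a, b in models) / n_boot
    return filled

# Hypothetical toy data: "b" is exactly 2 * a + 1, with one missing entry.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [3.0, 5.0, None, 9.0, 11.0, 13.0],
    "c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
completed = sequential_impute(toy)
```

In this toy case the weakly correlated feature `c` is ranked last, and the missing entry of `b` is filled from `a`. A closer approximation of the described method would replace `fit_line` with a Random Forest trained on all available features.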