Classifying ovarian cancer using machine learning methods
Abstract
Ovarian cancer is one of the most fatal cancers in women today. It ranks fifth among causes of cancer death in women and causes more deaths than any other cancer of the female reproductive system. According to the Canadian Cancer Society, about 3,000 patients were diagnosed with ovarian cancer in 2022 and 1,950 died of it, so deaths amounted to more than 50% of new diagnoses. Ovarian cancer mainly arises from malignant ovarian tumours, so it is very important to distinguish cancerous from non-cancerous tumours and avoid false-positive ovarian cancer diagnoses. Moreover, if a cancerous tumour is diagnosed at an early stage, it can be kept from spreading, which increases the survival rate. Separating cancer patients from patients with benign tumours also makes it easier for doctors to determine the stage of the cancer and estimate the patient's prognosis and life expectancy. The principal objective of this thesis is to build a feasible, easy-to-use system based on Artificial Intelligence that classifies ovarian cancer. The proposed study offers a new, non-conventional way to distinguish ovarian cancer from benign ovarian tumours that is affordable for patients. One of its primary benefits is that doctors can detect ovarian cancer from a blood (serum) test alone, without expensive tests such as ultrasound, MRI or CT scans. The core of this research is applying several machine learning techniques to classify ovarian cancer correctly and identifying the best of them in terms of accuracy, precision, sensitivity and specificity. The original dataset was obtained from Kaggle (https://www.kaggle.com/). It was first screened, cleaned and normalized, and expert advice was then used to select the features most important for correct classification.
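The abstract does not say what cleaning involved or which normalization was applied. As a minimal sketch, assuming that cleaning means dropping records with missing values and that normalization means min-max scaling of each feature to [0, 1], the preprocessing step could look like this; `clean_rows` and `min_max_normalize` are hypothetical helpers, and the toy values are illustrative rather than taken from the Kaggle dataset.

```python
import math
from typing import List, Optional

# Hypothetical record layout: one list of numeric feature values per patient,
# with None (or NaN) marking a missing measurement.
Row = List[Optional[float]]


def clean_rows(rows: List[Row]) -> List[List[float]]:
    """Drop every record that has a missing value (assumed cleaning rule)."""
    return [
        [float(v) for v in row]
        for row in rows
        if all(v is not None and not math.isnan(v) for v in row)
    ]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature column to [0, 1] using that column's min and max."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [
            # A constant column carries no information; map it to 0.0
            # instead of dividing by zero.
            (r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ]
        for r in rows
    ]


# Toy records (e.g. age, CA-125) for illustration only, not from the dataset.
raw = [[45.0, 35.0], [60.0, None], [30.0, 15.0], [50.0, 115.0]]
print(min_max_normalize(clean_rows(raw)))
# → [[0.75, 0.2], [0.0, 0.0], [1.0, 1.0]]
```

Here the min and max come from all rows passed in. In a train/test evaluation they would normally be computed on the training split only and then reused to scale the test split, so that no test information leaks into preprocessing.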
A correlation test was then performed to better understand the relationships and independence among the input features. Ten input features were selected, including age, menopause, CA-125, AFP and NEU. Based on the correlation results, a reduced set of seven inputs was also chosen, and the 10-input and 7-input configurations were compared. The output is TYPE, coded 1 for a benign ovarian tumour and 0 for ovarian cancer. Four machine learning models were used for classification: ANN, SVM, Naïve Bayes and k-NN. Each model was trained and then tested, and the performance of each was calculated and compared. The results show that, for the problem considered, the Artificial Neural Network (ANN) is the best classifier in terms of accuracy, at 85.91% on the test data, whereas SVM, NB and k-NN reached accuracies of 76.05%, 83.09% and 76.06% respectively. In terms of sensitivity and precision, Naïve Bayes performs best and the ANN is second best. In terms of specificity, the ANN is best, at 87.50%.
Keywords: Machine Learning Classifier, Ovarian Cancer, Benign Ovarian Tumour, Artificial Intelligence, Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), k-Nearest Neighbour (k-NN), Confusion Matrix.
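The abstract does not describe how the correlation test reduced the 10 inputs to 7, nor which TYPE class is treated as positive when computing the metrics. The Python sketch below assumes a simple greedy rule (keep features in order and skip any whose absolute Pearson correlation with an already-kept feature exceeds a hypothetical threshold of 0.8) and treats ovarian cancer (TYPE = 0) as the positive class when deriving accuracy, precision, sensitivity and specificity from the confusion matrix. The function names and toy data are hypothetical and are not the thesis's actual procedure or results.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # an exactly constant column has no defined correlation; treat as 0
    return cov / (sx * sy)


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Greedy screening: keep features in order, skipping any feature whose
    |r| with an already-kept feature exceeds the (hypothetical) threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 0) -> Dict[str, float]:
    """Confusion-matrix metrics; by default TYPE == 0 (ovarian cancer) is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
    }


# Toy inputs for illustration only (not from the thesis dataset).
features = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0], "c": [4.0, 1.0, 3.0, 2.0]}
print(drop_correlated(features))  # → ['a', 'c']  ("b" is perfectly correlated with "a")

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]
print(metrics(y_true, y_pred))
# → {'accuracy': 0.625, 'precision': 0.6, 'sensitivity': 0.75, 'specificity': 0.5}
```

Passing `positive=1` computes the same confusion matrix from the benign-tumour perspective, which exchanges sensitivity and specificity, so the class convention matters when comparing against reported figures.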