The Discovery and Prediction of Genetic Interactions Using Data Science Genetic Interactions

Kumar, Ashwani

Files

Kumar_Ashwani_PhD_CS_Spring2020_Final.pdf (6.72 MB)

Date

2020-02

Authors

Kumar, Ashwani

Publisher

Faculty of Graduate Studies and Research, University of Regina

Abstract

The overarching goal of my thesis work was to develop and use data science techniques to discover and predict genetic interactions (GIs, i.e., functional interactions between gene pairs), so that novel functional associations between genes and their higher-order organizations (protein complexes, pathways and bioprocesses) could be established. With GIs at the center, the work is divided into three objectives: create useful data on GIs (research article 1; chapter 2), improve methods for predicting the strength of GIs (research article 2; chapter 3), and transfer GI knowledge from one organism to another (research article 3; chapter 4). In research article 1, we generated two GI networks corresponding to two different cellular growth conditions. More than 140,000 gene pairs were analyzed in both conditions, which produced a large amount of data. A comprehensive computational framework was designed to pre-process, benchmark, generate and validate GI networks. The resulting GI networks were then exhaustively analyzed computationally to obtain new biological insights. These analyses helped form an array of biological hypotheses, some of which were then experimentally validated. Since GIs vary in strength, scoring models are used to express the strength of a GI. For more than a decade, the multiplicative model has been the scoring model of choice for all Synthetic Genetic Array (SGA) technology-based projects. However, we believed that a better scoring model could be developed by applying machine learning algorithms. In research article 2, we developed multiple scoring models by applying several machine learning-based methods and showed that Gaussian Processes (GP) produced a model that outperformed all other models, including the multiplicative model. Our new scoring model can help any SGA technology-based study achieve a better quantification of GIs. Large-scale studies, like research article 1, are hugely informative. However, they are expensive and take years to complete. Moreover, technological limitations prevent us from conducting such studies in higher-order organisms, such as humans. In research article 3, we propose a machine learning-based computational framework to predict GIs in one organism by exploiting GI information from another organism. We predicted over 4,000 previously unknown human GIs by exploiting orthologous GIs in yeast.

Description

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science, University of Regina. xiv, 284 p.

URI

https://hdl.handle.net/10294/9173

Collections

Master’s and Doctoral Theses
