The Discovery and Prediction of Genetic Interactions Using Data Science
Abstract
The overarching goal of my thesis work was to develop and apply data science techniques to discover and predict genetic interactions (GIs), i.e., functional interactions between gene pairs, so that novel functional associations between genes and their higher-order organizations (protein complexes, pathways and bioprocesses) could be established. With GIs at the center, the work is divided into three objectives: create useful data on GIs (research article 1; chapter 2), improve methods for predicting the strength of GIs (research article 2; chapter 3), and transfer GI knowledge from one organism to another (research article 3; chapter 4).

In research article 1, we generated two GI networks corresponding to two different cellular growth conditions. More than 140,000 gene pairs were analyzed in both conditions, which produced a large amount of data. A comprehensive computational framework was designed to pre-process, benchmark, generate and validate GI networks. The resulting GI networks were then analyzed exhaustively by computational means to obtain new biological insights. These analyses helped form an array of biological hypotheses, some of which were then validated experimentally.

Since GIs vary in strength, scoring models are used to express the strength of a GI. For more than a decade, the multiplicative model has been the scoring model of choice for all Synthetic Genetic Array (SGA) technology-based projects. However, we believed that a better scoring model could be developed by applying machine learning algorithms. In research article 2, we developed multiple scoring models using several machine learning-based methods and showed that the model trained with Gaussian Processes (GP) outperformed all other models, including the multiplicative model. Our new scoring model can help any SGA technology-based study achieve a better quantification of GIs.

Large-scale studies, like research article 1, are hugely informative. However, they are expensive and take years to complete. Moreover, technological limitations prevent us from conducting such studies in higher-order organisms, such as humans. In research article 3, we propose a machine learning-based computational framework to predict GIs in one organism by exploiting GI information from another organism. We predicted over 4,000 previously unknown human GIs by exploiting orthologous GIs in yeast.
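
For readers unfamiliar with the multiplicative model mentioned in the abstract, the following is a minimal sketch of how it is commonly formulated: the GI score is the observed double-mutant fitness minus the product of the two single-mutant fitnesses. The function names, the assumption that fitness is expressed relative to wild type (1.0), and the cutoff of 0.1 are illustrative choices, not values taken from the thesis.

```python
def multiplicative_gi_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Return the GI score under the multiplicative model.

    f_a, f_b : single-mutant fitness values (assumed relative to wild type = 1.0)
    f_ab     : observed double-mutant fitness
    The expected double-mutant fitness for non-interacting genes is f_a * f_b.
    """
    expected = f_a * f_b
    return f_ab - expected


def classify_gi(score: float, threshold: float = 0.1) -> str:
    """Label a score as negative, positive or neutral.

    The threshold is a hypothetical cutoff chosen only for illustration.
    """
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Double mutant grows much worse than expected (0.1 vs. 0.8 * 0.5 = 0.4).
    score = multiplicative_gi_score(0.8, 0.5, 0.1)
    print(classify_gi(score))
```

A negative score indicates that the double mutant is less fit than expected from the single mutants (for example, synthetic sickness), and a positive score indicates that it is fitter than expected.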