BIO22-027: Exploiting Machine Learning to Unravel Prognostic Biomarkers in Lung Cancer

Authors: Hemant Kumar Joon, Pre-Doctoral Fellow (1, 2); Anamika Thalor, Pre-Doctoral Fellow (1); and Dinesh Gupta, PhD (1)
  • 1 Translational Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi-110067, India
  • 2 Regional Centre for Biotechnology, Faridabad-121001, Haryana, India

Recent advances in machine learning have created opportunities to decode the genetic regulatory code and to develop precision medicine for patients diagnosed with various diseases, including cancer. We hypothesize that machine learning and systems biology can be applied to lung cancer gene expression profiles to identify novel biomarkers with therapeutic implications. Multiple gene expression profiles of lung cancer and normal lung tissue, covering >20,000 genes, were obtained from NCBI GEO datasets. After pre-processing and merging the datasets, the batch effect was corrected to remove non-biological variation among them. Z-score standardization was then applied to the batch-corrected dataset to bring all features onto the same scale. Next, a feature selection algorithm was used to obtain the top 10, 20, 30, 40, 50, and 60 features from the complete expression profile of >20,000 genes, mitigating the curse of dimensionality. Five machine learning algorithms were then trained on the selected features: support vector machine (SVM), k-nearest neighbour (kNN), random forest (RF), decision tree (DT), and XGBoost. kNN outperformed the other four algorithms both in training and on an external validation dataset obtained from the same platform. In parallel, differential gene expression analysis (logFC > 1.5 and p-value < 0.05) was performed on the complete dataset to determine which of the gene features selected by machine learning are differentially expressed. The next step is to identify prognostic biomarkers among these differentially expressed selected genes using Kaplan-Meier analysis.
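
The standardize, select, and classify steps described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not name the feature selection algorithm or the kNN settings, so the mean-difference filter, the value n_neighbors=3, all function names, and the tiny expression matrix below are assumptions made purely for illustration. Batch-effect correction and the other four classifiers are omitted.

```python
import math
import statistics
from collections import Counter


def fit_zscore(rows):
    """Return per-gene mean and standard deviation computed on the training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant gene has zero spread; fall back to 1.0 to avoid division by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, sds


def apply_zscore(rows, means, sds):
    """Scale rows with previously fitted statistics (no refitting on new data)."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def top_k_features(rows, labels, k):
    """Stand-in filter (assumption): rank genes by |mean(tumor) - mean(normal)|."""
    scores = []
    for j, col in enumerate(zip(*rows)):
        tumor = [v for v, y in zip(col, labels) if y == 1]
        normal = [v for v, y in zip(col, labels) if y == 0]
        scores.append((abs(statistics.fmean(tumor) - statistics.fmean(normal)), j))
    best = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in best)


def keep_columns(rows, columns):
    """Keep only the selected gene columns."""
    return [[row[j] for j in columns] for row in rows]


def knn_predict(train_rows, train_labels, query, n_neighbors=3):
    """Majority vote among the n_neighbors closest training samples (Euclidean)."""
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_rows, train_labels)
    )[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Invented toy data: 6 samples x 4 genes; label 1 = tumor, 0 = normal.
train = [
    [9.0, 5.0, 2.0, 7.1],
    [8.5, 5.2, 2.5, 6.9],
    [9.2, 4.9, 1.8, 7.0],
    [3.0, 5.1, 8.0, 7.2],
    [2.5, 5.0, 8.4, 6.8],
    [3.3, 4.8, 7.9, 7.0],
]
train_labels = [1, 1, 1, 0, 0, 0]
validation = [[8.8, 5.0, 2.2, 7.0], [2.9, 5.1, 8.1, 7.0]]

# 1. Standardize using statistics from the training data only.
means, sds = fit_zscore(train)
train_z = apply_zscore(train, means, sds)
validation_z = apply_zscore(validation, means, sds)

# 2. Select the top-k genes on the training data.
selected = top_k_features(train_z, train_labels, k=2)

# 3. Classify the held-out samples with kNN on the selected genes.
train_sel = keep_columns(train_z, selected)
predictions = [
    knn_predict(train_sel, train_labels, q) for q in keep_columns(validation_z, selected)
]
print(selected, predictions)
```

The scaling statistics and the gene ranking are fitted on the training samples only and then reused for the held-out samples, which keeps information from the validation data out of model building. Whether the authors followed this ordering is not stated in the abstract.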

Corresponding Author: Dinesh Gupta, PhD