Background and Aims: Colorectal cancer (CRC) is the second leading cause of cancer-related death worldwide. Many molecular classification strategies have been proposed for CRC, but few studies have included prognosis data in their models. Here we aim to construct a prognosis-oriented CRC classifier by feeding the naturally partially labeled, censored survival data into a customized semi-supervised learning algorithm, and to further identify potential biomarkers for predicting CRC prognosis.

Method: A semi-supervised voting ensemble classifier combining Monte-Carlo sampling and the K-Nearest Neighbor (KNN) algorithm was designed to provide a comprehensive classification for CRC based on both molecular features and clinical outcomes. Genetic data and survival data obtained from TCGA were subjected to this Monte-Carlo KNN Voting (MC-KV) classifier to divide CRC patients into different subtypes. A degenerated machine learning model was constructed by combining WGCNA and LASSO for variable selection and four algorithms (random survival forest, SVM, AdaBoost, and logistic regression) for optimization. Dataset GSE192667 was used for model verification.

Results: Our classifier divided all CRC patients into three subtypes with distinct prognoses. Cell proliferation, DNA repair, and energy metabolism were enriched in one subtype, while angiogenesis and epithelial-mesenchymal transition (EMT) were enriched in the other two subtypes. Differentially expressed gene (DEG) analysis identified 1410 DEGs that were mainly involved in the formation of extracellular matrix. With a variable selection process combining WGCNA and LASSO regression, a 6-gene (HIST2H2BF, TIMP1, NOG, HOXA4, TMEM91, NGF) risk model was constructed. The optimized model distinguished high-risk from low-risk patients with maximum AUCs of 0.869, 0.906, and 0.921 for 1-, 3-, and 5-year survival, respectively. Additionally, the 6-gene signature identified by MC-KV also showed good predictive efficiency for other cancer types.

Conclusion: We have proposed a prognosis-oriented classification for CRC patients using a newly developed semi-supervised voting classifier named MC-KV. Based on this classifier, we constructed a 6-gene signature that exhibits promising prognostic performance for both CRC and other cancer types.
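The general idea of a Monte-Carlo KNN voting classifier can be illustrated with a minimal sketch. This is not the authors' MC-KV implementation: the abstract does not specify how labels are derived from censored survival data, the value of k, the sampling fraction, or the number of rounds, so all of these (and the names `knn_predict`, `mc_knn_vote`, `frac`, `n_rounds`) are assumptions made for illustration. The sketch assumes that some patients already carry a prognosis label and the rest are unlabeled. In each round it draws a random subsample of the labeled patients, runs a plain KNN on it, and lets the rounds vote on the label of each unlabeled patient.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k, n_classes):
    """Plain KNN: majority label among the k nearest training points (Euclidean)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = np.zeros((query_x.shape[0], n_classes), dtype=int)
    for c in range(n_classes):
        votes[:, c] = (train_y[nearest] == c).sum(axis=1)
    # Ties are broken in favor of the lowest class index.
    return votes.argmax(axis=1)


def mc_knn_vote(x_labeled, y_labeled, x_unlabeled, n_classes,
                k=3, n_rounds=200, frac=0.7, seed=0):
    """Monte-Carlo voting: repeat KNN on random labeled subsamples, then vote.

    Hypothetical parameters (not taken from the paper): k, n_rounds, frac.
    Returns the winning label and the vote share per class for each
    unlabeled sample.
    """
    rng = np.random.default_rng(seed)
    n = x_labeled.shape[0]
    # Subsample size: at least k points, at most all labeled points.
    m = min(n, max(k, int(round(frac * n))))
    tally = np.zeros((x_unlabeled.shape[0], n_classes), dtype=int)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=m, replace=False)
        pred = knn_predict(x_labeled[idx], y_labeled[idx], x_unlabeled, k, n_classes)
        tally[np.arange(pred.size), pred] += 1
    return tally.argmax(axis=1), tally / n_rounds


# Toy data: two well-separated groups standing in for expression features.
x_lab = np.array([[0.1 * i, 0.0] for i in range(10)] +
                 [[10.0 + 0.1 * i, 0.0] for i in range(10)])
y_lab = np.array([0] * 10 + [1] * 10)
x_unl = np.array([[0.5, 0.2], [10.3, -0.1]])

labels, support = mc_knn_vote(x_lab, y_lab, x_unl, n_classes=2)
```

Here `support` gives the fraction of rounds that voted for each class, which could serve as a rough confidence measure. A threshold on it would be one way to leave ambiguous patients unassigned, although whether MC-KV does this is not stated in the abstract.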

Graphic abstract.
This study was conducted in three main parts: (1) identification of three distinct molecular subtypes of CRC by the MC-KV algorithm and characterization of their clinical and biological features; (2) discovery of a 6-gene signature by a combination of WGCNA and LASSO regression, and establishment of a nomogram combining clinicopathological parameters; (3) extended application of the 6-gene signature to pan-cancer patients.
Citation: Journal of the National Comprehensive Cancer Network 21, 3.5; 10.6004/jnccn.2022.7168
