

    In this work, we introduce a machine learning method, CHASMplus, to predict the driver status of missense mutations in a cancer-type-specific manner. After careful benchmarking (Results), we applied CHASMplus to 8,657 sequenced tumors from The Cancer Genome Atlas (TCGA) spanning 32 types of cancer, using a statistically rigorous mutational background model to control false discoveries. We explore the emerging role of rare driver missense mutations in cancer and, when possible, relate predictions to supporting functional evidence. We provide an interactive resource for exploring driver missense mutations identified from the TCGA and a user-friendly tool to predict whether newly observed mutations from further sequencing are likely to be cancer drivers. Last, we examine the diversity of driver missense mutations across various types of cancer, which leads to a refined understanding of the likely trajectory of driver missense mutation discovery with further sequencing.
    Overview of CHASMplus
    We have developed a method named CHASMplus that uses machine learning to classify somatic missense mutations (referred to hereafter as missense mutations) as either cancer drivers or passengers (Figure 1A; STAR Methods). In contrast to our recent analysis of TCGA mutations (Bailey et al., 2018), the method is designed so that predictions can be made in a cancer-type-specific manner (Figure 1B), rather than only in aggregate across multiple cancer types ("pan-cancer"). To generate predictions, CHASMplus is trained using somatic mutation calls from TCGA covering 8,657 samples in 32 cancer types (Figure S1; Table S1; STAR Methods). Because there is no gold standard set of driver and passenger missense mutations, we developed a semi-supervised approach to assign class labels to missense mutations. Finally, mutation scores from CHASMplus are weighted by a driver gene score for the respective gene, producing gene-weighted (gwCHASMplus) scores (STAR Methods).
    CHASMplus Predicts Cancer-Type Specificity of Driver Missense Mutations
    CHASMplus provides a predictive model for each of the 32 cancer types sequenced by TCGA. In contrast, most previous methods provide a single impact score for each missense mutation, regardless of cancer type (Adzhubei et al., 2010; Carter et al., 2013; Gonzalez-Perez et al., 2012; Ioannidis et al., 2016; Jagadeesh et al., 2016; Kumar et al., 2016; Ng and Henikoff, 2001; Reva et al., 2011; Shihab et al., 2013). Two methods (CHASM [Carter et al., 2009] and CanDrA [Mao et al., 2013]) do provide cancer-type-specific prediction models, but this capability has not been validated. To illustrate the significant advance in cancer-specific prediction made by CHASMplus, we compared the cancer-type specificity of CHASMplus to that of CHASM and CanDrA, along with, for reference, two additional methods (ParsSNP [Kumar et al., 2016] and REVEL [Ioannidis et al., 2016]) that are not cancer-type specific.
    First, a cancer-type-specific model should accurately predict the oncogenic effects of missense mutations in an appropriate cell line (Fröhling et al., 2007; Wan et al., 2004). We therefore compared predictions of breast-cancer-specific CHASMplus, CHASM, and CanDrA models in known breast cancer driver genes to a previous large-scale validation of 698 missense mutations in MCF10A (breast epithelium) cells that measured cell viability (Ng et al., 2018) (Figure 2A; STAR Methods). We used the area under the receiver operating characteristic curve (auROC) as a performance metric, similar to many prior studies of variant effect prediction (Adzhubei et al., 2010; Ioannidis et al., 2016; Kircher et al., 2014; Kumar et al., 2016; Mao et al., 2013). In general, auROC values range from 0.5 (random prediction performance) to 1.0 (perfect). We found that CHASMplus had a substantially higher auROC than CHASM and CanDrA (p < 2.2e-16; DeLong test; Table S2). It was also significantly higher than that of ParsSNP, which is not cancer-type specific, and REVEL, a general-purpose pathogenicity predictor (p < 2.2e-16; DeLong test). In fact, CanDrA and CHASM had a lower auROC than ParsSNP, suggesting that these prior methods captured only a limited amount of cancer-type specificity.
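    To make the auROC metric concrete, the following is a minimal Python sketch, not the evaluation code used in this study. It relies on the standard equivalence between auROC and the probability that a randomly chosen positive (here, a driver) is scored above a randomly chosen negative (a passenger), with ties counted as one half. The function name `auroc` and the toy scores and labels are hypothetical and chosen only for illustration; the DeLong test used to compare methods is not implemented here.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = driver (positive), 0 = passenger (negative).
    Illustrative helper only; quadratic in the number of mutations.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one driver and one passenger")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # driver ranked above passenger
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four mutations, two labeled drivers.
toy_labels = [1, 1, 0, 0]
perfect = auroc([0.9, 0.8, 0.3, 0.1], toy_labels)   # every driver outranks every passenger
partial = auroc([0.9, 0.4, 0.6, 0.1], toy_labels)   # one of four pairs is misordered
constant = auroc([0.5, 0.5, 0.5, 0.5], toy_labels)  # uninformative scores
```

    In this toy setting, a predictor that ranks every driver above every passenger reaches 1.0, while a constant score yields 0.5, matching the range described above.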
    A cancer-type-specific model should also be able to distinguish the relevant cancer type among driver mutations in human tumors. We therefore used a literature-curated mutation database (OncoKB; Chakravarty et al., 2017) to annotate oncogenic mutations in an independent cohort of 10,000 patients whose tumors were sequenced on a targeted gene panel (MSK-IMPACT; Zehir et al., 2017) (STAR Methods). We compared performance on four cancer types (breast invasive ductal carcinoma [BRCA], glioblastoma multiforme [GBM], high-grade serous ovarian cancer [OV], and colon adenocarcinoma [COAD]), because these overlap with cancer types in the TCGA and because CHASMplus, CHASM, and CanDrA each have cancer-specific models for them. CHASMplus had a significantly higher auROC than all other methods for each of the cancer types (p < 0.05; DeLong test; Figure 2B; Table S2). In general, neither CanDrA nor CHASM showed consistent improvements over ParsSNP and REVEL.