Background
Esophageal and gastric cancer are the eighth and fifth most common cancers worldwide, respectively. 1,2 Both types of cancer have a high mortality rate, due in part to the high prevalence of distant metastases at diagnosis (∼20%–30% of patients with esophageal cancer and ∼30%–40% of patients with gastric cancer). 2–4
Multiple treatment options are available for patients with metastatic or potentially curable disease. Potentially curable gastric and esophageal cancer can be treated with surgery with or without (neo)adjuvant chemo(radio)therapy. Potentially curable esophageal cancer can also be treated with definitive chemoradiotherapy. Metastatic disease is treated mainly with systemic therapy but also with best supportive care. 5,6
However, even in the curative setting, outcomes of these treatments are poor in esophageal and gastric cancer, with 5-year survival rates <50%, whereas treatment-related morbidity is high. 7,8 Patient preferences and values should therefore play a significant role in shared decision-making concerning treatment options. 9 When deciding on treatment, it is vital that patients are provided with accurate and preferably personalized information about the risks and benefits of the treatment trajectories, ideally based on prediction models. 10,11
Several prediction models have been developed to predict risks and benefits in patients with esophageal and gastric cancer. 12 However, these models are largely focused on the curative setting and predict survival after completion of a curative resection; they therefore cannot be used for clinical decision-making before the start of treatment, nor can they be used to compare different treatment options. Furthermore, prediction models for the metastatic setting are scarce.
We recently developed 2 models to predict survival in patients with metastatic esophageal or gastric cancer based on tumor, patient, and treatment characteristics. 13 Although these models had good calibration and a fair C-index, some important information was not available at the time of development, such as HER2/neu status and WHO performance status, which would improve the models’ performance. 14,15 The aim of this study (the SOURCE study) is to create 2 new models to predict survival in patients with potentially curable esophageal or gastric cancer and to update our previously published models for patients with metastatic esophageal and gastric cancer.
Methods
This article adheres to the TRIPOD guidelines. 16 According to the Central Committee on Research Involving Human Subjects, this type of study does not require approval from an ethics committee in the Netherlands. However, the study was approved by the Privacy Review Board of the Netherlands Cancer Registry (NCR; project code K17134).
Dataset
NCR data were used in the development and validation of the SOURCE prediction models. 17 This nationwide population-based registry is prospectively maintained. Additional potential predictors have been recorded in the database since 2015. We therefore decided to include only patients diagnosed from 2015 through 2018 (the last year with available data in the NCR) with a primary esophageal or gastric tumor. Patients with cM1 disease were classified as having metastatic cancer, and patients with stage cT1–4a,xN0–3,xM0 disease were classified as having potentially curable disease. This classification was also used in previous studies. 18,19 Patients with metastatic disease whose first metastasis was discovered ≥4 days after treatment initiation were classified as having cM0 disease because they were diagnosed without any metastases.
Data from the NCR dataset were divided into 4 cohorts based on primary tumor location (esophageal vs gastric cancer) and cM stage (metastatic vs potentially curable cancer). The primary tumor was classified as esophageal cancer if the ICD-O topography code was C15.x or C16.0 (cardia) and as gastric cancer for C16.1–9. 20 Four prediction models were created, one for each of these 4 datasets. Follow-up lasted until January 2019, giving a maximum follow-up of 4 years.
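The cohort assignment described above can be sketched in Python as follows. This is a minimal illustration and not the registry's actual processing code: the function names, the string encodings of the stage fields (such as "M1", "T4a", and "TX"), and the exact set of cT labels are assumptions made for the example.

```python
from datetime import date
from typing import Optional, Tuple

# Assumed string labels for cT1-4a and cTx; the registry's real encoding may differ.
CURABLE_CT = {"T1", "T2", "T3", "T4a", "TX"}


def tumor_site(topography_code: str) -> Optional[str]:
    """Map an ICD-O topography code to the primary tumor site used for the cohorts."""
    code = topography_code.strip().upper()
    if code.startswith("C15") or code == "C16.0":
        return "esophageal"  # C15.x or C16.0 (cardia)
    if len(code) == 5 and code.startswith("C16.") and code[4] in "123456789":
        return "gastric"  # C16.1 through C16.9
    return None  # outside the scope of the 4 cohorts


def disease_setting(
    cm_stage: str,
    ct_stage: str,
    treatment_start: Optional[date] = None,
    first_metastasis: Optional[date] = None,
) -> Optional[str]:
    """Classify a patient as metastatic or potentially curable."""
    # A first metastasis found >= 4 days after treatment initiation counts as cM0.
    late_metastasis = (
        treatment_start is not None
        and first_metastasis is not None
        and (first_metastasis - treatment_start).days >= 4
    )
    effective_cm = "M0" if (cm_stage == "M1" and late_metastasis) else cm_stage
    if effective_cm == "M1":
        return "metastatic"
    if effective_cm == "M0" and ct_stage in CURABLE_CT:
        return "potentially curable"
    return None  # for example cT4b, which falls outside cT1-4a,x


def cohort(topography_code: str, cm_stage: str, ct_stage: str) -> Optional[Tuple[str, str]]:
    """Return (site, setting) for one of the 4 cohorts, or None if not applicable."""
    site = tumor_site(topography_code)
    setting = disease_setting(cm_stage, ct_stage)
    if site is None or setting is None:
        return None
    return site, setting
```

For instance, under these assumed encodings, `cohort("C16.0", "M0", "T3")` places a cardia tumor without distant metastases in the potentially curable esophageal cohort.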
Exclusion criteria included unknown vital status at the end of follow-up, unknown follow-up or survival of at most 14 days, primary cT0 or in situ tumor, and unknown tumor histology. For patients with multiple primary tumors, duplicates were removed and only the earliest entry per patient was retained. Patients with metastatic disease who had distant metastases confined to lymph nodes of the head and neck area were excluded from the analyses because they could be treated with curative intent (Figure 1).

Figure 1. Patient inclusion flowchart. The 4 initial cohort sizes in the Netherlands Cancer Registry and the number of patients excluded are shown. The final selection was used in creating the SOURCE prediction models.
Citation: Journal of the National Comprehensive Cancer Network 19, 4; 10.6004/jnccn.2020.7631

Development and Validation of the Models
The methods for constructing the SOURCE prediction models were described in detail previously. 13 In short, the following procedures were followed. First, a preliminary predictor selection was made for each cohort. Predictors were selected if they were available for ≥50% of patients in the dataset, had <50 levels (for categorical variables only), and did not have the same value for all patients (which would have made them noninformative). In contrast to the previous study, performance status, body mass index (BMI), American Society of Anesthesiologists physical status classification, HER2/neu status, and laboratory results (hemoglobin, creatinine, lactate dehydrogenase, and albumin levels) were also included in the preliminary predictor selection because these variables became available for patients diagnosed as of 2015. 14,15 All predictors included in the SOURCE prediction models were determined at the time of diagnosis.
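The three screening rules for the preliminary predictor selection can be illustrated with a short Python sketch. It assumes that each patient is a dictionary with `None` marking a missing value; the function name, the parameter names, and the toy records are hypothetical and are not taken from the NCR data.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]


def preliminary_predictors(
    rows: Sequence[Row],
    categorical: Sequence[str],
    min_available: float = 0.5,
    max_levels: int = 50,
) -> List[str]:
    """Keep predictors that pass the availability, level-count, and variability rules."""
    if not rows:
        return []
    selected = []
    for column in rows[0].keys():
        observed = [row.get(column) for row in rows if row.get(column) is not None]
        # Rule 1: available for at least 50% of patients.
        if len(observed) / len(rows) < min_available:
            continue
        levels = set(observed)
        # Rule 2: not the same value for all patients (noninformative).
        if len(levels) <= 1:
            continue
        # Rule 3: fewer than 50 levels, applied to categorical variables only.
        if column in categorical and len(levels) >= max_levels:
            continue
        selected.append(column)
    return selected


# Toy records, invented purely for illustration.
example_rows = [
    {"age": 61, "her2": None, "sex": "M", "country": "NL"},
    {"age": 70, "her2": None, "sex": "F", "country": "NL"},
    {"age": 58, "her2": "positive", "sex": "M", "country": "NL"},
    {"age": 66, "her2": None, "sex": "F", "country": "NL"},
]
example_selection = preliminary_predictors(example_rows, categorical=["her2", "sex", "country"])
```

In this toy example, `her2` is dropped because it is available for only 25% of records and `country` is dropped because it is constant, leaving `age` and `sex`.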
Next, a multivariate Cox proportional hazards regression model was created in each cohort, with overall survival as the outcome. 21 Overall survival was measured from diagnosis to death or censored at the date of last follow-up. In contrast to the previous study, the present models do not include interaction terms: these terms did not increase model performance (data not shown) and were removed to avoid overfitting. Initially, all predictors from the preliminary selection were included in the model, and multiple imputation with 10 iterations via chained equations (multivariate imputation by chained equations [MICE]) was used to handle missing data. 22 Bidirectional predictor selection based on the Akaike information criterion (AIC) was used to create a final predictor selection in each cohort. 23 From the resulting models, the C-index and the calibration slope, intercept, and deviance were obtained. The C-index is a measure of discrimination and ranges from 0.5 (not able to discriminate survival outcome among individuals) to 1 (perfect discrimination). 24 Calibration refers to the concordance between predicted and observed survival. With perfect calibration, the calibration intercept is 0 and the calibration slope is 1. 24 The calibration deviance refers to the mean absolute difference between predicted and observed survival. 25 All models were developed using the rms (Regression Modeling Strategies) package in the RStudio environment with R version 3.6.1 (R Foundation for Statistical Computing).
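To make the C-index concrete, the following Python sketch computes Harrell's concordance index for right-censored data. It is an illustration only and not the rms implementation used in the study; it assumes that a higher risk score means shorter expected survival, and it does not apply any special handling to tied survival times. The function and argument names are hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs in which the higher-risk patient dies first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # prediction agrees with the observed ordering
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs: the C-index is undefined")
    return concordant / comparable
```

A model whose risk scores order every comparable pair correctly yields 1, whereas a model that assigns the same score to everyone yields 0.5, matching the range described above.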
To test the robustness of the models, an internal–external temporal cross-validation scheme was used. 26 Within this framework, the aforementioned model development was used to create models for patients diagnosed in earlier years, after which the model was evaluated based on patients in later years. This mimics the way in which models are evaluated when used in real life and reflects model performance behavior in the face of potential population drift over time. This method allows the simulation of a true temporal external validation while using the entire available dataset. 26 This cross-validation is explained in more detail in our previous publication. 13 First, data from the earliest diagnosis year (2015) were used to create prediction models. The model’s performance was then evaluated based on patients diagnosed in the subsequent year (2016). This process was then repeated for later diagnosis years; the training cohort included 2015 through 2016 and was validated based on the 2017 cohort, after which the model was trained on patients diagnosed in 2015 through 2017 and validated based on patients diagnosed in 2018. The performance statistics were pooled to obtain a cross-validated estimation of the model performance. To summarize, a Cox proportional hazards regression model was created for each cohort based on all available data, and a meta-analysis of cross-validated performance statistics was calculated to determine the model quality.
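The expanding-window scheme described above can be sketched in Python as follows. The helper names and the `diagnosis_year` field are assumptions made for the example; model fitting and the pooling of performance statistics are deliberately left out because their details are not specified here.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def temporal_folds(years: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (training years, validation year) pairs with a growing training window."""
    ordered = sorted(set(years))
    for k in range(1, len(ordered)):
        yield ordered[:k], ordered[k]


def split_by_fold(
    records: Sequence[Dict[str, object]],
    train_years: Sequence[int],
    validation_year: int,
) -> Tuple[List[Dict[str, object]], List[Dict[str, object]]]:
    """Split patient records into training and validation sets by year of diagnosis."""
    train = [r for r in records if r["diagnosis_year"] in train_years]
    validation = [r for r in records if r["diagnosis_year"] == validation_year]
    return train, validation


folds = list(temporal_folds([2015, 2016, 2017, 2018]))
# folds == [([2015], 2016), ([2015, 2016], 2017), ([2015, 2016, 2017], 2018)]
```

Each fold trains only on patients diagnosed before the validation year, so every evaluation resembles applying a finished model to later patients.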
Results
Table 1 provides an overview of patient characteristics for the included cohort. Additional patient characteristics are provided in supplemental eTable 1 (available with this article at JNCCN.org). Kaplan-Meier curves for the 4 cohorts are provided in supplemental eFigure 1.
Table 1. Characteristics of Included Patients


A complete overview of the parameters of the 4 SOURCE models is provided in supplemental eTables 2 through 5. These tables show the final predictor selection and the associated hazard ratios for each parameter in the multivariate Cox proportional hazards regression models. Table 2 shows an overview of the selected parameters in each prediction model.
Table 2. Overview of Selected Parameters in Each Prediction Model


The SOURCE models are also displayed graphically as nomograms in supplemental eFigures 2–5. 27 In a nomogram, the value of each predictor (eg, the weight of the patient) is marked on its scale and then associated with a number of points that can be read from the top scale. The sum of all points can then be placed in the bottom scale, after which the survival estimate is determined. The nomograms provide survival estimates at 6 and 12 months for metastatic cancers and at 1 through 4 years for potentially curable cancers.
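The way a nomogram is read can be mimicked in a few lines of Python. Every number in this sketch is hypothetical and was invented for illustration; none of the scales, point values, or survival estimates come from the SOURCE nomograms, and the choice of 2 predictors is likewise arbitrary.

```python
from typing import Sequence, Tuple


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Read a value off a piecewise-linear scale; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(1, len(xs)):
        if x <= xs[k]:
            fraction = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + fraction * (ys[k] - ys[k - 1])
    return ys[-1]


# HYPOTHETICAL scales, invented for this example only.
WEIGHT_KG = (40.0, 120.0)       # predictor scale
WEIGHT_POINTS = (30.0, 0.0)     # points read from the top scale
PERFORMANCE_POINTS = {0: 0.0, 1: 20.0, 2: 45.0}
TOTAL_POINTS = (0.0, 50.0, 100.0)   # bottom scale: summed points
SURVIVAL_1Y = (0.80, 0.50, 0.20)    # survival estimate aligned with TOTAL_POINTS


def survival_estimate(weight_kg: float, performance_status: int) -> Tuple[float, float]:
    """Sum the points of each predictor, then read survival from the bottom scale."""
    points = interpolate(weight_kg, WEIGHT_KG, WEIGHT_POINTS)
    points += PERFORMANCE_POINTS[performance_status]
    return points, interpolate(points, TOTAL_POINTS, SURVIVAL_1Y)
```

The two steps in `survival_estimate` correspond to the two readings described above: first each predictor value is converted to points, then the point total is converted to a survival estimate.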
The performance statistics of all models are shown in Table 3. These results show good overall calibration in all models: the 95% confidence intervals of the slopes include 1 and those of the intercepts include 0. The calibration deviance shows average prediction errors of 1% to 5%. The C-indices are 0.72 to 0.73 for metastatic cancers and are even higher for potentially curable cancers, at 0.78 and 0.80. Additional calibration plots are displayed in supplemental eFigure 6 and show the correspondence between predicted and observed survival per validation year cohort.
Table 3. SOURCE Model Performance Statistics


Discussion
The primary aim of this study was to create prediction models for overall survival in patients with potentially curable and metastatic esophageal or gastric cancer. The SOURCE models are based on a large national cohort of patients diagnosed in recent years and form a complete set of models for use in upper gastrointestinal cancers. In contrast with previously developed prediction models, the SOURCE models stand out because they apply to the full range of patients with esophageal and gastric cancer, in both the curative and palliative settings, and are intended for use before the start of treatment. 28–30 Moreover, they are the first esophageal and gastric cancer prediction models that include treatment as a predictor.
The robustness and generalizability of the models were considered during model development. The AIC method was used to automatically guide the predictor selection. Missing data were handled with multiple imputation (MICE). With this method, the prediction models are based on multiple datasets in which the missing values were imputed. In total, 10,490 patients (78.5%) had at most 2 missing variables. Because multiple imputations were made, the uncertainty of each individual imputation is taken into account. 31 This has the benefit of reducing bias compared with other methods, such as complete-case analysis. 31 To investigate the effect of overfitting, the models were also analyzed with an internal–external temporal cross-validation. With this method, it is possible to simulate a temporal validation of the models that helps to examine how well the models might work with patient cohorts diagnosed in later years, provided they are more or less comparable. 26 This is especially relevant when developing models for clinical practice, because predictions will be made for patients diagnosed after the model has been developed.
The performance measures of the SOURCE models are similar for the complete model and for the internal–external cross-validation, indicating a lack of overfitting. The C-indices of the potentially curable models are >0.75 (the average C-index of other prediction models for esophageal and gastric cancer), whereas the metastatic models had a C-index of 0.72 to 0.73, which can be considered fair. 12 The calibration slope and intercept are also good for all models.
The presented metastatic models represent an update of our previously published models. 13 Model updating is an important part of the lifecycle of a prediction model. 32 The current models differ from the previous models in 2 important respects. First, the current models were developed based on more recent cohorts (2015–2018) than the previous models (2005–2015). In recent years, the NCR has extended its data collection to incorporate additional variables that could potentially be included as predictors and improve model performance. Indeed, WHO performance status and HER2/neu status are now included in the SOURCE models, as are BMI and albumin, hemoglobin, lactate dehydrogenase, and creatinine levels. 14,15 Second, parameter interaction terms were removed from the models; this had no significant effect on model performance and further decreases the potential for overfitting. The updated models showed stable or even increased performance statistics; the C-index of the gastric cancer model increased from 0.68 to 0.73. Calibration of the updated models is similar to that of the previous SOURCE models.
Some limitations of the SOURCE models should be mentioned. Patients were included as of 2015, implying a relatively short follow-up period, particularly for the cohorts with potentially curable disease. In this case, it was not possible to increase the follow-up to 5 years. In future models, a longer follow-up will be available, allowing predictions over a longer period of time for curative cohorts.
Another limitation is that the NCR does not record treatment intent; it includes only the treatments patients actually received. For example, patients who were scheduled to receive neoadjuvant chemotherapy and surgery but did not advance to surgery because of clinical deterioration are classified as having received definitive chemotherapy. Predictions for definitive chemotherapy are therefore based both on patients for whom definitive chemotherapy was intended and on those who did not proceed to surgery after neoadjuvant treatment, which are clinical situations with likely different survival estimates. Furthermore, limited treatment details in the NCR led to broad treatment categories, as shown in supplemental eTable 1. These limitations should be considered when using the SOURCE models.
In addition, these models are based solely on a Dutch population, which may impact the generalizability of this study. External validation should be performed to further determine the robustness of the SOURCE models and applicability to other populations of patients with esophageal and gastric cancer. 26 For this undertaking, it is vital to take into consideration the comparability of cohorts with respect to, for example, tumor histology and primary tumor origin. 33
The main strength of the SOURCE models lies in their clinical applicability. SOURCE forms a complete set of models that cover both potentially curable and metastatic esophageal and gastric cancer. The predictors used in the models are readily available in standard clinical care and do not require additional testing. The inclusion of treatment as a model parameter makes it possible to compare the survival for various relevant treatment options, which can help with shared decision-making. 11 Figure 2 illustrates how the SOURCE models can be used to create predictions. The median predicted survival and confidence intervals are displayed for various patients with metastatic and potentially curable disease. It is also possible to compare the survival for various treatments, although one must be aware that not all treatments are relevant for each patient.

Figure 2. Individual predictions made by the SOURCE models. The vertical line within each bar represents the predicted median survival for a random selection of patients. The bars show the 50% confidence interval, and the lines show the 80% confidence interval. The table on the right shows a selection of patient characteristics used for the predictions.
Abbreviations: BSC, best supportive care; CRT, chemoradiotherapy; nC, neoadjuvant chemotherapy; nCRT, neoadjuvant chemoradiotherapy; NOS, not otherwise specified; RT, radiotherapy.
Citation: Journal of the National Comprehensive Cancer Network 19, 4; 10.6004/jnccn.2020.7631

Conclusions
Currently, predictions can be made using the nomograms provided in supplemental eFigures 2 through 5. Although useful, these nomograms are not suitable for informing patients; graphs or icon arrays should be used when informing patients about treatment outcomes. 34 The SOURCE models will be tested extensively in a clinical trial (ClinicalTrials.gov identifier: NCT04232735) to examine their effect on shared decision-making. The SOURCE models will become available through a web interface (https://source.amc.nl/) that is currently under development and is the subject of a clinical trial; the models are therefore not yet accessible to the general public. This web interface will be used to facilitate the use of the prediction models and to display the predictions with user-friendly visualizations.
References
1. Arnold M, Soerjomataram I, Ferlay J, et al. Global incidence of oesophageal cancer by histological subtype in 2012. Gut 2015;64:381–387.
2. Ferlay J, Soerjomataram I, Dikshit R, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer 2015;136:E359–386.
3. Napier KJ, Scheerer M, Misra S. Esophageal cancer: a review of epidemiology, pathogenesis, staging workup and treatment modalities. World J Gastrointest Oncol 2014;6:112–120.
4. Riihimäki M, Hemminki A, Sundquist K, et al. Metastatic spread in patients with gastric cancer. Oncotarget 2016;7:52307–52316.
5. Smyth EC, Verheij M, Allum W, et al. Gastric cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2016;27(Suppl 5):v38–49.
6. Lordick F, Mariette C, Haustermans K, et al. Oesophageal cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2016;27(Suppl 5):v50–57.
7. Gavin AT, Francisci S, Foschi R, et al. Oesophageal cancer survival in Europe: a EUROCARE-4 study. Cancer Epidemiol 2012;36:505–512.
8. van den Ende T, Abe Nijenhuis FA, van den Boorn HG, et al. COMplot, a graphical presentation of complication profiles and adverse effects for the curative treatment of gastric cancer: a systematic review and meta-analysis. Front Oncol 2019;9:684.
9. Barry MJ, Edgman-Levitan S. Shared decision making—pinnacle of patient-centered care. N Engl J Med 2012;366:780–781.
11. Henselmans I, Van Laarhoven HWM, Van der Vloodt J, et al. Shared decision making about palliative chemotherapy: a qualitative observation of talk about patients’ preferences. Palliat Med 2017;31:625–633.
12. van den Boorn HG, Engelhardt EG, van Kleef J, et al. Prediction models for patients with esophageal or gastric cancer: a systematic review and meta-analysis. PLoS One 2018;13:e0192310.
13. van den Boorn HG, Abu-Hanna A, Ter Veer E, et al. SOURCE: a registry-based prediction model for overall survival in patients with metastatic oesophageal or gastric cancer. Cancers (Basel) 2019;11:187.
14. Chan DSY, Twine CP, Lewis WG. Systematic review and meta-analysis of the influence of HER2 expression and amplification in operable oesophageal cancer. J Gastrointest Surg 2012;16:1821–1829.
15. Yates JW, Chalmer B, McKegney FP. Evaluation of patients with advanced cancer using the Karnofsky performance status. Cancer 1980;45:2220–2224.
16. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63.
17. Netherlands Comprehensive Cancer Organisation (IKNL). IKNL and the NCR. Accessed November 10, 2020. Available at: https://www.iknl.nl/en
18. van Putten M, Verhoeven RHA, van Sandick JW, et al. Hospital of diagnosis and probability of having surgical treatment for resectable gastric cancer. Br J Surg 2016;103:233–241.
19. van der Sluis PC, van der Horst S, May AM, et al. Robot-assisted minimally invasive thoracolaparoscopic esophagectomy versus open transthoracic esophagectomy for resectable esophageal cancer: a randomized controlled trial. Ann Surg 2019;269:621–630.
20. Fritz A, Percy C, Jack A, et al. International Classification of Diseases for Oncology, 3rd ed. Geneva, Switzerland: World Health Organization; 2000.
21. Fox J, Weisberg S. Cox proportional-hazards regression for survival data in R. An appendix to an R companion to applied regression, second edition. Accessed November 10, 2020. Available at: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.4496&rep=rep1&type=pdf
22. van Buuren S, Groothuis-Oudshoorn K. MICE: Multivariate Imputation by Chained Equations in R. J Stat Softw 2011;45:1–67.
23. Bozdogan H. Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 1987;52:345–370.
24. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128–138.
25. Gerds TA, Schumacher M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom J 2006;48:1029–1040.
26. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol 2016;69:245–247.
28. Hagens ERC, Feenstra ML, Eshuis WJ, et al. Conditional survival after neoadjuvant chemoradiotherapy and surgery for oesophageal cancer. Br J Surg 2020;107:1053–1061.
29. Woo Y, Son T, Song K, et al. A novel prediction model of prognosis after gastrectomy for gastric carcinoma: development and validation using Asian databases. Ann Surg 2016;264:114–120.
30. Kattan MW, Karpeh MS, Mazumdar M, et al. Postoperative nomogram for disease-specific survival after an R0 resection for gastric carcinoma. J Clin Oncol 2003;21:3647–3650.
31. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;338:b2393.
32. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York, NY: Springer Science+Business Media; 2009.
33. van Kleef JJ, van den Boorn HG, Verhoeven RHA, et al. External validation of the Dutch SOURCE survival prediction model in Belgian metastatic oesophageal and gastric cancer patients. Cancers (Basel) 2020;12:834.
34. Zipkin DA, Umscheid CA, Keating NL, et al. Evidence-based risk communication: a systematic review. Ann Intern Med 2014;161:270–280.