Clinical trials are valuable in guiding evidence-based practice in medicine.1 Oncologists' decision-making regarding therapeutic regimens generally depends on information from publications in peer-reviewed journals, the main channel through which trial results are publicly disclosed and communicated.2 Thus, the reporting quality of publications is of vital importance to ensure accurate dissemination of evidence.3,4 ClinicalTrials.gov is the largest publicly accessible trial registry, and the only one with a results database.5,6 In September 2007, the FDA Amendments Act (FDAAA; section 801) was passed mandating the timely reporting of results of applicable clinical trials to ClinicalTrials.gov,7 which greatly expanded the legal requirements for the public reporting of trial results and enhanced reporting transparency. In contrast to peer-reviewed publications, which might be subject to the selective judgments of editors and reviewers, information posted on ClinicalTrials.gov goes through a quality assurance (QA) process when required information is missing or internally inconsistent.
Cancer is a major public health problem worldwide and is the leading and second-leading cause of death in China and the United States, respectively.8,9 The interpretation and accuracy of trial results are of particular concern in medical oncology, in which therapeutic regimens are often rapidly developed and prompt treatment decisions are important for saving lives. So how accurately does the published literature convey information to the oncologic community regarding the efficacy and safety of cancer drugs assessed in clinical trials?
Currently, only one study has investigated the reporting consistency between the ClinicalTrials.gov results database and publications.10 However, trials included in that study were completed before January 1, 2009 (2 years after the enactment of the mandatory result reporting law), and only 5 trials (3%) were related to oncology. The accuracy of published literature conveying information to the oncologic community on cancer drug trials remains unknown. Has reporting consistency improved in the 10 years since the mandatory reporting laws were enacted?
To address these questions, we included cancer drug trials that had results posted on ClinicalTrials.gov and were completed between 2004 and 2014. Our study had 2 objectives: to identify the degree of completeness and consistency of results reported between the results database and the subsequent publications, and to identify the trends of reporting quality and associated characteristics.
Data Source and Study Sample
Data were obtained through the Aggregate Analysis of ClinicalTrials.gov (AACT) database, reflecting data downloaded up until September 27, 2015. Among approximately 200,000 studies registered on ClinicalTrials.gov, we focused on clinical trials with results posted on the results database (n=18,474). A total of 323 phase III/IV cancer drug trials with a randomized controlled design that posted results on ClinicalTrials.gov and were completed between January 1, 2004, and January 1, 2014, were included in the final selection (Figure 1). The detailed selection process is shown in supplemental eAppendix 1, available online with this article at
We then drew a 50% random sample using STATA, version 12 (StataCorp LP, College Station, TX) (n=160) and searched for matching publications. Trial characteristics were well-balanced between selected and unselected samples (Table 1); search strategies are detailed in supplemental eAppendix 1. We chose the publication that first reported the primary outcome measures (POMs) for publication date records and reviewed it for completeness and consistency. Because no trials uploaded their results before the primary completion date, and the results posted on ClinicalTrials.gov were required to report on primary outcomes, we believed this method would largely help reduce bias. We found that some trials with publications recruited only a few participants; because small trials are less likely to influence clinical practice, we restricted our final comparison to trials with a minimum sample size of 60 participants (N=117; Figure 1).
Data Extraction and Criteria for Discrepancy
Information on the following 3 dimensions was extracted and compared: (1) basic design information, including study design, number of arms, and number of patients undergoing randomization; (2) efficacy measurements, including the number of POMs, descriptions, and measurements; timing of assessment; number of patients involved in efficacy analysis; and specific metrics; and (3) benefit/risk reports, including individuals affected by at least 1 serious adverse event (SAE) and other adverse event (OAE), risk difference (experimental arm vs control arm risk), and number of individuals at risk per group. For trials with multiple experimental arms, we selected the arm of primary interest stated in the registration. The criteria for discrepancy are stated in supplemental eAppendix 1; specific items in each dimension with definitions and examples showing discrepant results are presented in supplemental eTable 1.
Two investigators (J-W.L., X.L.) independently assessed the completeness and consistency of information between sources. The percentage of data that were discrepant for each item of comparison was calculated to show the disagreement between the 2 investigators; percentages ranged from 0% to 6.8%, which was generally low. Any disagreement was resolved by consensus, and a third reviewer (Y-P.C.) randomly rechecked a 50% sample for QA.
Scoring System and Statistical Analysis
We applied a scoring system to determine characteristics associated with reporting completeness and consistency (detailed in supplemental eAppendix 1). To compare the basic trial characteristics between selected and unselected samples, chi-square analysis was performed. The completeness of results posted on ClinicalTrials.gov and in publications was compared using McNemar's test of equality of paired proportions.11 To identify trial characteristics associated with reporting completeness and consistency, we fitted a linear regression model with the total score as the outcome variable. The characteristics of each trial were obtained from the National Library of Medicine (definitions of trial characteristics are presented in supplemental eAppendix 1). The multivariable model with backward elimination included every variable associated with P≤.10 in univariate analysis. Variables significant at P<.05 in the final multivariate model were considered independent predictors. Collinearity among variables was assessed using collinearity diagnostics. The overall significance of the model was evaluated using the F value. All analyses were performed using STATA, version 12. Two-sided P<.05 was considered significant.
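As an illustration only, the paired comparison of completeness can be sketched in Python. The sketch assumes the exact (binomial) form of McNemar's test; the text does not state which form was used, and the function name mcnemar_exact is hypothetical. The example counts are taken from the SAE figures reported in this article (117 trials with complete SAE reporting in the results database, 51 of them also complete in publications), which leaves 66 discordant pairs in one direction and none in the other.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value (assumed variant of the test).

    b: pairs complete in the registry but incomplete in the publication
    c: pairs incomplete in the registry but complete in the publication
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5) up to the smaller discordant count
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# SAE completeness: 117 posted vs 51 published, so 66 discordant pairs one way
p_sae = mcnemar_exact(b=117 - 51, c=0)
print(f"SAE completeness, exact McNemar P = {p_sae:.2e}")
```

Only the discordant pairs carry information in this test, which is why a registry completeness of 100% makes the comparison depend entirely on the number of publications with incomplete reporting.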
Trial Characteristics
Of the 50% random samples (n=160), 1 trial registered as phase IV was found to be phase II and another was found to be a noncancer trial. Thus, these trials were excluded (n=2). Of the remaining 158 trials, 121 (76.6%) had publications. Trial characteristics
Table 1. Comparison of the Basic Trial Characteristics
Completeness of Reporting
After excluding trials with fewer than 60 participants, 117 of 121 trials entered our final comparison. Table 2 compares the completeness of results reporting in the results database and the matching publications. Reporting was significantly more complete on ClinicalTrials.gov than in publications for SAEs (100% vs 43.6%) and OAEs (100% vs 62.4%). No statistically significant difference was observed in basic design information (100% vs 100%) or efficacy measurements (92.3% vs 90.6%).
Consistency of Reporting
Table 3 summarizes the major discrepancies among trials with both posted and published results for basic design information (n=117), efficacy measurements (n=98), SAEs (n=51), and OAEs (n=73). In basic design information, 16 of 117 trials (13.7%) indicated at least 1 discrepancy. Among these, the discrepant study design in 2 trials was attributable to a protocol amendment during the trial; however, the crucial information was not updated in ClinicalTrials.gov. Generally, published articles reported a larger randomized population than the results database. The median relative difference was 2.5% (range, 0.3%–22.8%).
For efficacy measurements, 86 of 98 (87.8%) trials reported at least 1 discrepancy. The most common discrepancy occurred with secondary outcome measurements (75/98; 76.5%). Of 18 trials with different numbers of POMs and measurement tools, 13 trials reported more POMs in ClinicalTrials.gov than in publications, 3 trials had the same number of POMs but referred to different measurement tools, and 2 trials reported more POMs in publications. A total of 18 trials differed in treatment effects (specific metrics), of which 2 trials were not comparable because they referred to different measurements. Among 16 comparable trials, 7 reported larger treatment effects in publications; 9 reported larger treatment effects in ClinicalTrials.gov (1 trial was noninferior). On an absolute scale, the observed discrepancy did not change the interpretation of results, except for 1 trial. We further investigated the alteration in POMs and the influence of selective reporting. Of the 18 accessible
Table 2. Completeness of Reporting in the Results Database and the Matching Publications
Table 3. Reporting Discrepancy Between the Results Database and the Matching Publications
In SAE reporting, 26 of 51 trials (51.0%) reported at least 1 discrepancy. Among these, 20 of 23 trials reported fewer individuals at risk in publications (median relative difference, 7.1%; range, 0.4%–229.0%); 3 of 23 trials reported fewer individuals at risk on ClinicalTrials.gov (median relative difference, 45.2%; range, 25.0%–108.0%). Notably, although the relative/absolute differences were minor in some trials, the comparison between risk groups (experimental arm vs control arm) differed between sources in 7 trials, which may lead to a discrepant or even opposite interpretation of SAEs.
In OAE reporting, 54 of 73 trials (74.0%) reported at least 1 discrepancy. Among these, 14 reported fewer individuals at risk in publications (median relative difference, 4.4%; range, 0.4%–220.9%) and 40 reported fewer individuals at risk on ClinicalTrials.gov (median relative difference, 5.1%; range, 0.8%–100.0%). Of note, different interpretations of OAEs may have occurred because of the discrepancy between risk groups (experimental arm vs control arm) in 10 trials, although the relative/absolute differences were minor.
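The two quantities used in these comparisons can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not confirm: the relative difference is taken here as the absolute difference divided by the smaller of the two counts (the actual definition is given in the supplemental appendix), and a changed interpretation is approximated as a sign change in the risk difference between the experimental and control arms. All function names and counts below are hypothetical.

```python
from statistics import median


def relative_difference(posted: int, published: int) -> float:
    """Percent difference between two counts (assumed: smaller count as denominator)."""
    smaller = min(posted, published)
    if smaller == 0:
        raise ValueError("relative difference is undefined when a count is 0")
    return abs(posted - published) / smaller * 100


def risk_difference(events_exp: int, n_exp: int, events_ctrl: int, n_ctrl: int) -> float:
    """Experimental-arm risk minus control-arm risk."""
    return events_exp / n_exp - events_ctrl / n_ctrl


def direction_flips(rd_posted: float, rd_published: float) -> bool:
    """True when the two sources disagree on which arm carries the higher risk."""
    return rd_posted * rd_published < 0


# Hypothetical (posted, published) counts of individuals at risk for 3 trials
pairs = [(100, 93), (50, 40), (200, 196)]
print(median(relative_difference(a, b) for a, b in pairs))

# Hypothetical trial: a small change in the experimental-arm count flips the sign
rd_posted = risk_difference(30, 100, 20, 100)     # experimental arm looks worse
rd_published = risk_difference(18, 100, 20, 100)  # experimental arm looks better
print(direction_flips(rd_posted, rd_published))
```

The second example shows why a discrepancy that is minor in relative terms can still matter: the interpretation depends on the sign of the between-arm difference rather than on the size of the change in either count.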
Multivariable Factors Associated With Reporting Completeness and Consistency
We then investigated the characteristics associated with reporting quality. The results database was chosen as the reference because of its greater completeness; trials with incomplete reporting on ClinicalTrials.gov were excluded (n=9). A total of 14 of the items compared above (see supplemental eTable 1) entered the quality scoring. For the 108 trials eligible for quality scoring, the median score was 21 (range, 14–28). Linear regression identified parallel assignment, phase IV trials, primary funding by industry (vs other funding, but not NIH funding), primary completion after 2009, and earlier posting of results after primary completion as independent factors associated with greater completeness and consistency (Table 4). The model was significant overall (F=5.28; P<.001). Collinearity diagnostics showed no evidence of collinearity among the variables (tolerance >0.9 for all variables). Greater completeness and consistency were not associated with statistically significant primary outcomes (P=.21).
Table 4. Trial Characteristics Associated With Completeness and Consistency of Reporting (N=108)
Discrepancy in reporting clinical trials has triggered widespread concerns. Previous literature has endeavored to explore the reporting discrepancy between publications and other relevant sources, including regulatory documents, clinical study reports, and registrations.12–16 However, these documents are often less accessible. Resorting to freely available descriptions of trial results, such as the ClinicalTrials.gov results database, is therefore more applicable and convenient for the public. With the FDAAA requiring mandatory posting of results within 1 year after the primary completion date and standardized reporting of results,7 ClinicalTrials.gov has become an interesting source of trial results. To date, only one study by Hartung et al10 has investigated the reporting bias between the ClinicalTrials.gov results database and publications. However, trials included in that study were completed before January 1, 2009, 2 years after the enactment of the mandatory result reporting law. The accuracy and trends of modern publications conveying information to the oncology community after long-term enactment of the mandatory reporting law remain unknown. By including cancer drug trials with results posted on ClinicalTrials.gov and completed between 2004 and 2014, we found that the median score of reporting completeness and consistency was 21, indicating generally reasonable reporting quality. However, certain discrepancies are prevalent and persistent, and need to be addressed.
Overestimation and Selective Reporting in Efficacy Measurements
In POMs, we identified inconsistent reporting in 18.4% of trials. This estimate was lower than those of other studies that explored inconsistencies between trial protocols and published results (62%),17 or trial registrations and publications (31%)18; however, it was much the same as that reported by Hartung et al.10 It seems that the past years did not witness great improvements in consistent reporting of primary outcomes after the implementation of FDAAA. Additional efforts and tailored policy alongside FDAAA section 801 are needed to improve reporting quality.
Overestimation and alteration of primary end points were the predominant discrepancies, which increase the prevalence of spurious results and give a false impression of cancer drugs. The purpose of reporting on primary outcomes is to define the most clinically relevant outcomes and protect against selective reporting.19 Additionally, primary outcomes are generally used for the calculation of sample size. However, if primary outcomes are subsequently omitted or altered, their protective mechanism may no longer function.
Incompleteness and Underestimation of Benefit/Risk Reports
Our study found that all trials posted complete AE reporting on ClinicalTrials.gov, whereas approximately 50% of the trials published complete AE reports in the literature. Similar to our findings, Riveros et al20 pointed out that AE reporting was significantly more complete on ClinicalTrials.gov than in publications (73% vs 45%). The completeness rate for ClinicalTrials.gov in our study was higher. Possible reasons are that cancer drug trials made up only 4% of the trials included by Riveros et al, and that their search was completed by 2012. The higher rates also reflected the inspiring work performed by the oncology community on the results posted on ClinicalTrials.gov. However, little improvement was seen in the publications. Although this was not surprising in light of word count limits imposed by journal editors,20 it might also be attributed to poorly measured benefit/harm events or purposeful concealment of unfavorable data.4,21 The findings underlined the need to consult ClinicalTrials.gov for more information on benefits/risks reported in trials.
Among trials with both posted and published AE reports, discrepant reporting was prevalent, with most trials reporting fewer SAEs in publications. Underreporting of SAEs was of particular concern, because even if some differences did not alter the interpretation of safety issues, it may distort how oncologists balance the benefits and harms of cancer drugs. Moreover, these distortions may be amplified in systematic reviews and meta-analyses.22,23
Improving the Reliability of Trial Results Reporting
We were pleased to see that trials with earlier results posted on ClinicalTrials.gov tended to have better reporting completeness and consistency. In addition, compared with trials completed before 2009, those completed in the subsequent 5-year period (2009–2014) showed much improvement in reporting quality. The improved reporting quality of later publications reflected the positive feedback for timely posting of results to ClinicalTrials.gov, and the supervisory function of the mandatory reporting law. Disappointingly, publishing in high-impact journals (impact factor >10) and trials with larger sample sizes (>1,000) did not guarantee better reporting conditions. Special attention should be paid to this phenomenon, because these trials are more likely to influence evidence-based clinical oncology practice.
To improve the reliability of results reporting in clinical trials, active participation of the various stakeholders is needed. First, our findings highlight a growing sense that both clinicians and peer reviewers should access trial results systematically from both ClinicalTrials.gov and the published literature, when available. Consulting participant-level “raw data” for reference and resolving discrepancies between the 2 sources are the most crucial procedures to improve transparency and disclosure in clinical trials. Second, the mandatory result reporting law alone is still not enough, although a positive correlation was identified between timely result posting to ClinicalTrials.gov and better reporting consistency. Tailored policy is also needed, such as regarding QA and reporting timeline, alongside FDAAA section 801, to guarantee the quality of results posting to ClinicalTrials.gov and reduce discrepancy between the 2 sources.
We note a number of limitations to our study. First, the results database was used as a reference for comparison and scoring. Although result reporting might be more complete and objective in ClinicalTrials.gov, the information can still be suboptimal or invalid. Some discrepancies may be entry mistakes on ClinicalTrials.gov due to the urgency to report results or inexperience with the submission requirements. Second, we did not evaluate the changes of outcome measurements archived over time; only those reported in the results database were evaluated. Modifications of registered trial protocols, specifically secondary outcome measure additions or deletions, are common prior to publication. Third, the discrepancy in OAEs could be exaggerated, possibly due to the different timelines for safety follow-up between the primary publication and ClinicalTrials.gov. If there is a longer timeline for safety evaluation in the ClinicalTrials.gov entry, then it could explain why patients on ClinicalTrials.gov would be more likely to experience OAEs. Special attention should be paid to interpreting these findings.
Although results reporting of clinical trials assessing cancer drugs showed generally reasonable completeness and consistency, some discrepancies are prevalent and persistent, which jeopardizes evidence-based clinical decision-making. Making results publicly accessible on ClinicalTrials.gov may provide participant-level “raw data” as reference and help ameliorate reporting bias.
