Background
Mammography screening reduces breast cancer mortality by an estimated 26% to 41%.1,2 However, false-positive recalls (recalling women with abnormal mammograms who, on further testing, are not found to have breast cancer) can cause psychological burden.3–6 Such burden may further decrease women’s screening reattendance by undermining their confidence in the benefits of mammography.4,7,8 In Sweden, approximately 2.5 per 100 women attending mammography screening experience a false-positive recall at a single screening round.9 Because Swedish women are screened every second year from age 40 to 74 years, the lifetime risk of having a false-positive mammography recall is high. Similarly, in the United States, it is estimated that 30% to 50% of women who participate in mammography screening will have a false-positive recall over a 10-year period.10,11
Although false-positive recalls cannot be eliminated, they can be minimized. Better understanding the association between breast cancer risk factors and false-positive recalls may help reduce their occurrence. Previous studies have found that high breast density is associated with false-positive recalls.11–13 However, no study thus far has investigated the association of false-positive recalls with other mammographic features (eg, microcalcifications and masses) and breast cancer risk prediction models (eg, Tyrer-Cuzick model). Furthermore, false-positive recalls should be reduced, but not at the cost of missing true tumors. Therefore, when investigating determinants of false-positive recalls, the risk of true-positive recalls should be considered, but unfortunately this has been neglected in previous studies.11,12
Using the Karolinska Mammography Project for Risk Prediction of Breast Cancer (KARMA), a population-based screening cohort, we investigated the association of mammographic features, nonmammographic features, and breast cancer risk prediction models with false-positive recalls compared with women who were not recalled as well as those who received a true-positive recall.
Methods
Data Sources
The KARMA study comprises women who attended mammography screening or clinical mammography at 4 hospitals in Sweden between January 2011 and March 2013.14 Blood samples were collected at baseline from 98% of KARMA participants. The study further collected data from questionnaires and mammograms. Detailed information about recruitment, participant characteristics, questionnaires, mammogram collection, and follow-up can be found elsewhere.14
Using the unique Swedish personal identification number,15 we linked KARMA to the Stockholm-Gotland Breast Cancer Register, the National Quality Register for Breast Cancer, and the Stockholm mammography screening program. This program invites all women in Stockholm aged 40 to 74 years for mammography screening at 18- to 24-month intervals.9,16–18
Study Population
We identified all 32,185 KARMA participants who participated in the Stockholm mammography screening program. We then excluded women who had breast enlargement (n=713), breast reduction (n=975), or other breast surgeries (n=816) and those who were diagnosed with breast cancer before the KARMA baseline (n=552), leaving 29,129 for the final analyses (supplemental eFigure 1, available with this article at JNCCN.org).
For analysis of recall rates, we included all mammography screening records within 30 days of KARMA recruitment (n=28,192). Among these screenings, there were 796 recalls. To examine associations of risk factors with both false-positive and true-positive recalls, we conducted a matched case-control study based on screenings performed between 2011 and 2015 for the 29,129 women (supplemental eFigure 1). Specifically, we identified 1,550 women who received their first mammography recall at or after entering the KARMA cohort. Selecting recalls in this way increased our study’s statistical power (compared with using only recalls at enrollment), given that most women attended >1 screening during the study period. For each recalled woman, we randomly selected 5 age-matched (±1 year) and screening history–matched control individuals (women who were not recalled at the time and had never been recalled previously). We then further categorized recalled women as having a false-positive (n=1,233) or true-positive recall (n=317).
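The control selection described above can be illustrated with a minimal sketch. It assumes a simplified record per woman; the field names (`woman_id`, `screen_round`, `ever_recalled`), the use of screening round as a proxy for screening history, and the synthetic pool are hypothetical and are not taken from the KARMA data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Screening:
    woman_id: int
    age: int             # age at this screening, in years
    screen_round: int    # hypothetical proxy for screening history
    ever_recalled: bool  # recalled at or before this screening


def select_controls(case, candidates, n_controls=5, age_window=1, rng=None):
    """Randomly pick up to n_controls never-recalled women matched to the case."""
    rng = rng or random.Random()
    eligible = [
        c for c in candidates
        if c.woman_id != case.woman_id
        and not c.ever_recalled
        and abs(c.age - case.age) <= age_window   # age matched within +/- 1 year
        and c.screen_round == case.screen_round   # same screening history
    ]
    return rng.sample(eligible, min(n_controls, len(eligible)))


# Synthetic example: one recalled woman and a pool of screened women.
case = Screening(woman_id=0, age=52, screen_round=3, ever_recalled=True)
pool = [Screening(i, 45 + i % 15, 1 + i % 4, i % 7 == 0) for i in range(1, 400)]
controls = select_controls(case, pool, rng=random.Random(0))
```

Sampling without replacement from the eligible set mirrors the random selection of 5 controls per recalled woman; whether controls could be reused across cases is not stated in the text and is not modeled here.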
In Stockholm, all screening units have used double-reading with a consensus decision method, which involves 2 radiologists independently assessing mammograms for each participant to decide whether the woman is “healthy” or needs to be recalled for further assessment.9 Each radiologist decides whether a woman should be recalled or not predominantly based on any suspicious mammographic findings, such as masses, microcalcifications, and asymmetry of density, while reviewing prior mammograms (for women at second or later screens) for comparison. If either radiologist notes a suspicious finding, the case is discussed until a consensus is reached. Information on family history of breast cancer or other breast cancer risk factors is not collected or considered at mammography clinics. In this study, false-positive recalls were defined as not being diagnosed with breast cancer (including invasive and in situ breast cancer) between the date of being recalled and the next scheduled screening visit. True-positive recalls were defined as being diagnosed with breast cancer within the same period. To categorize the screening outcomes, we used data from the Swedish Cancer Register with follow-up until the end of 2017.
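The outcome definition above can be written as a small rule. This is a sketch only: the function name, the argument names, and the choice to treat both interval boundaries as inclusive are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def classify_recall(recall_date: date,
                    next_screen_date: date,
                    diagnosis_date: Optional[date]) -> str:
    """Label a recall by whether breast cancer (invasive or in situ) was
    diagnosed between the recall and the next scheduled screening visit."""
    if diagnosis_date is not None and recall_date <= diagnosis_date <= next_screen_date:
        return "true-positive"
    return "false-positive"


# Hypothetical dates for two recalled women.
with_cancer = classify_recall(date(2012, 3, 1), date(2014, 3, 1), date(2012, 4, 10))
without_cancer = classify_recall(date(2012, 3, 1), date(2014, 3, 1), None)
```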
Nonmammographic Features
Information on the following breast cancer risk factors was retrieved from the KARMA questionnaire14: years of education (<10, 10–12, >12 years), family history of breast cancer (no, yes), history of benign breast disease (no, yes), age at menarche (<14, ≥14 years), nulliparity (no, yes), number of children (0, 1 or 2, >2), age at first birth (<25, 25–35, >35 years), duration of breastfeeding (0, <6, 6–12, >12 months), use of oral contraceptives (never, ever), use of any hormone replacement therapy (never, ever), body mass index (<25, 25.0–29.9, ≥30.0 kg/m²), physical activity (<40, 40–44.9, ≥45 metabolic equivalents of task h/d), smoking status (never, ever), and alcohol consumption (0, 0.1–10, >10 g/d).
Mammographic Features
Mammograms of both breasts were used to measure mammographic features. Dense area was measured for each breast using the automated STRATUS method.19 The number of masses and microcalcification clusters was measured using FDA-approved computer-aided detection software (M-Vu CAD; iCAD).20 Mammographic features were categorized as dense area (<9, 9 to <20, ≥20 cm²), number of masses (0, ≥1), and microcalcifications (0, ≥1). For a woman with a recall (and her matched control individuals), we defined the recalled (right or left) breast as the “recalled side” and the other as the “contralateral side.” Asymmetry of mammographic features between the 2 breasts was defined as the difference in dense area, number of masses, and number of microcalcification clusters between the recalled side and the contralateral side. Equal was defined as within 6 cm² of dense area, same number of masses, and same number of microcalcification clusters, respectively.
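The categorization and asymmetry rules above can be sketched as follows. The sketch assumes that the top dense-area category starts at 20 cm² and that "within 6 cm²" is inclusive; the function names and category labels are illustrative, not those of STRATUS or the CAD software.

```python
def dense_area_category(area_cm2: float) -> str:
    """Assign dense area (cm^2) to one of the three study categories."""
    if area_cm2 < 9:
        return "<9"
    if area_cm2 < 20:
        return "9 to <20"
    return ">=20"  # assumption: 20 cm^2 itself belongs to the top category


def asymmetry(recalled: float, contralateral: float, tolerance: float = 0.0) -> str:
    """Compare one feature on the recalled side with the contralateral side.

    Use tolerance=6 for dense area (cm^2) and tolerance=0 for counts of
    masses or microcalcification clusters.
    """
    diff = recalled - contralateral
    if abs(diff) <= tolerance:
        return "equal"
    return "higher on recalled side" if diff > 0 else "higher on contralateral side"


# Hypothetical measurements for one recalled woman.
density_asymmetry = asymmetry(30.0, 25.0, tolerance=6)  # 5 cm^2 apart
mass_asymmetry = asymmetry(3, 1)                        # 2 more masses on recalled side
```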
Breast Cancer Risk Models
The Tyrer-Cuzick 10-year breast cancer risk score was categorized as low (<3%), medium (3% to <5%), or high (≥5%).14,21 The KARMA 2-year risk score was categorized as low (<0.6%), medium (0.6% to <1.0%), or high (≥1.0%).20 A weighted breast cancer polygenic risk score for each genotyped individual with European ancestry was calculated by including 313 single-nucleotide polymorphisms that have previously been identified.22
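The cutoffs for the two absolute-risk scores translate directly into a lookup. The sketch below is illustrative; the dictionary keys and function name are hypothetical, and the polygenic risk score is omitted because no category cutoffs are given for it here.

```python
from bisect import bisect_right

LABELS = ("low", "medium", "high")

# Lower bounds (in percent) of the medium and high categories.
CUTOFFS = {
    "tyrer_cuzick_10y": (3.0, 5.0),
    "karma_2y": (0.6, 1.0),
}


def risk_category(model: str, risk_percent: float) -> str:
    """Map an absolute risk (percent) to low, medium, or high."""
    # bisect_right counts how many cutoffs are <= the risk, so a value
    # exactly on a cutoff falls into the higher category.
    return LABELS[bisect_right(CUTOFFS[model], risk_percent)]
```

For example, `risk_category("tyrer_cuzick_10y", 3.0)` falls in the medium category and `risk_category("karma_2y", 1.0)` in the high category.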
Statistical Analyses
We estimated the mammography screening recall rates (numbers per 1,000 screenings) by nonmammographic features, mammographic features, and breast cancer risk models. Chi-square tests (P<.1) were used to examine and select risk factors that differed by recall status. We then used conditional logistic regression models to investigate the association between these selected breast cancer risk factors and false-positive and true-positive recalls, respectively, compared with age-matched control individuals. We also conducted logistic regression to directly compare women with false-positive recalls with those with true-positive recalls. Additional analyses were conducted for factors that were not associated with recalls (P≥.1 in Table 1).
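The first two steps of the analysis, the recall rate per 1,000 screenings and the chi-square screen at P<.1, can be sketched in a few lines. The study itself used SAS and R; this Python version is only an illustration, is restricted to a binary factor (a 2×2 table), and uses hypothetical counts for the chi-square example. The recall-rate example uses the 796 recalls among 28,192 screenings reported for the enrollment screening. The conditional logistic regression step is not shown.

```python
import math


def recall_rate_per_1000(recalls: int, screenings: int) -> float:
    """Recalls per 1,000 screenings."""
    return 1000 * recalls / screenings


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: factor present / absent; columns: recalled / not recalled.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


overall_rate = recall_rate_per_1000(796, 28192)

# Hypothetical counts: 60/1,000 recalled with the factor, 40/1,000 without.
stat, p_value = chi_square_2x2(60, 940, 40, 960)
selected_for_regression = p_value < 0.1
```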
Table 1. Association Between Breast Cancer Risk Factors, Risk Models, and Mammography Screening Recall Rates
All analyses were performed using SAS 9.4 (SAS Institute Inc.) and R version 3.6 (R Foundation for Statistical Computing). All P values were 2-sided. The Regional Ethical Review Board in Stockholm, Sweden, approved the study.
Results
For every 1,000 mammography screenings, there were, on average, 28 women recalled for further examinations (Table 1). Women with the following breast cancer risk factors had significantly higher recall rates than those without: family history of breast cancer, history of benign breast disease, dense breasts (dense area ≥20 cm²), masses, microcalcifications, and asymmetry of these 3 mammographic features (more prevalent in the recalled breast) (Table 1 and supplemental eTable 1). In addition, women with high breast cancer risk, measured by all 3 risk models (Tyrer-Cuzick, KARMA, and polygenic models), also had higher recall rates than those with low risk (Table 1).
In contrast, age was the only breast cancer risk factor negatively associated with recall rates (Table 1). Specifically, we found that false-positive recall rates were highest in women aged 40 to 49 years and declined with age, whereas true-positive recall rates increased with age (Figure 1). Of note, the number of false-positive recalls per true-positive recall was 9.78 (95% CI, 7.11–15.09) in women aged 40 to 49 years, which was 4 times higher than the 1.78 (95% CI, 1.38–2.33) observed among women aged 60 to 74 years (Figure 1).
Nonmammographic Factors
Women with a history of benign breast disease were more likely than those without to have mammography recalls (Table 1), including false-positive and true-positive recalls (Table 2). Further analyses restricted to women who were recalled found that those with and without a history of benign breast disease had a similar risk of having a false-positive versus a true-positive recall (Figure 2). Women with a family history of breast cancer were more likely than those without to have a recalled mammogram (Table 1), particularly for true-positive recalls (Table 2). Further analyses restricted to women who were recalled found that those with a family history of breast cancer were more likely to have a true-positive recall than a false-positive recall (Figure 2). None of the other factors favored false-positive over true-positive recalls (supplemental eFigure 2).
Table 2. Association Between Nonmammographic and Mammographic Features and False-Positive and True-Positive Recalls
Mammographic Factors
Women with high breast density, masses, and microcalcifications were more likely than those without to have a recalled mammogram (Table 1), including both false-positive and true-positive recalls (Table 2). Further analyses restricted to women who were recalled found that those with masses and microcalcifications were more likely to have a true-positive than a false-positive recall, whereas no significant difference was found for breast density (Figure 2). Furthermore, women with asymmetric breast features were also more likely to have a recalled mammogram (supplemental eTable 1), particularly for true-positive recalls with asymmetry of masses and microcalcifications (supplemental eTable 2 and supplemental eFigure 3).
Breast Cancer Risk Models
Women with high risk scores—measured using Tyrer-Cuzick 10-year, KARMA 2-year, and polygenic risk models—were more likely to have a recalled mammogram (Table 1), including both false-positive and true-positive recalls (Table 3). Further analyses restricted to women who were recalled found that women with high risk scores were more likely to have a true-positive than a false-positive recall (Figure 2).
Table 3. Association Between Breast Cancer Risk Models and False-Positive and True-Positive Recalls
Discussion
In this large, population-based screening cohort, we identified several breast cancer risk factors and risk models that were associated with a higher risk of having a mammography recall. Further dividing recalls into false-positive and true-positive found that, among these factors, age was negatively associated with false-positive recall rates and positively associated with true-positive recall rates. Breast density and having a history of benign breast disease were equally associated with false-positive and true-positive recalls. Moreover, having a family history of breast cancer, masses, microcalcifications, and increased risk of breast cancer, measured using Tyrer-Cuzick, KARMA, and polygenic risk models, were associated with an increased risk of having a true-positive rather than a false-positive recall.
We found that 2.83% of women received a false-positive recall at the screening taken at enrollment in KARMA. In Sweden, all women aged 40 to 74 years are invited to attend mammography screening for breast cancer biennially (every 18–24 months). Therefore, even if the rate of false-positive recalls is low in a single screening, a woman’s lifetime risk of having a false-positive recall could be high. In the United States, half of women aged 40 to 69 years will have at least one false-positive mammogram after 10 screenings.23 This large number of false-positive recalls, together with the fact that false-positive recalls can cause psychological burden,3 highlights the importance of reducing false-positive recalls.
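To illustrate how a low per-round rate accumulates, the sketch below computes the probability of at least one false-positive recall over repeated screening rounds. It rests on strong simplifying assumptions that are not made in the study: a constant per-round risk (here the approximately 2.5 per 100 reported for Sweden), independence between rounds, and strictly biennial attendance from age 40 to 74 years. Because false-positive rates in this study declined with age, the result should be read as a rough order of magnitude only.

```python
def cumulative_false_positive_risk(per_round_risk: float, rounds: int) -> float:
    """Probability of at least one false-positive recall over `rounds`
    screenings, assuming a constant and independent per-round risk."""
    return 1 - (1 - per_round_risk) ** rounds


# Biennial screening at ages 40, 42, ..., 74 gives 18 rounds.
rounds_40_to_74 = len(range(40, 75, 2))
lifetime_risk = cumulative_false_positive_risk(0.025, rounds_40_to_74)
```

Under these assumptions, roughly 1 in 3 women attending every round would experience at least one false-positive recall.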
We showed that age was strongly associated with false-positive recalls in mammography screening. Among women aged 40 to 49 years, there were, on average, 10 false-positive recalls per true-positive recall (tumor detected), which was almost 4 times higher than the observed rate among older women (aged >60 years). In part, this can be explained by younger women tending to have denser breasts than older women. In our study population, the mean dense area among women aged 40 to 49 years was 42.6 cm², which is higher than the 21.4 cm² observed among women aged >60 years. This finding suggests that young women generally may experience a higher harm/benefit ratio when attending mammography screening,12 and it has 2 important implications. First, for young women (aged 40–49 years) invited to screening, tailored interventions24 that promote the benefits of screening while providing age-specific information about the likelihood of receiving a false-positive recall could be useful, because this experience may shape women’s perceptions of mammography screening and therefore their future adherence. Second, given that there is still debate regarding whether to start mammography screening at 40 or 50 years of age, these concerns are important for policymakers to consider together with other factors to minimize the harm/benefit ratio and select the right women for mammography screening.
Mammographic features are the main basis for radiologists to determine screening results. In line with previous studies, we found that higher mammographic density is associated with false-positive recalls in mammography screening, suggesting that dense breasts may mask the appearance of the tumor, making it difficult to determine the screening result.11,12 Our research question, however, was whether these factors were more strongly associated with false-positive than with true-positive recalls. Our findings suggest that high density is equally associated with both false-positive and true-positive recalls. Furthermore, our study showed that although microcalcifications and masses (identified through iCAD software) were positively associated with having a false-positive recall, they were actually more strongly associated with having a true-positive recall. The FDA-approved iCAD software is reluctantly used at some clinics, due to a fear of increasing false-positive recalls.25,26 Our findings support the use of iCAD software to better detect breast tumors.
To the best of our knowledge, this is the first study to investigate and show that breast cancer risk models (Tyrer-Cuzick, KARMA, and polygenic risk models) are positively associated with having a false-positive recall. These results are novel but not surprising, given that these risk models directly or indirectly take mammographic features into account. Specifically, the KARMA model27 directly incorporates breast density, masses, microcalcifications, and their asymmetries; the Tyrer-Cuzick model21 incorporates hormone-related factors, which can affect mammographic features28,29; and the polygenic risk model positively correlates with not only mammographic density but also microcalcifications and masses.30 However, when restricting our analyses to women who were recalled, we found that model-estimated risks were associated with a higher risk of having a true-positive than a false-positive recall. Therefore, incorporating breast cancer risk models into mammography screening may help to identify true tumors rather than false-positive recalls. This finding is important because risk-based screening may be a reality in the near future.
We found that all investigated factors (except for age) either were not associated with false-positive recalls or were more closely associated with true-positive recalls. Therefore, these factors cannot be used to develop targeted interventions to diminish false-positive recalls. However, this does not mean that false-positive recalls cannot be minimized. For example, the recall rate in the United States is double that in Europe, even though cancer detection rates are similar,23 indicating an unnecessary burden for women without breast cancer who are recalled for further testing after screening. Furthermore, several novel approaches may help to decrease false-positive recalls. First, an artificial intelligence support system might help radiologists to improve both the specificity and the sensitivity of mammography screening.31 Second, other modalities, such as digital breast tomosynthesis and contrast-enhanced spectral mammography, have also been shown to reduce the number of false-positive recalls.32,33
Our study has several strengths. Our large sample size, together with detailed information collected in registers and a questionnaire, allowed us to take a large number of breast cancer risk factors into consideration. We had full information on each woman’s mammography screening history dating back to 1989, which ensured an accurate definition of screening results. Furthermore, we measured breast density, masses, and microcalcifications using an automated method, thus strengthening the reproducibility and comparability of these mammographic features. Nevertheless, our study was limited to women participating in KARMA, who are generally more highly educated and more likely to have a family history of breast cancer than the general Swedish female population.14 The generalizability of our results to other countries with different mammography screening strategies is limited, as with all studies investigating false-positive mammography recalls.34
Conclusions
Our study provides a better understanding of false-positive mammography recalls by comparison with both women who were not recalled and women who received true-positive recalls. Although several risk factors and risk models were associated with having a false-positive recall, they were equally or more strongly associated with having a true-positive recall. Our findings indicate that none of the studied breast cancer risk factors can be used to develop targeted interventions to reduce false-positive mammography recalls.
References
1. Mandelblatt JS, Stout NK, Schechter CB, et al. Collaborative modeling of the benefits and harms associated with different U.S. breast cancer screening strategies. Ann Intern Med 2016;164:215–225.
2. Yen AM, Tsau HS, Fann JC, et al. Population-based breast cancer screening with risk-based and universal mammography screening compared with clinical breast examination: a propensity score analysis of 1,429,890 Taiwanese women. JAMA Oncol 2016;2:915–921.
4. Brewer NT, Salz T, Lillie SE. Systematic review: the long-term effects of false-positive mammograms. Ann Intern Med 2007;146:502–510.
5. Canelo-Aybar C, Ferreira DS, Ballesteros M, et al. Benefits and harms of breast cancer mammography screening for women at average risk of breast cancer: a systematic review for the European Commission Initiative on Breast Cancer. J Med Screen 2021;28:389–404.
6. Tosteson AN, Fryback DG, Hammond CS, et al. Consequences of false-positive screening mammograms. JAMA Intern Med 2014;174:954–961.
7. Román R, Sala M, De La Vega M, et al. Effect of false-positives and women’s characteristics on long-term adherence to breast cancer screening. Breast Cancer Res Treat 2011;130:543–552.
8. McCann J, Stockton D, Godward S. Impact of false-positive mammography on subsequent screening attendance and risk of cancer. Breast Cancer Res 2002;4:R11.
9. Lind H, Svane G, Kemetli L, et al. Breast cancer screening program in Stockholm County, Sweden - aspects of organization and quality assurance. Breast Care (Basel) 2010;5:353–357.
10. Elmore JG, Barton MB, Moceri VM, et al. Ten-year risk of false positive screening mammograms and clinical breast examinations. N Engl J Med 1998;338:1089–1096.
11. Hubbard RA, Kerlikowske K, Flowers CI, et al. Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography: a cohort study. Ann Intern Med 2011;155:481–492.
12. Nelson HD, O’Meara ES, Kerlikowske K, et al. Factors associated with rates of false-positive and false-negative results from digital mammography screening: an analysis of registry data. Ann Intern Med 2016;164:226–235.
13. Ho PJ, Bok CM, Ishak HMM, et al. Factors associated with false-positive mammography at first screen in an Asian population. PLoS One 2019;14:e0213615.
14. Gabrielson M, Eriksson M, Hammarström M, et al. Cohort profile: the Karolinska Mammography Project for Risk Prediction of Breast Cancer (KARMA). Int J Epidemiol 2017;46:1740–1741g.
15. Ludvigsson JF, Otterblad-Olausson P, Pettersson BU, et al. The Swedish personal identity number: possibilities and pitfalls in healthcare and medical research. Eur J Epidemiol 2009;24:659–667.
16. Holm J, Humphreys K, Li J, et al. Risk factors and tumor characteristics of interval cancers by mammographic density. J Clin Oncol 2015;33:1030–1037.
17. Socialstyrelsen [National Board of Health and Welfare]. Screening för bröstcancer: rekommendation och bedömningsunderlag [Screening for breast cancer: recommendation and assessment basis]. Accessed January 15, 2022. Available at: https://www.socialstyrelsen.se/globalassets/sharepoint-dokument/artikelkatalog/nationella-screeningprogram/2014-2-32.pdf
18. Olsson S, Andersson I, Karlberg I, et al. Implementation of service screening with mammography in Sweden: from pilot study to nationwide programme. J Med Screen 2000;7:14–18.
19. Li J, Szekely L, Eriksson L, et al. High-throughput mammographic-density measurement: a tool for risk prediction of breast cancer. Breast Cancer Res 2012;14:R114.
20. Eriksson M, Czene K, Pawitan Y, et al. A clinical model for identifying the short-term risk of breast cancer. Breast Cancer Res 2017;19:29.
21. Tyrer J, Duffy SW, Cuzick J. A breast cancer prediction model incorporating familial and personal risk factors. Stat Med 2004;23:1111–1130.
22. Mavaddat N, Michailidou K, Dennis J, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet 2019;104:21–34.
23. Fletcher SW, Elmore JG. False-positive mammograms—can the USA learn from Europe? Lancet 2005;365:7–8.
24. Sabatino SA, Lawrence B, Elder R, et al. Effectiveness of interventions to increase screening for breast, cervical, and colorectal cancers: nine updated systematic reviews for the guide to community preventive services. Am J Prev Med 2012;43:97–118.
25. Khoo LA, Taylor P, Given-Wilson RM. Computer-aided detection in the United Kingdom National Breast Screening Programme: prospective study. Radiology 2005;237:444–449.
26. Cole EB, Zhang Z, Marques HS, et al. Impact of computer-aided detection systems on radiologist accuracy with digital mammography. AJR Am J Roentgenol 2014;203:909–916.
27. Michailidou K, Lindström S, Dennis J, et al. Association analysis identifies 65 new breast cancer risk loci. Nature 2017;551:92–94.
28. Boyd NF, Rommens JM, Vogt K, et al. Mammographic breast density as an intermediate phenotype for breast cancer. Lancet Oncol 2005;6:798–808.
29. Azam S, Eriksson M, Sjölander A, et al. Predictors of mammographic microcalcifications. Int J Cancer 2021;148:1132–1143.
30. Holowko N, Eriksson M, Kuja-Halkola R, et al. Heritability of mammographic breast density, density change, microcalcifications, and masses. Cancer Res 2020;80:1590–1600.
31. Rodríguez-Ruiz A, Krupinski E, Mordang JJ, et al. Detection of breast cancer with mammography: effect of an artificial intelligence support system. Radiology 2019;290:305–314.
32. Mann RM, Hooley R, Barr RG, et al. Novel approaches to screening for breast cancer. Radiology 2020;297:266–285.
33. Chong A, Weinstein SP, McDonald ES, et al. Digital breast tomosynthesis: concepts and clinical practice. Radiology 2019;292:1–14.
34. Elmore JG, Nakano CY, Koepsell TD, et al. International variation in screening mammography interpretations in community-based programs. J Natl Cancer Inst 2003;95:1384–1393.