Tuberculin skin test and Quantiferon test agreement and influencing factors in tuberculosis screening of healthcare workers: a systematic review and meta-analysis

Objective A systematic review and meta-analysis was conducted to evaluate the agreement between Tuberculin Skin Test (TST) and Quantiferon (QFT) in screening for tuberculosis (TB) infection among healthcare workers (HCWs) and to estimate associations between TST and QFT agreement and variables of interest, such as Bacillus Calmette-Guérin (BCG) vaccination and incidence of TB. Methods Cross-sectional and longitudinal studies on HCWs, published in English until October 2013, comparing TST and QFT results, were selected. For each study Cohen’s κ value and a 95% confidence interval were calculated. Summary measures and indexes of heterogeneity between studies were calculated. Results 29 studies were selected comprising a total of 11,434 HCWs. Cohen’s κ for agreement between TST and QFT for 24 of them was 0.28 (95% CI 0.22 to 0.35), with the best value in high TB incidence countries and the lowest rate of BCG vaccination. Conclusion Currently, there is no gold standard for TB screening and the most-used diagnostic tools show low agreement. For evidence-based health surveillance in HCWs, occupational physicians need to consider a number of factors influencing screening results, such as TB incidence, vaccination status, age and working seniority.


Introduction
Occupational exposure to biological agents is a major risk for healthcare workers (HCWs); health surveillance must therefore devote considerable attention to infectious disease screening. Tuberculosis (TB) is an ongoing risk in low-income countries due to abandonment of vaccination campaigns, immigration flows, wide diffusion of primary or secondary immunosuppression, and poor efficacy of the vaccine currently in use [1,2], so that tuberculosis remains a major public health problem.
According to the World Health Organization (WHO), in 2012 there were an estimated 8.6 million cases of TB (range 8.3-9.0 million) globally, equivalent to 122 cases per 100,000 population. Most of the estimated case numbers in 2012 occurred in Asia (58%) and Africa (27%); smaller proportions occurred in the Eastern Mediterranean region (8%), the European region (4%) and the Americas (3%). One third of the world's population is estimated to be latently infected with Mycobacterium tuberculosis: people with latent TB infection (LTBI) do not show symptoms of TB and are not infectious, but they are at risk of developing active disease and becoming infectious. Several factors increase the risk of progressing from infection to active TB, for example, HIV infection or immunosuppressive treatment, malnutrition, diabetes and alcohol abuse. Preventing active TB by addressing these risk factors as well as proper diagnosis and treatment of LTBI in selected risk groups is thus important for the individual and for public health [3].
LTBI diagnosis is mostly based on screening programs that address the general population or occupational categories such as healthcare workers. However, there is no gold standard for LTBI diagnosis: traditionally, TB infection screening is conducted by tuberculin skin testing (TST). Some years ago interferon-gamma release assays (IGRAs) became commercially available. IGRAs are used as a confirmatory test for TST in a two-step procedure or as a replacement for the TST, particularly in situations in which the TST is not recommended [4]. TST remains the major tool used around the world for diagnosis of TB infection because of well-established algorithms for test interpretation. In addition, TST is easy to use and has good cost-effectiveness. Although widely used, TST has limitations; its sensitivity may be reduced by malnutrition, severe TB disease and immunodeficiency. Decreased TST specificity might occur in settings where non-tuberculous mycobacteria (NTM) are prevalent and in populations who have received Bacillus Calmette-Guérin (BCG) vaccine post-infancy or via multiple vaccinations [5], although the effect of BCG on TST reactions could be modest after 10 years. Additionally, completing the TST requires two healthcare visits, resulting in loss of reading in approximately 10% of cases [6]. The method is also affected by inter-observer variability, and a positive result does not distinguish recent from remote infection [6].
Most recent national guidelines present IGRAs (especially Quantiferon, QFT) as a new valid tool for diagnosis of latent tuberculosis, partly because they are ex-vivo blood-based tests that, in contrast to the TST, can be repeated any number of times without sensitization or boosting, require only one visit and do not need a baseline two-step protocol [5]. However, reviews have suggested that IGRA performance differs in high versus low TB incidence settings as well as in the presence of some risk factors [6]. Moreover, IGRA reproducibility is influenced by several technical factors and immunomodulation [5]; consequently, appropriate cut-offs and borderline zones have yet to be derived, especially for interpreting IGRA results in serial testing of HCWs in light of an individual's tuberculosis risk factors [7]. Although a single IGRA is more expensive than the intradermal test, the cost-effectiveness analysis depends on epidemiological and individual elements, as explained in several studies that also sought to elaborate specific models [8].
Several recent systematic reviews showed that HCWs are at an increased risk of exposure to Mycobacterium tuberculosis [1,2,9,10]. For this reason, periodic screening of HCWs is an important component of TB programs, according to the background TB incidence in the population, resulting in TB as an occupational disease.
In specific working populations, such as HCWs, serial testing for TB seems to be more appropriate in order to identify recent infections and to target infected individuals for preventive therapy [8]. Some guidelines from high-income, low-incidence countries have not recommended IGRAs for serial testing of HCWs, while others state that IGRAs may be used for serial testing of HCWs in place of the TST [11]; according to WHO guidelines [12], IGRAs should not be used in HCW screening programs for low- and middle-income countries (strong recommendation). This recommendation derives from reversion or conversion rates that reduce IGRA reproducibility.
Until now, various systematic reviews of the literature have evaluated the prevalence or incidence of latent TB disease among HCWs [9] or IGRA performance [11] for tuberculosis screening in HCWs; agreement between TST and IGRA has generally been evaluated as a secondary outcome. One systematic review and meta-analysis compared the accuracy of Quantiferon TB Gold in Tube and the T-SPOT assays with the TST, but did not consider HCWs and did not quantify the agreement between skin testing and IGRA [13].
The present study aims to conduct a systematic review with a meta-analysis of the impact of some factors on the agreement between the two tests (TST and QFT) for TB screening programs in HCWs, measured with Cohen's κ in order to be more rigorous and informative than narrative and systematic reviews [14,15]. The impact of some risk factors on the outcome of those tests has also been examined in order to derive an "evidence-based" protocol for healthcare workers at risk of tuberculosis.

Data sources and searches
A systematic review with meta-analysis, according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement, was conducted [16].
Original articles were searched through PubMed, Embase, Web of Science and Scopus from 1 January 2004 to 17 October 2013, using various combinations, in line with the specific database language, of the terms "workers" AND "tuberculosis" OR "TB infection" OR "TB disease" OR "TB" AND "tuberculin skin test" OR "tuberculin skin testing" AND "Quantiferon". Additional studies were identified from the reference lists of previous review articles, and citations of relevant original articles were screened.

Study outcomes and selection
Original articles were evaluated, including cross-sectional and longitudinal studies, meeting all the following criteria: screening of LTBI in HCWs with contemporary TST and QFT, comparison between TST and QFT results, sample vaccination rates, English language. The following studies were excluded: duplicates, case reports or studies on close contacts, editorials, immunological or laboratory studies, NTM studies. Articles about HCWs affected by HIV, chronic rheumatological diseases or inflammatory bowel diseases were also eliminated in order to avoid other influence factors and to obtain a more homogeneous sample.
Epidemiological studies often did not apply to specific medical occupational groups, but instead calculated the risk of infection or disease for the overall group of healthcare workers with a highly heterogeneous definition of 'healthcare worker' that included both occupational groups with a potentially increased risk and groups without any contact with tuberculosis patients [10]. In studies physicians, nurses, midwives, laboratory personnel, radiographers, medical or nursing students were considered as HCWs; if the authors also included a few contacts or administrative workers, only data on HCWs was extrapolated wherever possible.
Three reviewers (RU, MM, MGLM) independently screened the citations (title and abstract) identified from all sources. Subsequently, full-text articles selected by titles and abstracts were reviewed to identify the final set of eligible studies. Disagreements were resolved by discussion.

Data extraction
The following characteristics of each study were listed: country, publication and screening years, sample size, type of study, incidence of TB disease in the general population, gender, mean HCW age, working age, type of tuberculin used for testing, BCG vaccination rates (Table 1).
According to several national guidelines, different types of tuberculin purified protein derivative (PPD) were used, namely PPD RT23 and PPD-S; a 5 TU dose of PPD-S (0.1 ml) is accepted as bio-equivalent to 2 TU of PPD RT23 (0.1 ml) [17].
Different types of QFT were considered, reflecting the technological development of IFN-γ release assays over the years: from the first assay, known as Quantiferon-TB, to the latest and more specific Quantiferon-TB-Gold in Tube (QFT-GIT), which replaced all other types, which are no longer marketed [18]. In order to provide clear information about IGRA/TST agreement, we considered all of them as "QFT".
In order to calculate overall agreement between the two main screening tools, data on TST induration (positivity cutoff ≥ 10 mm) and QFT positivity (positivity cutoff 0.35 IU/ml) in HCWs was considered, along with the impact of variables of interest on that agreement (vaccination rates and TB incidence). In particular, studies were divided into two groups on the basis of the BCG vaccination rate, using different cutoff values because no reference value was available. The incidence of all forms of TB disease in the general population was obtained from the WHO global TB database for each study year, or for the publication year if the study year was unavailable. Studies were classified into three groups based on TB incidence in each country: low ≤20 cases/100,000, intermediate 21-99 cases/100,000, and high ≥100 cases/100,000. In order to best characterize TB incidence, the screening year(s) was/were identified in each study; for studies lasting two or more years, the incidence rate was calculated as the mean over the screening years.
Any discrepancies were resolved by consensus with the help of the team coordinator, thus obtaining an inter-reviewer agreement of 100%.
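The incidence grouping described above can be illustrated with a minimal Python sketch. The function name `classify_tb_incidence` and the example rates are hypothetical and not taken from the study data. Because averaging over several screening years can give non-integer rates, the sketch assumes that any mean above 20 and below 100 cases/100,000 counts as intermediate; the original text only lists the integer bands.

```python
from statistics import mean


def classify_tb_incidence(rates_per_100k):
    """Classify a study by the mean TB incidence over its screening years.

    rates_per_100k: one incidence value (cases per 100,000) per screening year.
    Bands follow the text: low <= 20, intermediate 21-99, high >= 100.
    Assumption: non-integer means between 20 and 100 are treated as intermediate.
    """
    avg = mean(rates_per_100k)
    if avg <= 20:
        return "low"
    if avg < 100:
        return "intermediate"
    return "high"


# Hypothetical study screened over two years with rates of 45 and 60 per 100,000.
print(classify_tb_incidence([45, 60]))
```

For a single-year study the list simply holds one value, so the mean equals that year's rate.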

Data synthesis and analysis
Study characteristics and results are presented in tables and plots. The primary outcome of the meta-analysis was Cohen's κ. For each study the κ value and 95% confidence interval (CI) were calculated. Weighted mean effect size was used as a summary measure. Heterogeneity of studies was assessed by using the Q statistic and I² [19]. A P value of the Q statistic of less than 0.10 was considered significant [20]. I² values of 25%, 50%, and 75% correspond to cutoff points for low, moderate, and high degrees of heterogeneity. If overall heterogeneity was significant, a random-effect model was used; otherwise a fixed-effect model was used. Meta-regression was applied to test for differences across study-level covariates. Meta-regression is a regression model that relates the effect to study-level covariates, while assuming additivity of within-study and between-studies components of variance. Restricted maximum likelihood estimators were used to estimate model parameters [21]. A permutation test (using 1,000 re-allocations) was used to assess the true statistical significance of an observed meta-regression finding [22]. Data was analyzed using Stata, version 11.0 (Stata Corp., College Station, TX). All statistical tests were two-sided and p-values <0.05 were regarded as significant.
Figure 1 shows the flow chart of the study selection. From the initial 1,430 abstracts, 29 studies [23-51] met our inclusion criteria; 871 studies were discarded as not concerning HCWs, as expected, because a search string that was more sensitive than specific was preferred in order to obtain as many studies of the health sector as possible. Two hundred and eighty studies were rejected owing to the lack of all results about agreement between TST and QFT or about vaccination status; 13 were rejected for enrolling the same sample of HCWs. Characteristics of the selected studies in this review are shown in Table 1; data are presented by TB incidence.
Most studies were cross-sectional (25/29); in prospective studies, baseline information was taken. Study sizes ranged from 54 to 2,884 HCWs, for a total of 11,434 HCWs across the 29 studies. In total, there were 8,098 female and 2,725 male workers; for 611, gender was not indicated [40]. The most represented age range in the sample was between 30 and 50, or between 18 and 30 when students were included. The overall mean or median age could not be calculated because the sample age was reported differently across the studies, as mean, median or percentage.
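The heterogeneity statistics and pooling step described in the data synthesis can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say which between-study variance estimator was used for the pooled κ, so the sketch uses the DerSimonian-Laird moment estimator, and the function name `pool_effects` and the input values are hypothetical. The analysis in the paper was run in Stata, not with this code.

```python
import math


def pool_effects(estimates, std_errors):
    """Inverse-variance pooling with Cochran's Q, I^2 and a random-effects estimate.

    Assumption: tau^2 is estimated with the DerSimonian-Laird method.
    When tau^2 is 0 the random-effects result equals the fixed-effect one.
    """
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (truncated at zero).
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "q": q, "df": df, "i2": i2, "tau2": tau2,
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
    }


# Hypothetical per-study kappa values and standard errors.
result = pool_effects([0.10, 0.25, 0.40, 0.61], [0.04, 0.05, 0.06, 0.08])
print(result)
```

The Q statistic would be compared with a chi-square distribution on `df` degrees of freedom to obtain the P value that the text checks against 0.10; that step is left out here to keep the sketch free of third-party dependencies.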

Description of included studies
Working seniority was defined by the number of years or months spent in contact with patients: it played an important role in TB screening results, but in 11 studies data was not shown; in the other 18, this information was heterogeneous (reported through mean, median or percentage). Therefore, as with age, a mean or median for working seniority could not be calculated; one study carried out a pre-employment evaluation in a healthcare occupational surveillance program [40].
Skin testing was conducted using different types and doses of tuberculin: the dose varied from 1 to 10 TU according to the national guideline. TST was performed using 2 TU of RT23 in most studies (14/29), while in 12/29 5 TU of PPD-S were used.
On the basis of TB incidence identified by WHO information for each year of screening, studies were divided into three incidence groups: the low TB incidence group was the most represented, comprising 10 studies, followed by the intermediate group with 7 studies and the high group with 7 studies (Figure 2). In a few cases, there was a relevant difference between the incidence rates for the screening year and the publication year [31,32,46,35] owing to a delay in publication. Table 2 shows data by TB incidence on subjects with both TST and QFT results, excluding indeterminate QFT results; for this reason the effective sample size was 10,314, which is lower than in Table 1.

Agreement between TST and QFT
Screening results were reported as positive TST alone, positive QFT alone and as a cross-tabulation of TST and QFT. Out of the 10,314 paired tests performed, TST and QFT agreed in 6,893 and disagreed in 3,421. TST positive QFT negative discordance occurred about four times more often than TST negative QFT positive discordance [2,711 (26.3%) versus 710 (6.9%)].
In order to evaluate TST and QFT agreement, a statistical analysis was conducted using Cohen's κ in each study. However, overall agreement was calculated only for the 24/29 studies that used a TST positivity cutoff of 10 mm. κ values ranged widely from 0.10 to 0.61, with significant heterogeneity (p < 0.0001, I² = 91.6%). The overall κ value, estimated using the random-effect model, was 0.28 (95% CI 0.22 to 0.35), which is quite low, reflecting that almost one third of TST and QFT results were discordant (Table 2).
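The percentages quoted above follow directly from the reported counts. The short Python check below uses only figures stated in the text; the variable and function names are illustrative.

```python
# Counts reported in the text for subjects with both a TST and a QFT result.
total = 10_314
concordant = 6_893
tst_pos_qft_neg = 2_711
tst_neg_qft_pos = 710

discordant = tst_pos_qft_neg + tst_neg_qft_pos


def pct(count, denom=total):
    """Percentage of `denom`, rounded to one decimal place."""
    return round(100 * count / denom, 1)


# Ratio of the two kinds of discordance (TST+/QFT- versus TST-/QFT+).
ratio = tst_pos_qft_neg / tst_neg_qft_pos

print(discordant, pct(concordant), pct(discordant))
print(pct(tst_pos_qft_neg), pct(tst_neg_qft_pos), round(ratio, 1))
```

The raw concordance of about two thirds is not corrected for chance, which is why the analysis relies on Cohen's κ rather than on this proportion.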
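For readers unfamiliar with the statistic, the per-study κ can be computed from a 2×2 table of TST against QFT results. The Python sketch below is illustrative only: the function name `cohen_kappa` and the example counts are hypothetical, and the confidence interval uses the simple large-sample standard error, which is an assumption because the text does not state which variance formula was applied.

```python
import math


def cohen_kappa(both_pos, tst_only, qft_only, both_neg, z=1.96):
    """Cohen's kappa with an approximate 95% CI for a 2x2 agreement table.

    both_pos: TST+/QFT+   tst_only: TST+/QFT-
    qft_only: TST-/QFT+   both_neg: TST-/QFT-
    Assumption: SE = sqrt(po * (1 - po) / (n * (1 - pe)^2)).
    """
    n = both_pos + tst_only + qft_only + both_neg
    po = (both_pos + both_neg) / n                      # observed agreement
    tst_pos = both_pos + tst_only
    qft_pos = both_pos + qft_only
    # Agreement expected by chance from the marginal positive/negative rates.
    pe = (tst_pos * qft_pos + (n - tst_pos) * (n - qft_pos)) / n ** 2
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Hypothetical study of 200 HCWs: 20 TST+/QFT+, 30 TST+/QFT-, 10 TST-/QFT+, 140 TST-/QFT-.
kappa, lower, upper = cohen_kappa(20, 30, 10, 140)
print(round(kappa, 2), round(lower, 2), round(upper, 2))
```

In this hypothetical table the raw agreement is 80%, yet κ is only about 0.38 because most of that agreement is expected by chance when both tests are mostly negative.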
According to TB incidence classification, TST and QFT agreement was calculated with Cohen's κ resulting in 0.25 (95% CI 0.17 to 0.34) in the low incidence group, 0.19 (95% CI 0.17 to 0.21) in the intermediate, and 0.38 (95% CI 0.23 to 0.53) in the high group. The best agreement was observed in the high incidence group, while the worst was seen in the intermediate one, with the highest rate of vaccination; comparing the three κ figures there was a significant difference between the intermediate and the high incidence group (p = 0.041).
Furthermore, studies were divided into two groups (lower and higher vaccination rate) to best elucidate the impact of BCG vaccination on agreement; with a 90% cutoff value, a statistically significant difference (p = 0.013) was found, with an agreement of 0.34 (95% CI 0.25 to 0.43) in the lower rate group (15 studies), and 0.17 in the higher rate group.

Discussion
This meta-analysis had the aim of analyzing screening tools for diagnosis of TB infection among HCWs and examining the impact of some risk factors on the outcome of those tests in order to derive an "evidence-based" protocol for screening of healthcare workers at risk of tuberculosis. As already highlighted [11], overall agreement between TST and QFT was quite low. This result can be related to the different immunological targets of the two tests, so that any immunological dysfunction can variously influence their respective results. Moreover, QFT is often characterized by fluctuating results so that its reproducibility is unclear [52]; on the other hand, TST is subject to inter-observer variability. TST and QFT also differ in specificity and sensitivity. A lower rate of positivity in the QFT can be explained by a higher specificity of QFT than TST, which could come from the intrinsic difference in the methods used by the two tests. QFT uses antigens showing higher specificity to Mycobacterium tuberculosis and to only a limited number of NTM, in contrast with the tuberculin used in TST, which represents a mixture of more than 200 nonspecific antigens shared with NTM and with the strains developed from Mycobacterium bovis used for BCG vaccination [53]. An important feature of the present systematic review was the high heterogeneity of the studies chosen, owing to the different impact of each variable (vaccination rates, incidence of TB in each country, age, working seniority, induration diameter cutoff and type of PPD) on the TST/QFT agreement. Again, this variability could be considered as a strength of the study because it offered an opportunity to best elucidate the agreement, taking these factors into account.
BCG vaccination status and incidence of TB, which influence TST and QFT agreement at the same time, could not be evaluated separately. BCG vaccination reduced the agreement by influencing TST positivity (rather than QFT positivity) and increasing the risk of a false positive result, especially in recently vaccinated subjects. Although some studies [29,45,47,34] showed a lower rate of QFT positivity than of TST positivity in the BCG vaccinated group, this result could not be explained by a cross-reaction between vaccination and QFT antigens, but rather by TB infection among vaccinated subjects. In the high TB incidence group, a vaccination rate lower than 90% was found together with the highest observed agreement, although two studies in particular lowered the agreement [49,50]: one found high rates of TST-/QFT+ [49] and the other found high rates of TST+/QFT- [50]. In the first case, the result could be explained by a high TB risk ward and a history of TB infection for some subjects; in the other study, false positive results can be explained by re-vaccination. Some authors also reported that repeated vaccination influenced quantitative TST positivity but decreased the probability of a positive QFT in the case of three or more repeated doses [42]. Moreover, in the low TB incidence group, studies with higher agreement were characterized by a lower rate of vaccination.
Increasing age of HCWs was correlated with concordant TST and QFT positive results [33,34]; discordant QFT positive TST negative results were associated with an age of over 40 [46,50] or over 50 [49,42], and in any case this association increased each year [39], although data was not statistically analyzed due to heterogeneity of presentation.
With regard to working seniority and TST/QFT results, most studies [27,44,42,49] found an association between increasing working years and positivity of both tests; this information cannot be interpreted in isolation but has to be placed in the context of the risk evaluation at each worksite (high or low TB risk ward). However, in this review no statistical analysis of overall age or working seniority was possible because of heterogeneity of data presentation or missing information.
Some authors affirmed that the type of tuberculin could have a significant effect on skin response: both vaccinated and unvaccinated subjects receiving RT23 2 TU or 1 TU were more likely to have a positive result than those receiving 5 TU PPD [54]. Others [55] considered that TST induration size was larger with PPD-S than with PPD RT23 at 48, 72 and 96 hours, resulting in a statistically lower number of false negatives with PPD-S than with PPD RT23. Despite this evidence, no significant difference was found between various types and doses of tuberculin in the chosen studies, confirming bio-equivalence of RT23 and PPD-S [17].

Study limitations and strength
This systematic review has several strengths. A search string that was more sensitive than specific was developed, using multiple databases. Three reviewers (RU, MM, MGLM) independently assessed eligible articles for inclusion. Selection criteria were quite restrictive, so that the information obtained was as comparable as possible in order to perform a meta-analysis of overall agreement between the two main screening tools and the influence of some factors (BCG vaccination, TB incidence). However, this was not possible for other variables of interest (age and working seniority) owing to heterogeneity in study design, outcomes and data presentation, despite the limited selection. In addition, different national guidelines lead to different study characteristics, particularly regarding TST procedures and vaccine indications. Lastly, there was a lack of evidence at the highest level of hierarchy on reference standards: a majority of the studies included were cross-sectional. Our study would have been more relevant if we had considered longitudinal studies and TST/QFT agreement in serial testing, analyzing all factors that could impact on reversion and conversion. Nevertheless, longitudinal studies did not allow us to analyze both tests in each measurement because it is not always appropriate to repeat both TST and QFT in every HCW.

Conclusion
Screening for TB infection is a major objective of health surveillance programs. Nowadays, even though there is no gold standard, the most-used diagnostic tools are TST and IGRAs (such as QFT), which show low agreement and are also influenced by a few variables that alone only partially explain their variability.
Choosing the proper protocol is a prerogative of the occupational physician, who needs to know about TB incidence and BCG vaccination rates in the general population and the immunological status and risk factors of each individual worker. TST remains the first-step exam, especially when a higher agreement can be expected, i.e. when there is a low prevalence of vaccination or a high incidence of TB infection. QFT, in turn, is helpful in cases of a higher prevalence of vaccination. Further studies with a single protocol of health surveillance carried out in variously burdened countries will best clarify the role of TST and QFT for HCW screening.