Defining Care Patterns and Outcomes Among Persons Living with HIV in Washington, DC: Linkage of Clinical Cohort and Surveillance Data



Introduction
A central feature of the updated 2020 National HIV/AIDS (human immunodeficiency virus/acquired immunodeficiency syndrome) Strategy is to measure progress along the HIV care continuum to ensure that target goals are met for each stage. The ability to monitor progress toward these goals is often hampered by variation in data collection methods, analyses, and measurement approaches, with estimates often relying on either clinic-level or population-based data [1,2]. Both approaches have advantages and disadvantages. Clinic data provide more detailed, real-time information from the site where care is delivered, including whether the patient kept or missed primary care visits. However, compared with surveillance-based data, clinic data are less informative for tracking patients who become incarcerated, move, or transfer care [3][4][5]. These silent transfers of care, together with the limitation that clinic-attending populations may not represent the general population, present a challenge when trying to make robust estimates of HIV care [6][7][8]. In contrast, surveillance data are useful for monitoring population-based outcomes but sometimes lack the accuracy and completeness needed to describe patient-level characteristics and are often subject to reporting time lags [9][10][11][12][13][14][15][16].
In the absence of a unified health record for HIV infected persons, triangulating data from multiple sources such as clinical cohort and surveillance data can help improve our ability to describe care patterns, service utilization, comorbidities and ultimately measure and monitor clinical outcomes. For example, collaborations between local HIV clinics and health departments seeking to identify out-of-care HIV infected patients have found that their combined efforts resulted in timelier, more accurate and complete data, and improved ascertainment of care status [4,8,17,18].
The District of Columbia (DC) Cohort study is a prospective observational clinical cohort study of persons living with HIV/AIDS and receiving care across 13 clinical care sites in Washington, DC [19]. Through an innovative data linkage process, DC Cohort participant data, including sociodemographics, HIV-related diagnosis and laboratory values, and sexually transmitted disease (STD) diagnosis data, are matched with the DC Department of Health (DOH) surveillance data every 6 months [19]. After recognizing the limitations of each database alone, the linkage process was designed to improve the completeness and accuracy of both databases. The primary objective of this analysis was to assess the utility of the linkage process for improving the completeness of the DC Cohort database and the DOH data. We sought to do this by (1) quantifying the differences between the pre- and postlinked databases, (2) evaluating HIV care continuum outcomes, STD diagnoses, and HIV clinic visit patterns using the prelinked databases compared with the postlinked database, and (3) using the postlinked database to compare sociodemographic characteristics and HIV care continuum outcomes among participants receiving HIV care at multiple sites.

The DC Cohort Study
Washington, DC has one of the highest HIV rates among cities in the United States, with 2.0% of its population living with HIV (about 14,000 residents as of 2015) [20]. The design of the DC Cohort study, which began enrollment in 2011, has been described previously [19,21,22]. Its source population consists of adults and children diagnosed with HIV infection who received outpatient HIV care at one or more DC Cohort sites and consented to participate. Participants can consent to participate at multiple clinics in which they receive HIV care. DC Cohort sites include 8 hospital-based or affiliated sites and 5 community-based clinics that collectively serve over half of persons living with HIV/AIDS (PLWHA) in DC [19,20]. Clinical data recorded during HIV care visits were abstracted from each site's electronic medical record and merged into a centralized Web-based database (Discovere; Cerner Corporation, Kansas City, MO) that collects data on demographics, diagnoses, laboratory tests, pathology and clinical procedures, medications, and drug resistance information. Informed consent included participant acknowledgment of record linkage between patient data collected by DC Cohort study sites and data reported to DC DOH. The study protocol, consent forms, and research instruments were approved by the George Washington University Institutional Review Board (IRB), the DC DOH IRB, and individual study sites' IRBs [20].

DC Department of Health HIV/AIDS Hepatitis, STD, Tuberculosis Administration
The DC DOH has conducted confidential name-based HIV reporting since 2007 and HIV-related electronic laboratory data reporting of cluster of differentiation 4 (CD4) counts and viral load (VL) values since 2009 (22 District of Columbia Municipal Regulations § 206) [21,23]. STD reporting is also conducted in a confidential name-based manner, with over 45,000 syphilis, gonorrhea, and chlamydia cases being reported annually [20]. The HIV/AIDS, Hepatitis, STD, Tuberculosis Administration (HAHSTA) receives over 140,000 HIV- and STD-related laboratory reports from 29 different laboratories annually (HAHSTA internal communication).

Linkage Methods
Linkage of the DC Cohort and DC DOH databases is performed semiannually and is ongoing. Data on DC Cohort patients enrolled between January 1, 2011 and June 15, 2015 were linked for this analysis. The linkage algorithm is shown in Figure 1. First, each DC Cohort site sends a limited dataset electronically via a secure file transfer protocol (FTP) site to the DC DOH. The limited dataset includes the study ID, patient name, date of birth, and social security number, if available. Simultaneously, the DC Cohort Data and Statistics Coordinating Center (DSCC) prepares a limited dataset for the DC DOH containing the study identification (ID) as well as HIV-related variables collected at the site. The DOH is authorized to receive both these files since it is already authorized to receive named data on all persons living with HIV/AIDS diagnosed with or receiving HIV care in DC. Additionally, DC Cohort participants provided consent for the linkage [9,23].
Data from the sites and the DSCC are then merged with data from the DC enhanced HIV/AIDS Reporting System (eHARS) and the STD surveillance database (STD*MIS). The postlinkage database containing only the DC Cohort ID is sent back to the DSCC through the FTP site.

Linkage Algorithm
Electronic linkage of HIV-related datasets is conducted using an 11-key algorithm based on identifiers including patient first and last name, date of birth, sex at birth, and social security number. For both the DC Cohort prelinked and DOH datasets, the algorithm creates identifier-based key variables that are used to systematically match records across the datasets while accounting for misspelled names and data entry errors. After these 11 key variables are created in both datasets, each key is matched separately, producing 11 discrete datasets that are then merged and deduplicated by each patient's study ID and eHARS ID. Similarly, DC Cohort and STD surveillance data are matched using a 10-key algorithm, based on identifiers such as first name, last name, date of birth, and sex at birth. After linkage, the combined dataset is deduplicated by study ID, disease type, and disease date.
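The multi-key approach described above can be illustrated with a minimal Python sketch. The actual 11 keys are not specified in this article, so the three keys below, the `Person` record, and the `link` function are hypothetical stand-ins chosen only to show the pattern: build each key in both datasets, match on each key separately, then merge and deduplicate the resulting ID pairs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    record_id: str            # study ID (cohort) or eHARS ID (surveillance)
    first: str
    last: str
    dob: str                  # ISO date string, e.g. "1980-02-29"
    sex: str
    ssn: Optional[str] = None


def norm(name: str) -> str:
    """Lowercase and keep letters only, so case and punctuation do not block a match."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


# Hypothetical keys (NOT the study's real keys). A key returns None when it
# cannot be built for a record, and that record is skipped for that key.
KEYS = {
    "ssn": lambda p: p.ssn if p.ssn else None,
    "full_name_dob_sex": lambda p: (norm(p.last), norm(p.first), p.dob, p.sex),
    "last_initial_dob_sex": lambda p: (norm(p.last), norm(p.first)[:1], p.dob, p.sex),
}


def link(cohort, surveillance):
    """Match on each key separately, then merge and deduplicate the ID pairs."""
    pairs = set()  # a set removes pairs found by more than one key
    for make_key in KEYS.values():
        index = {}
        for s in surveillance:
            key = make_key(s)
            if key is not None:
                index.setdefault(key, []).append(s.record_id)
        for c in cohort:
            key = make_key(c)
            if key is not None:
                for surveillance_id in index.get(key, []):
                    pairs.add((c.record_id, surveillance_id))
    return pairs
```

Looser keys (such as last name plus first initial) tolerate spelling variation at the cost of more false matches, which is why a real implementation would typically review or weight matches found only by the loosest keys.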

Postlinkage Database
Results from the match (the postlinked database) include data on HIV, AIDS, and STD diagnoses, AIDS-defining opportunistic infections (OIs), laboratory data such as CD4 counts and VL, and vital status. Differences in laboratory dates or laboratory values by the data source (DC Cohort vs DC DOH) are reconciled using fuzzy matching. For the date of HIV or AIDS diagnosis, the earlier date is used regardless of the data source. The United States Centers for Disease Control and Prevention surveillance guidelines regarding the hierarchical risk of HIV transmission are used to reconcile differences in documented transmission risk, independent of data source [24,25].
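The earlier-date rule for HIV or AIDS diagnosis can be expressed in a few lines. This is a sketch only: the function name is hypothetical, and the handling of a missing date (fall back to whichever source has one) is an assumption not stated in the text.

```python
from datetime import date
from typing import Optional


def reconcile_diagnosis_date(cohort_date: Optional[date],
                             doh_date: Optional[date]) -> Optional[date]:
    """Return the earlier of the two diagnosis dates, regardless of source.

    Assumption: if only one source has a date, that date is used;
    if neither has one, the result is None.
    """
    candidates = [d for d in (cohort_date, doh_date) if d is not None]
    return min(candidates) if candidates else None
```

Because the earlier date always wins, linkage can only lengthen the recorded duration of HIV diagnosis, never shorten it.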

Eligibility Criteria
For this analysis, participants' data were matched if they were actively enrolled in the DC Cohort as of January 1, 2011 and had neither withdrawn from the study nor transferred care to another clinical site. To assess continuum of care measures such as retention in care, we reviewed viral load, CD4 tests, and encounters for those participants with at least 1 year of follow-up for the period of June 15, 2014 to June 15, 2015. Participants were considered lost to follow-up if, after manual review, no laboratory data from either the DOH or the DC Cohort and no medical chart-based data from the DC Cohort were available for 18 months or longer as of June 15, 2015, as per study protocol.

Receipt of Care by Number of Clinical Sites
To determine the number of clinical sites where a participant was receiving care, CD4 and VL test results, proxies for HIV care, were flagged as originating from either a DC Cohort site or a non-DC Cohort site [26,27]. Receipt of care was grouped into three categories: care at one, two, or three or more sites. Care at one site included participants who only had labs from their DC Cohort enrollment site. Care at two sites included participants enrolled at only one DC Cohort site but who had labs from another site (ie, either a non-DC Cohort site or a site that was not their enrollment site). Care at three or more sites included participants enrolled at only one DC Cohort site who had 2 or more labs from two or more other sites (ie, either a non-DC Cohort site or another DC Cohort site that was not their enrollment site). Of note, additional labs obtained through the linkage may have been related to HIV primary care or the result of referrals to specialists who were also drawing HIV-related labs. Since we were unable to determine the reason for the CD4 and VL tests conducted outside of the DC Cohort sites, receipt of care at more than one site does not necessarily indicate receipt of HIV primary care at more than one site.
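The three-category grouping can be sketched as a small classifier over the reporting site of each CD4 or VL result. The function name and the use of plain site labels are illustrative assumptions; the sketch also assumes a singly enrolled participant, as in the definitions above.

```python
from typing import Iterable


def care_site_category(enrollment_site: str, lab_sites: Iterable[str]) -> str:
    """Classify receipt of care from the sites that reported CD4/VL results.

    lab_sites holds one entry per lab result, naming the site it came from.
    Any site other than the enrollment site counts as an additional care site,
    whether it is a non-DC Cohort site or another DC Cohort site.
    """
    other_sites = {site for site in lab_sites if site != enrollment_site}
    if len(other_sites) == 0:
        return "1 site"
    if len(other_sites) == 1:
        return "2 sites"
    return "3 or more sites"
```

Counting distinct sites rather than lab results means that many labs from one outside site still place a participant in the two-site group.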

HIV Care Continuum Outcomes: Retention in Care, on Antiretroviral Therapy, and Viral Suppression
A participant was considered retained in care (RIC) if there was evidence of at least two HIV-related encounters (eg, HIV-related medical visits and/or laboratory test results) at least 90 days apart in the 12-month period from June 15, 2014 to June 15, 2015 [6,16,28,29,30,31]. For the purposes of this analysis, a participant was considered RIC even if the encounters occurred at multiple sites. Being on antiretroviral therapy (ART) was defined as being prescribed an ART regimen anytime during the study period, that is, from June 15, 2014 to June 15, 2015. ART status was based solely on prelinked data as ART data are not collected by the DC DOH. Viral suppression (VS) was defined as a last VL on file of <200 copies/mL among those who were retained in care and on ART.
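The retention and suppression definitions translate directly into code. The sketch below is illustrative: the function names are hypothetical, the window endpoints are treated as inclusive (an assumption), and encounters from any site are pooled, consistent with the rule that encounters at multiple sites count toward retention.

```python
from datetime import date

WINDOW_START = date(2014, 6, 15)
WINDOW_END = date(2015, 6, 15)


def retained_in_care(encounter_dates, start=WINDOW_START, end=WINDOW_END):
    """True if at least two encounters in the window are at least 90 days apart.

    It is enough to compare the earliest and latest encounter in the window:
    if any pair is 90+ days apart, that pair is too.
    """
    in_window = [d for d in encounter_dates if start <= d <= end]
    if len(in_window) < 2:
        return False
    return (max(in_window) - min(in_window)).days >= 90


def virally_suppressed(vl_results):
    """True if the most recent viral load is below 200 copies/mL.

    vl_results is a list of (date, copies_per_ml) tuples.
    """
    if not vl_results:
        return False
    _, last_value = max(vl_results, key=lambda result: result[0])
    return last_value < 200
```

In the analysis, viral suppression was evaluated only among participants who were retained in care and on ART, so `virally_suppressed` would be applied after filtering on those two conditions.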

Statistical Analysis
Frequencies of demographic and clinical characteristics at study enrollment (baseline) were computed in the prelinked DC Cohort database, prelinked DC DOH database, and postlinked database. Chi-square tests and Wilcoxon rank-sum tests were used to determine differences among categorical and continuous variables, respectively. Concordance between selected variables in the prelinked DC Cohort database and the postlinked database was assessed using kappa statistics to compare the accuracy of the databases. Participant outcomes (ie, RIC and VS) in the prelinked DC Cohort database and postlinked database were compared. In the postlinked database, participant demographic and clinical data were also compared based on the number of sites where a participant had evidence of receiving care (1 site, 2 sites, ≥3 sites). These comparisons were made using chi-square tests. Statistical comparisons with P values <.05 were considered statistically significant. Analyses were conducted in SAS 9.3 (SAS Institute, Inc., Cary, NC) and R (version 3.2.4).
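For readers unfamiliar with the kappa statistic, the following Python sketch computes unweighted Cohen's kappa for one variable recorded in two databases. It is an illustration only; the study's analyses were run in SAS and R, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(values_a, values_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(values_a) != len(values_b) or not values_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(values_a)
    observed = sum(a == b for a, b in zip(values_a, values_b)) / n
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both sources carry the same single category throughout
    return (observed - expected) / (1 - expected)


# Example: vital status for four participants in two hypothetical sources.
prelinked = ["alive", "alive", "deceased", "deceased"]
postlinked = ["alive", "deceased", "deceased", "deceased"]
kappa = cohens_kappa(prelinked, postlinked)  # observed 0.75, chance 0.5
```

Kappa corrects raw percent agreement for the agreement expected by chance, so a variable with one dominant category can show high percent agreement but a low kappa.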

Assessment of Differences in Demographic and Clinical Characteristics Between the Pre- and Postlinkage Databases
The DC Cohort DSCC submitted data on 6054 study IDs to the DC DOH of which 5633/6054 (93.05%) unique participants matched to the DC DOH database and 421/6054 (6.95%) did not (see Figure 2). Of those who did not match, 352/421 (83.6%) were non-DC residents. Among those that matched, 5521/5633 (98.01%) were enrolled at a single DC Cohort site; 112/5521 (2.03%) were enrolled at more than one DC Cohort site. Of the matched participants, 4476/5521 (81.07%) were actively enrolled in the study with at least 1 year of follow-up by the end of 2015.
The demographic and clinical characteristics are displayed by the database from which they were calculated: the prelinked DC Cohort database, the prelinked DC DOH database, and the postlinked database. When comparing the prelinked DC Cohort database with the postlinked database, a significantly higher percentage of participants were found to be black, deceased, infected through MSM sexual contact, to have had an OI at AIDS diagnosis, and to have ever been virally suppressed (P<.001 for all). The mean duration of HIV diagnosis in the postlinked database increased from 14 to 14.8 years, indicative of earlier diagnosis dates.
Additionally, the number of STD diagnoses increased from 2123 to 2739. Furthermore, post linkage, a higher percentage of participants were Maryland and Virginia residents, more infections were attributed to MSM sexual contact and fewer to MSM/IDU, the mean duration of infection increased from 12.2 to 14.8 years, and the proportion of participants ever virally suppressed increased (P<.001 for all).
Interrater reliability of selected variables that overlapped between the prelinked DC Cohort database and the postlinked database varied. There was strong agreement for race/ethnicity (κ=.75) and state of residence (κ=.72); moderate agreement for vital status (κ=.55) and OI at AIDS diagnosis (κ=.40); and poor to fair agreement for transmission risk (κ=.36) and whether a participant had ever been virally suppressed (ie, <200 copies/mL; κ=.20).

Differences in Demographic and Clinical Characteristics by Number of HIV Care Sites
Differences in demographic and clinical characteristics of DC Cohort participants who were matched through June 15, 2015 were assessed based on the number of HIV care sites using the postlinked database. The number of sites where a participant received HIV care was determined using the source of HIV labs. Of the sample, 4242/5521 (76.83%) had evidence of receiving HIV care at only one DC Cohort site, 855/5521 (15.49%) at two sites, and 424/5521 (7.68%) at three or more sites (Table 2). Those who received care at three or more sites differed demographically and clinically from those who received care at fewer sites; they were more likely to be non-Hispanic black, have a history of AIDS, be homeless or report temporary housing, and to have been referred to substance use treatment. Those receiving care at three or more sites were also more likely to have public insurance, be enrolled in primary care at their DC Cohort site, and receive care at a community-based DC Cohort site. This group also fared worse clinically; they were more likely to have lower CD4 counts (≤350 cells/mm3), have a detectable VL (ie, >200 copies/mL), and have uncontrolled viremia (ie, VL ≥100,000 copies/mL) on their most recent VL test. They were also more likely to suffer from comorbid conditions, including hypertension, cardiovascular disease, and mental health issues (P<.001; Table 2) and more likely to have died by June 2015.
Table notes (characteristics by database): (j) Opportunistic infections (OIs) at AIDS diagnosis refers to AIDS-defining conditions and does not include those with CD4 counts <200 cells/mm3 or CD4% <14. The denominator for OIs was 2273, 3229, and 3433 for prelinked DC Cohort data, prelinked DOH data, and postlinked data, respectively. (k) STD: sexually transmitted disease. (l) CD4: cluster of differentiation 4. (m) The denominator for ever virally suppressed was 5521, 5333, and 5333 for prelinked DC Cohort data, prelinked DOH data, and postlinked data, respectively. Any viral load <200 copies/mL since enrollment was considered suppressed among participants enrolled anytime between January 1, 2011 and June 15, 2015.
Table 2 notes: (a) Care at one site included singly-enrolled participants who had 0 or ≥1 lab from their DC Cohort enrollment site. Care at two sites included singly-enrolled participants who had 0 or ≥1 lab from their DC Cohort enrollment site and ≥1 lab from a second site (ie, a non-DC Cohort site or another DC Cohort site that was not their enrollment site). Care at three or more sites included singly-enrolled participants who had 0 or ≥1 lab from their DC Cohort site and ≥2 labs from ≥2 other sites (ie, a non-DC Cohort site or another DC Cohort site that was not their enrollment site). (b) P values for categorical variables were calculated using chi-square tests; P values for continuous distributions were obtained from Wilcoxon rank-sum tests. P values in italics denote statistical significance at the .001 level. (c) Other race groups include those with multiple races and missing; unknown is unknown race/ethnicity.
Figure note: Being on ART was defined as the number of Cohort participants who were prescribed an antiretroviral therapy (ART) regimen that overlapped with the study period. ART status was based on prelinked data as ART data are not collected by the DC DOH. Suppressed viral load (VL) was defined as matched participants whose last VL was <200 copies/mL among those who were retained in care and on ART.

Differences in Care Continuum Outcomes
Among the 4476 participants who were actively enrolled in the study with at least 1 year of follow-up as of June 15, 2014, retention in care was higher when measured in the postlinked database than in the prelinked DC Cohort database (64.95% [2907/4476] vs 59.83% [2678/4476]), whereas the proportion with viral suppression was lower (85.15% [2391/2808] vs 87.85% [2277/2592]; P<.001 for both; see Figure 3). The proportion of participants on ART was high at 96.79% (2592/2678) and could be assessed only in the prelinked DC Cohort database. In the postlinked database, the proportion of participants classified as retained and as virally suppressed differed according to the number of sites where care was being received (see Figure 4). Participants who received care at three or more sites were more likely to meet the definition of retention in care than those receiving care at one site (80.7% [234/290] vs 62.61% [2197/3509]; P<.001) but were less likely to be virally suppressed (72.3% [154/213] vs 89.51% [1869/2088]; P<.001).
Figure note: Being on ART was defined as the number of Cohort participants who were prescribed an antiretroviral therapy (ART) regimen that overlapped with the study period. ART status was based on prelinked data as ART data are not collected by the DC DOH. Suppressed viral load (VL) was defined as matched participants who had a VL test in the time period and whose last VL was <200 copies/mL among those who were retained in care and on ART.

Principal Findings
Linking clinical data collected in an observational HIV cohort with routinely collected public health surveillance data was mutually beneficial for both the clinical database as well as public health surveillance efforts. Specifically, for the DC Cohort, the linkage improved the accuracy of dates of diagnosis, vital status, and modes of transmission and resulted in the identification of more than 600 additional STD diagnoses, which may otherwise have not been captured in the Cohort database. From the DC DOH perspective, linkage of surveillance data to the DC Cohort database found that the majority of participants had been captured in the DC DOH surveillance data (93%) consistent with the relatively high completeness of surveillance reporting [32]. In addition, among those not matching, most were not DC residents, highlighting the large volume of care being delivered to non-DC residents by DC-based clinics. The linkage also improved measurement of dates of diagnoses, modes of transmission for HIV, and viral suppression for both databases.
With respect to the completeness of laboratory reporting, the linkage resulted in a substantial increase in the number of VLs post linkage. Further examination of the VLs included in the prelinked DC DOH surveillance data revealed that very few results were under 200 copies/mL. This is also reflected in the finding that only 47.5% of participants in the prelinked DC DOH database had ever achieved viral suppression. Given that a relatively high proportion of individuals obtaining care in DC have achieved an undetectable VL [33], this likely reflects that although reportable, all VL values may not be as routinely reported to the DC DOH surveillance program, whereas CD4 results are included in surveillance data regardless of the numeric value [34]. Furthermore, because the prelinked DC Cohort data includes non-DC residents using DC health care facilities, VL labs for individuals who live outside of Washington, DC, are reported to their respective health departments and may not be captured in the DC DOH surveillance database. Given the relatively high proportion of non-DC residents participating in the Cohort who did not match to the DC DOH surveillance database, this may further explain the differences in VL reporting. The DC DOH surveillance program is constantly striving to improve the completeness of all laboratory reporting through routine checks with laboratory facilities and standard regional data exchanges. Nevertheless, despite the low initial number of VLs included in the DC DOH database, they were still of added value to the DC Cohort database.
Discriminating between DC Cohort and non-DC Cohort HIV laboratories was also key to ascertaining whether participants were coenrolled at more than one DC Cohort site, receiving care at multiple sites throughout the city, or receiving care at both DC Cohort sites and non-DC Cohort sites. While most DC Cohort participants were receiving care at one site, almost one-quarter had evidence of receipt of HIV-related care at two or more sites. Furthermore, participants with evidence of care at three or more sites fared worse clinically, and while they were most likely to be retained in care, they were less likely to be virally suppressed. These findings were consistent with previous analyses of site migration in DC, which also found lower CD4 counts and higher VLs among persons seeking care at more than one site [35]. These trends may be reflective of other individual-level factors such as homelessness, substance abuse, and more fragmented care in general, among patients with multiple comorbid conditions. Furthermore, this more vulnerable group may have had seemingly higher retention in care as they may have returned to care more often for follow-up visits based on provider concerns about client health, fear of losing contact with the most transient clients, or based on receipt of more referrals to other clinics or specialists for their comorbid conditions [35,36]. However, while a higher proportion of these participants may have met the retention in care definition, their care patterns appear to reflect more disparate care, as meeting the definition did not translate into higher viral suppression. Thus, given the complexity in measuring retention in care with the shifting standards of clinical care and movement across clinics, emphasis should be placed on achieving viral suppression, a clear goal of treatment.
Additional laboratory data, used as supplemental and complementary information, allowed for re-estimation of retention and viral suppression and improved understanding of drop-offs along the HIV care continuum. Using a nested care continuum approach in which each step is dependent on the prior step, our initial clinic-based care continuum would have underestimated the percentage of participants meeting the retention in care definition and overestimated viral suppression; however, by combining additional laboratories from the health department, we were able to achieve a more accurate measure of these key indicators. Hence, routine data linkages such as these could assist in refining the accuracy of care continua and help prioritize clinical and public health interventions that seek to re-engage persons who are not optimally in care [37,38].
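The nested care continuum, in which each stage's denominator is the count from the preceding stage, can be reproduced from the prelinked DC Cohort counts reported in this article (4476 eligible, 2678 retained, 2592 on ART, 2277 virally suppressed). The function name and data layout below are illustrative assumptions.

```python
def nested_continuum(stage_counts):
    """Compute each stage as a percentage of the stage before it.

    stage_counts is an ordered list of (stage_name, count); each stage is a
    subset of the previous one. The first stage has no percentage (None).
    """
    rows = []
    previous = None
    for stage, count in stage_counts:
        percent = None if previous is None else round(100 * count / previous, 2)
        rows.append((stage, count, percent))
        previous = count
    return rows


# Prelinked DC Cohort counts as reported in the results.
prelinked = [
    ("eligible", 4476),
    ("retained in care", 2678),
    ("on ART", 2592),
    ("virally suppressed", 2277),
]

for stage, count, percent in nested_continuum(prelinked):
    label = "" if percent is None else f" ({percent}% of previous stage)"
    print(f"{stage}: {count}{label}")
```

Because each percentage is conditional on the prior stage, adding laboratory results that move participants into the retained group changes the denominator for every later stage, which is how linkage can raise the retention estimate while lowering the suppression estimate.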
Overall, our care continuum estimates were similar to those of other HIV cohorts in the United States, including the HIV Outpatient Study (HOPS), a convenience sample of patients at selected HIV clinics in the United States, and the Medical Monitoring Project (MMP) study, a multisite supplemental surveillance system in the United States designed to provide nationally representative data on PLWHA [38]. DC Cohort estimates for proportion 'on ART' and viral suppression fell within the range of HOPS and MMP estimates in 2012 (97% and 92% on ART, respectively and 85% and 78% virally suppressed, respectively). DC Cohort estimates were also comparable to findings from the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD). Among more than 35,000 NA-ACCORD participants with at least one HIV care visit in the first 6 months of 2008, 82% were prescribed ART and 78% had suppressed VLs [39].

Limitations
This study has certain noteworthy limitations. Without knowing the full context in which CD4 and VL labs were drawn, deriving inference about the ability of DC Cohort participants to establish and maintain a primary medical home for HIV care is a challenge. We cannot exclude the possibility that additional labs may be the result of referrals to specialists who are also drawing HIV-related labs or acute encounters with the medical system such as emergency department visits and hospitalizations. Given the way laboratory results are reported, we are unable to fully describe the source of the laboratory result (eg, inpatient, emergency department, outpatient, specialty of the reporting provider), which would allow for better characterization of care patterns by type of encounter. However, additional analyses to determine whether participants were receiving care sequentially at these sites or in an overlapping manner may help further delineate these care patterns. Finally, findings among DC Cohort participants may not be generalizable to all HIV-infected persons in DC, given that a certain proportion of PLWHA in DC is not consistently engaged in care [30]. In future analyses, we intend to compare characteristics of DC Cohort participants to city-wide HIV population characteristics to assess whether Cohort-based care estimates approximate care trajectories for the city as a whole.

Conclusions
Despite these limitations, this analysis represents a successful triangulation of data from clinical cohort and public health surveillance data and demonstrated that the data linkages were mutually beneficial. The linkage not only helped to improve the accuracy and completeness of each database but also helped to describe care patterns among PLWHA, and enhanced measurement of clinical outcomes and the HIV care continuum at a population-level. The results derived through combining these databases will help inform HIV programmatic efforts and strengthen the DC DOH surveillance system as they will not only enhance the completeness of case data but contribute to the measurement of a more complete care continuum. The DC Cohort intends to use these data to inform the development of interventions focused on case management and improved care coordination across clinical sites and across jurisdictions. The DC DOH will be retooling its approaches to ensure continuity of care in DC and the surrounding metropolitan area in an enhanced data-to-care intervention strategy. With a more complete and relevant dataset, DC DOH will collaborate with community providers and deploy its public health team to address interruptions in care. DC DOH also has a data sharing agreement and protocol with Maryland and Virginia to ensure that the most complete data available can be used to inform jurisdictional partners in their data-to-care activities. Performance and findings from this type of linkage provide a reference point for design and interpretation of data from similar data linkages in North America and could potentially be used at the regional and national level as we strive to improve care outcomes [28,39,40].