University of Sussex

Agreement and utility of coded primary and secondary care data for long-term follow-up of clinical trial outcomes

journal contribution
posted on 2025-07-04, 10:24 authored by A Wang, AE Seeley, MR Sydes, N Jones, S de Lusignan, FR Hobbs, Richard McManus, M Williams, JP Sheppard

Background

Although interest in efficient trial design has grown with the use of electronic health records (EHRs) to collect trial outcomes, practical challenges remain. Commonly raised concerns centre on data availability, data quality and data validation. This study aimed to assess the agreement between data collected on clinical trial participants from different sources, in order to provide empirical evidence on the utility of EHRs for follow-up in randomised controlled trials (RCTs).

Methods

This retrospective, participant-level data utility comparison study was undertaken using data collected as part of a UK primary care-based, randomised controlled trial (OPTiMISE). The primary outcome measure was the recording of all-cause hospitalisation or mortality within 3 years post-randomisation and was assessed across (1) Coded primary care data; (2) Coded-plus-free-text primary care data; and (3) Coded secondary care and mortality data. Agreement levels across data sources were assessed using Fleiss’ Kappa (K). Kappa statistics were interpreted using an established framework, categorising agreement strength as follows: <0 (poor), 0.00–0.20 (slight), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (substantial), and 0.81–1.00 (almost perfect) agreement. The impact of using different data sources to determine trial outcomes was assessed by replicating the trial’s original analyses.
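
As an illustration of the agreement measure described above, the following is a minimal Python sketch of Fleiss' Kappa for a binary outcome recorded by three data sources, together with the interpretation bands listed in the Methods. It is not the authors' analysis code: the function names `fleiss_kappa` and `interpret_kappa` and the toy data are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of rows, one row per participant.

    Each row holds one label per data source (e.g. 1 = event recorded,
    0 = no event recorded). Every row must have the same length.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one participant")
    n_sources = len(ratings[0])
    if n_sources < 2 or any(len(row) != n_sources for row in ratings):
        raise ValueError("each participant needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_participant = []
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of source pairs that agree for this participant.
        agreeing = sum(c * (c - 1) for c in counts.values())
        per_participant.append(agreeing / (n_sources * (n_sources - 1)))

    observed = sum(per_participant) / len(ratings)
    total = len(ratings) * n_sources
    # Agreement expected by chance from the overall category proportions.
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands quoted in the Methods."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: four participants, three sources each
# (coded primary care, coded-plus-free-text primary care, secondary care).
toy_ratings = [
    [1, 1, 1],  # all sources record the event
    [0, 0, 0],  # no source records the event
    [1, 1, 0],  # secondary care data miss the event
    [0, 0, 1],  # only secondary care data record the event
]
toy_kappa = fleiss_kappa(toy_ratings)
print(round(toy_kappa, 3), interpret_kappa(toy_kappa))  # 0.333 fair
```

In the toy data, two of the four participants have full agreement and two have one dissenting source, which gives an observed agreement of 2/3 against a chance agreement of 1/2, hence K = 1/3.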

Results

Almost perfect agreement was observed for the mortality outcome across the three data sources (K = 0.94, 95% CI 0.91–0.98). Fair agreement (weak consistency) was observed for hospitalisation outcomes, including all-cause hospitalisation or mortality (K = 0.35, 95% CI 0.28–0.42), emergency hospitalisation (K = 0.39, 95% CI 0.33–0.46), and hospitalisation or mortality due to cardiovascular disease (K = 0.32, 95% CI 0.19–0.45). The overall trial results for the primary outcome remained consistent across data sources, albeit with varying precision.

Conclusion

Significant discrepancies between data sources were observed in the recording of secondary care outcomes. Investigators should be cautious when choosing which data source(s) to use to measure outcomes in trials. Future work on linking participant-level data across healthcare settings should consider the variations in diagnostic coding practices. Standardised definitions for outcome measures based on healthcare systems data should be encouraged, as should the use of multiple data sources for cross-checking and verification.

History

Publication status

  • Published

File Version

  • Published version

Journal

BMC Medical Research Methodology

ISSN

1471-2288

Publisher

Springer Science and Business Media LLC

Issue

1

Volume

25

Article number

156

Department affiliated with

  • Clinical and Experimental Medicine Publications
  • BSMS Publications

Institution

University of Sussex

Full text available

  • Yes

Peer reviewed?

  • Yes