Studying the comparative effectiveness of gout treatments in large-scale real-world datasets is hindered by the difficulty of identifying gout flares. The goal of gout management is to avoid flares, yet ICD codes for “gout” are used both for acute presentations of gout and for the chronic diagnosis, so they cannot distinguish a flare from simply having the condition.
As previously reported in Pharmacoepidemiology & Drug Safety, a research team at Brigham and Women’s Hospital developed several rule-based algorithms to detect gout flares in claims data by matching an ICD-9 code for gout with claims for relevant medications and procedures occurring in close temporal proximity. When the algorithms were validated against electronic health records (EHRs), their positive predictive values (PPVs) ranged from 50% to 68%.
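To make “temporal proximity” concrete, here is a minimal Python sketch of a rule of this general kind. It is not the published algorithms: the evidence categories in `FLARE_EVIDENCE`, the ±7-day window, and the function name `flag_flares` are hypothetical choices made only for illustration.

```python
from datetime import date, timedelta

# Hypothetical evidence categories standing in for "relevant medications and
# procedures"; the published algorithms define their own lists and windows.
FLARE_EVIDENCE = {"colchicine", "nsaid", "oral_glucocorticoid", "joint_aspiration"}


def flag_flares(gout_dx_dates, evidence_events, window_days=7):
    """Return the gout-diagnosis dates that have qualifying evidence nearby.

    gout_dx_dates: dates carrying a gout ICD-9 code.
    evidence_events: (date, category) tuples from pharmacy or procedure claims.
    window_days: assumed +/- window used as "temporal proximity".
    """
    window = timedelta(days=window_days)
    flagged = []
    for dx in gout_dx_dates:
        # A diagnosis date counts as a flare if any qualifying event lies within the window.
        if any(
            category in FLARE_EVIDENCE and abs(event_date - dx) <= window
            for event_date, category in evidence_events
        ):
            flagged.append(dx)
    return flagged


if __name__ == "__main__":
    dx_dates = [date(2015, 3, 1), date(2015, 6, 1)]
    events = [(date(2015, 3, 3), "colchicine"), (date(2015, 8, 1), "nsaid")]
    print(flag_flares(dx_dates, events))
```

Changing `window_days` or the evidence list changes which diagnosis dates are flagged, so each variant of such a rule can end up with a different PPV.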
Now, Katherine P. Liao, MD, MPH, director of the VERITY Bioinformatics Core in the Division of Rheumatology, Inflammation, and Immunity at the Brigham, Kazuki Yoshida, MD, MPH, ScD, formerly an epidemiologist in the Division, and colleagues have determined that combining claims data with information from narrative notes in the EHR improves the capture of gout flares. Their report appears in Pharmacoepidemiology & Drug Safety.
Methods
The researchers linked 2007–2016 Medicare claims data with EHR data from Mass General Brigham. They identified 4,402 patients with gout who were aged ≥65 years, had newly initiated either allopurinol or febuxostat, and had at least one EHR encounter during the previous 365 days.
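The three inclusion criteria can be expressed as a simple filter. This sketch assumes each patient record already carries an age at the index date, the newly started urate-lowering drug, and a list of EHR encounter dates; the `Patient` fields, the use of the drug start date as the index date, and the reading of "previous 365 days" as the days strictly before that date are assumptions, not details from the report.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

ELIGIBLE_DRUGS = {"allopurinol", "febuxostat"}


@dataclass
class Patient:
    patient_id: str
    age_at_index: int
    new_drug: Optional[str]  # urate-lowering drug newly started, if any
    index_date: date         # assumed index: date the drug was started
    ehr_encounter_dates: List[date] = field(default_factory=list)


def in_cohort(p: Patient, lookback_days: int = 365) -> bool:
    """Apply the age, new-initiation, and prior-EHR-encounter criteria."""
    if p.age_at_index < 65:
        return False
    if p.new_drug not in ELIGIBLE_DRUGS:
        return False
    window_start = p.index_date - timedelta(days=lookback_days)
    # Require at least one EHR encounter in [index - lookback_days, index).
    return any(window_start <= d < p.index_date for d in p.ehr_encounter_dates)
```

Gout status and the Medicare linkage are assumed to have been handled before this filter runs.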
Of these, 500 patients were randomly selected, and their EHR notes were processed with natural language processing (NLP), a form of artificial intelligence, to extract concepts related to gout flares. The first 300 patients reviewed formed the training set, and the remaining 200 formed the validation set.
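A minimal sketch of the sampling and split, assuming patient identifiers are held in a list. The fixed `seed` is a hypothetical choice for reproducibility, and review order is taken to be the order returned by the random draw.

```python
import random


def split_review_sample(patient_ids, n_sample=500, n_train=300, seed=2016):
    """Randomly sample patients for NLP review and split them by review order."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    sample = rng.sample(list(patient_ids), n_sample)  # random draw, treated as review order
    # The first n_train patients reviewed are training; the rest are validation.
    return sample[:n_train], sample[n_train:]
```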
The year-long lookback period was divided into one-month segments, and for each segment the researchers determined whether the patient had a gout flare, a possible flare, or no flare, or whether the data were inadequate to tell.
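The segmentation step might look like the sketch below. It assumes twelve contiguous windows of roughly equal length (365 days divided by 12, rounded down at each boundary) rather than calendar months, and it only groups note dates by segment. The four labels were assigned by the researchers, so the rule that marks segments without notes as inadequate data is a hypothetical default, not the study's procedure.

```python
from datetime import date, timedelta
from enum import Enum


class SegmentLabel(Enum):
    FLARE = "flare"
    POSSIBLE_FLARE = "possible flare"
    NO_FLARE = "no flare"
    INADEQUATE_DATA = "inadequate data"


def month_segments(index_date, lookback_days=365, n_segments=12):
    """Split [index_date - lookback_days, index_date) into contiguous windows."""
    start = index_date - timedelta(days=lookback_days)
    bounds = [start + timedelta(days=i * lookback_days // n_segments)
              for i in range(n_segments + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def group_notes(note_dates, segments):
    """Assign each note date to its segment; notes outside the lookback are dropped."""
    groups = [[] for _ in segments]
    for d in note_dates:
        for i, (lo, hi) in enumerate(segments):
            if lo <= d < hi:
                groups[i].append(d)
                break
    return groups


def default_labels(groups):
    # Hypothetical default: a segment with no notes starts as INADEQUATE_DATA;
    # the others (None) await a reviewer's FLARE / POSSIBLE_FLARE / NO_FLARE call.
    return [SegmentLabel.INADEQUATE_DATA if not g else None for g in groups]
```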
EHR Yield
The review of EHR narrative notes yielded:
- Training set—2,074 notes from 179 patients (the other 121 patients did not have notes in the chosen periods)
- Validation set—1,408 notes from 118 patients (the other 82 patients did not have notes)
Predictive Model Training
The target of the prediction models was a gout flare that resulted in a healthcare encounter.
From the claims data, the researchers chose 93 potentially predictive variables, including demographics, comorbidities, relevant medications, healthcare utilization, and laboratory orders; the EHR notes provided 88 NLP concepts. Three prediction models were constructed, each using LASSO (least absolute shrinkage and selection operator) regression to select among the candidate variables:
- Claims variables only—20 variables were selected, including medications (NSAIDs, colchicine, oral glucocorticoids and opioids); interestingly, cardiovascular-related variables such as hyperlipidemia and electrocardiogram orders were also selected
- NLP concepts only—15 concepts were selected, such as “gout,” “flare” and “arthralgia”; concepts for potential inciting events were also selected, such as “stroke,” “heart failure” and “alcohol”
- The two combined—32 claims variables and 13 NLP concepts
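LASSO adds an L1 penalty to a regression so that weak predictors receive coefficients of exactly zero, which is how it “selects” variables. The sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. The logistic form, the penalty value `lam`, the step size, the iteration count, and the toy feature names are assumptions, since the report does not state how the models were fitted or tuned.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrinks toward zero, sets small values to exactly zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """Fit an L1-penalized logistic regression by proximal gradient descent (ISTA).

    X is a list of predictor rows, y a list of 0/1 outcomes, lam the L1 penalty.
    The intercept is left unpenalized. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residual = predicted probability minus observed outcome, for every row.
        resid = [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the log-loss, then the L1 proximal step on the weights.
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w


def selected(names, weights):
    # LASSO "selects" the predictors whose coefficients were not shrunk to zero.
    return [name for name, wj in zip(names, weights) if wj != 0.0]


if __name__ == "__main__":
    # Toy data: the first column tracks the outcome, the second is balanced noise.
    X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    names = ["hypothetical_signal", "hypothetical_noise"]
    for lam in (0.01, 1.0):
        _, w = lasso_logistic(X, y, lam)
        print(lam, selected(names, w))
```

With `lam` large enough, every coefficient stays at zero; smaller values let more predictors in. In practice the penalty is usually tuned by cross-validation rather than fixed by hand.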
Predictive Model Evaluation
In the validation set, the combined model discriminated somewhat better than the single-source models, as measured by the area under the receiver operating characteristic curve (AUC):
- Claims only—AUC, 0.693 (95% CI, 0.635–0.751)
- NLP concepts only—AUC, 0.688 (95% CI, 0.629–0.746)
- The two combined—AUC, 0.731 (95% CI, 0.676–0.786)
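AUC can be read as the probability that a randomly chosen record with a flare receives a higher predicted score than a randomly chosen record without one. A minimal sketch of that pairwise definition follows. The confidence intervals above would need an additional method (for example, DeLong's method or bootstrapping), which is not shown; the report does not say which was used.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.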
When the previously developed rule-based algorithms were applied to this cohort, their PPVs ranged from 39% to 58%. The claims-only model did better, with a PPV of 64% and a negative predictive value (NPV) of 70% at a specificity of 90%.
Adding NLP concepts to the model improved performance further: the combined model achieved a PPV of 76% and an NPV of 71% at a specificity of 95%.
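Because PPV and NPV depend on where the score threshold is set, these results are reported at a fixed specificity. The sketch below shows one way to do that: pick the lowest threshold whose specificity reaches the target, then compute PPV and NPV there. The function name and the tie rule (a score equal to the threshold counts as positive) are assumptions.

```python
def metrics_at_specificity(scores, labels, target_specificity):
    """Return threshold, specificity, PPV, and NPV at the lowest threshold meeting the target.

    A record is called positive when its score >= threshold. Returns None if no
    threshold reaches the target specificity.
    """
    n_neg = sum(1 for y in labels if y == 0)
    # Ascending thresholds: specificity only rises, so the first hit keeps sensitivity highest.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = n_neg - fp
        specificity = tn / n_neg
        if specificity >= target_specificity:
            ppv = tp / (tp + fp) if tp + fp else float("nan")
            npv = tn / (tn + fn) if tn + fn else float("nan")
            return {"threshold": t, "specificity": specificity, "ppv": ppv, "npv": npv}
    return None
```

Raising `target_specificity` moves the threshold up, which usually raises PPV at the cost of sensitivity; note that the claims-only and combined figures above were reported at different specificities (90% and 95%).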
A Potential New Approach
This combined claims/NLP-concept model may improve the feasibility of real-world claims-based comparative effectiveness studies of gout treatments. How it will perform in younger cohorts with fewer comorbidities or in non-academic practice settings remains to be seen.