Diagnosis Codes Suitable for Researching Obesity Using Claims-based Databases


When conducting an epidemiological study using an administrative database, researchers often want to identify patients with obesity so they can establish a study population or adjust for confounders. However, insurance claims databases generally capture only International Classification of Diseases (ICD) codes, not direct measurements of body mass index (BMI).

Elisabetta Patorno, MD, DrPH, associate epidemiologist in the Division of Pharmacoepidemiology and Pharmacoeconomics at Brigham and Women’s Hospital; Karine Suissa, PhD, a fellow in the division; Seoyoung C. Kim, MD, ScD, a rheumatologist in the division; and colleagues determined that obesity-related ICD codes accurately identify patients with obesity in claims-based data. Their report appears in Diabetes, Obesity and Metabolism.


The research team linked electronic health records for about 550,000 Mass General Brigham patients to the Medicare fee-for-service claims database. They then studied 73,644 patients who had a BMI measurement recorded between January 1, 2014, and June 30, 2014 (the era of ICD-9 codes), or between January 1, 2016, and June 30, 2016 (after ICD-10 codes were introduced).

Reporting of Obesity-related ICD Codes

Weight-related ICD codes were substantially underreported, the researchers found. Of the 73,644 patients, only 16,280 (22%) had an obesity-related ICD code recorded within the six months before or after the BMI measurement.
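As a rough illustration of how such a flag might be derived from claims, the sketch below checks whether any obesity-related code falls within six months of a patient's BMI measurement date. The code prefixes, the 183-day approximation of "six months," the function name and the sample claims are all assumptions made for illustration; the article does not list the study's actual code definitions or windowing rule.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days on either side.
WINDOW = timedelta(days=183)

# Assumption: illustrative prefixes only (ICD-9-CM 278.0x, ICD-10-CM E66.x),
# not the study's actual code list.
OBESITY_CODE_PREFIXES = ("278.0", "E66")


def has_obesity_code(bmi_date, claims):
    """Return True if any claim carries an obesity-related code
    dated within WINDOW of the BMI measurement.

    claims: iterable of (service_date, icd_code) tuples (hypothetical layout).
    """
    return any(
        abs(service_date - bmi_date) <= WINDOW
        and code.startswith(OBESITY_CODE_PREFIXES)
        for service_date, code in claims
    )


# Hypothetical claims for one patient.
claims = [(date(2016, 2, 10), "E66.9"), (date(2016, 3, 1), "I10")]

flag = has_obesity_code(date(2016, 4, 15), claims)  # code is inside the window
late = has_obesity_code(date(2017, 4, 15), claims)  # code is outside the window

# Share of the cohort with a code, using the counts reported in the article.
coded_share = 16_280 / 73_644
```

With the counts reported above, `coded_share` works out to roughly 0.22, matching the 22% figure.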

Validity of Obesity-related ICD Codes

A key finding was that the specificity of codes was high in all bodyweight categories, indicating patients were accurately classified:

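  • Underweight and normal weight (BMI <25 kg/m²): sensitivity 6.5%, specificity 99.7%, positive predictive value 90.0%, negative predictive value 70.7%
  • Overweight (BMI 25–29.9 kg/m²): sensitivity 7.9%, specificity 97.4%, positive predictive value 63.5%, negative predictive value 64.9%
  • Obese (BMI 30–39.9 kg/m²): sensitivity 5.2%, specificity 99.7%, positive predictive value 86.3%, negative predictive value 72.7%
  • Severely obese (BMI ≥40 kg/m²): sensitivity 30.9%, specificity 98.9%, positive predictive value 58.6%, negative predictive value 96.7%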
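For readers less familiar with these four measures, the sketch below shows how each is computed from a two-by-two comparison of ICD codes against measured BMI, treated here as the reference standard. The counts are hypothetical and chosen only to show the pattern of low sensitivity alongside high specificity; they are not taken from the study, and the function and class names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ValidityMetrics:
    sensitivity: float  # coded patients among those truly in the BMI category
    specificity: float  # uncoded patients among those truly outside it
    ppv: float          # truly in the category among coded patients
    npv: float          # truly outside the category among uncoded patients


def validity_metrics(tp, fp, fn, tn):
    """Compute validity measures from two-by-two counts.

    tp: code present, BMI in category    fp: code present, BMI not in category
    fn: code absent,  BMI in category    tn: code absent,  BMI not in category
    """
    return ValidityMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Hypothetical counts (not study data): few patients in the category are
# coded, but almost no one outside the category is coded.
m = validity_metrics(tp=50, fp=10, fn=950, tn=8990)
```

In this made-up example only 5% of patients in the category carry a code, yet about 83% of coded patients truly belong to it, which is why a code can be trusted when present even though its absence says little.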

Applying the Results

These findings have several implications for epidemiologic research:

  • The prevalence of obesity can’t be estimated using obesity-related ICD codes because they’re extensively underreported and have low sensitivity
  • Claims-based obesity ICD codes are accurate enough to characterize patients as obese. This allows researchers to identify target populations in which to assess the use and effects of interventions for obesity, and to investigate effect measure modification by obesity
  • Obesity ICD codes can be used to reduce potential residual confounding by BMI, but only within cohorts where the codes are available for most individuals
