Evaluation of the Quality and Reliability of ChatGPT-4's Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment

  • Ivan Cherrez-Ojeda
  • Torsten Zuberbier
  • Gabriela Rodas-Valero
  • Jorge Sanchez
  • Michael Rudenko
  • Stephanie Dramburg
  • Pascal Demoly
  • Davide Caimmi
  • René Maximiliano Gómez
  • German D. Ramon
  • Ghada E. Fouda
  • Kim R. Quimby
  • Herberto Chong-Neto
  • Oscar Calderon Llosa
  • Jose Ignacio Larco
  • Olga Patricia Monge Ortega
  • Marco Faytong-Haro
  • Oliver Pfaar
  • Jean Bousquet
  • Karla Robles-Velasco

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) is an advanced large language model (LLM) with potential applications in medical education and patient care. While Allergen Immunotherapy (AIT) can change the course of allergic diseases, it can also bring uncertainty to patients, who turn to readily available resources such as ChatGPT-4 to address these doubts. This study aimed to use validated tools to evaluate the information provided by ChatGPT-4 regarding AIT in terms of quality, reliability, and readability.

Methods: In accordance with EAACI clinical guidelines on AIT, 24 questions were selected and submitted to ChatGPT-4. Independent reviewers evaluated ChatGPT-4's responses using three validated tools: the DISCERN instrument (quality), the JAMA Benchmark criteria (reliability), and the Flesch-Kincaid Readability Tests (readability). Descriptive statistics summarized findings across categories.

Results: ChatGPT-4's responses were generally rated as "fair quality" on DISCERN, with strengths in classification/formulations and special populations. Notably, the tool provided good-quality responses on the preventive effects of AIT in children and on premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated "insufficient information" (median = 0–1), primarily due to absent authorship, attribution, disclosure, and currency. Readability analyses revealed a college graduate–level requirement, with most responses classified as "very difficult" to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and difficult readability for patients.

Conclusions: ChatGPT-4 provides generally well-structured responses on AIT but lacks the reliability and readability required for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.
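To illustrate the readability metric used in the Methods, the sketch below computes the standard Flesch-Kincaid Grade Level formula (0.39 × words/sentence + 11.8 × syllables/word − 15.59) with a naive vowel-group syllable counter. This is a minimal illustration only, not the validated tooling the study used; the regex-based tokenization and syllable heuristic are simplifying assumptions.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels
    # (including 'y'); real readability tools use dictionaries.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid Grade Level of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

A grade around 13 or higher corresponds to the college-level reading requirement the study reports for most ChatGPT-4 responses.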

Original language: English
Article number: e70130
Journal: Clinical and Translational Allergy
Volume: 15
Issue number: 12
DOIs
State: Published - Dec 2025

Keywords

  • allergen immunotherapy
  • allergic rhinitis
  • artificial intelligence

