Evaluation of the Quality and Reliability of ChatGPT-4's Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment

  • Ivan Cherrez-Ojeda
  • Torsten Zuberbier
  • Gabriela Rodas-Valero
  • Jorge Sanchez
  • Michael Rudenko
  • Stephanie Dramburg
  • Pascal Demoly
  • Davide Caimmi
  • René Maximiliano Gómez
  • German D. Ramon
  • Ghada E. Fouda
  • Kim R. Quimby
  • Herberto Chong-Neto
  • Oscar Calderon Llosa
  • Jose Ignacio Larco
  • Olga Patricia Monge Ortega
  • Marco Faytong-Haro
  • Oliver Pfaar
  • Jean Bousquet
  • Karla Robles-Velasco

Research output: Contribution to a journal › Article › peer-review

Abstract

Background: Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) is an advanced large language model (LLM) with potential applications in medical education and patient care. While allergen immunotherapy (AIT) can change the course of allergic diseases, it can also create uncertainty for patients, who turn to readily available resources such as ChatGPT-4 to address their doubts. This study aimed to evaluate the quality, reliability, and readability of the information ChatGPT-4 provides on AIT using validated tools.

Methods: In accordance with the EAACI clinical guidelines on AIT, 24 questions were selected and submitted to ChatGPT-4. Independent reviewers evaluated the responses using three validated tools: the DISCERN instrument (quality), the JAMA Benchmark criteria (reliability), and the Flesch-Kincaid readability tests (readability). Descriptive statistics summarized findings across categories.

Results: ChatGPT-4 responses were generally rated as “fair quality” on DISCERN, with strengths in classification/formulations and special populations. Notably, the tool provided good-quality responses on the preventive effects of AIT in children and on premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated “insufficient information” (median = 0–1), primarily due to absent authorship, attribution, disclosure, and currency. Readability analyses revealed a college graduate–level reading requirement, with most responses classified as “very difficult” to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and poor readability for patients.

Conclusions: ChatGPT-4 provides generally well-structured responses on AIT but lacks the reliability and readability required for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.

Original language: English
Article number: e70130
Journal: Clinical and Translational Allergy
Volume: 15
Issue: 12
DOI
Status: Published - Dec. 2025
