TY - JOUR
T1 - Evaluation of the Quality and Reliability of ChatGPT-4's Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment
AU - Cherrez-Ojeda, Ivan
AU - Zuberbier, Torsten
AU - Rodas-Valero, Gabriela
AU - Sanchez, Jorge
AU - Rudenko, Michael
AU - Dramburg, Stephanie
AU - Demoly, Pascal
AU - Caimmi, Davide
AU - Gómez, René Maximiliano
AU - Ramon, German D.
AU - Fouda, Ghada E.
AU - Quimby, Kim R.
AU - Chong-Neto, Herberto
AU - Llosa, Oscar Calderon
AU - Larco, Jose Ignacio
AU - Monge Ortega, Olga Patricia
AU - Faytong-Haro, Marco
AU - Pfaar, Oliver
AU - Bousquet, Jean
AU - Robles-Velasco, Karla
N1 - Publisher Copyright:
© 2025 The Author(s). Clinical and Translational Allergy published by John Wiley & Sons Ltd on behalf of European Academy of Allergy and Clinical Immunology.
PY - 2025/12
Y1 - 2025/12
N2 - Background: Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) represents an advancing large language model (LLM) with potential applications in medical education and patient care. While Allergen Immunotherapy (AIT) can change the course of allergic diseases, it can also bring uncertainty to patients, who turn to readily available resources such as ChatGPT-4 to address these doubts. This study aimed to use validated tools to evaluate the information provided by ChatGPT-4 regarding AIT in terms of quality, reliability, and readability. Methods: In accordance with EAACI clinical guidelines about AIT, 24 questions were selected and introduced in ChatGPT-4. Independent reviewers evaluated ChatGPT-4 responses using three validated tools: the DISCERN instrument (quality), JAMA Benchmark criteria (reliability), and Flesch-Kincaid Readability Tests (readability). Descriptive statistics summarized findings across categories. Results: ChatGPT-4 responses were generally rated as “fair quality” on DISCERN, with strengths in classification/formulations and special populations. Notably, the tool provided good-quality responses on the preventive effects of AIT in children and premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated “insufficient information” (median = 0–1), primarily due to absent authorship, attribution, disclosure, and currency. Readability analyses revealed a college graduate–level requirement, with most responses classified as “very difficult” to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and difficult readability for patients. Conclusions: ChatGPT-4 provides generally well-structured responses on AIT but lacks reliability and readability for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.
KW - allergen immunotherapy
KW - allergic rhinitis
KW - artificial intelligence
UR - https://www.scopus.com/pages/publications/105023276396
U2 - 10.1002/clt2.70130
DO - 10.1002/clt2.70130
M3 - Article
AN - SCOPUS:105023276396
SN - 2045-7022
VL - 15
JO - Clinical and Translational Allergy
JF - Clinical and Translational Allergy
IS - 12
M1 - e70130
ER -