An AI reasoning system developed by OpenAI outperformed experienced internal-medicine physicians at diagnosing patients from real emergency-department cases, according to a study from Harvard Medical School and Beth Israel Deaconess published in Science. Evaluated at multiple points in the care process using only electronic health records, the model matched or exceeded the doctors’ performance and outperformed GPT-4 on clinical reasoning tasks and benchmarks. Researchers and outside clinicians said the results highlight rapid advances in large language models but cautioned that integrating such tools into hospital workflows, and proving they improve patient outcomes, will require rigorous, prospective trials. The authors stressed that the AI is not a substitute for physicians and that real-world care demands inputs beyond text, including images and bedside assessment. A correction clarified that the comparison was to internal-medicine doctors, not ER physicians.
Related articles:
— Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education
— Ethics and Governance of Artificial Intelligence for Health