Table 2:

Number of correct responses by model and task

| Task | GPT-3.5 Turbo | GPT-4 | P value |
| --- | --- | --- | --- |
| Direct diagnosis | 30/115 (26%) | 47/115 (41%) | <.001 |
| Case report search | 11/115 (10%) | 8/115 (7%) | .579 |
| Total | 37/115 (32%) | 50/115 (43%) | .009 |
| (Overlap) | (4) | (5) | |
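
The Total row appears to count each case once even when both tasks produced a correct response, so that Total = Direct diagnosis + Case report search - Overlap. This reading of the "(Overlap)" row is an assumption; it is not stated in the table itself, but it is consistent with the reported counts. The minimal Python sketch below checks that arithmetic and the rounded percentages. The helper names `union_correct` and `percent` are hypothetical, and the P values are not recomputed because the statistical test is not named here.

```python
# Counts copied from Table 2; N is the number of cases evaluated per model.
N = 115
counts = {
    "GPT-3.5 Turbo": {"direct": 30, "search": 11, "overlap": 4, "total": 37},
    "GPT-4": {"direct": 47, "search": 8, "overlap": 5, "total": 50},
}


def union_correct(direct, search, overlap):
    """Cases correct in at least one task (assumes 'overlap' = correct in both)."""
    return direct + search - overlap


def percent(k, n=N):
    """Percentage of k out of n, rounded to the nearest whole number."""
    return round(100 * k / n)


for model, c in counts.items():
    total = union_correct(c["direct"], c["search"], c["overlap"])
    # The derived total should match the Total row reported in the table.
    assert total == c["total"]
    print(f"{model}: {total}/{N} ({percent(total)}%)")
```

Under this assumption the derived totals match the reported 37/115 and 50/115 for GPT-3.5 Turbo and GPT-4, respectively.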