ChatGPT passes part of the medical licensing exam in the United States

For a few weeks now we haven’t stopped throwing challenges at ChatGPT to see just how far the AI can go at this point. It’s a mix of curiosity and unsettling news, and the latest example comes from medicine: the tool has managed to pass the exam required to practice medicine in the United States.

To be more precise, a team of researchers put the program to the test to measure its clinical reasoning skills using questions from the United States Medical Licensing Examination (USMLE). According to the authors of a study published in medRxiv:

We chose to test the generative language AI on USMLE questions because it is a high-stakes, comprehensive, three-step standardized testing program that covers all topics in clinicians’ fund of knowledge, spanning basic sciences, clinical reasoning, medical management, and bioethics.

The results could not be more surprising considering that the language model was not trained on the version of the test used by the researchers, nor did it receive any additional medical training prior to the study, in which it answered a series of open-ended and multiple-choice questions. According to the authors of the work:

In this study, ChatGPT performed with >50% accuracy across all exams, exceeding 60% in most of them. The USMLE passing threshold, though it varies by year, is about 60%. Therefore, ChatGPT is now comfortably within the passing range. Being the first experiment to reach this benchmark, we think this is a surprising and impressive result.

Not only that. In light of the results, the team believes the AI’s performance could improve with further prompting and interaction with the model. In fact, when the AI performed poorly and gave less consistent answers, they believe this was partly due to information the model simply hadn’t encountered. As the study indicates:

Paradoxically, ChatGPT outperformed PubMedGPT (50.8% accuracy, unpublished data), a peer [language model] with a similar neural architecture, but trained exclusively on biomedical domain literature. We speculate that domain-specific training may have created further ambivalence in the PubMedGPT model, as it absorbs real-world text of ongoing academic discourse that tends to be inconclusive, contradictory, or highly conservative or evasive in its language.

What’s next? The researchers suggest that AI may very soon become commonplace in healthcare settings, given the speed of progress in the industry. [IFLScience]
