Evaluating GPT-4’s Performance in Surgical Training: Insights from a Recent Study

With the groundbreaking advancements made in technology over the years, artificial intelligence (AI) is now being applied to nearly every aspect of human life. From transportation to education to medicine, AI can now perform tasks that once required a human. It relies on algorithms that enable machines to carry out cognitive tasks such as decision-making, speech recognition, language processing, and problem-solving.


Surgery


Machine learning (ML) in particular has reshaped healthcare. ML is a branch of AI in which systems learn from data and interactions over time, leading to better predictions and decisions. Reports project that AI in healthcare will expand at a compound annual growth rate of 43.4% between 2022 and 2030. ML has great potential, especially in the field of surgery.
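To put that growth rate in perspective, the short sketch below compounds 43.4% per year over the stated period. It is illustrative only: the function name `compound_growth` is hypothetical, and it assumes eight full years of compounding (2022 to 2030), which the reports cited here do not spell out.

```python
def compound_growth(rate: float, years: int) -> float:
    """Return the total growth multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years

cagr = 0.434            # 43.4% compound annual growth rate, as reported
years = 2030 - 2022     # assumption: eight full years of compounding

multiple = compound_growth(cagr, years)
print(f"Implied growth over {years} years: about {multiple:.1f}x")
```

Under these assumptions, the market would end the period on the order of 18 times its starting size.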

AI for training surgeons

AI is being used in many fields, including surgical training. It offers surgical trainees a productive way to learn by giving them access to a large amount of information on any topic covered in their training program.

Large language models (LLMs) like OpenAI’s GPT-4 are trained to interpret large amounts of human-written text and other complex data. This means they can generate human-like content in response to prompts written in natural language. Various training programs now use LLMs for education because they are fast and cost-effective.

Benefits of AI-directed surgical training

There are many advantages to using AI for surgical training. They include remote training, real-time learning scenarios, cost savings, and personalized learning.

AI can make remote training possible for surgical residents and trainees. People in remote areas who cannot attend in person can access training tools and materials via AI. This helps to overcome geographical barriers to learning.

AI is very useful for creating simulations of various surgical procedures. Trainees can practice in controlled environments, which sharpens their knowledge and skills and may help prevent complications. Simulation also saves costs, as surgical training is expensive.

The medical field is advancing rapidly, with new information and modifications being added all the time. Trainees and students have to keep up with this new information as fast as possible. AI makes it easier to find the newest papers, articles, and research.


On the educator’s part, AI can be used to create learning and clinical scenarios unique to each trainee. Personalized training materials can be tailored to each trainee’s individual needs and learning style. AI can even help examiners score scripts in real time.

Ethical Considerations

New technology always raises concerns. In surgical training, over-reliance on technology is a major one. Surgeons fear that leaning too heavily on AI might leave them out of touch with real practice, which could result in the loss of some vital skills.

Another major concern is data privacy. Using AI requires gathering large amounts of information on various subjects, which carries a risk of privacy breaches and data theft. This is a major barrier to embracing AI.

GPT-4 performs poorly in generating clinical solutions

A pilot study conducted by Joshua Roshal et al. found that GPT-4 performed well on multiple-choice questions (MCQs) but poorly on lengthy oral examination questions.

They evaluated the model’s accuracy in answering 250 random MCQs and 4 oral examination questions, which they modeled after the American Board of Surgery Qualifying Examination (ABS QE) and Certifying Examination (ABS CE). All of the questions related to general surgery. They also recruited two former general surgery oral board examiners and Members of the Academy of Master Surgeon Educators (MAMSE).

Out of 250 MCQs, GPT-4 answered 197 correctly, an accuracy of 78.8%. For the clinical scenarios, GPT-4 failed 75% of the questions. The failures were attributed to wrong timing of intervention and incorrect suggested operations.
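The reported percentages can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `percent` is hypothetical, and the count of 3 failed oral questions is an assumption inferred from the reported 75% of 4 questions.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100 * part / total, 1)

mcq_correct, mcq_total = 197, 250   # MCQ figures reported in the study
oral_failed, oral_total = 3, 4      # assumption: 75% of the 4 oral questions

mcq_accuracy = percent(mcq_correct, mcq_total)
oral_failure_rate = percent(oral_failed, oral_total)

print(f"MCQ accuracy: {mcq_accuracy}%")            # MCQ accuracy: 78.8%
print(f"Oral failure rate: {oral_failure_rate}%")  # Oral failure rate: 75.0%
```

With only four oral questions, each one shifts the failure rate by 25 percentage points, so that figure is far less stable than the MCQ result.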


Clinical Significance

The study suggests that LLMs cannot yet be relied on to give accurate clinical solutions. Traditionally, surgical training has been “do as you see”, giving trainees room to learn in real time and gain hands-on skills. However, given concerns about patient safety, simulation scenarios using AI can help prepare surgeons for real-life cases.

GPT-4 also performed poorly when required to analyze lengthy text prompts. The researchers think that the length and structure of the questions might have affected its ability to process the right information. Furthermore, this test was done only on GPT-4. Other LLMs still have to be evaluated, and their results may be different.

References

Roshal, J., Silvestri, C., Sathe, T., Townsend, C., Klimberg, V. S., & Perez, A. (2024). ChatGPT-4 as a board-certified surgeon: A pilot study. medRxiv. https://doi.org/10.1101/2024.05.31.24307894
