When "Do No Harm" Becomes "Do Not Proceed": Why I Couldn't Build a Medical Imaging GPT
Innovation meets roadblocks, and what that says about the future of AI in healthcare.
Imagine an AI that could flag ultrasound mistakes in seconds. Not replace a radiologist, not diagnose patients, but assist sonographers, students, and quality assurance teams in catching errors early.
Sounds reasonable, right? I thought so too. Yes, tools like this are starting to appear in healthcare, but like anything else, they come at a financial cost.
Nevertheless, I attempted to build such a GPT.
I envisioned a tool that could review anonymized ultrasound images, compare them against established standards, and give feedback for educational purposes. Not to override specialists…but to learn where AI stands today.
The result… I hit a wall (accompanied by a mental smack and bruise).
Despite anonymized data, disclaimers, and a clear scope (“This is not a definitive diagnosis, just for educational purposes”), ChatGPT wouldn’t go there… (I heard Gandalf’s “THOU SHALL NOT PASS!”) - yes, I know, very nerdish.
Why this frustrates me:
On one hand, I get it. The stakes in healthcare are high: one misinterpreted image could cause harm if someone used it irresponsibly, and guardrails are essential.
But here's the conundrum:
Students and clinicians are using AI anyway, in uncontrolled, unregulated ways, simply because the technology exists.
Quality assurance teams like mine are exploring structured reporting, peer review, and AI as a support tool… not a replacement.
The roadblock can stall learning, efficiency, and innovation in medicine.
So, what’s the alternative? Pretend this doesn’t exist? Keep banning tools instead of guiding their use? As I keep saying in multiple posts, that’s like telling students not to use the internet 30 years ago.
The reality is…
AI gets it wrong.
Case in point: I once uploaded an ultrasound image labeled “Rt bulb.”
In vascular imaging, “bulb” usually refers to part of a blood vessel (carotid system) in the neck, but it can also mean something completely different in another context.
AI took one look at the 2D image and decided it was a scrotal case (yep, that’s exactly as awkward as it sounds). Any trained radiologist, or even an experienced sonographer, would know it wasn’t that. The labeling was misleading, and the anatomy didn’t match, but the AI didn’t have that deeper reasoning.
So, I showed it another image from the same case, this time with a clearer depiction - blood flow. Only then did it correct itself and realize it was dealing with a blood vessel.
The point being:
A student learning from that first AI answer would have walked away thinking the wrong anatomy was correct.
It took my experience to recognize the mistake, dig deeper, and then teach the AI by adjusting its guidelines in the GPT.
In the right hands, that’s powerful; in the wrong hands, it’s dangerous.
For now, AI in imaging is just a tool. It needs human expertise to guide it, correct it, and refine its knowledge. That’s exactly why I wanted my students to use this custom GPT so that they could see its strengths and its limits in real time.
The problem? I couldn’t even create a shareable link for them to use in assignments. They’d have to use regular ChatGPT, which, in this case, wasn’t as accurate or efficient. Not yet, anyway.
And that’s the frustrating part. It’s not that AI makes mistakes (humans do too), but that we can’t even test it in an educational setting without roadblocks. If we could, students could learn both the “what” and the “why not” of AI in medicine, instead of pretending these tools don’t exist until they’re behind a paywall.
P.S. — Nerd Corner:
For those curious about just how off AI can be without the right context… in this case, it wasn’t just the “bulb” label that threw it. The AI mistook a vascular image for a scrotal structure. Fair enough - the still frame was ambiguous. But when I gave it a second image from the same study, any trained eye would have instantly recognized it as a blood vessel.
Here’s where it got weirder: the AI then misidentified which vessel it was, swapping the internal carotid artery (ICA) for the external carotid artery (ECA). In the real world, that’s not just a minor “oops.” The ICA primarily supplies the brain and eye, while the ECA supplies the face and scalp, so confusing them could completely change a diagnosis or treatment plan.
The only reason we reached the correct conclusion was that I could spot the error, challenge it, and feed in more prompts to help it self-correct. A student or novice sonographer relying solely on the first AI answer wouldn’t have had that advantage. This was actually a pathological case, and for those who know, the ECA can take on the ICA’s waveform in some cases when the ICA is occluded/blocked. The take-home message: AI can’t yet diagnose or recognize complex cases accurately, and those are exactly the cases where it’s needed most.
