I Tested ChatGPT, Perplexity, Copilot, and Gemini on a Real Job Task. Here’s Who Cracked Under Pressure
Four tools enter the ring. Which one should you trust for research, creativity, or precision?
Everyone loves to brag about their favorite AI tool — until you need it for something that actually matters. Not a blog post. Not a vacation itinerary. I’m talking real-world work where accuracy isn’t optional.
The Setup:
As someone who works in quality assurance for medical imaging, I ran into a practical problem:
Our reporting system uses dropdown menus to describe renal cysts. Selecting “anechoic” implies a simple cyst, but the report itself never actually said so. For compliance, we needed a sentence that spelled it out and cited medical literature or an accrediting body like the American College of Radiology (ACR).
Sounds straightforward, right? Wrong. Here’s the kicker: this isn’t creative writing. This is medical reporting. A mistake isn’t just embarrassing — it’s risky. So, I decided to see if AI could handle it.
The Showdown:
I picked four heavy hitters: Perplexity, ChatGPT, Copilot, and Gemini. Here’s how they stacked up.
Perplexity
What I liked: It gave me 7 citations, mostly legit. It felt like an optimized Google search with some added context.
Where it fell short: Over-relied on one article. Worse? It confused renal cysts with adnexal cysts (which, for the non-medical folks, is like mixing up your kidneys and your ovaries).
Verdict: Good for fast sourcing… but nuance? Not its strong suit.
ChatGPT
What I liked: Fantastic for framing the concept in natural language. Made my report read like a human wrote it.
Where it fell short: Citations? Hit or miss. Even when it gave them, I had to fact-check every single one.
Verdict: Great creative partner, terrible compliance officer.
Copilot
What I liked: Clean summaries. It gave me two articles Perplexity hadn’t surfaced.
Where it fell short: Neither article was peer-reviewed or from an accrediting body. BUT… when I asked again, it pulled the ACR guidelines. Big win for persistence.
Verdict: Not bad when you dig deeper. You just have to know the right follow-ups.
Gemini
What I liked: Polished interface, easy to use.
Where it fell short: It came across as a lightweight version of ChatGPT, and its citations weren’t as authoritative as Perplexity’s.
Verdict: Pretty, but didn’t wow me.
The Bottom Line:
If I needed speed and citations: Perplexity.
If I wanted clear, natural summaries: ChatGPT.
If compliance was on the line: Copilot (with persistence).
If I wanted something shiny to play with: Gemini.
Here’s the truth: none of these tools is perfect. They’re not built for judgment. They’re assistants, not authorities. And if you’re in healthcare (or any other high-stakes field), you’d better not treat them as anything more than that.
Why This Matters:
It’s easy to love AI when you’re writing tweets. It’s another story when accuracy could impact a patient, a legal decision, or a financial report. If this experiment taught me anything, it’s this: AI will give you confidence faster than it gives you correctness. That’s dangerous if you stop asking “Why?” and “Where did this come from?”
Thought for you:
Would you trust AI to help you in medicine? Law? Finance? Where do we draw the line?
Drop your thoughts below — and if you want the full battle card comparison of these four tools, subscribe.
Until next time, stay human.
Dr. D
