When AI Gets It Wrong | Tragically Human

Last verified: February 2026. AI moves fast. If something here doesn't match what you're seeing, it probably changed after this was written. Let me know and I'll update it.

Why It Happens

AI doesn't check its own answers. It generates the most likely text based on patterns — and sometimes "likely" and "correct" aren't the same thing. For the full picture of why, see How AI Actually Works.

The Five Failure Modes

1. Hallucinated Functions

The AI references functions, methods, or entire libraries that do not exist — a hallucination. Perfect syntax. Correct-looking usage. Completely made up.

# AI-generated code:
from reportutils import generate_quarterly_summary

report = generate_quarterly_summary(data, format="executive")

That import looks fine. The function name is reasonable. The parameter makes sense. But reportutils doesn't exist. Neither does generate_quarterly_summary. The AI invented both because they're plausible — not because they're real.

How to catch it: If you see a function or library you haven't used before, search for it. If the official docs don't have it, it doesn't exist.
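In Python you can run a quick local check before you go searching. This sketch uses the standard library's `importlib.util.find_spec` to see whether a module is installed, then `hasattr` to see whether it actually has the name the AI used. `check_ai_import` is a made-up helper name. A "not installed" result only describes your machine, not whether the package exists anywhere. So check the official docs before you `pip install` an unfamiliar name the AI suggested.

```python
import importlib
import importlib.util


def check_ai_import(module_name: str, attr_name: str) -> str:
    """Report whether a top-level module is installed locally and exposes a given name."""
    # find_spec returns None when no installed module matches the name.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not installed here"
    # Importing runs the module's top-level code, so only do this for modules you trust.
    module = importlib.import_module(module_name)
    if not hasattr(module, attr_name):
        return f"{module_name} is installed, but has no attribute {attr_name!r}"
    return f"{module_name}.{attr_name} exists"


print(check_ai_import("reportutils", "generate_quarterly_summary"))
print(check_ai_import("json", "dumps"))        # json.dumps exists
print(check_ai_import("json", "dump_pretty"))  # json is installed, but has no attribute 'dump_pretty'
```

A passing check only proves the name exists. It says nothing about whether the AI called it correctly.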

2. Outdated Code

AI's training data has a cutoff. It generates code that worked in 2023 but uses APIs or syntax that have since changed. The code looks correct because it was correct — just not anymore.

How to catch it: For anything involving external libraries or services, check the version. Ask the AI: "Is this the current approach, or has this changed?" — but verify its answer independently. The AI's knowledge of "current" has the same cutoff problem.
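A real Python example: `datetime.utcnow()` is all over older code and tutorials, but Python 3.12 deprecated it in favor of `datetime.now(timezone.utc)`, which also attaches a timezone. The sketch below shows the current pattern, plus a hypothetical `installed_version` helper (built on the standard library's `importlib.metadata`) so you can see which version of a library you actually have before trusting version-specific advice. The `requests` lookup at the end is just an example package name.

```python
import sys
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Older pattern, common in training data. Deprecated since Python 3.12,
# and it returns a "naive" datetime with no timezone attached:
#     now = datetime.utcnow()


def utc_now() -> datetime:
    """Current time in UTC as a timezone-aware datetime (the current approach)."""
    return datetime.now(timezone.utc)


def installed_version(package: str) -> Optional[str]:
    """Installed version of a distribution, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Python", sys.version.split()[0])
print("UTC now:", utc_now().isoformat())
print("requests:", installed_version("requests") or "not installed")
```

Pasting the real version numbers into your prompt ("I'm on Python 3.12 and library X version Y") gives the AI far less room to guess.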

3. Works Alone, Breaks With Your Project

AI writes code that runs perfectly in isolation but doesn't fit your existing project. Wrong naming conventions. Libraries you haven't installed. Assumptions about your data that don't match reality.

This is the most common failure mode when using AI inside an IDE: the AI can see your project, but the files it actually pulled into its context window might not be the right ones.

How to catch it: Tell the AI about your project structure before asking it to build something. The more context it has, the fewer assumptions it makes.
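One low-effort way to give it that context is to paste a short snapshot of your project at the start of the conversation. The sketch below prints a file tree (skipping common noise folders) plus your `requirements.txt` if one exists. The skip list, the depth limit, and the assumption that dependencies live in `requirements.txt` are placeholders. Adjust them for your stack (for example `package.json` or `pyproject.toml`).

```python
from pathlib import Path

# Folders that rarely help the AI and would bloat the snapshot (adjust as needed).
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "dist", "build"}


def project_snapshot(root: str = ".", max_depth: int = 3) -> str:
    """Build a plain-text tree of the project plus its declared dependencies."""
    root_path = Path(root).resolve()
    lines = [f"Project: {root_path.name}/"]

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Folders first, then files, alphabetically.
        entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
        for entry in entries:
            if entry.name in SKIP_DIRS:
                continue
            indent = "  " * depth
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)
            else:
                lines.append(f"{indent}{entry.name}")

    walk(root_path, 1)

    requirements = root_path / "requirements.txt"
    if requirements.is_file():
        lines.append("")
        lines.append("Declared dependencies (requirements.txt):")
        lines.append(requirements.read_text().strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(project_snapshot())
```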

4. Security Gaps

AI writes code that works but is unsafe. It's trained on public repos — including the ones with bad practices. Common gaps:

Storing passwords in plain text
No input validation
Hardcoded API keys
HTTP instead of HTTPS
SQL queries built from raw user input
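The last item is the easiest to see in code. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The unsafe version pastes user input straight into the SQL string, so a crafted input like `' OR '1'='1` matches every row. The safer version passes the value as a parameter (`?`) and lets the database driver handle it. The table, the sample data, and the `MY_SERVICE_API_KEY` variable name are made up for illustration. The last function shows the usual alternative to a hardcoded key: reading it from the environment.

```python
import os
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory demo database with two users (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # UNSAFE: user input becomes part of the SQL text itself.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the value is sent separately as a parameter, never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


def load_api_key() -> str:
    # Read the key from the environment instead of hardcoding it in source.
    key = os.environ.get("MY_SERVICE_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("MY_SERVICE_API_KEY is not set")
    return key


conn = make_db()
attack = "' OR '1'='1"
print(find_user_unsafe(conn, attack))  # both users' emails leak
print(find_user_safe(conn, attack))    # []
```

`?` is sqlite3's placeholder style. Other database drivers may use a different one (psycopg, for example, uses `%s`), so check your driver's docs when reviewing AI-written queries.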

How to catch it: Security is the one area where you should always get a second opinion. After the AI writes anything touching user data, authentication, or external services — ask a different AI to review it specifically for security. Better yet, ask a human. This is not a "probably fine" situation.

AI-generated security code is where real damage happens. Not "my app looks wrong" damage — "someone's data got leaked" damage. If the code handles passwords, payments, or personal information, treat every line as suspect until verified.
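For a sense of what "not plain text" should look like, here is a minimal sketch using only Python's standard library: a random per-password salt from `secrets`, a deliberately slow key-derivation function (`hashlib.pbkdf2_hmac`), and a constant-time comparison (`hmac.compare_digest`). `hash_password` and `verify_password` are illustrative names. The 600,000 iteration count follows OWASP's Password Storage Cheat Sheet guidance for PBKDF2-HMAC-SHA256, but check the current figure. In a real app, prefer your framework's built-in password hashing or a maintained library over hand-rolled code.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # check current guidance; slower hashing also slows attackers


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key). Store both; never store the password itself."""
    salt = secrets.token_bytes(16)  # unique random salt for each password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

If AI-written code stores `password` directly, hashes it with a single fast call like plain `sha256`, or compares secrets with `==`, that's your cue to stop and get it reviewed.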

5. Subtle Logic Errors

The code runs. No errors. No crashes. It just doesn't do what you wanted. Sorts in the wrong direction. Skips the last item. Calculates tax before the discount instead of after.

These are the hardest to catch because everything looks right.
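Here are two of those bugs side by side with fixes. Both run without a single error, which is exactly the problem. The functions and numbers are invented for illustration, and the checkout example assumes (as the text does) that tax should apply after the discount. Note that the order only matters when the discount is a fixed amount like $10 off. With a percentage discount and a percentage tax, the two multiplications give the same total either way. Amounts use `Decimal` rather than floats so the cents come out exact.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def total_buggy(prices: list[Decimal]) -> Decimal:
    # Bug: range(len(prices) - 1) stops one short, so the last item is skipped.
    return sum((prices[i] for i in range(len(prices) - 1)), Decimal("0"))


def total_fixed(prices: list[Decimal]) -> Decimal:
    return sum(prices, Decimal("0"))


def checkout_buggy(subtotal: Decimal, coupon: Decimal, tax_rate: Decimal) -> Decimal:
    # Bug: tax is charged on the full price, then the coupon is subtracted.
    return (subtotal * (1 + tax_rate) - coupon).quantize(CENT, rounding=ROUND_HALF_UP)


def checkout_fixed(subtotal: Decimal, coupon: Decimal, tax_rate: Decimal) -> Decimal:
    # Apply the coupon first, then tax what the customer actually pays.
    return ((subtotal - coupon) * (1 + tax_rate)).quantize(CENT, rounding=ROUND_HALF_UP)


items = [Decimal("20.00"), Decimal("30.00"), Decimal("50.00")]
print(total_buggy(items), total_fixed(items))  # 50.00 100.00

rate = Decimal("0.08")
print(checkout_buggy(Decimal("100.00"), Decimal("10.00"), rate))  # 98.00
print(checkout_fixed(Decimal("100.00"), Decimal("10.00"), rate))  # 97.20
```

Neither bug would show up if you only tested with a single item or no coupon, which is why the test cases below matter.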

How to catch it: Test with real data, empty data, one item, a thousand items, and deliberately wrong input. If you're not sure how to test it, ask the AI: "Write tests for this code that cover normal usage, edge cases, and invalid input."
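Here's what those cases look like as actual tests, written for a small, hypothetical `average` function. Plain `assert` functions like these run under pytest as-is, and the `__main__` block lets you run them without it. The function and its choice to raise `ValueError` on empty input are assumptions for illustration. The point is the spread of cases: normal, one item, many items, empty, invalid.

```python
def average(values):
    """Mean of a list of numbers. Raises on empty or non-numeric input (a design choice)."""
    if not values:
        raise ValueError("average() needs at least one value")
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("average() only accepts numbers")
    return sum(values) / len(values)


def test_normal():
    assert average([2, 4, 6]) == 4


def test_one_item():
    assert average([7]) == 7


def test_many_items():
    assert average(list(range(1, 1001))) == 500.5


def test_empty_raises():
    try:
        average([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


def test_invalid_input_raises():
    try:
        average([1, "two", 3])
    except TypeError:
        return
    raise AssertionError("expected TypeError for non-numeric input")


if __name__ == "__main__":
    for test in (test_normal, test_one_item, test_many_items,
                 test_empty_raises, test_invalid_input_raises):
        test()
    print("all edge-case tests passed")
```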

Red Flags

When reviewing AI output, these should make you pause:

You See This	It Might Mean
AI says "This should work"	Even the AI isn't confident — verify carefully
Code is unusually long or complex	Ask: "Is there a simpler way to do this?"
Lots of new library installs	Ask: "Can we do this with what's already installed?"
No error handling	Ask: "What happens when this fails?"
AI writes code immediately, no questions asked	It guessed your requirements instead of clarifying them
Confident explanation of something you can't verify	Classic hallucination pattern — check the source

How to Verify

Use a Second AI

This is the same second opinion technique from The Feedback Loop. Paste the code into a different AI tool:

Another AI wrote this code. Review it for:
1. Bugs or logical errors
2. Security vulnerabilities
3. Outdated or deprecated functions
4. Edge cases that aren't handled

Different models have different blind spots. One often catches what the other missed.

Ask It to Explain Line by Line

Walk me through this code line by line.
For each line, explain what it does and why.

If the explanation contradicts itself or doesn't make sense — the code has issues. This also helps you understand what you're deploying, which matters when it breaks at 2 AM.

Ask It to Write Tests

Write tests for this code that cover:
- Normal usage
- Empty input
- Invalid input
- Edge cases

If the AI can't write coherent tests for its own code, the code has problems.

Run It

Modern AI tools can execute code, not just write it. If your IDE supports it — actually run the code. Give it real inputs. Watch what happens. A working demo beats a confident explanation every time.
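If your tool can't run code for you, a few lines of Python can. This sketch runs a snippet in a separate interpreter process with `subprocess.run`, a timeout, and captured output, so a crash or an infinite loop doesn't take down your session. The `run_snippet` name and the 10-second default are arbitrary choices. A subprocess is not a sandbox: the code still has full access to your files and network, so only run code you've at least read.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Run Python code in a fresh interpreter. Returns (exit_code, stdout, stderr)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child process is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return -1, "", f"timed out after {timeout} seconds"
    return result.returncode, result.stdout, result.stderr


status, out, err = run_snippet("print(sum([1, 2, 3]))")
print(status, out.strip())  # 0 6

status, out, err = run_snippet("print(1 / 0)")
print(status, err.strip().splitlines()[-1])  # 1 ZeroDivisionError: division by zero
```

A nonzero exit code or anything in `stderr` is your cue to paste the exact error back to the AI instead of describing it.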

When to Call a Human

Part of getting good at working with AI is knowing when to stop asking the AI:

Security-critical code — payments, medical data, personal information. Not negotiable.
Same bug, 5+ rounds — if the feedback loop isn't converging, a human will spot the pattern faster.
You don't understand the code — if you can't follow the AI's explanation, don't deploy it. Reading Code Without Knowing It can help bridge that gap.
It "works" but feels wrong — trust that instinct. It's usually right.

The Actual Rule

Trust AI with syntax and boilerplate. Be skeptical of its architecture, its security, and its claims about how things work.

The verification habit: Before you accept any AI-generated code, ask yourself — "If this is wrong, will I know?" If yes, ship it. If no, test it first. The five minutes you spend verifying will save the five hours you'd spend debugging a confident mistake.