Last verified: February 2026. AI moves fast. If something here doesn't match what you're seeing, it probably changed after this was written. Let me know and I'll update it.
Why It Happens
AI doesn't check its own answers. It generates the most likely text based on patterns — and sometimes "likely" and "correct" aren't the same thing. For the full picture of why, see How AI Actually Works.
The Five Failure Modes
1. Hallucinated Functions
The AI references functions, methods, or entire libraries that do not exist — a hallucination. Perfect syntax. Correct-looking usage. Completely made up.
# AI-generated code:
from reportutils import generate_quarterly_summary
report = generate_quarterly_summary(data, format="executive")
That import looks fine. The function name is reasonable. The parameter makes sense. But reportutils doesn't exist. Neither does generate_quarterly_summary. The AI invented both because they're plausible — not because they're real.
How to catch it: If you see a function or library you haven't used before, search for it in the official docs and on the package index (PyPI, for Python packages). If neither has it, assume the AI made it up.
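A quick local check can back up the search. The sketch below assumes Python, and `check_reference` is a hypothetical helper name, not part of any library. It only tells you whether a module is installed in the current environment and whether it exposes a given name. A module that is missing locally may still exist on a package index, so treat this as a complement to the docs search, not a replacement.

```python
import importlib
import importlib.util


def check_reference(module_name: str, attr_name: str) -> str:
    """Report whether module_name is importable here and exposes attr_name."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None  # unresolvable parent package or malformed module state
    if spec is None:
        return f"{module_name}: not installed in this environment"

    # Note: importing a module runs its top-level code.
    module = importlib.import_module(module_name)
    if not hasattr(module, attr_name):
        return f"{module_name}.{attr_name}: module exists, but this name does not"
    return f"{module_name}.{attr_name}: found"


# A real standard-library function, then a made-up name on a real module.
print(check_reference("json", "dumps"))
print(check_reference("json", "dump_quarterly_summary"))
# The module from the example above (assumed absent from your environment).
print(check_reference("reportutils", "generate_quarterly_summary"))
```

The second call is the case that is easiest to miss: the library is real, but the AI invented a method on it.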
2. Outdated Code
AI's training data has a cutoff. It generates code that worked in 2023 but uses APIs or syntax that have since changed. The code looks correct because it was correct — just not anymore.
How to catch it: For anything involving external libraries or services, check the version. Ask the AI: "Is this the current approach, or has this changed?" — but verify its answer independently. The AI's knowledge of "current" has the same cutoff problem.
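Two cheap checks help with the version question. In the sketch below, `installed_version` and `collect_deprecations` are hypothetical helper names, and `old_style_call` is a stand-in for library code that has been deprecated. The first helper reports which version of a package you actually have, so you can compare it against the current docs. The second runs a function and records any `DeprecationWarning` it emits, which is one common way Python libraries flag APIs on their way out. Not every library warns before removing something, so a clean result is not proof the code is current.

```python
import warnings
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def collect_deprecations(func, *args, **kwargs):
    """Call func and return the messages of any DeprecationWarning it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide anything
        func(*args, **kwargs)
    return [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]


def old_style_call():
    # Stand-in for a library function that still works but is deprecated.
    warnings.warn("old_style_call is deprecated", DeprecationWarning, stacklevel=2)
    return "still works"


print(installed_version("pip"))
print(collect_deprecations(old_style_call))
```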
3. Works Alone, Breaks With Your Project
AI writes code that runs perfectly in isolation but doesn't fit your existing project. Wrong naming conventions. Libraries you haven't installed. Assumptions about your data that don't match reality.
This failure mode is especially common when using AI inside an IDE: the AI has your files in its context window, but it might not be looking at the right ones.
How to catch it: Tell the AI about your project structure before asking it to build something. The more context it has, the fewer assumptions it makes.
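One of the mismatches above, libraries you haven't installed, can be checked mechanically before you run anything. This is a minimal sketch, assuming the generated code is Python and available as a string; `missing_imports` is a hypothetical helper name. It parses the code without executing it and lists the top-level modules that cannot be found in the current environment.

```python
import ast
import importlib.util


def missing_imports(source: str) -> list[str]:
    """Return top-level modules imported by source that are not installed here."""
    imported = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            imported.add(node.module.split(".")[0])  # relative imports are skipped
    return sorted(name for name in imported
                  if importlib.util.find_spec(name) is None)


generated = """
import json
from reportutils import generate_quarterly_summary
"""
print(missing_imports(generated))
```

This covers only the dependency part of the problem. Naming conventions and wrong assumptions about your data still need a human read.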
4. Security Gaps
AI writes code that works but is unsafe. It's trained on public repos — including the ones with bad practices. Common gaps:
- Storing passwords in plain text
- No input validation
- Hardcoded API keys
- HTTP instead of HTTPS
- SQL queries built from raw user input
How to catch it: Security is the one area where you should always get a second opinion. After the AI writes anything touching user data, authentication, or external services — ask a different AI to review it specifically for security. Better yet, ask a human. This is not a "probably fine" situation.
AI-generated security code is where real damage happens. Not "my app looks wrong" damage — "someone's data got leaked" damage. If the code handles passwords, payments, or personal information, treat every line as suspect until verified.
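To make one of the gaps above concrete, here is a small sketch of a SQL query built from raw user input next to a parameterized version. It uses Python's built-in `sqlite3` with an in-memory database; the table, the sample rows, and the function names are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Unsafe: user input is pasted straight into the SQL text.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_demo_db()
malicious = "x' OR '1'='1"
print(find_user_unsafe(conn, malicious))  # returns every row in the table
print(find_user_safe(conn, malicious))    # returns no rows
```

Both functions work fine with ordinary names, which is why this kind of gap survives a casual test. It only shows up when someone feeds the query hostile input.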
5. Subtle Logic Errors
The code runs. No errors. No crashes. It just doesn't do what you wanted. Sorts in the wrong direction. Skips the last item. Calculates tax before the discount instead of after.
These are the hardest to catch because everything looks right.
How to catch it: Test with real data, empty data, one item, a thousand items, and deliberately wrong input. If you're not sure how to test it, ask the AI: "Write tests for this code that cover normal usage, edge cases, and invalid input."
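Here is what "skips the last item" can look like, and why testing several input sizes matters. Both functions are invented for illustration. The buggy one runs without errors and looks plausible at a glance.

```python
def total_buggy(prices):
    """Sum prices, with an off-by-one bug: the last item is never added."""
    total = 0
    for i in range(len(prices) - 1):
        total += prices[i]
    return total


def total_fixed(prices):
    """Sum every price in the list."""
    total = 0
    for price in prices:
        total += price
    return total


# Empty data, one item, a few items, a thousand items.
cases = [[], [5], [1, 2, 3], list(range(1000))]
for prices in cases:
    expected = sum(prices)
    print(len(prices), total_buggy(prices) == expected, total_fixed(prices) == expected)
```

The buggy version passes the empty case and fails the rest. A single test input can miss the bug entirely, which is the argument for trying more than one.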
Red Flags
When reviewing AI output, these should make you pause:
| You See This | It Might Mean |
|---|---|
| AI says "This should work" | Even the AI isn't confident — verify carefully |
| Code is unusually long or complex | Ask: "Is there a simpler way to do this?" |
| Lots of new library installs | Ask: "Can we do this with what's already installed?" |
| No error handling | Ask: "What happens when this fails?" |
| AI writes code immediately, no questions asked | It guessed your requirements instead of clarifying them |
| Confident explanation of something you can't verify | Classic hallucination pattern — check the source |
How to Verify
Use a Second AI
This is the same second-opinion technique from The Feedback Loop. Paste the code into a different AI tool:
Another AI wrote this code. Review it for:
1. Bugs or logical errors
2. Security vulnerabilities
3. Outdated or deprecated functions
4. Edge cases that aren't handled
Different models have different blind spots. One often catches what the other missed.
Ask It to Explain Line by Line
Walk me through this code line by line.
For each line, explain what it does and why.
If the explanation contradicts itself or doesn't make sense, the code probably has issues. This also helps you understand what you're deploying, which matters when it breaks at 2 AM.
Ask It to Write Tests
Write tests for this code that cover:
- Normal usage
- Empty input
- Invalid input
- Edge cases
If the AI can't write coherent tests for its own code, the code probably has problems.
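For reference, a reasonable answer to that prompt might look something like the sketch below, using Python's built-in `unittest`. The function `cart_total` and its business rules (a flat discount applied before tax, totals in whole cents rounded down, a floor of zero) are assumptions made up for this example.

```python
import unittest


def cart_total(prices_cents, discount_cents=0, tax_percent=0):
    """Total in cents: subtract a flat discount first, then apply tax."""
    if discount_cents < 0 or tax_percent < 0:
        raise ValueError("discount and tax must not be negative")
    if any(p < 0 for p in prices_cents):
        raise ValueError("prices must not be negative")
    subtotal = max(sum(prices_cents) - discount_cents, 0)
    return subtotal * (100 + tax_percent) // 100


class CartTotalTests(unittest.TestCase):
    def test_normal_usage(self):
        # (10000 - 1000) * 1.20 = 10800; taxing first would give 11000
        self.assertEqual(cart_total([6000, 4000], 1000, 20), 10800)

    def test_empty_input(self):
        self.assertEqual(cart_total([]), 0)

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            cart_total([1000], discount_cents=-5)

    def test_edge_case_discount_exceeds_subtotal(self):
        self.assertEqual(cart_total([500], discount_cents=900, tax_percent=20), 0)


# Run the tests in-process and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CartTotalTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Check the expected values yourself. If the AI misunderstood the requirement, its tests will encode the same misunderstanding and pass anyway.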
Run It
Many current AI tools can execute code, not just write it. If your IDE supports it, actually run the code. Give it real inputs. Watch what happens. A working demo beats a confident explanation every time.
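If you want to do this by hand, one option is to run the snippet in a separate process with a timeout and look at what comes back. This is a minimal sketch, assuming the code under test is Python; `run_snippet` and the sample snippet are hypothetical. A separate process is not a sandbox, so don't use this on code you wouldn't otherwise run on your machine.

```python
import subprocess
import sys


def run_snippet(code: str, stdin_text: str = "", timeout_s: float = 5.0):
    """Run code in a fresh Python process; return (exit_code, stdout, stderr)."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, "", f"timed out after {timeout_s} seconds"
    return done.returncode, done.stdout, done.stderr


# Stand-in for AI-generated code: sums whitespace-separated integers from stdin.
snippet = "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"
print(run_snippet(snippet, "1 2 3"))    # valid input
print(run_snippet(snippet, "1 two 3"))  # deliberately wrong input
```

The second call shows the value of bad input: the snippet has no error handling, so it exits with a non-zero code and a traceback instead of a useful message.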
When to Call a Human
Part of getting good at working with AI is knowing when to stop asking the AI:
- Security-critical code — payments, medical data, personal information. Not negotiable.
- Same bug, 5+ rounds — if the feedback loop isn't converging, a human will spot the pattern faster.
- You don't understand the code — if you can't follow the AI's explanation, don't deploy it. Reading Code Without Knowing It can help bridge that gap.
- It "works" but feels wrong — trust that instinct. It's usually right.
The Actual Rule
Trust AI with syntax and boilerplate. Be skeptical of its architecture, its security, and its claims about how things work.
The verification habit: Before you accept any AI-generated code, ask yourself — "If this is wrong, will I know?" If yes, ship it. If no, test it first. The five minutes you spend verifying will save the five hours you'd spend debugging a confident mistake.