Is AI reliable enough to underwrite loans today?

AI is reliable enough to augment specific qualitative steps in underwriting — borrower summaries, document consistency checks, narrative-risk assessment. It is not reliable enough to replace the quantitative scorecard or the final committee decision. The design pattern that works is AI-assisted, human-decided.

Which LLM should I use for underwriting tasks?

As of April 2026: Claude 3.5 Sonnet for long-form analysis and narrative-risk writing, GPT-4o for structured extraction and document checks, Gemini 1.5 Pro when you need very large context windows. Model-agnostic prompt design is more durable than model-specific optimisation.

Can I replace my scorecard with an LLM?

No. Replace qualitative columns in your scorecard (employment-stability narrative, document-consistency check) with LLM-assisted ones. Keep the quantitative columns (DTI, FOIR, bureau score) deterministic. A scorecard your credit committee can't audit is a scorecard your regulator will eventually question.

How do I stop the model from hallucinating?

Three levers: (1) constrain the input to exactly what's needed, nothing more; (2) specify the output schema precisely and reject free-form output; (3) demand explicit 'I don't know' when the input doesn't support a conclusion. Hallucination is almost always a prompt problem, not a model problem.
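A minimal sketch of levers (1) and (3), assuming the application arrives as a plain dictionary. The field names, the whitelist, and the prompt wording are all hypothetical and not taken from any particular prompt library.

```python
import json

# Hypothetical whitelist: the only fields this particular prompt is allowed to see.
ALLOWED_FIELDS = ("employer_name", "months_at_employer", "stated_monthly_income")


def build_prompt(application: dict) -> str:
    """Build a prompt from whitelisted fields only, with an explicit way to abstain."""
    # Lever 1: drop everything the task does not need.
    constrained = {k: application[k] for k in ALLOWED_FIELDS if k in application}
    # Lever 3: tell the model exactly how to say it cannot conclude anything.
    return (
        "Using ONLY the JSON below, describe the applicant's employment.\n"
        'Reply as JSON with keys "employment_summary" and "supported".\n'
        'If the input does not support a conclusion, set "supported" to false '
        'and "employment_summary" to "I don\'t know".\n\n'
        + json.dumps(constrained, indent=2, sort_keys=True)
    )


prompt = build_prompt({
    "employer_name": "Acme Ltd",
    "months_at_employer": 84,
    "stated_monthly_income": 5200,
    "marital_status": "married",  # not needed for this task, so it never reaches the model
})
```

Filtering at the prompt-building step means a field the model never sees is a field it cannot misuse.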

What about regulatory risk?

In India, the RBI Digital Lending Guidelines require explainability and human oversight in automated credit decisions. In the US, ECOA and Regulation B enforce non-discrimination in credit decisions regardless of model type. The short answer: AI is allowed where explainable and audited; it is not allowed as an unexplained decision-maker. Design for this from day one.

How to underwrite loans with AI: a builder's guide (2026)

If you’ve tried to do loan underwriting with ChatGPT, you’ve probably felt the same thing everyone else has: the first prompt is magical, the third is suspect, and by the fifth you’re back to your old spreadsheet, convinced AI isn’t ready. The honest version is that AI is ready — but not for the job you were asking it to do.

This is the full builder’s guide to the job it is ready for: an AI-assisted underwriting workflow that compresses the hour-long “read the whole file” step into five minutes, while keeping every decision auditable, defensible, and human-gated. It’s the workflow I’ve stress-tested against real (anonymised) lender files and synthetic data, and it’s the one the prompt library on this site is built to feed.

The design principle that makes this work

AI-assisted, human-decided. That’s the whole philosophy. Wherever the model is allowed to produce an output that drives a decision, a human must accept, edit, or reject before the loan moves forward. Wherever a human is doing a summarisation, extraction, or pattern-check that an LLM can do as well or better, the LLM does it and the human reviews.

If you follow this principle, AI earns its seat cleanly. If you drop it — if you let the model decide — you end up with the over-trusting demos that make seasoned credit folks roll their eyes, because eventually one of those confidently-wrong outputs reaches a real loan, and you own it.
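One way to make the principle hard to bypass is to encode the gate in the data model itself. This is a sketch under assumptions: the class, method, and status names are hypothetical, and a real system would also persist who reviewed what and when.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Review(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    EDITED = "edited"
    REJECTED = "rejected"


@dataclass
class ModelDraft:
    stage: str
    text: str
    review: Review = Review.PENDING
    edited_text: Optional[str] = None
    reviewer: Optional[str] = None

    def accept(self, reviewer: str) -> None:
        self.review, self.reviewer = Review.ACCEPTED, reviewer

    def edit(self, reviewer: str, new_text: str) -> None:
        self.review, self.reviewer, self.edited_text = Review.EDITED, reviewer, new_text

    def reject(self, reviewer: str) -> None:
        self.review, self.reviewer = Review.REJECTED, reviewer

    def released_text(self) -> str:
        """Return the text allowed downstream; raise unless a human accepted or edited it."""
        if self.review is Review.ACCEPTED:
            return self.text
        if self.review is Review.EDITED and self.edited_text is not None:
            return self.edited_text
        raise PermissionError(f"{self.stage}: draft is {self.review.value}, not released")
```

Downstream code only ever calls `released_text()`, so a pending or rejected draft cannot move a loan forward by accident.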

The six-stage workflow

Most underwriting workflows have twelve or fifteen micro-steps. At the useful level of abstraction, they collapse into six:

1. Intake: the applicant submits the application and supporting documents.
2. Borrower summary: someone reads the file and produces a neutral one-page summary.
3. Verification: income documents, employment, identity, and bureau data are all checked for consistency with the summary and with each other.
4. Affordability: DTI, FOIR, stress-tested EMI, co-applicant blending.
5. Narrative risk: the qualitative paragraph that asks, “knowing everything we know, would I lend this borrower this amount at this price?”
6. Credit memo and decision: the structured output that goes to the committee or auto-approve engine.

AI earns a seat in three of these six: Stages 2, 3, and 5. It does not earn a seat in Stages 1, 4, or 6, and trying to insert it there is the classic beginner’s mistake.

Stage 2: borrower summary

What a human does: reads the full application plus supporting documents, writes a 200-word neutral summary that feeds every subsequent step. Takes 25 minutes when a human does it carefully, five when they don’t.

What the LLM does: takes the structured application data plus extracted document text, produces the 200-word summary to a specified schema (employment, income, obligations, reason for borrowing, notable features). Takes ten seconds.

Why it works: the task is summarisation, not decision-making. The model has no latitude to invent — the inputs are the inputs, the schema constrains the output, and the human reviewer sees both the summary and the source documents side-by-side. Net saving: ~20 minutes per file, at a quality that’s at least as good as a tired underwriter on a Friday afternoon.

Where it breaks: borrowers with unusual but legitimate situations (returning from a sabbatical, recently inherited assets, blended-family finances). The model can misrepresent these as risk factors when they aren’t. Fix: the schema forces the model to say “ambiguous” for fields it can’t confidently fill, and a reviewer catches these before they propagate.
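A sketch of what enforcing that schema might look like on the receiving side, assuming the model is asked to return JSON. The key names mirror the five schema headings mentioned above but are otherwise hypothetical, as is the exact “ambiguous” sentinel.

```python
import json

# Hypothetical key names for the five schema headings.
REQUIRED_FIELDS = ("employment", "income", "obligations",
                   "reason_for_borrowing", "notable_features")
AMBIGUOUS = "ambiguous"


def parse_summary(raw: str) -> tuple[dict, list[str]]:
    """Validate model output; return (summary, fields a reviewer must look at)."""
    data = json.loads(raw)  # free-form prose fails here with a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    extra = [k for k in data if k not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={extra}")
    # Fields the model marked as ambiguous are surfaced, not silently passed on.
    needs_review = [f for f in REQUIRED_FIELDS
                    if str(data[f]).strip().lower() == AMBIGUOUS]
    return data, needs_review
```

Anything that fails validation is rejected outright; anything marked ambiguous goes to the reviewer with the field name attached.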

Stage 3: verification (document consistency)

What a human does: cross-references payslips against bank statements against the employment letter against the application form. The work is slow and error-prone, and discrepancies are easy to miss.

What the LLM does: runs a document-consistency prompt that explicitly asks: “Are these documents consistent with each other and with the application?” Returns a structured list of any inconsistencies found, with evidence citations back to specific documents.

Why it works: pattern-matching across long, noisy text is exactly what LLMs are good at. A well-designed prompt catches income-document manipulation patterns that trained underwriters miss under time pressure.

Where it breaks: sophisticated manipulation (doctored PDFs with internally consistent numbers) and legitimate edge cases that look anomalous (freshly started jobs, deliberate salary-in-kind arrangements). Fix: run the AI check first, then route flagged files to senior underwriters; do not auto-decline on an AI flag.
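The routing rule can be sketched in a few lines. The queue names and the `Inconsistency` shape are hypothetical; the point is that no branch of the function can decline a loan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inconsistency:
    description: str
    cited_documents: tuple[str, ...]  # e.g. ("payslip_feb.pdf", "bank_statement.pdf")


def route_file(findings: list[Inconsistency]) -> dict:
    """Pick the next human queue for a file. No branch here can decline a loan."""
    # Findings without a citation are still routed, but labelled so the
    # reviewer knows the model gave no evidence to check.
    uncited = [f.description for f in findings if not f.cited_documents]
    queue = "senior_underwriter_review" if findings else "standard_review"
    return {"queue": queue, "uncited_findings": uncited}
```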

Stage 5: narrative risk

What a human does: writes the one-paragraph qualitative risk narrative — the paragraph a credit committee actually reads before voting.

What the LLM does: drafts the first version of that paragraph from the summary, verification results, and affordability numbers. The human underwriter edits or rejects; the committee sees the human-edited version, not the raw model output.

Why it works: writing neutral, well-structured prose from structured inputs is where LLMs shine. The human underwriter retains authorship of the paragraph; the model just handles the first draft. Time saving per file: ~10 minutes.

Where it breaks: the model tends toward over-confident language (“the borrower is unlikely to default…”). Fix: prompt-engineer the model to write in the conditional, to flag uncertainty, and to refuse to take a final position.
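Prompting alone may not catch every slip, so a cheap deterministic check on the draft can back it up. This is a sketch, assuming a small starter list of patterns that is hypothetical and would need tuning against real drafts.

```python
import re

# Hypothetical starter list of phrasings that take a position or assert an outcome.
OVERCONFIDENT_PATTERNS = [
    r"\bis (?:un)?likely to default\b",
    r"\bwill (?:not )?(?:default|repay)\b",
    r"\b(?:certainly|definitely|undoubtedly|clearly)\b",
    r"\b(?:recommend|should) (?:approv\w*|declin\w*|reject\w*)\b",
]


def overconfident_phrases(draft: str) -> list[str]:
    """Return every phrase in the draft that matches an over-confident pattern."""
    hits = []
    for pattern in OVERCONFIDENT_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pattern, draft, flags=re.IGNORECASE))
    return hits
```

A non-empty result does not reject the draft; it highlights the phrases for the underwriter who is about to edit it.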

What doesn’t work (yet)

Stage 1 — intake. Tempting to let an LLM “interview” the applicant. Does not work in 2026 at a quality that meets regulatory scrutiny. Use structured forms.

Stage 4 — affordability. This is arithmetic. LLMs are bad at arithmetic. They are also bad at admitting they’re bad at arithmetic. Use a spreadsheet and a calculator.
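If you want the spreadsheet logic in code, keep it deterministic. The sketch below uses the standard amortising-loan instalment formula; the FOIR definition (existing obligations plus proposed EMI, over net monthly income) and the rate-shock approach to stress testing are assumptions, since lenders define both differently.

```python
from decimal import Decimal, ROUND_HALF_UP


def emi(principal: Decimal, annual_rate_pct: Decimal, months: int) -> Decimal:
    """Instalment = P*r*(1+r)^n / ((1+r)^n - 1), where r is the monthly rate."""
    r = annual_rate_pct / Decimal(1200)  # annual percentage -> monthly fraction
    if r == 0:
        payment = principal / months
    else:
        growth = (1 + r) ** months
        payment = principal * r * growth / (growth - 1)
    return payment.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def foir(existing_obligations: Decimal, proposed_emi: Decimal,
         net_monthly_income: Decimal) -> Decimal:
    """Assumed definition: all fixed monthly obligations over net monthly income."""
    return (existing_obligations + proposed_emi) / net_monthly_income


base = emi(Decimal("100000"), Decimal("12"), 12)
# Assumed stress test: reprice the same loan two percentage points higher.
stressed = emi(Decimal("100000"), Decimal("14"), 12)
ratio = foir(Decimal("500"), Decimal("1500"), Decimal("5000"))
```

`Decimal` avoids binary floating-point rounding in the figures a committee will read, and every number here can be reproduced by hand.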

Stage 6 — decision. A loan committee exists for reasons that are partially legal, partially governance, partially cultural. Replacing a committee with a model is not a time-saving move — it is a liability-reshaping move. Design for committee-augmentation, not replacement.

The scorecard sits underneath all of this

A reminder: the workflow above is the process. The scorecard is the artifact that captures decisions. AI augments the process; it does not replace the scorecard. If you need a starting scorecard with the AI-assist columns pre-wired (for Stage 2 employment-stability and Stage 5 narrative-risk), that’s what the Credit Scorecard Template is.

Three failure modes to internalise

I could write ten of these. I’ll give you the three that actually show up.

Hallucinated income. The model reads a messy payslip, can’t quite tell the base from the allowances, and confidently outputs a number that’s 15% off. Happens maybe once in fifty files. The fix is not “a better prompt” — the fix is two sources for every income number (payslip + bank statement) and a hard-stop rule that the two must agree within 5% or the model is overridden.
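A sketch of that hard-stop rule. The 5% tolerance comes from the rule above; measuring the gap relative to the larger figure and carrying forward the lower one are assumptions, as are the status labels.

```python
def reconcile_income(payslip_income: float, bank_statement_income: float,
                     tolerance: float = 0.05) -> dict:
    """Compare two independently sourced income figures; stop if they disagree."""
    if payslip_income <= 0 or bank_statement_income <= 0:
        raise ValueError("both income figures must be positive")
    # Assumption: the gap is measured relative to the larger of the two figures.
    gap = abs(payslip_income - bank_statement_income) / max(payslip_income,
                                                            bank_statement_income)
    if gap <= tolerance:
        # Assumption: carry the lower figure forward as the conservative choice.
        return {"status": "agreed",
                "income": min(payslip_income, bank_statement_income),
                "gap": gap}
    # Hard stop: no income figure is released; a human works it out from the documents.
    return {"status": "override_model", "income": None, "gap": gap}
```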

Over-confident employment-stability claims. The model sees a borrower with seven years at one employer and writes “highly stable employment.” Fine. Then it sees a borrower with three jobs in two years and writes “moderately stable” because it’s been prompt-engineered to hedge. Neither of those is always right. Fix: force the model to quote the evidence for every employment-stability claim, not summarise it.
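Quoted evidence is only useful if the quote is real, and that part can be checked mechanically. A sketch, assuming the model returns a list of claims each carrying an `evidence_quote` (a hypothetical field name) and that the extracted document text is available as one string.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and case so line breaks in extracted text don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_claims(claims: list[dict], source_text: str) -> list[str]:
    """Return claims whose quoted evidence does not appear verbatim in the source."""
    haystack = _normalise(source_text)
    unsupported = []
    for claim in claims:
        quote = _normalise(claim.get("evidence_quote", ""))
        if not quote or quote not in haystack:
            unsupported.append(claim["claim"])
    return unsupported
```

This confirms only that the quote exists in the file, not that it supports the claim; that judgment stays with the reviewer.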

Document-consistency false negatives. The model checks payslip-against-bank-statement, both consistent, returns “no issues.” Turns out both documents were manipulated by the same fraudster, in the same way, from the same template. The model was right that they were consistent — but that’s the failure mode, not the success case. Fix: combine AI consistency check with deterministic checks (known employer registry, bank statement metadata, statement period sanity checks).
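As one example of a deterministic check, here is a sketch of a statement-period sanity check. The thresholds (at least 85 days of coverage, statement no older than 45 days) are hypothetical placeholders, not policy recommendations.

```python
from datetime import date


def statement_period_issues(start: date, end: date, application_date: date,
                            min_days: int = 85, max_age_days: int = 45) -> list[str]:
    """Return the sanity problems with a bank statement period (empty list = none)."""
    issues = []
    if end < start:
        issues.append("period ends before it starts")
    if end > application_date:
        issues.append("period ends after the application date")
    if (end - start).days < min_days:
        issues.append("period is shorter than required")
    if (application_date - end).days > max_age_days:
        issues.append("statement is stale")
    return issues
```

Checks like this do not look at the numbers inside the statement at all, so a forged template that fools the consistency prompt does not automatically pass them.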

A realistic expectation

An AI-assisted workflow, done right, compresses underwriting time per file by 50–70%, improves consistency across underwriters by reducing reliance on individual judgment for the summarisation and consistency-check steps, and leaves the big decisions — and the liability — exactly where they belong: with the humans who sign them.

Don’t expect 10x. Expect 2–3x on speed, modestly better on consistency, unchanged on final-decision quality (because the humans are still deciding). That’s the honest delta, and it’s worth the investment.
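The two figures describe the same range: cutting time per file by a fraction f gives a speed-up of 1 / (1 - f). A quick check, with a hypothetical function name:

```python
def speedup_from_time_saved(fraction_saved: float) -> float:
    """Convert 'time per file reduced by f' into a throughput multiple."""
    if not 0 <= fraction_saved < 1:
        raise ValueError("fraction_saved must be in [0, 1)")
    return 1 / (1 - fraction_saved)


low = speedup_from_time_saved(0.50)   # 50% less time per file
high = speedup_from_time_saved(0.70)  # 70% less time per file
```

A 50% saving is exactly 2x and a 70% saving is roughly 3.3x. A 10x gain would need a 90% saving, which this workflow does not claim.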

Where to go from here

The Prompt Library contains the exact prompts for Stages 2, 3, and 5. The Scorecard Template gives you the scorecard with those stages pre-wired. The 7-Day Course walks you through designing the workflow yourself for a specific loan product, which is the best way to internalise the craft.

If you just want to try one thing this afternoon: grab the $9 Prompt Starter, open Stage 2’s borrower-summary prompt, run it against a file from last month, and see if the five-minute output is as useful as the 25-minute one a human wrote. That’s the cheapest way to find out whether the rest of this is worth your time.
