Bank statement tampering detection with LLMs: the forensic prompt chain that catches surgical edits

This piece is part of the broader AI fraud detection playbook for lenders, focused on the specific question of bank-statement tampering — the single document that carries the most underwriting weight and the most fraud surface.

A surgically edited bank statement looks clean. The fraudster has not redrawn the document. They have opened it in a PDF editor, changed two or three numbers, and saved. The fonts match. The header matches. The bank’s footer disclaimer is intact. A human reviewer paging through 30 transactions in two minutes catches none of it.

What follows is the four-check LLM chain we run against statements that survive our deterministic first pass, the synthetic benchmark we used to size the catch rate, and the candid section on what the chain still misses.

The three surgical edits that fool human reviewers

Across the 50 synthetic statements we built and the smaller set of real-world flagged cases shared by lender contacts, three patterns cover almost every surgical edit attempt:

Inflate one or two salary credits. The fraudster changes USD 4,200 to USD 6,200 on the 28th-of-month payroll line. Apparent income jumps. The annual run-rate now clears the affordability cutoff. Nothing else needs to change if the closing balance is also adjusted.
Suppress one recurring debit. An existing EMI (equated monthly instalment) of USD 380 disappears from every month. The proposed FOIR (fixed-obligation-to-income ratio) drops from above policy to comfortably inside it. The statement looks lighter than reality.
Reconcile the closing balance. Whichever of the first two edits was made, the running balance is recomputed line-by-line so opening + credits − debits = closing for every period. Without this, the manipulation is trivially detectable.

The reason these work against humans is that humans verify by scanning. The math is rarely re-totalled. The cadence of recurring transactions is rarely cross-referenced across months. The PDF is rarely opened in a tool that exposes its edit history. That’s the gap an LLM fills — not because the model is smarter than the reviewer, but because it does the boring arithmetic every time.

Check 1 — internal arithmetic

The first check is the cheapest and catches the most fraud per dollar of compute. The LLM is given the extracted text of the statement, instructed to identify every transaction line with date, description, debit, credit, and running balance, and asked to verify that opening + credits − debits = closing holds for the full period and at every running-balance step.

The prompt:

You are a forensic auditor. Below is the extracted text of a bank statement.

For the full statement period:
1. Extract opening balance, total credits, total debits, closing balance from
   the statement summary (if present).
2. Independently sum every credit line and every debit line you can identify.
3. Compute: opening + summed_credits - summed_debits = expected_closing.
4. Flag if expected_closing differs from stated closing by more than USD 1.00.

Then for each transaction line, compute the running balance step:
   prev_balance + credit - debit = stated_running_balance
Flag any line where this identity fails.

Output a structured JSON: { period_check: ok|fail, line_checks: [...] }
Use only data present in the statement; never infer missing values.

On the synthetic set, the arithmetic check alone catches 31 of 40 tampered statements. The 9 it misses are the ones where the fraudster also reconciled the running balance — i.e. the careful cases. False positives across 10 clean statements: zero, which is the expected outcome for a deterministic computation against fully extracted text.
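The same identities can also be recomputed in plain code once the extractor has produced structured rows, which gives a deterministic reference to compare the model’s JSON against. The sketch below is a minimal illustration, not the chain’s actual implementation: the Txn type, the check_arithmetic function, and the field names are hypothetical, and it assumes amounts have already been parsed into Decimal values.

```python
from decimal import Decimal
from typing import NamedTuple


class Txn(NamedTuple):
    """One extracted statement row (hypothetical shape)."""
    date: str
    description: str
    debit: Decimal
    credit: Decimal
    balance: Decimal  # running balance as printed on the statement


def check_arithmetic(opening, stated_closing, txns, tolerance=Decimal("1.00")):
    """Recompute the period identity and every running-balance step."""
    total_credits = sum((t.credit for t in txns), Decimal("0"))
    total_debits = sum((t.debit for t in txns), Decimal("0"))
    expected_closing = opening + total_credits - total_debits
    period_ok = abs(expected_closing - stated_closing) <= tolerance

    failed_lines = []
    prev = opening
    for index, t in enumerate(txns):
        if prev + t.credit - t.debit != t.balance:
            failed_lines.append(index)
        # Continue from the printed balance so one bad row is flagged once,
        # not on every row that follows it.
        prev = t.balance

    return {
        "period_check": "ok" if period_ok else "fail",
        "line_checks": failed_lines,
    }
```

Using Decimal rather than float avoids rounding noise that would otherwise trip the per-line equality test on clean statements.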

The check fails when the extraction step before it is bad. If the extractor drops a digit or merges two columns, the math will fail on a clean statement and you will spend an hour confused. Spend the time on a good extractor. Tabula, Amazon Textract, and Azure Document Intelligence are all defensible choices; pick one and harden the pre-processing.

Check 2 — cadence

Cadence is the rhythm of recurring credits and debits. A salary lands on the same calendar day each month. A rent debit lands within a one-day window. An EMI to the same beneficiary is identical to the rupee or cent across periods. When a fraudster inflates one salary credit but leaves the others alone, the inflated month breaks cadence on amount. When they suppress an EMI, the cadence breaks on existence: a counterparty that appeared every month for six months suddenly does not.

The prompt:

Below is the extracted transaction list from a bank statement covering at least
four months.

Identify all recurring patterns: same counterparty / description string / amount
appearing at least three times in a roughly monthly cadence (28-32 days).

For each recurring pattern, flag any month in the period where:
  - The pattern is missing entirely.
  - The amount deviates more than 5% from the median amount.
  - The date deviates more than 4 days from the median day-of-month.

Output a JSON: { patterns: [{counterparty, median_amount, median_day, anomalies: [...]}] }

This check catches the suppressed-EMI pattern that beats the arithmetic check. It also surfaces noise — legitimate variations like a bonus month or a missed salary credit during a job change. We accepted a higher false-positive rate here (12% on clean statements) and pair it with a routing rule: cadence-only flags go to soft review, not hard decline.
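The thresholds in the prompt translate directly into code, which is useful for sanity-checking what the model reports. The following is a minimal sketch under stated assumptions: the cadence_anomalies function and its input shape are hypothetical, transactions are grouped by counterparty string only, and day-of-month is compared naively, so a payment that slips from the 31st to the 1st would look like a large deviation.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def cadence_anomalies(txns, months, min_occurrences=3, amount_tol=0.05, day_tol=4):
    """Flag missing months, amount drift, and date drift per counterparty.

    txns:   list of (date, counterparty, amount) tuples
    months: list of (year, month) tuples the statement covers
    """
    by_party = defaultdict(list)
    for d, party, amount in txns:
        by_party[party].append((d, amount))

    patterns = []
    for party, rows in by_party.items():
        seen_months = {(d.year, d.month) for d, _ in rows}
        if len(seen_months) < min_occurrences:
            continue  # too few occurrences to call it a recurring pattern

        med_amount = median(a for _, a in rows)
        med_day = median(d.day for d, _ in rows)

        anomalies = []
        for ym in months:
            if ym not in seen_months:
                anomalies.append({"month": ym, "type": "missing"})
        for d, a in rows:
            if abs(a - med_amount) > amount_tol * med_amount:
                anomalies.append({"month": (d.year, d.month), "type": "amount"})
            if abs(d.day - med_day) > day_tol:
                anomalies.append({"month": (d.year, d.month), "type": "date"})

        patterns.append({
            "counterparty": party,
            "median_amount": med_amount,
            "median_day": med_day,
            "anomalies": anomalies,
        })
    return patterns
```

One limitation of this sketch: a debit removed from every month leaves no pattern to compare against, so it only shows up as missing if rows from earlier statements for the same account are included in txns.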

Check 3 — vendor fingerprint

Every bank’s PDF has a fingerprint. Header layout, font family, column widths, footer disclaimer wording, the exact phrasing of the “this is a computer-generated statement” line, the position of the bank’s logo. A surgical edit done in a PDF editor occasionally bumps a font, shifts a column, or replaces a character with a near-but-not-identical glyph. A regenerated fake template often gets the disclaimer subtly wrong.

We maintain a fingerprint library: for each bank we see often, three to five sample statements known to be genuine, with extracted features (font names, header bytes, footer text, page-margin geometry). The LLM compares an incoming statement against the fingerprint of the bank it claims to be from.
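One way to make that comparison concrete is to diff the extracted features in code first and hand only the mismatches to the model for explanation. The sketch below assumes a hypothetical feature layout (a set of font names, the footer text, and four page margins in points); the fingerprint_mismatches function, the dictionary keys, and the two-point margin tolerance are illustrative choices, not the library format described above.

```python
def normalise(text):
    """Collapse whitespace and case so trivial layout noise is ignored."""
    return " ".join(text.split()).lower()


def fingerprint_mismatches(observed, reference, margin_tol=2.0):
    """Compare extracted features against a bank's known-good fingerprint.

    Both arguments are dicts with hypothetical keys:
      'fonts'       - set of font names found in the PDF
      'footer_text' - footer disclaimer wording
      'margins'     - (left, top, right, bottom) in points
    """
    mismatches = []

    unexpected_fonts = observed["fonts"] - reference["fonts"]
    if unexpected_fonts:
        mismatches.append(f"unexpected fonts: {sorted(unexpected_fonts)}")

    if normalise(observed["footer_text"]) != normalise(reference["footer_text"]):
        mismatches.append("footer text differs")

    margin_pairs = zip(observed["margins"], reference["margins"])
    if any(abs(o - r) > margin_tol for o, r in margin_pairs):
        mismatches.append("margin geometry differs")

    return mismatches
```

An exact footer comparison like this one is brittle by design: it will fire whenever a bank rewords its disclaimer, so the reference samples need periodic refreshing.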

The fingerprint check is the most labour-intensive to set up and the lowest-yield once it’s running. On our benchmark it added two incremental catches over checks 1 and 2 combined. It also flagged one clean statement as suspect because the bank had quietly updated its disclaimer wording in late 2025 — a maintenance cost we hadn’t budgeted for. Worth running for high-volume bank-customer pairs; not worth setting up for a long tail of one-off banks.

Check 4 — PDF metadata and extraction signals

The PDF itself carries forensic signal that the visible content does not. The producer string is one: “Adobe Acrobat Pro DC” on a statement that should have been generated by the bank’s core banking system is a classic tell. Others are modification dates more recent than the statement’s stated period, embedded fonts that don’t match the producer’s standard, and OCR-layer text that disagrees with the visible text, a sign that someone covered an original number with a white box and printed a new one on top.

The prompt:

Below are the metadata fields and extraction artifacts of a PDF that purports
to be a bank statement.

Flag any of:
  - Producer string inconsistent with a bank-generated PDF (e.g., consumer
    PDF editors).
  - Modification date later than the latest transaction date in the statement.
  - Multiple distinct font subsets across what should be uniform table rows.
  - OCR layer text that disagrees with the embedded text layer at any
    transaction line.
  - Pages with different DPI / dimensions / orientation in the same PDF.

Return JSON: { metadata_flags: [...], extraction_flags: [...] }

This is the check that catches the careful fraudster who beat checks 1 and 2 by reconciling the math and amounts. It is also the easiest check to run before sending anything to an LLM — the metadata fields are deterministic and free. Run them first if you want; we run them inside the chain so the model can correlate metadata anomalies with content anomalies in a single explanation.
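For readers who prefer to run the deterministic part first, the first two flags in the prompt need no model at all. The sketch below assumes the metadata has already been read into a plain dict with Producer and ModDate keys (the names used in the PDF document information dictionary); the metadata_flags function, the flag names, and the list of editor hints are hypothetical and deliberately incomplete.

```python
from datetime import date

# Assumed, non-exhaustive substrings of consumer PDF editors; tune for your portfolio.
EDITOR_HINTS = ("acrobat", "foxit", "nitro", "pdfescape", "sejda", "ilovepdf")


def parse_pdf_date(value):
    """Return the calendar date from a PDF date string like D:20250511093000+05'30'."""
    digits = value[2:] if value.startswith("D:") else value
    return date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))


def metadata_flags(info, last_txn_date):
    """Apply the producer-string and modification-date rules to extracted metadata."""
    flags = []

    producer = (info.get("Producer") or "").lower()
    if any(hint in producer for hint in EDITOR_HINTS):
        flags.append("producer_looks_like_editor")

    mod_date = info.get("ModDate")
    if mod_date and parse_pdf_date(mod_date) > last_txn_date:
        flags.append("modified_after_last_transaction")

    return flags
```

The time and timezone parts of the date string are ignored here, which is coarse but sufficient for a day-level comparison against the last transaction date.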

The original artifact — a synthetic statement and the chain in action

We built a synthetic four-month checking-account statement for a fictional borrower, “M. Reyes”, earning a stated USD 5,400 net monthly with one rental debit, one EMI, and the usual long tail of card payments and grocery debits. We then introduced three documented edits: salary inflated to USD 7,200 in two months out of four, the EMI line removed in all four months, and the running balances recomputed.

Running the four-check chain against the tampered version:

Check	Output	Caught the fraud?
Arithmetic	period_check: ok, line_checks: ok	No — math reconciled correctly
Cadence	EMI counterparty “AutoLoanCo” missing in all four months despite typical 4–6 month appearance window in priors	Yes — flagged as suppressed recurring debit
Vendor fingerprint	Header layout matches; one footer character slightly off-kerned	Soft flag — inconclusive
PDF metadata	Modification date 11 days after stated statement period; producer string inconsistent	Yes — flagged as post-hoc edited

Two of four checks fired hard. The combined signal is strong enough to route the file to manual review. In our 50-statement benchmark, the four-check ensemble catches 41 of 50 tampered statements — an 82% catch rate. False positives across 50 clean statements: 4, which after one round of threshold tuning dropped to 3. That’s a 6% false-positive rate at an 82% true-positive rate, which is the kind of trade-off a small lender’s review team can actually staff.
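The routing logic that turns per-check flags into a queue decision is small enough to sketch. This is an illustration under assumptions, not the production rule set: the route function, the check names, and the queue labels are hypothetical. It encodes two rules stated in this piece: a cadence-only flag goes to soft review, and no combination of flags produces an automatic decline.

```python
def route(flags):
    """Map per-check flags to a review queue.

    flags: dict of check name -> list of flag strings (empty list if the check passed).
    Returns 'proceed', 'soft_review', or 'manual_review'. There is no decline outcome.
    """
    fired = {name for name, items in flags.items() if items}

    if not fired:
        return "proceed"  # continue normal underwriting; this is not an approval
    if fired == {"cadence"}:
        return "soft_review"  # cadence is noisy, so it never escalates on its own
    return "manual_review"
```

Keeping the full flags dict alongside the returned queue name gives the reviewer the model’s explanations and leaves an audit trail for each decision.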

What the chain still misses

Two categories defeat all four checks.

The first is the pixel-perfect regenerated statement. A determined fraudster does not edit a real PDF — they regenerate one from scratch in a template that matches the bank’s exactly. The math is internally consistent. The cadence is plausible because it was designed to be. The fingerprint matches because the template was reverse-engineered. The PDF metadata is clean because the file was generated fresh. We have not seen a reliable text-only check that catches this. The defence is to bypass the document entirely — pull the data from an account aggregator (India’s AA framework, Open Banking in the UK and EU under PSD2/PSD3, similar regimes in Brazil and Australia) or directly from the bank API where the borrower consents. If you cannot do that, score the document but do not rely on it as the sole income evidence.

The second is collusive bank-insider statements. A genuine bank employee generates a real statement for a non-existent account, or for a real account with manipulated content at the source. The PDF is bit-for-bit a real bank document. Every check passes. Defences here are not document-side at all. They are external corroboration: cross-checks against payroll APIs, employer registries, credit bureau trade-line confirmation, and post-disbursal monitoring of where the disbursed funds actually flow. If the bank statement says salary lands in account X, the loan repayments should be coming out of account X.

A note on jurisdictional layering

How regulators expect you to use these checks varies, but the through-line is the same: AI-assisted document forensics is permitted as part of the verification stack, not as the sole control.

In the United States, the FFIEC BSA/AML manual sets supervisory expectations on documentary verification rigour, and the Federal Reserve’s synthetic identity fraud research frames the policy concern. In the European Union, the EBA’s remote-onboarding guidelines (EBA/GL/2022/15) explicitly contemplate AI in the verification stack with proportional human oversight. The UK’s FCA financial-crime guide aligns. Singapore’s MAS has comparable expectations under its technology risk management guidelines. India’s RBI digital-lending rules and account-aggregator framework push toward verified data feeds where available, but document checks remain the fallback for borrowers outside the AA universe. Across regimes, FATF’s international standards set the baseline.

The pattern: if the LLM chain flags a statement, route it to human review with the model’s explanations attached. Do not auto-decline on a flag alone. Keep the audit trail. That posture survives examination in every jurisdiction we operate in.

Where to go from here

The full set of fraud-detection prompts — synthetic identity, document tampering, ring fraud, payout-stage scams — sits inside the Fraud Detection with AI Playbook. It includes the four-check chain in this post with parameter tuning notes, plus the benchmark scripts we used to generate the catch-rate numbers.

Next read: the ten red flags AI catches in loan applications — the broader checklist that the bank-statement chain plugs into.

Frequently asked questions

Can an LLM detect a fake bank statement?

Yes, on surgical edits — the kind where a fraudster has changed two or three numbers in an otherwise real statement. A four-check LLM chain (arithmetic, cadence, vendor fingerprint, PDF metadata) catches roughly 80% of those edits in our synthetic benchmark of 50 statements. It does not reliably catch a fully regenerated, pixel-perfect fake produced from scratch in a matching template — those need a different layer (account-aggregator pulls or direct bank APIs).

What's the most common bank-statement manipulation pattern in loan applications?

Three patterns dominate: inflating one or more salary credits to lift apparent income, suppressing one recurring debit (often an existing EMI) to lower apparent FOIR, and recomputing the running balance so the math still adds up. A human reviewer scanning a 30-page PDF in two minutes will miss all three. Structured arithmetic and cadence checks catch them in seconds.

Is using an LLM for document forensics compliant with AML rules?

Major regulators treat AI-assisted document checks as augmentation, not as the sole control. The EBA's remote-onboarding guidelines, the FFIEC BSA/AML manual, the FCA's financial-crime guidance, and FATF's standards all permit AI in the verification stack provided there is human oversight on flagged cases, an audit trail, and the AI is not the only decision input. A statement that fails the LLM chain should be queued for human review, not auto-declined.

Should I run an LLM forensic check before or after my deterministic rules?

After. Run cheap deterministic rules first — known-bad IBAN/account number lists, duplicate file hashes, blacklisted employer names — and only run the LLM chain on the survivors. The LLM is the most expensive step per file, both in tokens and in human review time on flags. Putting it last keeps unit economics workable as volume scales.
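A deterministic pre-filter of this kind can be a few lines. The sketch below is an assumption-laden example rather than a description of any particular lender’s rules: the prefilter function, its arguments, and the returned reason strings are hypothetical, and it covers only two of the rules mentioned (duplicate file hashes and a known-bad account list).

```python
import hashlib


def prefilter(pdf_bytes, account_number, seen_hashes, blocked_accounts):
    """Return a rejection reason, or None if the file should go on to the LLM chain.

    seen_hashes is a mutable set of SHA-256 hex digests of files already processed;
    blocked_accounts is a set of known-bad account numbers.
    """
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    if digest in seen_hashes:
        return "duplicate_file"
    if account_number in blocked_accounts:
        return "known_bad_account"

    seen_hashes.add(digest)  # remember this file for later submissions
    return None
```

A byte-level hash only catches exact resubmissions; a file that has been re-saved or re-exported will hash differently and still reach the chain.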

Sources

Opportunities and Challenges of New Technologies for AML/CFT · Financial Action Task Force
BSA/AML Examination Manual — Customer Due Diligence · Federal Financial Institutions Examination Council
Synthetic Identity Fraud in the U.S. Payment System · Federal Reserve Board
Guidelines on the use of remote customer onboarding solutions (EBA/GL/2022/15) · European Banking Authority