Underwriting thin-file borrowers with LLMs: the diligence prompt chain that turns alternative signal into a credit decision
A four-step LLM diligence chain that converts gig earnings, rent ledgers, and telco data into the same scorecard columns a thick-file applicant fills out — for borrowers in any market.
This post extends the builder's guide on how to underwrite loans with AI to the case that frustrates small lenders most: a borrower with no usable bureau record but a real, stable income.
The lazy frame is that thin-file borrowers need a different scoring model. They don’t. They need a different intake — a structured way to convert gig dashboard screenshots, rent ledgers, telco bills, and informal employer letters into the same scorecard columns a thick-file applicant fills out. Once those columns are populated, the rest of the underwriting workflow runs unchanged. This is the work an LLM does well and that a hand-built rules engine does poorly, because the inputs are messy and the variations are endless.
What follows is the four-step prompt chain we run, the signals worth pulling per region, the worked example that shows the chain converting a decline into a graded approval, and an honest section on where the chain still fails.
Why “thin-file” is a misleading frame
The term suggests a problem of missing information. It is more accurately a problem of missing channels. A São Paulo delivery rider has a complete picture of monthly earnings inside iFood’s app. A Manila freelance designer has six years of receipts in her e-money wallet. A Lagos market trader has a verifiable pattern of supplier payments through a payment processor. None of these sit in the bureau. All of them, fed to a structured intake prompt, populate enough of the scorecard to underwrite confidently — often more confidently than a thick-file salaried borrower whose payslips don’t tell you whether the employer is solvent next quarter.
Two consequences follow. First, building a separate “thin-file model” is mostly the wrong abstraction. Build a single scorecard and feed it through alternative-data ingestion when bureau data is absent or sparse. Second, the limiting factor is rarely the model; it is the willingness and ability to pull the alternative source. Where the source is one click away (Open Banking pulls in the EU and UK, Account Aggregator in India), thin-file underwriting works at scale. Where the source requires the borrower to upload screenshots and the lender to parse them by hand, the unit economics break. Where no API exists, the LLM chain bridges that gap by taking over the parsing.
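The single-scorecard idea comes down to a routing decision at intake. A minimal sketch, assuming a hypothetical `BureauRecord` that exposes a trade-line count and months of history; the sparseness thresholds (2 trade-lines, 12 months) and the route names are illustrative assumptions, not any bureau's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BureauRecord:
    tradeline_count: int      # open and closed trade-lines on file
    months_of_history: int    # age of the oldest trade-line, in months


# Illustrative sparseness thresholds (assumptions; set per market and bureau).
MIN_TRADELINES = 2
MIN_HISTORY_MONTHS = 12


def is_thin(record: Optional[BureauRecord]) -> bool:
    """True when the bureau returns nothing, or too little to score."""
    if record is None:
        return True
    return (record.tradeline_count < MIN_TRADELINES
            or record.months_of_history < MIN_HISTORY_MONTHS)


def intake_route(record: Optional[BureauRecord], has_alt_data: bool) -> str:
    """Choose an intake path (hypothetical names). Every path feeds the same scorecard."""
    if not is_thin(record):
        return "bureau_intake"
    if has_alt_data:
        return "alt_data_intake"    # the four-step LLM chain
    return "request_alt_data"       # ask for consented data rather than auto-decline


# The worked example's bureau file: one closed trade-line, four years old.
print(intake_route(BureauRecord(tradeline_count=1, months_of_history=48), True))
```

The point of the design is that only the intake branches; nothing downstream of `intake_route` knows which path a record took.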
The four classes of alternative signal worth pulling
Across markets, four classes of alternative signal carry the most weight in our experience. The names of the data sources change; the categories don’t.
The first is earnings data — verified income at source rather than reported by the borrower. Payroll APIs (Argyle, Pinwheel, Atomic in the US; Open Banking salary credits in the UK and EU; Account Aggregator in India), gig-platform exports (Uber, Lyft, DoorDash, iFood, Rappi, Grab, Ola, Bolt), and mobile-money receipt history (M-Pesa in Kenya, GCash in the Philippines, Orange Money across Francophone Africa) all sit here. Strongest signal, hardest to fake.
The second is obligations data — rent ledgers, utility bills, telco statements, and existing instalment plans (EMIs, or equated monthly instalments) visible in payment-processor histories. These proxy the fixed-obligations-to-income ratio (FOIR) that would otherwise be computed from bureau trade-lines. Rent payment history through a third-party rent-tech platform is particularly clean; landlord-attested rent ledgers are better than nothing but carry lower trust.
The third is behavioural data — banking flow patterns (regular salary cadence, savings behaviour, overdraft frequency), e-commerce return rates if exposed, and platform tenure on the gig or e-money side. Behavioural data is rarely sufficient on its own but reliably nudges the grade up or down on the margin.
The fourth is social-graph stability — employer continuity (how long the borrower has been on the same platform or with the same employer), address tenure, and contactability across multiple channels. These are weak individually but useful as red-flag screens. A sudden flip on several stability indicators in the last quarter is a signal worth acting on.
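One way to operationalise that red-flag screen, as a sketch: record when each stability indicator last changed and flag the application when several changed inside the same quarter. The indicator names, the 90-day window and the two-flip threshold are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def recent_flips(last_changed: dict[str, Optional[date]], as_of: date,
                 window_days: int = 90) -> list[str]:
    """Indicators whose most recent change falls inside the look-back window.

    `last_changed` maps an indicator name (e.g. "primary_platform", "address",
    "phone_number") to the date it last changed, or None if it never has.
    """
    cutoff = as_of - timedelta(days=window_days)
    return sorted(name for name, changed_on in last_changed.items()
                  if changed_on is not None and changed_on >= cutoff)


def stability_red_flag(last_changed: dict[str, Optional[date]], as_of: date,
                       min_flips: int = 2) -> bool:
    """Flag when several stability indicators flipped within the same quarter."""
    return len(recent_flips(last_changed, as_of)) >= min_flips
```

A single change (a new phone number, say) passes; a new platform plus a new address in the same quarter trips the flag and goes to a human.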
The four classes feed into one scorecard. The chain’s job is to extract them, validate them, and map them.
The 4-step prompt chain — the original artifact
The chain has four prompts, run sequentially, with each step’s output becoming structured input to the next.
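The sequencing can be sketched as a loop that parses each step's JSON and hands it to the next prompt. `call_llm` is a hypothetical stand-in for whatever model client you use, the templates are abbreviated placeholders for the four prompts below, and `fake_llm` returns canned output so the control flow runs without a model.

```python
import json
from typing import Callable

# Abbreviated stand-ins for the four prompts; "{payload}" marks where the raw
# bundle (step 1) or the previous step's JSON (steps 2-4) is inserted.
STEPS = [
    ("intake", "You are a credit-application intake assistant. ...\n{payload}"),
    ("signals", "You are a credit-signal extraction model. ...\n{payload}"),
    ("scorecard", "You are a credit scorecard mapper. ...\n{payload}"),
    ("memo", "You are an underwriting assistant. ...\n{payload}"),
]


def run_chain(raw_bundle: str, call_llm: Callable[[str], str]) -> dict:
    """Run the prompts in order, feeding each step's output into the next.

    Steps 1-3 must return JSON; json.loads raises on malformed output instead
    of passing it downstream. The memo (step 4) is kept as plain text.
    """
    outputs: dict = {}
    payload = raw_bundle
    for name, template in STEPS:
        # str.replace rather than str.format: the real prompts contain literal braces.
        response = call_llm(template.replace("{payload}", payload))
        if name == "memo":
            outputs[name] = response
        else:
            outputs[name] = json.loads(response)
            payload = json.dumps(outputs[name], indent=2)
    return outputs


def fake_llm(prompt: str) -> str:
    """Canned responses so the control flow runs without a model."""
    if prompt.startswith("You are an underwriting assistant"):
        return "Proposed Decision: Refer"
    return json.dumps({"prompt_chars": len(prompt)})


result = run_chain("<mixed applicant bundle>", fake_llm)
print(list(result))   # ['intake', 'signals', 'scorecard', 'memo']
```

Failing loudly on malformed JSON is the main design choice: a half-parsed intake record that silently reaches the scorecard is worse than a stopped application.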
Step 1 — Intake. Convert raw inputs into a normalised JSON record.
You are a credit-application intake assistant. Below is a mixed bundle of
inputs from a thin-file applicant: a self-declaration form, a gig-platform
earnings export (12 months or more where available), a screenshot of a
rent-app payment history, a
telco statement, and an employer letter from an informal employer.
Produce a single JSON record with these fields:
applicant: { name, age, country, region, contact_methods }
income: {
primary_source, primary_source_type, monthly_amounts: [...],
median_monthly, last_12m_volatility_pct
}
obligations: { rent_monthly, utilities_monthly, existing_emis: [...] }
stability: {
primary_source_tenure_months, address_tenure_months,
contactable_across_channels: bool
}
red_flags: [...]
raw_evidence_pointers: [...] // where each field came from in the inputs
For every field, include the source pointer. Use "[gap: X]" for any required
field not present. Do not infer values.
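Because the intake prompt forbids inference, it is worth enforcing the rule in code rather than trusting it. A minimal sketch, assuming `raw_evidence_pointers` comes back as a list of `{"field": ..., "source": ...}` objects (the prompt leaves the shape open) and using an illustrative subset of required fields: every required field must either carry a source pointer or be an explicit `[gap: X]` tag.

```python
REQUIRED_FIELDS = [
    "applicant.name", "applicant.country",
    "income.primary_source", "income.median_monthly",
    "obligations.rent_monthly",
    "stability.primary_source_tenure_months",
]


def get_path(record: dict, dotted: str):
    """Follow a dotted path such as 'income.median_monthly'; None if absent."""
    node = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def is_gap(value) -> bool:
    return isinstance(value, str) and value.startswith("[gap:")


def intake_problems(record: dict) -> list[str]:
    """Fields that are absent outright, or populated without a source pointer."""
    # Assumed pointer shape: [{"field": "income.median_monthly", "source": "..."}]
    pointed = {p.get("field") for p in record.get("raw_evidence_pointers", [])
               if isinstance(p, dict) and p.get("source")}
    problems = []
    for field in REQUIRED_FIELDS:
        value = get_path(record, field)
        if value is None:
            problems.append(f"{field}: missing (expected a value or a [gap: ...] tag)")
        elif not is_gap(value) and field not in pointed:
            problems.append(f"{field}: no source pointer")
    return problems
```

A non-empty result sends the record back to the intake step (or to a human) instead of on to signal extraction.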
Step 2 — Signal extraction. Compute the derived signals the scorecard wants from the normalised record.
You are a credit-signal extraction model. Below is a normalised JSON record of
an applicant's intake. Produce a derived-signals JSON:
affordability_metrics: {
median_monthly_income_usd, total_obligations_usd, proposed_emi_usd,
proposed_foir_pct, residual_income_usd
}
stability_score: { tenure_band, address_band, channel_breadth }
earnings_quality: {
volatility_band (low/medium/high), trend (rising/flat/declining),
seasonality_present: bool
}
red_flag_count: int
evidence_strength: { each input class scored low / medium / high }
Convert all currency to USD using a stated FX assumption; flag if FX is needed.
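The arithmetic in this step is safer computed in code and handed to the model than computed by it. A sketch under stated assumptions: amounts are already in USD, FOIR counts rent, utilities, existing EMIs and the proposed EMI against median monthly income, the EMI uses the standard annuity formula, volatility is the coefficient of variation, and the band cut-offs (15% and 35%) are illustrative. Lenders define FOIR differently (some exclude rent), so match your own policy.

```python
import statistics


def annuity_emi(principal: float, annual_rate: float, months: int) -> float:
    """Equated monthly instalment for a fully amortising loan."""
    r = annual_rate / 12
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def volatility_pct(monthly_income: list[float]) -> float:
    """Coefficient of variation: population std dev over mean, as a percentage."""
    return 100 * statistics.pstdev(monthly_income) / statistics.mean(monthly_income)


def volatility_band(cv_pct: float, low: float = 15.0, high: float = 35.0) -> str:
    # Illustrative cut-offs; calibrate against your own book.
    if cv_pct < low:
        return "low"
    return "medium" if cv_pct < high else "high"


def affordability(monthly_income: list[float], rent: float, utilities: float,
                  existing_emis: list[float], proposed_emi: float) -> dict:
    """Affordability block for the derived-signals JSON (all inputs in USD)."""
    income = statistics.median(monthly_income)
    obligations = rent + utilities + sum(existing_emis)   # before the new loan
    after_loan = obligations + proposed_emi
    return {
        "median_monthly_income_usd": round(income, 2),
        "total_obligations_usd": round(obligations, 2),
        "proposed_emi_usd": round(proposed_emi, 2),
        "proposed_foir_pct": round(100 * after_loan / income, 1),
        "residual_income_usd": round(income - after_loan, 2),
    }
```

With these numbers precomputed, the step 2 prompt only has to classify trend and seasonality and score evidence strength, which is the part that genuinely needs reading.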
Step 3 — Scorecard mapping. Map the derived signals to the existing scorecard’s columns.
You are a credit scorecard mapper. The scorecard's columns are:
income_score, obligations_score, stability_score, earnings_quality_score,
fraud_indicator_score
Each is on a 1-5 scale per the scorecard rubric below.
Given the derived signals JSON, populate each column with a score and a
one-sentence justification citing the specific signal that drove it.
Rubric:
{INSERT_YOUR_SCORECARD_RUBRIC_HERE}
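A light guard on the mapper's output, as a sketch. It assumes the mapper returns `{"income_score": {"score": 4, "justification": "..."}, ...}` and checks that every column is present, each score is an integer from 1 to 5, and each justification names at least one key that exists in the derived-signals JSON. The substring match is a crude assumption; a stricter version would have the model return the cited key as its own field.

```python
COLUMNS = ["income_score", "obligations_score", "stability_score",
           "earnings_quality_score", "fraud_indicator_score"]


def key_names(obj) -> set[str]:
    """Every key name at any depth of a nested dict."""
    names: set[str] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            names.add(key)
            names |= key_names(value)
    return names


def scorecard_problems(scorecard: dict, signals: dict) -> list[str]:
    """Check the mapper's output against the columns and the derived signals."""
    signal_keys = key_names(signals)
    problems = []
    for col in COLUMNS:
        entry = scorecard.get(col)
        if not isinstance(entry, dict):
            problems.append(f"{col}: missing")
            continue
        score = entry.get("score")
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            problems.append(f"{col}: score {score!r} is not an integer 1-5")
        # Crude citation check: the justification must mention a signal key by name.
        if not any(key in entry.get("justification", "") for key in signal_keys):
            problems.append(f"{col}: justification cites no derived signal")
    return problems
```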
Step 4 — Memo. Produce the credit memo from the populated scorecard.
You are an underwriting assistant. Given the populated scorecard below, produce
a credit memo with these exact sections:
Proposed Decision (Approve / Approve with Conditions / Decline / Refer)
Risk Grade (A/B/C/D, with one-sentence justification)
Rationale (3-5 bullets, factual)
Conditions Precedent (numbered)
Deviations from Policy (or "None required")
Gaps / Outstanding Items (with [gap: X] tags)
For every figure cited, name the source field. Do not infer.
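Since the memo prompt asks for exact sections, a cheap post-check is to confirm they all appear, in order, and that the proposed decision is one of the four allowed values. A sketch that assumes each section heading starts its own line; adapt the parsing to however your template renders headings.

```python
import re

SECTIONS = ["Proposed Decision", "Risk Grade", "Rationale",
            "Conditions Precedent", "Deviations from Policy",
            "Gaps / Outstanding Items"]
DECISIONS = ("Approve with Conditions", "Approve", "Decline", "Refer")


def memo_problems(memo: str) -> list[str]:
    """Missing or out-of-order sections, and a decision outside the allowed set."""
    problems, positions = [], []
    for heading in SECTIONS:
        # Assumes each section heading starts its own line.
        match = re.search(rf"^{re.escape(heading)}\b", memo, flags=re.MULTILINE)
        if match is None:
            problems.append(f"missing section: {heading}")
        else:
            positions.append(match.start())
    if positions != sorted(positions):
        problems.append("sections out of order")
    decision = re.search(r"^Proposed Decision\W*(.*)$", memo, flags=re.MULTILINE)
    if decision and not decision.group(1).strip().startswith(DECISIONS):
        problems.append("decision is not Approve / Approve with Conditions / Decline / Refer")
    return problems
```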
The fourth step is the same memo prompt detailed in the credit-memo generation post — that’s deliberate. The thin-file chain is the front-end that converts unstructured inputs into the structured record the existing memo workflow already handles.
A side-by-side worked example
A synthetic borrower: “L. Okafor”, 29, a gig delivery rider in a representative emerging market, requesting a 12-month personal loan of USD 1,200. Bureau record: thin (one closed mobile-loan trade-line, four years old, paid in full).
Run through a traditional bureau-only scorecard, the application returns “insufficient data”: the decision tree exits at “no scorable bureau record → decline” before any other field about the borrower is considered.
Run through the four-step LLM chain with the alternative inputs supplied (gig-platform export showing 14 months of earnings averaging USD 540/month with mid-band volatility, rent-app history showing 11 months of timely USD 180 rent payments, telco statement showing 3-year tenure on a postpaid plan with no missed payments):
| Scorecard column | Bureau-only outcome | LLM-chain outcome |
|---|---|---|
| Income | n/a | 4 of 5 (verified gig income, mid volatility) |
| Obligations | n/a | 4 of 5 (rent + telco only, FOIR 28%) |
| Stability | n/a | 3 of 5 (14m gig tenure, 11m address tenure) |
| Earnings quality | n/a | 3 of 5 (flat trend, mid-band volatility) |
| Fraud indicators | n/a | 5 of 5 (no flags) |
| Memo decision | Decline (insufficient data) | Approve with Conditions (Grade B-) at +250 bps over standard rate |
| Conditions | — | Verified payroll lock-in if available; minimum 6-month gig tenure recheck before any future top-up |
Two things to notice. The LLM-chain decision is more conservative on rate, pricing in the higher uncertainty of alternative data, and it carries explicit conditions that wouldn't be on a thick-file approval. That's the right shape. The bureau-only path declined a borrower who is creditworthy at the right price; the chain captures that price and the conditions cleanly. The honest test of whether alt-data underwriting is working for you is whether the false-decline rate falls by more than the bad-debt rate rises.
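That test is simple arithmetic once outcome data exists. A sketch, assuming you can estimate a false-decline rate for both the bureau-only baseline and the alt-data challenger (in practice this needs a holdout or reject-inference study, since declined applicants generate no repayment history) and measure bad-debt rates on the loans each path approves; rates are fractions.

```python
def net_benefit_pp(baseline: dict, challenger: dict) -> float:
    """Drop in false-decline rate minus rise in bad-debt rate, in percentage points.

    A positive result means false declines fell by more than bad debt rose,
    which is the pass condition described above.
    """
    false_decline_drop = baseline["false_decline_rate"] - challenger["false_decline_rate"]
    bad_debt_rise = challenger["bad_debt_rate"] - baseline["bad_debt_rate"]
    return round(100 * (false_decline_drop - bad_debt_rise), 2)
```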
Where the chain fails
Three failure modes worth surfacing.
Gig-platform earnings volatility past the 12-month window. The chain reads the export it is given. If the export covers 12 months and the prior 12 months were materially different (a different city, a different platform, a longer break), the chain has no view of it. Approval grades inflate when the borrower's earnings cycle is longer than the window the export covers. Mitigation: cap the maximum loan tenure at the length of the export window (12-month export → maximum loan tenure of 12 months), and require a refresh on any top-up.
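The tenure cap can be sketched in a few lines, assuming the export window is measured in calendar months between the first and last earnings records. Counting partial end months as whole is a generous assumption; tighten it if your exports start or end mid-month.

```python
from datetime import date


def months_covered(first: date, last: date) -> int:
    """Calendar months spanned by an earnings export, counting both end months.

    Partial end months count as whole, which is generous; tighten if needed.
    """
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


def max_tenure_months(first: date, last: date, requested: int) -> int:
    """Cap loan tenure at the length of the earnings window the chain has seen."""
    return min(requested, months_covered(first, last))
```

Under this rule a 14-month export, as in the worked example, leaves the requested 12-month tenure untouched, while a 24-month request would be cut to 14.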
Off-platform obligations the borrower doesn’t mention. The chain sees only what the inputs include. A borrower with a parallel loan on another platform that doesn’t show in their bank statements, common in markets with active informal lending, is invisible. Mitigation: include a credit-bureau pull for whatever sliver of history exists, a self-declaration with stated penalties for omission, and, where available, a multi-bureau or data-cooperative pull.
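Where a bank or mobile-money statement is available, it can at least narrow the blind spot. The sketch below flags payees that receive a similar-sized debit in at least three distinct months and match nothing the borrower declared. The three-month minimum, the 20% amount tolerance and the plain payee-name match are all assumptions, and loans repaid in cash or through someone else's account will still be missed.

```python
from collections import defaultdict
from statistics import median


def undeclared_recurring(debits: list[tuple[str, str, float]],
                         declared_payees: set[str],
                         min_months: int = 3, tolerance: float = 0.20) -> list[str]:
    """Payees with recurring, similar-sized debits that were not declared.

    `debits` holds (month 'YYYY-MM', payee, amount) tuples from a statement.
    """
    by_payee: dict = defaultdict(dict)
    for month, payee, amount in debits:
        key = payee.strip().lower()
        by_payee[key][month] = by_payee[key].get(month, 0.0) + amount
    declared = {p.strip().lower() for p in declared_payees}
    flagged = []
    for payee, monthly in by_payee.items():
        if payee in declared or len(monthly) < min_months:
            continue
        typical = median(monthly.values())
        similar = [a for a in monthly.values() if abs(a - typical) <= tolerance * typical]
        if len(similar) >= min_months:
            flagged.append(payee)
    return sorted(flagged)
```

Flagged payees are questions for the borrower, not automatic declines; a gym membership looks the same as a loan repayment to this check.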
Signal that correlates with protected attributes. This is the one that matters most for compliance. Postal codes, employer names, gig platforms, even certain telco operators correlate with protected attributes in many markets. The chain is not bias-aware on its own. Mitigation: run a quarterly fair-lending review on approval rates and pricing by protected class on the applications you have ground truth on, document the diligence prompts, and never include protected attributes as inputs even indirectly. The CFPB’s stance is that explanations must be specific and that “complex algorithm” is not a defence; assume the same standard applies wherever you operate.
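For the quarterly review, one common first screen is the adverse impact ratio: each group's approval rate divided by the approval rate of the most-approved group, with ratios below 0.8 investigated further (the “four-fifths” rule of thumb borrowed from US employment-selection guidance). This sketch covers that screen only. A full fair-lending analysis would also control for legitimate credit factors and compare pricing, and the group labels here come from your ground-truth sample, never from the chain's inputs.

```python
from collections import Counter


def adverse_impact_ratios(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Approval rate of each group divided by the highest group approval rate."""
    totals: Counter = Counter()
    approvals: Counter = Counter()
    for group, approved in outcomes:
        totals[group] += 1
        approvals[group] += int(approved)
    rates = {g: approvals[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: (round(r / best, 3) if best else 0.0) for g, r in rates.items()}


def groups_to_investigate(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Groups whose ratio falls below the screening threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)
```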
A regional reality check
The chain itself is region-agnostic. The data sources differ.
In the United States, payroll APIs (Argyle, Pinwheel, Atomic) and Plaid bank-flow pulls cover most of what you need. In the United Kingdom and the European Union, Open Banking under PSD2 — and PSD3 once it lands — gives you direct, verified bank-flow data with consent. In India, the Account Aggregator framework provides consented, verified pulls across banks, mutual funds, and increasingly insurance and tax data. In Brazil, Open Finance (the central bank’s evolution of Open Banking) gives comparable coverage. In Kenya and across East Africa, M-Pesa statements are the single most useful alt-data source for thin-file scoring — they capture earnings, obligations, and behavioural cadence in one feed. In the GCC, salary-transfer letters and the Wage Protection System data offer verified earnings. In Australia, the Consumer Data Right (CDR) extends Open Banking-style access. In Singapore, MyInfo plus SGFinDex serve a similar role for the formal segment.
The chain runs the same way against any of these; only the input format differs. Build the chain to accept whatever fields the highest-quality source in each market provides, and let it work with a smaller subset where the source is poorer. The output structure stays constant.
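One way to get that behaviour is a small adapter per source that maps whatever the source provides into the shared intake schema and marks everything else as a gap. The source keys and field names below are hypothetical and do not reflect the real export formats of M-Pesa, Account Aggregator or any payroll API.

```python
GAP = "[gap: {}]"

# Hypothetical field mappings: source field name -> shared intake schema path.
ADAPTERS = {
    "mobile_money_statement": {
        "monthly_inflows": "income.monthly_amounts",
        "account_age_months": "stability.primary_source_tenure_months",
    },
    "payroll_api": {
        "net_pay_history": "income.monthly_amounts",
        "employer_name": "income.primary_source",
        "months_employed": "stability.primary_source_tenure_months",
    },
}
SCHEMA_FIELDS = ["income.monthly_amounts", "income.primary_source",
                 "stability.primary_source_tenure_months"]


def to_intake(source: str, export: dict) -> dict:
    """Map one source's export to the shared schema; absent fields become gaps."""
    mapped = {path: export[field]
              for field, path in ADAPTERS[source].items() if field in export}
    return {path: mapped.get(path, GAP.format(path)) for path in SCHEMA_FIELDS}
```

Adding a market then means adding a mapping, not a new prompt chain; the gap tags flow into the memo's outstanding-items section as they would for any other missing field.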
Where to go from here
The full diligence chain — including the rubric, the test cases, and the eight prompts that wrap around the four shown here — sits in the AI Lending Prompt Library. The library is the production version of what’s described above, with regional variants for each of the data sources named in the reality check.
Next read: the credit-memo generation post — the back-end of the chain in detail, since the thin-file chain hands the memo prompt the same structured record a thick-file application would produce.
Frequently asked questions
What counts as a 'thin-file' borrower in 2026 globally?
A thin-file borrower is anyone for whom the standard credit bureau in their country returns either no record or a record too sparse to score. The World Bank's Findex data puts roughly 1.4 billion adults outside formal banking globally, and a much larger group has bank accounts but no scorable bureau history: recent migrants, gig workers, young first-jobbers, returning expatriates, and a large share of adults in markets where bureau penetration is below 60%. The label is the same; the underlying data picture differs sharply by region.
Which alternative data source has the highest predictive lift?
Verified earnings data (payroll-API pulls, gig-platform exports, mobile-money payment-receipt history) has the strongest predictive lift in our testing and in the public research we trust. Rent and utility ledgers are second. Behavioural banking flows are third. Social-graph stability indicators are fourth. The single highest-lift source is whichever one tells you the borrower's actual cashflow, verified at source rather than self-reported. Everything else is corroborating signal.
Can an LLM safely make the credit decision for a thin-file applicant?
No, and the regulatory frame across most jurisdictions is explicit on this. The LLM should structure the diligence and propose a decision; the human underwriter owns the decision. The CFPB in the US, the FCA in the UK, the EBA in the EU, the RBI in India, MAS in Singapore, and APRA in Australia all converge on the same posture for AI in credit: explainability, human oversight, and demonstrable non-discrimination across protected attributes. An LLM-only decision on a thin-file applicant fails that bar in every one of those regimes.
How do I avoid disparate-impact problems when alternative data correlates with protected attributes?
Three operational steps. First, never include a protected attribute as an input feature, even indirectly via proxies you know correlate (postal code is the classic). Second, run a regular fair-lending review on outcomes by protected class on a sample basis — if your approval rate or pricing diverges across groups for similar applications, you have a problem to investigate even if the model didn't mean to. Third, document the diligence prompt chain so a regulator or your own audit team can trace why each application got the decision it did. The CFPB has been clear that 'the algorithm did it' is not an adverse-action defence.
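The first step can be enforced mechanically before anything reaches the intake prompt. A sketch using a deny-list over a flat key/value bundle; the lists are illustrative assumptions that must be set per jurisdiction with compliance review, nested fields are not searched, and a deny-list only catches the proxies you already know about, which is why the outcome review in the second step is still needed.

```python
# Illustrative deny-lists; set these per jurisdiction with compliance review.
PROTECTED = {"race", "ethnicity", "religion", "sex", "gender",
             "marital_status", "national_origin", "disability"}
KNOWN_PROXIES = {"postal_code", "zip_code"}


def strip_sensitive(record: dict) -> tuple[dict, list[str]]:
    """Drop protected attributes and known proxies from a flat input bundle.

    Returns the cleaned record and the removed keys, so the removal itself
    can be logged as part of the documented diligence trail.
    """
    blocked = PROTECTED | KNOWN_PROXIES
    cleaned = {k: v for k, v in record.items() if k.lower() not in blocked}
    removed = sorted(k for k in record if k.lower() in blocked)
    return cleaned, removed
```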
Sources
- Data Point: Becoming Credit Visible · Consumer Financial Protection Bureau
- Big tech and the changing structure of financial intermediation (BIS Working Paper No. 779) · Bank for International Settlements
- The Global Findex Database 2021 · World Bank
- Guidance for firms on the fair treatment of vulnerable customers (FG21/1) · Financial Conduct Authority