Methodology

Cash flow is the oldest credit signal. We automated reading it.

Lendiro extracts 47 engineered features from 24 months of bank transaction records. A gradient boosting model trained on real repayment outcomes produces a score calibrated to predict 30-, 60-, and 90-day default probability, with no bureau input.

Three data layers, explained

All data derives from bank-permissioned transaction records. No synthetic data. No demographic variables. No zip codes.

01

Transaction inflow and outflow records

Raw material: the complete list of debit and credit transactions in the applicant's primary bank account(s) over the past 24 months. Includes ACH transfers, debit card transactions, payroll deposits, bill payments, and peer-to-peer transfers.

Why 24 months: a single month of bank data is noise. 24 months captures seasonal patterns, income growth or decline, and the full rhythm of recurring obligations. Our validation shows that predictive accuracy improves significantly from 6 to 12 months, and again from 12 to 24 — beyond 24, marginal gains drop sharply.

02

Recurring payment identification and classification

From the raw transaction stream, we identify and classify recurring payments: rent, utilities, insurance, subscriptions, loan repayments. Classification uses merchant name normalization and amount-stability heuristics to distinguish true recurring obligations from irregular spending.
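
As a rough illustration of the amount-stability heuristic described above, the sketch below groups transactions by a normalized merchant name and flags a merchant as recurring when it appears in enough distinct months with a stable amount. The `normalize_merchant` rules, the `min_months` and `max_cv` thresholds, and the sample rows are assumptions made for illustration, not Lendiro's production logic.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev

def normalize_merchant(raw: str) -> str:
    """Hypothetical normalization: uppercase, then drop digits and punctuation."""
    cleaned = re.sub(r"[^A-Z ]", " ", raw.upper())
    return " ".join(cleaned.split())

def find_recurring(transactions, min_months=3, max_cv=0.10):
    """Return merchants seen in at least min_months distinct months whose
    amounts have a coefficient of variation (std / mean) of at most max_cv."""
    by_merchant = defaultdict(list)
    for iso_date, raw_name, amount in transactions:
        # iso_date[:7] is the "YYYY-MM" month key
        by_merchant[normalize_merchant(raw_name)].append((iso_date[:7], amount))
    recurring = {}
    for name, rows in by_merchant.items():
        months = {month for month, _ in rows}
        amounts = [amt for _, amt in rows]
        avg = mean(amounts)
        cv = pstdev(amounts) / avg if avg else float("inf")
        if len(months) >= min_months and cv <= max_cv:
            recurring[name] = round(cv, 4)
    return recurring

# Made-up sample data: a stable rent payment and irregular grocery spending.
sample = [
    ("2024-01-01", "ACME PROPERTY MGMT #1042", 1200.00),
    ("2024-02-01", "Acme Property Mgmt #1043", 1200.00),
    ("2024-03-01", "ACME PROPERTY MGMT #1044", 1200.00),
    ("2024-01-14", "CORNER GROCERY", 54.10),
    ("2024-02-03", "CORNER GROCERY", 131.75),
    ("2024-03-22", "CORNER GROCERY", 18.20),
]
recurring = find_recurring(sample)
```

In this toy data the rent payment qualifies (same amount in three distinct months) while the grocery spending does not, because its amounts vary too widely.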

The signal we extract is not the amount; it's the timing. A borrower who pays on the same date every month, within a narrow variance window, is displaying a financial discipline signal directly relevant to future loan repayment. We normalize for weekends, holidays, and payroll lag so that a payment shifted by the calendar is not misread as irregular.
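
To make the weekend normalization concrete, here is a minimal sketch under simplifying assumptions: only Saturdays and Sundays are adjusted (holidays and payroll lag are ignored), the nominal due day is known and no later than the 28th, and timing consistency is summarized as the standard deviation of the adjusted deviations. The function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import pstdev

def next_business_day(d: date) -> date:
    """Roll a Saturday or Sunday forward to Monday (holidays ignored here)."""
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def timing_deviations(payment_dates, due_day):
    """Days between each payment and its weekend-adjusted due date."""
    deviations = []
    for paid in payment_dates:
        expected = next_business_day(date(paid.year, paid.month, due_day))
        deviations.append((paid - expected).days)
    return deviations

def timing_variance_days(payment_dates, due_day):
    """Population standard deviation of the adjusted deviations, in days."""
    return pstdev(timing_deviations(payment_dates, due_day))

# 1 June 2024 is a Saturday, so a payment due on the 1st posts on Monday the 3rd.
payments = [date(2024, 4, 1), date(2024, 5, 1), date(2024, 6, 3), date(2024, 7, 1)]
raw_spread = pstdev([p.day for p in payments])      # naive day-of-month spread
adjusted_spread = timing_variance_days(payments, due_day=1)
```

The naive day-of-month spread is nonzero because of the June payment, while the adjusted spread is zero: the borrower paid on the first available business day every month.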

03

Transaction network structure

We model the applicant's transaction history as a directed graph: nodes are counterparties (employers, merchants, peers, utilities), edges are transactions. Graph features include counterparty degree centrality, employer transaction stability, merchant category entropy, and payment network breadth.

Economic integration is observable through transaction patterns. A borrower who transacts with stable employers, diverse merchants, and consistent utility providers occupies a more stable economic position than one with few, irregular counterparties — regardless of the absolute dollar amounts involved.
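
Two of the graph features named above can be sketched in a few lines. The example assumes transactions have already been reduced to (source, target) edges with the applicant as a node called "applicant", uses distinct counterparty counts as a simple stand-in for degree centrality, and computes merchant category entropy as Shannon entropy in bits. The node names and category labels are invented for illustration.

```python
import math
from collections import Counter

def category_entropy(categories):
    """Shannon entropy (bits) of the merchant-category distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def counterparty_degrees(edges, applicant="applicant"):
    """Return (in_degree, out_degree): distinct counterparties paying the
    applicant and distinct counterparties paid by the applicant."""
    payers = {src for src, dst in edges if dst == applicant}
    payees = {dst for src, dst in edges if src == applicant}
    return len(payers), len(payees)

# Made-up edges: one employer paying in, three counterparties paid out.
edges = [
    ("employer_a", "applicant"),
    ("employer_a", "applicant"),
    ("applicant", "landlord"),
    ("applicant", "utility"),
    ("applicant", "grocer"),
    ("applicant", "grocer"),
]
outflow_categories = ["rent", "utilities", "grocery", "grocery"]

degrees = counterparty_degrees(edges)            # (1, 3)
entropy = category_entropy(outflow_categories)   # 1.5 bits
```

Neither feature depends on dollar amounts, which matches the point above that stability is read from the structure of the network rather than from transaction size.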

Gradient boosting on 47 engineered features. Not a black box.

Lendiro uses gradient boosting (XGBoost variant) trained on labeled repayment outcome data. The model is tree-based, not neural — which means each decision is traceable to specific feature contributions. This explainability is not a convenience: it's required for ECOA adverse action notice compliance.
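
One way per-feature traceability can feed an adverse action notice is sketched below. It assumes each decision comes with additive per-feature contributions to the risk score, where positive values raise predicted default risk; how those contributions are produced is out of scope here. The contribution values and the `adverse_action_reasons` helper are hypothetical.

```python
def adverse_action_reasons(contributions, top_k=2):
    """Return the top_k features that pushed predicted default risk up the most.

    `contributions` maps feature name -> additive contribution to the risk
    score (positive = higher predicted default risk).
    """
    risk_raising = [(name, value) for name, value in contributions.items() if value > 0]
    risk_raising.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in risk_raising[:top_k]]

# Made-up contributions for a single declined application.
example = {
    "inflow_velocity_24m": -0.12,
    "payment_cadence_score": 0.05,
    "overdraft_frequency": 0.30,
    "income_gap_frequency": 0.18,
}
reasons = adverse_action_reasons(example)
```

Here the two principal reasons would be overdraft frequency and income gap frequency; features that lowered the risk score are never reported as reasons.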

Training objective: minimize 90-day default prediction error, with performance measured on held-out validation data. The model is periodically retrained as new outcome data accumulates. Integration partners do not need to retrain or tune any parameters: from their perspective the model is a managed scoring API, while internally each decision tree remains auditable.

47 input features fall into four families: velocity features (rate of change), rhythm features (regularity), balance features (headroom and reserves), and graph features (counterparty network). Feature importance is tracked continuously; the model's top-6 features account for approximately 80% of predictive weight.

MODEL Feature importance (top 6)
"inflow_velocity_24m": 0.31
"payment_cadence_score": 0.26
"rent_consistency_score": 0.21
"overdraft_frequency": 0.15
"counterparty_diversity": 0.11
"income_gap_frequency": 0.07
... 41 additional features
XGBoost · 47 features · tree-based · auditable

Features included in the model

  • Cash-flow velocity over 6, 12, and 24 months
  • Recurring payment timing consistency
  • Income gap frequency and duration
  • Overdraft frequency and recency
  • Average monthly balance and balance headroom
  • Counterparty diversity and network centrality
  • Employer transaction stability indicators
  • Merchant category entropy (spending breadth)
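
As an example of how a velocity feature from the list above might be computed, the sketch below defines cash-flow velocity as the least-squares slope of monthly inflow over each trailing window. That definition is an assumption chosen for illustration; the document does not specify the exact formula.

```python
def slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def inflow_velocity(monthly_inflows, windows=(6, 12, 24)):
    """Slope of monthly inflow (currency units per month) over each trailing
    window; windows longer than the available history are skipped."""
    return {w: slope(monthly_inflows[-w:]) for w in windows if len(monthly_inflows) >= w}

# Made-up history: inflow grows by 50 per month for 24 months.
history = [3000 + 50 * month for month in range(24)]
velocity = inflow_velocity(history)
```

With steadily growing income the three windows agree; a borrower whose 6-month slope is well below the 24-month slope would show recent deceleration.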

Features excluded from the model

  • Zip code, census tract, or neighborhood identifiers
  • Merchant geolocation or transaction geography
  • Account holder name or any name-derived features
  • Any variable that could proxy for race, national origin, gender, or religion
  • Employer name or industry-level employment classification
  • Foreign currency transaction indicators
  • Remittance transfer patterns (potential national-origin proxy)

Feature exclusion decisions are reviewed with outside fair lending counsel. Documentation is available to qualified integration partners under NDA.

What our validation shows

Gini Coefficient 0.61 – 0.66

On holdout thin-file populations. Comparable to bureau model performance on bureau-scored populations, but applied to the segment bureau models cannot score.

AUC-ROC 0.78 – 0.82

Across validation datasets spanning 18 months. AUC improves as the lookback window increases: 24-month data outperforms 12-month data by approximately 9 percentage points (about 0.09 AUC).
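
For readers less familiar with the two discrimination metrics, the following sketch computes AUC-ROC from scratch and converts it to a Gini coefficient using the standard identity Gini = 2 × AUC − 1. The labels and scores are toy values, and the pairwise implementation is meant for clarity, not for large datasets.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random defaulter (label 1) receives a
    higher risk score than a random non-defaulter (label 0); ties count half.
    Quadratic in the number of rows, which is fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini_from_auc(auc):
    """Gini coefficient implied by an AUC value."""
    return 2 * auc - 1

# Toy data: 3 defaulters and 5 non-defaulters with predicted default scores.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.5, 0.2, 0.1]

auc = auc_roc(labels, scores)   # 14 of 15 defaulter/non-defaulter pairs ranked correctly
gini = gini_from_auc(auc)
```

Because the two metrics are linked by this identity, ranges reported on different populations or datasets should be compared with that in mind.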

90-Day Default Lift 1.8x – 2.2x

On populations previously declined by bureau-only models. Among borrowers Lendiro approves that bureau models would decline, the 90-day default rate is lower than under random acceptance from the same cohort by a factor of 1.8 to 2.2.
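
The lift arithmetic can be written out as follows, assuming lift is defined as the cohort-wide default rate (the random acceptance baseline) divided by the default rate among model-approved borrowers. The counts are invented to show the calculation and are not Lendiro results.

```python
def default_rate(outcomes):
    """Share of borrowers with a 90-day default (1 = default, 0 = repaid)."""
    return sum(outcomes) / len(outcomes)

def default_lift(cohort_outcomes, approved_outcomes):
    """Cohort default rate divided by the approved subset's default rate."""
    return default_rate(cohort_outcomes) / default_rate(approved_outcomes)

# Illustrative counts: 200 defaults in a cohort of 1,000 (20%),
# 40 defaults among 400 model-approved borrowers (10%).
cohort = [1] * 200 + [0] * 800
approved = [1] * 40 + [0] * 360

lift = default_lift(cohort, approved)
```

A lift of 2.0 in this toy case means the approved subset defaults at half the rate of the cohort as a whole.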

Performance figures are reported from internal validation. Lendiro does not claim these figures represent guaranteed outcomes for any specific lender or portfolio. Actual performance will vary based on population characteristics, loan product structure, and economic conditions.

Our validation methodology: holdout sets drawn from historical thin-file applicant populations with known 90-day repayment outcomes. No data from integration partners was used in model training — training data is Lendiro-sourced only. Integration partners receive performance reporting on their own portfolios after 90 days of production data.

VALIDATION SETUP

  • Holdout size: 40,000 thin-file applicants
  • Lookback: 24 months of transaction data per applicant
  • Outcome label: 90-day default (binary)
  • Train/validation split: 80/20, stratified
  • Validation period: 18 months of out-of-time data
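
A stratified 80/20 split of the kind listed in the setup can be sketched with the standard library alone. The example stratifies on the binary default label so that both sets keep the same default rate; the seed, the helper name, and the toy label counts are assumptions, and the out-of-time validation period is a separate step not shown here.

```python
import random
from collections import defaultdict

def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Split row indices so each label keeps roughly the same share in the
    training and validation sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, validation = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = round(len(indices) * validation_fraction)
        validation.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(validation)

# Toy labels: 100 defaults and 900 non-defaults (10% default rate).
labels = [1] * 100 + [0] * 900
train_idx, val_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
val_rate = sum(labels[i] for i in val_idx) / len(val_idx)
```

Both sets end up with the same 10% default rate, which keeps validation metrics from being distorted by an unlucky draw of the rare class.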

See the model in action on your data.

Integration partners receive a pilot report showing model performance against their own thin-file decline pool. 30-day pilot available on request.
