Most cash-flow underwriting models treat transaction data as a flat list. Monthly inflows. Monthly outflows. Net balance. Recurring payment rate. Aggregate statistics computed over a lookback window. These features are meaningful — they explain a good portion of the variance in repayment behavior — but they discard something that the raw data contains: the structure of who the borrower transacts with, how those relationships are distributed, and what that distribution implies about financial stability and social context.
Transaction graph features are the representation of that structure. The idea is to model a bank account's transaction history not as a time series of amounts, but as a network: the account holder is a node, each payee and payer is a node, and each payer-to-payee relationship is a directed edge weighted by the frequency and total amount of the transactions along it. From that network, you can compute centrality measures, diversity metrics, and temporal evolution signals that flat aggregates can't capture.
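A minimal sketch of the edge construction, assuming transactions have already been mapped to normalized counterparty IDs. The `Txn` type, its field names, the `ACCOUNT_HOLDER` node ID, and `build_transaction_graph` are illustrative, not taken from a particular library. Each distinct payer-to-holder or holder-to-payee pair becomes one edge carrying a transaction count and a total amount; a graph library could hold the same data, but plain dictionaries are enough for the features discussed here.

```python
from collections import defaultdict
from dataclasses import dataclass

HOLDER = "ACCOUNT_HOLDER"  # hypothetical node ID for the account being underwritten

@dataclass(frozen=True)
class Txn:
    counterparty: str  # normalized counterparty fingerprint (assumed to exist upstream)
    amount: float      # signed: >= 0 is an inflow to the holder, < 0 an outflow

def build_transaction_graph(txns):
    """Collapse a transaction list into directed, weighted edges.

    Returns {(source, target): {"count": n, "amount": total_abs_amount}}.
    Inflows become payer -> holder edges; outflows become holder -> payee edges.
    """
    edges = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for t in txns:
        if t.amount >= 0:
            key = (t.counterparty, HOLDER)
        else:
            key = (HOLDER, t.counterparty)
        edges[key]["count"] += 1
        edges[key]["amount"] += abs(t.amount)
    return dict(edges)

txns = [
    Txn("EMPLOYER_A", 1600.0),
    Txn("EMPLOYER_A", 1600.0),
    Txn("LANDLORD", -1400.0),
    Txn("GROCER_1", -85.0),
    Txn("GROCER_1", -62.5),
]
graph = build_transaction_graph(txns)
print(graph[("EMPLOYER_A", HOLDER)])  # {'count': 2, 'amount': 3200.0}
```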
This article describes how transaction graph features are constructed, which graph properties are predictive for credit risk, and what the engineering requirements look like at inference time.
Why Graph Structure Carries Credit-Relevant Information
The premise needs some justification, because the leap from "network structure of transactions" to "credit risk signal" isn't obvious.
Consider two borrowers with identical monthly cash flows: $3,200 in inflows, $2,900 in outflows, and an average net of $300 per month. Borrower A receives income from a single direct deposit source, pays rent to one landlord, buys groceries at 2–3 merchants, pays utilities, and otherwise has minimal transaction activity: 18–22 unique counterparties over 12 months. Borrower B has the same net cash flow but 140 unique counterparties in the same period: multiple payment apps, frequent peer transfers in both directions, 8–10 different income sources, and transactions that spike in size and frequency around certain dates and then go quiet.
To flat aggregate features, these two borrowers look identical. Graph features see a stable, low-variance financial network versus a high-velocity, high-uncertainty network. In our validation work, transaction network concentration, measured as the Herfindahl-Hirschman Index applied to transaction counterparty volume, carries predictive signal for repayment performance that is distinct from (and additive to) cash-flow aggregate features. High concentration (few counterparties receiving most of the transaction volume) correlates with stability. High dispersion combined with high peer-to-peer transfer volume correlates with elevated default risk in thin-file populations.
We're not saying this is a universal law — the relationship is probabilistic and segment-dependent. For gig workers who legitimately have multiple income sources, counterparty diversity is expected and not a risk indicator. Feature engineering needs to distinguish between income-side diversity (which is neutral or positive) and expenditure-side and peer-transfer diversity (which carries the risk signal).
The Core Graph Features
Counterparty Concentration (Expenditure Side)
Compute the Herfindahl-Hirschman Index on outflow transaction volume by payee fingerprint over the 12-month lookback: the sum of each payee's squared share of total outflows. HHI ranges from 1/N when outflows are split evenly across N payees (approaching 0 as N grows) to 1.0 when all outflows go to a single payee. A borrower with 70% of outflows going to 3–4 stable counterparties (rent, utilities, groceries) has a high HHI. A borrower with outflows spread across 80 payees in roughly equal distribution has a low HHI.
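A minimal sketch of the computation, assuming outflows have already been summed per payee fingerprint over the lookback. The function name and the example amounts are illustrative. The first example puts roughly 70% of outflows on rent and groceries; the second spreads the same total evenly across 80 payees, which gives exactly 1/80.

```python
def outflow_hhi(outflow_by_payee):
    """Herfindahl-Hirschman Index: sum of squared outflow shares per payee.

    outflow_by_payee: payee fingerprint -> total outflow amount (positive)
    over the lookback window. Returns None when there are no outflows.
    """
    total = sum(outflow_by_payee.values())
    if total <= 0:
        return None
    return sum((v / total) ** 2 for v in outflow_by_payee.values())

concentrated = {"LANDLORD": 16800.0, "GROCER": 6000.0, "PHARMACY": 4800.0, "UTILITY": 2400.0}
dispersed = {f"PAYEE_{i}": 375.0 for i in range(80)}

print(round(outflow_hhi(concentrated), 3))  # 0.386
print(round(outflow_hhi(dispersed), 4))     # 0.0125
```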
For most lending products, higher outflow HHI is a positive signal, since it indicates predictable, stable expenditure patterns. The exception: a borrower whose single largest outflow counterparty is a peer payment app, not a service provider, may be routing money to cover informal obligations that aren't visible as direct expenditures.
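The exception can be checked directly once the counterparty register distinguishes peer-payment rails from service providers. A sketch, assuming the normalization layer assigns fingerprints such as `VENMO` or `ZELLE` to peer-app outflows whose end recipient it could not resolve (that fingerprint set and the function name are hypothetical). A payment already resolved to a known provider, such as a landlord paid over Zelle, would carry the provider's fingerprint instead and would not trip the flag.

```python
# Hypothetical fingerprints that the counterparty register assigns to
# peer-to-peer payment rails whose end recipient could not be resolved.
PEER_APP_FINGERPRINTS = frozenset({"ZELLE", "CASH_APP", "VENMO", "PAYPAL"})

def dominant_outflow_is_peer_app(outflow_by_payee, peer_apps=PEER_APP_FINGERPRINTS):
    """True if the single largest outflow counterparty is a peer payment app."""
    if not outflow_by_payee:
        return False
    top_payee = max(outflow_by_payee, key=outflow_by_payee.get)
    return top_payee in peer_apps

print(dominant_outflow_is_peer_app({"VENMO": 9000.0, "LANDLORD": 8400.0}))    # True
print(dominant_outflow_is_peer_app({"LANDLORD": 16800.0, "VENMO": 1200.0}))  # False
```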
Peer Transfer Ratio
What fraction of total transaction volume (by count and by amount) is peer-to-peer transfers — Zelle, Cash App, Venmo, PayPal — versus service or merchant transactions? Elevated peer transfer ratios, particularly when those transfers show irregular amounts and irregular frequency, are associated with informal financial obligations and liquidity management behaviors that carry elevated credit risk.
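Both versions of the ratio, by count and by amount, come out of a single pass over the transactions. A sketch, assuming each transaction already carries a channel label from upstream categorization; the `"p2p"` label, the tuple layout, and the function name are hypothetical.

```python
from typing import Iterable, Tuple

def peer_transfer_ratio(txns: Iterable[Tuple[str, float]]) -> Tuple[float, float]:
    """Return (ratio_by_count, ratio_by_amount) for peer-to-peer transfers.

    txns: (channel, signed_amount) pairs; channel is a label assigned
    upstream, e.g. "p2p", "merchant", or "ach". Both directions count.
    """
    n_total = n_p2p = 0
    amt_total = amt_p2p = 0.0
    for channel, amount in txns:
        n_total += 1
        amt_total += abs(amount)
        if channel == "p2p":
            n_p2p += 1
            amt_p2p += abs(amount)
    if n_total == 0:
        return 0.0, 0.0
    by_amount = amt_p2p / amt_total if amt_total > 0 else 0.0
    return n_p2p / n_total, by_amount

sample = [("p2p", -50.0), ("merchant", -150.0), ("p2p", 200.0), ("ach", -600.0)]
print(peer_transfer_ratio(sample))  # (0.5, 0.25)
```

Reporting both ratios matters because they diverge: many small bill-splitting transfers push the count ratio up while barely moving the amount ratio.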
This feature requires careful calibration. Peer transfers also include legitimate use cases — splitting restaurant bills, paying back friends, informal childcare payments — that carry no credit risk signal. The risk-relevant variant of this feature is high-frequency, high-dollar peer transfers with irregular timing, especially when they coincide with low-balance periods (suggesting the transfers are reactive to cash pressure rather than routine).
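Two components of that risk-relevant variant can be sketched as numbers: the coefficient of variation of the gaps between outgoing peer transfers (irregular timing) and the share of those transfers made on low-balance days (reactive to cash pressure). Everything here is an illustrative assumption: the input shapes, the function name, the three-transfer minimum, and leaving the low-balance threshold to the caller. Frequency and dollar-size filters would be applied separately.

```python
import statistics

def p2p_pressure_signals(p2p_days, low_balance_days):
    """Hypothetical timing and cash-pressure signals for outgoing peer transfers.

    p2p_days: day indices of outgoing peer transfers in the lookback.
    low_balance_days: set of day indices whose end-of-day balance fell below
        a low-balance threshold chosen upstream.
    Returns (timing_cv, low_balance_share), or (None, None) if there are
    fewer than three transfers to judge regularity from.
    """
    days = sorted(p2p_days)
    if len(days) < 3:
        return None, None
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    mean_gap = statistics.mean(gaps)
    timing_cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    low_share = sum(d in low_balance_days for d in days) / len(days)
    return timing_cv, low_share

# Routine: every 14 days, one transfer landing on a low-balance day.
print(p2p_pressure_signals([1, 15, 29, 43], {28, 29}))  # (0.0, 0.25)
# Bursty: clustered transfers, most of them on low-balance days.
bursty_cv, bursty_low = p2p_pressure_signals([3, 4, 20, 21, 60], {3, 4, 20, 21})
```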
Income Source Entropy
Shannon entropy computed over inflow transaction volume by source, over the 12-month lookback. High entropy = many income sources with roughly equal contribution. Low entropy = dominated by one or two income sources. For W-2 employees, income entropy is low (one payroll source). For gig workers, income entropy can be high — but the temporal pattern of inflows (regular, predictable intervals even if from multiple sources) distinguishes stable gig income from erratic multi-source cash flow.
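Written out, the feature is H = −Σ pᵢ log₂ pᵢ, where pᵢ is source i's share of inflow volume. A minimal sketch (the function name is illustrative); with base-2 logs the result is in bits, so a single payroll source scores 0 and four equally weighted sources score 2.

```python
import math

def income_source_entropy(inflow_by_source):
    """Shannon entropy, in bits, of inflow volume shares by income source."""
    total = sum(inflow_by_source.values())
    if total <= 0:
        return 0.0
    shares = (v / total for v in inflow_by_source.values() if v > 0)
    return sum(-p * math.log2(p) for p in shares)

w2 = {"EMPLOYER_PAYROLL": 38400.0}
gig = {"PLATFORM_A": 9600.0, "PLATFORM_B": 9600.0, "PLATFORM_C": 9600.0, "PLATFORM_D": 9600.0}
print(income_source_entropy(w2))   # 0.0
print(income_source_entropy(gig))  # 2.0
```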
Income source entropy is most useful when combined with income regularity: a borrower with high entropy but high inflow regularity (multiple sources, each depositing predictably) is fundamentally different from high entropy with low regularity.
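One way to measure the regularity side, sketched under stated assumptions: score each source by how evenly spaced its deposits are, as 1 / (1 + coefficient of variation of the gaps between deposits), then weight by each source's share of inflow volume. The scoring formula, the three-deposit minimum, and the function name are illustrative choices, not a standard definition. In a gradient boosting model, entropy and regularity would typically be supplied as separate features, leaving the tree splits to learn their interaction.

```python
import statistics

def inflow_regularity(deposit_days_by_source, amount_by_source):
    """Volume-weighted inflow regularity in [0, 1]; 1.0 = perfectly even spacing.

    deposit_days_by_source: source -> day indices of its deposits.
    amount_by_source: source -> total inflow amount over the lookback.
    """
    total = sum(amount_by_source.values())
    if total <= 0:
        return 0.0
    score = 0.0
    for source, days in deposit_days_by_source.items():
        days = sorted(days)
        if len(days) < 3:
            regularity = 0.0  # too few deposits to establish a pattern
        else:
            gaps = [later - earlier for earlier, later in zip(days, days[1:])]
            mean_gap = statistics.mean(gaps)
            cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else float("inf")
            regularity = 1.0 / (1.0 + cv)
        score += regularity * amount_by_source.get(source, 0.0) / total
    return score

amounts = {"PLATFORM_A": 20000.0, "PLATFORM_B": 10000.0}
# Multi-source income, each source paying on a fixed schedule.
steady = inflow_regularity(
    {"PLATFORM_A": list(range(0, 364, 7)), "PLATFORM_B": list(range(3, 364, 14))}, amounts
)
# Same sources and volumes (same entropy), paying sporadically.
erratic = inflow_regularity(
    {"PLATFORM_A": [2, 5, 40, 41, 120, 300], "PLATFORM_B": [10, 11, 200]}, amounts
)
print(round(steady, 3))  # 1.0
```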
Network Temporal Stability
How consistent is the transaction network structure over time? Compute the cosine similarity between the counterparty distribution in the first half of the lookback window and the distribution in the second half. High similarity means the borrower has been transacting with the same counterparties in roughly the same proportions, a signal of a stable financial life. Low similarity means the network has reorganized substantially, which can indicate relocation (new landlord, new utility providers), a major life change, or financial instability that prompted a change in spending patterns.
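A minimal sketch, assuming per-counterparty volumes have already been aggregated for each half of the window (function name illustrative). Counterparties seen in only one half contribute zero in the other, so a full relocation, where the old landlord disappears and a new one appears, pulls the score toward 0. Cosine similarity is scale-invariant, so it compares proportions rather than totals: a borrower whose spending doubled across the same counterparties in the same mix still scores 1.0.

```python
import math

def network_stability(first_half, second_half):
    """Cosine similarity between counterparty volume vectors of two windows.

    first_half, second_half: counterparty -> transaction volume (non-negative).
    Returns a value in [0, 1]; 0.0 if either window is empty.
    """
    keys = set(first_half) | set(second_half)
    dot = sum(first_half.get(k, 0.0) * second_half.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in first_half.values()))
    norm_b = math.sqrt(sum(v * v for v in second_half.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

same_mix = network_stability({"LANDLORD": 8400.0, "GROCER": 3000.0},
                             {"LANDLORD": 16800.0, "GROCER": 6000.0})
moved = network_stability({"LANDLORD_OLD": 8400.0}, {"LANDLORD_NEW": 9000.0})
print(round(same_mix, 6))  # 1.0
print(moved)               # 0.0
```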
Low network temporal stability combined with an overall decline in HHI (the network becoming more dispersed over time) is a particularly strong signal of deteriorating financial stability: the financial network is fragmenting in a way that often precedes delinquency by 2–4 months in our validation data.
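A sketch of how the two signals might be combined into a single flag, assuming the stability score and the outflow HHI for each half of the window have already been computed. The 0.5 similarity ceiling and the 0.05 HHI-drop floor are placeholder thresholds, not validated values. In a tree-based model, the same effect can come from passing the stability score and the HHI change as separate features.

```python
def fragmentation_flag(stability, hhi_first_half, hhi_second_half,
                       max_stability=0.5, min_hhi_drop=0.05):
    """Flag a network that both reorganized and became more dispersed.

    stability: cosine similarity between first- and second-half counterparty
        distributions (low = the network reorganized).
    hhi_first_half, hhi_second_half: outflow HHI for each half of the lookback.
    Thresholds are hypothetical placeholders.
    """
    reorganized = stability < max_stability
    dispersing = (hhi_first_half - hhi_second_half) >= min_hhi_drop
    return reorganized and dispersing

print(fragmentation_flag(0.35, 0.38, 0.14))  # True: reorganized and dispersing
print(fragmentation_flag(0.35, 0.20, 0.31))  # False: reorganized but concentrating
print(fragmentation_flag(0.92, 0.38, 0.36))  # False: network largely unchanged
```

The second case is why the HHI direction matters: a relocation also lowers similarity, but a borrower who settles into a new landlord and new utilities usually ends up more concentrated, not less.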
Engineering Requirements
Graph features are computationally more expensive than simple aggregates, but the cost is manageable if the computation happens at feature-store refresh time rather than on the decision call path.
The data model requires building a counterparty register for each account: a deduplicated, normalized index of all payees and payers, with transaction volume and frequency tracked over the lookback window. The normalization step is non-trivial. The same landlord might appear as "PROPERTY MGMT LLC," "ZELLE LANDLORD DAVID," and "ACH 041000124 RENT" across 12 months if the payment method changes. Fuzzy merchant name matching combined with amount-band and recurrence-pattern matching is necessary for the register to be accurate enough to compute meaningful concentration metrics.
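A sketch of one pairwise merge rule, assuming each recurring payment stream has already been summarized by a raw descriptor, a median amount, and a median interval between payments. Python's standard-library `difflib.SequenceMatcher` stands in for whatever fuzzy matcher a production system uses; the helper names and all thresholds are hypothetical. The rule always requires the amounts to fall in the same band, then accepts either a similar cleaned name or a matching cadence, the second branch covering payment-method changes where the descriptor changes entirely. It would wrongly merge two same-priced subscriptions on the same cadence, so a real register needs further evidence, such as one stream stopping when the other starts.

```python
import re
from difflib import SequenceMatcher

def clean_descriptor(raw):
    """Uppercase, replace digits/punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z ]", " ", raw.upper()).split())

def same_counterparty(a, b, name_threshold=0.6, amount_tolerance=0.05,
                      cadence_tolerance_days=3.0):
    """Hypothetical rule for merging two recurring payment streams.

    a, b: dicts with "descriptor" (raw text), "median_amount" (positive float),
        and "median_interval_days" (float) summarizing each stream.
    """
    amt_a, amt_b = a["median_amount"], b["median_amount"]
    amount_close = abs(amt_a - amt_b) <= amount_tolerance * max(amt_a, amt_b)
    name_sim = SequenceMatcher(None, clean_descriptor(a["descriptor"]),
                               clean_descriptor(b["descriptor"])).ratio()
    cadence_close = (abs(a["median_interval_days"] - b["median_interval_days"])
                     <= cadence_tolerance_days)
    return amount_close and (name_sim >= name_threshold or cadence_close)

rent_billpay = {"descriptor": "PROPERTY MGMT LLC", "median_amount": 1400.0, "median_interval_days": 30.4}
rent_ach = {"descriptor": "ACH 041000124 RENT", "median_amount": 1400.0, "median_interval_days": 30.5}
groceries = {"descriptor": "CORNER GROCERY #12", "median_amount": 85.0, "median_interval_days": 6.0}

print(same_counterparty(rent_billpay, rent_ach))   # True (amount band + cadence)
print(same_counterparty(rent_billpay, groceries))  # False (amounts far apart)
```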
Once the counterparty register is maintained, the graph features themselves are computationally cheap — HHI computation on a 30-node graph takes microseconds. The engineering cost is almost entirely in the normalization and deduplication layer.
Storage requirements: a fully materialized 24-month transaction graph with per-counterparty aggregates for a single account runs to roughly 2–8 KB, depending on transaction volume. At scale (100K+ active accounts), this is manageable in a standard key-value store with appropriate indexing. P95 retrieval latency for a pre-materialized feature set is under 20 ms.
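A quick check on what the per-account estimate implies at the stated scale (decimal units; index and replication overhead excluded):

```latex
\[
10^{5}\ \text{accounts} \times (2\ \text{to}\ 8)\ \text{KB}
  = (2\ \text{to}\ 8) \times 10^{5}\ \text{KB}
  \approx 200\ \text{to}\ 800\ \text{MB}
\]
```

Even at one million accounts the upper bound is about 8 GB, which is why the storage side is rarely the constraint.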
Model Integration and Feature Importance
In our gradient boosting model, transaction graph features don't dominate feature importance; cash-flow velocity and recurring payment consistency features typically rank higher. But graph features add consistent lift in Gini coefficient on the thin-file segment specifically. The mechanism is that graph features capture a dimension of financial behavior that time-series aggregates miss, the structure of economic relationships, and for thin-file borrowers, where every signal dimension matters, that additional orthogonal source of predictive information moves the needle.
The practical effect in our validation: adding transaction graph features to a cash-flow aggregate baseline increases Gini on thin-file populations by 3–7 percentage points (depending on segment and lookback window). For a risk model that may already have a Gini in the 40–50% range on thin-file populations — substantially below the 55–65% Gini that bureau models achieve on prime populations — a 5-point improvement is meaningful.
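For readers who think in AUC: in credit scoring the Gini coefficient is conventionally 2 × AUC − 1, so a Gini of 40–50% corresponds to an AUC of 0.70–0.75, and a 5-point lift from 45% to 50% moves AUC from 0.725 to 0.75. A small dependency-free sketch of that computation (function names illustrative; the pairwise AUC is O(n²), fine for illustration but not for scoring large validation sets):

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability; ties count one half.

    scores: model risk scores (higher = riskier).
    labels: 1 = defaulted (bad), 0 = repaid (good).
    """
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good outcome")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bads) * len(goods))

def gini(scores, labels):
    """Gini coefficient as used in credit scoring: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0

scores = [0.91, 0.72, 0.65, 0.40, 0.33, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(round(gini(scores, labels), 4))  # 0.7778
```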
We're not claiming transaction graph features are a complete solution for thin-file underwriting. They're one component of a feature architecture that also includes cash-flow velocity, income stability, and recurring payment consistency. The value of graph features is specifically in the structural information they capture that aggregate features don't — and that value is most pronounced in the thin-file segment where every additional predictive dimension is worth incorporating.
Interpretability and Adverse Action
Graph features present a particular adverse action challenge: it's difficult to explain "your transaction network has a low Herfindahl-Hirschman Index" in consumer-facing language. The mapping from technical feature to adverse action reason code requires extra care.
For graph-derived decline contributions, the adverse action reason statement needs to translate the underlying signal without requiring the applicant to understand network theory. "Your bank account shows irregular and varied payment activity that was not consistent with our underwriting standards" captures the graph dispersion signal in interpretable terms. "Frequent transfers to payment apps were not consistent with the financial stability our underwriting requires" captures the peer transfer ratio signal.
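One way to keep that mapping deliberate is an explicit, reviewed table from feature to reason statement, applied to per-decision contributions (from SHAP values or a similar attribution method). A sketch with hypothetical feature names, reusing the two statements above; collapsing features that share a statement avoids giving the applicant the same reason twice. In practice the table would be owned and reviewed by compliance rather than hard-coded.

```python
# Hypothetical feature names mapped to consumer-facing reason statements.
DISPERSION_REASON = ("Your bank account shows irregular and varied payment activity "
                     "that was not consistent with our underwriting standards.")
PEER_TRANSFER_REASON = ("Frequent transfers to payment apps were not consistent "
                        "with the financial stability our underwriting requires.")

REASON_STATEMENTS = {
    "outflow_hhi": DISPERSION_REASON,
    "network_stability": DISPERSION_REASON,
    "peer_transfer_ratio": PEER_TRANSFER_REASON,
}

def graph_reason_statements(contributions, max_reasons=2):
    """Map the largest decline-side graph contributions to reason statements.

    contributions: feature name -> contribution toward decline for this
        decision (positive = pushed toward decline).
    Features without a mapped statement are ignored here; they would be
    handled by the non-graph part of the reason-code table.
    """
    ranked = sorted(
        (f for f, c in contributions.items() if c > 0 and f in REASON_STATEMENTS),
        key=lambda f: contributions[f],
        reverse=True,
    )
    statements = []
    for feature in ranked:
        text = REASON_STATEMENTS[feature]
        if text not in statements:
            statements.append(text)
        if len(statements) == max_reasons:
            break
    return statements

decision = {"outflow_hhi": 0.4, "network_stability": 0.3,
            "peer_transfer_ratio": 0.2, "cash_velocity": 0.5}
for line in graph_reason_statements(decision):
    print(line)
```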
These translations are approximate — they're reason-code proxies that communicate the decision rationale in actionable terms, not precise technical descriptions of the feature. That level of approximation is inherent to adverse action in any complex model, and it's explicitly acknowledged in the CFPB's guidance on algorithmic adverse action explanation. The requirement is that the reason statement be accurate (the feature described did contribute to the decline) and actionable (the applicant understands what behavior change might lead to a different outcome). Graph-feature reason codes can meet that bar with deliberate mapping work.