Why Credit Decision Latency Matters at Point of Sale

There's a number that most credit teams don't track but should: decision latency P99. The 99th percentile of how long your underwriting call takes. Not the median. The tail.

At BNPL checkout, the difference between a 400ms decision and a 4,000ms decision is measurable in conversion rate. Not because customers consciously notice — most won't — but because mobile checkout flows are built with timeout assumptions, loading spinners, and back-button behavior that all optimize around the expectation of near-instant responses. When your credit API is the latency bottleneck, you're adding friction to the moment your customer has already decided to buy.

This article gets into what sub-second credit decisioning actually requires technically, treated not as a marketing claim but as an engineering problem with specific, identifiable constraints.

Where the Latency Actually Lives

When a credit decision API call takes 3 seconds, the 3 seconds are almost never in the model inference itself. Modern gradient boosting inference on engineered tabular features takes 5–15ms. The latency is in everything that has to happen before and after the model runs.

The primary contributors, roughly in order of typical magnitude:

Bureau pull: 800ms–2,500ms. If your underwriting logic requires a real-time bureau pull on every decision call, you've already lost the sub-second game in most scenarios. Bureau API response times vary by bureau, vary by traffic load, and have non-trivial tail latency. A P99 bureau pull might cost you 2–3 seconds even when median is under 1 second.

Bank data enrichment: 300ms–1,200ms if done synchronously via open banking APIs. This is the latency cost of fetching real-time transaction data at decision time rather than using pre-fetched and pre-featurized data. Real-time open banking calls have their own tail latency characteristics, particularly for aggregators that are themselves calling bank APIs.

Feature computation: 20ms–100ms depending on lookback window and feature set complexity. Computing 24-month cash-flow velocity features from raw transaction data is not trivial — there's sorting, normalization, rolling window calculations, and categorical encoding that all add up. Running this on pre-stored transaction data with good indexing is much faster than computing it from a raw API dump.

Network round-trips: 20ms–80ms per hop, and typical decision pipelines have 3–5 internal service calls. Chained synchronously, that is roughly 60ms–400ms of network time on top of the work each service actually does.
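As a rough illustration, the ranges above can be added up for a pipeline that runs every stage one after another. This is a minimal sketch: the stage names and the assumption that nothing runs in parallel or is served from cache are ours, and the numbers are simply the ranges quoted in this section.

```python
# Back-of-the-envelope latency budget for a fully synchronous decision path.
# Ranges are (low_ms, high_ms) as quoted in this article; the assumption that
# every stage runs sequentially is illustrative, not a measured pipeline.
STAGES_MS = {
    "bureau_pull": (800, 2500),
    "bank_data_enrichment": (300, 1200),
    "feature_computation": (20, 100),
    "model_inference": (5, 15),
}
NETWORK_HOP_MS = (20, 80)  # cost per internal hop
HOPS = (3, 5)              # typical number of internal service calls


def sequential_budget_ms():
    """Return (best_case_ms, worst_case_ms) for the sequential pipeline."""
    best = sum(low for low, _ in STAGES_MS.values()) + NETWORK_HOP_MS[0] * HOPS[0]
    worst = sum(high for _, high in STAGES_MS.values()) + NETWORK_HOP_MS[1] * HOPS[1]
    return best, worst


best, worst = sequential_budget_ms()
print(f"best case: {best} ms, worst case: {worst} ms")
```

Under these assumptions even the best case lands above one second, which is why the rest of this article focuses on taking stages off the decision path rather than shaving milliseconds from each one.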

The Architecture Choices That Determine Whether Sub-Second Is Achievable

Getting to reliable sub-second P95 latency (not just P50) on a cash-flow underwriting call requires making specific architectural choices at design time. Retrofitting them onto an existing pipeline is painful. The choices that matter most:

Pre-featurization vs. Real-Time Feature Computation

The single most impactful architectural decision. If you wait until a credit decision call arrives to pull and compute bank transaction features, you're in a latency hole before you start. The alternative: maintain a feature store with pre-computed cash-flow features for known account holders, updated on a rolling basis (every 24–48 hours is sufficient for most features). Decision time then becomes a feature lookup, not a compute task.

This matters especially for neobanks doing credit decisions on their own deposit customers — you already have the transaction data in your ledger. The feature computation can happen as a batch process at low-traffic hours, not as a synchronous step on the decision path.
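A minimal sketch of the split between the batch path and the decision path might look like the following. The in-memory dictionary stands in for a real low-latency feature store, and the feature names, the `refresh_features` and `lookup_features` helpers, and the 48-hour freshness limit are all illustrative assumptions rather than a description of any particular system.

```python
from datetime import datetime, timedelta, timezone

MAX_FEATURE_AGE = timedelta(hours=48)  # assumed freshness limit

# Hypothetical in-memory stand-in for a feature store:
# account_id -> (computed_at, features)
feature_store = {}


def compute_cash_flow_features(amounts):
    """Toy feature set from signed amounts (inflows positive, outflows negative)."""
    inflows = [a for a in amounts if a > 0]
    outflows = [-a for a in amounts if a < 0]
    return {
        "total_inflow": sum(inflows),
        "total_outflow": sum(outflows),
        "net_cash_flow": sum(amounts),
        "txn_count": len(amounts),
    }


def refresh_features(account_id, amounts, now):
    """Batch path: runs off the decision path, e.g. during low-traffic hours."""
    feature_store[account_id] = (now, compute_cash_flow_features(amounts))


def lookup_features(account_id, now):
    """Decision path: a lookup only. Returns None if features are missing or stale."""
    entry = feature_store.get(account_id)
    if entry is None:
        return None
    computed_at, features = entry
    if now - computed_at > MAX_FEATURE_AGE:
        return None
    return features


batch_time = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
refresh_features("acct-1", [1200.0, -300.0, -150.0], batch_time)
print(lookup_features("acct-1", batch_time + timedelta(hours=6)))
```

What to do when the lookup returns None (fall back to inline computation, route to an asynchronous flow, or decline to decide) is a policy choice the sketch deliberately leaves open.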

Bureau Pull Strategy

For thin-file populations, the bureau pull often returns limited data at high latency cost. Options worth evaluating:

  • Cached bureau data: pull on first application, cache for 30–45 days, serve from cache on subsequent applications or re-applications. Requires careful cache invalidation logic but eliminates the bureau RTT on most calls.
  • Bureau pull deferral: make the initial cash-flow decision without bureau, pull bureau only if the cash-flow model returns REVIEW (neither clear approve nor clear decline). This works when your cash-flow model has sufficient coverage — if 60% of your volume is clear-approve or clear-decline by cash-flow signal alone, you've eliminated the bureau RTT on 60% of calls.
  • Eliminating bureau entirely for specific product tiers: for small-dollar products below certain thresholds, bureau may add latency and cost without adding meaningful decision quality for thin-file populations. This requires explicit policy justification and an ECOA-compliant adverse action process if you use bureau data for any credit decisions.
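The first two options can be combined. The sketch below defers the bureau pull to the REVIEW band and serves repeat pulls from a cache. The score cut-offs, the 30-day TTL, and the `pull_bureau` callable are hypothetical placeholders, and real cache invalidation logic is omitted.

```python
from datetime import datetime, timedelta, timezone

BUREAU_CACHE_TTL = timedelta(days=30)  # low end of the 30 to 45 day range above
APPROVE_AT = 0.80                      # hypothetical cash-flow score cut-offs
DECLINE_AT = 0.30

bureau_cache = {}  # applicant_id -> (pulled_at, report)


def get_bureau_report(applicant_id, pull_bureau, now):
    """Serve from cache while fresh; otherwise pay the bureau round-trip."""
    cached = bureau_cache.get(applicant_id)
    if cached is not None and now - cached[0] <= BUREAU_CACHE_TTL:
        return cached[1]
    report = pull_bureau(applicant_id)
    bureau_cache[applicant_id] = (now, report)
    return report


def decide(applicant_id, cash_flow_score, pull_bureau, now):
    """Return (decision, bureau_report_or_None)."""
    if cash_flow_score >= APPROVE_AT:
        return "APPROVE", None
    if cash_flow_score <= DECLINE_AT:
        return "DECLINE", None
    # REVIEW band: only now is the bureau consulted.
    return "REVIEW", get_bureau_report(applicant_id, pull_bureau, now)


calls = []  # records every real bureau round-trip


def fake_pull_bureau(applicant_id):
    calls.append(applicant_id)
    return {"applicant_id": applicant_id, "tradelines": 0}


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(decide("a1", 0.91, fake_pull_bureau, now))                       # no bureau call
print(decide("a2", 0.55, fake_pull_bureau, now))                       # bureau pulled
print(decide("a2", 0.55, fake_pull_bureau, now + timedelta(days=10)))  # cache hit
print("bureau round-trips:", len(calls))
```

How the bureau report then resolves a REVIEW into a final approve or decline is left out here, since that is underwriting policy rather than latency design.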

Synchronous vs. Asynchronous Decisioning

Not all credit products require synchronous real-time decisions. BNPL at checkout does. A personal loan application that takes 5 minutes to complete probably doesn't need the decision in under 500ms — the UX expectation is different. Matching your architecture to the actual UX requirement rather than defaulting to synchronous everywhere is worth doing explicitly.

Where synchronous sub-second decisions are genuinely required (BNPL, embedded credit at checkout, credit line increases triggered by real-time deposit events), the architecture cost of achieving that target is substantial. It requires the feature store, caching strategy, and network topology choices described above. Where they're not actually required, investing in synchronous latency optimization is solving the wrong problem.

Testing Latency: The Metrics That Matter

If your team is evaluating a credit decisioning API and the vendor claims sub-second response times, here's the test matrix that surfaces real performance:

P50 / P95 / P99 breakdown. A vendor who only quotes P50 latency is hiding tail performance. P99 is the relevant number for conversion impact — it tells you what happens to the 1-in-100 customer who gets the slow path.
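To make the point concrete, here is a small sketch that computes the three percentiles from raw latency samples using the nearest-rank method. The sample data is invented for illustration, reusing the 400ms and 4,000ms figures from earlier in the article.

```python
def percentile(samples_ms, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]


# 98 fast calls and 2 slow ones: the median and P95 hide the tail entirely.
latencies_ms = [400] * 98 + [4000] * 2
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With this sample, P50 and P95 both report 400 ms while P99 reports 4000 ms, which is exactly the gap a median-only quote conceals.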

Latency under load. What does P99 look like at 5x your typical decision volume? Underwriting APIs that perform well at 100 decisions/minute may degrade significantly at 500 decisions/minute if they're not built for horizontal scaling. Run load tests before committing to a production integration.

Cold-start vs. warm performance. Some architectures have significant cold-start latency (spinning up compute resources) that hits the first few calls after a quiet period. This is a BNPL-relevant concern: traffic patterns in checkout flows are often bursty, with heavy volume during evening hours and near-zero volume late at night. A cold-start event right when Friday evening checkout volume spikes is a real performance risk.
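One simple way to surface this in test results is to split recorded calls by how long the API had been idle beforehand. The sketch below assumes a log of (timestamp, latency) pairs; the 10-minute idle threshold and the sample values are hypothetical.

```python
IDLE_GAP_S = 600  # hypothetical: calls after 10+ idle minutes count as "cold"


def split_cold_warm(calls, idle_gap_s=IDLE_GAP_S):
    """Split calls into cold and warm latencies.

    calls: list of (timestamp_s, latency_ms) tuples sorted by timestamp.
    The first call is treated as cold because nothing preceded it.
    """
    cold, warm = [], []
    previous_ts = None
    for ts, latency_ms in calls:
        if previous_ts is None or ts - previous_ts >= idle_gap_s:
            cold.append(latency_ms)
        else:
            warm.append(latency_ms)
        previous_ts = ts
    return cold, warm


calls = [(0, 1900), (2, 420), (5, 390), (3600, 2100), (3601, 410)]
cold, warm = split_cold_warm(calls)
print("cold:", cold)
print("warm:", warm)
```

Reporting percentiles for the two groups separately keeps a handful of cold calls from being averaged away by a large warm sample.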

Geographic latency. Where are your users, and where are the decision API's compute resources? A Dallas-based lender with a decision API hosted in us-east-1 has different RTT characteristics than a California-based lender calling the same endpoint. For truly latency-sensitive applications, regional compute placement matters.

What Lendiro's Architecture Does Differently

We made the pre-featurization choice from day one. Cash-flow features for accounts connected via open banking are computed on a 24-hour rolling update cycle, stored in a low-latency feature store, and served as lookups at decision time rather than computed inline.

The result is that our median decision call — from API request received to JSON response returned — is under 900ms, with P95 under 1.4 seconds. Those numbers include network time from typical US endpoints. For customers who pre-stage applicant data before initiating the decision call, median drops to under 500ms.

We're not claiming those numbers are absolute: real-world performance depends on integration architecture, network topology, and data freshness posture. But the point is that a sub-second median is achievable at scale if the architecture is designed for it from the start. It's not achievable by optimizing a synchronous feature-compute pipeline after the fact.

The Business Impact of Getting Latency Right

Conversion rate impact from decision latency is difficult to isolate in production environments — too many confounding variables. But the directional evidence is consistent: checkout abandonment rates increase measurably when page loads or API calls exceed 2–3 seconds, and credit decision latency that sits on the critical path to checkout completion follows the same dynamics.

We're not saying that every 100ms of latency reduction directly translates to a specific conversion uplift. The relationship is more complex and product-specific. What we are saying is that for BNPL and embedded lending at checkout, decision latency is a product quality metric — not just an infrastructure metric — and treating it as such means instrumenting, measuring, and improving it the same way you'd treat any other customer-facing performance indicator.

The lenders with the strongest checkout conversion metrics know this. They instrument decision latency separately from page load time. They set SLA thresholds. They have runbooks for when P99 spikes. The ones who don't have those practices are usually the ones who discover the latency problem only after they see checkout abandonment rates that no one can explain.
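As a sketch of what that instrumentation can look like at its simplest, the class below keeps a rolling window of recent decision latencies and flags when P99 crosses an SLA threshold. The class name, the 1,500ms threshold, and the window size are illustrative assumptions; a production setup would more likely lean on an existing metrics system.

```python
from collections import deque


class LatencySlaMonitor:
    """Keeps the most recent N decision latencies and checks P99 against an SLA."""

    def __init__(self, sla_p99_ms, window_size=1000):
        self.sla_p99_ms = sla_p99_ms
        self.samples = deque(maxlen=window_size)  # oldest samples fall off

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p99(self):
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        rank = (99 * len(ordered) + 99) // 100  # nearest-rank, ceil(0.99 * n)
        return ordered[rank - 1]

    def breached(self):
        current = self.p99()
        return current is not None and current > self.sla_p99_ms


monitor = LatencySlaMonitor(sla_p99_ms=1500, window_size=200)
for _ in range(198):
    monitor.record(450)
print(monitor.p99(), monitor.breached())

for _ in range(3):
    monitor.record(3200)  # a burst of slow decisions
print(monitor.p99(), monitor.breached())
```

A breach is the kind of signal that would trigger the P99 runbook, separately from any page load alerting.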