Payments infrastructure · 6 weeks · Q4 2025

Cut p99 settlement latency from 480ms to 145ms in six weeks

A regional payments processor was missing SLA windows during European batch settlement runs. JVM tuning and an observability overhaul brought p99 latency under the contractual threshold without any changes to the business logic.

70%
reduction in p99 settlement latency

A US-based regional payments processor handling $14B in annual transaction volume contacted us in September 2025 with a contractual SLA problem. Their European settlement window, which runs daily between 04:00 and 06:00 UTC, was missing the 200ms p99 threshold roughly 40% of the time. Each miss triggered a penalty clause and a manual reconciliation process that the operations team was tired of running.
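For readers less familiar with the metric, a p99 check against a threshold can be sketched as follows. This is a minimal illustration using the nearest-rank percentile definition; the class and method names are hypothetical, and the contract's exact percentile method is not stated here, so treat that choice as an assumption.

```java
import java.util.Arrays;

// Hypothetical helper: names and the nearest-rank definition are assumptions,
// not taken from the client's monitoring stack.
final class SettlementSla {

    /** Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it. */
    static long percentile(long[] latenciesMs, double pct) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no latency samples");
        }
        long[] sorted = latenciesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];  // ranks are 1-based
    }

    /** True when the window's p99 is at or below the threshold (200 ms in this engagement). */
    static boolean meetsSla(long[] latenciesMs, long thresholdMs) {
        return percentile(latenciesMs, 99.0) <= thresholdMs;
    }
}
```

With 100 samples, two slow calls are enough to push the nearest-rank p99 over the line, which is why tail behavior such as GC pauses matters more here than average latency.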

The codebase was a 580,000-line Spring Boot 2.7 application running on OpenJDK 17 with G1GC. The team had inherited its JVM flag configuration from a 2017 deployment guide and had not revisited it since. Initial JFR profiling under production load surfaced three findings: GC pauses were consuming 35-40% of the latency budget on the worst calls; the allocation rate reached 4.2 GB/s during peak windows, much higher than the working set warranted; and a single Hashtable lookup in the legacy fraud-detection path was serializing every settlement thread on one lock, because every Hashtable method is synchronized.

We did not change any business logic. Week one went to JFR baseline profiling and allocation analysis with async-profiler. In week two we replaced the synchronized Hashtable with a ConcurrentHashMap and added AppCDS to the build. In week three we switched from G1GC to ZGC, capping the heap at 16 GB with a 12 GB soft limit (-XX:+UseZGC -Xmx16g -XX:SoftMaxHeapSize=12g). In week four we ran shadow traffic against both configurations for 7 days. Weeks five and six went to rebuilding the observability dashboards and writing the runbook.
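The week-two swap can be pictured with a minimal sketch like the one below. It is not the client's code: the class name, the merchant-ID key, and the integer risk score are hypothetical stand-ins for whatever the fraud-detection path actually caches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache on a fraud-detection lookup path; all names are illustrative.
final class FraudRuleCache {

    // Before (illustrative): new Hashtable<>(), where every get() and put()
    // takes the table's single monitor.
    // After: ConcurrentHashMap, whose get() does not block other readers.
    private final Map<String, Integer> riskScores = new ConcurrentHashMap<>();

    // Loads a score on a cache miss; supplied by the caller.
    private final Function<String, Integer> loader;

    FraudRuleCache(Function<String, Integer> loader) {
        this.loader = loader;
    }

    /** Returns the cached score, loading it atomically the first time a key is seen. */
    int riskScoreFor(String merchantId) {
        return riskScores.computeIfAbsent(merchantId, loader);
    }
}
```

Like Hashtable, ConcurrentHashMap rejects null keys and values, so the swap is usually mechanical. The places that need a second look are any compound check-then-act sequences that relied on locking the whole table; those are better expressed with atomic methods such as computeIfAbsent.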

The cutover went live on November 18. Over the following month, p99 settlement latency averaged 145ms, and no settlement window missed the 200ms threshold. The operations team has not run a manual reconciliation since. We closed the engagement by handing over a written runbook and agreeing on a quarterly check-in cadence.

480ms → 145ms
p99 latency
0
business logic changes
42 days
from kickoff to cutover
6 engineers
theirs · 2 engineers ours