AI-Powered Fraud Detection for Mid-Market Commercial Bank
Moving From Rules-Based Alerts to Behavioral Analytics That Catch What Static Thresholds Miss
The Client
Mid-Market Commercial Bank
This mid-market commercial bank serves approximately 1.6 million retail and business customers through a network of 280 branches. The bank processes roughly 9 million transactions daily across debit, credit, ACH, wire, and mobile payment channels. Their fraud operations team of 34 analysts had been losing ground to increasingly sophisticated fraud schemes, armed only with a rules-based detection system implemented in 2016.
The Challenge
The Problem
The legacy fraud detection system operated on approximately 1,200 static rules. If a transaction exceeded a dollar threshold, triggered a geographic anomaly, or matched a known fraud pattern, it generated an alert. The problem was twofold.
First, the rules were blunt instruments. A customer who traveled frequently for business triggered geographic alerts constantly, while a fraudster who knew the thresholds could structure transactions to stay just below detection limits. Second, at peak, the system generated 14,000 alerts per day. With 34 analysts working across two shifts, each analyst was expected to review roughly 200 alerts per shift. Analysts developed shortcuts, such as bulk-dismissing entire alert categories, that inevitably allowed real fraud through.
The bank needed a system that could learn transaction patterns at the individual customer level, score transactions in real time (before authorization), and be explainable for regulatory audits.
Our Approach
4 Phases. 22 weeks.
Built an ensemble AI model combining XGBoost, LSTM, and Graph Neural Networks with explainable AI (SHAP) for regulatory compliance. Deployed on real-time streaming infrastructure scoring transactions in 12ms — before authorization.
Data Assessment & Feature Engineering
4 weeks
Ingested 26 months of historical transaction records (5.8 billion rows), customer profiles, device fingerprints, and case management records. Constructed over 340 behavioral features across six categories: velocity, amount deviation, geographic, temporal, merchant, and channel features.
The most predictive fraud signals were not individual transaction attributes but changes in behavioral sequences — a pattern that rules-based systems fundamentally cannot detect.
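Two of the feature categories named above, velocity and amount deviation, can be sketched in a few lines. This is an illustrative simplification with a hypothetical `Txn` record and window size, not the production feature pipeline:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass
class Txn:
    """A minimal transaction record (illustrative, not the bank's schema)."""
    customer_id: str
    amount: float
    ts: datetime

def velocity(history: list[Txn], now: datetime, window_min: int = 12) -> int:
    """Velocity feature: how many transactions fall in the trailing window."""
    cutoff = now - timedelta(minutes=window_min)
    return sum(1 for t in history if t.ts >= cutoff)

def amount_deviation(history: list[Txn], amount: float) -> float:
    """Amount-deviation feature: current amount vs. the customer's average."""
    if not history:
        return 1.0  # no history: treat the first amount as the baseline
    avg = mean(t.amount for t in history)
    return amount / avg if avg else 0.0
```

Sequence-change signals like the one described below would layer on top of per-transaction features like these, comparing how such values evolve over a customer's recent history.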
Model Development & Validation
6 weeks
Developed an ensemble model: XGBoost for structured feature scoring, LSTM for transaction sequence patterns, and a Graph Neural Network for relationship topology analysis. Implemented SHAP explainability values for every flagged transaction.
Every flagged transaction comes with a human-readable explanation: 'This transaction was flagged because the purchase amount is 4.2x the customer average, the device is new, and 3 similar transactions occurred within 12 minutes.'
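An explanation like the one quoted above can be rendered mechanically from per-feature SHAP attributions. The sketch below is illustrative only: the feature names, templates, and SHAP scores are hypothetical, and in a real deployment the attribution values would come from a SHAP explainer run over the model's inputs:

```python
def explain(shap_values: dict[str, float],
            feature_values: dict[str, float],
            templates: dict[str, str],
            top_k: int = 3) -> str:
    """Render the top positive SHAP contributors as a plain-English reason."""
    top = sorted((f for f, v in shap_values.items() if v > 0),
                 key=lambda f: shap_values[f], reverse=True)[:top_k]
    reasons = [templates[f].format(feature_values[f]) for f in top]
    return "This transaction was flagged because " + ", ".join(reasons) + "."

# Hypothetical attributions for a single flagged transaction.
shap_values = {"amount_deviation": 0.31, "new_device": 0.22,
               "txn_velocity": 0.18, "geo_distance": -0.05}
feature_values = {"amount_deviation": 4.2, "new_device": 1, "txn_velocity": 3}
templates = {
    "amount_deviation": "the purchase amount is {:.1f}x the customer average",
    "new_device": "the device is new",
    "txn_velocity": "{:.0f} similar transactions occurred within 12 minutes",
}

print(explain(shap_values, feature_values, templates))
# This transaction was flagged because the purchase amount is 4.2x the
# customer average, the device is new, 3 similar transactions occurred
# within 12 minutes.
```

Keeping the templates in a reviewed lookup table, rather than free-form generation, is what makes the explanations auditable: every sentence maps to a named feature and its SHAP contribution.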
Real-Time Scoring Infrastructure
6 weeks
Built a Kafka-based streaming architecture with Flink processing and a Redis feature store. The ensemble model runs on GPU-accelerated NVIDIA Triton instances, scoring transactions end-to-end in 12ms on average, well within the 150ms authorization window.
Training-serving skew was eliminated by codifying feature definitions in a shared repository consumed by both batch training (Spark) and real-time serving (Flink) pipelines.
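The shared-definition idea can be illustrated with a tiny registry module. This is a sketch with hypothetical feature functions; in the actual system the same definitions feed both the batch training jobs and the real-time serving jobs:

```python
# features.py -- single source of truth for feature math. Both the batch
# training pipeline and the real-time serving pipeline import from here,
# so a feature definition cannot silently drift between the two.

def amount_zscore(amount: float, hist_mean: float, hist_std: float) -> float:
    """How many standard deviations this amount sits from the customer's norm."""
    return (amount - hist_mean) / hist_std if hist_std > 0 else 0.0

def is_new_device(device_id: str, known_devices: set[str]) -> float:
    """1.0 if the device has never been seen for this customer, else 0.0."""
    return 0.0 if device_id in known_devices else 1.0

# Name -> implementation. Training and serving both resolve features through
# this registry instead of re-implementing the formulas independently.
FEATURE_REGISTRY = {
    "amount_zscore": amount_zscore,
    "is_new_device": is_new_device,
}
```

The point is not the individual formulas but the single import path: any change to a feature definition reaches training and serving simultaneously, which is what eliminates skew.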
Deployment, Monitoring & Model Ops
6 weeks
Ran the new system in shadow mode for six weeks, comparing AI decisions against the legacy rules engine on real production traffic. Built model monitoring dashboards tracking precision, recall, score-distribution drift, and feature drift, with automated alerts.
The system passed its first OCC regulatory examination. The examiner specifically noted the quality of model explainability documentation.
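Score-distribution drift of the kind the dashboards track is commonly measured with the Population Stability Index (PSI). The sketch below assumes pre-binned score histograms and the conventional 0.2 alert threshold; both are illustrative choices, not necessarily the bank's exact configuration:

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned score distributions.

    Each argument is a list of bin proportions summing to 1. A PSI near 0
    means the live score distribution matches the training baseline; values
    above roughly 0.2 are commonly treated as significant drift.
    """
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

def drift_alert(expected: list[float], actual: list[float],
                threshold: float = 0.2) -> bool:
    """True when drift crosses the alerting threshold."""
    return psi(expected, actual) > threshold
```

In a production dashboard this comparison would run per feature as well as on the final model score, with the baseline histogram frozen at training time.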
The Results
Performance That Speaks
| Metric | Before | After | Change |
|---|---|---|---|
| Fraud Detection Rate (True Positive) | 62% | 91% | +29 pts |
| False Positive Rate | 4.1% | 1.35% | -67% |
| Daily Alert Volume | 14,000 | 3,800 | -73% |
| Analyst Investigation Coverage | 40% of alerts | 97% of alerts | +57 pts |
| Avg. Alert Investigation Time | 8.2 minutes | 3.4 minutes | -59% |
| Fraud Losses (annualized) | $11.6M | $7.4M | -$4.2M |
| Authorization Latency Impact | N/A (post-auth) | 12ms avg. (pre-auth) | Scoring moved pre-auth |
| Model Explainability Audit | N/A | Passed OCC examination | n/a |
The $4.2 million reduction in annual fraud losses was the headline number for the board. But the operational transformation was equally significant — analysts went from investigating 40% of alerts to covering 97%, improving job satisfaction and reducing team turnover by 30%.
Technology
The Stack
Reflections
What This Project Taught Us
The model development was perhaps 30% of the total effort. The remaining 70% was infrastructure, integration, explainability, monitoring, regulatory documentation, and the organizational change management required to transition a 34-person team from rules-based to AI-augmented workflows.
The feature store architecture proved to be the single most important technical decision. By ensuring training and serving used identical feature computation logic, we eliminated an entire category of production bugs that plague ML systems.
Regulators are not opposed to AI in fraud detection. They are opposed to AI they cannot audit. Explainability is not a nice-to-have — it is a deployment prerequisite. We designed it from day one, not as a retrofit.
Ready to transform your digital experience?
Flynaut builds enterprise-grade digital experiences for brands that refuse to compromise.