We rebuilt the signal scoring pipeline from scratch, fixing look-ahead contamination and adding a top-decile filter that produced a 72.2% win rate on selected signals.
72.2%
Win rate (top-decile signals)
55 to 62%
Model accuracy (post-fix)
32K/sec
Hypothesis tester speed
0.3ms p95
LightGBM prescreener latency
CHAPTER 01
The Apex trading system generated more than 50,000 signals per day across 1,200 crypto pairs and US equities. Nearly all of them bled capital. The initial pipeline funneled signals through a LightGBM model trained with train_test_split(random_state=42) across 10,486 historical trades, producing 48% accuracy, worse than a coin flip on a binary outcome. Profiling the failure revealed three compounding flaws: 12 of the 18 training features were hardcoded constants with zero variance, the column entry_signal_strength was NULL for 100% of training rows, and random shuffling leaked future information by mixing 2026 rows into the training set alongside 2024 rows, so the model trained on data from after some of the trades it was evaluated on.
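The first two flaws are mechanically detectable before training ever starts. A minimal pandas sketch of such a feature audit, using toy data (the column names other than entry_signal_strength are hypothetical, not from the production schema):

```python
import pandas as pd

def audit_features(df: pd.DataFrame, feature_cols: list[str]) -> dict:
    """Flag features that carry no information: constant (zero-variance)
    columns and columns that are NULL for every row."""
    zero_variance = [c for c in feature_cols if df[c].nunique(dropna=True) <= 1]
    all_null = [c for c in feature_cols if df[c].isna().all()]
    return {"zero_variance": zero_variance, "all_null": all_null}

# Toy frame mimicking the failure mode: one hardcoded constant, one all-NULL column.
df = pd.DataFrame({
    "rsi_14": [31.2, 55.0, 48.7],              # hypothetical real feature
    "flag_const": [1, 1, 1],                   # hardcoded constant
    "entry_signal_strength": [None, None, None],
})
print(audit_features(df, list(df.columns)))
```

Running an audit like this as a pre-training gate would have caught 12 of the 18 features immediately.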
The Rust feature engine computed 55 technical indicators per bar using SIMD and wrote to Redis and Postgres every tick. The ML training pipeline ignored it entirely and instead computed proxy features from a JSONL chain that turned out to be 99% duplicates. 2.08 million rows of near-identical records produced a model that had learned nothing about market structure.
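The duplicate problem is equally cheap to measure. A sketch of the check on a toy frame (in production this would load the JSONL chain via pd.read_json(path, lines=True); the 100-row frame here is illustrative only):

```python
import pandas as pd

# Toy stand-in for the JSONL-derived training frame: 99 identical rows, 1 unique.
rows = pd.DataFrame({
    "symbol": ["BTCUSD"] * 99 + ["ETHUSD"],
    "score":  [0.42] * 99 + [0.77],
})

dup_frac = rows.duplicated().mean()   # fraction of rows that are exact repeats
clean = rows.drop_duplicates()
print(f"duplicate fraction: {dup_frac:.2%}, clean rows: {len(clean)}")
```

At 99% duplication, 2.08 million rows collapse to roughly 20,000 distinct records, which explains why the model learned nothing.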
CHAPTER 02
A structured comparison across four candidate architectures was run on the same 10,486 clean trade dataset. The critical finding emerged from a percentile analysis: the top 10% of signals by composite score produced a 72.2% win rate and +0.80% mean PnL per trade. Everything below that threshold produced cumulative losses. The alpha was not in the scoring model. It was in the selectivity filter applied on top of it.
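The percentile analysis described above reduces to a quantile cut over composite scores. A minimal numpy sketch on synthetic data (the score/PnL relationship here is fabricated for illustration, not the case-study dataset):

```python
import numpy as np

def top_decile_filter(scores, pnl, q=0.90):
    """Keep only signals scoring at or above the q-th quantile of the
    composite score; report win rate and mean PnL of the selected set."""
    scores, pnl = np.asarray(scores, float), np.asarray(pnl, float)
    cutoff = np.quantile(scores, q)
    selected = pnl[scores >= cutoff]
    return {
        "cutoff": float(cutoff),
        "n_selected": int(selected.size),
        "win_rate": float((selected > 0).mean()),
        "mean_pnl": float(selected.mean()),
    }

# 1,000 synthetic signals where higher scores are weakly associated with profit.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, 1000)
pnl = 0.5 * (scores - 0.55) + rng.normal(0, 0.2, 1000)
print(top_decile_filter(scores, pnl))
```

Even with a weak score-to-profit association, the top-decile cut concentrates the profitable tail, which is the mechanism the case study attributes the alpha to.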
Approach B (Few Strong ensemble combining 7B and 14B parameter models) won because the 161 LoRA adapters in Approach A had a fundamental misconfiguration: prompt.json specified llama3.2:1b while adapter_config.json pointed to a different model. None of the adapters loaded in production despite 1.9 GB of adapter storage.
The production architecture implemented two changes simultaneously. First, the Rust feature engine was wired directly into the LightGBM training pipeline, replacing the 12 zero-variance proxy features with 55 real technical indicators. The training split was changed from random shuffle to a strict chronological boundary. Second, a top-decile filter passed only the top 10% of signals per cycle to execution.
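The chronological boundary can be sketched in a few lines; every test row must be strictly later than every training row. Column names here are hypothetical:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, time_col: str, train_frac: float = 0.8):
    """Split strictly by time so no future row leaks into training."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    cut = int(len(ordered) * train_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]

trades = pd.DataFrame({
    "entry_time": pd.to_datetime(["2024-03-01", "2026-01-15", "2024-07-09",
                                  "2025-02-20", "2025-11-30"]),
    "label": [1, 0, 1, 0, 1],
})
train, test = chronological_split(trades, "entry_time")
# Invariant the random shuffle violated: training ends before testing begins.
assert train["entry_time"].max() < test["entry_time"].min()
```

Unlike train_test_split with a fixed random_state, this guarantees the 2026 rows can never precede 2024 rows in the training window.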
ARCHITECTURE OVERVIEW
INGEST
LightGBM
FEATURES
Rust 1.84 (SIMD feature engine)
TRAIN
Python 3.12
v1 / v2 / v3
SERVE
Redis 7
Production predictions feed back into the training set on a continuous retraining cadence.
CHAPTER 03
The inference stack was divided into three latency tiers. Fast Brain ran qwen2.5-coder:1.5b via local Ollama, at roughly 12-second inference latency, for per-cycle signal triage. Deep Brain ran qwen2.5-coder:7b every 4 hours for pattern review. Research Brain ran qwen2.5-coder:14b daily for strategy-level analysis. The LightGBM prescreener sat in the hot path at sub-millisecond latency, ahead of any model inference.
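The ordering matters: the sub-millisecond prescreener gates every signal before any LLM tier sees it. A hypothetical routing sketch (the function name, score floor, and queue labels are illustrative, not from the system):

```python
def route_signal(prescreen_score: float, floor: float = 0.5) -> str:
    """Hot-path gate: the LightGBM prescreener score decides whether a signal
    is discarded outright or queued for the slower Fast Brain LLM triage."""
    if prescreen_score < floor:
        return "discard"           # rejected in <1 ms; never reaches an LLM
    return "fast_brain_queue"      # 1.5B-model triage, ~12 s, off the hot path

print(route_signal(0.3))   # discard
print(route_signal(0.8))   # fast_brain_queue
```

Because discards never touch model inference, the 12-second Fast Brain latency applies only to the small surviving fraction of the 50,000 daily signals.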
The LightGBM model trained on the 55 Rust features with the chronological split achieved 55% to 62% accuracy on the held-out test window, up from 48%. More importantly, combining this model's output with the top-decile filter produced a compound selection function that reliably identified the profitable tail. The retraining cadence was set to weekly, triggered early whenever the rolling win rate on the most recent 500 closed trades deviated more than 5 percentage points from the model's expected win rate.
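The drift trigger described above can be sketched directly; the function name and encoding (1 = win, 0 = loss) are assumptions for illustration:

```python
def should_retrain(closed_trades: list[int], expected_win_rate: float,
                   window: int = 500, threshold: float = 0.05) -> bool:
    """Trigger retraining when the rolling win rate over the last `window`
    closed trades drifts more than `threshold` (5 pp) from the model's
    expected win rate."""
    recent = closed_trades[-window:]
    observed = sum(recent) / len(recent)
    return abs(observed - expected_win_rate) > threshold

# 500 closed trades at a 40% win rate vs. an expected 50%: 10 pp drift -> retrain.
print(should_retrain([1] * 200 + [0] * 300, expected_win_rate=0.50))  # True
```

A 48% rolling win rate against the same expectation (2 pp drift) would not trigger, leaving the weekly cadence as the default.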
CHAPTER 04
Applying the score floor plus top-decile filter to the 10,486-trade backtest produced a 72.2% win rate on selected signals with +0.80% mean PnL per trade. The unfiltered population produced 27.1% win rate and negative PnL across all four architectures. The filter reduced execution volume by approximately 90%, which was the intended behavior. Selectivity, not model sophistication, was the mechanism of profit.
The hypothesis tester binary processed 738 hypotheses in 23 milliseconds, roughly 32,000 tests per second, which served as the benchmarking baseline for the broader signal-search infrastructure.
CHAPTER 05
DECISION · 01
The most expensive mistake was treating model accuracy as the primary optimization target. A model at 55% accuracy on a balanced dataset sounds useful. But on a live trading dataset where only 27% of signals are winners in aggregate, a 55%-accurate model helps only if it ranks profitable signals above unprofitable ones and the selectivity filter is aggressive enough to exploit that ordering.
DECISION · 02
The second significant failure was the training data pipeline. Using random splits on time-series data produces optimistic accuracy estimates because models can memorize future context. After fixing the split, the model's true accuracy dropped, but the top-decile selection remained valid because the model still preserved relative rank ordering.
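Whether a recalibrated model "preserves relative rank ordering" is itself testable with a rank correlation. A minimal numpy sketch (the score vectors are fabricated; no-ties data assumed):

```python
import numpy as np

def spearman_rank_corr(a, b) -> float:
    """Spearman rho computed as the Pearson correlation of ranks
    (assumes no tied values)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Post-fix scores are lower in absolute terms but order signals identically.
old_scores = [0.2, 0.9, 0.5, 0.7]
new_scores = [0.1, 0.8, 0.3, 0.6]
print(spearman_rank_corr(old_scores, new_scores))  # 1.0
```

A rho near 1.0 between pre- and post-fix scores is exactly the condition under which a top-decile cut keeps selecting the same profitable tail even as headline accuracy drops.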
DECISION · 03
The local AI hypothesis failed decisively. 1B to 3B parameter models lacked the reasoning depth for market analysis. The production decision was to move all trading-path inference to quantitative signals from the Rust engine plus LightGBM, with cloud LLMs reserved for strategic reasoning tasks that run offline.