We upgraded from a static 6-factor lead score to a three-tier composite that layers email engagement and firmographics (AUM, headcount) onto the static score, projecting a 5-10% conversion uplift.
3,889
Tier A leads (V1)
50/30/20
V2 weights (Static / Behavioral / Firmographic)
5-10%
Projected conversion uplift (MVV)
94%
Classification accuracy (200-reply test set)
CHAPTER 01
The initial lead scoring model produced 3,889 Tier A leads from a database of financial industry contacts sourced from 13-F filings, FINRA records, LinkedIn exports, and direct prospecting. The model was a 6-factor weighted composite: firm type (30%), title seniority (25%), firm size proxy (15%), email quality (15%), data source (10%), and ICP segment match (5%). Every factor was computed from static enrichment data available at lead collection time.
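The V1 composite can be sketched as a straightforward weighted sum. The weights are from the text; the factor names and the assumption that each factor is pre-normalized to a 0-100 sub-score are illustrative.

```python
# Sketch of the V1 static composite. Weights are from the text; factor
# names are illustrative, and each factor is assumed pre-normalized to 0-100.
V1_WEIGHTS = {
    "firm_type": 0.30,
    "title_seniority": 0.25,
    "firm_size_proxy": 0.15,
    "email_quality": 0.15,
    "data_source": 0.10,
    "icp_segment_match": 0.05,
}

def v1_static_score(factors: dict[str, float]) -> float:
    """Weighted sum of the six static factors, clamped to [0, 100]."""
    score = sum(V1_WEIGHTS[name] * factors.get(name, 0.0) for name in V1_WEIGHTS)
    return max(0.0, min(100.0, score))

# Example: a strong lead on every dimension except data source.
lead = {
    "firm_type": 90, "title_seniority": 95, "firm_size_proxy": 70,
    "email_quality": 85, "data_source": 50, "icp_segment_match": 100,
}
print(round(v1_static_score(lead), 1))
```

Because every input is available at collection time, this score is computed once per lead and never changes, which is exactly the limitation described below.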
The static model had a fundamental limitation that became clear as outreach began. Two leads with identical composite scores of 85 could behave completely differently in an email campaign. A CIO at a $5B RIA who opened the email, clicked through to the pricing page, and spent 4 minutes on the site had demonstrated far more purchase intent than a Director at a $2B fund who never opened. The static model assigned both Tier A and treated them identically in campaign sequencing.
The second limitation was firmographic blindness. The firm size proxy used heuristics derived from company name patterns because AUM data was not yet integrated. A firm named "Capital Management" might manage $50M or $50B; the heuristics produced scores with no discriminating power on the AUM dimension.
CHAPTER 02
V2 replaced the single-dimension composite with a three-tier weighted formula: V2 Score = (Static × 0.50) + (Behavioral × 0.30) + (Firmographic × 0.20), clamped to [0, 100]. The Static component was the V1 6-factor score, unchanged. The Behavioral component was a weighted combination of email open rate (40%), email click rate (35%), and website visit depth (25%). The Firmographic component combined AUM score (50%) and estimated headcount score (50%), sourced from SEC IAPD and company enrichment data.
The Behavioral score defaulted to 50 (neutral) for leads with no engagement data. The Firmographic score defaulted to 55 for leads whose companies could not be matched in IAPD data. This design reflected a deliberate asymmetry: the model should not penalize data absence, because the absence was a function of pipeline completeness, not lead quality.
The AUM scoring curve was designed with knowledge of the buyer profile. Firms managing $100M to $1B scored 70 points. Firms managing $10B+ scored 85 points. Firms below $100M scored 40 points. The headcount scoring used a non-monotonic curve: firms with 50 to 250 employees scored highest at 75 points because they had buying agility and dedicated investment teams.
ARCHITECTURE OVERVIEW
Pipeline stages: INGEST → FEATURES → TRAIN → SERVE
Components: Python 3.12 (XGBoost, logistic regression) · ClickHouse 26.3 · PostgreSQL (engagement_metrics) · PostHog (website visits) · model versions v1 / v2 / v3
Production predictions feed back into the training set on a continuous retraining cadence.
CHAPTER 03
The engagement data pipeline consumed webhooks from the Instantly email platform. The webhook contract handled three event types: email_opened, email_clicked, and email_bounced. Each event carried the lead's email address, campaign ID, timestamp, and, for click events, the clicked URL. The receiver performed HMAC signature verification before processing any event. Verified events triggered upserts to the ClickHouse engagement_metrics table.
The SEC IAPD firmographic enrichment required fuzzy company name matching because lead records used self-reported company names while IAPD records used legal entity names. The matching pipeline normalized both sides and then ran Levenshtein distance matching with a threshold of 0.15 normalized edit distance. The target enrichment rate was 70% of leads matched to an IAPD record.
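The matching step can be sketched as normalize-then-compare. The 0.15 threshold on normalized edit distance is from the text; the normalization rules and suffix list are illustrative assumptions.

```python
import re

# Sketch of the fuzzy name-matching step. The 0.15 threshold is from
# the text; normalization rules and the suffix list are assumptions.
SUFFIXES = {"llc", "lp", "llp", "inc", "ltd", "co", "corp"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-entity suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_match(lead_name: str, iapd_name: str, threshold: float = 0.15) -> bool:
    a, b = normalize(lead_name), normalize(iapd_name)
    if not a or not b:
        return False
    return levenshtein(a, b) / max(len(a), len(b)) <= threshold
```

Normalizing both sides first does most of the work; the edit-distance threshold then absorbs residual spelling variation between self-reported and legal entity names.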
The machine learning training path was designed for the post-launch phase. Phase 1 used logistic regression with engagement quality as a proxy label. Phase 2 introduced conversion labels once 10 or more paid conversions were recorded. XGBoost was chosen for Phase 2 training given the tabular feature structure.
CHAPTER 04
The V1 model produced 3,889 Tier A leads, approximately 24% of the total lead database. The V2 model, once engagement data was flowing, was designed to identify which Tier A leads were demonstrating active interest. The weighted formula was validated against a representative example: a CIO at a $5B RIA with a V1 static score of 87 and behavioral score of 68 scored V2 composite 80.3, maintaining Tier A status. A Director at a smaller firm with no engagement and unknown AUM would score 66.5, correctly dropping to Tier B.
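The arithmetic in both examples can be checked by inverting the composite formula. The implied component scores below are derived from the stated composites; they are not given in the source.

```python
# Invert V2 = 0.5*static + 0.3*behavioral + 0.2*firmographic to check
# the two worked examples. The implied values are derived, not stated.

# CIO example: static 87, behavioral 68, composite 80.3
implied_firmographic = (80.3 - 0.5 * 87 - 0.3 * 68) / 0.2
print(round(implied_firmographic, 1))  # implied firmographic component

# Director example: behavioral default 50, firmographic default 55,
# composite 66.5
implied_static = (66.5 - 0.3 * 50 - 0.2 * 55) / 0.5
print(round(implied_static, 1))        # implied static component
```

Both implied values land in plausible ranges for the curves described earlier, which is a useful internal consistency check on the worked example.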
CHAPTER 05
DECISION · 01
The most significant design decision was the default scoring for missing data. The initial proposal penalized leads with no behavioral data by assigning a Behavioral score of 0, effectively demoting cold leads before they had any opportunity to demonstrate interest. Setting the default to 50 (neutral) ensured the V2 score change on first contact was driven by actual engagement behavior.
DECISION · 02
The headcount non-monotonic curve took several iterations. The first version used a monotonically increasing score where larger firm = higher score. This overvalued enterprise accounts that had long procurement cycles incompatible with the direct sales motion. The revised curve added a penalty for very large firms.
DECISION · 03
The decision to build engagement capture on top of Instantly webhooks rather than polling the Instantly API was driven by latency requirements. A lead who clicks through to the pricing page at 2:00 PM should have their behavioral score updated before the next sales outreach step at 2:30 PM.