We added a lock-free AtomicUsize round-robin proxy pool to argus-common, giving all 23 downloader binaries IP rotation without duplication or mutex contention.
Download throughput (proxy): 180 sym/min
Download throughput (direct): 40 sym/min
India backfill: 85% (from 40%)
Proxy pool size: 100
CHAPTER 01
Argus acquires market data across 10,752 daily symbols spanning US equities, European markets, Japan (TSE), Latin America, Southeast Asia, and India (BSE/NSE). The Hetzner server IP is flagged or rate-limited by several critical free-tier data sources. Specific blocks discovered during initial bulk download attempts: FRED's main endpoint accepted TCP connections but delivered zero bytes; CoinGecko's free tier returned HTTP 429 on every request; Stooq.com dropped the TCP connection outright. The Yahoo Finance crumb authentication endpoint worked intermittently but failed for India symbols, affecting 11,706 BSE/NSE tickers.
The design constraint was operational simplicity. All 23 compiled downloader binaries link argus-common, the shared library crate, so placing the rotation logic there gave every binary proxy support in one change, with no duplicated code.
CHAPTER 02
The proxy pool lives in argus-common/src/proxy.rs and exposes two public types: ProxyEntry and ProxyPool. ProxyPool fetches 100 proxies from the Webshare.io v2 API on initialization, maintains them in a Vec, and rotates through them using an AtomicUsize index. Proxies are selected via fetch_add on the atomic counter, modulo the pool length. This produces round-robin distribution without a mutex on the selection path, which matters for the 50-concurrent-session downloader configurations.
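The rotation described above can be sketched in a few lines of standard-library Rust. This is a minimal illustration, not the argus-common source: the `url` field, the `from_entries` constructor (which takes an already-fetched list instead of calling the Webshare API), and the `next_index` and `get` method names are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// One proxy endpoint. The single `url` field is illustrative; the real
/// ProxyEntry presumably carries more (credentials, health state).
pub struct ProxyEntry {
    pub url: String,
}

pub struct ProxyPool {
    proxies: Vec<ProxyEntry>,
    next: AtomicUsize,
}

impl ProxyPool {
    /// Hypothetical constructor: builds the pool from a list that has
    /// already been fetched, so the sketch needs no network access.
    pub fn from_entries(proxies: Vec<ProxyEntry>) -> Self {
        Self { proxies, next: AtomicUsize::new(0) }
    }

    /// Returns the index of the next proxy in round-robin order,
    /// or None when the pool is empty.
    pub fn next_index(&self) -> Option<usize> {
        if self.proxies.is_empty() {
            return None;
        }
        // fetch_add returns the previous counter value and wraps on
        // overflow, so no lock is needed on the selection path.
        let ticket = self.next.fetch_add(1, Ordering::Relaxed);
        Some(ticket % self.proxies.len())
    }

    /// Looks up a proxy by index.
    pub fn get(&self, index: usize) -> Option<&ProxyEntry> {
        self.proxies.get(index)
    }
}
```

With a three-entry pool, successive `next_index()` calls yield 0, 1, 2, 0, 1, 2, and so on. Because `next_index` takes `&self`, the pool can be shared across tasks behind an `Arc` without any interior locking.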
Each ProxyEntry carries an AtomicU32 failure counter. Five consecutive failures mark a proxy as unhealthy, and next_proxy() skips it. The threshold of 5 was chosen empirically: a single 429 or timeout should not retire a proxy, but a run of five indicates either a dead proxy or a source-specific block on that IP range. Proxy health is re-assessed at pool refresh time, every 3,600 seconds.
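A sketch of how the failure counter and the skip logic might fit together is shown below. The names `record_failure`, `record_success`, and `is_healthy` are hypothetical, and resetting the counter on success is an assumption inferred from the word "consecutive"; the source does not show the actual reset rule.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Threshold taken from the text: five consecutive failures.
const FAILURE_THRESHOLD: u32 = 5;

pub struct ProxyEntry {
    pub url: String,
    consecutive_failures: AtomicU32,
}

impl ProxyEntry {
    pub fn new(url: &str) -> Self {
        Self { url: url.to_string(), consecutive_failures: AtomicU32::new(0) }
    }

    /// Hypothetical hook called after a failed request (429, timeout).
    pub fn record_failure(&self) {
        self.consecutive_failures.fetch_add(1, Ordering::Relaxed);
    }

    /// Assumption: any success clears the streak.
    pub fn record_success(&self) {
        self.consecutive_failures.store(0, Ordering::Relaxed);
    }

    pub fn is_healthy(&self) -> bool {
        self.consecutive_failures.load(Ordering::Relaxed) < FAILURE_THRESHOLD
    }
}

pub struct ProxyPool {
    proxies: Vec<ProxyEntry>,
    next: AtomicUsize,
}

impl ProxyPool {
    pub fn new(proxies: Vec<ProxyEntry>) -> Self {
        Self { proxies, next: AtomicUsize::new(0) }
    }

    /// Returns the next healthy proxy in round-robin order. Gives up
    /// after one full lap so an all-unhealthy pool yields None instead
    /// of spinning forever.
    pub fn next_proxy(&self) -> Option<&ProxyEntry> {
        let len = self.proxies.len();
        for _ in 0..len {
            let index = self.next.fetch_add(1, Ordering::Relaxed) % len;
            let entry = &self.proxies[index];
            if entry.is_healthy() {
                return Some(entry);
            }
        }
        None
    }
}
```

Bounding the search to one lap keeps the selection path wait-free even when many proxies are marked unhealthy, at the cost of returning None under concurrent calls slightly earlier than a strict scan would.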
ARCHITECTURE OVERVIEW
[Diagram not recovered. Groups shown: Ingress, Cluster (pod-1 through pod-6), Storage, Observability. Components labeled: Rust 1.84, Tokio 1.40, reqwest 0.12, argus-common (internal).]
CHAPTER 03
The selection path is lock-free and uses Ordering::Relaxed on the atomic counter. Relaxed ordering is correct here: the counter does not synchronize with any other memory, and its only purpose is to hand out a different index on each call. Because fetch_add is an atomic read-modify-write, no two calls observe the same counter value even under Relaxed; two tasks land on the same proxy only when the counter wraps around the pool length or when an unhealthy proxy is skipped. That is acceptable: session isolation happens at the TCP level, and two tasks momentarily sharing one proxy does not violate any invariant.
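One property worth checking is what Relaxed does and does not give up. fetch_add is an atomic read-modify-write, so every call receives a distinct counter value regardless of the ordering argument; Relaxed only drops ordering guarantees relative to other memory. The helper names `draw_tickets` and `all_unique` below are hypothetical, written purely to demonstrate this with plain threads.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each draw `per_thread` tickets from a
/// shared counter using Relaxed ordering, and returns every ticket drawn.
pub fn draw_tickets(threads: usize, per_thread: usize) -> Vec<usize> {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| counter.fetch_add(1, Ordering::Relaxed))
                    .collect::<Vec<usize>>()
            })
        })
        .collect();

    handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}

/// True when no ticket value appears twice.
pub fn all_unique(tickets: &[usize]) -> bool {
    let seen: HashSet<usize> = tickets.iter().copied().collect();
    seen.len() == tickets.len()
}
```

Calling `draw_tickets(8, 1000)` produces 8,000 tickets with no duplicates. Tickets map to the same proxy only after the modulo step, for example tickets 7 and 107 in a 100-entry pool.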
The real complication in production was the Yahoo Finance crumb flow for India symbols. The crumb endpoint at fc.yahoo.com uses a session cookie that ties to the originating IP. With proxy rotation, each request in a download session could come from a different IP, invalidating the crumb. The fix was to acquire the crumb once per downloader task using the designated proxy for that task, then reuse the same proxy for the entire symbol backfill within that task. This required passing the proxy index through to the download loop rather than selecting fresh from the pool on each request.
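The pinning fix can be sketched as follows: select a proxy index once per task, fetch the crumb through that proxy, and thread the same proxy through every request in the loop. The function `backfill_task` and its closure parameters are hypothetical stand-ins for the real HTTP calls, which are not shown in the source; the pool is reduced to a list of URLs for brevity.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified pool holding only proxy URLs (illustrative).
pub struct ProxyPool {
    proxies: Vec<String>,
    next: AtomicUsize,
}

impl ProxyPool {
    pub fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: AtomicUsize::new(0) }
    }

    /// Round-robin selection; None when the pool is empty.
    pub fn next_index(&self) -> Option<usize> {
        if self.proxies.is_empty() {
            return None;
        }
        Some(self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len())
    }

    pub fn url(&self, index: usize) -> &str {
        &self.proxies[index]
    }
}

/// Runs one backfill task with a pinned proxy.
///
/// `fetch_crumb(proxy_url)` and `fetch_symbol(proxy_url, crumb, symbol)`
/// are placeholders for the real requests. Returns the pinned proxy
/// index, or None if the pool is empty.
pub fn backfill_task<F, G>(
    pool: &ProxyPool,
    symbols: &[&str],
    fetch_crumb: F,
    mut fetch_symbol: G,
) -> Option<usize>
where
    F: FnOnce(&str) -> String,
    G: FnMut(&str, &str, &str),
{
    // Select exactly once per task, not once per request.
    let proxy_index = pool.next_index()?;
    let proxy = pool.url(proxy_index);

    // The crumb is tied to the IP that requested it, so it is acquired
    // through the same proxy that will carry the symbol requests.
    let crumb = fetch_crumb(proxy);

    for &symbol in symbols {
        fetch_symbol(proxy, crumb.as_str(), symbol);
    }
    Some(proxy_index)
}
```

Rotation still happens, but at task granularity: each new task draws the next proxy from the pool, while every request inside a task shares one IP and one crumb.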
CHAPTER 04
Before proxy rotation, the India BSE/NSE backfill stalled at roughly 40% completion with Yahoo 403 errors. After per-task proxy pinning, completion reached approximately 85%, with the remaining failures concentrated in a subset of India symbols where Yahoo's crumb validation was stricter.
For US equity batch downloads running 50 concurrent sessions, throughput measured approximately 180 symbols per minute across the pool, versus 40 symbols per minute on the direct Hetzner IP and approximately 120 symbols per minute with a single proxy. The 1.5x improvement from 120 to 180 reflects round-robin distribution reducing per-proxy request density. Healthy pool size in a steady-state session was 83 of 100 proxies after one hour of active downloading.
CHAPTER 05
DECISION · 01
AtomicUsize with Ordering::Relaxed for the rotation counter was the right call. The alternative, a Mutex, would have introduced contention across 50 concurrent tasks for an operation that takes nanoseconds. The tradeoff, two tasks occasionally sharing a proxy, is irrelevant at this scale.
DECISION · 02
The five-failure threshold for health demotion is probably too low for sources that transiently throttle on burst. During the Japan TSE download, the rate limiter occasionally returned 429 for 3 to 5 consecutive requests before recovering. A per-source failure counter, or exponential backoff before marking a proxy unhealthy, would improve utilization for burst-sensitive sources.
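One way the per-source counter could look is sketched below. It is an untested design idea rather than anything in argus-common: the `Source` enum, its variants (named after sources mentioned in this write-up), and the `ProxyHealth` type are all assumptions. A fixed array of atomics keeps the bookkeeping lock-free.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Illustrative set of upstream sources; the real list may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Source {
    Yahoo = 0,
    Stooq = 1,
    Fred = 2,
    CoinGecko = 3,
}

const SOURCE_COUNT: usize = 4;
const FAILURE_THRESHOLD: u32 = 5;

/// Per-proxy health, tracked separately for each source so that a burst
/// of 429s from one source does not retire the proxy for the others.
#[derive(Default)]
pub struct ProxyHealth {
    failures: [AtomicU32; SOURCE_COUNT],
}

impl ProxyHealth {
    pub fn record_failure(&self, source: Source) {
        self.failures[source as usize].fetch_add(1, Ordering::Relaxed);
    }

    /// Assumption: a success clears the streak for that source only.
    pub fn record_success(&self, source: Source) {
        self.failures[source as usize].store(0, Ordering::Relaxed);
    }

    pub fn is_healthy_for(&self, source: Source) -> bool {
        self.failures[source as usize].load(Ordering::Relaxed) < FAILURE_THRESHOLD
    }
}
```

Under this scheme, five consecutive failures against one source leave the proxy eligible for every other source. Backoff before demotion would be a separate change layered on top.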
DECISION · 03
The hardcoded fallback API key in source is a maintenance liability. It was expedient for the initial build sprint but should be removed in favor of a clear MissingApiKey error that forces proper env var configuration.
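A sketch of the stricter configuration path is below. The error variant name MissingApiKey comes from the text; the enum name `ProxyConfigError`, the environment variable name `WEBSHARE_API_KEY`, and the helper functions are hypothetical, chosen only to make the example concrete.

```rust
use std::env;
use std::fmt;

/// Hypothetical variable name; the real one is not given in the source.
pub const API_KEY_VAR: &str = "WEBSHARE_API_KEY";

#[derive(Debug, PartialEq, Eq)]
pub enum ProxyConfigError {
    MissingApiKey { var: &'static str },
}

impl fmt::Display for ProxyConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProxyConfigError::MissingApiKey { var } => {
                write!(f, "missing API key: set the {} environment variable", var)
            }
        }
    }
}

impl std::error::Error for ProxyConfigError {}

/// Validates a raw value. Empty or whitespace-only strings count as
/// missing, and there is deliberately no fallback key.
pub fn api_key_from(value: Option<String>) -> Result<String, ProxyConfigError> {
    match value {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err(ProxyConfigError::MissingApiKey { var: API_KEY_VAR }),
    }
}

/// Reads the key from the environment, failing loudly when it is absent.
pub fn load_api_key() -> Result<String, ProxyConfigError> {
    api_key_from(env::var(API_KEY_VAR).ok())
}
```

Splitting validation (`api_key_from`) from the environment read (`load_api_key`) keeps the logic testable without mutating process-wide state.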