Infrastructure | xLydian

Cloud-Native Deployment

All services run on a GCP Kubernetes cluster with dedicated node pools for live execution and strategy runners. Kubernetes startup, liveness, and readiness probes drive automatic pod recovery: if a process fails, it is restarted and restores its full state within seconds. Rolling deployments keep the platform running without downtime during updates.

Redundant Data Pipelines

Market data flows through WebSocket connections with automatic reconnection (exponential backoff, jitter, resubscription) and stale-data detection. Redis Streams serve as the central message bus, with consumer-group tracking and at-least-once delivery semantics so that no signal is lost. QuestDB provides high-throughput time-series storage with connection pooling and retry logic. The pipeline scales horizontally: adding pods increases throughput on any data bus.
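As a rough illustration of the reconnection behaviour described above, the following Python sketch computes a capped exponential backoff with full jitter, retries a connection, resubscribes on success, and flags stale feeds. It is not the production code: the function names, the default delays, and the `connect` / `resubscribe` callables are hypothetical placeholders.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a delay in seconds for a zero-based retry attempt.

    The ceiling doubles with each attempt up to `cap`; the actual delay is
    drawn uniformly below that ceiling ("full jitter") so that many clients
    do not reconnect at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def connect_with_retry(connect, resubscribe, max_attempts=8,
                       sleep=time.sleep, rng=random.random):
    """Call `connect` until it succeeds, then restore subscriptions.

    `connect` and `resubscribe` are assumed callables supplied by the
    caller; `connect` is expected to raise ConnectionError on failure.
    """
    for attempt in range(max_attempts):
        try:
            conn = connect()
        except ConnectionError:
            sleep(backoff_delay(attempt, rng=rng))
            continue
        resubscribe(conn)  # re-issue channel subscriptions on the new socket
        return conn
    raise ConnectionError(f"gave up after {max_attempts} attempts")


def is_stale(last_message_at, now, max_age=5.0):
    """Report whether no message has arrived within `max_age` seconds."""
    return now - last_message_at > max_age
```

Injecting `sleep` and `rng` keeps the retry loop deterministic under test; a real client would typically also reset the attempt counter after a period of healthy traffic.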

Redundant IP Addresses

Multiple IP address ranges are mapped via custom iptables masquerading rules at the cluster level. If one IP hits an exchange rate limit, traffic routes through alternate source addresses automatically, maintaining uninterrupted connectivity.
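The routing itself happens in iptables at the cluster level, but the failover policy can be sketched on its own. The Python example below is only an assumed model of that policy: the `SourceAddressPool` class, the 60-second cooldown, and the addresses (taken from the reserved documentation range 192.0.2.0/24) are hypothetical.

```python
import time


class SourceAddressPool:
    """Pick a source address that is not currently rate limited."""

    def __init__(self, addresses, cooldown=60.0, clock=time.monotonic):
        if not addresses:
            raise ValueError("at least one source address is required")
        self._addresses = list(addresses)
        # Time before which each address should not be used.
        self._blocked_until = {a: float("-inf") for a in self._addresses}
        self._cooldown = cooldown
        self._clock = clock

    def mark_rate_limited(self, address):
        """Take an address out of rotation for `cooldown` seconds."""
        self._blocked_until[address] = self._clock() + self._cooldown

    def pick(self):
        """Return the first usable address, or None if all are cooling down."""
        now = self._clock()
        for address in self._addresses:
            if self._blocked_until[address] <= now:
                return address
        return None
```

A caller would invoke `mark_rate_limited` when an exchange responds with a rate-limit error and then retry with whatever `pick` returns next.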

Internal Rate Limiting

A token-bucket rate limiter with per-exchange, per-operation granularity governs all outbound requests. Instrument-level throttlers control data update frequencies. Combined with exchange-specific retry strategies and configurable backoff, we stay well within exchange limits while maximizing throughput.
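A token bucket keyed by exchange and operation can be sketched in a few lines of Python. This is a minimal illustration of the technique rather than the limiter we run in production; the class names, the `limits` mapping, and the example rates are assumptions.

```python
import time


class TokenBucket:
    """Refill at `rate` tokens per second up to `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._updated = clock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens if available; return False otherwise."""
        now = self._clock()
        elapsed = now - self._updated
        # Credit tokens earned since the last call, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class RateLimiter:
    """One independent bucket per (exchange, operation) pair."""

    def __init__(self, limits, clock=time.monotonic):
        # limits maps (exchange, operation) -> (rate_per_second, capacity)
        self._buckets = {
            key: TokenBucket(rate, capacity, clock)
            for key, (rate, capacity) in limits.items()
        }

    def allow(self, exchange, operation, cost=1.0):
        """Return True if the request may be sent now."""
        return self._buckets[(exchange, operation)].try_acquire(cost)
```

Because each key has its own bucket, exhausting the order-placement budget on one exchange does not delay market-data requests on the same exchange or any request on another. A request that is refused would then fall through to the retry and backoff strategy for that exchange.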

Execution

Smart execution logic prioritizes maker orders to minimize slippage and reduce trading costs, with support for limit-following, volume splitting, and gradual entry strategies when conditions require it.
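Volume splitting can be pictured as dividing a parent order into near-equal child orders that each respect a maximum size. The Python sketch below shows one possible reading of that idea; the `split_volume` function, the default lot step of 0.001, and the even-split policy are assumptions, not a description of the actual execution engine.

```python
from decimal import Decimal


def split_volume(total, max_child, step="0.001"):
    """Split `total` into the fewest near-equal children of at most `max_child`.

    All sizes are handled as whole multiples of `step` (the assumed lot
    increment), so the children always sum exactly to `total`.
    """
    total, max_child, step = Decimal(total), Decimal(max_child), Decimal(step)
    units = total / step
    if units != units.to_integral_value():
        raise ValueError("total must be a multiple of step")
    units = int(units)
    max_units = int(max_child / step)  # truncates to whole lots
    if units <= 0 or max_units <= 0:
        raise ValueError("total and max_child must be at least one step")
    count = -(-units // max_units)  # ceiling division
    quotient, remainder = divmod(units, count)
    # The first `remainder` children carry one extra lot each.
    sizes = [quotient + 1] * remainder + [quotient] * (count - remainder)
    return [size * step for size in sizes]
```

For example, `split_volume("1.199", "0.4")` yields three children of 0.400, 0.400, and 0.399. Spreading the remainder one lot at a time, instead of placing it all on the last child, keeps every child within the cap.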

Monitoring & Observability

Prometheus metrics, Grafana dashboards, and structured cloud logging provide real-time visibility into execution latency, data pipeline health, position state, and system performance. Slack and email alerting ensure the team is notified immediately of any anomalies across the cluster.