Formula 1 is one of the most data-rich sports in the world. Every weekend, teams process gigabytes of telemetry—tire temperatures, throttle application, braking zones, and micro-sector times. But for developers, getting access to this data in real time can be challenging.
To solve this, I built the F1 Live Timing Streamer, an Apify Actor designed to intercept the official F1 SignalR data feed, parse it, and stream it out in structurally useful formats for automated analysis and machine learning.
The Challenge: SignalR and Real-Time Architectures
Unlike standard REST APIs where you make a request and get a static JSON response, live timing relies on WebSockets—specifically, Microsoft's SignalR protocol. The stream emits dozens of event types simultaneously: session status updates, weather changes, track condition flags, and most importantly, driver telemetry.
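With dozens of event types arriving over one persistent connection, the natural pattern is a dispatcher that routes each incoming frame to a handler registered for its channel. Here is a minimal sketch of that pattern; the channel names and payload fields are illustrative, not the actual F1 feed topics:

```python
import json

# Illustrative event-dispatch pattern for a SignalR-style feed.
# Channel names and payload fields below are hypothetical.
handlers = {}

def on(channel):
    """Register a handler function for one broadcast channel."""
    def register(fn):
        handlers[channel] = fn
        return fn
    return register

@on("WeatherData")
def handle_weather(payload):
    return f"track temp: {payload['TrackTemp']}C"

@on("TimingData")
def handle_timing(payload):
    return f"driver {payload['Driver']} lap {payload['LapTime']}"

def dispatch(raw_frame):
    """Route a raw JSON frame of shape [channel, payload] to its handler."""
    channel, payload = json.loads(raw_frame)
    handler = handlers.get(channel)
    return handler(payload) if handler else None
```

Unknown channels simply fall through to `None`, which matters in practice: a live feed will always contain message types you have not modeled yet.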
How the F1 Streamer Works
The core of the project relies on establishing a secure, persistent connection to the F1 feed using Python. Once connected, the script listens for specific event broadcast channels.
- Event Authentication: Handshaking with the SignalR hub to request data for the specific race weekend.
- Message Decoding: The raw feed sends highly compressed proprietary data blobs. The script decodes and normalizes this into standard JSON.
- State Management: Telemetry isn't sent as a full snapshot every millisecond; it is sent as "deltas" (only what changed). The Actor maintains a state cache to assemble a full picture of the track before writing to the dataset.
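The state-management step above can be sketched as a recursive deep merge: each delta carries only the keys that changed, and the cache folds them into the last known full snapshot. The field names here are illustrative, not the Actor's actual schema:

```python
# Sketch of delta-based state assembly (field names are illustrative).
# Each incoming delta contains only the keys that changed; the cache
# deep-merges it into the last known full snapshot.
def merge_delta(state, delta):
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(state.get(key), dict):
            merge_delta(state[key], value)  # recurse into nested sections
        else:
            state[key] = value              # leaf update or new branch
    return state

# Full snapshot for car 44, then a delta touching 44 and introducing 63.
state = {"44": {"Position": 1, "LastLap": "1:29.3"}}
merge_delta(state, {"44": {"LastLap": "1:28.9"}, "63": {"Position": 2}})
```

After the merge, car 44 keeps its untouched `Position` while its `LastLap` is updated, and car 63 appears as a new branch—exactly the "assemble a full picture" behavior described above.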
Output Formats & Data Transformation
A raw dump of JSON isn't always useful. Different users have different needs. To make the Actor versatile, I implemented multiple output structures:
1. The "Raw" Format
Useful for developers who want to rebuild the F1 timing app exactly as it appears. It passes the normalized JSON feed through unchanged, in the same structure the official timing screens consume.
2. The "SQL-Ready" Flat Format
Nested JSON is notoriously difficult to query in standard SQL without complex UNNEST expressions. This output format flattens the hierarchy—ensuring every metric (like Driver_44_Sector_1_Time) is a distinct, easily queryable column. This is perfect for piping directly into PostgreSQL or BigQuery.
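A minimal version of that flattening step might look like this; the column-naming convention mirrors the Driver_44_Sector_1_Time example above, and the input shape is an assumption:

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into single-level column names joined by '_'."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            row.update(flatten(value, name))  # descend, extending the prefix
        else:
            row[name] = value                 # leaf becomes one flat column
    return row

nested = {"Driver_44": {"Sector_1": {"Time": 29.871},
                        "Sector_2": {"Time": 31.402}}}
flat = flatten(nested)
# Keys: Driver_44_Sector_1_Time, Driver_44_Sector_2_Time
```

Each leaf value becomes one column, so the output maps directly onto a wide SQL table row.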
3. Derived Micro-Metrics (The "ML-Ready" Format)
This is where the magic happens. Rather than just reporting the current tire age, the Actor calculates derived statistics in real time:
- Pit Time Breakdown: Splitting total pit lane time into actual stationary time vs. transit time.
- Tire Cliff Projection: Monitoring lap time degradation curves and comparing them against historical compound fall-off rates to predict when a driver's pace will hit the "cliff".
- Dynamic Sector Deltas: Real-time comparisons of a driver's current micro-sector against their own personal best, rather than just the overall session best.
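To illustrate one of these derived metrics, here is a toy version of the personal-best sector delta; the sector labels and times are invented for the example:

```python
def sector_delta(current, personal_best):
    """Delta of each micro-sector vs. the driver's own personal best,
    in seconds. Negative means the current lap is faster."""
    return {s: round(current[s] - personal_best[s], 3)
            for s in current if s in personal_best}

# Invented example data: personal bests vs. the lap in progress.
best = {"S1": 29.871, "S2": 31.402, "S3": 27.995}
now  = {"S1": 29.950, "S2": 31.310, "S3": 28.100}
deltas = sector_delta(now, best)
# S2 comes out negative: the driver just beat their own best in that sector.
```

The same comparison against the overall session best would only need a different reference dict, which is why keeping the baseline as an argument (rather than hard-coding it) is the natural design.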
Real-Time Webhooks
Because this is an Apify Actor, I integrated it directly with Apify Webhooks. You don't have to poll the dataset to see if a driver crashed. The Actor can send a direct webhook payload to your custom server or Discord bot the second a Yellow Flag is waved or a Safety Car is deployed.
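Conceptually, the trigger boils down to watching for a state transition and then posting a payload. This sketch separates the decision logic from the HTTP call so the former can be tested offline; the status names, payload fields, and URL are illustrative, not the Actor's actual schema:

```python
import json
import urllib.request

# Hypothetical track-status values that should fire an alert.
ALERT_STATES = {"YELLOW", "SAFETY_CAR", "RED"}

def build_alert(previous, current, lap):
    """Return a webhook payload when the track enters an alert state,
    or None if nothing noteworthy changed."""
    if current in ALERT_STATES and current != previous:
        return {"event": "TrackStatusChange", "status": current, "lap": lap}
    return None

def send_webhook(url, payload):
    """POST the payload as JSON to the subscriber's endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# alert = build_alert("GREEN", "SAFETY_CAR", lap=32)
# if alert:
#     send_webhook("https://example.com/hook", alert)  # hypothetical URL
```

Because `build_alert` only fires on a transition, a flag that stays yellow for ten laps produces one webhook, not ten.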
Conclusion
Building data pipelines isn't just about grabbing information; it's about structuring it so it becomes actionable. Whether it's supply chain logistics or race car telemetry, the fundamental principles remain the same: connect reliably, clean thoroughly, and format specifically for your end user.
You can try out the F1 Live Timing Streamer on the Apify Store.