Until about 15 years ago, the standard approach to data processing was batch processing, where data is collected, stored, and processed later. You can think of it as being like photography in the pre-digital era: you might spend weeks taking photos (collecting data), but you couldn’t see them immediately. You had to wait until the roll was finished, take it to a darkroom, and wait for the film to develop. By the time you saw the image, the moment it captured was long gone. In a business context this is a real limitation, because batch processing means you’re always looking at data that is out of date, even if you process your batches more regularly than the average photographer.
In recent years, however, we’ve seen the rise of streaming data, i.e. data that is continuously generated and delivered in real time, allowing systems to handle information the instant it arrives. To extend the analogy, we’ve moved from developing film to watching a live video feed. This allows businesses to detect patterns and trigger actions immediately, whether that’s stopping a fraudulent transaction or adjusting a stock price, without waiting for a complete dataset to be collected first.
How streaming data works
To understand how streaming data works, you have to abandon the mental model of the database table, that static grid of rows and columns, and embrace the log – an immutable, append-only sequence of records ordered by time. The log is essentially a list rather than a table, and this structure enables high-speed streaming primarily because it utilises sequential, append-only writes, which are significantly more efficient for hardware to process than random-access operations. A sequential layout is also much faster when you’re retrieving data in bulk or reading the next item in a stream.
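To make that concrete, here is a minimal sketch in Rust (our own illustration, not Wingfoil’s or any other framework’s API) of an append-only event log: records carry a timestamp, are only ever pushed to the end, and are read back sequentially from an offset.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// One immutable record in the log: what happened and when.
#[derive(Debug, Clone)]
struct Event {
    timestamp_nanos: u128,
    payload: String,
}

// An append-only log: records are only ever pushed to the end,
// never updated in place, and are read back in arrival order.
#[derive(Default)]
struct EventLog {
    events: Vec<Event>,
}

impl EventLog {
    fn append(&mut self, payload: impl Into<String>) {
        let timestamp_nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before UNIX epoch")
            .as_nanos();
        self.events.push(Event { timestamp_nanos, payload: payload.into() });
    }

    // Readers walk the log sequentially from a given offset.
    fn read_from(&self, offset: usize) -> &[Event] {
        &self.events[offset.min(self.events.len())..]
    }
}

fn main() {
    let mut log = EventLog::default();
    log.append("user_42 clicked 'buy'");
    log.append("user_42 completed checkout");

    for event in log.read_from(0) {
        println!("{} -> {}", event.timestamp_nanos, event.payload);
    }
}
```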
In a traditional database, we often store the current state. If a user’s balance changes, we overwrite the old number. We lose the history of how we got there. In a stream, however, we store the event. We record exactly what happened and when. This distinction is fundamental.
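Here is a hedged illustration of that distinction in Rust (again, the names are ours, not any framework’s API): a state-oriented store keeps only the latest balance, while an event-oriented store records every change and derives the balance by replaying the events in order.

```rust
// State-oriented: we overwrite the balance and lose the history.
struct Account {
    balance: i64, // cents
}

// Event-oriented: we record each change as an immutable fact.
#[derive(Debug)]
enum AccountEvent {
    Deposited(i64),
    Withdrawn(i64),
}

// The current state is derived by replaying the events in order.
fn current_balance(events: &[AccountEvent]) -> i64 {
    events.iter().fold(0, |balance, event| match event {
        AccountEvent::Deposited(amount) => balance + amount,
        AccountEvent::Withdrawn(amount) => balance - amount,
    })
}

fn main() {
    // The event log keeps the whole story, not just the final number.
    let events = vec![
        AccountEvent::Deposited(10_000),
        AccountEvent::Withdrawn(2_500),
        AccountEvent::Deposited(500),
    ];
    println!("current balance: {} cents", current_balance(&events));

    // A state-oriented store would only ever hold the latest value.
    let account = Account { balance: current_balance(&events) };
    println!("stored state: {} cents", account.balance);
}
```

The payoff of the event-oriented approach is that the full history remains available for replay, auditing, or recomputing state after a bug fix.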
These events are the atoms of the streaming world. Because they occur constantly, often thousands of times per second, we cannot plug this firehose directly into a fragile application. Instead, we use frameworks that act as a central nervous system, accepting these events and processing them in order. This architecture shifts the challenge from simple storage to managing time, i.e. handling data that arrives late or out of order while maintaining the integrity of the sequence.
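To show what “managing time” means in practice, here is a toy reordering buffer in Rust (a simplification written for illustration; real frameworks use far more sophisticated watermarking and windowing): late or out-of-order events are held briefly and released in event-time order once a watermark guarantees that nothing earlier is still in flight.

```rust
use std::collections::BTreeMap;

// A simple reordering buffer: events may arrive late or out of order,
// so we hold them briefly and release them in event-time order once a
// "watermark" tells us no earlier events are still expected.
struct Reorderer {
    buffer: BTreeMap<u64, Vec<String>>, // keyed by event timestamp
}

impl Reorderer {
    fn new() -> Self {
        Self { buffer: BTreeMap::new() }
    }

    fn push(&mut self, event_time: u64, payload: String) {
        self.buffer.entry(event_time).or_default().push(payload);
    }

    // Release everything with an event time at or before the watermark.
    fn advance_watermark(&mut self, watermark: u64) -> Vec<(u64, String)> {
        let mut ready = Vec::new();
        let later = self.buffer.split_off(&(watermark + 1));
        for (ts, payloads) in std::mem::replace(&mut self.buffer, later) {
            for payload in payloads {
                ready.push((ts, payload));
            }
        }
        ready
    }
}

fn main() {
    let mut reorderer = Reorderer::new();
    // Events arrive out of order...
    reorderer.push(105, "trade B".into());
    reorderer.push(100, "trade A".into());
    reorderer.push(110, "trade C".into());

    // ...but are emitted in event-time order once the watermark passes them.
    for (ts, payload) in reorderer.advance_watermark(107) {
        println!("t={ts}: {payload}");
    }
}
```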
Some background on the history of streaming data
The market for streaming data technology is already large and growing rapidly (projected to reach $128 Bn by 2030 at a CAGR of 30%), driven by the explosion of data over the last 20 years, and in particular by the growth of large-scale web applications that generate enormous volumes of clickstream and other data.
Perhaps the best known of the new streaming data frameworks is Kafka. Kafka was originally developed by a team at LinkedIn to solve a specific problem: processing massive amounts of “clickstream” data (user activity logs) for highly scaled web applications. They built Kafka to be a high-throughput pipe, designed to ingest massive volumes of data without choking. Similarly, ReactiveX was developed by Microsoft to handle asynchronous data streams in .NET, eventually influencing how developers handle data across many languages. There are others that do similar jobs.
But what all of these systems have in common is that they prioritise throughput over raw latency. You can think of them as cargo ships: incredibly powerful and capable of carrying massive loads, but not designed for the speed required by latency-critical applications such as electronic marketplaces and certain real-time AI applications. Kafka Streams, for instance, is bound to the Java Virtual Machine (which is relatively slow) and is tightly coupled to the wider streaming framework, adding overhead that makes it unsuitable for ultra-low latency tasks. ReactiveX, meanwhile, operates with a depth-first graph approach and lacks a native historical mode. Again, this makes it hard to operate at ultra-low latencies.
Wingfoil is the only ultra-low latency streaming framework
Latency is relative. What’s fast for one application or context is leaden-footed for another. For a video call, for example, a delay of 200 milliseconds is acceptable. But for an electronic marketplace or High-Frequency Trading (HFT) system, delays are measured in low microseconds or even nanoseconds. In these environments, if you are 50 microseconds slower than the competition, you lose the opportunity.
And this is where Wingfoil comes in. It is an ultra-low latency, highly scalable stream processing framework built specifically for these demanding, often high-stakes environments.
A range of streaming solutions for a range of use cases
There is no one-size-fits-all solution for data streaming – the right choice comes down to the specific demands of the application you are building. Kafka and ReactiveX – the “cargo ships” of the streaming world – will remain essential for moving massive volumes of data, but if your priority is ultra-low latency rather than sheer throughput, then Wingfoil offers the performance you need, replacing complex, hand-rolled C++ solutions with a modern, accessible Rust framework.
And indeed, Wingfoil can work with or alongside other streaming solutions. It offers simple, composable APIs, so you can plug it directly into your machine learning models or risk engines and integrate easily with existing tools. Wingfoil has language bindings for Python and TypeScript and integrates with Tokio to simplify the setup of asynchronous I/O adapters, which means workloads can be distributed efficiently across multi-threaded and multi-server environments.
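As a rough sketch of the kind of Tokio-based ingestion loop an asynchronous I/O adapter implies (these names are ours for illustration and do not come from Wingfoil’s API; it assumes the tokio crate with its full feature set), a producer task stands in for a network feed and a consumer loop stands in for the processing graph:

```rust
// Illustrative only: a generic Tokio ingestion loop of the kind an async
// I/O adapter plugs into. None of these names come from Wingfoil's API.
use tokio::sync::mpsc;
use tokio::time::{sleep, Duration};

#[derive(Debug)]
struct Tick {
    symbol: &'static str,
    price: f64,
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Tick>(1024);

    // Producer task: stands in for a network feed delivering events.
    tokio::spawn(async move {
        for i in 0..5 {
            let tick = Tick { symbol: "ACME", price: 100.0 + i as f64 };
            if tx.send(tick).await.is_err() {
                break; // receiver dropped
            }
            sleep(Duration::from_millis(10)).await;
        }
    });

    // Consumer loop: this is where a stream-processing graph would
    // receive events and run its operators.
    while let Some(tick) = rx.recv().await {
        println!("received {} @ {}", tick.symbol, tick.price);
    }
}
```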
If you want to know more about Wingfoil and how it works, please read our white paper, or learn more about us on Crates.io or GitHub.




