There are two types of real-time machine learning systems. One type makes predictions in real-time, the other updates models in real-time.
It seems natural that machine learning would be a real-time technology. When it comes to putting it into practice, however, confusion reigns.
“There seems to be little consensus on what real-time ML means, and there hasn’t been a lot of in-depth discussion on how it’s done in the industry,” Chip Huyen, computer scientist, Stanford professor, and machine learning expert with Snorkel.AI, explained in a recent post titled “Machine learning is going real-time.”
For starters, Huyen notes there are two levels of real-time machine learning:
“Level 1: Your ML system makes predictions in real-time (online predictions).
Level 2: Your system can incorporate new data and update your model in real-time (online learning).”
Level-1 real-time ML is the most common approach, but adopting it is a challenge: it means “switching from batch processing to stream processing, from request-driven architecture to event-driven architecture,” Huyen says. Stream processing is tied to the popularity of Kafka and Flink, but adoption remains spotty at best. The reasons for slow adoption, she says, include the following:
The benefits of streaming are unclear: Companies may not have applications that benefit from online predictions, or they have such applications “but have never done online predictions before.”
Real-time streaming infrastructure requires investment: “Infrastructure updates are expensive and can jeopardize existing applications. Managers might not be willing to invest to upgrade their infra to allow online predictions.”
Moving to streaming requires a “mental shift”: “Switching from batch processing to stream processing requires a mental shift,” Huyen says. “With batch processing, you know when a job is done. With stream processing, it’s never done. With batch processing, you can have well-defined tables and join them, but in streaming, there are no tables to join, then what does it mean to do a join operation on two streams?”
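One common answer to the stream-join question is a windowed join: instead of joining complete tables, you match events from two streams whose timestamps fall within a bounded window of each other. The sketch below is a framework-free illustration of that idea, not anything from Huyen's post; the event fields (`user`, `ts`) are hypothetical, and merging the streams by sorting is a simplification a real streaming engine would avoid.

```python
from collections import deque

def windowed_join(stream_a, stream_b, key, window):
    """Join two time-ordered event streams on `key`, matching events
    whose timestamps are within `window` of each other.
    Events are dicts with at least `key` and a 'ts' field."""
    buffer_a, buffer_b = deque(), deque()
    # Simplification: merge both streams into one timestamp-ordered
    # sequence. A real engine would interleave them as events arrive.
    events = sorted(
        [(e, 'a') for e in stream_a] + [(e, 'b') for e in stream_b],
        key=lambda pair: pair[0]['ts'],
    )
    for event, side in events:
        own, other = (buffer_a, buffer_b) if side == 'a' else (buffer_b, buffer_a)
        # Evict buffered events that have fallen out of the window.
        while other and event['ts'] - other[0]['ts'] > window:
            other.popleft()
        # Emit a joined record for every in-window match on the key.
        for candidate in other:
            if candidate[key] == event[key]:
                yield (candidate, event) if side == 'b' else (event, candidate)
        own.append(event)

# Hypothetical usage: join click events to purchase events per user.
clicks = [{'user': 'u1', 'ts': 1}, {'user': 'u2', 'ts': 3}]
buys = [{'user': 'u1', 'ts': 2}]
joined = list(windowed_join(clicks, buys, key='user', window=5))
```

The key design point is that both sides must buffer state bounded by the window size; without a window, a stream join would need unbounded memory, which is exactly why "join two tables" does not translate directly to streams.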
Huyen has some recommendations for incorporating real-time predictions into enterprises: speed up inference, with a “model that can make predictions in the order of milliseconds,” and adopt a “real-time pipeline that can process data, input it into the model, and return a prediction in real-time.” In addition, she advises making models smaller to fit on edge devices, and using faster hardware.
Level 2 real-time ML (online learning) is actually rare at this time, Huyen states. “Very, very few companies actually do this because this method suffers from catastrophic forgetting – neural networks abruptly forget previously learned information upon learning new information.” Plus, she adds, “it can be more expensive to run a learning step on only one data point than on a batch (this can be mitigated by having hardware just powerful enough to process exactly one data point).”
Another challenge is “online learning flips a lot of what we’ve learned about machine learning on its head,” Huyen continues. “In online learning, there’s no epoch – your model sees each data point only once. There’s no such thing as convergence either. Your underlying data distribution keeps on shifting. There’s nothing stationary to converge to.”
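The no-epoch, nothing-stationary point can be made concrete with a toy sketch (mine, not Huyen's): a one-feature linear model updated by a single SGD step per incoming point, where each point is seen exactly once and the data-generating slope shifts mid-stream, so the model tracks rather than converges.

```python
import random

def sgd_step(w, b, x, y, lr=0.1):
    """One SGD update on a single (x, y) point for a 1-feature
    linear model y ~ w*x + b, under squared loss."""
    grad = (w * x + b) - y          # residual drives the gradient
    return w - lr * grad * x, b - lr * grad

random.seed(0)
w, b = 0.0, 0.0
# Simulated non-stationary stream: the true slope drifts at t=1000,
# so there is no fixed optimum to converge to -- the model just tracks.
for t in range(2000):
    true_w = 2.0 if t < 1000 else -1.0
    x = random.uniform(-1, 1)
    y = true_w * x
    w, b = sgd_step(w, b, x, y)     # each point is seen exactly once
```

After the shift, `w` drifts toward the new slope of -1.0; there is no epoch over a fixed dataset, only a running update, which is the mental model behind Level-2 systems.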