AI legend Yann LeCun, one of the godfathers of deep learning, sees self-supervised learning as the key to AI's future
The field of artificial intelligence moves fast. It has only been 8 years since the modern era of deep learning began at the 2012 ImageNet competition. Progress in the field since then has been breathtaking and relentless.
If anything, this breakneck pace is only accelerating. Five years from now, the field of AI will look very different than it does today. Methods that are currently considered cutting-edge will have become outdated; methods that today are nascent or on the fringes will be mainstream.
What will the next generation of artificial intelligence look like? Which novel AI approaches will unlock currently unimaginable possibilities in technology and business? This article highlights three emerging areas within AI that are poised to redefine the field—and society—in the years ahead. Study up now.
1. Unsupervised Learning
The dominant paradigm in the world of AI today is supervised learning. In supervised learning, AI models learn from datasets that humans have curated and labeled according to predefined categories. (The term “supervised learning” comes from the fact that human “supervisors” prepare the data in advance.)
While supervised learning has driven remarkable progress in AI over the past decade, from autonomous vehicles to voice assistants, it has serious limitations.
The process of manually labeling thousands or millions of data points can be enormously expensive and cumbersome. The fact that humans must label data by hand before machine learning models can ingest it has become a major bottleneck in AI.
At a deeper level, supervised learning represents a narrow and circumscribed form of learning. Rather than being able to explore and absorb all the latent information, relationships and implications in a given dataset, supervised algorithms orient only to the concepts and categories that researchers have identified ahead of time.
In contrast, unsupervised learning is an approach to AI in which algorithms learn from data without human-provided labels or guidance.
Many AI leaders see unsupervised learning as the next great frontier in artificial intelligence. In the words of AI legend Yann LeCun: “The next AI revolution will not be supervised.” UC Berkeley professor Jitendra Malik put it even more colorfully: “Labels are the opium of the machine learning researcher.”
How does unsupervised learning work? In a nutshell, the system learns about some parts of the world based on other parts of the world. By observing the behavior of, patterns among, and relationships between entities—for example, words in a text or people in a video—the system bootstraps an overall understanding of its environment. Some researchers sum this up with the phrase “predicting everything from everything else.”
Unsupervised learning more closely mirrors the way that humans learn about the world: through open-ended exploration and inference, without a need for the “training wheels” of supervised learning. One of its fundamental advantages is that there will always be far more unlabeled data than labeled data in the world (and the former is much easier to come by).
In the words of LeCun, who prefers the closely related term “self-supervised learning”: “In self-supervised learning, a portion of the input is used as a supervisory signal to predict the remaining portion of the input....More knowledge about the structure of the world can be learned through self-supervised learning than from [other AI paradigms], because the data is unlimited and the amount of feedback provided by each example is huge.”
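To make this idea concrete, here is a toy illustration of self-supervised prediction in Python: the model below learns to fill in a masked word purely from word co-occurrence counts in an unlabeled corpus. The corpus and function names are hypothetical, and a real system would use a large neural network rather than bigram counts, but the principle is the same: one part of the input serves as the supervisory signal for the rest.

```python
from collections import Counter, defaultdict

# Toy self-supervised task: predict a masked word from its left neighbor.
# No human labels are needed; the "labels" come from the text itself.
corpus = "the cat sat on the mat the cat ate the food".split()

# Count bigrams (left word -> next word) over the unlabeled corpus.
bigrams = defaultdict(Counter)
for left, right in zip(corpus, corpus[1:]):
    bigrams[left][right] += 1

def predict_masked(left_word):
    """Guess the masked word given the word to its left."""
    counts = bigrams[left_word]
    return counts.most_common(1)[0][0] if counts else None

# "the [MASK]" -> "cat", the most frequent follower of "the" (2 of 4 times)
print(predict_masked("the"))
```

No label files, no annotators: the training signal is manufactured from the raw text itself, which is exactly why self-supervised methods can exploit effectively unlimited data.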
Unsupervised learning is already having a transformative impact in natural language processing. NLP has seen incredible progress recently thanks to a new unsupervised learning architecture known as the Transformer, which originated at Google about three years ago. (See #3 below for more on Transformers.)
Efforts to apply unsupervised learning to other areas of AI remain at earlier stages, but rapid progress is being made. To take one example, a startup named Helm.ai is seeking to use unsupervised learning to leapfrog the leaders in the autonomous vehicle industry.
Many researchers see unsupervised learning as the key to developing human-level AI. According to LeCun, mastering unsupervised learning is “the greatest challenge in ML and AI of the next few years.”
2. Federated Learning
One of the overarching challenges of the digital era is data privacy. Because data is the lifeblood of modern artificial intelligence, data privacy issues play a significant (and often limiting) role in AI’s trajectory.
Privacy-preserving artificial intelligence—methods that enable AI models to learn from datasets without compromising their privacy—is thus becoming an increasingly important pursuit. Perhaps the most promising approach to privacy-preserving AI is federated learning.
The concept of federated learning was first formulated by researchers at Google in 2016. Over the past year, interest in federated learning has exploded: more than 1,000 research papers on federated learning were published in the first six months of 2020, compared to just 180 in all of 2018.
The standard approach to building machine learning models today is to gather all the training data in one place, often in the cloud, and then to train the model on the data. But this approach is not practicable for much of the world’s data, which for privacy and security reasons cannot be moved to a central data repository. This makes it off-limits to traditional AI techniques.
Federated learning solves this problem by flipping the conventional approach to AI on its head.
Rather than requiring one unified dataset to train a model, federated learning leaves the data where it is, distributed across numerous devices and servers on the edge. Instead, many versions of the model are sent out—one to each device with training data—and trained locally on each subset of data. The resulting model parameters, but not the training data itself, are then sent back to the cloud. When all these “mini-models” are aggregated, the result is one overall model that functions as if it had been trained on the entire dataset at once.
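The aggregation step can be sketched in a few lines, in the spirit of Google's FedAvg algorithm: each device's updated parameters are averaged, weighted by how much local data that device trained on. The parameter values and function names below are illustrative, not taken from any particular library.

```python
import numpy as np

# Sketch of FedAvg-style aggregation: each device trains locally and
# returns only its updated parameters, never its raw data.
def aggregate(client_params, client_sizes):
    """Weighted average of per-client parameter vectors,
    weighted by the size of each client's local dataset."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Hypothetical parameters returned by three devices after local training.
params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 10, 20]  # local dataset sizes

global_params = aggregate(params, sizes)
print(global_params)  # -> [3.5 4.5]
```

Note what travels over the network: only small parameter arrays, never the underlying records, which is the source of federated learning's privacy benefit.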
The original federated learning use case was to train AI models on personal data distributed across billions of mobile devices. As those researchers summarized: “Modern mobile devices have access to a wealth of data suitable for machine learning models....However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center....We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates.”
More recently, healthcare has emerged as a particularly promising field for the application of federated learning.
It is easy to see why. On one hand, there are an enormous number of valuable AI use cases in healthcare. On the other hand, healthcare data, especially patients’ personally identifiable information, is extremely sensitive; a thicket of regulations like HIPAA restrict its use and movement. Federated learning could enable researchers to develop life-saving healthcare AI tools without ever moving sensitive health records from their source or exposing them to privacy breaches.
A host of startups has emerged to pursue federated learning in healthcare. The most established is Paris-based Owkin; earlier-stage players include Lynx.MD, Ferrum Health and Secure AI Labs.
Beyond healthcare, federated learning may one day play a central role in the development of any AI application that involves sensitive data: from financial services to autonomous vehicles, from government use cases to consumer products of all kinds. Paired with other privacy-preserving techniques like differential privacy and homomorphic encryption, federated learning may provide the key to unlocking AI’s vast potential while mitigating the thorny challenge of data privacy.
The wave of data privacy legislation being enacted worldwide today (starting with GDPR and CCPA, with many similar laws coming soon) will only accelerate the need for these privacy-preserving techniques. Expect federated learning to become an important part of the AI technology stack in the years ahead.
3. Transformers

We have entered a golden era for natural language processing.
OpenAI’s release of GPT-3, the most powerful language model ever built, captivated the technology world this summer. It has set a new standard in NLP: it can write impressive poetry, generate functioning code, compose thoughtful business memos, write articles about itself, and so much more.
GPT-3 is just the latest (and largest) in a string of similarly architected NLP models—Google’s BERT, OpenAI’s GPT-2, Facebook’s RoBERTa and others—that are redefining what is possible in NLP.
The key technology breakthrough underlying this revolution in language AI is the Transformer.
Transformers were introduced in a landmark 2017 research paper. Previously, state-of-the-art NLP methods had all been based on recurrent neural networks (e.g., LSTMs). By definition, recurrent neural networks process data sequentially—that is, one word at a time, in the order that the words appear.
Transformers’ great innovation is to parallelize language processing: all the tokens in a given body of text are analyzed at the same time rather than in sequence. To support this parallelization, Transformers rely heavily on an AI mechanism known as attention. Attention enables a model to consider the relationships between words regardless of how far apart they are and to determine which words and phrases in a passage are most important to “pay attention to.”
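The core attention computation (scaled dot-product attention, as described in the original Transformer paper) can be sketched in NumPy; the matrix sizes below are arbitrary. Notice that every token's scores against every other token are computed in a single matrix multiplication, which is precisely what makes parallel processing possible.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every
    other token at once, regardless of distance between them."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # all pairwise token affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8  # hypothetical sequence length and embedding size
Q = rng.normal(size=(n_tokens, d_model))  # queries
K = rng.normal(size=(n_tokens, d_model))  # keys
V = rng.normal(size=(n_tokens, d_model))  # values

out, w = attention(Q, K, V)
print(out.shape)  # (4, 8): one updated representation per token
```

A recurrent network would need four sequential steps to process these four tokens; here the whole sequence is handled in one batch of matrix operations, which is why Transformers scale so well on modern hardware.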
Why is parallelization so valuable? Because it makes Transformers vastly more computationally efficient than RNNs, meaning they can be trained on much larger datasets. GPT-3 was trained on roughly 500 billion tokens of text and consists of 175 billion parameters, dwarfing any RNN in existence.
Transformers have been associated almost exclusively with NLP to date, thanks to the success of models like GPT-3. But just this month, a groundbreaking new paper was released that successfully applies Transformers to computer vision. Many AI researchers believe this work could presage a new era in computer vision. (As well-known ML researcher Oriol Vinyals put it simply, “My take is: farewell convolutions.”)
While leading AI companies like Google and Facebook have begun to put Transformer-based models into production, most organizations remain in the early stages of productizing and commercializing this technology. OpenAI has announced plans to make GPT-3 commercially accessible via API, which could seed an entire ecosystem of startups building applications on top of it.
Expect Transformers to serve as the foundation for a whole new generation of AI capabilities in the years ahead, starting with natural language. As exciting as the past decade has been in the field of artificial intelligence, it may prove to be just a prelude to the decade ahead.
4. Neural Network Compression
AI is moving to the edge.
There are tremendous advantages to being able to run AI algorithms directly on devices at the edge—e.g., phones, smart speakers, cameras, vehicles—without sending data back and forth from the cloud.
Perhaps most importantly, edge AI enhances data privacy because data need not be moved from its source to a remote server. Edge AI also offers lower latency, since all processing happens locally; this makes a critical difference for time-sensitive applications like autonomous vehicles or voice assistants. It is more energy- and cost-efficient, an increasingly important consideration as the computational and economic costs of machine learning balloon. And it enables AI algorithms to run autonomously without the need for an Internet connection.
Nvidia CEO Jensen Huang, one of the titans of the AI business world, sees edge AI as the future of computing: “AI is moving from the cloud to the edge, where smart sensors connected to AI computers can speed checkouts, direct forklifts, orchestrate traffic, save power. In time, there will be trillions of these small autonomous computers, powered by AI.”
But in order for this lofty vision of ubiquitous intelligence at the edge to become a reality, a key technology breakthrough is required: AI models need to get smaller. A lot smaller. Developing and commercializing techniques to shrink neural networks without compromising their performance has thus become one of the most important pursuits in the field of AI.
The typical deep learning model today is massive, requiring significant computational and storage resources in order to run. OpenAI’s new language model GPT-3, which made headlines this summer, has a whopping 175 billion model parameters, requiring more than 350 GB just to store the model. Even models that don’t approach GPT-3 in size are still extremely computationally intensive: ResNet-50, a widely used computer vision model developed a few years ago, requires about 3.8 billion floating-point operations to process a single image.
These models cannot run at the edge. The hardware processors in edge devices (think of the chips in your phone, your Fitbit, or your Roomba) are simply not powerful enough to support them.
Developing methods to make deep learning models more lightweight therefore represents a critical unlock: it will unleash a wave of product and business opportunities built around decentralized artificial intelligence.
How would such model compression work?
Researchers and entrepreneurs have made tremendous strides in this field in recent years, developing a series of techniques to miniaturize neural networks. These techniques can be grouped into five major categories: pruning, quantization, low-rank factorization, compact convolutional filters, and knowledge distillation.
Pruning entails identifying and eliminating the redundant or unimportant connections in a neural network in order to slim it down. Quantization compresses models by using fewer bits to represent values. In low-rank factorization, a model’s tensors are decomposed in order to construct sparser versions that approximate the original tensors. Compact convolutional filters are specially designed filters that reduce the number of parameters required to carry out convolution. Finally, knowledge distillation involves using the full-sized version of a model to “teach” a smaller model to mimic its outputs.
These techniques are mostly independent from one another, meaning they can be deployed in tandem for improved results. Some of them (pruning, quantization) can be applied after the fact to models that already exist, while others (compact filters, knowledge distillation) require developing models from scratch.
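As a rough illustration, here are toy NumPy versions of two of these techniques, magnitude pruning and 8-bit quantization. Production tools are far more sophisticated (structured pruning, quantization-aware training, and so on), so treat this only as a sketch of the underlying ideas.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Pruning: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w):
    """Quantization: map float weights to 8-bit integers plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))  # a tiny stand-in for a weight matrix

pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w)
restored = q.astype(np.float32) * scale  # approximate reconstruction

print((pruned == 0).mean())                 # half the weights removed
print(np.abs(restored - w).max() <= scale)  # error within one quantization step
```

Pruned weights can be stored in sparse formats, and the int8 weights take a quarter of the memory of 32-bit floats, which is how these techniques translate into smaller, faster models at the edge.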
A handful of startups has emerged to bring neural network compression technology from research to market. Among the more promising are Pilot AI, Latent AI, Edge Impulse and Deeplite. As one example, Deeplite claims that its technology can make neural networks 100x smaller, 10x faster, and 20x more power efficient without sacrificing performance.
“The number of devices in the world that have some computational capability has skyrocketed in the last decade,” explained Pilot AI CEO Jon Su. “Pilot AI’s core IP enables a significant reduction in the size of the AI models used for tasks like object detection and tracking, making it possible for AI/ML workloads to be run directly on edge IoT devices. This will enable device manufacturers to transform the billions of sensors sold every year—things like push button doorbells, thermostats, or garage door openers—into rich tools that will power the next generation of IoT applications.”
Large technology companies are actively acquiring startups in this category, underscoring the technology’s long-term strategic importance. Earlier this year Apple acquired Seattle-based Xnor.ai for a reported $200 million; Xnor’s technology will help Apple deploy edge AI capabilities on its iPhones and other devices. In 2019 Tesla snapped up DeepScale, one of the early pioneers in this field, to support inference on its vehicles.
And one of the most important technology deals in years—Nvidia’s pending $40 billion acquisition of Arm, announced last month—was motivated in large part by the accelerating shift to efficient computing as AI moves to the edge.
Emphasizing this point, Nvidia CEO Jensen Huang said of the deal: “Energy efficiency is the single most important thing when it comes to computing going forward....together, Nvidia and Arm are going to create the world's premier computing company for the age of AI.”
In the years ahead, artificial intelligence will become untethered, decentralized and ambient, operating on trillions of devices at the edge. Model compression is an essential enabling technology that will help make this vision a reality.
5. Generative AI
Today’s machine learning models mostly interpret and classify existing data: for instance, recognizing faces or identifying fraud. Generative AI is a fast-growing new field that focuses instead on building AI that can generate its own novel content. To put it simply, generative AI takes artificial intelligence beyond perceiving to creating.
Two key technologies are at the heart of generative AI: generative adversarial networks (GANs) and variational autoencoders (VAEs).
GANs, the more attention-grabbing of the two methods, were invented by Ian Goodfellow in 2014 while he was pursuing his PhD at the University of Montreal under AI pioneer Yoshua Bengio.
Goodfellow’s conceptual breakthrough was to architect GANs with two separate neural networks—and then pit them against one another.
Starting with a given dataset (say, a collection of photos of human faces), the first neural network (called the “generator”) begins generating new images that, in terms of pixels, are mathematically similar to the existing images. Meanwhile, the second neural network (the “discriminator”) is fed photos without being told whether they are from the original dataset or from the generator’s output; its task is to identify which photos have been synthetically generated.
As the two networks iteratively work against one another—the generator trying to fool the discriminator, the discriminator trying to suss out the generator’s creations—they hone one another’s capabilities. Eventually the discriminator’s classification success rate falls to 50%, no better than random guessing, meaning that the synthetically generated photos have become indistinguishable from the originals.
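The adversarial loop can be sketched in miniature. The toy example below pits a one-parameter generator against a logistic-regression discriminator on one-dimensional data; real GANs use deep networks and a framework with automatic differentiation, so this is a schematic sketch of the dynamics, not a practical implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: samples from a hypothetical target distribution.
real = rng.normal(loc=4.0, scale=1.0, size=64)

mu = 0.0          # generator: G(z) = mu + z, learns only a shift
a, b = 1.0, 0.0   # discriminator: D(x) = sigmoid(a * x + b)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

lr = 0.05
for step in range(1000):
    fake = mu + rng.normal(size=64)  # generator's current samples

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    ds_real = -(1 - sigmoid(a * real + b))  # dLoss/dscore on real batch
    ds_fake = sigmoid(a * fake + b)         # dLoss/dscore on fake batch
    a -= lr * (ds_real @ real + ds_fake @ fake) / 64
    b -= lr * (ds_real.mean() + ds_fake.mean())

    # Generator update: push D(fake) toward 1 (non-saturating loss).
    d_fake = sigmoid(a * fake + b)
    mu -= lr * (-(1 - d_fake) * a).mean()

# The generator's shift mu drifts toward the real data's mean, and the
# discriminator's accuracy decays toward chance (50%) as the two converge.
print(mu)
```

Even in this stripped-down setting the core dynamic is visible: the generator's samples migrate toward the real distribution precisely because the discriminator keeps telling it, via gradients, how its output differs from the real thing.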
In 2016, AI great Yann LeCun called GANs “the most interesting idea in the last ten years in machine learning.”
VAEs, introduced around the same time as GANs, are a conceptually similar technique that can be used as an alternative to GANs.
Like GANs, VAEs consist of two neural networks that work in tandem to produce an output. The first network (the “encoder”) takes a piece of input data and compresses it into a lower-dimensional representation. The second network (the “decoder”) takes this compressed representation and, based on a probability distribution of the original data’s attributes and a randomness function, generates novel outputs that “riff” on the original input.
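The VAE data flow can be sketched with untrained random linear maps standing in for the encoder and decoder networks; this illustrates only the reparameterization step that lets the decoder generate novel variations, not how a VAE is actually trained. All sizes and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained toy VAE: random linear maps stand in for the encoder and
# decoder networks; only the data flow is shown, not the learning.
d_in, d_latent = 8, 2
W_enc = rng.normal(size=(d_in, 2 * d_latent))  # produces mean and log-variance
W_dec = rng.normal(size=(d_latent, d_in))

def encode(x):
    h = x @ W_enc
    mu, log_var = h[:d_latent], h[d_latent:]
    return mu, log_var

def decode(z):
    return z @ W_dec

x = rng.normal(size=d_in)        # one hypothetical input example
mu, log_var = encode(x)          # compressed, probabilistic representation

# Reparameterization trick: sample the latent code as mean + std * noise,
# so each decode of the same input "riffs" on it slightly differently.
z = mu + np.exp(0.5 * log_var) * rng.normal(size=d_latent)
x_new = decode(z)
print(x_new.shape)  # (8,): a novel output with the input's dimensionality
```

The injected randomness is the key design choice: because the latent code is sampled rather than fixed, decoding the same input twice yields two different but related outputs.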
In general, GANs generate higher-quality output than do VAEs but are more difficult and more expensive to build.
Like artificial intelligence more broadly, generative AI has inspired both widely beneficial and frighteningly dangerous real-world applications. Only time will tell which will predominate.
On the positive side, one of the most promising use cases for generative AI is synthetic data. Synthetic data is a potentially game-changing technology that enables practitioners to digitally fabricate the exact datasets they need to train AI models.
Getting access to the right data is both the most important and the most challenging part of AI today. Generally, in order to train a deep learning model, researchers must collect thousands or millions of data points from the real world. They must then have labels attached to each data point before the model can learn from the data. This is at best an expensive and time-consuming process; at worst, the data one needs is simply impossible to get one’s hands on.
Synthetic data upends this paradigm by enabling practitioners to artificially create high-fidelity datasets on demand, tailored to their precise needs. For instance, using synthetic data methods, autonomous vehicle companies can generate billions of different driving scenes for their vehicles to learn from without needing to actually encounter each of these scenes on real-world streets.
As synthetic data approaches real-world data in accuracy, it will democratize AI, undercutting the competitive advantage of proprietary data assets. In a world in which data can be inexpensively generated on demand, the competitive dynamics across industries will be upended.
A crop of promising startups has emerged to pursue this opportunity, including Applied Intuition, Parallel Domain, AI.Reverie, Synthesis AI and Unlearn.AI. Large technology companies—among them Nvidia, Google and Amazon—are also investing heavily in synthetic data. The first major commercial use case for synthetic data was autonomous vehicles, but the technology is quickly spreading across industries, from healthcare to retail and beyond.
Counterbalancing the enormous positive potential of synthetic data, a different generative AI application threatens to have a widely destructive impact on society: deepfakes.
We covered deepfakes in detail in this column earlier this year. In essence, deepfake technology enables anyone with a computer and an Internet connection to create realistic-looking photos and videos of people saying and doing things that they did not actually say or do.
The first use case to which deepfake technology has been widely applied is pornography. According to a July 2019 report from startup Sensity, 96% of deepfake videos online are pornographic. Deepfake pornography is almost always non-consensual, involving the artificial synthesis of explicit videos that feature famous celebrities or personal contacts.
From these dark corners of the Internet, the use of deepfakes has begun to spread to the political sphere, where the potential for harm is even greater. Recent deepfake-related political incidents in Gabon, Malaysia and Brazil may be early examples of what is to come.
In a recent report, The Brookings Institution grimly summed up the range of political and social dangers that deepfakes pose: “distorting democratic discourse; manipulating elections; eroding trust in institutions; weakening journalism; exacerbating social divisions; undermining public safety; and inflicting hard-to-repair damage on the reputation of prominent individuals, including elected officials and candidates for office.”
The core technologies underlying synthetic data and deepfakes are the same. Yet the use cases and potential real-world impacts are diametrically opposed.
It is a great truth in technology that any given innovation can either confer tremendous benefits or inflict grave harm on society, depending on how humans choose to employ it. It is true of nuclear energy; it is true of the Internet. It is no less true of artificial intelligence. Generative AI is a powerful case in point.
6. “System 2” Reasoning
In his landmark book Thinking, Fast And Slow, Nobel-winning psychologist Daniel Kahneman popularized the concepts of “System 1” thinking and “System 2” thinking.
System 1 thinking is intuitive, fast, effortless and automatic. Examples of System 1 activities include recognizing a friend’s face, reading the words on a passing billboard, or completing the phrase “War And _______”. System 1 requires little conscious processing.
System 2 thinking is slower, more analytical and more deliberative. Humans use System 2 thinking when effortful reasoning is required to solve abstract problems or handle novel situations. Examples of System 2 activities include solving a complex brain teaser or determining the appropriateness of a particular behavior in a social setting.
Though the System 1/System 2 framework was developed to analyze human cognition, it maps remarkably well to the world of artificial intelligence today. In short, today’s cutting-edge AI systems excel at System 1 tasks but struggle mightily with System 2 tasks.
AI leader Andrew Ng summarized this well: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.”
Yoshua Bengio’s 2019 keynote address at NeurIPS explored this exact theme. In his talk, Bengio called on the AI community to pursue new methods to enable AI systems to go beyond System 1 tasks to System 2 capabilities like planning, abstract reasoning, causal understanding, and open-ended generalization.
“We want to have machines that understand the world, that build good world models, that understand cause and effect, and can act in the world to acquire knowledge,” Bengio said.
There are many different ways to frame the AI discipline’s agenda, trajectory and aspirations. But perhaps the most powerful and compact way is this: in order to progress, AI needs to get better at System 2 thinking.
No one yet knows with certainty the best way to move toward System 2 AI. The debate over how to do so has coursed through the field in recent years, often contentiously. It is a debate that evokes basic philosophical questions about the concept of intelligence.
Bengio is convinced that System 2 reasoning can be achieved within the current deep learning paradigm, albeit with further innovations to today’s neural networks.
“Some people think we need to invent something completely new to face these challenges, and maybe go back to classical AI to deal with things like high-level cognition,” Bengio said in his NeurIPS keynote. “[But] there is a path from where we are now, extending the abilities of deep learning, to approach these kinds of high-level questions of cognitive system 2.”
Bengio pointed to attention mechanisms, continuous learning and meta-learning as existing techniques within deep learning that hold particular promise for the pursuit of System 2 AI.
Others, though, believe that the field of AI needs a more fundamental reset.
Professor and entrepreneur Gary Marcus has been a particularly vocal advocate of non-deep-learning approaches to System 2 intelligence. Marcus has called for a hybrid solution that combines neural networks with symbolic methods, which were popular in the earliest years of AI research but have fallen out of favor more recently.
“Deep learning is only part of the larger challenge of building intelligent machines,” Marcus wrote in the New Yorker in 2012, at the dawn of the modern deep learning era. “Such techniques lack ways of representing causal relationships and are likely to face challenges in acquiring abstract ideas....They have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used.”
Marcus co-founded robotics startup Robust.AI to pursue this alternative path toward AI that can reason. Just yesterday, Robust announced its $15 million Series A fundraise.
Computer scientist Judea Pearl is another leading thinker who believes the road to System 2 reasoning lies beyond deep learning. Pearl has for years championed causal inference—the ability to understand cause and effect, not just statistical association—as the key to building truly intelligent machines. As Pearl put it recently: “All the impressive achievements of deep learning amount to just curve fitting.”
Of the six AI areas explored in this article series, this final one is, purposely, the most open-ended and abstract. There are many potential paths to System 2 AI; the road ahead remains shrouded. It is likely to be a circuitous and perplexing journey. But within our lifetimes, it will transform the economy and the world.
The Tech Platform