Machine learning generates far more carbon emissions than most people realize. A Stanford team has developed a tool to measure the hidden cost.
For all the advances enabled by artificial intelligence, from speech recognition to self-driving cars, AI systems consume a lot of power and can generate high volumes of climate-changing carbon emissions.
A study last year found that training an off-the-shelf AI language-processing system produced 1,400 pounds of emissions — about the amount produced by flying one person roundtrip between New York and San Francisco. The full suite of experiments needed to build and train that AI language system from scratch can generate even more: up to 78,000 pounds, depending on the source of power. That’s twice as much as the average American exhales over an entire lifetime.
But there are ways to make machine learning cleaner and greener, a movement that has been called “Green AI.” Some algorithms are less power-hungry than others, for example, and many training sessions can be moved to remote locations that get most of their power from renewable sources.
The key, however, is for AI developers and companies to know how much carbon their machine learning experiments emit and how much those emissions could be reduced.
Now, a team of researchers from Stanford, Facebook AI Research, and McGill University has come up with an easy-to-use tool that quickly measures both how much electricity a machine learning project will use and how much carbon that electricity use emits.
“As machine learning systems become more ubiquitous and more resource intensive, they have the potential to significantly contribute to carbon emissions,” says Peter Henderson, a PhD student at Stanford in computer science and the lead author.
“But you can’t solve a problem if you can’t measure it. Our system can help researchers and industry engineers understand how carbon-efficient their work is, and perhaps prompt ideas about how to reduce their carbon footprint.”
Henderson teamed up on the “experiment impact tracker” with Dan Jurafsky, chair of linguistics and an HAI-affiliated professor of computer science at Stanford; Emma Brunskill, an HAI-affiliated assistant professor of computer science at Stanford; Jieru Hu, a software engineer at Facebook AI Research; Joelle Pineau, a professor of computer science at McGill and co-managing director of Facebook AI Research; and Joshua Romoff, a PhD candidate at McGill.
“There’s a big push to scale up machine learning to solve bigger and bigger problems, using more compute power and more data,” says Jurafsky. “As that happens, we have to be mindful of whether the benefits of these heavy-compute models are worth the cost of the impact on the environment.”
Machine learning systems build their skills by running millions of statistical experiments around the clock, steadily refining their models to carry out tasks. Those training sessions, which can last weeks or even months, are increasingly power-hungry. And because the costs have plunged for both computing power and massive datasets, machine learning is increasingly pervasive in business, government, academia, and personal life.
To get an accurate measure of what that means for carbon emissions, the researchers began by measuring the power consumption of a particular AI model. That’s more complicated than it sounds, because a single machine often trains several models at the same time, so each training session has to be untangled from the others. Each training session also draws power for shared overhead functions, such as data storage and cooling, which need to be properly allocated.
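The attribution problem described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: it assumes the machine's power draw is sampled at intervals, that each training job's share of the machine during an interval is already known, and that shared overhead such as cooling is approximated by a single multiplier. The function name, the `job_share` values, and the `overhead_factor` are all hypothetical.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def attributed_energy_kwh(samples, overhead_factor=1.0):
    """Estimate the energy attributable to one training job.

    samples: iterable of (seconds, machine_watts, job_share) tuples, where
             job_share is the assumed fraction (0 to 1) of the machine's
             power that belongs to this job during the interval.
    overhead_factor: hypothetical multiplier for shared overhead such as
             cooling and storage (1.0 means no overhead is added).
    """
    total_joules = 0.0
    for seconds, machine_watts, job_share in samples:
        if not 0.0 <= job_share <= 1.0:
            raise ValueError("job_share must be between 0 and 1")
        # watts * seconds = joules; keep only this job's share.
        total_joules += seconds * machine_watts * job_share
    return total_joules * overhead_factor / JOULES_PER_KWH


# Hypothetical readings: two one-hour intervals on a 300-watt machine,
# with the job using half the machine, then a quarter of it.
example_samples = [(3600, 300.0, 0.5), (3600, 300.0, 0.25)]
example_kwh = attributed_energy_kwh(example_samples, overhead_factor=1.5)
```

A single multiplier is the simplest way to spread overhead across jobs; a real measurement would need per-process readings and facility-level data that this sketch takes as given.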
The next step is to translate power consumption into carbon emissions, which depend on the mix of renewable and fossil fuels that produced the electricity. That mix varies widely by location as well as by time of day. In areas with a lot of solar power, for example, the carbon intensity of electricity goes down as the sun climbs higher in the sky.
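As a rough illustration of that translation, emissions can be estimated by multiplying the energy used in each period by the grid's carbon intensity during the same period, then summing. The sketch below assumes hourly energy figures and hourly intensity values are already available; the function name and every number in it are hypothetical placeholders, not measurements from the study.

```python
def emissions_kg(energy_kwh_by_hour, intensity_kg_per_kwh_by_hour):
    """Sum per-hour emissions: energy (kWh) times carbon intensity (kg CO2 per kWh)."""
    if len(energy_kwh_by_hour) != len(intensity_kg_per_kwh_by_hour):
        raise ValueError("need one intensity value per hour of energy data")
    return sum(
        energy * intensity
        for energy, intensity in zip(energy_kwh_by_hour, intensity_kg_per_kwh_by_hour)
    )


# Hypothetical three-hour job. Intensity falls as solar output rises,
# so the same energy use emits less carbon in the later hours.
hourly_energy = [1.0, 1.0, 2.0]     # kWh, placeholder values
hourly_intensity = [0.4, 0.3, 0.2]  # kg CO2 per kWh, placeholder values
total_kg = emissions_kg(hourly_energy, hourly_intensity)
```

Multiplying hour by hour, rather than using one daily average, is what lets the estimate reflect a grid whose mix changes during the day.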
To get that information, the researchers scoured public sources of data about the energy mix in different regions of the United States and the world. In California, the experiment impact tracker plugs into real-time data from California ISO, which manages the flow of electricity over most of the state’s grids. At 12:45 p.m. on a day in late May, for example, renewables were supplying 47% of the state’s power.
The location of an AI training session can make a big difference in its carbon emissions. The researchers estimated that running a session in Estonia, which relies overwhelmingly on shale oil, will produce 30 times as much carbon as the same session would in Quebec, which relies primarily on hydroelectricity.
Indeed, the researchers’ first recommendation for reducing the carbon footprint is to move training sessions to a location supplied mainly by renewable sources. That can be easy, because datasets can be stored on a cloud server and accessed from almost anywhere.
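To see how much the choice of location matters, one could compare estimated emissions for the same job across candidate regions. The sketch below is only an illustration: the region names and intensity values are made-up placeholders rather than real grid data, and the function is hypothetical.

```python
def rank_regions(energy_kwh, intensity_by_region):
    """Return (region, estimated kg CO2) pairs, lowest emissions first."""
    estimates = {
        region: energy_kwh * intensity
        for region, intensity in intensity_by_region.items()
    }
    return sorted(estimates.items(), key=lambda pair: pair[1])


# Placeholder intensities in kg CO2 per kWh; not real figures for any grid.
placeholder_intensity = {"region_a": 0.9, "region_b": 0.03, "region_c": 0.4}
ranked = rank_regions(100.0, placeholder_intensity)

cleanest_region, cleanest_kg = ranked[0]
dirtiest_region, dirtiest_kg = ranked[-1]
ratio = dirtiest_kg / cleanest_kg  # how many times more carbon the worst choice emits
```

Because the energy used is the same everywhere in this comparison, the ratio depends only on the carbon intensity of each grid, which is why moving a session can cut emissions without changing the model.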