Calvin Qi, who works at Glean, a search startup, would love to improve his company’s products with the latest artificial intelligence algorithms.
Glean provides search tools for applications such as Gmail, Slack, and Salesforce. According to Qi, new AI techniques for parsing language would allow Glean’s customers to find the right file or conversation much faster.
However, training such a cutting-edge AI algorithm costs millions of dollars. As a result, Glean uses smaller, less capable AI models that cannot extract as much meaning from text.
In the last decade, AI has produced exciting breakthroughs, such as programs that can outperform humans in complex games, steer cars through city streets under certain conditions, respond to spoken commands, and write coherent text based on a short prompt. Writing, in particular, relies on recent advances in computers’ ability to parse and manipulate text.
Those advancements are largely the result of feeding the algorithms more text to learn from and providing them with more chips to digest it. And that is not cheap.
Consider OpenAI’s GPT-3 language model, a large, mathematically simulated neural network fed reams of web-scraped text. GPT-3 can discover statistical patterns that predict which words should come after others with remarkable accuracy. GPT-3 is significantly better than previous AI models at tasks like answering questions, summarizing text, and correcting grammatical errors right out of the box. According to one metric, it is 1,000 times more capable than its predecessor, GPT-2.
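The idea of learning statistical patterns that predict which word comes next can be illustrated at toy scale. The sketch below is not how GPT-3 works internally (GPT-3 is a large neural network). It is a simple word-pair counter, and the function names and the sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text, invented for this example.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram_counts(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(predict_next(model, "on"))   # "the" is the only word that follows "on"
```

A counter like this learns only from adjacent word pairs in a single sentence. Models such as GPT-3 capture far richer patterns, which is part of why they need so much more text and computing power.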
The rising cost of advanced AI training is also a concern for established companies looking to expand their AI capabilities.
Dan McCreary leads a team at Optum, a health IT company, that uses language models to analyze call transcripts in order to identify higher-risk patients or recommend referrals. He says that even training a language model one-thousandth the size of GPT-3 can quickly deplete the team’s budget. Models must be trained for specific tasks, and training them can cost more than $50,000 in fees for computing rented from cloud companies.
According to McCreary, cloud computing providers have little incentive to reduce prices. “We can’t trust cloud providers to work with us to reduce the costs of building our AI models,” he says. He is considering purchasing specialized chips designed to accelerate AI training.
Part of the reason AI has advanced so quickly recently is that many academic labs and startups have been able to download and use the most recent ideas and techniques. Image processing algorithms, for example, emerged from academic labs and were developed using off-the-shelf hardware and openly shared data sets.
However, it has become increasingly clear that progress in AI is tied to an exponential increase in the underlying computing power.
Of course, large corporations have always had advantages in terms of budget, scale, and reach. Large amounts of computer power are also required in industries such as drug discovery. Some are now pushing to scale up AI models even further. Microsoft announced this week that it had collaborated with Nvidia to create a language model that was more than twice the size of GPT-3. Chinese researchers claim to have created a language model four times larger than that.
Mosaic ML is based on a technique developed by MIT professor Michael Carbin and one of his students, Jonathan Frankle, that involves “pruning” a neural network to remove inefficiencies and create a much smaller network capable of comparable performance. According to Frankle, preliminary results indicate that it should be possible to cut the amount of computer power required to train something like GPT-3 in half, lowering development costs.
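To give a rough sense of what “pruning” means, the sketch below zeroes out the smallest weights in a single layer, a common approach known as magnitude pruning. The article does not say which pruning criterion Carbin and Frankle use, so this is an assumption chosen for illustration, and the function name and weight values are hypothetical.

```python
def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    num_to_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Hypothetical weights for one small layer, invented for this example.
layer_weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(layer_weights, fraction=0.5)

print(pruned)  # half of the weights, the smallest in magnitude, are now zero
```

In practice, a pruned network is usually retrained or fine-tuned afterward to recover accuracy. The sketch shows only the removal step.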
Carbin claims that there are other methods for improving neural network training performance. Mosaic ML intends to open-source much of its technology while also providing consulting services to companies looking to reduce the cost of AI deployment. According to Carbin, one potential offering is a tool that measures the trade-offs between different methods in terms of accuracy, speed, and cost.
Mosaic ML’s technology, according to Kanter of MLCommons, may help well-heeled companies take their models to the next level, but it may also help democratize AI for companies lacking deep AI expertise.