About this series
I'm Ethan Weinberger. After a brief stint in the hedge fund world, I'm now a Ph.D. student in machine learning at the University of Washington.
The world of AI has had some real breakthroughs mixed with massive amounts of cash and wildly speculative claims - a perfect recipe for BS. Kernels of Truth takes a deep dive into recent work in the field to determine whether reality matches up with the hype.
twitter: @efweinberger, email: ethan [at] honestyisbest [dot] com.
June 25, 2020
In AI, Money & Attention Are All You Need
The invention of Transformer models has turned out to be one of the most influential developments in recent machine learning history. Transformers are designed to excel at problems involving human language, such as translating between two languages or summarizing long articles into single paragraphs. While previous deep learning models performed far better than classical, non-deep models at such tasks, they still struggled when presented with long sentences with complex structure. Earlier neural network models for language were designed to process sentences in order, one word at a time. Thus, as sentences grew longer and more complex, these models were prone to forgetting information contained in earlier parts of the sentence by the time they reached the end. Transformers bypass this issue with a so-called attention mechanism, which allows the model to process all parts of a sentence at the same time and better capture long-range dependencies.
The idea behind the attention mechanism comes from how humans behave when forced to process a large signal - rather than focus equally on the whole signal at once, we tend to focus on (or “attend to”) a specific region, while ignoring some details of the rest of it. For example, when our visual system processes an image, we focus on a specific object and absorb the information in its finer details. At the same time, we can pull in coarser information from our peripheral vision for additional context, while ignoring that which doesn’t affect our understanding of the scene. The attention mechanism allows transformers to process entire sentences at once in a similar fashion by learning to focus on the parts that contain most of the meaning, potentially using information from the rest for additional context, and ignoring the low-signal parts.
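For readers who want to see the idea in code, here is a minimal sketch of scaled dot-product attention, the formulation used in the original Transformer paper. It is a toy under stated assumptions, not a real model: the function names and the tiny two-dimensional "word" vectors are made up for illustration, and the learned projection matrices and multiple attention heads of an actual transformer are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights). weights[i][j] is how strongly position i
    attends to position j; outputs[i] is the weighted average of values.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Compare this query against every key at once (no left-to-right pass).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Blend the values, leaning on the positions with the largest weights.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy input: three "words", each a 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: the sentence attends to itself.
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how one position spreads its focus across the whole sentence. Every position is compared against every other in a single step, which is why distance within the sentence matters less here than it does for a model that reads one word at a time.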
Increasingly powerful language models have been developed at a rapid clip since the original Transformer paper from Google in 2017, with some highlights including ELMo (Allen Institute for Artificial Intelligence, early 2018; a recurrent rather than transformer-based model), BERT (Google, late 2018) and GPT-2 (OpenAI, early 2019).
In late May OpenAI released GPT-3, their latest and greatest transformer model. Models in the GPT family, including GPT-3 and its predecessor GPT-2, have been featured in the popular press for their ability to generate realistic-looking writing samples from just a short prompt (see this demo to generate your own). How did the researchers at OpenAI improve upon GPT-2, a model that was already supposedly so powerful that it was too dangerous to release to the public? The answer is simple: they made it bigger - more than two orders of magnitude bigger. GPT-2, already famous for its size, consists of 1.5 billion parameters, taking up just over 5GB of memory. GPT-3 goes the extra mile with 175 billion parameters, clocking in at over 350GB of memory even after some custom optimizations. Current hardware is far from being able to run such a large model on a single machine. Instead, multiple expensive pieces of infrastructure must be run in parallel to have any hope of training such a model. Back-of-the-envelope calculations suggest that training just one GPT-3 model would cost over $10 million in compute credits for infrastructure from the major cloud providers, not to mention the cost of employing expensive AI researchers and engineers.
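The size figures above can be roughly sanity-checked with a few lines of arithmetic. The sketch below assumes that each GPT-2 parameter is stored as a 32-bit float (4 bytes) and each GPT-3 parameter as a 16-bit float (2 bytes). Those storage formats are my assumptions rather than details taken from this article, and the function name is made up for illustration.

```python
import math


def model_memory_bytes(num_parameters, bytes_per_parameter):
    """Memory needed just to hold the weights, ignoring everything else."""
    return num_parameters * bytes_per_parameter


GPT2_PARAMS = 1_500_000_000    # 1.5 billion, as quoted above
GPT3_PARAMS = 175_000_000_000  # 175 billion, as quoted above

# Assumption: GPT-2 weights stored as 32-bit floats (4 bytes each).
gpt2_gib = model_memory_bytes(GPT2_PARAMS, 4) / 2**30
# Assumption: GPT-3 weights stored as 16-bit floats (2 bytes each).
gpt3_gb = model_memory_bytes(GPT3_PARAMS, 2) / 1e9

size_ratio = GPT3_PARAMS / GPT2_PARAMS

print(f"GPT-2 weights: about {gpt2_gib:.1f} GiB")
print(f"GPT-3 weights: about {gpt3_gb:.0f} GB")
print(f"GPT-3 is about {size_ratio:.0f}x larger "
      f"({math.log10(size_ratio):.2f} orders of magnitude)")
```

Under these assumptions the weights alone account for the memory figures quoted above, before counting the extra memory that training itself requires.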
OpenAI was founded as a nonprofit research lab, so how could they afford to train this behemoth of a model? The organization started off in 2015 with a big pile of cash; their original backers included Silicon Valley heavyweights like Sam Altman, Elon Musk, and Peter Thiel among others, with a total funding commitment of over $1 billion. At first OpenAI thought this funding would be enough to sustain themselves more or less indefinitely, saying they “expect to only spend a tiny fraction of [the money] in the next few years”. However, it turns out that producing headline-grabbing AI research is expensive, and the company had a far higher burn rate than they first expected. The situation was dire enough that they were forced to abandon their original nonprofit status last year and seek money from outside investors. Soon after the transition, they nabbed another $1 billion of funds from their friends up in Redmond along with some snazzy new supercomputers.
OpenAI isn’t alone in pursuing increasingly expensive AI research projects. In Mountain View, Google has been busy working on ideas for new computer vision models. Traditionally, such models were designed by hand, and new architecture choices were meant to alleviate specific issues with older ones. Recently, a group at Google has decided to throw that process out the window; instead of continuing to design models by hand, they have been hard at work refining a process called Neural Architecture Search (NAS), whereby an algorithm mixes and matches different neural network pieces until it finds a combination that performs well enough at the given task. The process takes a lot of computation. Each new combination of blocks needs to be trained to assess its performance on the given task, and it can take hundreds of tries before the algorithm finds a suitable architecture. Even with Google’s research infrastructure, their original implementation took days of computation time to converge on good models. Despite the computational costs, NAS works very well when it is given enough time. In fact, it works well enough to outperform the previous hand-designed architectures that once were state of the art - another big win for absurd levels of computation.
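To make the "mix and match" loop concrete, here is a deliberately simplified sketch that uses plain random search. This is a stand-in strategy I have swapped in for brevity, not Google's actual method, which relies on more sophisticated search algorithms. The search space, the function names, and especially the toy scoring function are all hypothetical; in a real system the scoring step would mean training each candidate network, which is where the enormous compute bill comes from.

```python
import random

# Hypothetical search space: the building blocks the search may combine.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "block_type": ["conv3x3", "conv5x5", "separable_conv"],
    "width": [32, 64, 128],
}


def sample_architecture(rng):
    """Pick one option for every design choice."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, num_trials, seed=0):
    """Try num_trials random architectures and keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        # In real NAS this call trains a whole network: the expensive part.
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


def toy_evaluate(arch):
    """Placeholder score with no empirical meaning; NOT a real accuracy."""
    return arch["num_layers"] * arch["width"]


best_arch, best_score = random_search(toy_evaluate, num_trials=200)
print(best_arch, best_score)
```

Every pass through the loop stands for one full training run, so hundreds of tries translate directly into hundreds of models trained from scratch.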
Not to be outdone by their competitors, Facebook’s scientists have also been exploring what they can achieve with AI at the multimillion-dollar model scale. Oddly enough, their projects have been venturing outside domains that we would expect from a social media company, like language or computer vision. Instead, new projects have focused on topics such as understanding the structure and function of proteins (for the measly cost of using 128 NVIDIA Tesla V100 GPUs, each of which costs upwards of $8500, for four days straight).
What’s the upshot? Advances in AI can be expensive. While many conceptual breakthroughs certainly remain to be had, it’s likely that costly hardware systems will also play a major role in new developments. The ideas behind many modern deep learning architectures were developed decades ago, but until GPU technology was sufficiently advanced in the late 2000s they were seen as too compute-intensive to be of any practical use. It’s possible that we’re hitting a similar plateau now, only this time a handful of tech giants can use their budgets to sidestep the problem while progress in academic labs stalls.
Machine learning researchers, even those in industrial research labs, have long been proponents of open research and collaboration across institutions. That said, there’s no guarantee this will be the case forever, especially as advances in ML become more central to a company’s competitive advantage. OpenAI, which started off with lofty ambitions to conduct open research free from any monetary concerns, has already begun commercializing its GPT-3 model. Notably, while the previous version, GPT-2, was released with a detailed paper describing the model and accompanying code, that wasn’t the case for GPT-3. The GPT-3 paper was far more vague in its descriptions of the model architecture. In what is perhaps a more significant move, the accompanying GitHub repository has already been set to read-only without the authors actually publishing their code for the model, only the datasets used to train it. When asked about plans to release the model itself, OpenAI declined to comment.
A more closed-off world of AI research might benefit the few companies who can compete in it, but it would represent a major setback for a field in which the free flow of ideas has been standard practice. Without additional support for public academic work, there’s no guarantee that this culture of openness will continue. Moreover, given the potential societal impact of new AI technologies, concentrating so much power in the hands of a few industrial players is certain to lead to disaster. Researchers are repeatedly discovering behavior in current AI systems that we would consider unfair or unethical, such as models used by the judicial system to predict risk of recidivism consistently rating black defendants as higher risk after controlling for other variables, or facial recognition systems disproportionately misidentifying subjects who are not white men. Such discoveries are only made possible by a community that values open research and discussion.
How can we avoid a dystopian future where only the largest tech companies control new developments in AI and, potentially more importantly, decide how to weigh any potential new ethical issues? More government support would be a good start. Last year the Trump Administration unveiled the American AI Initiative, an effort to ensure that the US remains a global leader in AI technologies. However, whether the administration will actually follow through with significant funding to achieve this goal remains unclear. The federal government is looking to spend just shy of $5 billion on AI technology in 2020, $4 billion of which would be allocated for defense-related research and $1 billion for everything else. While this sounds impressive at first, it’s a paltry sum compared to what other major governments are willing to spend. In China the Shanghai city government alone has committed to spending $15 billion on AI projects over the next 10 years. While public numbers don’t exist for China-wide government spending, it’s almost certain to be at least an order of magnitude beyond that of a single city.
Luckily, some members of Congress understand the importance of having the public sector play a larger role in research efforts. In November of last year, Senate Minority Leader Chuck Schumer introduced a proposal to put forward $100 billion towards AI research over the next five years. The proposal has support from both sides of the aisle. Not only does it fit well with the Democratic Party’s general support for increased funding for the sciences, but it also aligns with the GOP’s increasingly anti-China platform; China is the most likely country to usurp US dominance in AI technologies, and ensuring that the Chinese Communist Party doesn’t invent Skynet before we do is an easy sell. Unfortunately, the rank-and-file GOP attitude towards the idea doesn’t extend to the top leadership. Senate Majority Leader Mitch McConnell and President Trump remain opposed to the increase in funding, thus dooming the proposal.
The 2020 US election is less than five months away, and Democrats are in a strong position to flip the White House and, surprisingly, the Senate too. If they do so, a subsequent infusion of cash for public AI research could ensure that new advances, along with acknowledgements of any accompanying downsides, become public knowledge. Otherwise, unless the Trump administration has a change of heart, we’ll continue on the path to a future where new AI technologies are controlled exclusively by the largest Silicon Valley companies and foreign governments that don’t exactly share our values of openness and freedom of expression.