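Ilya Sutskever (OpenAI co-founder and former chief scientist) said at the NeurIPS conference a few days ago that pre-training of large models as we know it is coming to an end. Noam Brown (an OpenAI researcher who previously led the development of Pluribus, the AI system that beat professional players at Texas hold'em) noted in a recent interview about the release of OpenAI o1 that scaling test-time compute is key to improving the quality of a large model's answers. On the eve of Christmas 2024, amid the holiday mood, Silicon Valley's AI leaders, institutions, and investors are deep in discussion about the path from "Scaling Learning" to "Scaling Search". All of this thinking can be traced back to The Bitter Lesson, the classic short essay published in 2019 by Rich Sutton (one of the founders of reinforcement learning).
Today, let us slow down and read this landmark essay closely. Having read it, we may come away with a deeper understanding of where AI stands now and where it is heading.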

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation.
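The deepest lesson from 70 years of AI research is that general methods that make the most of computation are ultimately the most effective, and by a wide margin. The underlying reason is Moore's law, or more precisely its generalization: the cost per unit of computation keeps falling exponentially over time.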
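To make the exponential claim concrete, here is a small worked formula. The starting cost $c_0$, the budget $B$, and the halving time $T$ are illustrative assumptions, not figures from the essay. If the cost per unit of computation halves every $T$ years, then the cost at time $t$ and the computation a fixed budget can buy are:

```latex
c(t) = c_0 \cdot 2^{-t/T},
\qquad
\frac{B}{c(t)} = \frac{B}{c_0} \cdot 2^{t/T}
```

With an assumed $T = 2$ years, the same budget buys $2^{10/2} = 32$ times as much computation after 10 years. A method whose results improve with computation therefore keeps improving without anyone redesigning it.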
Most AI research has been conducted as if the computation available to the agent were constant. In this scenario, leveraging human knowledge would be one of the only ways to improve performance. However, over a slightly longer timeframe than a typical research project, massively more computation inevitably becomes available.
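Much classic AI research assumed that available computation was a constant. Under that assumption, drawing on human knowledge is almost the only way to improve performance. In reality, computation keeps growing: over a span only slightly longer than a typical research project, vastly more of it becomes available.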
Seeking short-term improvements, researchers often focus on leveraging their domain expertise. But in the long run, the leveraging of computation is the key factor. These two approaches don't necessarily conflict, but in practice, they often do.
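To get gains in the short term, researchers tend to lean on their own knowledge of a specific domain, but in the long run computation is the deciding factor. In principle, domain knowledge and computation need not conflict, but in practice they often do.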
Time invested in one approach is time taken away from the other. Researchers also develop psychological commitments to their chosen approach. Furthermore, the human-knowledge approach tends to complicate methods, making them less adaptable to general methods that leverage computation. History is replete with examples of AI researchers learning this bitter lesson, and reviewing some prominent cases is instructive.
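A researcher's time is limited: time invested in one approach is time taken away from the other. Researchers also become psychologically attached to the approach they have chosen, sometimes treating that attachment as their own specialty, which makes it hard to step outside familiar thinking. More importantly, human-knowledge-based approaches tend to make methods more complicated and less suited to general methods that exploit computation. Many AI researchers grasped this truth only after running into the wall again and again, and some of the classic cases deserve careful thought.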
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess.
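In 1997, Deep Blue defeated the world chess champion Kasparov on the strength of massive, deep search. This came as a cold shower to most computer-chess researchers of the time, who had devoted themselves to building human understanding of chess into their programs.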
When a simpler, search-based approach with specialized hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They argued that "brute force" search might have won this time, but it wasn't a general strategy, and besides, it wasn't how humans played chess. These researchers championed methods based on human input and were disappointed by their failure.
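When a simpler, blunter search-based approach, combined with specialized hardware and software, proved far more effective, these human-knowledge-based chess researchers did not take the loss gracefully. They argued that "brute-force search" may have won this time, but that it was no general strategy and, besides, it is not how people play chess. Those who had championed methods built on human input were disappointed that their methods had failed.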
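As a toy illustration of what search without built-in knowledge looks like, the sketch below exhaustively searches a tiny game. The game itself (take 1 to 3 stones, whoever takes the last stone wins) and the function name `player_to_move_wins` are hypothetical choices made for this example. They have nothing to do with how Deep Blue worked, which combined far deeper search with specialized hardware.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # hypothetical rule set: remove 1 to 3 stones per turn


@lru_cache(maxsize=None)
def player_to_move_wins(stones: int) -> bool:
    """Exhaustive game-tree search: True if the player to move can force a win."""
    # No stones left means the previous player took the last one,
    # so the player to move has already lost.
    if stones == 0:
        return False
    # A position is winning if any legal move leaves the opponent
    # in a position that is losing for them.
    return any(not player_to_move_wins(stones - m) for m in MOVES if m <= stones)


if __name__ == "__main__":
    losing = [n for n in range(1, 21) if not player_to_move_wins(n)]
    print(losing)  # prints [4, 8, 12, 16, 20]
```

The search is never told the rule a human would state ("multiples of 4 are lost positions"), yet it recovers exactly that pattern from the game rules and computation alone.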
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale.
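Twenty years later, history repeated itself strikingly in Go. At first, researchers poured enormous effort into avoiding search by drawing on accumulated human knowledge or on the special features of the game, but all of that effort proved irrelevant, or worse, a hindrance, once search was applied effectively at scale.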
Also important was the use of learning by self-play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self-play, and learning in general, is like search in that it enables massive computation to be brought to bear.
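Equally important was the use of self-play learning to learn a value function (as was done in many other games and even in chess, although learning played no major role in the 1997 program that first beat a world champion). Self-play learning, and learning in general, resembles search in that it puts massive computation to effective use.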
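For readers unfamiliar with the idea, here is a minimal sketch of learning a value function by self-play. Everything in it is an assumption made for illustration: the toy game (take 1 to 3 stones, taking the last stone wins), the tabular value table, and the hyperparameters. It is not the method used by any Go or chess program; it only shows that the same table can play both sides and improve from the outcomes.

```python
import random

MOVES = (1, 2, 3)   # hypothetical toy game: take 1 to 3 stones, last stone wins
MAX_STONES = 12     # largest starting pile used during training


def train_by_self_play(episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn value[n]: the estimated chance that the player to move wins with n stones."""
    rng = random.Random(seed)
    value = [0.5] * (MAX_STONES + 1)
    value[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        stones = rng.randint(1, MAX_STONES)
        while stones > 0:
            legal = [m for m in MOVES if m <= stones]
            # The best move leaves the opponent in the lowest-valued position.
            greedy = min(legal, key=lambda m: value[stones - m])
            # Temporal-difference update toward "one minus the opponent's value".
            target = 1.0 - value[stones - greedy]
            value[stones] += alpha * (target - value[stones])
            # Both players share one table; occasional random moves keep exploring.
            move = rng.choice(legal) if rng.random() < epsilon else greedy
            stones -= move
    return value


if __name__ == "__main__":
    learned = train_by_self_play()
    print([round(v, 2) for v in learned])
```

No strategy is written into the program. The table starts at 0.5 everywhere, and more episodes (more computation) are what push the values toward the true win and loss positions.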
Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers initially focused on utilizing human understanding (to reduce the amount of search needed) but only later achieved much greater success by embracing search and learning.
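Search and learning are the two most important classes of techniques for putting massive computation to work in AI research. In computer Go, as in computer chess, researchers first concentrated on using human understanding (to reduce the amount of search needed), and only much later found far greater success by embracing search and learning.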
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs).
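In speech recognition, DARPA sponsored an early competition in the 1970s. Entries included a host of special methods built on human knowledge: knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs).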
Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field.
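Once again, the statistical methods won out over the human-knowledge-based ones. This set off a major shift across natural language processing, and over the following decades statistics and computation gradually came to dominate the field.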
The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems.
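The recent rise of deep learning in speech recognition continues the same trend. Deep learning methods rely even less on human knowledge and even more on computation, and combined with learning on huge training sets they yield dramatically better speech recognition systems.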
As in the games, researchers always tried to make systems that worked the way they thought their own minds worked—they tried to put that knowledge in their systems—but it proved ultimately counterproductive, and a colossal waste of researcher’s time. This was especially true as, through Moore's law, massive computation became available and a means was found to put it to good use.
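As with chess and Go, researchers kept trying to make AI systems work the way they believed their own minds worked. They tried to put that understanding into their systems, hoping the AI would be as clever as they were, but in the end this proved counterproductive and an enormous waste of researchers' time. The drawbacks became all the more apparent once Moore's law made massive computation available and ways were found to put it to good use.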
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
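Computer vision shows a similar pattern. Early methods treated vision as a search for edges or generalized cylinders, or worked in terms of SIFT features. Today all of that has been set aside. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform far better.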
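One of the few structural assumptions that convolution keeps is that a pattern means the same thing wherever it appears. The sketch below checks this property on a 1-D signal. The helper names `circular_conv` and `shift`, the sample signal, and the kernel are hypothetical, and the wrap-around boundary is an assumption made so the equality holds exactly.

```python
def circular_conv(signal, kernel):
    """Slide the kernel over the signal with wrap-around.

    This is cross-correlation, which is what deep-learning layers usually
    call convolution.
    """
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Rotate the signal to the right by the given number of steps."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


if __name__ == "__main__":
    signal = [0, 0, 1, 3, 1, 0, 0, 0]   # a small bump (made-up data)
    kernel = [-1, 0, 1]                  # a simple difference filter
    shifted_then_filtered = circular_conv(shift(signal, 2), kernel)
    filtered_then_shifted = shift(circular_conv(signal, kernel), 2)
    print(shifted_then_filtered == filtered_then_shifted)  # prints True
```

Shifting the input and then filtering gives the same result as filtering and then shifting. What the filter should detect is left for learning to decide; only this very general symmetry is built in.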
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run.
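This is a big lesson, taught again and again. Yet as a field we still have not fully absorbed it, because we keep making the same kind of mistakes. To see this and to avoid repeating them, we have to admit how tempting these mistakes are. We have to accept the painful truth that building our own idea of how we think into AI systems does not work in the long run.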
The bitter lesson is based on the historical observations that: 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
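History tells us again and again that: 1) AI researchers have often tried to build human knowledge into their agents; 2) this always helps in the short term and gives the researcher a sense of personal achievement; 3) in the long run it plateaus and even holds back further progress; and 4) the breakthrough eventually comes from an opposing approach based on scaling computation through search and learning. The eventual success tends to taste bitter and is often hard to swallow, because it is a success over the human-centric thinking we are so fond of.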
One thing that should be learned from the bitter lesson is the great power of general purpose methods—methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
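One thing to take from this bitter lesson is the great power of general-purpose methods: methods that keep scaling with more computation even when the available computation becomes very large. The two methods that appear to scale arbitrarily in this way are search and learning.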
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex. We should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries.
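The second thing to take from the bitter lesson is that the actual contents of minds are tremendously, irreducibly complex. We should stop looking for simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries.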
All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless. Instead, we should build in only the meta-methods that can find and capture this arbitrary complexity.
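All of these belong to an arbitrary, intrinsically complex outside world. We should not hard-code them into our systems according to our own understanding, because their complexity is endless; instead, we should build in only the meta-methods that can find and capture this arbitrary complexity.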
Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
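What is essential to these meta-methods is that they can find good approximations, but the search for those approximations should be carried out by our methods, not by us. What we want is not an AI that contains what we have already discovered, but one that can discover as we do. Building our discoveries in only makes it harder to see how the process of discovering can be done.
Original article: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
This article comes from the WeChat official account "Jina AI".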
