Observing how researchers try to guide the output of autoregressive language models into thought-like patterns is fascinating. This is the first of a series of posts in which I’ll briefly address the (for me) most visible papers that trace recent developments.

Curiously, most of these deal with topics at the very top of the abstraction stack: how to feed text into the machine, not the gritty details of model architecture and training.

Chain-of-Thought (CoT) Prompting Elicits Reasoning in Large Language Models

A rule of thumb is that anything that earns its own well-known acronym is at least influential, and judging by the number of references to the CoT paper by Wei et al. of Google Brain, this one most certainly is.

Probably the best and most condensed summary of CoT can be taken directly from the paper’s abstract 1:

generating a chain of thought - a series of intermediate reasoning steps - significantly improves the ability of large language models to perform complex reasoning

The rest of the publication addresses how a language model can be induced to produce such chains of thought and establishes that the claim of the abstract indeed holds.

Mainly, Wei et al. observe that including a few examples in the prompt that demonstrate how to decompose the type of problem under consideration (see the prompt sketch after the list) enables sufficiently large language models (LLMs) to:

  1. perform a similar decomposition for a previously unseen problem
  2. produce this decomposition as part of the output, which improves their success on benchmarks, in particular BIG-Bench
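
To make the mechanics concrete, here is a minimal sketch of what such a few-shot chain-of-thought prompt could look like. The worked exemplar and test question are adapted from the arithmetic examples in the paper; the `complete` function is a hypothetical placeholder for whatever LLM completion API you happen to use.

```python
# Minimal sketch of few-shot chain-of-thought prompting.
# The exemplar follows the style of the arithmetic examples in Wei et al.;
# `complete` is a hypothetical stand-in for an actual LLM completion call.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates the step-by-step format."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

def complete(prompt: str) -> str:
    """Hypothetical placeholder: plug in your model or API of choice here."""
    raise NotImplementedError

if __name__ == "__main__":
    prompt = build_cot_prompt(
        "The cafeteria had 23 apples. If they used 20 to make lunch "
        "and bought 6 more, how many apples do they have?"
    )
    print(prompt)
    # A sufficiently large model is expected to answer with intermediate steps,
    # e.g. "23 - 20 = 3. 3 + 6 = 9. The answer is 9."
```

That is the whole trick: the exemplar nudges the model to emit its intermediate reasoning before the final answer, with no change to the model itself.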

Verdict

The approach is brilliant in its simplicity: no fine-tuning is needed, and the performance of many models can be greatly improved at the cost of sacrificing a few tokens of context window.

The only small issue I have with the paper is the claim, made along the way, that chain-of-thought prompting is emergent with respect to model size. The plots shown don’t discourage that statement, but IMHO they are not enough to confidently make that claim.

  1. How unusual!