Course (https://llmagents-learning.org/f24) offered by Berkeley.

This content is inspired by Lecture 1 (LLM Reasoning). LLM reasoning forms the basis of reasoning in agents. I’ll talk about this lecture and the paper “Chain-of-Thought Reasoning without Prompting” (https://arxiv.org/abs/2402.10200).

The problem this paper tackles is: “Can LLMs reason well without prompt engineering?”. Existing methods rely on specific prompting techniques such as few-shot or zero-shot Chain-of-Thought (CoT) prompting, with standard greedy decoding as the default decoding strategy.

Greedy decoding:

LLMs struggle to reason under greedy decoding, which commits to the single most probable token at every step. The paper proposes CoT decoding, which explores the alternative paths that branch from the top-k tokens at the first decoding step. The model gives a higher confidence score to its answer when a CoT path is taken.
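
To make the baseline concrete, here is a minimal sketch of greedy decoding with Hugging Face transformers. The model name "gpt2" and the prompt are placeholders, not the paper’s setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model, not the paper's
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def greedy_decode(prompt: str, max_new_tokens: int = 30) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits
        # Greedy step: always commit to the single most likely next token.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(greedy_decode("Q: I have 3 apples and buy 2 more. How many do I have? A:"))
```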

CoT decoding means enabling the model to reason by itself, rather than having a human supply a task-specific prompt or reasoning demonstration.
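
A hedged sketch of the idea, reusing the placeholder model above: instead of committing to the argmax at the first decoding step, branch over the top-k candidate first tokens and continue each branch greedily. k = 10 matches the paper’s default; everything else here (prompt, lengths) is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def cot_decode_paths(prompt: str, k: int = 10, max_new_tokens: int = 30) -> list[str]:
    """Branch over the top-k first tokens, then continue each branch greedily."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        first_logits = model(input_ids).logits[:, -1, :]
    # Top-k candidate first tokens, instead of only the argmax.
    topk_ids = first_logits.topk(k, dim=-1).indices[0]

    paths = []
    for token_id in topk_ids:
        branch = torch.cat([input_ids, token_id.view(1, 1)], dim=-1)
        for _ in range(max_new_tokens - 1):
            with torch.no_grad():
                logits = model(branch).logits[:, -1, :]
            branch = torch.cat([branch, logits.argmax(-1, keepdim=True)], dim=-1)
        paths.append(tokenizer.decode(branch[0], skip_special_tokens=True))
    return paths  # one continuation per top-k first token
```

Many of these branches look like ordinary continuations, but some of them naturally contain step-by-step reasoning, which is exactly what CoT decoding surfaces without any prompt.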

Contributions:

  1. LLMs can reason through simple changes to the decoding procedure, without the use of prompting
  2. This process enables a better understanding of LLMs’ intrinsic reasoning capabilities without imposing human priors.
    1. Intrinsic reasoning capabilities refer to an LLM’s ability to derive answers from the knowledge embedded during pretraining
    2. Human priors are how humans think while solving complex problems (e.g., breaking a problem down into smaller tasks)
    3. Confounding factor: when prompting is involved, it is difficult to determine whether an answer reflects the LLM’s intrinsic capabilities or the human-provided prompt
      1. Bypassing this confounding factor can give us a “truthful assessment” of how the LLM reasons
    4. Pre-trained LLMs have inherent reasoning capabilities for tasks like commonsense reasoning and math problems, but these reasoning paths might not be the default paths the model takes
    5. Activation: When prompting highlights reasoning capabilities already present in the model (e.g., solving basic math problems or answering commonsense questions).
    6. Teaching: When prompting introduces entirely new reasoning structures or problem-solving formats for tasks the model cannot handle independently (e.g., highly synthetic or novel tasks).
  3. CoT decoding selects among the decoded paths based on the model’s confidence in the final answer (a sketch of the confidence metric follows this list)
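
Concretely, the paper scores a path by the confidence of its final answer: roughly, the average margin between the top-1 and top-2 token probabilities over the answer tokens, Δ = (1/|answer|) Σ_t (p(top-1 at t) − p(top-2 at t)). A minimal sketch, assuming the answer-span positions have already been identified (that identification step is elided here):

```python
import torch

def answer_confidence(logits: torch.Tensor, answer_positions: list[int]) -> float:
    """Average top-1 vs. top-2 probability margin over the answer tokens.

    logits: [seq_len, vocab_size] next-token logits for one decoded path.
    answer_positions: indices of the steps that produced the answer tokens
    (identifying the answer span is a separate step, elided here).
    """
    margins = []
    for t in answer_positions:
        probs = torch.softmax(logits[t], dim=-1)
        top2 = probs.topk(2).values
        margins.append((top2[0] - top2[1]).item())
    return sum(margins) / len(margins)
```

Paths whose answers come with a large margin are preferred; the paper’s observation is that paths containing a CoT tend to end in noticeably more confident answers than the greedy path.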
