Background
In-context Learning
![Borrowed from [1].](https://prod-files-secure.s3.us-west-2.amazonaws.com/819eccb0-cba5-4552-8c70-4ba31bc15772/1854dcc9-7142-4b54-adce-e532d638e531/image.png)
In-context learning (ICL) formats the prompt as a sequence of sentence-label demonstration pairs followed by a final sentence (the query), and asks the model to predict the query's label.
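For concreteness, here is a minimal sketch of how such a prompt is assembled for a toy sentiment task; the example texts and the `Label:` template are illustrative assumptions, not the exact format used later.

```python
# Toy sentiment demonstrations and query; the texts and the "Label:" template
# are illustrative assumptions, not taken from the paper.
demonstrations = [
    ("The movie was a delightful surprise.", "positive"),
    ("The plot dragged and the acting felt flat.", "negative"),
]
query = "A moving story with brilliant performances."

# Each demonstration is rendered as "<sentence>\nLabel: <label>"; the query
# repeats the template with the label left blank for the model to complete.
prompt = "\n\n".join(f"{x}\nLabel: {y}" for x, y in demonstrations)
prompt += f"\n\n{query}\nLabel:"

print(prompt)
```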
Motivation: In-context Learning Interpretability
Problems with previous work (1):
- They rely on heavily pre-embedded or linearly synthetic inputs, leaving a significant gap between these settings and real-world ICL.
- They study only small models.
Example:
![Borrowed from [2].](https://prod-files-secure.s3.us-west-2.amazonaws.com/819eccb0-cba5-4552-8c70-4ba31bc15772/6d9dbbef-8709-429a-9421-6ca8ce149203/image.png)
Problems with previous work (2):
- They struggle to explain the inference behavior of ICL observed in practice.
Therefore, our work focuses on real-world large language models and proposes an inference circuit that accounts for the inference phenomena observed above.
Hypothesis
We hypothesize that the model handles ICL input through the following three-step process. Fig. 1 is a diagram indicating how information is transmitted among tokens.
- Step 1: Summarize. LMs encode each input text $x_i$ into a linear representation in the hidden state of its corresponding forerunner token $s_i$.
- Step 2: Semantics Merge. For demonstrations, LMs merge the encoded representation of $s_i$ into the hidden state of its corresponding label token $y_i$.
- Step 3: Feature Retrieval and Copy. LMs retrieve the merged label representations $y_{1:k}$ from Step 2 that are similar to the query representation $s_q$ in a task-relevant subspace, and then merge (copy) them into the query representation (see the sketch after this list).
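To make the hypothesized information flow concrete, the sketch below replays the three steps on synthetic hidden states. The tensor shapes, the additive merge, and the projection matrix `W_task` are assumptions introduced purely for illustration; in the hypothesis itself these operations are carried out implicitly by attention heads inside the LM, not by explicit code.

```python
# Toy sketch of the three-step circuit on synthetic hidden states.
# All tensors, the additive merge, and W_task are illustrative placeholders.
import torch

d_model, k = 64, 4                      # hidden size, number of demonstrations

# Step 1: Summarize -- each input text x_i is encoded into the hidden state of
# its forerunner token s_i (random stand-ins here).
s = torch.randn(k, d_model)             # s_1..s_k, demonstration forerunner states
s_q = torch.randn(d_model)              # s_q, query forerunner state
label_states = torch.randn(k, d_model)  # hidden states at the label tokens y_1..y_k

# Step 2: Semantics Merge -- each label hidden state absorbs the encoded text
# representation of its demonstration (a simple additive merge as a placeholder).
y = label_states + s                    # merged label representations y_1..y_k

# Step 3: Feature Retrieval and Copy -- score each merged label representation by
# its similarity to s_q inside a task-relevant subspace, then copy a weighted
# mixture of them into the query representation.
W_task = torch.randn(d_model, d_model)  # placeholder projection onto the task subspace
proj_y = y @ W_task.T                   # merged label representations in the subspace
proj_q = W_task @ s_q                   # query representation in the same subspace
weights = torch.softmax(proj_y @ proj_q / d_model**0.5, dim=0)
retrieved = weights @ y                 # weighted copy of the merged label features
query_state = s_q + retrieved           # query hidden state after the copy step

print(weights, query_state.shape)
```

In this toy version the retrieval step is just scaled dot-product attention inside a projected subspace, which is the kind of mechanism an attention head could realize.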
