Chapter 10: Few-shot, CoT, and Dynamic Prompting Strategies

"The difference between a mediocre agent and a great one is often not the model — it's the information the model receives at the right moment."

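Key points

In agent scenarios, few-shot does not teach the model "how to answer a question"; it teaches it "how to use tools"
Chain-of-Thought (CoT) and Extended Thinking make the agent reason before it acts, which markedly reduces errors
Dynamic prompt injection is a core capability of an agent system: giving the model the right information at the right time
The Skill/template pattern standardizes common workflows into reusable prompt fragments
Prompt Caching exploits the split between static and dynamic content to cut the cost and latency of repeated requests
Prompt optimization hits a point of diminishing returns; past that point, invest in tools and architecture instead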

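10.1 Redefining Few-shot for Agents

In traditional NLP, few-shot means giving the model a handful of "input → output" examples so that it learns a generation pattern. In agent scenarios the meaning changes fundamentally: the examples no longer teach the model how to generate text, they teach it how to use tools correctly.

10.1.1 Implicit Few-shot: The Claude Code Approach

Claude Code does not put traditional few-shot examples in its system prompt. Instead, it encodes tool-usage patterns as rule descriptions. This is essentially compressed few-shot, with rules standing in for full examples:

# These rules do the work of dozens of few-shot examples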

- To read files use Read instead of cat, head, tail, or sed
- To edit files use Edit instead of sed or awk
- To create files use Write instead of cat with heredoc or echo redirection
- To search for files use Glob instead of find or ls
- To search the content of files, use Grep instead of grep or rg

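Each rule tells the model: "When you are in situation X, use tool Y rather than tool Z." Compared with full dialogue examples, this costs far fewer tokens and packs more guidance into each token.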
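Because the rules above all share one shape ("to do X, use Y instead of Z"), they can be generated from a small table rather than written by hand. The following is a minimal sketch under that assumption; `ToolRule`, `TOOL_RULES`, `joinOr`, and `renderRule` are hypothetical names for illustration and are not taken from Claude Code.

```typescript
// Hypothetical shape for one tool-preference rule (an assumption for this sketch).
interface ToolRule {
  task: string     // what the model is trying to do
  prefer: string   // the dedicated tool it should use
  avoid: string[]  // shell commands it should steer away from
}

const TOOL_RULES: ToolRule[] = [
  { task: 'read files', prefer: 'Read', avoid: ['cat', 'head', 'tail', 'sed'] },
  { task: 'edit files', prefer: 'Edit', avoid: ['sed', 'awk'] },
  { task: 'search for files', prefer: 'Glob', avoid: ['find', 'ls'] },
]

// Join alternatives as "a", "a or b", or "a, b, or c".
function joinOr(items: string[]): string {
  if (items.length <= 1) return items.join('')
  if (items.length === 2) return `${items[0]} or ${items[1]}`
  return `${items.slice(0, -1).join(', ')}, or ${items[items.length - 1]}`
}

// Render one rule as a single prompt line.
function renderRule(rule: ToolRule): string {
  return `- To ${rule.task} use ${rule.prefer} instead of ${joinOr(rule.avoid)}`
}

// Render the whole table as a block that can be placed in the system prompt.
function renderRules(rules: ToolRule[]): string {
  return rules.map(renderRule).join('\n')
}
```

Keeping the rules as data makes it easy to count their token cost and to drop rules for tools that are not loaded in the current session.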

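10.1.2 Explicit Few-shot: When Full Examples Are Needed

When rules are not clear enough and the model keeps making the same kind of mistake, explicit few-shot examples are needed. Typical scenarios:

Scenario 1: Complex tool-combination patterns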

markdown

## Example: Renaming a function across the codebase

User: "Rename getUserInfo to fetchUserProfile"

Step 1 — Find all references:
<tool>Grep pattern="getUserInfo" output_mode="files_with_matches"</tool>
Result: src/api.ts, src/hooks/useUser.ts, src/tests/api.test.ts

Step 2 — Edit each file:
<tool>Read file_path="src/api.ts"</tool>
<tool>Edit file_path="src/api.ts" old_string="getUserInfo" new_string="fetchUserProfile" replace_all=true</tool>
(repeat for each file)

Step 3 — Verify:
<tool>Bash command="npm test"</tool>

WRONG approach (common mistake):
- Using Write to rewrite entire files (loses unread content)
- Only changing the definition, forgetting call sites
- Not running tests after rename

Scenario 2: Expected behavior for error recovery

markdown

## Example: Handling edit failure

<tool>Edit file_path="src/main.ts" old_string="function old(" new_string="function new("</tool>
Error: old_string not found in file

Correct recovery:
1. Read the file to see actual content
2. Find the correct string to match
3. Retry with accurate old_string

WRONG recovery:
- Retrying the exact same edit (will fail again)
- Using Write to overwrite the entire file

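10.1.3 The Token Economics of Few-shot

Explicit few-shot examples are expensive: each one may consume 200-500 tokens. Five examples add a fixed overhead of 1000-2500 tokens, and that overhead is resent on every LLM call.

Strategy                          Token cost    Effect
──────────────────────────────────────────────────────────────────────────
No few-shot                       0             Model may make common mistakes
Rule-based few-shot (implicit)    100-300       Covers roughly 80% of scenarios
Full dialogue examples            500-2500      Covers roughly 95% of scenarios

Practical advice: start with rules, monitor the model's error rate, and add full examples only for scenarios where it keeps failing.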
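The advice above (add full examples only where the model keeps failing) can be turned into a small selection routine. This is a sketch under assumed inputs: `ScenarioStats`, `FewShotExample`, `selectExamples`, the error threshold, and the token budget are all hypothetical, and the token counts are rough estimates supplied by the caller.

```typescript
// Hypothetical record of how often the agent fails in one scenario.
interface ScenarioStats {
  scenario: string
  attempts: number
  failures: number
}

// Hypothetical full dialogue example with an estimated token cost.
interface FewShotExample {
  scenario: string
  text: string
  tokens: number
}

// Pick examples only for scenarios whose failure rate exceeds the threshold,
// worst scenario first, without exceeding the token budget.
function selectExamples(
  stats: ScenarioStats[],
  examples: FewShotExample[],
  errorThreshold: number,
  tokenBudget: number,
): FewShotExample[] {
  const failing = stats
    .filter(s => s.attempts > 0 && s.failures / s.attempts > errorThreshold)
    .sort((a, b) => b.failures / b.attempts - a.failures / a.attempts)

  const chosen: FewShotExample[] = []
  let used = 0
  for (const s of failing) {
    const example = examples.find(e => e.scenario === s.scenario)
    // Skip scenarios with no example or whose example no longer fits.
    if (!example || used + example.tokens > tokenBudget) continue
    chosen.push(example)
    used += example.tokens
  }
  return chosen
}
```

The budget makes the fixed per-call overhead explicit: every example selected here is paid for again on each LLM call in the session.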

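10.2 Chain-of-Thought: Think First, Then Act

10.2.1 Why Agents Especially Need CoT

In traditional chat, CoT improves the accuracy of the answer. In agent scenarios its value goes well beyond that: it determines whether the agent picks the right tool.

Agent behavior without CoT:

User: "Help me fix the CSS problem on the login page"
Model: (acts immediately) → Edit login.css, changes a pile of styles
Result: Wrong fix, because it never read the code to find out what the problem was

Agent behavior with CoT:

User: "Help me fix the CSS problem on the login page"
Model: (thinks first)
  - The user says it is a CSS problem but does not say what the problem is
  - I should look at the current CSS and page structure first
  - I may need to check the result in a browser
Model: (acts) → Read login.css and login.vue first
  → Analysis shows a flexbox alignment problem
  → Makes a precise one-line CSS change
Result: Fixed on the first attempt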

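10.2.2 Extended Thinking

Anthropic's Claude supports Extended Thinking: before its final reply, the model first generates a block of internal reasoning. This reasoning is not part of the reply shown to the user, but it still consumes tokens: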

typescript

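import Anthropic from '@anthropic-ai/sdk'

// Reads the API key from the ANTHROPIC_API_KEY environment variable
const client = new Anthropic()

// `task` is the user's request string, defined elsewhere
const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 16000,  // must be larger than budget_tokens
  thinking: {
    type: 'enabled',
    budget_tokens: 10000  // token budget allocated to thinking
  },
  messages: [{ role: 'user', content: task }]
})

// response.content may contain:
// [0] { type: 'thinking', thinking: '...' }  ← internal reasoning (not shown to the user)
// [1] { type: 'text', text: '...' }          ← the final reply
// [2] { type: 'tool_use', ... }              ← a tool call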

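Extended Thinking is particularly well suited to the following agent decision scenarios:

Scenario	Without Thinking	With Thinking
Multi-file refactoring	Edits right away and may miss dependencies	Plans the edit order and dependencies first
Debugging a complex bug	Searches blindly, going back and forth	Analyzes the error log first and forms a hypothesis
Ambiguous requirements	Guesses the user's intent and may drift off course	Lists the possible readings first and picks the most plausible
Architecture decisions	Offers the first solution that comes to mind	Compares the trade-offs of several solutions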
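A harness usually has to route the three block types shown earlier in this section to different places: reasoning to a log, text to the user, tool calls to the executor. The sketch below assumes simplified local types that mirror those shapes; `ContentBlock`, `SplitResponse`, and `splitContent` are hypothetical names, and a real SDK's block types carry more fields.

```typescript
// Simplified stand-ins for the block shapes listed in the comment above.
type ContentBlock =
  | { type: 'thinking'; thinking: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }

interface SplitResponse {
  reasoning: string[]   // internal reasoning, kept for logs only
  visibleText: string   // what the user should see
  toolCalls: { id: string; name: string; input: unknown }[]
}

// Separate one response's content blocks by destination.
function splitContent(blocks: ContentBlock[]): SplitResponse {
  const out: SplitResponse = { reasoning: [], visibleText: '', toolCalls: [] }
  for (const block of blocks) {
    switch (block.type) {
      case 'thinking':
        out.reasoning.push(block.thinking)
        break
      case 'text':
        out.visibleText += block.text
        break
      case 'tool_use':
        out.toolCalls.push({ id: block.id, name: block.name, input: block.input })
        break
    }
  }
  return out
}
```

Keeping the reasoning out of `visibleText` is what makes it "internal": whether it is shown at all is a decision the application makes, not the model.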

10.2.3 Structured Reasoning Prompts

Even without the Extended Thinking API, the system prompt can steer the model to reason first:

Before taking any action, briefly analyze:
1. What is the user actually asking for?
2. What information do I still need?
3. What's the simplest approach that could work?
4. What could go wrong?
Then proceed with the minimum necessary tool calls.

Claude Code reflects this idea in several places:

In general, do not propose changes to code you haven't read.
If a user asks about or wants you to modify a file, read it first.
Understand existing code before suggesting modifications.

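These are not vague "please think" instructions. They are concrete behavioral constraints that force the model to gather information (read the code) before it acts.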
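A prompt-level constraint like "read it first" can also be backed up in the harness. The following is a sketch of one possible guard, not a description of how Claude Code enforces the rule; `ToolCall`, `ReadBeforeEditGuard`, and the `file_path` input field are assumptions modeled on the tool examples in section 10.1.2.

```typescript
// Hypothetical minimal view of a tool call as the harness sees it.
interface ToolCall {
  name: string
  input: { file_path?: string }
}

type GuardResult = { ok: true } | { ok: false; message: string }

// Tracks which files were read in this session and rejects edits to unread files.
class ReadBeforeEditGuard {
  private readonly readPaths = new Set<string>()

  check(call: ToolCall): GuardResult {
    const path = call.input.file_path
    if (path === undefined) return { ok: true }  // not a file-scoped call

    if (call.name === 'Read') {
      this.readPaths.add(path)
      return { ok: true }
    }
    if (call.name === 'Edit' && !this.readPaths.has(path)) {
      // Returned to the model as a tool error so it can correct course.
      return { ok: false, message: `Read ${path} before editing it.` }
    }
    return { ok: true }
  }
}
```

The rejection message doubles as a just-in-time instruction: the model learns the rule at the moment it breaks it, without the rule occupying space in every prompt.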

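10.3 Dynamic Prompt Injection: The Agent's Core Differentiator

A static system prompt cannot fit every situation. Dynamic prompt injection, meaning information injected on demand at runtime based on the current context, is the core capability that separates an agent system from a simple chatbot.

10.3.1 Session Initialization Injection

At the start of every session, Claude Code automatically collects information about the current environment and injects it: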

typescript

// Context injected at the start of a Claude Code session
const sessionContext = `
# Environment
- Primary working directory: ${cwd}
- Is a git repository: ${isGitRepo}
- Platform: ${platform}
- Shell: ${shell}
- OS Version: ${osVersion}
- Model: ${modelName}

gitStatus: ${gitStatusSnapshot}
Current branch: ${currentBranch}
Main branch: ${mainBranch}
Recent commits:
${recentCommits}
`

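The value of this injection is that the model knows what environment it is working in without the user having to explain. When the user says "commit my code", the model already knows what has changed, which branch it is on, and what style recent commits follow. All of this comes from the initialization injection.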

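10.3.2 Mid-session Injection: System Reminders

While a conversation is in progress, the harness can insert <system-reminder> tags into the message stream to pass new information to the model: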

xml

<system-reminder>
The user opened file src/auth.ts in the IDE.
This may or may not be related to the current task.
</system-reminder>

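Design principles of a System Reminder:

Non-intrusive: it appears in the conversation stream but is not a user message, so the model can judge for itself whether it is relevant
Timely: it conveys the state at the current moment, not a permanent rule
Ignorable: it tells the model explicitly that "this may or may not be related", which keeps the model from overreacting

Claude Code uses system-reminder to deliver several kinds of runtime information:

Files the user has opened in the IDE
The list of available deferred tools
The current date and time
Memory content (CLAUDE.md and auto-memory)
Template content loaded by a Skill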
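Building a reminder is mostly string assembly. The sketch below produces the same shape as the XML example above; `wrapReminder` and `appendReminder` are hypothetical helpers, and attaching the reminder to existing content (such as a tool result) is an assumption about how a harness might carry it, not a documented mechanism.

```typescript
// Wrap a runtime note in <system-reminder> tags.
// The optional hedge line implements the "ignorable" principle.
function wrapReminder(note: string, hedge: boolean = true): string {
  const lines = [note.trim()]
  if (hedge) lines.push('This may or may not be related to the current task.')
  return `<system-reminder>\n${lines.join('\n')}\n</system-reminder>`
}

// Attach a reminder after some existing content, for example a tool result,
// so that it travels in the message stream without posing as user input.
function appendReminder(content: string, note: string): string {
  return `${content}\n\n${wrapReminder(note)}`
}
```

For facts that are not optional context, such as the current date, the hedge can be switched off with `wrapReminder(note, false)`.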

10.3.3 Conditional Instruction Injection

Enable or disable specific rules dynamically based on the current context:

typescript

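// Facts about the current project that decide which rules get injected
interface SessionContext {
  isGitRepo: boolean
  hasPackageJson: boolean
  isMonorepo: boolean
}

// GIT_SAFETY_PROTOCOL, COMMIT_GUIDELINES, and PR_CREATION_RULES are prompt
// fragments (strings) defined elsewhere
function buildConditionalInstructions(context: SessionContext): string[] {
  const instructions: string[] = []

  // Enable Git rules only inside a Git repository
  if (context.isGitRepo) {
    instructions.push(GIT_SAFETY_PROTOCOL)  // ~800 tokens
    instructions.push(COMMIT_GUIDELINES)     // ~500 tokens
    instructions.push(PR_CREATION_RULES)     // ~600 tokens
  }

  // Enable Node.js rules only when a package.json is present
  if (context.hasPackageJson) {
    instructions.push('Prefer npm/yarn/pnpm over global installs')
  }

  // Enable monorepo rules only in a monorepo
  if (context.isMonorepo) {
    instructions.push('Always specify the workspace when running commands')
  }

  return instructions
}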

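The value of conditional injection is token savings. If the current project is not a Git repository, the roughly 1900 tokens of Git rules (800 + 500 + 600) need not be injected at all. For a system with 40+ tools, loading tool definitions on demand can also save several thousand tokens.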

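10.4 The Skill Template Pattern

10.4.1 What Is a Skill

Claude Code's Skill system is a textbook application of dynamic prompt injection. When the user types a slash command (such as /commit), the system loads a predefined prompt template and injects it into the current conversation.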
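The load-and-inject step can be sketched as a lookup from command name to template text. This is an illustration only: `SkillRegistry`, `parseSlashCommand`, and `expandSkill` are hypothetical, the registry is an in-memory map rather than files on disk, and the way arguments are appended is an assumption.

```typescript
// Hypothetical registry: skill name → prompt template text.
type SkillRegistry = Map<string, string>

// Parse "/name optional args" into its parts; return null for ordinary input.
function parseSlashCommand(input: string): { name: string; args: string } | null {
  const match = /^\/([a-z0-9][a-z0-9-]*)(?:\s+(.*))?$/i.exec(input.trim())
  if (!match) return null
  return { name: match[1] ?? '', args: match[2] ?? '' }
}

// Return the text to inject into the conversation, or null when the input
// is not a slash command or names an unknown skill.
function expandSkill(input: string, skills: SkillRegistry): string | null {
  const command = parseSlashCommand(input)
  if (!command) return null
  const template = skills.get(command.name)
  if (template === undefined) return null
  const argsLine = command.args ? `\n\nArguments: ${command.args}` : ''
  return `${template}${argsLine}`
}
```

Returning `null` for unknown commands lets the caller fall back to treating the input as a normal user message instead of injecting an empty template.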

10.4.2 Skill Design Principles

A good Skill template looks something like this:

markdown

# commit skill

## Trigger
When the user says "commit the code", "commit", or "/commit"

## Workflow
1. Run git status to see all untracked files
2. Run git diff to see both staged and unstaged changes
3. Run git log to see recent commit messages (match style)
4. Analyze all changes and draft a commit message:
   - Summarize the nature of changes (new feature, bug fix, etc.)
   - Focus on "why" rather than "what"
5. Stage relevant files (NEVER use git add -A)
6. Create the commit with the drafted message
7. Run git status after commit to verify success

## Safety constraints
- Do not commit files that likely contain secrets (.env, credentials.json)
- If pre-commit hook fails, fix the issue and create a NEW commit (never --amend)
- Do not push unless explicitly asked

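Core advantages of the Skill pattern:

Advantage	Description
Composable	Different skills can be combined (/commit + /review-pr)
User-extensible	Users write their own skills under the `.claude/skills/` directory
Version-controlled	Each skill is a standalone Markdown file that can be managed in Git
Shareable across a team	Project-level skills can be committed to the repository so the team shares standard workflows
Incremental	A new skill does not affect existing functionality; it only adds a capability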

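10.5 Self-consistency Checks: Agent Self-verification

An agent should not stop at "done"; it should also verify that the job was done correctly.

In Git operations, Claude Code makes this check mandatory:

# After creating a commit, verification is mandatory
After committing, run git status to verify success.

# If the pre-commit hook fails
If the commit fails due to pre-commit hook:
  fix the issue and create a NEW commit
  (not --amend, because the commit never succeeded)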

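The cost of a self-consistency check is one or two extra tool calls, but the reliability gain far outweighs it. In production, an unverified change can lead to hours of debugging, whereas one extra npm test run takes only a few seconds.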
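The act-then-verify pattern can be expressed as a small wrapper. The sketch below is synchronous and uses stub callbacks to stay self-contained; in a real harness both steps would be asynchronous tool calls. `StepResult` and `runVerified` are hypothetical names.

```typescript
// Outcome of one step, with a short human-readable detail.
interface StepResult {
  ok: boolean
  detail: string
}

// Run an action, then a separate verification step.
// The overall result is ok only when both succeed.
function runVerified(
  action: () => StepResult,
  verify: () => StepResult,
): StepResult {
  const acted = action()
  if (!acted.ok) {
    // No point verifying something that did not happen.
    return { ok: false, detail: `action failed: ${acted.detail}` }
  }
  const checked = verify()
  return checked.ok
    ? { ok: true, detail: acted.detail }
    : { ok: false, detail: `verification failed: ${checked.detail}` }
}
```

For a commit, `action` would create the commit and `verify` would inspect the output of git status; distinguishing the two failure modes tells the agent whether to retry the action or investigate its effect.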

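10.6 Prompt Caching: An Underrated Performance Tool

10.6.1 Why Agent Scenarios Suit Prompt Caching Especially Well

A defining trait of agents is that many LLM calls share a large identical prefix: the system prompt, tool definitions, and role instructions barely change over a session. This is exactly where Prompt Caching shines:

Call 1: [System Prompt 3000t] + [user message 100t]
Call 2: [System Prompt 3000t] + [history 500t] + [tool result 200t]
Call 3: [System Prompt 3000t] + [history 1200t] + [tool result 300t]
...
Call N: [System Prompt 3000t] + [history Nt] + [tool result Mt]
        ^^^^^^^^^^^^^^^^^^^^^
        These 3000t are identical every time → cache them!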

10.6.2 Implementation

Anthropic's Prompt Caching is enabled by marking a system prompt block with cache_control:

typescript

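// `client` is an Anthropic SDK client instance
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 4096,  // required by the Messages API; the value here is illustrative
  system: [
    {
      type: 'text',
      text: STATIC_SYSTEM_PROMPT,           // ~3000 tokens, almost never changes
      cache_control: { type: 'ephemeral' }  // mark as cacheable
    },
    {
      type: 'text',
      text: buildDynamicContext(session)     // dynamic part, different on every call
    }
  ],
  messages: conversationHistory
})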

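10.6.3 Quantifying the Effect

                          No cache        With cache
──────────────────────────────────────────────────────────────────────
First-call TTFT           800ms           800ms (cache write)
Later-call TTFT           800ms           200ms (cache hit)
20-call prefix cost       60K tokens      3K + 19×0.3K ≈ 8.7K tokens
Cost reduction                            ~85%

The cost row counts only the 3000-token system prompt, expressed in full-price token equivalents: 20 × 3K without caching, versus one full-price write plus 19 cache reads billed at roughly one tenth of the normal input price.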
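The cost figures can be reproduced with a few lines of arithmetic. In this sketch the multipliers are assumptions read off the table (a cache read billed at 0.1× a normal input token, the first write at 1×); actual pricing depends on the provider and model, so both are parameters. `CacheCostInput` and `prefixCost` are hypothetical names.

```typescript
interface CacheCostInput {
  prefixTokens: number     // size of the static prefix
  calls: number            // number of LLM calls in the session
  writeMultiplier: number  // cost of the first (cache-writing) call, relative to normal input
  readMultiplier: number   // cost of a cache hit, relative to normal input
}

// Cost of the shared prefix over a session, in full-price token equivalents.
function prefixCost(input: CacheCostInput): { uncached: number; cached: number; savings: number } {
  const uncached = input.prefixTokens * input.calls
  const cached =
    input.prefixTokens * input.writeMultiplier +
    input.prefixTokens * input.readMultiplier * (input.calls - 1)
  return { uncached, cached, savings: 1 - cached / uncached }
}

// The example from the table: 3000-token prefix, 20 calls.
const tableExample = prefixCost({
  prefixTokens: 3000,
  calls: 20,
  writeMultiplier: 1,
  readMultiplier: 0.1,
})
// uncached = 60000, cached ≈ 8700, savings ≈ 0.855
```

The savings grow with the number of calls and shrink if the prefix changes mid-session, since every change forces a new cache write.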

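10.6.4 Designing Prompts with Caching in Mind

To maximize the cache hit rate, put unchanging content first and changing content last:

┌──────────────────────────────────────┐
│ Static part (cacheable)              │ ← cache_control: ephemeral
│  - Identity definition               │
│  - Role instructions                 │
│  - Tool usage rules                  │
│  - Safety protocol                   │
│  - Output format                     │
├──────────────────────────────────────┤
│ Dynamic part (different every call)  │ ← no cache marker
│  - Current git status                │
│  - CLAUDE.md content                 │
│  - Recent file changes               │
│  - Session memory                    │
└──────────────────────────────────────┘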

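10.7 Diminishing Returns in Prompt Engineering

A hard truth: once prompt optimization reaches a certain point, the return on further effort drops sharply.

When you find yourself agonizing over whether to write "please" or "must", prompt optimization has run its course. At that point, redirect your effort toward:

Better tool design (Chapter 5): give the model better "hands"
Better context management (Chapter 4): let the model see more relevant information
Better error recovery (Chapter 7): let the system recover from errors automatically
Better evaluation (Chapter 18): quantify the effect of improvements

Case study

In one iteration, the Claude Code development team found that two weeks spent polishing the wording of the system prompt raised the task success rate by 2%. Three days spent improving the Edit tool's error message (telling the model "old_string not found, here is the actual content of the file") raised it by 8%.

Lesson: the prompt is the entry point, but it is not the only lever.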
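The kind of tool-side improvement described in the case study can be sketched as an edit function whose failure message carries the file's actual content. This is an in-memory illustration, not the real Edit tool: `EditResult`, `applyEdit`, and the `previewChars` cutoff are assumptions.

```typescript
type EditResult =
  | { ok: true; content: string }
  | { ok: false; error: string }

// Replace the first occurrence of oldString. On a miss, return an error that
// shows the model what the file really contains so it can retry accurately.
function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  previewChars: number = 200,
): EditResult {
  if (oldString === '') {
    return { ok: false, error: 'old_string must not be empty' }
  }
  if (!content.includes(oldString)) {
    const preview = content.slice(0, previewChars)
    return {
      ok: false,
      error: `old_string not found, here is the actual content of the file:\n${preview}`,
    }
  }
  // A replacer function keeps "$" sequences in newString literal.
  return { ok: true, content: content.replace(oldString, () => newString) }
}
```

With the content in the error, the recovery steps from section 10.1.2 (read the file, find the correct string, retry) collapse into a single retry, which is one plausible reason a change like this pays off more than prompt wording.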

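10.8 Chapter Summary

The core idea of dynamic prompting strategies is to give the model the right information at the right time:

Few-shot redefined: in an agent, few-shot teaches tool-usage patterns. Prefer rule-based (implicit) few-shot and add full examples only when necessary
CoT is the foundation of decision quality: Extended Thinking lets the model reason before complex decisions, which markedly lowers the rate of wrong tool choices
Four layers of dynamic injection: initialization injection, mid-session reminders, conditional instructions, and Skill templates, each with its own role
The Skill pattern: standardizes common workflows into reusable, composable, version-controlled prompt templates
Self-consistency: the agent does not just act, it verifies that it acted correctly. The extra verification cost is far lower than the cost of fixing errors
Prompt Caching: exploits the repeated prefix across an agent's many calls, cutting the cost of that prefix by roughly 85% in the 20-call example of section 10.6.3
Knowing when to stop: prompt optimization has a point of diminishing returns; past it, invest in tools and architecture

The next chapter turns to the design of memory systems: how to manage an agent's "short-term memory" within a limited context window.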
