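Zhipu AI GLM User Guide¶

Zhipu AI's GLM model family is optimized for Chinese-language scenarios. It offers strong Chinese comprehension, a Deep Thinking mode, and competitive pricing (prices were cut by 90% in 2025).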

Model Overview¶

Model Selection Recommendations¶

from tfrobot.brain.chain.llms import GLM

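# General-purpose use (recommended)
llm = GLM(name="glm-4-flash")

# Complex tasks
llm = GLM(name="glm-4-plus")

# Multimodal tasks
llm = GLM(name="glm-4v-plus")

# Deep Thinking mode (requires glm-4.5 or later; see the Deep Thinking section)
llm = GLM(name="glm-4.5", thinking={"type": "enabled"})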

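Configuration Parameters¶

Core Parameters¶

from tfrobot.brain.chain.llms import GLM

llm = GLM(
    name="glm-4-flash",
    temperature=0.7,        # 0.0-1.0, controls randomness
    top_p=0.7,              # 0.0-1.0, nucleus sampling
    max_tokens=4096,        # maximum number of output tokens
    stream=False,           # whether to stream the output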
)
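
The ranges in the comments above can be checked before a client is constructed, so that a bad value fails early rather than at request time. The helper below is a hypothetical sketch (validate_sampling_params is not a TFRobot function) and only encodes the ranges stated in this guide:

```python
def validate_sampling_params(temperature: float, top_p: float, max_tokens: int) -> None:
    """Raise ValueError if a value falls outside the ranges listed in this guide."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError(f"top_p must be in [0.0, 1.0], got {top_p}")
    if max_tokens <= 0:
        raise ValueError(f"max_tokens must be positive, got {max_tokens}")
```

Calling validate_sampling_params(0.7, 0.7, 4096) returns silently, while a temperature of 1.5 raises ValueError.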

API Configuration¶

llm = GLM(
    name="glm-4-flash",
    zhipuai_api_key="...",   # API key (prefer an environment variable)
    timeout=60.0,            # request timeout
    max_retries=3,           # maximum number of retries
)
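
The comment above recommends keeping the key out of source code. A minimal sketch of that, assuming the variable is named ZHIPUAI_API_KEY (the name that appears in the error-handling example later in this guide). The helper load_zhipu_api_key is hypothetical and not part of TFRobot:

```python
import os


def load_zhipu_api_key(env_var: str = "ZHIPUAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating the client")
    return key
```

The returned value could then be passed as zhipuai_api_key when constructing GLM.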

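Deep Thinking Mode¶

GLM-4.5 and later models support Deep Thinking mode, which exposes the model's reasoning process alongside the final answer.

Configuration Parameters¶

from tfrobot.brain.chain.llms import GLM

llm = GLM(
    name="glm-4.5",
    thinking={"type": "enabled"},  # enable Deep Thinking mode
)

# The reasoning_content field of each generation holds the reasoning process
result = llm.complete(current_input=user_input)
print(result.generations[0].reasoning_content)  # reasoning process
print(result.generations[0].text)               # final answer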

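Parameter Reference¶

Parameter	Type	Default	Description
`thinking`	dict	-	Controls whether Deep Thinking mode is enabled

Format of the thinking parameter:

# Enable Deep Thinking
thinking={"type": "enabled"}

# Disable Deep Thinking
thinking={"type": "disabled"}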

Model Support¶

The LLMeta mechanism detects automatically whether a model supports Deep Thinking mode:

from whosellm import LLMeta

llm_meta = LLMeta("glm-4.5")
if llm_meta.capabilities.supports_thinking:
    print("This model supports Deep Thinking mode")

Supported models:
- glm-4.5 and later
- glm-4.1v-thinking-flashx
- glm-4.1v-thinking-flash
- glm-z1 series (some newer versions)

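API Mapping¶

TFRobot parameters map directly onto the Zhipu AI API:

# TFRobot configuration
llm = GLM(
    name="glm-4.5",
    thinking={"type": "enabled"},
)

# Resulting Zhipu AI API request
{
    "model": "glm-4.5",
    "thinking": {
        "type": "enabled"
    }
}

Note: the thinking parameter is added to the API request only when LLMeta reports that the model supports Deep Thinking mode.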
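
The gating rule in the note above can be pictured with plain dictionaries. This is only a sketch: build_request and THINKING_PREFIXES are hypothetical stand-ins for TFRobot's request builder and the LLMeta lookup, and the prefix list is deliberately incomplete (it omits the glm-z1 series and any releases later than glm-4.5).

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the LLMeta capability check (assumption, not the real lookup).
THINKING_PREFIXES = ("glm-4.5", "glm-4.1v-thinking")


def supports_thinking(model: str) -> bool:
    """Return True if the model name starts with one of the known thinking-capable prefixes."""
    return model.startswith(THINKING_PREFIXES)


def build_request(model: str, thinking: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build a request body, attaching `thinking` only for models that support it."""
    payload: Dict[str, Any] = {"model": model}
    if thinking is not None and supports_thinking(model):
        payload["thinking"] = thinking
    return payload
```

With this sketch, a request for glm-4-flash carries only the model field even when thinking is passed, which mirrors the silent-drop behaviour described in the note.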

Response Format¶

# Plain text
llm = GLM(name="glm-4-flash", response_format={"type": "text"})

# JSON (requires model support)
llm = GLM(
    name="glm-4-flash",
    response_format={"type": "json_object"}
)
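
Even with the JSON response format requested, it is prudent to decode the reply defensively. The sketch below uses only the standard library; parse_json_reply is a hypothetical helper, and it assumes the reply text is available as a plain string (for example from result.generations[0].text):

```python
import json
from typing import Any, Dict, Optional


def parse_json_reply(text: str) -> Optional[Dict[str, Any]]:
    """Return the decoded JSON object, or None if the text is not a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

Returning None instead of raising lets the caller decide whether to retry the request or fall back to plain text.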

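Tool Calling¶

Native Mode (Recommended)¶

from tfrobot.brain.chain.llms import GLMWithTools
from tfrobot.brain.chain.prompt.tool_prompt import ToolPrompt
from tfrobot.drive.tool.tool import tool
# Assumed import path: TextMessage is taken to live next to MultiPartMessage
from tfrobot.schema.message.conversation.message_dto import TextMessage

@tool
def get_weather(city: str) -> str:
    """Get the weather for the given city."""
    return f"{city}: sunny today, 25°C"

llm = GLMWithTools(name="glm-4-flash")
# Optional: GLMWithTools already comes with ToolPrompt preconfigured
llm.system_msg_prompt = [ToolPrompt()]

result = llm.complete(
    current_input=TextMessage(content="What is the weather like in Beijing today?"),
    tools=[get_weather]
)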

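GLM vs GLMWithTools¶

GLM: the base model; ToolPrompt must be configured manually
GLMWithTools: comes with ToolPrompt preconfigured, which is more convenient

# Option 1: use GLM
llm = GLM(name="glm-4-flash")
llm.system_msg_prompt.append(ToolPrompt())

# Option 2: use GLMWithTools (recommended)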
llm = GLMWithTools(name="glm-4-flash")

Multimodal Support¶

Image Understanding¶

from tfrobot.schema.message.conversation.message_dto import MultiPartMessage
# ImgUrl is assumed to live in msg_part alongside VideoUrl and FileUrl
from tfrobot.schema.message.msg_part import TextPart, ImagePart, ImgUrl

msg = MultiPartMessage(content=[
    TextPart(text="What is in this image?"),
    ImagePart(
        image_url=ImgUrl(url="path/to/image.jpg"),
        detail="high"
    ),
])

result = llm.complete(current_input=msg)

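Audio Processing¶

from pathlib import Path

from tfrobot.schema.message.conversation.message_dto import AudioMessage

# BaseUser must also be imported from TFRobot; its module is not shown in this guide
msg = AudioMessage(
    content=Path("path/to/audio.mp3"),
    creator=BaseUser(name="User", uid="1")
)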

result = llm.complete(current_input=msg)

Video Processing¶

from tfrobot.schema.message.msg_part import VideoPart, VideoUrl

msg = MultiPartMessage(content=[
    TextPart(text="Describe the content of this video"),
    VideoPart(
        video_url=VideoUrl(
            url="path/to/video.mp4",
            mime_type="video/mp4"
        )
    ),
])

result = llm.complete(current_input=msg)

PDF Document Analysis¶

from tfrobot.schema.message.msg_part import FilePart, FileUrl

msg = MultiPartMessage(content=[
    TextPart(text="Summarize the main content of this PDF document"),
    FilePart(
        file_url=FileUrl(
            url="path/to/document.pdf",
            mime_type="application/pdf"
        )
    ),
])

result = llm.complete(current_input=msg)

Token Counting¶

Counting Tokens for Chinese Text¶

GLM tokenizes text with BPE (Byte-Pair Encoding):

# Token estimate for Chinese text
# Rule of thumb: 1 Chinese character ≈ 1.6 tokens
# An exact count requires the tokenizer

from tfrobot.brain.chain.llms import GLM

llm = GLM(name="glm-4-flash")

# Estimate the token count
text = "这是一段中文文本"
estimated_tokens = len(text) * 1.6  # rough estimate
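
The estimate above treats every character as Chinese. For mixed Chinese and Latin text, a slightly finer sketch counts the two kinds of characters separately. The 1.6 ratio is the one quoted above; the figure of roughly four non-Chinese characters per token is an assumption (a common rule of thumb, not a GLM-specific number), and estimate_tokens is a hypothetical helper rather than a TFRobot function:

```python
import math

TOKENS_PER_CJK_CHAR = 1.6   # ratio quoted in this guide; a rough guide, not a tokenizer
OTHER_CHARS_PER_TOKEN = 4   # assumption: common rule of thumb for Latin-script text


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text: str) -> int:
    """Rough token estimate for mixed Chinese and non-Chinese text, rounded up."""
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    return math.ceil(cjk * TOKENS_PER_CJK_CHAR + other / OTHER_CHARS_PER_TOKEN)
```

For billing or context-limit decisions, the usage figures returned by the API remain the authoritative numbers.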

Live Token Statistics¶

result = llm.complete(current_input=user_input)

# Inspect the actual token usage
print(f"Input tokens: {result.usage.prompt_tokens}")
print(f"Output tokens: {result.usage.completion_tokens}")
print(f"Total tokens: {result.usage.total_tokens}")

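Cost Optimization¶

Token Billing¶

from tfrobot.brain.chain.llms import GLM

# Set prices (used for billing statistics)
# The unit expected by input_price/output_price is not stated here; confirm it before relying on these values
llm = GLM(
    name="glm-4-flash",
    input_price=0.00008,   # example, annotated as ¥0.8 per million tokens
    output_price=0.0002    # example, annotated as ¥2 per million tokens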
)
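
To make the per-million figures in the comments concrete, the arithmetic can be written out directly. This sketch assumes prices are quoted in yuan per million tokens and does not use TFRobot's own billing statistics; estimate_cost is a hypothetical helper:

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the cost of one call, with prices given per million tokens."""
    total = (
        prompt_tokens * input_price_per_million
        + completion_tokens * output_price_per_million
    )
    return total / 1_000_000
```

The token counts would typically come from result.usage.prompt_tokens and result.usage.completion_tokens.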

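Tips for Saving Tokens¶

- Use a Flash model: glm-4-flash is very cheap and fits most scenarios
- Keep prompts short: Chinese text has a high token density, so trimming the prompt pays off noticeably
- Use summary mode: conversation history is summarized automatically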

Context Management¶

Setting the Context Window¶

llm = GLM(
    name="glm-4-flash",
    context_window=128000  # 128K tokens
)

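Context Compression¶

When the context exceeds the limit (error code 1261), the Chain triggers compression automatically:

# Handled automatically by the Chain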
from tfrobot.brain.chain import SingleChain

chain = SingleChain(llm=llm)
result = chain.run(input_message=user_input)

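Handling Context Overflow Manually¶

# `conversation` is the existing conversation history held by the caller
try:
    result = llm.complete(current_input=user_input)
except Exception as e:
    if "1261" in str(e):  # GLM error code for context overflow
        # Compress manually
        compacted, _, _, _, _ = llm.collapse_context(
            current_input=user_input,
            conversation=conversation,
            to_size=100000
        )
        result = llm.complete(current_input=user_input, conversation=compacted)
    else:
        raise  # do not swallow unrelated errors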
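
For intuition about what fitting a conversation into a token budget involves, here is a naive oldest-first trim. It is not how collapse_context works (its strategy is not described in this guide); it simply keeps the most recent messages that fit. The function name and the use of plain strings for messages are assumptions made for the sketch:

```python
from typing import Callable, List


def trim_history(
    messages: List[str],
    budget: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent messages whose combined token count fits in budget."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Any token counter can be plugged in as count_tokens, from a rough character-based estimate to a real tokenizer.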

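Error Handling¶

Common Errors¶

Error code	Cause	Solution
1261	Context too long	Use the Chain's compaction feature
1301	Invalid API key	Check the environment variable
1302	Insufficient quota	Check the account balance
1303	Invalid parameter	Check the request parameters
1304	Content moderation failed	Check the input content

Manual Error Handling¶

try:
    result = llm.complete(current_input=user_input)
except Exception as e:
    error_code = str(e)
    if "1261" in error_code:
        # Context too long
        compacted, _, _, _, _ = llm.collapse_context(...)
        result = llm.complete(current_input=user_input, conversation=compacted)
    elif "1301" in error_code:
        # Invalid API key
        raise ValueError("Check the ZHIPUAI_API_KEY environment variable")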
    else:
        raise
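
Plain substring checks such as "1261" in str(e) can misfire when the same digits appear inside a longer number, for example a request ID. A slightly stricter sketch matches only the known codes as standalone numbers. It assumes the code appears verbatim in the exception message; the helper names and hint texts are hypothetical, with the codes taken from the table of common errors:

```python
import re
from typing import Optional

# Codes and causes taken from the table of common errors in this guide.
ERROR_HINTS = {
    "1261": "Context too long: compact the conversation.",
    "1301": "Invalid API key: check the environment variable.",
    "1302": "Insufficient quota: check the account balance.",
    "1303": "Invalid parameter: check the request parameters.",
    "1304": "Content moderation failed: check the input content.",
}

# Match a known code only when it is not embedded in a longer run of digits.
_CODE_PATTERN = re.compile(r"(?<!\d)(" + "|".join(ERROR_HINTS) + r")(?!\d)")


def extract_error_code(message: str) -> Optional[str]:
    """Return the first known GLM error code found in the message, if any."""
    match = _CODE_PATTERN.search(message)
    return match.group(1) if match else None


def hint_for(exc: Exception) -> str:
    """Map an exception to a short remediation hint."""
    code = extract_error_code(str(exc))
    return ERROR_HINTS.get(code or "", "Unrecognized error: re-raise it.")
```

If the client library exposes a structured error code on its exceptions, reading that attribute is preferable to parsing the message.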

Advanced Usage¶

Streaming Output¶

llm = GLM(name="glm-4-flash", stream=True)

result = llm.complete(current_input=user_input)
# result.generations[0].text is produced incrementally

Using Deep Thinking¶

# Tasks that need deep reasoning
llm = GLM(
    name="glm-4.5",
    thinking={"type": "enabled"}
)

result = llm.complete(
    current_input=TextMessage(content="Analyze the approach to solving the following math problem: ...")
)

# Deep Thinking mode produces a more detailed reasoning process

Parallel Processing¶

import asyncio

async def batch_process(llm, inputs):
    tasks = [llm.async_complete(current_input=inp) for inp in inputs]
    results = await asyncio.gather(*tasks)
    return results
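
An unbounded gather sends every request at once, which can run into rate limits for large batches. One common refinement is to cap the number of calls in flight with a semaphore. The sketch below is self-contained: fake_complete is a stand-in for llm.async_complete so that it runs without network access, and gather_bounded is a hypothetical helper, not a TFRobot API.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    worker: Callable[[T], Awaitable[R]],
    inputs: Sequence[T],
    limit: int = 5,
) -> List[R]:
    """Run worker over inputs with at most `limit` calls in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return list(await asyncio.gather(*(run_one(item) for item in inputs)))


async def fake_complete(prompt: str) -> str:
    """Stand-in for an async LLM call (assumption for the sketch)."""
    await asyncio.sleep(0)
    return prompt.upper()
```

A call such as asyncio.run(gather_bounded(fake_complete, ["a", "b", "c"], limit=2)) processes the three inputs with no more than two running concurrently.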

Best Practices¶

1. Temperature Settings¶

# When creativity is needed (e.g. writing)
llm = GLM(name="glm-4-flash", temperature=1.0)

# When stability is needed (e.g. code generation)
llm = GLM(name="glm-4-flash", temperature=0.0)

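2. Prompt Engineering¶

# MemoPrompt and ToolPrompt must be imported from TFRobot's prompt modules first
llm.system_msg_prompt = [
    MemoPrompt(template="You are a professional Python developer."),
    ToolPrompt(),
]

llm.after_input_msg_prompt = [
    MemoPrompt(template="Reply in Markdown and label the language of every code block."),
]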

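3. Optimizing for Chinese Scenarios¶

# GLM has a natural advantage on Chinese text
llm = GLM(name="glm-4-flash")

# A Chinese-language task (the prompt asks: "Explain what quantum entanglement is")
result = llm.complete(
    current_input=TextMessage(content="解释什么是量子纠缠")
)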

4. Tool Design¶

@tool
def search_baidu(query: str) -> str:
    """
    Search Baidu.

    Args:
        query: the search keywords
    """
    # Tool implementation
    ...

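Recommended Models¶

Model	Context window	Highlights	Suitable for
glm-4-flash	128K	Very low price, fast responses	General conversation, simple tasks
glm-4-plus	128K	Strong overall performance	Complex reasoning, code generation
glm-4-air	128K	Balances performance and cost	Tasks of medium complexity
glm-4v-plus	128K	Multimodal capabilities	Image-text understanding, visual question answering
glm-4.5	128K	Deep Thinking mode	Tasks that need deep reasoning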
