
Ollama LLM

⚠️ Important Notice

Ollama is suitable for local testing only and is not recommended for production environments.

Limitations

The Ollama integration has notable shortcomings in the following areas:

  1. Error capture and handling
     • Lacks a robust error classification and identification mechanism
     • Incomplete handling of exceptions such as over-long context
     • Limited recovery from network errors

  2. Advanced feature support
     • Token counting is approximate rather than exact
     • No rate-limit handling
     • Incomplete streaming support

  3. Test coverage
     • The TFRobot team's test coverage of Ollama is limited
     • Insufficient validation in production environments

  4. Native capability and design limits
     • Ollama's API is relatively simple and lacks enterprise-grade features
     • Compared with cloud APIs, its error handling and retry mechanisms are basic

Root Cause Analysis

These shortcomings stem from a combination of insufficient test coverage and limits in Ollama's native capability design:

  • Testing: the TFRobot team gives Ollama a low testing priority, so coverage is thin
  • Design: as a local inference tool, Ollama's API favors simplicity over enterprise-grade features
  • Ecosystem: Ollama lacks production-grade error handling, monitoring, and observability

Recommended Alternatives

For production environments, use a cloud API:

Requirement                              Recommended option    Docs
Plain-text generation (DeskLLM)          DeepSeekDesk          Documentation
Multimodal tasks (DeskLLM)               GeminiDesk            Documentation
Conversation + tool calling (ChatLLM)    DeepSeek (ChatLLM)    Documentation

Applicable Scenarios

Ollama is only suitable for:
  • ✅ Local development and testing
  • ✅ Proof of concept (POC)
  • ✅ Scenarios with very strict data-privacy requirements (where the functional limits are acceptable)
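
For local testing, a minimal setup might look like the sketch below. The constructor keywords (name, stream) are assumptions inferred from attributes referenced later on this page, and TextMessage stands in for whatever concrete BaseMessage type your project uses; verify both against your TFRobot version.

# Minimal local-testing sketch; constructor arguments are assumptions, verify against your version.
from tfrobot.brain.chain.llms.ollama import Ollama

llm = Ollama(
    name="qwen2.5:7b",  # any model already pulled into the local Ollama daemon
    stream=False,       # keep the example non-streaming
)

# `complete` accepts a BaseMessage subclass; TextMessage is assumed here as the concrete type.
# result = llm.complete(TextMessage(content="Say hello in one sentence."))
# print(result.generations[0].text)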


API Documentation

Ollama

Bases: ChatLLM[dict]

Adapter for the various small, single-machine models served by a locally running Ollama instance | Ollama LLM (Local Language Model)

stop_words property

stop_words: list[str]

Returns the stop-word list; it is merged with the ToolPrompt's stop words when a ToolPrompt is present.

Returns:

  list[str]: The list of stop words.

partial_params property

partial_params: dict

Returns the partial request parameters.

Returns:

  dict: The partial request parameters.

reformat_request_msg_to_api

reformat_request_msg_to_api(msg: BaseLLMMessage) -> dict

Converts an LLMUserMessage into the Ollama request message format. The Ollama API has its own format requirements that do not fully match the current TFRobot protocol, so a conversion is required.

Parameters:

  • msg (BaseLLMMessage): The message to convert. Required.

Returns:

  dict: The message in Ollama request format.

Source code in tfrobot/brain/chain/llms/ollama.py
def reformat_request_msg_to_api(self, msg: BaseLLMMessage) -> dict:
    """
    将LLMUserMessage转换为Ollama请求消息格式。Ollama接口对于格式有自己的要求,与当前TFRobot协议并不完全一致。因此需要进行转换。

    Args:
        msg (BaseLLMMessage): BaseLLMMessage

    Returns:
        dict: Ollama请求消息格式
    """
    if isinstance(msg, LLMUserMessage) and isinstance(msg.content, list):
        # If the content is a list, it is most likely an image message and needs special handling
        contents: list[str] = []
        images: list[str] = []
        for part in msg.content:
            if part.part_type == "text":
                contents.append(part.text)
            elif part.part_type == "image_url":
                img_url = part.image_url.url
                if not is_base64(img_url):
                    # Read the binary data from the URL and convert it to bytes for images
                    result = self.load_image_from_uri(img_url)
                    if result:
                        # Note: load_image_from_uri already returns base64-encoded bytes
                        base64_bytes, _ = result
                        base64_image = base64_bytes.decode("utf-8")
                        images.append(base64_image)
                    else:
                        # If loading fails, log a warning but keep going
                        warnings.warn(f"Failed to load image from URI: {img_url}")
                else:
                    images.append(img_url)
        return {"role": "user", "content": "\n".join(contents), "images": images}
    return msg.model_dump()

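For reference, the dict produced for a user message that mixes one text part and one image part has roughly the following shape (a sketch only; the base64 payload is a placeholder):

# Hypothetical shape of the conversion result for a mixed text + image user message.
ollama_msg = {
    "role": "user",
    "content": "Describe this image",           # all text parts joined with "\n"
    "images": ["<base64-encoded image data>"],  # one base64 string per image part
}
print(ollama_msg["role"], len(ollama_msg["images"]))
# Plain text messages fall through to msg.model_dump() unchanged.
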
construct_request_params

construct_request_params(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> dict

Construct chat request parameters

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): Current input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  dict: Request parameters.

Source code in tfrobot/brain/chain/llms/ollama.py
@overrides.override
def construct_request_params(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> dict:
    """
    Construct chat request parameters | 构造聊天请求参数

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): Current input | 当前输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        dict: Request parameters | 请求参数
    """
    # Build the request parameters
    partial_params = self.partial_params
    # Format according to the prompt template configuration
    prompt_ctx: PromptContext = PromptContext(
        user_input=UserInput(input=str(current_input.content), additional_info=current_input.additional_kwargs)
    )
    # If response_format is overridden at call time, it takes precedence over the class configuration
    if response_format:
        # Prefer the argument, fall back to the class configuration
        match response_format["type"]:
            case "json_object":
                partial_params["format"] = "json"
            case "json_schema":
                if response_format.get("json_schema", {}).get("schema"):
                    partial_params["format"] = response_format["json_schema"]["schema"]  # type: ignore
                else:
                    partial_params["format"] = "json"
            case "text":
                partial_params["format"] = None
            case _:
                ...

        if response_format.get("examples"):
            # A JsonSchema/JsonObject response is requested and examples are supplied, so inject the examples into the context under the global key LLM_JSON_RESULT_EXAMPLES_KEY
            if prompt_ctx.user_input.additional_info is None:
                prompt_ctx.user_input.additional_info = {}
            cast(dict, prompt_ctx.user_input.additional_info)[LLM_JSON_RESULT_EXAMPLES_KEY] = (
                json.dumps(response_format["examples"], indent=4, ensure_ascii=False)
                if not isinstance(response_format["examples"], list)
                else "\n---\n".join(
                    [json.dumps(e, indent=4, ensure_ascii=False) for e in response_format["examples"]]
                )
            )
    elif self.response_format != inspect.Parameter.empty and self.response_format.get("examples"):
        # The class-level response_format requests JsonSchema/JsonObject and carries examples, so inject them into the context under the global key LLM_JSON_RESULT_EXAMPLES_KEY
        if prompt_ctx.user_input.additional_info is None:
            prompt_ctx.user_input.additional_info = {}
        cast(dict, prompt_ctx.user_input.additional_info)[LLM_JSON_RESULT_EXAMPLES_KEY] = (
            json.dumps(self.response_format["examples"], indent=4, ensure_ascii=False)
            if not isinstance(self.response_format["examples"], list)
            else "\n---\n".join(
                [json.dumps(e, indent=4, ensure_ascii=False) for e in self.response_format["examples"]]
            )
        )

    (intermediate_msgs, is_mapping, forbidden_tools) = (
        self.optimize_map_reduce_messages(intermediate_msgs)
        if intermediate_msgs
        else (intermediate_msgs, False, False)
    )
    # Check whether the current neural network has a connected Drive; if so, fetch its currently available tools
    llm_tools = []
    if tools is not None and not forbidden_tools:
        llm_tools.extend(tools)
    if self._neural and not forbidden_tools and not llm_tools:
        llm_tools.extend(self.get_current_tools_form_neural())
    if self.enable_dsl_mode:
        # In DSL mode, the tool list must contain at least one valid DSL tool
        if not any(isinstance(t, DSLTool) for t in llm_tools):
            raise ValueError("DSL模式下,当前工具列表中至少需要有一个DSL工具")
        all_dsl_tools = set([t for t in llm_tools if isinstance(t, DSLTool)])
        # Merge the tools managed by each DSLTool into the current tool list so they render correctly in the DSLToolPrompt shown to the model
        for dsl_t in all_dsl_tools:
            llm_tools.extend(dsl_t.all_tools)
    else:
        # Outside DSL mode, remove any DSLTool from the tool list
        llm_tools = [t for t in llm_tools if not isinstance(t, DSLTool)]
    if llm_tools:
        llm_tools = list(set(llm_tools))
        prompt_ctx.tools = llm_tools  # type: ignore
    if conversation:
        prompt_ctx.conversation = Conversation(msgs=conversation)
    if elements:
        prompt_ctx.doc_elements = DocElements(elements=elements)
    if knowledge:
        prompt_ctx.knowledge = Knowledge(knowledge=knowledge)
    if intermediate_msgs:
        prompt_ctx.intermediate_msgs = intermediate_msgs
    # Format the leading System Message
    messages: list[BaseLLMMessage] = []
    if self.system_msg_prompt:
        head_system_msg_content = "\n".join([prompt.format_2_str(prompt_ctx) for prompt in self.system_msg_prompt])
        if head_system_msg_content:
            head_system_msg = LLMSystemMessage(content=head_system_msg_content)
            messages.append(head_system_msg)
    # Format the user's conversation history
    if conversation:
        llm_format = [msg.to_llm_request(use_summary=True) for msg in conversation]
        for lf in llm_format:
            if lf:
                if isinstance(lf, BaseLLMMessage):
                    messages.append(lf)
                elif isinstance(lf, list):
                    messages.extend(lf)
    # Format the prompts that precede the current input
    if self.before_input_msg_prompt:
        before_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.before_input_msg_prompt]
        )
        if before_input_msg_content:
            before_input_msg = LLMSystemMessage(content=before_input_msg_content)
            messages.append(before_input_msg)
    # Format the current input
    if self.reformat_input_prompt and isinstance(current_input, TextMessage):
        reformat_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.reformat_input_prompt]
        )
        if reformat_input_msg_content:
            reformat_input_msg = (
                LLMUserMessage(content=reformat_input_msg_content, name=current_input.creator.name)
                if current_input.creator
                else LLMUserMessage(content=reformat_input_msg_content)
            )
            messages.append(reformat_input_msg)
    else:
        user_input_msg = current_input.to_llm_request()
        if not user_input_msg:
            raise ValueError(
                "未获取到当前输入,或者当前输入在转换为LLM输入时产生未捕获的异步,请检查"
            )  # pragma: no cover
        if isinstance(user_input_msg, list):
            messages.extend(user_input_msg)
        else:
            messages.append(user_input_msg)
    # Format the prompts that follow the current input
    if self.after_input_msg_prompt:
        after_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.after_input_msg_prompt]
        )
        if after_input_msg_content:
            after_input_msg = LLMSystemMessage(content=after_input_msg_content)
            messages.append(after_input_msg)
    if intermediate_msgs:
        # Intermediate messages usually accumulate while working inside a Chain as its state advances, completing the Chain's state loop; append them at the end
        for index, msg in enumerate(intermediate_msgs):
            llm_msg = msg.to_llm_request(use_summary=index != len(intermediate_msgs) - 1)
            if isinstance(llm_msg, list):
                messages.extend(llm_msg)
            elif isinstance(llm_msg, BaseLLMMessage):
                messages.append(llm_msg)
    if self.after_intermediate_msg_prompt:
        after_intermediate_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.after_intermediate_msg_prompt]
        )
        if after_intermediate_msg_content:
            after_intermediate_msg = LLMSystemMessage(content=after_intermediate_msg_content)
            messages.append(after_intermediate_msg)
    # Assemble the request parameters
    params = {"messages": [self.reformat_request_msg_to_api(o_msg) for o_msg in messages]}
    # Restructure the messages list: Ollama does not accept roles such as tool/function in older versions, and this version does not support mixed image input, so invalid messages are converted to system or user input
    for dict_msg in params["messages"]:
        if dict_msg["role"] == "tool" or dict_msg["role"] == "function":
            # Since 2024-12-23 Ollama supports the tool role; previously it had to be rewritten as system.
            dict_msg["role"] = "tool"
        if dict_msg["role"] == "console":
            dict_msg["role"] = "user"  # console是TFRobot自定义角色,为了适配OllamaAPI,在此转换为User角色。
        # For assistant messages carrying tool_calls, the arguments need special handling: the OpenAI API uses a string, but Ollama expects a dict
        if dict_msg["role"] == "assistant" and dict_msg.get("tool_calls"):
            for tc in dict_msg["tool_calls"]:
                if isinstance(tc["function"]["arguments"], str):
                    tc["function"]["arguments"] = json.loads(tc["function"]["arguments"])
        if isinstance(dict_msg["content"], list):
            dict_msg["content"] = str(dict_msg["content"])  # 将图片类型转换为文本类型
    if llm_tools and not self.used_tool_prompt:
        params["tools"] = [
            {
                "type": "function",
                "function": {
                    "name": tool.tool_name,
                    "description": tool.description,
                    "parameters": tool.params_json_schema,
                },
            }
            for tool in llm_tools
        ]
    params.update(**partial_params)
    return params

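The tool_calls normalization above can be exercised on a standalone message dict; the sample below is only an illustration of that transformation, with a placeholder tool call.

import json

# Standalone illustration of the tool_calls normalization performed above:
# OpenAI-style messages carry arguments as a JSON string, while Ollama expects a dict.
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}}],
}

for tc in assistant_msg["tool_calls"]:
    if isinstance(tc["function"]["arguments"], str):
        tc["function"]["arguments"] = json.loads(tc["function"]["arguments"])

print(assistant_msg["tool_calls"][0]["function"]["arguments"])  # {'city': 'Beijing'}
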
extract_thinking staticmethod

extract_thinking(text: str) -> tuple[str, str]

Extracts the content of the first <think>...</think> tag from a string, along with the remaining text.

Parameters:

  • text (str): The raw string to process. Required.

Returns:

  tuple[str, str]: The first element is the content inside the tag; the second is the remaining string with the whole tag removed.

Source code in tfrobot/brain/chain/llms/ollama.py
@staticmethod
def extract_thinking(text: str) -> tuple[str, str]:
    """
    从字符串中提取第一个<thinking>...</thinking>标签内的内容及剩余部分。

    Args:
        text (str): 要处理的原始字符串

    Returns:
        tuple[str, str]: 第一个元素是标签内的内容,第二个是去除整个标签后的剩余字符串
    """
    pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
    match = pattern.search(text)

    if not match:
        return "", text

    start, end = match.start(), match.end()
    return match.group(1), text[:start] + text[end:]

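A short usage example of extract_thinking, calling the static method directly:

from tfrobot.brain.chain.llms.ollama import Ollama

raw = "<think>The user wants a greeting.</think>Hello there!"
thinking, remainder = Ollama.extract_thinking(raw)
print(thinking)   # "The user wants a greeting."
print(remainder)  # "Hello there!"
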
construct_llm_result

construct_llm_result(res: dict, params: dict, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Construct chat result

Because Ollama's HTTP API parameters do not support stop words, stop-word handling is implemented manually on the generated content. Note that this inflates completion_tokens, since truncation at a stop word happens only after generation.

Parameters:

  • res (dict): Response from Ollama. Required.
  • params (dict): Request sent to Ollama. Required.
  • response_format (Optional[LLMResponseFormat]): Response format. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
def construct_llm_result(
    self, res: dict, params: dict, response_format: Optional[LLMResponseFormat] = None
) -> LLMResult:
    """
    Construct chat result | 构造聊天结果

    因为Ollama提供的HttpAPI参数中不支持stop停止词。所以在生成内容中手动实现了此能力。
    但需要注意这会导致completion_tokens过高,因为是生成后,手动终止的停止词。

    Args:
        res (dict): Response from Ollama | Ollama的响应
        params (dict): Request to Ollama | Ollama的请求
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式.

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    import tiktoken

    # Approximate token usage with tiktoken
    all_messages = str(params.get("messages"))

    encoding = tiktoken.get_encoding("gpt2")
    prompt_tokens = encoding.encode(all_messages)
    completion_tokens = encoding.encode(res["message"]["content"])

    split_content = res["message"]["content"]
    if self.stop_words:
        split_content = split_on_stopwords(split_content, self.stop_words)

    # Attempt to apply the requested response format
    response_format = response_format or (
        self.response_format if self.response_format != inspect.Parameter.empty else None
    )
    if response_format:
        json_obj = extract_json_from_nature_language(
            split_content,
            schema=cast(dict, response_format.get("json_schema", {})).get("schema") if response_format else None,
        )
        if json_obj:
            split_content = json.dumps(json_obj, ensure_ascii=False)

    usage = TokenUsage(
        prompt_tokens=len(prompt_tokens),
        completion_tokens=len(completion_tokens),
        total_tokens=len(prompt_tokens) + len(completion_tokens),
        prompt_cost=len(prompt_tokens) * self.input_price,
        completion_cost=len(completion_tokens) * self.input_price,
        total_cost=len(prompt_tokens) * self.input_price + len(completion_tokens) * self.input_price,
    )
    reasoning_content, split_content = self.extract_thinking(split_content) if split_content else (None, "")
    llm_res = LLMResult(
        generations=[
            Generation(
                text=split_content,
                reasoning_content=reasoning_content,
                tool_calls=[
                    ToolCall(
                        function=FunctionCall(
                            name=tc["function"]["name"], parameters=json.dumps(tc["function"]["arguments"])
                        )
                    )
                    for tc in res["message"]["tool_calls"]
                ]
                if res["message"].get("tool_calls")
                else None,
            )
        ],
        usage=usage,
        prompt=params,
        meta_info={
            "model": self.name,
            "created": res.get("created_at"),
            "done": res.get("done"),
            "total_duration": res.get("total_duration"),
            "load_duration": res.get("load_duration"),
            "prompt_eval_duration": res.get("prompt_eval_duration"),
            "eval_count": res.get("eval_count"),
            "eval_duration": res.get("eval_duration"),
            "id": None,
            "system_fingerprint": None,
        },
    )
    if self.used_tool_prompt:
        for prompt in self.all_prompts:
            if isinstance(prompt, ToolPrompt):
                prompt.extract_tool_call(llm_res)
    if self.enable_tfl_mode:
        for prompt in self.all_prompts:
            if isinstance(prompt, TFLPrompt):
                prompt.extract_tool_call(llm_res)
    return llm_res

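The token accounting above is an approximation based on tiktoken's gpt2 encoding; it can be reproduced in isolation as follows (the prompt and completion strings are placeholders):

import tiktoken

# Reproduce the approximate token accounting used above (gpt2 encoding, counts are estimates).
encoding = tiktoken.get_encoding("gpt2")

prompt_text = str([{"role": "user", "content": "Summarize the release notes."}])
completion_text = "Here is a short summary of the release notes..."

prompt_tokens = len(encoding.encode(prompt_text))
completion_tokens = len(encoding.encode(completion_text))
print(prompt_tokens, completion_tokens, prompt_tokens + completion_tokens)
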
complete

complete(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Complete chat

Ollama's chat endpoint is POST /api/chat

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): The user's latest input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
@report_llm_metrics
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((httpx.ConnectError, httpx.TimeoutException)),
    before_sleep=before_sleep_log(logger, logging.WARN),
)
@overrides.override
def complete(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> LLMResult:
    """
    Complete chat | 生成聊天结果

    Ollama的Chat生成端点为:POST /api/chat

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): User last input | 用户最后输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    params = self.construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    with tracer.start_as_current_span("ollama-chat", kind=SpanKind.CLIENT) as span:
        if self.stream:
            span_ctx = LLMEventContext(
                scene="LLM",
                entity_id=str(id(self)),
                desc="before llm streaming request",
                info=str(current_input.content),
                streaming=self.stream,
                token_usage=None,
                user_input=str(current_input.content),
                llm_model_name=self.name,
            )
            span.add_event(SpanEvent.BEFORE_LLM_STREAMING.value, span_ctx.model_dump())
            try:
                with self._client.stream("POST", "/api/chat", json=params) as response:  # noqa
                    res = {"message": {"content": ""}}
                    for line in response.iter_lines():
                        res_json = json.loads(line)
                        if res_json.get("error"):
                            # Ollama returned an error; raise it directly
                            raise ValueError(res_json.get("error"))
                        if not res_json.get("done"):
                            res["message"]["content"] += res_json.get("message", {}).get("content", "")
                            span_ctx = LLMEventContext(
                                scene="LLM",
                                entity_id=str(id(self)),
                                desc="llm streaming response",
                                info=line,
                                streaming=self.stream,
                                token_usage=None,
                                user_input=str(current_input.content),
                                llm_model_name=self.name,
                            )
                            span.add_event(SpanEvent.LLM_STREAMING.value, span_ctx.model_dump())
                        else:
                            res["model"] = res_json.get("model")
                            res["created_at"] = res_json.get("created_at")
                            res["done"] = res_json.get("done")
                            res["total_duration"] = res_json.get("total_duration")
                            res["load_duration"] = res_json.get("load_duration")
                            res["prompt_eval_count"] = res_json.get("prompt_eval_count")
                            res["eval_count"] = res_json.get("eval_count")
                            res["eval_duration"] = res_json.get("eval_duration")
                            break
                span_ctx = LLMEventContext(
                    scene="LLM",
                    entity_id=str(id(self)),
                    desc="after llm streaming response",
                    info=json.dumps(res, indent=2, ensure_ascii=False),
                    streaming=self.stream,
                    token_usage=None,
                    user_input=str(current_input.content),
                    llm_model_name=self.name,
                )
                span.add_event(SpanEvent.AFTER_LLM_STREAMING.value, span_ctx.model_dump())
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except ValueError:
                # Re-raise other errors as-is
                raise
        else:
            try:
                res = self._client.post("/api/chat", json=params).raise_for_status().json()
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except httpx.HTTPStatusError:
                # Re-raise other HTTP errors as-is
                raise
    return self.construct_llm_result(res, params=params, response_format=response_format)

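For context, the streaming branch above consumes Ollama's newline-delimited JSON responses from POST /api/chat. A standalone httpx sketch of that protocol, with a placeholder host and model, looks like this:

import json
import httpx

# Standalone sketch of Ollama's streaming /api/chat protocol (placeholder host and model).
params = {
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hi"}],
    "stream": True,
}
content = ""
with httpx.Client(base_url="http://localhost:11434") as client:
    with client.stream("POST", "/api/chat", json=params) as response:
        for line in response.iter_lines():
            chunk = json.loads(line)
            if chunk.get("error"):
                raise ValueError(chunk["error"])
            if chunk.get("done"):
                break  # the final chunk carries timing/eval metadata instead of content
            content += chunk.get("message", {}).get("content", "")
print(content)
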
async_complete async

async_complete(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Complete chat asynchronously

Ollama's chat endpoint is POST /api/chat

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): The user's latest input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
@report_llm_metrics
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((httpx.ConnectError, httpx.TimeoutException)),
    before_sleep=before_sleep_log(logger, logging.WARN),
)
@overrides.override
async def async_complete(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> LLMResult:
    """
    Complete chat async | 异步生成聊天结果

    Ollama的Chat生成端点为:POST /api/chat

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): User last input | 用户最后输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    params = self.construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    with tracer.start_as_current_span("ollama-chat", kind=SpanKind.CLIENT) as span:
        if self.stream:
            span_ctx = LLMEventContext(
                scene="LLM",
                entity_id=str(id(self)),
                desc="before llm streaming request",
                info=str(current_input.content),
                streaming=self.stream,
                token_usage=None,
                user_input=str(current_input.content),
                llm_model_name=self.name,
            )
            span.add_event(SpanEvent.BEFORE_LLM_STREAMING.value, span_ctx.model_dump())
            try:
                async with self._async_client.stream("POST", "/api/chat", json=params) as response:  # noqa
                    res = {"message": {"content": ""}}
                    async for line in response.aiter_lines():
                        res_json = json.loads(line)
                        if res_json.get("error"):
                            # Ollama returned an error; raise it directly
                            raise ValueError(res_json.get("error"))
                        if not res_json.get("done"):
                            res["message"]["content"] += res_json.get("message", {}).get("content", "")
                            span_ctx = LLMEventContext(
                                scene="LLM",
                                entity_id=str(id(self)),
                                desc="llm streaming response",
                                info=line,
                                streaming=self.stream,
                                token_usage=None,
                                user_input=str(current_input.content),
                                llm_model_name=self.name,
                            )
                            span.add_event(SpanEvent.LLM_STREAMING.value, span_ctx.model_dump())
                        else:
                            res["model"] = res_json.get("model")
                            res["created_at"] = res_json.get("created_at")
                            res["done"] = res_json.get("done")
                            res["total_duration"] = res_json.get("total_duration")
                            res["load_duration"] = res_json.get("load_duration")
                            res["prompt_eval_count"] = res_json.get("prompt_eval_count")
                            res["eval_count"] = res_json.get("eval_count")
                            res["eval_duration"] = res_json.get("eval_duration")
                            break
                span_ctx = LLMEventContext(
                    scene="LLM",
                    entity_id=str(id(self)),
                    desc="after llm streaming response",
                    info=json.dumps(res, indent=2, ensure_ascii=False),
                    streaming=self.stream,
                    token_usage=None,
                    user_input=str(current_input.content),
                    llm_model_name=self.name,
                )
                span.add_event(SpanEvent.AFTER_LLM_STREAMING.value, span_ctx.model_dump())
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except ValueError:
                # Re-raise other errors as-is
                raise
        else:
            try:
                res = (await self._async_client.post("/api/chat", json=params)).raise_for_status().json()
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except httpx.HTTPStatusError:
                # Re-raise other HTTP errors as-is
                raise
    return self.construct_llm_result(res, params=params, response_format=response_format)

OllamaWithTools

Bases: Ollama

OllamaWithTools extends Ollama with finer-grained control over the tools exposed to the model.

construct_request_params

construct_request_params(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> dict

Construct chat request parameters

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): Current input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If specified here, it overrides the LLM's default response format. Default: None.

Returns:

  dict: Request parameters.

Source code in tfrobot/brain/chain/llms/ollama.py
@overrides.override
def construct_request_params(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> dict:
    """
    Construct chat request parameters | 构造聊天请求参数

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input(BaseMessage): Current input | 当前输入
        conversation(Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements(Optional[list[DocElement]]): Document elements | 文档元素
        knowledge(Optional[str]): Related knowledge | 相关知识
        tools(Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs(Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format(Optional[LLMResponseFormat]): Response format | 响应格式
            如果在此指定响应格式,会覆盖LLM的默认响应格式。

    Returns:
        dict: Request parameters | 请求参数
    """
    super_res = super().construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    if tools := super_res.get("tools"):
        for i in range(len(tools) - 1, -1, -1):
            tool_name = tools[i]["function"]["name"]
            if self.exclude_tools and tool_name in self.exclude_tools:
                tools.pop(i)
                continue
            if self.available_tools is not None and tool_name not in self.available_tools:
                tools.pop(i)
        if not super_res.get("tools"):
            del super_res["tools"]
    return super_res

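To illustrate the filtering above in isolation: exclude_tools always removes matching tools, and available_tools, when set, acts as an allow list. The tool payloads below are placeholders.

# Standalone illustration of the tool filtering applied above (placeholder tool payloads).
tools = [
    {"type": "function", "function": {"name": "search_web", "description": "", "parameters": {}}},
    {"type": "function", "function": {"name": "run_shell", "description": "", "parameters": {}}},
    {"type": "function", "function": {"name": "read_file", "description": "", "parameters": {}}},
]
exclude_tools = {"run_shell"}     # always dropped
available_tools = {"search_web"}  # when set, acts as an allow list

for i in range(len(tools) - 1, -1, -1):
    name = tools[i]["function"]["name"]
    if name in exclude_tools or (available_tools is not None and name not in available_tools):
        tools.pop(i)

print([t["function"]["name"] for t in tools])  # ['search_web']
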
split_on_stopwords

split_on_stopwords(text: str, stopwords: str | list[str]) -> str

Split text on stopwords

Parameters:

  • text (str): The text to split. Required.
  • stopwords (str | list[str]): Stop word(s). Required.

Returns:

  str: The split text.

Source code in tfrobot/brain/chain/llms/ollama.py
def split_on_stopwords(text: str, stopwords: str | list[str]) -> str:
    """
    Split text on stopwords | 在停止词上拆分文本

    Args:
        text (str): Text to split | 要拆分的文本
        stopwords (str | list[str]): Stopwords | 停止词

    Returns:
        str: Splitted text | 拆分后的文本
    """
    if not stopwords:
        return text
    if isinstance(stopwords, str):
        stopwords = [stopwords]  # Ensure the stop words are a list
    if stopwords:  # only run when the list is non-empty
        # Deduplicate the words and build the regular expression
        stopwords = list(set(stopwords))
        pattern = r"(" + "|".join(re.escape(word) for word in stopwords if word) + r")"
        parts = re.split(pattern, text, 1)
        if len(parts) > 1 and parts[0].strip():
            # The parts[0].strip() check matters: language models often follow instructions so closely that a stop word can appear at the very start of the output; that usually signals feedback on a result and should not trigger a stop
            return parts[0]
    return text  # No split occurred or the stop word list was empty
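
A short usage example of split_on_stopwords:

from tfrobot.brain.chain.llms.ollama import split_on_stopwords

text = "The capital of France is Paris.\nObservation: tool output follows here"
print(split_on_stopwords(text, ["Observation:"]))
# -> "The capital of France is Paris.\n"

# A stop word at the very start is ignored, so the text passes through unchanged.
print(split_on_stopwords("Observation: looks good", ["Observation:"]))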