
Ollama LLM

⚠️ Important Notice

Ollama is suitable for local testing only and is not recommended for production environments.

Limitations

The Ollama integration has notable shortcomings in the following areas:

  1. Error capture and handling
     • Lacks a robust error classification and identification mechanism
     • Incomplete handling of exceptions such as over-long context
     • Limited recovery from network errors

  2. Advanced feature support
     • Token counting is approximate rather than exact
     • No rate-limit handling
     • Incomplete streaming support

  3. Test coverage
     • The TFRobot team's test coverage of Ollama is limited
     • Insufficient validation in production environments

  4. Native capability and design limits
     • Ollama's API is relatively simple and lacks enterprise-grade features
     • Compared with cloud APIs, its error handling and retry mechanisms are basic

Root Cause Analysis

These shortcomings stem from a combination of insufficient test coverage and limits in Ollama's native capability design:

  • Testing: the TFRobot team gives Ollama a low testing priority, so coverage is thin
  • Design: as a local inference tool, Ollama's API favors simplicity over enterprise-grade features
  • Ecosystem: Ollama lacks production-grade error handling, monitoring, and observability

Recommended Alternatives

For production environments, use a cloud API:

Requirement                              Recommended option    Docs
Plain-text generation (DeskLLM)          DeepSeekDesk          Documentation
Multimodal tasks (DeskLLM)               GeminiDesk            Documentation
Conversation + tool calling (ChatLLM)    DeepSeek (ChatLLM)    Documentation

Applicable Scenarios

Ollama is only suitable for:
  • ✅ Local development and testing
  • ✅ Proof of concept (POC)
  • ✅ Scenarios with very strict data-privacy requirements (where the functional limits are acceptable)
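
For local testing, a minimal setup might look like the sketch below. The constructor keywords (name, stream) are assumptions inferred from attributes referenced later on this page, and TextMessage stands in for whatever concrete BaseMessage type your project uses; verify both against your TFRobot version.

# Minimal local-testing sketch; constructor arguments are assumptions, verify against your version.
from tfrobot.brain.chain.llms.ollama import Ollama

llm = Ollama(
    name="qwen2.5:7b",  # any model already pulled into the local Ollama daemon
    stream=False,       # keep the example non-streaming
)

# `complete` accepts a BaseMessage subclass; TextMessage is assumed here as the concrete type.
# result = llm.complete(TextMessage(content="Say hello in one sentence."))
# print(result.generations[0].text)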


API Documentation

Ollama

Bases: ChatLLM[dict]

Adapter for the various small, single-machine models served by a locally running Ollama instance | Ollama LLM (Local Language Model)

stop_words property

stop_words: list[str]

Returns the stop-word list; it is merged with the ToolPrompt's stop words when a ToolPrompt is present.

Returns:

  list[str]: The list of stop words.

partial_params property

partial_params: dict

Returns the partial request parameters.

Returns:

  dict: The partial request parameters.

reformat_request_msg_to_api

reformat_request_msg_to_api(msg: BaseLLMMessage) -> dict

Converts an LLMUserMessage into the Ollama request message format. The Ollama API has its own format requirements that do not fully match the current TFRobot protocol, so a conversion is required.

Parameters:

  • msg (BaseLLMMessage): The message to convert. Required.

Returns:

  dict: The message in Ollama request format.

Source code in tfrobot/brain/chain/llms/ollama.py
def reformat_request_msg_to_api(self, msg: BaseLLMMessage) -> dict:
    """
    将LLMUserMessage转换为Ollama请求消息格式。Ollama接口对于格式有自己的要求,与当前TFRobot协议并不完全一致。因此需要进行转换。

    Args:
        msg (BaseLLMMessage): BaseLLMMessage

    Returns:
        dict: Ollama请求消息格式
    """
    if isinstance(msg, LLMUserMessage) and isinstance(msg.content, list):
        # If the content is a list, it is most likely an image message and needs special handling
        contents: list[str] = []
        images: list[str] = []
        for part in msg.content:
            if part.part_type == "text":
                contents.append(part.text)
            elif part.part_type == "image_url":
                img_url = part.image_url.url
                if not is_base64(img_url):
                    # Read the binary data from the URL and convert it to bytes for images
                    result = self.load_image_from_uri(img_url)
                    if result:
                        # Note: load_image_from_uri already returns base64-encoded bytes
                        base64_bytes, _ = result
                        base64_image = base64_bytes.decode("utf-8")
                        images.append(base64_image)
                    else:
                        # If loading fails, log a warning but keep going
                        warnings.warn(f"Failed to load image from URI: {img_url}")
                else:
                    images.append(img_url)
        return {"role": "user", "content": "\n".join(contents), "images": images}
    return msg.model_dump()

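For reference, the dict produced for a user message that mixes one text part and one image part has roughly the following shape (a sketch only; the base64 payload is a placeholder):

# Hypothetical shape of the conversion result for a mixed text + image user message.
ollama_msg = {
    "role": "user",
    "content": "Describe this image",           # all text parts joined with "\n"
    "images": ["<base64-encoded image data>"],  # one base64 string per image part
}
print(ollama_msg["role"], len(ollama_msg["images"]))
# Plain text messages fall through to msg.model_dump() unchanged.
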
construct_request_params

construct_request_params(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> dict

Construct chat request parameters

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): Current input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  dict: Request parameters.

Source code in tfrobot/brain/chain/llms/ollama.py
@overrides.override
def construct_request_params(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> dict:
    """
    Construct chat request parameters | 构造聊天请求参数

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): Current input | 当前输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        dict: Request parameters | 请求参数
    """
    # Build the request parameters
    partial_params = self.partial_params
    # Format according to the prompt template configuration
    prompt_ctx: PromptContext = PromptContext(
        user_input=UserInput(input=str(current_input.content), additional_info=current_input.additional_kwargs)
    )
    # If response_format is overridden at call time, it takes precedence over the class configuration
    if response_format:
        # Prefer the argument, fall back to the class configuration
        match response_format["type"]:
            case "json_object":
                partial_params["format"] = "json"
            case "json_schema":
                if response_format.get("json_schema", {}).get("schema"):
                    partial_params["format"] = response_format["json_schema"]["schema"]  # type: ignore
                else:
                    partial_params["format"] = "json"
            case "text":
                partial_params["format"] = None
            case _:
                ...

        if response_format.get("examples"):
            # A JsonSchema/JsonObject response is requested and examples are supplied, so inject the examples into the context under the global key LLM_JSON_RESULT_EXAMPLES_KEY
            if prompt_ctx.user_input.additional_info is None:
                prompt_ctx.user_input.additional_info = {}
            cast(dict, prompt_ctx.user_input.additional_info)[LLM_JSON_RESULT_EXAMPLES_KEY] = (
                json.dumps(response_format["examples"], indent=4, ensure_ascii=False)
                if not isinstance(response_format["examples"], list)
                else "\n---\n".join(
                    [json.dumps(e, indent=4, ensure_ascii=False) for e in response_format["examples"]]
                )
            )
    elif self.response_format != inspect.Parameter.empty and self.response_format.get("examples"):
        # The class-level response_format requests JsonSchema/JsonObject and carries examples, so inject them into the context under the global key LLM_JSON_RESULT_EXAMPLES_KEY
        if prompt_ctx.user_input.additional_info is None:
            prompt_ctx.user_input.additional_info = {}
        cast(dict, prompt_ctx.user_input.additional_info)[LLM_JSON_RESULT_EXAMPLES_KEY] = (
            json.dumps(self.response_format["examples"], indent=4, ensure_ascii=False)
            if not isinstance(self.response_format["examples"], list)
            else "\n---\n".join(
                [json.dumps(e, indent=4, ensure_ascii=False) for e in self.response_format["examples"]]
            )
        )

    (intermediate_msgs, is_mapping, forbidden_tools) = (
        self.optimize_map_reduce_messages(intermediate_msgs)
        if intermediate_msgs
        else (intermediate_msgs, False, False)
    )
    # Check whether the current neural network has a connected Drive; if so, fetch its currently available tools
    llm_tools = []
    if tools is not None and not forbidden_tools:
        llm_tools.extend(tools)
    if self._neural and not forbidden_tools and not llm_tools:
        llm_tools.extend(self.get_current_tools_form_neural())
    if self.enable_dsl_mode:
        # In DSL mode, the tool list must contain at least one valid DSL tool
        if not any(isinstance(t, DSLTool) for t in llm_tools):
            raise ValueError("DSL模式下,当前工具列表中至少需要有一个DSL工具")
        all_dsl_tools = set([t for t in llm_tools if isinstance(t, DSLTool)])
        # Merge the tools managed by each DSLTool into the current tool list so they render correctly in the DSLToolPrompt shown to the model
        for dsl_t in all_dsl_tools:
            llm_tools.extend(dsl_t.all_tools)
    else:
        # Outside DSL mode, remove any DSLTool from the tool list
        llm_tools = [t for t in llm_tools if not isinstance(t, DSLTool)]
    if llm_tools:
        llm_tools = list(set(llm_tools))
        prompt_ctx.tools = llm_tools  # type: ignore
    if conversation:
        prompt_ctx.conversation = Conversation(msgs=conversation)
    if elements:
        prompt_ctx.doc_elements = DocElements(elements=elements)
    if knowledge:
        prompt_ctx.knowledge = Knowledge(knowledge=knowledge)
    if intermediate_msgs:
        prompt_ctx.intermediate_msgs = intermediate_msgs
    # Format the leading System Message
    messages: list[BaseLLMMessage] = []
    if self.system_msg_prompt:
        head_system_msg_content = "\n".join([prompt.format_2_str(prompt_ctx) for prompt in self.system_msg_prompt])
        if head_system_msg_content:
            head_system_msg = LLMSystemMessage(content=head_system_msg_content)
            messages.append(head_system_msg)
    # Format the user's conversation history
    if conversation:
        llm_format = [msg.to_llm_request(use_summary=True) for msg in conversation]
        for lf in llm_format:
            if lf:
                if isinstance(lf, BaseLLMMessage):
                    messages.append(lf)
                elif isinstance(lf, list):
                    messages.extend(lf)
    # Format the prompts that precede the current input
    if self.before_input_msg_prompt:
        before_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.before_input_msg_prompt]
        )
        if before_input_msg_content:
            before_input_msg = LLMSystemMessage(content=before_input_msg_content)
            messages.append(before_input_msg)
    # Format the current input
    if self.reformat_input_prompt and isinstance(current_input, TextMessage):
        reformat_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.reformat_input_prompt]
        )
        if reformat_input_msg_content:
            reformat_input_msg = (
                LLMUserMessage(content=reformat_input_msg_content, name=current_input.creator.name)
                if current_input.creator
                else LLMUserMessage(content=reformat_input_msg_content)
            )
            messages.append(reformat_input_msg)
    else:
        user_input_msg = current_input.to_llm_request()
        if not user_input_msg:
            raise ValueError(
                "未获取到当前输入,或者当前输入在转换为LLM输入时产生未捕获的异步,请检查"
            )  # pragma: no cover
        if isinstance(user_input_msg, list):
            messages.extend(user_input_msg)
        else:
            messages.append(user_input_msg)
    # Format the prompts that follow the current input
    if self.after_input_msg_prompt:
        after_input_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.after_input_msg_prompt]
        )
        if after_input_msg_content:
            after_input_msg = LLMSystemMessage(content=after_input_msg_content)
            messages.append(after_input_msg)
    if intermediate_msgs:
        # Intermediate messages usually accumulate while working inside a Chain as its state advances, completing the Chain's state loop; append them at the end
        for index, msg in enumerate(intermediate_msgs):
            llm_msg = msg.to_llm_request(use_summary=index != len(intermediate_msgs) - 1)
            if isinstance(llm_msg, list):
                messages.extend(llm_msg)
            elif isinstance(llm_msg, BaseLLMMessage):
                messages.append(llm_msg)
    if self.after_intermediate_msg_prompt:
        after_intermediate_msg_content = "\n".join(
            [prompt.format_2_str(prompt_ctx) for prompt in self.after_intermediate_msg_prompt]
        )
        if after_intermediate_msg_content:
            after_intermediate_msg = LLMSystemMessage(content=after_intermediate_msg_content)
            messages.append(after_intermediate_msg)
    # Assemble the request parameters
    params = {"messages": [self.reformat_request_msg_to_api(o_msg) for o_msg in messages]}
    # Restructure the messages list: Ollama does not accept roles such as tool/function in older versions, and this version does not support mixed image input, so invalid messages are converted to system or user input
    for dict_msg in params["messages"]:
        if dict_msg["role"] == "tool" or dict_msg["role"] == "function":
            # Since 2024-12-23 Ollama supports the tool role; previously it had to be rewritten as system.
            dict_msg["role"] = "tool"
        if dict_msg["role"] == "console":
            dict_msg["role"] = "user"  # console是TFRobot自定义角色,为了适配OllamaAPI,在此转换为User角色。
        # For assistant messages carrying tool_calls, the arguments need special handling: the OpenAI API uses a string, but Ollama expects a dict
        if dict_msg["role"] == "assistant" and dict_msg.get("tool_calls"):
            for tc in dict_msg["tool_calls"]:
                if isinstance(tc["function"]["arguments"], str):
                    tc["function"]["arguments"] = json.loads(tc["function"]["arguments"])
        if isinstance(dict_msg["content"], list):
            dict_msg["content"] = str(dict_msg["content"])  # 将图片类型转换为文本类型
    if llm_tools and not self.used_tool_prompt:
        params["tools"] = [
            {
                "type": "function",
                "function": {
                    "name": tool.tool_name,
                    "description": tool.description,
                    "parameters": tool.params_json_schema,
                },
            }
            for tool in llm_tools
        ]
    params.update(**partial_params)
    return params

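The tool_calls normalization above can be exercised on a standalone message dict; the sample below is only an illustration of that transformation, with a placeholder tool call.

import json

# Standalone illustration of the tool_calls normalization performed above:
# OpenAI-style messages carry arguments as a JSON string, while Ollama expects a dict.
assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}}],
}

for tc in assistant_msg["tool_calls"]:
    if isinstance(tc["function"]["arguments"], str):
        tc["function"]["arguments"] = json.loads(tc["function"]["arguments"])

print(assistant_msg["tool_calls"][0]["function"]["arguments"])  # {'city': 'Beijing'}
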
extract_thinking staticmethod

extract_thinking(text: str) -> tuple[str, str]

Extracts the content of the first <think>...</think> tag from a string, along with the remaining text.

Parameters:

  • text (str): The raw string to process. Required.

Returns:

  tuple[str, str]: The first element is the content inside the tag; the second is the remaining string with the whole tag removed.

Source code in tfrobot/brain/chain/llms/ollama.py
@staticmethod
def extract_thinking(text: str) -> tuple[str, str]:
    """
    从字符串中提取第一个<thinking>...</thinking>标签内的内容及剩余部分。

    Args:
        text (str): 要处理的原始字符串

    Returns:
        tuple[str, str]: 第一个元素是标签内的内容,第二个是去除整个标签后的剩余字符串
    """
    pattern = re.compile(r"<think>(.*?)</think>", re.DOTALL)
    match = pattern.search(text)

    if not match:
        return "", text

    start, end = match.start(), match.end()
    return match.group(1), text[:start] + text[end:]

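A short usage example of extract_thinking, calling the static method directly:

from tfrobot.brain.chain.llms.ollama import Ollama

raw = "<think>The user wants a greeting.</think>Hello there!"
thinking, remainder = Ollama.extract_thinking(raw)
print(thinking)   # "The user wants a greeting."
print(remainder)  # "Hello there!"
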
construct_llm_result

construct_llm_result(res: dict, params: dict, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Construct chat result

Because Ollama's HTTP API parameters do not support stop words, stop-word handling is implemented manually on the generated content. Note that this inflates completion_tokens, since truncation at a stop word happens only after generation.

Parameters:

  • res (dict): Response from Ollama. Required.
  • params (dict): Request sent to Ollama. Required.
  • response_format (Optional[LLMResponseFormat]): Response format. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
def construct_llm_result(
    self, res: dict, params: dict, response_format: Optional[LLMResponseFormat] = None
) -> LLMResult:
    """
    Construct chat result | 构造聊天结果

    因为Ollama提供的HttpAPI参数中不支持stop停止词。所以在生成内容中手动实现了此能力。
    但需要注意这会导致completion_tokens过高,因为是生成后,手动终止的停止词。

    Args:
        res (dict): Response from Ollama | Ollama的响应
        params (dict): Request to Ollama | Ollama的请求
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式.

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    import tiktoken

    # Approximate token usage with tiktoken
    all_messages = str(params.get("messages"))

    encoding = tiktoken.get_encoding("gpt2")
    prompt_tokens = encoding.encode(all_messages)
    completion_tokens = encoding.encode(res["message"]["content"])

    split_content = res["message"]["content"]
    if self.stop_words:
        split_content = split_on_stopwords(split_content, self.stop_words)

    # Attempt to apply the requested response format
    response_format = response_format or (
        self.response_format if self.response_format != inspect.Parameter.empty else None
    )
    if response_format:
        json_obj = extract_json_from_nature_language(
            split_content,
            schema=cast(dict, response_format.get("json_schema", {})).get("schema") if response_format else None,
        )
        if json_obj:
            split_content = json.dumps(json_obj, ensure_ascii=False)

    usage = TokenUsage(
        prompt_tokens=len(prompt_tokens),
        completion_tokens=len(completion_tokens),
        total_tokens=len(prompt_tokens) + len(completion_tokens),
        prompt_cost=len(prompt_tokens) * self.input_price,
        completion_cost=len(completion_tokens) * self.input_price,
        total_cost=len(prompt_tokens) * self.input_price + len(completion_tokens) * self.input_price,
    )
    reasoning_content, split_content = self.extract_thinking(split_content) if split_content else (None, "")
    llm_res = LLMResult(
        generations=[
            Generation(
                text=split_content,
                reasoning_content=reasoning_content,
                tool_calls=[
                    ToolCall(
                        function=FunctionCall(
                            name=tc["function"]["name"], parameters=json.dumps(tc["function"]["arguments"])
                        )
                    )
                    for tc in res["message"]["tool_calls"]
                ]
                if res["message"].get("tool_calls")
                else None,
            )
        ],
        usage=usage,
        prompt=params,
        meta_info={
            "model": self.name,
            "created": res.get("created_at"),
            "done": res.get("done"),
            "total_duration": res.get("total_duration"),
            "load_duration": res.get("load_duration"),
            "prompt_eval_duration": res.get("prompt_eval_duration"),
            "eval_count": res.get("eval_count"),
            "eval_duration": res.get("eval_duration"),
            "id": None,
            "system_fingerprint": None,
        },
    )
    if self.used_tool_prompt:
        for prompt in self.all_prompts:
            if isinstance(prompt, ToolPrompt):
                prompt.extract_tool_call(llm_res)
    if self.enable_tfl_mode:
        for prompt in self.all_prompts:
            if isinstance(prompt, TFLPrompt):
                prompt.extract_tool_call(llm_res)
    return llm_res

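The token accounting above is an approximation based on tiktoken's gpt2 encoding; it can be reproduced in isolation as follows (the prompt and completion strings are placeholders):

import tiktoken

# Reproduce the approximate token accounting used above (gpt2 encoding, counts are estimates).
encoding = tiktoken.get_encoding("gpt2")

prompt_text = str([{"role": "user", "content": "Summarize the release notes."}])
completion_text = "Here is a short summary of the release notes..."

prompt_tokens = len(encoding.encode(prompt_text))
completion_tokens = len(encoding.encode(completion_text))
print(prompt_tokens, completion_tokens, prompt_tokens + completion_tokens)
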
complete

complete(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Complete chat

Ollama's chat endpoint is POST /api/chat

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): The user's latest input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
@report_llm_metrics
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((httpx.ConnectError, httpx.TimeoutException)),
    before_sleep=before_sleep_log(logger, logging.WARN),
)
@overrides.override
def complete(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> LLMResult:
    """
    Complete chat | 生成聊天结果

    Ollama的Chat生成端点为:POST /api/chat

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): User last input | 用户最后输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    params = self.construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    with tracer.start_as_current_span("ollama-chat", kind=SpanKind.CLIENT) as span:
        if self.stream:
            span_ctx = LLMEventContext(
                scene="LLM",
                entity_id=str(id(self)),
                desc="before llm streaming request",
                info=str(current_input.content),
                streaming=self.stream,
                token_usage=None,
                user_input=str(current_input.content),
                llm_model_name=self.name,
            )
            span.add_event(SpanEvent.BEFORE_LLM_STREAMING.value, span_ctx.model_dump())
            try:
                with self._client.stream("POST", "/api/chat", json=params) as response:  # noqa
                    res = {"message": {"content": ""}}
                    for line in response.iter_lines():
                        res_json = json.loads(line)
                        if res_json.get("error"):
                            # Ollama returned an error; raise it directly
                            raise ValueError(res_json.get("error"))
                        if not res_json.get("done"):
                            res["message"]["content"] += res_json.get("message", {}).get("content", "")
                            span_ctx = LLMEventContext(
                                scene="LLM",
                                entity_id=str(id(self)),
                                desc="llm streaming response",
                                info=line,
                                streaming=self.stream,
                                token_usage=None,
                                user_input=str(current_input.content),
                                llm_model_name=self.name,
                            )
                            span.add_event(SpanEvent.LLM_STREAMING.value, span_ctx.model_dump())
                        else:
                            res["model"] = res_json.get("model")
                            res["created_at"] = res_json.get("created_at")
                            res["done"] = res_json.get("done")
                            res["total_duration"] = res_json.get("total_duration")
                            res["load_duration"] = res_json.get("load_duration")
                            res["prompt_eval_count"] = res_json.get("prompt_eval_count")
                            res["eval_count"] = res_json.get("eval_count")
                            res["eval_duration"] = res_json.get("eval_duration")
                            break
                span_ctx = LLMEventContext(
                    scene="LLM",
                    entity_id=str(id(self)),
                    desc="after llm streaming response",
                    info=json.dumps(res, indent=2, ensure_ascii=False),
                    streaming=self.stream,
                    token_usage=None,
                    user_input=str(current_input.content),
                    llm_model_name=self.name,
                )
                span.add_event(SpanEvent.AFTER_LLM_STREAMING.value, span_ctx.model_dump())
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except ValueError:
                # Re-raise other errors as-is
                raise
        else:
            try:
                res = self._client.post("/api/chat", json=params).raise_for_status().json()
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except httpx.HTTPStatusError:
                # Re-raise other HTTP errors as-is
                raise
    return self.construct_llm_result(res, params=params, response_format=response_format)

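For context, the streaming branch above consumes Ollama's newline-delimited JSON responses from POST /api/chat. A standalone httpx sketch of that protocol, with a placeholder host and model, looks like this:

import json
import httpx

# Standalone sketch of Ollama's streaming /api/chat protocol (placeholder host and model).
params = {
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hi"}],
    "stream": True,
}
content = ""
with httpx.Client(base_url="http://localhost:11434") as client:
    with client.stream("POST", "/api/chat", json=params) as response:
        for line in response.iter_lines():
            chunk = json.loads(line)
            if chunk.get("error"):
                raise ValueError(chunk["error"])
            if chunk.get("done"):
                break  # the final chunk carries timing/eval metadata instead of content
            content += chunk.get("message", {}).get("content", "")
print(content)
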
async_complete async

async_complete(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> LLMResult

Complete chat asynchronously

Ollama's chat endpoint is POST /api/chat

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): The user's latest input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If passed here, it overrides the class attribute of the same name. Default: None.

Returns:

  LLMResult: The result of completion.

Source code in tfrobot/brain/chain/llms/ollama.py
@report_llm_metrics
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((httpx.ConnectError, httpx.TimeoutException)),
    before_sleep=before_sleep_log(logger, logging.WARN),
)
@overrides.override
async def async_complete(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> LLMResult:
    """
    Complete chat async | 异步生成聊天结果

    Ollama的Chat生成端点为:POST /api/chat

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input (BaseMessage): User last input | 用户最后输入
        conversation (Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements (Optional[list[DocElement]]): Document elements | 文档元素
        knowledge (Optional[str]): Related knowledge | 相关知识
        tools (Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format (Optional[LLMResponseFormat]): Response format | 响应格式. 如果在此传递,会覆盖类的同名属性设置。

    Returns:
        LLMResult: The result of completion | 完成的结果
    """
    params = self.construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    with tracer.start_as_current_span("ollama-chat", kind=SpanKind.CLIENT) as span:
        if self.stream:
            span_ctx = LLMEventContext(
                scene="LLM",
                entity_id=str(id(self)),
                desc="before llm streaming request",
                info=str(current_input.content),
                streaming=self.stream,
                token_usage=None,
                user_input=str(current_input.content),
                llm_model_name=self.name,
            )
            span.add_event(SpanEvent.BEFORE_LLM_STREAMING.value, span_ctx.model_dump())
            try:
                async with self._async_client.stream("POST", "/api/chat", json=params) as response:  # noqa
                    res = {"message": {"content": ""}}
                    async for line in response.aiter_lines():
                        res_json = json.loads(line)
                        if res_json.get("error"):
                            # Ollama returned an error; raise it directly
                            raise ValueError(res_json.get("error"))
                        if not res_json.get("done"):
                            res["message"]["content"] += res_json.get("message", {}).get("content", "")
                            span_ctx = LLMEventContext(
                                scene="LLM",
                                entity_id=str(id(self)),
                                desc="llm streaming response",
                                info=line,
                                streaming=self.stream,
                                token_usage=None,
                                user_input=str(current_input.content),
                                llm_model_name=self.name,
                            )
                            span.add_event(SpanEvent.LLM_STREAMING.value, span_ctx.model_dump())
                        else:
                            res["model"] = res_json.get("model")
                            res["created_at"] = res_json.get("created_at")
                            res["done"] = res_json.get("done")
                            res["total_duration"] = res_json.get("total_duration")
                            res["load_duration"] = res_json.get("load_duration")
                            res["prompt_eval_count"] = res_json.get("prompt_eval_count")
                            res["eval_count"] = res_json.get("eval_count")
                            res["eval_duration"] = res_json.get("eval_duration")
                            break
                span_ctx = LLMEventContext(
                    scene="LLM",
                    entity_id=str(id(self)),
                    desc="after llm streaming response",
                    info=json.dumps(res, indent=2, ensure_ascii=False),
                    streaming=self.stream,
                    token_usage=None,
                    user_input=str(current_input.content),
                    llm_model_name=self.name,
                )
                span.add_event(SpanEvent.AFTER_LLM_STREAMING.value, span_ctx.model_dump())
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except ValueError:
                # Re-raise other errors as-is
                raise
        else:
            try:
                res = (await self._async_client.post("/api/chat", json=params)).raise_for_status().json()
            except httpx.RemoteProtocolError as e:
                # The Ollama server dropped the connection, possibly because the context exceeded the model's limit
                # See: https://github.com/ollama/ollama/issues/2653
                logger.warning(
                    f"Ollama server disconnected without sending a response. "
                    f"This may be caused by context length exceeding the model's limit. "
                    f"Current num_ctx setting: {self.num_ctx}. "
                    f"Consider reducing input size or increasing num_ctx parameter. "
                    f"Error: {e}"
                )
                raise
            except httpx.HTTPStatusError:
                # Re-raise other HTTP errors as-is
                raise
    return self.construct_llm_result(res, params=params, response_format=response_format)

OllamaWithTools

Bases: Ollama

OllamaWithTools extends Ollama with finer-grained control over the tools exposed to the model.

construct_request_params

construct_request_params(current_input: BaseMessage, conversation: Optional[list[BaseMessage]] = None, elements: Optional[list[DocElement]] = None, knowledge: Optional[str] = None, tools: Optional[list[BaseTool]] = None, intermediate_msgs: Optional[list[BaseMessage]] = None, response_format: Optional[LLMResponseFormat] = None) -> dict

Construct chat request parameters

The final message ordering is:
  1. The leading System Message
  2. The user's conversation history
  3. Prompts before the current input
  4. The current input
  5. Prompts after the current input
  6. Intermediate messages

Parameters:

  • current_input (BaseMessage): Current input. Required.
  • conversation (Optional[list[BaseMessage]]): Conversation history. Default: None.
  • elements (Optional[list[DocElement]]): Document elements. Default: None.
  • knowledge (Optional[str]): Related knowledge. Default: None.
  • tools (Optional[list[BaseTool]]): Available tool list. Default: None.
  • intermediate_msgs (Optional[list[BaseMessage]]): Intermediate messages generated while the LLM works inside a chain, such as tool responses or other system messages. They are not saved to memory but are useful during chain execution. Default: None.
  • response_format (Optional[LLMResponseFormat]): Response format. If specified here, it overrides the LLM's default response format. Default: None.

Returns:

  dict: Request parameters.

Source code in tfrobot/brain/chain/llms/ollama.py
@overrides.override
def construct_request_params(
    self,
    current_input: BaseMessage,
    conversation: Optional[list[BaseMessage]] = None,
    elements: Optional[list[DocElement]] = None,
    knowledge: Optional[str] = None,
    tools: Optional[list[BaseTool]] = None,
    intermediate_msgs: Optional[list[BaseMessage]] = None,
    response_format: Optional[LLMResponseFormat] = None,
) -> dict:
    """
    Construct chat request parameters | 构造聊天请求参数

    最终的会话排序如下:
    1. 首条System Message
    2. 用户历史会话内容
    3. 当前输入前的提示
    4. 当前输入
    5. 当前输入后的提示
    6. 中间消息

    Args:
        current_input(BaseMessage): Current input | 当前输入
        conversation(Optional[list[BaseMessage]]): Conversation history | 会话历史
        elements(Optional[list[DocElement]]): Document elements | 文档元素
        knowledge(Optional[str]): Related knowledge | 相关知识
        tools(Optional[list[BaseTool]]): Available tool list | 可用的工具列表
        intermediate_msgs(Optional[list[BaseMessage]]): Intermediate messages, it will be generated during llm work
            in chain, list tool's response or other system message, those messages will not save into memory, but
            usefully during chain work | 中间消息,它将在链式工作中生成,列表工具的响应或其他系统消息,这些消息不会保存到记忆体中,
            但在链式工作中非常有用
        response_format(Optional[LLMResponseFormat]): Response format | 响应格式
            如果在此指定响应格式,会覆盖LLM的默认响应格式。

    Returns:
        dict: Request parameters | 请求参数
    """
    super_res = super().construct_request_params(
        current_input, conversation, elements, knowledge, tools, intermediate_msgs, response_format
    )
    if tools := super_res.get("tools"):
        for i in range(len(tools) - 1, -1, -1):
            tool_name = tools[i]["function"]["name"]
            if self.exclude_tools and tool_name in self.exclude_tools:
                tools.pop(i)
                continue
            if self.available_tools is not None and tool_name not in self.available_tools:
                tools.pop(i)
        if not super_res.get("tools"):
            del super_res["tools"]
    return super_res

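To illustrate the filtering above in isolation: exclude_tools always removes matching tools, and available_tools, when set, acts as an allow list. The tool payloads below are placeholders.

# Standalone illustration of the tool filtering applied above (placeholder tool payloads).
tools = [
    {"type": "function", "function": {"name": "search_web", "description": "", "parameters": {}}},
    {"type": "function", "function": {"name": "run_shell", "description": "", "parameters": {}}},
    {"type": "function", "function": {"name": "read_file", "description": "", "parameters": {}}},
]
exclude_tools = {"run_shell"}     # always dropped
available_tools = {"search_web"}  # when set, acts as an allow list

for i in range(len(tools) - 1, -1, -1):
    name = tools[i]["function"]["name"]
    if name in exclude_tools or (available_tools is not None and name not in available_tools):
        tools.pop(i)

print([t["function"]["name"] for t in tools])  # ['search_web']
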
split_on_stopwords

split_on_stopwords(text: str, stopwords: str | list[str]) -> str

Split text on stopwords

Parameters:

  • text (str): The text to split. Required.
  • stopwords (str | list[str]): Stop word(s). Required.

Returns:

  str: The split text.

Source code in tfrobot/brain/chain/llms/ollama.py
def split_on_stopwords(text: str, stopwords: str | list[str]) -> str:
    """
    Split text on stopwords | 在停止词上拆分文本

    Args:
        text (str): Text to split | 要拆分的文本
        stopwords (str | list[str]): Stopwords | 停止词

    Returns:
        str: Splitted text | 拆分后的文本
    """
    if not stopwords:
        return text
    if isinstance(stopwords, str):
        stopwords = [stopwords]  # Ensure the stop words are a list
    if stopwords:  # only run when the list is non-empty
        # Deduplicate the words and build the regular expression
        stopwords = list(set(stopwords))
        pattern = r"(" + "|".join(re.escape(word) for word in stopwords if word) + r")"
        parts = re.split(pattern, text, 1)
        if len(parts) > 1 and parts[0].strip():
            # The parts[0].strip() check matters: language models often follow instructions so closely that a stop word can appear at the very start of the output; that usually signals feedback on a result and should not trigger a stop
            return parts[0]
    return text  # No split occurred or the stop word list was empty
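
A short usage example of split_on_stopwords:

from tfrobot.brain.chain.llms.ollama import split_on_stopwords

text = "The capital of France is Paris.\nObservation: tool output follows here"
print(split_on_stopwords(text, ["Observation:"]))
# -> "The capital of France is Paris.\n"

# A stop word at the very start is ignored, so the text passes through unchanged.
print(split_on_stopwords("Observation: looks good", ["Observation:"]))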