
IDE Tool

Introduction

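When developing with LLMs, putting an LLM's code-generation ability to practical use becomes far harder if the LLM has no suitable IDE tool to work with.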

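While using LLMs, someone on our team once remarked: "Sometimes when I write code, I can no longer tell whether GPT is working for me or I am working for GPT." It was said as a joke, but it reflects two things: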

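  1. The code-generation ability of LLMs has already reached production quality. Many people probably still do not believe this, but our team remains convinced of it.
  2. Getting LLM-generated code into an existing project still runs into many limitations, which makes the manual effort too high. This includes, but is not limited to:
     1. Copying the code into the current project, then working through the warnings and errors the IDE reports.
     2. Adjusting the code style to match the project's requirements.
     3. Getting the existing test cases to pass, and adjusting them where needed.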

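Through continued practice, an idea came to us: why not build an IDE tool designed specifically for LLMs to use? LLM-generated code could then be debugged directly inside the IDE, with no manual steps.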

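After all, we humans do not write code in one shot either; it is refined step by step with the help of an IDE, a search engine, and similar tools. An LLM needs the same kind of tooling to make full use of its code-generation ability.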

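When this idea first took shape, Devin had not yet been released; nearly half a year has passed since then. Over that time our team kept experimenting, and we finally have an initial result: this IDE tool module.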

Before formally introducing the IDE tool module, let us look at how some other open-source, Devin-like tools implement their IDE, as of 2024-06-19:

Devika

PROMPT = open("src/agents/coder/prompt.jinja2", "r").read().strip()

class Coder:
    def __init__(self, base_model: str):
        ...

    def render(
        self, step_by_step_plan: str, user_context: str, search_results: dict
    ) -> str:
        ...

    def validate_response(self, response: str) -> Union[List[Dict[str, str]], bool]:
        ...

    def save_code_to_project(self, response: List[Dict[str, str]], project_name: str):
        ...

    def get_project_path(self, project_name: str):
        ...

    def response_to_markdown_prompt(self, response: List[Dict[str, str]]) -> str:
        ...

    def emulate_code_writing(self, code_set: list, project_name: str):
        ...

    @retry_wrapper
    def execute(
        self,
        step_by_step_plan: str,
        user_context: str,
        search_results: dict,
        project_name: str
    ) -> str:
        ...

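The most important parts of this code are the execute, emulate_code_writing, and save_code_to_project functions. Together they essentially take the file modifications proposed by the LLM and apply them to concrete files.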
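To make the "apply the LLM's file modifications to disk" step concrete, here is a minimal sketch. It is not Devika's implementation: the `"file"` and `"code"` dictionary keys, the `project_dir` parameter, and the path-escape check are all assumptions made for illustration, and only the overall shape (a list of dictionaries written out as files) follows the signatures shown above.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def save_code_to_project(response: List[Dict[str, str]], project_dir: Path) -> List[Path]:
    """Write each {"file": ..., "code": ...} entry under project_dir.

    The key names are hypothetical; adapt them to the real response schema.
    """
    root = project_dir.resolve()
    written: List[Path] = []
    for entry in response:
        target = (root / entry["file"]).resolve()
        # Refuse paths that escape the project directory (e.g. "../x.py").
        if root not in target.parents:
            raise ValueError(f"path escapes project: {entry['file']}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(entry["code"], encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = save_code_to_project(
            [{"file": "pkg/app.py", "code": "print('hi')\n"}], Path(tmp)
        )
        print([p.name for p in files])
```

Note that this writes whole files. Nothing in it depends on a prompt, which is one way to keep the file-writing step usable outside the agent's own flow.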

OpenDevin

It implements a fairly simple IDE tool based on the design philosophy of Gymnasium. Its core code is as follows:

def insert_lines(
    to_insert: list[str], original: list[str], start: int = 0, end: int = -1
):
    """
    Insert the new content to the original content based on start and end
    """
    ...


async def write_file(path, workdir, content, start=0, end=-1) -> Observation:
    insert = content.split('\n')

    try:
        whole_path = resolve_path(path, workdir)
        if not os.path.exists(os.path.dirname(whole_path)):
            os.makedirs(os.path.dirname(whole_path))
        mode = 'w' if not os.path.exists(whole_path) else 'r+'
        try:
            with open(whole_path, mode, encoding='utf-8') as file:
                if mode != 'w':
                    all_lines = file.readlines()
                    new_file = insert_lines(insert, all_lines, start, end)
                else:
                    new_file = [i + '\n' for i in insert]

                file.seek(0)
                file.writelines(new_file)
                file.truncate()
        except FileNotFoundError:
            return ErrorObservation(f'File not found: {path}')
        except IsADirectoryError:
            return ErrorObservation(
                f'Path is a directory: {path}. You can only write to files'
            )
        except UnicodeDecodeError:
            return ErrorObservation(f'File could not be decoded as utf-8: {path}')
    except PermissionError:
        return ErrorObservation(f'Malformed paths not permitted: {path}')
    return FileWriteObservation(content='', path=path)

The two functions above are the core. The code that drives (calls) them is as follows:

class ServerRuntime(Runtime):
    def __init__(
        self,
        event_stream: EventStream,
        sid: str = 'default',
        sandbox: Sandbox | None = None,
    ):
        ...

    async def run(self, action: CmdRunAction) -> Observation:
        return self._run_command(action.command, background=action.background)

    async def kill(self, action: CmdKillAction) -> Observation:
        ...

    async def read(self, action: FileReadAction) -> Observation:
        # TODO: use self.file_store
        working_dir = self.sandbox.get_working_directory()
        return await read_file(action.path, working_dir, action.start, action.end)

    async def write(self, action: FileWriteAction) -> Observation:
        # TODO: use self.file_store
        working_dir = self.sandbox.get_working_directory()
        return await write_file(
            action.path, working_dir, action.content, action.start, action.end
        )

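The driving flow is: the LLM's response is converted into an Action, the Action is handed to the Runtime, and the Runtime turns it into an Observation by calling functions such as write_file and read_file.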
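The Action → Runtime → Observation loop described above can be sketched in a few lines. This is a simplified, synchronous illustration, not OpenDevin's code: `InMemoryRuntime`, `splice_lines`, and the `ok` field are hypothetical names, files live in a dictionary instead of on disk, and the line-range semantics (0-based `start`, exclusive `end`, `-1` meaning "to end of file") are one plausible reading of the `start`/`end` parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class FileReadAction:
    path: str


@dataclass
class FileWriteAction:
    path: str
    content: str
    start: int = 0   # first line index to replace (0-based, assumed)
    end: int = -1    # exclusive end index; -1 means "to end of file" (assumed)


@dataclass
class Observation:
    content: str
    ok: bool = True


def splice_lines(to_insert: List[str], original: List[str], start: int = 0, end: int = -1) -> List[str]:
    """Replace original[start:end] with the new lines."""
    stop = len(original) if end == -1 else end
    return original[:start] + [line + "\n" for line in to_insert] + original[stop:]


class InMemoryRuntime:
    """Dispatches each Action to a handler and returns an Observation."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}

    def step(self, action: Union[FileReadAction, FileWriteAction]) -> Observation:
        if isinstance(action, FileWriteAction):
            return self.write(action)
        if isinstance(action, FileReadAction):
            return self.read(action)
        return Observation(f"Unsupported action: {type(action).__name__}", ok=False)

    def read(self, action: FileReadAction) -> Observation:
        if action.path not in self.files:
            return Observation(f"File not found: {action.path}", ok=False)
        return Observation("".join(self.files[action.path]))

    def write(self, action: FileWriteAction) -> Observation:
        if action.start < 0 or (action.end != -1 and action.end < action.start):
            return Observation(f"Invalid range {action.start}:{action.end}", ok=False)
        original = self.files.get(action.path, [])
        new_lines = splice_lines(action.content.split("\n"), original, action.start, action.end)
        self.files[action.path] = new_lines
        return Observation(f"Wrote {action.path}")


if __name__ == "__main__":
    runtime = InMemoryRuntime()
    runtime.step(FileWriteAction("a.py", "x = 1\ny = 2\nz = 3"))
    runtime.step(FileWriteAction("a.py", "y = 20", start=1, end=2))
    print(runtime.step(FileReadAction("a.py")).content)
```

Errors come back as Observations rather than exceptions, so the agent loop can feed them straight back to the model.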

SWE-Agent

class SWEEnv(gym.Env):
    """Gym environment for SWE-bench. This class should handle all communication with the docker container."""

    name = "swe_main"
    # This prefix will be prepended to the image name when caching task images
    cached_image_prefix = "swe-agent-task-env-"

    def __init__(self, args: EnvironmentArguments):
        ...

    def step(self, action: str) -> tuple[str | None, int, bool, dict]:
        ...

    def communicate(
        self,
        input: str,
        timeout_duration=25,
    ) -> str:
        ...

SWE-Agent as a whole is also an IDE tool built on the Gymnasium design philosophy. Its core code lies in the step and communicate functions.

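Unlike the previous two, it works entirely through command-line commands, all of which are sent to a Docker container to run; the benefit is secure isolation. The weakness of the current design is that code editing can rely only on command-line tools, which is very limiting, and complex editing needs consume a large number of tokens.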

It wraps a dedicated command-line tool to help the model edit files. The format is as follows:

# @yaml
# signature: |-
#   edit <start_line>:<end_line>
#   <replacement_text>
#   end_of_edit
# docstring: replaces lines <start_line> through <end_line> (inclusive) with the given text in the open file. The replacement text is terminated by a line with only end_of_edit on it. All of the <replacement text> will be entered, so make sure your indentation is formatted properly. Python files will be checked for syntax errors after the edit. If the system detects a syntax error, the edit will not be executed. Simply try to edit the file again, but make sure to read the error message and modify the edit command you issue accordingly. Issuing the same command a second time will just lead to the same error message again.
# end_name: end_of_edit
# arguments:
#   start_line:
#     type: integer
#     description: the line number to start the edit at
#     required: true
#   end_line:
#     type: integer
#     description: the line number to end the edit at (inclusive)
#     required: true
#   replacement_text:
#     type: string
#     description: the text to replace the current selection with
#     required: true
edit() {
    if [ -z "$CURRENT_FILE" ]
    then
        echo 'No file open. Use the `open` command first.'
        return
    fi

    local start_line="$(echo $1: | cut -d: -f1)"
    local end_line="$(echo $1: | cut -d: -f2)"

    if [ -z "$start_line" ] || [ -z "$end_line" ]
    then
        echo "Usage: edit <start_line>:<end_line>"
        return
    fi
    # Remaining code omitted; see the SWE-Agent source for the rest
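The behaviour the docstring describes (replace an inclusive, 1-based line range, and refuse the edit if the result has a syntax error) can be sketched in Python as follows. This is an illustration of the semantics only, not a port of the shell tool: `edit_lines` is a hypothetical name, and the built-in `compile()` stands in for whatever checker the real tool runs.

```python
from typing import Tuple


def edit_lines(source: str, start_line: int, end_line: int, replacement: str) -> Tuple[str, str]:
    """Replace lines start_line..end_line (1-based, inclusive) with replacement.

    Returns (new_source, message). If the edited text is not valid Python,
    the original source is returned unchanged along with the error.
    """
    lines = source.splitlines()
    # An empty file is treated as having one (empty) line, so "1:1" is valid.
    if not (1 <= start_line <= end_line <= max(len(lines), 1)):
        return source, f"Invalid range {start_line}:{end_line}"
    new_lines = lines[: start_line - 1] + replacement.splitlines() + lines[end_line:]
    candidate = "\n".join(new_lines) + "\n"
    try:
        # Stand-in syntax check; compiles without executing the code.
        compile(candidate, "<edit>", "exec")
    except SyntaxError as exc:
        return source, f"Edit rejected, syntax error at line {exc.lineno}: {exc.msg}"
    return candidate, "File updated."


if __name__ == "__main__":
    src = "a = 1\nb = 2\nc = 3\n"
    print(edit_lines(src, 2, 2, "b = 20"))
    print(edit_lines(src, 2, 2, "b = = 2")[1])
```

Every edit needs the model to supply exact line numbers and the full replacement text. This is why complex edits get expensive in tokens.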

The resulting command-call flow looks like this:

{
  "environment": "swe_main",
  "trajectory": [
    {
      "action": "ls -F\n",
      "observation": "AUTHORS.rst\nCHANGELOG.rst\nCODE_OF_CONDUCT.md\nCONTRIBUTING.rst\nLICENSE\nMANIFEST.in\nNOTICE\nREADME.rst\nRELEASING.md\nazure-pipelines.yml\ndocs/\nexamples/\nperformance/\npyproject.toml\nsetup.cfg\nsetup.py\nsrc/\ntests/\ntox.ini\n",
      "response": "Let's list out some of the files in the repository to get an idea of the structure and contents. We can use the `ls -F` command to list the files in the current directory.\n```\nls -F\n```",
      "state": "{\"open_file\": \"n/a\", \"working_dir\": \"/marshmallow-code__marshmallow\"}\n",
      "thought": "Let's list out some of the files in the repository to get an idea of the structure and contents. We can use the `ls -F` command to list the files in the current directory.\n"
    },
    {
      "action": "open setup.py\n",
      "observation": "[File: /marshmallow-code__marshmallow/setup.py (94 lines total)]\n1:import re\n2:from setuptools import setup, find_packages\n3:\n4:EXTRAS_REQUIRE = {\n5:    \"tests\": [\"pytest\", \"pytz\", \"simplejson\"],\n6:    \"lint\": [\n7:        \"mypy==0.910\",\n8:        \"flake8==4.0.1\",\n9:        \"flake8-bugbear==21.9.2\",\n10:        \"pre-commit~=2.4\",\n11:    ],\n12:    \"docs\": [\n13:        \"sphinx==4.2.0\",\n14:        \"sphinx-issues==1.2.0\",\n15:        \"alabaster==0.7.12\",\n16:        \"sphinx-version-warning==1.1.2\",\n17:        \"autodocsumm==0.2.7\",\n18:    ],\n19:}\n20:EXTRAS_REQUIRE[\"dev\"] = EXTRAS_REQUIRE[\"tests\"] + EXTRAS_REQUIRE[\"lint\"] + [\"tox\"]\n21:\n22:\n23:def find_version(fname):\n24:    \"\"\"Attempts to find the version number in the file names fname.\n25:    Raises RuntimeError if not found.\n26:    \"\"\"\n27:    version = \"\"\n28:    with open(fname) as fp:\n29:        reg = re.compile(r'__version__ = [\\'\"]([^\\'\"]*)[\\'\"]')\n30:        for line in fp:\n31:            m = reg.match(line)\n32:            if m:\n33:                version = m.group(1)\n34:                break\n35:    if not version:\n36:        raise RuntimeError(\"Cannot find version information\")\n37:    return version\n38:\n39:\n40:def read(fname):\n41:    with open(fname) as fp:\n42:        content = fp.read()\n43:    return content\n44:\n45:\n46:setup(\n47:    name=\"marshmallow\",\n48:    version=find_version(\"src/marshmallow/__init__.py\"),\n49:    description=(\n50:        \"A lightweight library for converting complex \"\n51:        \"datatypes to and from native Python datatypes.\"\n52:    ),\n53:    long_description=read(\"README.rst\"),\n54:    author=\"Steven Loria\",\n55:    author_email=\"sloria1@gmail.com\",\n56:    url=\"https://github.com/marshmallow-code/marshmallow\",\n57:    packages=find_packages(\"src\", exclude=(\"test*\", \"examples\")),\n58:    package_dir={\"\": \"src\"},\n59:    
package_data={\"marshmallow\": [\"py.typed\"]},\n60:    include_package_data=True,\n61:    extras_require=EXTRAS_REQUIRE,\n62:    license=\"MIT\",\n63:    zip_safe=False,\n64:    keywords=[\n65:        \"serialization\",\n66:        \"rest\",\n67:        \"json\",\n68:        \"api\",\n69:        \"marshal\",\n70:        \"marshalling\",\n71:        \"deserialization\",\n72:        \"validation\",\n73:        \"schema\",\n74:    ],\n75:    python_requires=\">=3.6\",\n76:    classifiers=[\n77:        \"Development Status :: 5 - Production/Stable\",\n78:        \"Intended Audience :: Developers\",\n79:        \"License :: OSI Approved :: MIT License\",\n80:        \"Programming Language :: Python :: 3\",\n81:        \"Programming Language :: Python :: 3.6\",\n82:        \"Programming Language :: Python :: 3.7\",\n83:        \"Programming Language :: Python :: 3.8\",\n84:        \"Programming Language :: Python :: 3.9\",\n85:        \"Programming Language :: Python :: 3.10\",\n86:    ],\n87:    test_suite=\"tests\",\n88:    project_urls={\n89:        \"Changelog\": \"https://marshmallow.readthedocs.io/en/latest/changelog.html\",\n90:        \"Issues\": \"https://github.com/marshmallow-code/marshmallow/issues\",\n91:        \"Funding\": \"https://opencollective.com/marshmallow\",\n92:        \"Tidelift\": \"https://tidelift.com/subscription/pkg/pypi-marshmallow?utm_source=pypi-marshmallow&utm_medium=pypi\",  # noqa\n93:    },\n94:)\n",
      "response": "We see that there's a setup.py file. This could be useful for installing the package locally. Since we'll probably need to reproduce the issue to solve it, it would be a good idea to install the package locally. Let's checkout the setup.py file to see what commands we can use to install the package.\n```\nopen setup.py\n```",
      "state": "{\"open_file\": \"n/a\", \"working_dir\": \"/marshmallow-code__marshmallow\"}\n",
      "thought": "We see that there's a setup.py file. This could be useful for installing the package locally. Since we'll probably need to reproduce the issue to solve it, it would be a good idea to install the package locally. Let's checkout the setup.py file to see what commands we can use to install the package.\n"
    },
    {
      "action": "pip install -e .[dev]\n",
      "observation": "Obtaining file:///marshmallow-code__marshmallow\n  Installing build dependencies: started\n  Installing build dependencies: finished with status 'done'\n  Checking if build backend supports build_editable: started\n  Checking if build backend supports build_editable: finished with status 'done'\n  Getting requirements to build editable: started\n  Getting requirements to build editable: finished with status 'done'\n  Preparing editable metadata (pyproject.toml): started\n  Preparing editable metadata (pyproject.toml): finished with status 'done'\nRequirement already satisfied: pytest in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (8.1.1)\nRequirement already satisfied: pytz in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (2024.1)\nRequirement already satisfied: simplejson in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (3.19.2)\nRequirement already satisfied: mypy==0.910 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (0.910)\nRequirement already satisfied: flake8==4.0.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (4.0.1)\nRequirement already satisfied: flake8-bugbear==21.9.2 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (21.9.2)\nRequirement already satisfied: pre-commit~=2.4 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (2.21.0)\nRequirement already satisfied: tox in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (4.14.2)\nRequirement already satisfied: mccabe<0.7.0,>=0.6.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from flake8==4.0.1) (0.6.1)\nRequirement already satisfied: pycodestyle<2.9.0,>=2.8.0 in 
/root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from flake8==4.0.1) (2.8.0)\nRequirement already satisfied: pyflakes<2.5.0,>=2.4.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from flake8==4.0.1) (2.4.0)\nRequirement already satisfied: attrs>=19.2.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from flake8-bugbear==21.9.2) (23.2.0)\nRequirement already satisfied: typing-extensions>=3.7.4 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from mypy==0.910) (4.10.0)\nRequirement already satisfied: mypy-extensions<0.5.0,>=0.4.3 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from mypy==0.910) (0.4.4)\nRequirement already satisfied: toml in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from mypy==0.910) (0.10.2)\nRequirement already satisfied: cfgv>=2.0.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pre-commit~=2.4) (3.4.0)\nRequirement already satisfied: identify>=1.0.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pre-commit~=2.4) (2.5.35)\nRequirement already satisfied: nodeenv>=0.11.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pre-commit~=2.4) (1.8.0)\nRequirement already satisfied: pyyaml>=5.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pre-commit~=2.4) (6.0.1)\nRequirement already satisfied: virtualenv>=20.10.0 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pre-commit~=2.4) (20.25.1)\nRequirement already satisfied: iniconfig in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pytest) (2.0.0)\nRequirement already satisfied: 
packaging in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pytest) (24.0)\nRequirement already satisfied: pluggy<2.0,>=1.4 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pytest) (1.4.0)\nRequirement already satisfied: exceptiongroup>=1.0.0rc8 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pytest) (1.2.0)\nRequirement already satisfied: tomli>=1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from pytest) (2.0.1)\nRequirement already satisfied: cachetools>=5.3.2 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (5.3.3)\nRequirement already satisfied: chardet>=5.2 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (5.2.0)\nRequirement already satisfied: colorama>=0.4.6 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (0.4.6)\nRequirement already satisfied: filelock>=3.13.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (3.13.3)\nRequirement already satisfied: platformdirs>=4.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (4.2.0)\nRequirement already satisfied: pyproject-api>=1.6.1 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from tox) (1.6.1)\nRequirement already satisfied: setuptools in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from nodeenv>=0.11.1->pre-commit~=2.4) (68.2.2)\nRequirement already satisfied: distlib<1,>=0.3.7 in /root/miniconda3/envs/marshmallow-code__marshmallow__3.13/lib/python3.9/site-packages (from virtualenv>=20.10.0->pre-commit~=2.4) (0.3.8)\nChecking if build backend supports build_editable: started\nChecking if build 
backend supports build_editable: finished with status 'done'\nBuilding wheels for collected packages: marshmallow\n  Building editable for marshmallow (pyproject.toml): started\n  Building editable for marshmallow (pyproject.toml): finished with status 'done'\n  Created wheel for marshmallow: filename=marshmallow-3.13.0-0.editable-py3-none-any.whl size=4534 sha256=fcca61a41cd2783d4a5b8471ebc48cf886d36f6c66a29cc9b02761fc174a024d\n  Stored in directory: /tmp/pip-ephem-wheel-cache-h335xxo8/wheels/4d/da/d4/e230bf0a3f16cad5a83d235ac24a34b55178ddda181cb64dae\nSuccessfully built marshmallow\nInstalling collected packages: marshmallow\n  Attempting uninstall: marshmallow\n    Found existing installation: marshmallow 3.13.0\n    Uninstalling marshmallow-3.13.0:\n      Successfully uninstalled marshmallow-3.13.0\nSuccessfully installed marshmallow-3.13.0\nWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\n",
      "response": "The setup.py file contains a lot of useful information to install the package locally. In particular, I see there's a [dev] extras that installs all the dependencies needed for development. We can use this to install the package with all the necessary dependencies. Let's install the package locally with the [dev] extras.\n```\npip install -e .[dev]\n```",
      "state": "{\"open_file\": \"/marshmallow-code__marshmallow/setup.py\", \"working_dir\": \"/marshmallow-code__marshmallow\"}\n",
      "thought": "The setup.py file contains a lot of useful information to install the package locally. In particular, I see there's a [dev] extras that installs all the dependencies needed for development. We can use this to install the package with all the necessary dependencies. Let's install the package locally with the [dev] extras.\n"
    },
    {
      "action": "create reproduce.py\n",
      "observation": "[File: /marshmallow-code__marshmallow/reproduce.py (1 lines total)]\n1:\n",
      "response": "Looks like it installed successfully. Now that we have the package installed, we can start working on the issue.\nLet's first start by reproducing the results of the issue. The issue includes some example code for reproduction, which we can use. We'll create a new file called `reproduce.py` and paste the example code into it.\n```\ncreate reproduce.py\n```",
      "state": "{\"open_file\": \"/marshmallow-code__marshmallow/setup.py\", \"working_dir\": \"/marshmallow-code__marshmallow\"}\n",
      "thought": "Looks like it installed successfully. Now that we have the package installed, we can start working on the issue.\nLet's first start by reproducing the results of the issue. The issue includes some example code for reproduction, which we can use. We'll create a new file called `reproduce.py` and paste the example code into it.\n"
    },
    {
      "action": "edit 1:1\nfrom marshmallow.fields import TimeDelta\nfrom datetime import timedelta\n\ntd_field = TimeDelta(precision=\"milliseconds\")\n\nobj = dict()\nobj[\"td_field\"] = timedelta(milliseconds=345)\n\nprint(td_field.serialize(\"td_field\", obj))\nend_of_edit\n",
      "observation": "[File: /marshmallow-code__marshmallow/reproduce.py (9 lines total)]\n1:from marshmallow.fields import TimeDelta\n2:from datetime import timedelta\n3:\n4:td_field = TimeDelta(precision=\"milliseconds\")\n5:\n6:obj = dict()\n7:obj[\"td_field\"] = timedelta(milliseconds=345)\n8:\n9:print(td_field.serialize(\"td_field\", obj))\nFile updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.\n",
      "response": "Now let's paste in the example code from the issue.\n```\nedit 1:1\nfrom marshmallow.fields import TimeDelta\nfrom datetime import timedelta\n\ntd_field = TimeDelta(precision=\"milliseconds\")\n\nobj = dict()\nobj[\"td_field\"] = timedelta(milliseconds=345)\n\nprint(td_field.serialize(\"td_field\", obj))\nend_of_edit\n```",
      "state": "{\"open_file\": \"/marshmallow-code__marshmallow/reproduce.py\", \"working_dir\": \"/marshmallow-code__marshmallow\"}\n",
      "thought": "Now let's paste in the example code from the issue.\n"
    },
    ...
  ]
}

Summary of the IDE approaches in current open-source frameworks

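  1. Devika: implemented as a Python class; file editing is done by calling its methods. The advantage is that it is easy to debug and, to some extent, easy to extend with more tools. The drawback is very tight coupling to its own business flow, so it cannot be used on its own. The coupling to the prompt is especially problematic: this is a serious issue in current LLM development, because once code is coupled to a prompt, the expected results become hard to manage.
  2. OpenDevin: wraps things in a Gym-style design and extracts functionality into standalone functions. The Gym approach is clearly modeled on SWE-Agent (our implementation borrows this point as well). Overall, OpenDevin is the best-implemented of the three frameworks, but it still has many shortcomings; for example, the functions are not constrained by any protocol, so as they are extended the cost for the LLM to understand them becomes very high.
  3. SWE-Agent: one of the earlier Devin-like implementations. Its core idea is to send every command to a Docker container to run, which provides secure isolation. The drawback is that code editing can rely only on command-line tools, which is very limiting. Even with a dedicated command-line tool wrapped around it, extension is costly, because the capability boundary of command-line tools is simply too narrow. When file editing depends on state kept in environment variables, the ceiling is already in sight. Still, its pioneering use of Gym as the wrapper is a major contribution to the field.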

TFRobot-IDE Design

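Drawing on the strengths of the designs above, and building on the current capabilities of the TFRobot framework, we designed the present TFRobot-IDE. It has the following characteristics: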

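  1. Independent process management: TFRobot-IDE runs as an independent process and does not depend on other modules, which makes the IDE's lifecycle easier to manage. It also makes Docker-based packaging straightforward: run the IDE process inside Docker, then configure the Docker-side tooling to send it commands and receive the results.
  2. Built on the LSP protocol: thanks to Microsoft's open-source contributions to programming tooling, we can build the LLM-IDE on the Language Server Protocol (LSP). This brings the following advantages:
     1. Uniform input parameters that LLMs have already learned well. Current open-source models generally have a good understanding of LSP.
     2. Multi-language support. Once the LSP protocol is covered, each language's LSP server can be integrated to support that language, which makes features such as syntax checking easy to provide.
     3. Automated error correction and optimization suggestions. LLMs are strong at syntax analysis, but because their underlying mechanism is a learned model, they often cannot catch 100% of errors, even basic syntax errors. Moreover, once LLM-generated code containing a basic error has been inserted into a file, the frameworks above can only resolve it, to some extent, in one of two ways: running the code, or sending the modified code back to the model. Both cost tokens and time. An LSP-based IDE can use the protocol's diagnostics to catch basic syntax errors, which greatly reduces the number of LLM calls and improves efficiency.
  3. Modeled on monaco-editor: monaco-editor is an editor open-sourced by Microsoft; notably, VSCode itself is built around the Monaco engine and LSP. We rewrote Monaco's core modules in Python to obtain efficient text-editing capabilities, which better addresses the range of problems that arise when inserting LLM-generated code into files.
  4. Customized for LLMs: simply implementing LSP and monaco-editor is not enough. In practice our team found that LLMs still fall short in a number of areas, text positioning for example. We therefore made some small optimizations to LSP and Monaco so that LLMs can drive the IDE tools better and generate code more efficiently.
  5. Dedicated model training: when code is inserted into a file, what actually happens is that the LLM calls IDE tools. We found that with a 3.5-class model, the tool-call chain has some probability of going wrong on complex modifications. With a GPT-4-class model the cost is very high: by our estimate, one complex editing operation costs about 0.2 RMB. The reason is that the LLM's training data was not designed for IDE tools: erroneous calls have to be repaired along the way, and since those corrections cannot be fine-tuned back into the model, the same mistakes recur every time. In addition, because the IDE exposes many tools, each task consumes a lot of context. Our team therefore trained a dedicated model for code generation with the IDE tools, so that it fits the IDE's needs better.
  6. Gymnasium-based design: when designing the IDE tool, our team drew on Gymnasium's design philosophy to bring in reinforcement-learning concepts, because we believe tool use follows the basic patterns of reinforcement learning. We have not yet had a breakthrough in integrating reinforcement learning itself; so far, its characteristics have helped us generate training data efficiently during dedicated model training. We believe the two could work together in better ways that we have not yet fully grasped.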
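To illustrate what "uniform input parameters" means in LSP terms, the sketch below frames a `textDocument/didOpen` notification using the protocol's `Content-Length` header and applies one range-based `TextEdit` to a document. It is not TFRobot-IDE code, which is not public; the helper names are hypothetical. One simplification to flag: LSP character offsets default to UTF-16 code units, whereas this sketch counts Python characters, which is only equivalent for text within the Basic Multilingual Plane.

```python
import json
from typing import Any, Dict, Optional


def encode_lsp_message(method: str, params: Dict[str, Any], msg_id: Optional[int] = None) -> bytes:
    """Frame a JSON-RPC message the way LSP's base protocol expects."""
    payload: Dict[str, Any] = {"jsonrpc": "2.0", "method": method, "params": params}
    if msg_id is not None:  # requests carry an id; notifications do not
        payload["id"] = msg_id
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_lsp_message(raw: bytes) -> Dict[str, Any]:
    """Parse one framed message back into a dictionary."""
    header, _, body = raw.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))


def apply_text_edit(text: str, edit: Dict[str, Any]) -> str:
    """Apply one LSP-style TextEdit (zero-based line/character range)."""
    lines = text.split("\n")

    def offset(pos: Dict[str, int]) -> int:
        # Characters in all earlier lines (plus their newlines), then the column.
        return sum(len(line) + 1 for line in lines[: pos["line"]]) + pos["character"]

    start = offset(edit["range"]["start"])
    end = offset(edit["range"]["end"])
    return text[:start] + edit["newText"] + text[end:]


if __name__ == "__main__":
    doc = "def add(a, b):\n    return a - b\n"
    message = encode_lsp_message(
        "textDocument/didOpen",
        {"textDocument": {"uri": "file:///tmp/demo.py", "languageId": "python",
                          "version": 1, "text": doc}},
    )
    print(message[:24])
    fix = {"range": {"start": {"line": 1, "character": 13},
                     "end": {"line": 1, "character": 14}},
           "newText": "+"}
    print(apply_text_edit(doc, fix))
```

Every edit is expressed as the same `range` plus `newText` shape regardless of language, so a model only has to learn one editing vocabulary.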

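Based on the design above, we have implemented TFRobot-IDE as a whole. The IDE tool code currently lives in `TFRobotV2/tfrobot/drive/tool/ides`. It has not yet been fully tested, and by our assessment it does not yet meet the bar for open-sourcing, so it has not been released.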