Email Loader

EmailLoader

Bases: UnstructuredLoader

Class for loading email files as documents.

Inherits from UnstructuredLoader.

Attributes:

Name	Type	Description
`mime_types`	`ClassVar[set[Literal[EML]]]`	A class variable that holds the MIME types for the email files. It is a set of FileType.EML literals. This attribute is frozen and cannot be modified.

load

load(path_or_uri: str, *, file: bytes | IO[bytes] | None = None, content_type: Optional[str] = None, **kwargs: Any) -> Document

Load a document from the given URI or file object and return a Document.

Parameters:

Name	Type	Description	Default
`path_or_uri`	`str`	File path or URI of the document.	required
`file`	`bytes \| IO[bytes] \| None`	File content as bytes or an IO[bytes] object. When provided, it is parsed instead of the content read from path_or_uri; path_or_uri is still required and supplies the file name and file URI.	`None`
`content_type`	`Optional[str]`	The content type (MIME type) of the document; auto-detected if not provided.	`None`
`**kwargs`	`Any`	Additional keyword arguments intended for the underlying partition function. The source listed on this page reads `unique_element_ids` (default `True`) and `filename` (default derived from the URI), and discards `url`.	`{}`

Returns:

Name	Type	Description
`Document`	`Document`	The loaded document.

Raises:

Type	Description
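`ValueError`	If neither or both of path_or_uri and file are provided (as stated in the docstring; the source listed on this page does not itself perform this check).
`ValueError`	If `unique_element_ids` is passed as a falsy value, because TFRobot requires unique element IDs.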
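
The parameters above suggest that an email held in memory can be passed through `file` while `path_or_uri` only labels it. The following is a minimal sketch of that usage, not a confirmed recipe: the import path is inferred from the source file path on this page, the no-argument `EmailLoader()` constructor is an assumption, and `build_sample_eml` and `load_sample_with_tfrobot` are hypothetical helper names. Only the standard-library part runs without `tfrobot` installed.

```python
from email.message import EmailMessage


def build_sample_eml() -> bytes:
    """Build a small email message as raw bytes using only the standard library."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("Hi Bob,\n\nThe report is ready.\n")
    return msg.as_bytes()


def load_sample_with_tfrobot(raw: bytes):
    """Hypothetical call into EmailLoader; requires tfrobot to be installed."""
    # Assumption: the module path mirrors the source file path shown on this page.
    from tfrobot.utils.document_loaders.email import EmailLoader

    loader = EmailLoader()  # assumption: constructible without arguments
    # path_or_uri is still required; with file given, it supplies the name and URI.
    return loader.load("sample.eml", file=raw)


raw = build_sample_eml()
# Show the first header line of the generated message.
print(raw.decode("utf-8").splitlines()[0])
```

Calling `load_sample_with_tfrobot(raw)` would then be expected to return a `Document`, provided the assumptions about the import path and constructor hold in your installation.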

Source code in tfrobot/utils/document_loaders/email.py

def load(
    self,
    path_or_uri: str,
    *,
    file: bytes | IO[bytes] | None = None,
    content_type: Optional[str] = None,
    **kwargs: Any,
) -> Document:
    """
    Load a document from the given URI or file object and return a Document.

    Args:
        path_or_uri: File path or URI (required).
        file: File content as bytes or an IO[bytes] object (mutually exclusive with path_or_uri).
        content_type: The content type (MIME type); auto-detected if not provided.
        **kwargs: Additional keyword arguments passed to the partition function.

    Returns:
        Document: The loaded document.

    Raises:
        ValueError: If neither or both of path_or_uri and file are provided.
    """
    from unstructured.partition.email import partition_email

    uri = convert_to_file_url(path_or_uri)
    # Handle the file object: wrap bytes, reuse a given stream, or open the URI
    file_obj = BytesIO(file) if isinstance(file, bytes) else file if file is not None else self.get_file_obj(uri)
    unique_element_ids = kwargs.pop("unique_element_ids", True)  # whether to use unique element IDs
    filename = kwargs.pop(
        "filename", get_filename_from_uri(uri)
    )  # TFRobot partition wrappers always parse from an IO stream; passing filename as well makes UnstructuredIO raise an error.
    kwargs.pop("url", None)  # Same reason: passing url alongside an IO stream makes UnstructuredIO raise an error.
    if not unique_element_ids:
        # Message: "TFRobot must enable unique element IDs to ensure documents are processed correctly."
        raise ValueError("TFRobot必须启用唯一元素ID以确保正确处理文档。")

    els = partition_email(file=file_obj, unique_element_ids=unique_element_ids)  # noqa
    return Document.from_elements(
        [create_element_by_unstructured_element(e, filename=filename) for e in els],
        file_uri=AnyUrl(uri),
        file_type=TFFileType.EML,
    )
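
The nested conditional that selects `file_obj` and the `kwargs.pop` calls can be hard to read at a glance. The standalone sketch below restates that logic with the standard library only, so it can be run and tested without `tfrobot`. The names `resolve_file_obj`, `consume_loader_kwargs`, and `open_uri` are hypothetical stand-ins (the last one for `self.get_file_obj(uri)`), and the English error message is illustrative rather than the library's own.

```python
from io import BytesIO
from typing import IO, Any, Callable, Optional, Union


def resolve_file_obj(
    file: Optional[Union[bytes, IO[bytes]]],
    open_uri: Callable[[], IO[bytes]],
) -> IO[bytes]:
    """Pick the stream to parse, in the same order as the listing above."""
    if isinstance(file, bytes):
        return BytesIO(file)  # raw bytes are wrapped in an in-memory stream
    if file is not None:
        return file  # an existing binary stream is used as-is
    return open_uri()  # otherwise fall back to opening the URI


def consume_loader_kwargs(kwargs: dict[str, Any], default_filename: str) -> str:
    """Remove loader-only keys from kwargs and return the filename to record."""
    unique_element_ids = kwargs.pop("unique_element_ids", True)
    filename = kwargs.pop("filename", default_filename)
    kwargs.pop("url", None)  # dropped because parsing happens from a stream
    if not unique_element_ids:
        raise ValueError("unique element IDs must stay enabled")
    return filename


# Example: bytes win over the URI fallback, and loader-only keys are consumed.
stream = resolve_file_obj(b"Subject: hi\n\nbody\n", lambda: BytesIO(b"from uri"))
options = {"filename": "note.eml", "url": "file:///tmp/note.eml", "extra": 1}
name = consume_loader_kwargs(options, default_filename="fallback.eml")
print(stream.read(), name, options)
```

Splitting the conditional into early returns keeps the precedence explicit: bytes first, then a caller-supplied stream, then the URI.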