Blob

class langchain_core.documents.base.Blob

Bases: BaseMedia

Blob represents raw data by either reference or value.

Provides an interface to materialize the blob in different representations, and helps decouple the development of data loaders from the downstream parsing of the raw data.

Inspired by: https://developer.mozilla.org/en-US/docs/Web/API/Blob

Example: Initialize a blob from in-memory data

from langchain_core.documents import Blob

# Pass bytes so that all three read methods below work
blob = Blob.from_data(b"Hello, world!")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Example: Load from memory and specify mime-type and metadata

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"}
)

Example: Load the blob from a file

from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

param data: bytes | str | None = None

Raw data associated with the blob.

param encoding: str = 'utf-8'

Encoding to use when decoding the bytes into a string. Defaults to utf-8.

param id: str | None = None

An optional identifier for the document.

Ideally this should be unique across the document collection and formatted as a UUID, but this is not enforced.

Added in version 0.2.11.

param metadata: dict [Optional]

Arbitrary metadata associated with the content.

param mimetype: str | None = None

MIME type of the data (for example, text/plain); not to be confused with a file extension.

param path: PathLike | None = None

Location where the original content was found.

as_bytes() → bytes

Read data as bytes.

Return type:

bytes

as_bytes_io() → Generator[BytesIO | BufferedReader, None, None]

Read data as a byte stream. The method is a context manager, so use it in a with statement, as in the examples above.

Return type:

Generator[BytesIO | BufferedReader, None, None]

as_string() → str

Read data as a string.

Return type:

str

classmethod from_data(data: str | bytes, *, encoding: str = 'utf-8', mime_type: str | None = None, path: str | None = None, metadata: dict | None = None) → Blob

Initialize the blob from in-memory data.

Parameters:
  • data (str | bytes) – The in-memory data associated with the blob.

  • encoding (str) – Encoding to use if decoding the bytes into a string.

  • mime_type (str | None) – If provided, set as the MIME type of the data.

  • path (str | None) – If provided, set as the source from which the data came.

  • metadata (dict | None) – Metadata to associate with the blob.

Returns:

Blob instance

Return type:

Blob

classmethod from_path(path: str | PurePath, *, encoding: str = 'utf-8', mime_type: str | None = None, guess_type: bool = True, metadata: dict | None = None) → Blob

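Load the blob from a path-like object. The file contents are not read at this point; they are read when as_bytes(), as_string(), or as_bytes_io() is called.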

Parameters:
  • path (str | PurePath) – Path-like object pointing to the file to read.

  • encoding (str) – Encoding to use if decoding the bytes into a string.

  • mime_type (str | None) – If provided, set as the MIME type of the data.

  • guess_type (bool) – If True and no mime_type is provided, guess the MIME type from the file extension.

  • metadata (dict | None) – Metadata to associate with the blob.

Returns:

Blob instance

Return type:

Blob

property source: str | None

The source location of the blob as a string if known, otherwise None.

If the metadata contains a “source” key, that value is used. Otherwise, if a path is associated with the blob, the path is used.
