CSVLoader
- class langchain_community.document_loaders.csv_loader.CSVLoader(file_path: str | Path, source_column: str | None = None, metadata_columns: Sequence[str] = (), csv_args: Dict | None = None, encoding: str | None = None, autodetect_encoding: bool = False, *, content_columns: Sequence[str] = ())
Load a CSV file into a list of Documents.
Each document represents one row of the CSV file. Each column in the row is written to the document's page_content as a key/value pair on its own line.
By default, the source of every document is the value of the file_path argument. To override this, set the source_column argument to the name of a column in the CSV file; the source of each document is then the value of that column in the document's row.
- Output Example:
column1: value1
column2: value2
column3: value3
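The row-to-document mapping described above can be approximated with the standard library alone. The sketch below is not the library's implementation: rows_to_records is a hypothetical helper, the sample data is made up, and plain dictionaries stand in for Document objects.

```python
import csv
import io


def rows_to_records(csv_text, file_path, source_column=None):
    """Approximate the row-to-document mapping with plain dicts (hypothetical helper)."""
    records = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        # One "key: value" pair per line, in column order.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        # The source defaults to the file path unless a source column is named.
        source = row[source_column] if source_column is not None else str(file_path)
        records.append(
            {"page_content": page_content, "metadata": {"source": source, "row": i}}
        )
    return records


# Made-up sample data, for illustration only.
sample = "name,url\nAda,https://example.com/ada\nBob,https://example.com/bob\n"
default_records = rows_to_records(sample, "./people.csv")
by_column = rows_to_records(sample, "./people.csv", source_column="url")

print(default_records[0]["page_content"])
print(default_records[0]["metadata"])
print(by_column[1]["metadata"])
```

With source_column left unset, every record carries the file path as its source; with source_column="url", each record's source comes from its own row.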
- Instantiate:
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(
    file_path='./hw_200.csv',
    csv_args={
        'delimiter': ',',
        'quotechar': '"',
        'fieldnames': ['Index', 'Height', 'Weight'],
    },
)
- Load:
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)
Index: Index
Height: Height(Inches)"
Weight: "Weight(Pounds)"
{'source': './hw_200.csv', 'row': 0}
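The first document in this output contains header text rather than data. That is consistent with how the standard library's csv.DictReader behaves: when fieldnames is supplied, the first line of the file is read as an ordinary data row. A minimal standard-library sketch, using made-up sample data rather than the contents of hw_200.csv:

```python
import csv
import io

# Made-up sample data; the real hw_200.csv is not reproduced here.
sample = "Index,Height,Weight\n1,65.78,112.99\n"
names = ["Index", "Height", "Weight"]

# With explicit fieldnames, the header line is returned as the first data row.
with_names = list(csv.DictReader(io.StringIO(sample), fieldnames=names))

# Without fieldnames, the header line supplies the keys and is not returned.
without_names = list(csv.DictReader(io.StringIO(sample)))

print(with_names[0])
print(without_names[0])
```

If the header row is not wanted as a document, one option is to leave fieldnames out of csv_args so that the header line supplies the keys.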
- Async load:
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)
Index: Index
Height: Height(Inches)"
Weight: "Weight(Pounds)"
{'source': './hw_200.csv', 'row': 0}
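A bare await like the one above only works where an event loop is already running, such as a notebook cell or the body of a coroutine. In a plain script, one common pattern is to wrap the call in a coroutine and pass it to asyncio.run. The sketch below uses a hypothetical stand-in class, FakeLoader, instead of a real CSVLoader, so it runs without a CSV file or any third-party package.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in exposing an awaitable aload(), for illustration only."""

    async def aload(self):
        await asyncio.sleep(0)  # yield control to the event loop once
        return [{"page_content": "Index: 1", "metadata": {"row": 0}}]


async def main():
    loader = FakeLoader()
    # Inside a coroutine, awaiting the loader is legal.
    return await loader.aload()


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
docs = asyncio.run(main())
print(docs[0]["metadata"])
```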
- Lazy load:
docs = []
docs_lazy = loader.lazy_load()

# async variant (alazy_load returns an async iterator, so use async for):
# async for doc in loader.alazy_load():
#     docs.append(doc)

for doc in docs_lazy:
    docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
Index: Index
Height: Height(Inches)"
Weight: "Weight(Pounds)"
{'source': './hw_200.csv', 'row': 0}
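Lazy loading is most useful when only part of a large file is needed, because documents are produced one at a time instead of being collected into a list first. The generator below is a standard-library sketch of that idea, not the library's code; lazy_rows is a hypothetical name and the sample data is made up.

```python
import csv
import io
from itertools import islice


def lazy_rows(csv_file):
    """Yield one record per CSV row without building a full list (hypothetical helper)."""
    for i, row in enumerate(csv.DictReader(csv_file)):
        yield {
            "page_content": "\n".join(f"{k}: {v}" for k, v in row.items()),
            "metadata": {"row": i},
        }


# Made-up sample data, for illustration only.
sample = io.StringIO("Index,Height\n1,65.78\n2,71.52\n3,69.40\n")

# islice pulls only two records from the generator; the third row is never parsed.
first_two = list(islice(lazy_rows(sample), 2))
print(len(first_two))
print(first_two[1]["page_content"])
```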
- Parameters:
file_path (str | Path) – The path to the CSV file.
source_column (str | None) – The name of the column in the CSV file to use as the source. Optional. Defaults to None.
metadata_columns (Sequence[str]) – A sequence of column names to use as metadata. Optional.
csv_args (Dict | None) – A dictionary of arguments to pass to the csv.DictReader. Optional. Defaults to None.
encoding (str | None) – The encoding of the CSV file. Optional. Defaults to None.
autodetect_encoding (bool) – Whether to try to autodetect the file encoding.
content_columns (Sequence[str]) – A sequence of column names to use for the document content. If not present, use all columns that are not part of the metadata.
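How metadata_columns and content_columns interact can be sketched with plain dictionaries. The sketch follows the parameter descriptions above (content defaults to every column that is not used as metadata); it is an approximation, and split_row is a hypothetical helper that is not part of the library.

```python
def split_row(row, metadata_columns=(), content_columns=()):
    """Split one CSV row (a dict) into page content and metadata (hypothetical helper)."""
    if content_columns:
        # An explicit content selection wins.
        keep = [key for key in row if key in content_columns]
    else:
        # Otherwise use every column that is not metadata.
        keep = [key for key in row if key not in metadata_columns]
    page_content = "\n".join(f"{key}: {row[key]}" for key in keep)
    metadata = {key: row[key] for key in metadata_columns}
    return page_content, metadata


# Made-up sample row, for illustration only.
row = {"Index": "1", "Height": "65.78", "Weight": "112.99"}

content_default, meta_default = split_row(row, metadata_columns=["Index"])
content_picked, meta_picked = split_row(
    row, metadata_columns=["Index"], content_columns=["Height"]
)

print(content_default)
print(meta_default)
print(content_picked)
```

In the first call, Index moves into the metadata and the remaining two columns form the content; in the second, only Height is kept as content.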
Methods
__init__(file_path[, source_column, ...]): file_path is the path to the CSV file.
alazy_load(): A lazy loader for Documents.
aload(): Load data into Document objects.
lazy_load(): A lazy loader for Documents.
load(): Load data into Document objects.
load_and_split([text_splitter]): Load Documents and split into chunks.
- __init__(file_path: str | Path, source_column: str | None = None, metadata_columns: Sequence[str] = (), csv_args: Dict | None = None, encoding: str | None = None, autodetect_encoding: bool = False, *, content_columns: Sequence[str] = ())
- Parameters:
file_path (str | Path) – The path to the CSV file.
source_column (str | None) – The name of the column in the CSV file to use as the source. Optional. Defaults to None.
metadata_columns (Sequence[str]) – A sequence of column names to use as metadata. Optional.
csv_args (Dict | None) – A dictionary of arguments to pass to the csv.DictReader. Optional. Defaults to None.
encoding (str | None) – The encoding of the CSV file. Optional. Defaults to None.
autodetect_encoding (bool) – Whether to try to autodetect the file encoding.
content_columns (Sequence[str]) – A sequence of column names to use for the document content. If not present, use all columns that are not part of the metadata.
- async alazy_load() → AsyncIterator[Document]
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
- lazy_load() → Iterator[Document]
A lazy loader for Documents.
- Return type:
Iterator[Document]
- load_and_split(text_splitter: TextSplitter | None = None) → List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered deprecated.
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
List[Document]
Examples using CSVLoader