PDFMinerParser
- class langchain_community.document_loaders.parsers.pdf.PDFMinerParser(extract_images: bool = False, *, concatenate_pages: bool = True)
Parse PDF using PDFMiner.
Initialize a parser based on PDFMiner.
- Parameters:
extract_images (bool) – Whether to extract images from the PDF.
concatenate_pages (bool) – If True, concatenate all PDF pages into a single document. Otherwise, return one document per page.
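A minimal usage sketch, assuming `langchain-community` and `pdfminer.six` are installed and that `Blob` is importable from `langchain_community.document_loaders.blob_loaders`. The helper name `load_pdf_documents` and its `per_page` flag are hypothetical and only serve to show how `concatenate_pages` selects between one combined document and one document per page.

```python
def load_pdf_documents(path, per_page=False):
    """Parse the PDF at `path` (hypothetical helper, not part of LangChain)."""
    # Imports live inside the function so this module can be imported
    # even when the optional dependencies are not installed.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.pdf import PDFMinerParser

    # concatenate_pages=True (the default) yields a single document;
    # concatenate_pages=False yields one document per page.
    parser = PDFMinerParser(extract_images=False, concatenate_pages=not per_page)
    blob = Blob.from_path(path)
    return parser.parse(blob)
```

Calling `load_pdf_documents("report.pdf", per_page=True)` on a multi-page file (the file name is a placeholder) should then return a list with one entry per page.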
Methods
__init__([extract_images, concatenate_pages]) – Initialize a parser based on PDFMiner.
lazy_parse(blob) – Lazily parse the blob.
parse(blob) – Eagerly parse the blob into a document or documents.
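The difference between the lazy and eager methods can be illustrated with a toy stand-in. This is a sketch only: `ToyParser`, its string "blob", and its string "documents" are hypothetical and do not touch PDFMiner or LangChain. It assumes the usual pattern in which the eager method simply collects the lazy generator's output into a list.

```python
from typing import Iterator, List


class ToyParser:
    """Hypothetical stand-in that mimics the lazy/eager method pair."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # A generator: each page is produced only when the caller asks for it.
        for number, text in enumerate(self.pages):
            yield f"{blob} page {number}: {text}"

    def parse(self, blob: str) -> List[str]:
        # Eager variant: drain the generator into a list up front.
        return list(self.lazy_parse(blob))


parser = ToyParser(["intro", "methods"])
lazy = parser.lazy_parse("demo.pdf")
first = next(lazy)  # only the first page has been processed so far
everything = parser.parse("demo.pdf")  # all pages, materialized as a list
```

The lazy form keeps memory use flat for large inputs, while the eager form is convenient when the full list is needed anyway.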
- __init__(extract_images: bool = False, *, concatenate_pages: bool = True)
Initialize a parser based on PDFMiner.
- Parameters:
extract_images (bool) – Whether to extract images from the PDF.
concatenate_pages (bool) – If True, concatenate all PDF pages into a single document. Otherwise, return one document per page.