Source code for langchain_community.document_loaders.parsers.registry

"""Module includes a registry of default parser configurations."""

from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PyMuPDFParser
from langchain_community.document_loaders.parsers.txt import TextParser


def _get_default_parser() -> BaseBlobParser:
    """Get default mime-type based parser."""
    return MimeTypeBasedParser(
        handlers={
            "application/pdf": PyMuPDFParser(),
            "text/plain": TextParser(),
            "application/msword": MsWordParser(),
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document": (
                MsWordParser()
            ),
        },
        fallback_parser=None,
    )


_REGISTRY = {
    "default": _get_default_parser,
}

# PUBLIC API


def get_parser(parser_name: str) -> BaseBlobParser:
    """Get a parser by parser name."""
    if parser_name not in _REGISTRY:
        raise ValueError(f"Unknown parser combination: {parser_name}")
    return _REGISTRY[parser_name]()
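
The module above combines two ideas: a registry that maps a name to a parser factory, and a parser that dispatches on MIME type. The following is a minimal, dependency-free sketch of that pattern, not the library's implementation. `EchoParser`, `MimeDispatcher`, `_make_default`, `REGISTRY`, and `lookup` are hypothetical stand-ins invented for illustration; the real `MimeTypeBasedParser` and `BaseBlobParser` operate on blob objects and may differ in detail.

```python
from typing import Callable, Dict, Optional


class EchoParser:
    """Hypothetical stand-in for a concrete parser such as TextParser."""

    def __init__(self, label: str) -> None:
        self.label = label

    def parse(self, data: bytes) -> str:
        # Tag the decoded payload so the caller can see which handler ran.
        return f"{self.label}: {data.decode('utf-8')}"


class MimeDispatcher:
    """Hypothetical stand-in for MimeTypeBasedParser: route by MIME type."""

    def __init__(
        self,
        handlers: Dict[str, EchoParser],
        fallback: Optional[EchoParser] = None,
    ) -> None:
        self.handlers = handlers
        self.fallback = fallback

    def parse(self, data: bytes, mime_type: str) -> str:
        # Use the registered handler, else the fallback, else fail loudly.
        handler = self.handlers.get(mime_type, self.fallback)
        if handler is None:
            raise ValueError(f"Unsupported mime type: {mime_type}")
        return handler.parse(data)


def _make_default() -> MimeDispatcher:
    """Factory for the default configuration (assumed handlers)."""
    return MimeDispatcher(
        handlers={
            "text/plain": EchoParser("text"),
            "application/pdf": EchoParser("pdf"),
        },
        fallback=None,
    )


# The registry stores factories, not instances, so every lookup
# constructs a fresh parser.
REGISTRY: Dict[str, Callable[[], MimeDispatcher]] = {
    "default": _make_default,
}


def lookup(name: str) -> MimeDispatcher:
    """Return a new parser for a registered name, or raise ValueError."""
    if name not in REGISTRY:
        raise ValueError(f"Unknown parser combination: {name}")
    return REGISTRY[name]()


if __name__ == "__main__":
    parser = lookup("default")
    print(parser.parse(b"hello", "text/plain"))
```

Storing the factory rather than a shared instance, as `_REGISTRY` does with `_get_default_parser`, means callers never share parser state, at the cost of rebuilding the handlers on every call.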