VsdxParser
- class langchain_community.document_loaders.parsers.vsdx.VsdxParser
Parser for vsdx files.
Methods
- get_pages_content(zfile, source): Get the content of the pages of a vsdx file.
- get_relationships(page, zfile, filelist, ...): Get the relationships of a page and the relationships of its relationships, etc.
- lazy_parse(blob): Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.
- parse(blob): Parse a vsdx file.
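As a minimal usage sketch of `lazy_parse`, the hypothetical helper below (`load_vsdx_pages` is not part of the library) wraps a file path in a `Blob` and collects one entry per page. It assumes `langchain_core` and `langchain_community` are installed, along with any optional dependencies the parser needs; the imports are deferred into the function so the snippet can be loaded without them.

```python
from typing import List, Tuple


def load_vsdx_pages(path: str) -> List[Tuple[dict, str]]:
    """Hypothetical helper: return (metadata, text) pairs, one per page of a .vsdx file."""
    # Imported lazily so this module can be loaded even when LangChain is absent.
    from langchain_core.documents.base import Blob
    from langchain_community.document_loaders.parsers.vsdx import VsdxParser

    blob = Blob.from_path(path)
    parser = VsdxParser()
    # lazy_parse yields the documents one at a time, one document per page;
    # parse(blob) covers the same ground in a single call.
    return [(doc.metadata, doc.page_content) for doc in parser.lazy_parse(blob)]
```

Calling `load_vsdx_pages("diagram.vsdx")` (a placeholder path) would then give one pair per page of that diagram.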
- get_pages_content(zfile: ZipFile, source: str) → List[Tuple[int, str, str]]
Get the content of the pages of a vsdx file.
- zfile
The vsdx file, opened as a zip archive.
- Type:
zipfile.ZipFile
- source
The path of the vsdx file.
- Type:
str
- Returns:
A list with one tuple per page of the vsdx file, each containing the page number, the page name and the page content.
- Return type:
list[tuple[int, str, str]]
- Parameters:
zfile (ZipFile)
source (str)
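To make the `(page number, page name, content)` return shape concrete, here is a standard-library sketch that reads page text from a tiny in-memory archive. It is not the library's implementation: `make_demo_vsdx` and `sketch_pages_content` are hypothetical names, the demo archive is far from a complete Visio file, and pairing the Nth `Page` entry of `pages.xml` with `pageN.xml` is a simplifying assumption. Numbering pages from 0 is likewise an arbitrary choice of this sketch.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

VISIO_NS = "{http://schemas.microsoft.com/office/visio/2012/main}"


def make_demo_vsdx() -> bytes:
    """Build a tiny, hypothetical .vsdx-like archive in memory."""
    ns = 'xmlns="http://schemas.microsoft.com/office/visio/2012/main"'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "visio/pages/pages.xml",
            f'<Pages {ns}><Page ID="0" Name="Overview"/><Page ID="1" Name="Details"/></Pages>',
        )
        zf.writestr(
            "visio/pages/page1.xml",
            f"<PageContents {ns}><Shapes><Shape><Text>Start</Text></Shape></Shapes></PageContents>",
        )
        zf.writestr(
            "visio/pages/page2.xml",
            f"<PageContents {ns}><Shapes><Shape><Text>Review</Text></Shape>"
            f"<Shape><Text>Done</Text></Shape></Shapes></PageContents>",
        )
    return buf.getvalue()


def sketch_pages_content(zfile: zipfile.ZipFile) -> List[Tuple[int, str, str]]:
    """Return (page_number, page_name, text) for each page part in the archive."""
    pages_root = ET.fromstring(zfile.read("visio/pages/pages.xml"))
    names = [page.get("Name", "") for page in pages_root.iter(f"{VISIO_NS}Page")]
    # Keep only visio/pages/pageN.xml and sort by N (assumption: N follows page order).
    page_files = sorted(
        (n for n in zfile.namelist() if re.fullmatch(r"visio/pages/page\d+\.xml", n)),
        key=lambda n: int(re.search(r"\d+", n).group()),
    )
    result = []
    for number, (part, name) in enumerate(zip(page_files, names)):
        root = ET.fromstring(zfile.read(part))
        # Join the text of every <Text> element found on the page.
        texts = ("".join(t.itertext()).strip() for t in root.iter(f"{VISIO_NS}Text"))
        result.append((number, name, " ".join(t for t in texts if t)))
    return result


with zipfile.ZipFile(io.BytesIO(make_demo_vsdx())) as demo:
    demo_pages = sketch_pages_content(demo)
```

Building the archive in memory keeps the sketch runnable without a real Visio file; a real file would be opened with `zipfile.ZipFile(path)` in the same way.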
- get_relationships(page: str, zfile: ZipFile, filelist: List[str], pagexml_rels: List[dict]) → Set[str]
Get the relationships of a page recursively: its direct relationships, the relationships of those, and so on. A page can be based on other pages (for example, a background page), so all of its relationships must be collected to obtain the full content of a single page.
- Parameters:
page (str)
zfile (ZipFile)
filelist (List[str])
pagexml_rels (List[dict])
- Return type:
Set[str]
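The idea of following relationships until no new parts turn up can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual code: `rels_path`, `collect_related_parts` and `make_demo_archive` are hypothetical names, the signature is simpler than the one documented above, and the sketch relies on the Open Packaging Conventions layout in which the relationships of `dir/name.xml` are stored in `dir/_rels/name.xml.rels`.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from typing import Set

REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def rels_path(part: str) -> str:
    """Map a part to its relationships part: a/b.xml -> a/_rels/b.xml.rels."""
    directory, name = posixpath.split(part)
    return posixpath.join(directory, "_rels", name + ".rels")


def collect_related_parts(part: str, zfile: zipfile.ZipFile) -> Set[str]:
    """Hypothetical helper: every part reachable from `part` via relationships."""
    names = set(zfile.namelist())
    seen: Set[str] = set()
    stack = [part]
    while stack:
        current = stack.pop()
        rels = rels_path(current)
        if rels not in names:
            continue  # this part has no relationships of its own
        for rel in ET.fromstring(zfile.read(rels)).iter(REL_TAG):
            # Targets are relative to the directory of the part that owns them.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(current), rel.get("Target", ""))
            )
            # `seen` also protects against cycles between parts.
            if target in names and target not in seen and target != part:
                seen.add(target)
                stack.append(target)
    return seen


def make_demo_archive() -> bytes:
    """Tiny hypothetical archive: page1 -> page2 (background) -> master1."""
    ns = 'xmlns="http://schemas.openxmlformats.org/package/2006/relationships"'

    def rels(target: str) -> str:
        return f'<Relationships {ns}><Relationship Id="rId1" Target="{target}"/></Relationships>'

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ("visio/pages/page1.xml", "visio/pages/page2.xml", "visio/masters/master1.xml"):
            zf.writestr(name, "<x/>")
        zf.writestr("visio/pages/_rels/page1.xml.rels", rels("page2.xml"))
        zf.writestr("visio/pages/_rels/page2.xml.rels", rels("../masters/master1.xml"))
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(make_demo_archive())) as demo_zip:
    related = collect_related_parts("visio/pages/page1.xml", demo_zip)
```

An explicit stack is used here instead of recursion so that deeply chained relationships cannot exhaust the call stack; the result is the same set either way.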