KeybertLinkExtractor#
- class langchain_community.graph_vectorstores.extractors.keybert_link_extractor.KeybertLinkExtractor(*, kind: str = 'kw', embedding_model: str = 'all-MiniLM-L6-v2', extract_keywords_kwargs: Dict[str, Any] | None = None)[source]#
Beta
This feature is in beta. It is actively being worked on, so the API may change.
Extract keywords using KeyBERT.
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
The KeybertLinkExtractor uses KeyBERT to create links between documents that have keywords in common.
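Conceptually, each extracted keyword becomes a bidirectional link tag, and any two chunks carrying the same tag become neighbors in the graph. A minimal stdlib-only sketch of that idea (the chunk ids and keyword sets below are illustrative stand-ins for KeyBERT output, not the library's internals):

```python
from collections import defaultdict

# Hypothetical per-chunk keyword sets (stand-ins for KeyBERT output).
chunk_keywords = {
    "chunk-0": {"ukraine", "putin", "russia"},
    "chunk-1": {"economy", "inflation"},
    "chunk-2": {"russia", "sanctions"},
}

# Invert to an index: keyword tag -> ids of the chunks carrying that tag.
by_tag = defaultdict(set)
for chunk_id, keywords in chunk_keywords.items():
    for kw in keywords:
        by_tag[kw].add(chunk_id)

def neighbors(chunk_id: str) -> set:
    """Chunks reachable from chunk_id via at least one shared keyword tag."""
    linked = set()
    for kw in chunk_keywords[chunk_id]:
        linked |= by_tag[kw]
    linked.discard(chunk_id)
    return linked

print(neighbors("chunk-0"))  # {'chunk-2'}: both carry the 'russia' tag
```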
Example:
extractor = KeybertLinkExtractor()
results = extractor.extract_one("lorem ipsum...")
How to link Documents on common keywords using KeyBERT#
Preliminaries#
Install the keybert package:
pip install -q langchain_community keybert
Usage#
We load the state_of_the_union.txt file, chunk it, then for each chunk extract keyword links and add them to the chunk's metadata.
Using extract_one()#
We can use extract_one() on a document to get the links, then add them to the document's metadata with add_links():

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.extractors import KeybertLinkExtractor
from langchain_core.graph_vectorstores.links import add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

keyword_extractor = KeybertLinkExtractor()
for document in documents:
    links = keyword_extractor.extract_one(document)
    add_links(document, links)

print(documents[0].metadata)
{'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='ukraine'), Link(kind='kw', direction='bidir', tag='ukrainian'), Link(kind='kw', direction='bidir', tag='putin'), Link(kind='kw', direction='bidir', tag='vladimir'), Link(kind='kw', direction='bidir', tag='russia')]}
Using LinkExtractorTransformer#
Using the LinkExtractorTransformer, we can simplify the link extraction:

from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.extractors import (
    KeybertLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

transformer = LinkExtractorTransformer([KeybertLinkExtractor()])
documents = transformer.transform_documents(documents)

print(documents[0].metadata)
{'source': 'state_of_the_union.txt', 'links': [Link(kind='kw', direction='bidir', tag='ukraine'), Link(kind='kw', direction='bidir', tag='ukrainian'), Link(kind='kw', direction='bidir', tag='putin'), Link(kind='kw', direction='bidir', tag='vladimir'), Link(kind='kw', direction='bidir', tag='russia')]}
The documents with keyword links can then be added to a GraphVectorStore:

from langchain_community.graph_vectorstores import CassandraGraphVectorStore

store = CassandraGraphVectorStore.from_documents(documents=documents, embedding=...)
- param kind:
Kind of links to produce with this extractor.
- param embedding_model:
Name of the embedding model to use with KeyBERT.
- param extract_keywords_kwargs:
Keyword arguments to pass to KeyBERT's extract_keywords method.
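The stored extract_keywords_kwargs are splatted into every call to KeyBERT's extract_keywords (which accepts options such as keyphrase_ngram_range, stop_words, and top_n; check the KeyBERT documentation for your version). The forwarding pattern itself can be sketched with a stand-in function, not the real KeyBERT API:

```python
# Stand-in for KeyBERT's extract_keywords, to show the kwargs plumbing only.
def fake_extract_keywords(text, top_n=5, keyphrase_ngram_range=(1, 1)):
    # A real call ranks candidates by embedding similarity; here we simply
    # take the first `top_n` whitespace tokens.
    return text.split()[:top_n]

class SketchExtractor:
    """Illustrative extractor that forwards stored kwargs on every call."""

    def __init__(self, extract_keywords_kwargs=None):
        # Stored once at construction, splatted into each extraction call.
        self.extract_keywords_kwargs = extract_keywords_kwargs or {}

    def extract_one(self, text):
        return fake_extract_keywords(text, **self.extract_keywords_kwargs)

extractor = SketchExtractor(extract_keywords_kwargs={"top_n": 2})
print(extractor.extract_one("lorem ipsum dolor sit amet"))  # ['lorem', 'ipsum']
```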
Methods
__init__(*[, kind, embedding_model, ...])
Extract keywords using KeyBERT.
extract_many(inputs)
Add edges from each input to the corresponding documents.
extract_one(input)
Add edges from the input to the corresponding document.
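extract_many is effectively the batched counterpart of extract_one: one result per input, in order. A stdlib sketch of that relationship (the class and its keyword logic are illustrative stand-ins, not the library's implementation):

```python
class SketchLinkExtractor:
    """Illustrative stand-in, not the real KeybertLinkExtractor."""

    def extract_one(self, text: str) -> set:
        # Stand-in "keywords": the distinct lowercase words of the text.
        return set(text.lower().split())

    def extract_many(self, texts) -> list:
        # Batched form: one result per input, preserving input order.
        return [self.extract_one(t) for t in texts]

extractor = SketchLinkExtractor()
print(extractor.extract_many(["Alpha beta", "beta Gamma"]))
# [{'alpha', 'beta'}, {'beta', 'gamma'}]
```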
- __init__(*, kind: str = 'kw', embedding_model: str = 'all-MiniLM-L6-v2', extract_keywords_kwargs: Dict[str, Any] | None = None)[source]#
Extract keywords using KeyBERT.
- Parameters:
kind (str) – Kind of links to produce with this extractor.
embedding_model (str) – Name of the embedding model to use with KeyBERT.
extract_keywords_kwargs (Dict[str, Any] | None) – Keyword arguments to pass to KeyBERT's extract_keywords method.