GLiNERLinkExtractor#
- class langchain_community.graph_vectorstores.extractors.gliner_link_extractor.GLiNERLinkExtractor(labels: List[str], *, kind: str = 'entity', model: str = 'urchade/gliner_mediumv2.1', extract_kwargs: Dict[str, Any] | None = None)[source]#
Beta
This feature is in beta. It is actively being worked on, so the API may change.
Link documents with common named entities using GLiNER.
GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like).
The `GLiNERLinkExtractor` uses GLiNER to create links between documents that have named entities in common.

Example:
```python
extractor = GLiNERLinkExtractor(
    labels=["Person", "Award", "Date", "Competitions", "Teams"]
)
results = extractor.extract_one("some long text...")
```
How to link Documents on common named entities#
Preliminaries#
Install the `gliner` package:

```shell
pip install -q langchain_community gliner
```
Usage#
We load the `state_of_the_union.txt` file, chunk it, then for each chunk we extract named entity links and add them to the chunk.

Using extract_one()#
We can use `extract_one()` on a document to get the links, then add them to the document metadata with `add_links()`:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_community.graph_vectorstores.extractors import GLiNERLinkExtractor
from langchain_core.graph_vectorstores.links import add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])

for document in documents:
    links = ner_extractor.extract_one(document)
    add_links(document, links)

print(documents[0].metadata)
```
{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}
Using LinkExtractorTransformer#
Using the `LinkExtractorTransformer`, we can simplify the link extraction:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.extractors import (
    GLiNERLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])
transformer = LinkExtractorTransformer([ner_extractor])
documents = transformer.transform_documents(documents)

print(documents[0].metadata)
```
{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}
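To see why these metadata links matter, here is a minimal dependency-free sketch (hypothetical data, not the library's implementation): two chunks are considered linked when they carry at least one common bidirectional link tag, which is how the `(kind, tag)` pairs produced above connect chunks that mention the same entity.

```python
# Hypothetical sketch: each document carries a set of (kind, tag) pairs,
# mirroring the Link(kind=..., tag=...) entries stored in metadata above.
docs = [
    {"id": 0, "tags": {("entity:Person", "President Zelenskyy")}},
    {"id": 1, "tags": {("entity:Person", "Vladimir Putin"),
                       ("entity:Person", "President Zelenskyy")}},
    {"id": 2, "tags": {("entity:Topic", "inflation")}},
]

def linked(a: dict, b: dict) -> bool:
    # Two chunks are linked when they share at least one entity tag.
    return bool(a["tags"] & b["tags"])

print(linked(docs[0], docs[1]))  # True: both mention President Zelenskyy
print(linked(docs[0], docs[2]))  # False: no shared entity tag
```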
The documents with named entity links can then be added to a `GraphVectorStore`:

```python
from langchain_community.graph_vectorstores import CassandraGraphVectorStore

store = CassandraGraphVectorStore.from_documents(documents=documents, embedding=...)
```
- param labels:
List of kinds of entities to extract.
- param kind:
Kind of links to produce with this extractor.
- param model:
GLiNER model to use.
- param extract_kwargs:
Keyword arguments to pass to GLiNER.
Methods
__init__(labels, *[, kind, model, ...])

extract_many(inputs)
Add edges from each input to the corresponding documents.

extract_one(input)
Add edges from each input to the corresponding documents.
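The relationship between the two extraction methods can be sketched in plain Python. This is an assumption about the interface, not the library's code: `extract_many` is modeled as applying `extract_one` to each input, and a trivial stand-in (every title-cased word becomes a "Person" tag) replaces the GLiNER model.

```python
# Hypothetical sketch of extract_one / extract_many. The real extractor
# runs a GLiNER model; here a toy rule stands in for NER.
def extract_one(document: str) -> set:
    # Stand-in for NER: pretend every title-cased word is a Person entity.
    return {("entity:Person", w) for w in document.split() if w.istitle()}

def extract_many(documents: list) -> list:
    # Assumed behavior: apply extract_one to each input in turn.
    return [extract_one(d) for d in documents]

links = extract_many(["Joe met Kamala", "no entities here"])
print(links[0])  # {('entity:Person', 'Joe'), ('entity:Person', 'Kamala')}
print(links[1])  # set(): no title-cased words, so no entity tags
```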
- __init__(labels: List[str], *, kind: str = 'entity', model: str = 'urchade/gliner_mediumv2.1', extract_kwargs: Dict[str, Any] | None = None)[source]#
- Parameters:
  labels (List[str])
  kind (str)
  model (str)
  extract_kwargs (Dict[str, Any] | None)