GLiNERLinkExtractor#
- class langchain_community.graph_vectorstores.extractors.gliner_link_extractor.GLiNERLinkExtractor(labels: List[str], *, kind: str = 'entity', model: str = 'urchade/gliner_mediumv2.1', extract_kwargs: Dict[str, Any] | None = None)[source]#
Beta
This feature is in beta. It is actively being worked on, so the API may change.
Link documents with common named entities using GLiNER.
GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like).
The `GLiNERLinkExtractor` uses GLiNER to create links between documents that have named entities in common.

Example:
```python
extractor = GLiNERLinkExtractor(
    labels=["Person", "Award", "Date", "Competitions", "Teams"]
)
results = extractor.extract_one("some long text...")
```
How to link Documents on common named entities#
Preliminaries#
Install the `gliner` package:

```shell
pip install -q langchain_community gliner
```
Usage#
We load the `state_of_the_union.txt` file, chunk it, then for each chunk we extract named entity links and add them to the chunk.

Using extract_one()#
We can use `extract_one()` on a document to get the links, then add them to the document metadata with `add_links()`:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_community.graph_vectorstores.extractors import GLiNERLinkExtractor
from langchain_core.graph_vectorstores.links import add_links
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])

for document in documents:
    links = ner_extractor.extract_one(document)
    add_links(document, links)

print(documents[0].metadata)
```
{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}
Using LinkExtractorTransformer#
Using the `LinkExtractorTransformer`, we can simplify the link extraction:

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores.extractors import (
    GLiNERLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
raw_documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

ner_extractor = GLiNERLinkExtractor(["Person", "Topic"])
transformer = LinkExtractorTransformer([ner_extractor])
documents = transformer.transform_documents(documents)

print(documents[0].metadata)
```
{'source': 'state_of_the_union.txt', 'links': [Link(kind='entity:Person', direction='bidir', tag='President Zelenskyy'), Link(kind='entity:Person', direction='bidir', tag='Vladimir Putin')]}
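To see why these metadata links matter, here is a minimal dependency-free sketch (hypothetical data, not the library's implementation): two chunks are considered linked when they carry at least one common bidirectional link tag, which is how the `(kind, tag)` pairs produced above connect chunks that mention the same entity.

```python
# Hypothetical sketch: each document carries a set of (kind, tag) pairs,
# mirroring the Link(kind=..., tag=...) entries stored in metadata above.
docs = [
    {"id": 0, "tags": {("entity:Person", "President Zelenskyy")}},
    {"id": 1, "tags": {("entity:Person", "Vladimir Putin"),
                       ("entity:Person", "President Zelenskyy")}},
    {"id": 2, "tags": {("entity:Topic", "inflation")}},
]

def linked(a: dict, b: dict) -> bool:
    # Two chunks are linked when they share at least one entity tag.
    return bool(a["tags"] & b["tags"])

print(linked(docs[0], docs[1]))  # True: both mention President Zelenskyy
print(linked(docs[0], docs[2]))  # False: no shared entity tag
```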
The documents with named entity links can then be added to a `GraphVectorStore`:

```python
from langchain_community.graph_vectorstores import CassandraGraphVectorStore

store = CassandraGraphVectorStore.from_documents(documents=documents, embedding=...)
```
- param labels:
List of kinds of entities to extract.
- param kind:
Kind of links to produce with this extractor.
- param model:
GLiNER model to use.
- param extract_kwargs:
Keyword arguments to pass to GLiNER.
Methods
__init__(labels, *[, kind, model, ...])

extract_many(inputs)
Add edges from each input to the corresponding documents.

extract_one(input)
Add edges from each input to the corresponding documents.
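The relationship between the two extraction methods can be sketched in plain Python. This is an assumption about the interface, not the library's code: `extract_many` is modeled as applying `extract_one` to each input, and a trivial stand-in (every title-cased word becomes a "Person" tag) replaces the GLiNER model.

```python
# Hypothetical sketch of extract_one / extract_many. The real extractor
# runs a GLiNER model; here a toy rule stands in for NER.
def extract_one(document: str) -> set:
    # Stand-in for NER: pretend every title-cased word is a Person entity.
    return {("entity:Person", w) for w in document.split() if w.istitle()}

def extract_many(documents: list) -> list:
    # Assumed behavior: apply extract_one to each input in turn.
    return [extract_one(d) for d in documents]

links = extract_many(["Joe met Kamala", "no entities here"])
print(links[0])  # {('entity:Person', 'Joe'), ('entity:Person', 'Kamala')}
print(links[1])  # set(): no title-cased words, so no entity tags
```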
- __init__(labels: List[str], *, kind: str = 'entity', model: str = 'urchade/gliner_mediumv2.1', extract_kwargs: Dict[str, Any] | None = None)[source]#
- Parameters:
  labels (List[str])
  kind (str)
  model (str)
  extract_kwargs (Dict[str, Any] | None)