class BaseSparseEmbedding(ABC):
    """Interface for Sparse embedding models.

    You can inherit from it and implement your custom sparse embedding model.
    """

class BM25SparseEmbedding(BaseSparseEmbedding):
    """Sparse embedding model based on BM25.

    **Note:** We recommend using the Milvus built-in BM25 function to implement
    sparse embedding in your application. This class is more of a reference,
    because it requires the user to manage the corpus, which is not practical.
    The Milvus built-in function solves this problem and makes BM25 sparse
    embedding easier and less frustrating for users.
    For more information, please refer to:
    https://milvus.io/docs/full-text-search.md#Full-Text-Search
    and
    https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/langchain/full_text_search_with_langchain.ipynb

    This class uses the BM25 embedding function from the `pymilvus.model`
    package to implement sparse vector embedding.
    It requires pymilvus[model] to be installed:
    `pip install pymilvus[model]`
    For more information, please refer to:
    https://milvus.io/docs/embed-with-bm25.md
    """

    def __init__(self, corpus: List[str], language: str = "en"):
        warnings.warn(
            "BM25SparseEmbedding class will be deprecated in the future. "
            "We recommend using the Milvus built-in BM25 function instead, "
            "which is easier to use "
            "and doesn't require manual corpus management. "
            "For more information, please refer to: "
            "https://milvus.io/docs/full-text-search.md#Full-Text-Search",
            DeprecationWarning,
            stacklevel=2,
        )
        # Imported inside __init__ so that pymilvus[model] is only needed
        # when this class is actually instantiated.
        from pymilvus.model.sparse import BM25EmbeddingFunction  # type: ignore
        from pymilvus.model.sparse.bm25.tokenizers import (  # type: ignore
            build_default_analyzer,
        )

        # Build the tokenizer/analyzer for the requested language.
        self.analyzer = build_default_analyzer(language=language)
        self.bm25_ef = BM25EmbeddingFunction(self.analyzer, num_workers=1)
        # Fit the BM25 model on the caller-supplied corpus.
        self.bm25_ef.fit(corpus)
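
The constructor above fits BM25 on a caller-supplied corpus, which is why the docstring says the user has to manage the corpus. The sketch below illustrates that idea without any dependency on pymilvus. It is not the pymilvus implementation: `ToyBM25SparseEmbedding`, its whitespace tokenizer, the `k1`/`b` defaults, and the `embed_query`/`embed_documents` method names are assumptions made for illustration only. It uses one common BM25 decomposition, in which queries carry IDF weights and documents carry length-normalized term-frequency weights, so their dot product is a BM25 score.

```python
import math
from collections import Counter
from typing import Dict, List


class ToyBM25SparseEmbedding:
    """Hypothetical, dependency-free illustration; NOT the pymilvus code."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1 = k1
        self.b = b
        tokenized = [self._tokenize(doc) for doc in corpus]

        # Map each distinct term to a stable integer dimension, and count
        # in how many documents each term occurs.
        self.vocab: Dict[str, int] = {}
        doc_freq: Counter = Counter()
        for tokens in tokenized:
            for term in tokens:
                self.vocab.setdefault(term, len(self.vocab))
            doc_freq.update(set(tokens))

        n_docs = len(tokenized)
        total_len = sum(len(tokens) for tokens in tokenized)
        # Average document length; fall back to 1.0 for an empty corpus.
        self.avgdl = total_len / n_docs if total_len else 1.0

        # Smoothed IDF that stays positive for every term.
        self.idf: Dict[str, float] = {
            term: math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Assumption: lowercase whitespace splitting instead of a real analyzer.
        return text.lower().split()

    def embed_query(self, query: str) -> Dict[int, float]:
        # Query side: one IDF weight per known term; unknown terms are dropped.
        terms = set(self._tokenize(query))
        return {self.vocab[t]: self.idf[t] for t in terms if t in self.vocab}

    def embed_documents(self, texts: List[str]) -> List[Dict[int, float]]:
        # Document side: saturated, length-normalized term frequency.
        vectors = []
        for text in texts:
            tokens = self._tokenize(text)
            counts = Counter(t for t in tokens if t in self.vocab)
            norm = self.k1 * (1.0 - self.b + self.b * len(tokens) / self.avgdl)
            vectors.append(
                {
                    self.vocab[t]: tf * (self.k1 + 1.0) / (tf + norm)
                    for t, tf in counts.items()
                }
            )
        return vectors


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {dimension: weight}."""
    return sum(weight * b.get(dim, 0.0) for dim, weight in a.items())
```

Because the vocabulary, IDF values, and average document length are all computed in the constructor, any change to the corpus means refitting the model and re-embedding the stored documents. That maintenance burden is the reason the docstring recommends the Milvus built-in BM25 function instead.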