

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

The project implements a custom runtime that applies performance optimizations such as weight quantization, layer fusion, and batch reordering to speed up Transformer models and reduce their memory usage on CPU and GPU.
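To make the memory claim concrete, here is a minimal sketch of symmetric 8-bit weight quantization in plain Python. It only illustrates the general idea and is not CTranslate2's implementation; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage cost: one byte per weight instead of a 32-bit float.
fp32_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = q.itemsize * len(q)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for the smaller footprint.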

A full list of features and supported models is included in the project's repository. To start, please check out the official quickstart guide.

Installation and Setup

Install the Python package:

pip install ctranslate2


See a usage example.

from langchain_community.llms import CTranslate2
API Reference: CTranslate2
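As a rough sketch of how the wrapper might be used once a model has been converted to the CTranslate2 format (for example with the `ct2-transformers-converter` tool), the helper below builds the LLM object. The helper name `load_ctranslate2_llm` and the paths in the comments are placeholders, not part of the library, and the import is deferred so the snippet can be loaded without the packages installed.

```python
def load_ctranslate2_llm(model_path, tokenizer_name, device="cpu"):
    """Build a LangChain CTranslate2 LLM from an already converted model.

    model_path: directory containing the converted CTranslate2 model.
    tokenizer_name: Hugging Face tokenizer matching the original model.
    """
    # Deferred import: requires langchain-community, ctranslate2, transformers.
    from langchain_community.llms import CTranslate2

    return CTranslate2(
        model_path=model_path,
        tokenizer_name=tokenizer_name,
        device=device,           # "cpu" or "cuda"
        compute_type="default",  # keep the precision the model was saved with
    )


# Example call (placeholder paths; assumes the model was converted beforehand):
# llm = load_ctranslate2_llm("./llama-2-7b-ct2", "meta-llama/Llama-2-7b-hf")
# print(llm.invoke("He presented me with plausible evidence"))
```

Setting `device="cuda"` moves inference to the GPU, provided the installed `ctranslate2` build supports it.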
