| ⚖️ To use commercially, please obtain a license. 🙏 Thank you for supporting my research! 🤗 |
|---|
# ClinicalEncoder25: The First Diagnosable ColBERT for Clinical Reasoning
ClinicalEncoder25 is a breakthrough in AI for healthcare—a non-generative, interpretable reasoning model that understands clinical text at millisecond speed, with token-level precision. Built on the new Diagnosable ColBERT architecture, it maps every word to a semantic clinical graph, enabling real-time reasoning, retrieval, and debugging.
- 📖 Read the full announcement: ClinicalEncoder25: The First Diagnosable ColBERT
- 🧪 Try the live demo: Hover over any word for more details!
## Why ClinicalEncoder25?
Most AI models today focus on generation, but understanding comes first. ClinicalEncoder25 is designed for deep, interpretable reasoning in clinical and medical texts, with:
- Millisecond-latency document encoding
- Token-level semantic mapping to medical ontologies (UMLS, SNOMED CT, ICD-10, etc.)
- Hallucination-free, non-generative reasoning
- Live debugging and interpretability via the Diagnosable ColBERT architecture
It’s the first model to combine late-interaction retrieval, clinical coding, and topic extraction in a single, unified representation.
## Model Details

### Model Description
- Model Type: PyLate ColBERT
- Base Model: jhu-clsp/ettin-encoder-400m
- Document Length: 2048 tokens (supports up to 8194 tokens outside PyLate)
- Query Length: 64 tokens
- Output Dimensionality: 128 (after projection) / 1024 (before projection)
- Similarity Function: MaxSim
- Language: English
- License: CC-BY-NC 4.0
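The MaxSim similarity used by ColBERT-style models sums, over query tokens, each token's maximum cosine similarity with any document token. A minimal NumPy sketch (with random arrays standing in for real token embeddings):

```python
import numpy as np

def maxsim(query_emb, doc_emb):
    """ColBERT late-interaction score: for each query token, take the maximum
    cosine similarity over all document tokens, then sum over query tokens."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    return float((q @ d.T).max(axis=1).sum())

# Stand-ins for real embeddings from the model (128-dim after projection)
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))  # 4 query tokens
doc = rng.normal(size=(20, 128))   # 20 document tokens

score = maxsim(query, doc)
```

Because each per-token maximum is bounded by 1, the score is bounded by the number of query tokens; a document identical to the query attains that bound.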
### Key Features
- Diagnosable ColBERT: Every token is interpretable and mapped to a clinical concept.
- ClinicalMap25 Integration: Use without the dense projection layer to map tokens directly to medical concepts at the L2 level.
- Efficient Retrieval: Uses FastPLAID for fast, scalable similarity search.
### Model Sources
- Documentation: PyLate Documentation
- Repository: PyLate on GitHub
- Hugging Face: PyLate models on Hugging Face
### Full Model Architecture

```
ColBERT(
  (0): Transformer({'max_seq_length': 2047, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)
```
## Usage

### Token-to-Concept mapping

To use the model for Token-to-Concept mapping, load it directly with the `transformers` library.
To map tokens to concepts:
- Encode your text using ClinicalEncoder25 to obtain token embeddings.
- Load the ClinicalMap25 embeddings for the target ontology, and compute cosine similarity between your token embeddings and the concept embeddings.
- Retrieve the top-matching concepts for each token.
See more details in the Parallia/ClinicalMap25-for-SnomedCT repository.
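A minimal sketch of steps 2–3 above. Random arrays stand in for real ClinicalEncoder25 token embeddings and ClinicalMap25 concept embeddings, and the `top_concepts` helper and labels are hypothetical, not part of the library:

```python
import numpy as np

# Hypothetical stand-ins: in practice, token_embeddings come from
# ClinicalEncoder25 and concept_embeddings from ClinicalMap25.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(5, 128))      # one row per token
concept_embeddings = rng.normal(size=(100, 128))  # one row per ontology concept
concept_labels = [f"concept_{i}" for i in range(100)]

def top_concepts(tokens, concepts, labels, k=3):
    """Return the k most similar concepts for each token (cosine similarity)."""
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    sims = t @ c.T                          # (n_tokens, n_concepts)
    top = np.argsort(-sims, axis=1)[:, :k]  # best matches first
    return [[(labels[j], float(sims[i, j])) for j in row]
            for i, row in enumerate(top)]

matches = top_concepts(token_embeddings, concept_embeddings, concept_labels)
```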
Extracting local concept mappings or concept-likelihood scores requires an adapter. If this is relevant to your use case, please contact us.
### Finetuning for BERT medical tasks

This model is not a typical ColBERT model: it can also serve as a finetuning base for many different medical tasks, including medical NER and NEL.
To do so, load the model using the `AutoModel.from_pretrained` method of the `transformers` library and finetune it for your task of choice.
It should perform particularly well on tasks that require a deep understanding of medical concepts.
NOTE: Loading only the Transformer model with transformers will discard the dense layer. This is usually what you want but, for some tasks, the dense layer might provide value. It's advisable to try both approaches for best results.
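As an illustration, here is a minimal sketch of a token-classification (NER) head over the encoder's 1024-dimensional hidden states. The encoder output is mocked with a random tensor, and the label set is hypothetical; in practice the hidden states would come from `AutoModel.from_pretrained(...)`:

```python
import torch
import torch.nn as nn

HIDDEN = 1024   # ClinicalEncoder25 hidden size before the dense projection
NUM_LABELS = 5  # hypothetical NER tag set, e.g. O / B-DISEASE / I-DISEASE / ...

class MedicalNERHead(nn.Module):
    """Linear token-classification head over encoder hidden states."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.dropout(hidden_states))

# Stand-in for `model(**inputs).last_hidden_state`
hidden_states = torch.randn(2, 16, HIDDEN)  # (batch, seq_len, hidden)
logits = MedicalNERHead(HIDDEN, NUM_LABELS)(hidden_states)
```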
### Retrieval

#### Indexing documents with a single vector

To index documents in a more traditional one-vector-per-document fashion, using MUVERA is recommended. Implementations exist in the TxtAI library and in Weaviate.
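MUVERA's core idea is to collapse a bag of token vectors into one fixed-dimensional vector whose dot products approximate late-interaction scores. A deliberately simplified sketch of that idea (real implementations, such as those in TxtAI and Weaviate, add repetitions, projections, and empty-bucket filling; the `simple_fde` helper here is illustrative only):

```python
import numpy as np

def simple_fde(token_embs, n_planes=4, seed=0):
    """Toy fixed-dimensional encoding in the spirit of MUVERA: bucket token
    vectors by random-hyperplane sign patterns, sum each bucket, concatenate."""
    rng = np.random.default_rng(seed)
    dim = token_embs.shape[1]
    planes = rng.normal(size=(n_planes, dim))
    # Bucket id per token: sign pattern across hyperplanes, read as a bitmask
    buckets = (token_embs @ planes.T > 0) @ (1 << np.arange(n_planes))
    fde = np.zeros((2 ** n_planes, dim))
    for vec, b in zip(token_embs, buckets):
        fde[b] += vec
    return fde.reshape(-1)  # single vector of size 2**n_planes * dim
```

Each token contributes to exactly one bucket, so the concatenated vector preserves the sum of the token embeddings while keeping tokens with similar directions in the same slice.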
#### Indexing documents with multi-vector late interactions

To use the model in its ColBERT mode, the easiest route is to install the PyLate library:

```bash
pip install -U pylate
```
Load the ColBERT model and initialize the FastPLAID index, then encode and index your documents:
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["the patient is cold", "the weather is cold", "hypothermia"]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)
```
Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:
```python
# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)
```
#### Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and retrieve the top-k documents to get the ids and relevance scores of the top matches:
```python
# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["low body temperature", "it is snowing"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
```
### Reranking

If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank` function, passing the queries and documents to rerank:
```python
from pylate import models, rank

queries = [
    "low body temperature",
    "it is snowing",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
## License
This model is released under the CC-BY-NC 4.0 license. For commercial use, please obtain a license.