| ⚖️ To use commercially, please obtain a license. 🙏 Thank you for supporting my research! 🤗 |
|---|
# ClinicalEncoder25: The First Diagnosable ColBERT for Clinical Reasoning
ClinicalEncoder25 is a breakthrough in AI for healthcare—a non-generative, interpretable reasoning model that understands clinical text at millisecond speed, with token-level precision. Built on the new Diagnosable ColBERT architecture, it maps every word to a semantic clinical graph, enabling real-time reasoning, retrieval, and debugging.
- 📖 Read the full announcement: ClinicalEncoder25: The First Diagnosable ColBERT
- 🧪 Try the live demo: Hover over any word for more details!
## Why ClinicalEncoder25?
Most AI models today focus on generation, but understanding comes first. ClinicalEncoder25 is designed for deep, interpretable reasoning in clinical and medical texts, with:
- Millisecond-latency document encoding
- Token-level semantic mapping to medical ontologies (UMLS, SNOMED CT, ICD-10, etc.)
- Hallucination-free, non-generative reasoning
- Live debugging and interpretability via the Diagnosable ColBERT architecture
It’s the first model to combine late-interaction retrieval, clinical coding, and topic extraction in a single, unified representation.
## Model Details

### Model Description
- Model Type: PyLate ColBERT
- Base Model: jhu-clsp/ettin-encoder-400m
- Document Length: 2048 tokens (supports up to 8194 tokens outside PyLate)
- Query Length: 64 tokens
- Output Dimensionality: 128 (after projection) / 1024 (before projection)
- Similarity Function: MaxSim
- Language: English
- License: CC-BY-NC 4.0
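The MaxSim similarity used by ColBERT-style models sums, over query tokens, each token's maximum cosine similarity with any document token. A minimal NumPy sketch (with random arrays standing in for real token embeddings):

```python
import numpy as np

def maxsim(query_emb, doc_emb):
    """ColBERT late-interaction score: for each query token, take the maximum
    cosine similarity over all document tokens, then sum over query tokens."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    return float((q @ d.T).max(axis=1).sum())

# Stand-ins for real embeddings from the model (128-dim after projection)
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))  # 4 query tokens
doc = rng.normal(size=(20, 128))   # 20 document tokens

score = maxsim(query, doc)
```

Because each per-token maximum is bounded by 1, the score is bounded by the number of query tokens; a document identical to the query attains that bound.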
### Key Features
- Diagnosable ColBERT: Every token is interpretable and mapped to a clinical concept.
- ClinicalMap25 Integration: Use without the dense projection layer to map tokens directly to medical concepts at the L2 level.
- Efficient Retrieval: Uses FastPLAID for fast, scalable similarity search.
### Model Sources
- Documentation: PyLate Documentation
- Repository: PyLate on GitHub
- Hugging Face: PyLate models on Hugging Face
### Full Model Architecture

```
ColBERT(
  (0): Transformer({'max_seq_length': 2047, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 1024, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)
```
## Usage

### Token-to-Concept mapping

To use the model for Token-to-Concept mapping, load it directly with the `transformers` library.
To map tokens to concepts:
- Encode your text using ClinicalEncoder25 to obtain token embeddings.
- Load the ClinicalMap25 embeddings for the target ontology, and compute cosine similarity between your token embeddings and the concept embeddings.
- Retrieve the top-matching concepts for each token.
See more details in the Parallia/ClinicalMap25-for-SnomedCT repository.
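A minimal sketch of steps 2–3 above. Random arrays stand in for real ClinicalEncoder25 token embeddings and ClinicalMap25 concept embeddings, and the `top_concepts` helper and labels are hypothetical, not part of the library:

```python
import numpy as np

# Hypothetical stand-ins: in practice, token_embeddings come from
# ClinicalEncoder25 and concept_embeddings from ClinicalMap25.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(5, 128))      # one row per token
concept_embeddings = rng.normal(size=(100, 128))  # one row per ontology concept
concept_labels = [f"concept_{i}" for i in range(100)]

def top_concepts(tokens, concepts, labels, k=3):
    """Return the k most similar concepts for each token (cosine similarity)."""
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    sims = t @ c.T                          # (n_tokens, n_concepts)
    top = np.argsort(-sims, axis=1)[:, :k]  # best matches first
    return [[(labels[j], float(sims[i, j])) for j in row]
            for i, row in enumerate(top)]

matches = top_concepts(token_embeddings, concept_embeddings, concept_labels)
```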
Extracting local concept mappings or concept-likelihood scores requires an adapter. If this is relevant to your use case, please contact us.
### Finetuning for BERT medical tasks

This model is not a typical ColBERT model: it can also serve as a finetuning base for many different medical tasks, including medical NER and NEL.
To do so, load the model using the `AutoModel.from_pretrained` method of the `transformers` library and finetune it for your task of choice.
It should perform particularly well on tasks that require a deep understanding of medical concepts.
NOTE: Loading only the Transformer model with transformers will discard the dense layer. This is usually what you want but, for some tasks, the dense layer might provide value. It's advisable to try both approaches for best results.
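As an illustration, here is a minimal sketch of a token-classification (NER) head over the encoder's 1024-dimensional hidden states. The encoder output is mocked with a random tensor, and the label set is hypothetical; in practice the hidden states would come from `AutoModel.from_pretrained(...)`:

```python
import torch
import torch.nn as nn

HIDDEN = 1024   # ClinicalEncoder25 hidden size before the dense projection
NUM_LABELS = 5  # hypothetical NER tag set, e.g. O / B-DISEASE / I-DISEASE / ...

class MedicalNERHead(nn.Module):
    """Linear token-classification head over encoder hidden states."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.dropout(hidden_states))

# Stand-in for `model(**inputs).last_hidden_state`
hidden_states = torch.randn(2, 16, HIDDEN)  # (batch, seq_len, hidden)
logits = MedicalNERHead(HIDDEN, NUM_LABELS)(hidden_states)
```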
### Retrieval

#### Indexing documents with a single vector

To index documents in a more traditional one-vector-per-document fashion, using MUVERA is recommended. Implementations exist in the TxtAI library and in Weaviate.
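MUVERA's core idea is to collapse a bag of token vectors into one fixed-dimensional vector whose dot products approximate late-interaction scores. A deliberately simplified sketch of that idea (real implementations, such as those in TxtAI and Weaviate, add repetitions, projections, and empty-bucket filling; the `simple_fde` helper here is illustrative only):

```python
import numpy as np

def simple_fde(token_embs, n_planes=4, seed=0):
    """Toy fixed-dimensional encoding in the spirit of MUVERA: bucket token
    vectors by random-hyperplane sign patterns, sum each bucket, concatenate."""
    rng = np.random.default_rng(seed)
    dim = token_embs.shape[1]
    planes = rng.normal(size=(n_planes, dim))
    # Bucket id per token: sign pattern across hyperplanes, read as a bitmask
    buckets = (token_embs @ planes.T > 0) @ (1 << np.arange(n_planes))
    fde = np.zeros((2 ** n_planes, dim))
    for vec, b in zip(token_embs, buckets):
        fde[b] += vec
    return fde.reshape(-1)  # single vector of size 2**n_planes * dim
```

Each token contributes to exactly one bucket, so the concatenated vector preserves the sum of the token embeddings while keeping tokens with similar directions in the same slice.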
#### Indexing documents with multi-vector late interactions

To use the model in its ColBERT mode, the easiest route is to install the PyLate library:

```bash
pip install -U pylate
```
Load the ColBERT model and initialize the FastPLAID index, then encode and index your documents:
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

# Step 2: Initialize the PLAID index
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["the patient is cold", "the weather is cold", "hypothermia"]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)
```
Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:
```python
# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
)
```
#### Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and retrieve the top-k documents to get the ids and relevance scores of the top matches:
```python
# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["low body temperature", "it is snowing"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
```
### Reranking

If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank` function, passing the queries and documents to rerank:
```python
from pylate import models, rank

queries = [
    "low body temperature",
    "it is snowing",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="Parallia/ClinicalEncoder25-Diagnosable-Colbert-L2-for-medical-texts",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
## License
This model is released under the CC-BY-NC 4.0 license. For commercial use, please obtain a license.