metadata
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
BlockFFN-Medium
This is the original 0.5B BlockFFN checkpoint used for the acceleration tests in the paper BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity.
Usage
Load the model with AutoTokenizer and AutoModelForCausalLM from the transformers library. The example passes trust_remote_code=True so that the modeling code shipped with the checkpoint can be executed.
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch
# Assuming the model ID is "SparseLLM/BlockFFN-Medium"
model_id = "SparseLLM/BlockFFN-Medium"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True)
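# Create a text generation pipeline around the already-loaded model.
# The dtype is already set in from_pretrained above, and device_map only applies
# when the pipeline loads the model itself, so the device is selected explicitly here.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,  # first GPU if present, else CPU
)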
# Example usage
prompt = "The quick brown fox jumps over the lazy"
result = pipe(prompt, max_new_tokens=50, do_sample=True, top_p=0.9, temperature=0.7)
print(result[0]["generated_text"])
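For finer control than the pipeline offers, the same checkpoint can also be driven through model.generate directly. The following is a minimal sketch rather than the authors' reference code: the helper names generate_text and load_blockffn are hypothetical, and it assumes the checkpoint behaves like a standard transformers causal language model once loaded with trust_remote_code=True.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50, **generate_kwargs):
    """Tokenize `prompt`, call `model.generate`, and decode the full sequence."""
    # Move the encoded prompt to whichever device holds the model weights.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **generate_kwargs
    )
    # output_ids[0] holds the prompt tokens followed by the newly generated ones.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_blockffn(model_id="SparseLLM/BlockFFN-Medium"):
    """Hypothetical loader; imports are local so generate_text stays lightweight."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    # eval() disables dropout and returns the model itself.
    return tokenizer, model.eval()
```

Calling `tokenizer, model = load_blockffn()` followed by `generate_text(model, tokenizer, prompt, do_sample=True, top_p=0.9, temperature=0.7)` should mirror the sampling settings of the pipeline example above. Extra keyword arguments are forwarded to model.generate unchanged.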
Citation
If you find our work useful for your research, please cite our paper:
@article{song2025blockffn,
  title={{BlockFFN}: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Yuxuan Li and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2507.08771},
  year={2025},
  url={https://arxiv.org/pdf/2507.08771},
}