Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Paper: arXiv:2507.10524
```bash
pip install torch transformers accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "your-username/fine_tune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate text
prompt = "The key to artificial intelligence is"
# Move inputs to the model's device so they match the weights placed by device_map
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,  # cap on newly generated tokens, excluding the prompt
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
```python
# For advanced users: access the MoR-specific configuration.
# Note: full MoR functionality requires the original MoR codebase.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_name)
# The model supports dynamic recursion depths through routing mechanisms.
# See the original repository for complete MoR training and inference scripts.
```
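To make the routing mechanism mentioned above concrete, here is a minimal, self-contained PyTorch sketch of token-level recursion routing: a single weight-tied block is applied repeatedly, and a learned router picks which tokens receive each additional recursion step. This is an illustration under simplified assumptions, not the official MoR implementation; the class name `RecursiveBlockWithRouter`, the top-k capacity scheme, and all hyperparameters are hypothetical.

```python
# Illustrative sketch only: a weight-tied block plus a per-token router that
# decides, at each recursion step, which tokens receive further computation.
# All names and hyperparameters here are hypothetical, not the official MoR API.
import torch
import torch.nn as nn

class RecursiveBlockWithRouter(nn.Module):
    def __init__(self, d_model: int, max_recursions: int, capacity: float = 0.5):
        super().__init__()
        # One shared block reused at every recursion step (parameter sharing)
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)  # scores each token per step
        self.max_recursions = max_recursions
        self.capacity = capacity  # fraction of tokens kept at each extra step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); every token gets at least one pass
        x = self.shared_block(x)
        for _ in range(self.max_recursions - 1):
            scores = self.router(x).squeeze(-1)          # (batch, seq_len)
            k = max(1, int(self.capacity * x.size(1)))
            topk = scores.topk(k, dim=-1).indices        # tokens that recurse
            mask = torch.zeros_like(scores, dtype=torch.bool)
            mask.scatter_(-1, topk, True)
            gate = torch.sigmoid(scores).unsqueeze(-1)   # keeps routing differentiable
            refined = self.shared_block(x)
            # Only selected tokens are updated; the rest pass through unchanged
            x = torch.where(mask.unsqueeze(-1), gate * refined + (1 - gate) * x, x)
        return x

# Example: route a random batch through up to 3 recursion steps
block = RecursiveBlockWithRouter(d_model=256, max_recursions=3)
out = block(torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 16, 256])
```

In the paper's terms this corresponds roughly to expert-choice routing, where each recursion depth selects its preferred tokens; the actual implementation additionally restricts attention and KV caching to the selected tokens, which this sketch does not model.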
The MoR model introduces several key innovations over standard transformers:

- Parameter sharing: a single stack of layers is reused across recursion steps, so the model stores far fewer unique parameters than its effective depth would suggest (see the arithmetic sketch below).
- Token-level routing: lightweight routers assign each token its own recursion depth, spending extra compute only on tokens that need it (as sketched above).
- Recursion-wise KV caching: at each recursion step, attention is computed only over the tokens still active at that depth, reducing memory traffic.

The model maintains competitive performance on standard benchmarks while providing significant efficiency improvements.
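As a back-of-the-envelope illustration of the parameter-sharing point in the list above (the layer counts below are made up for this example, not taken from the paper):

```python
# Hypothetical layer counts, for illustration only.
effective_depth = 24     # layer applications a full-depth token receives
num_recursions = 3       # how many times the shared stack is unrolled
unique_layers = effective_depth // num_recursions  # layers actually stored
print(f"{unique_layers} unique layer parameter sets instead of {effective_depth}")
# -> 8 unique layer parameter sets instead of 24
```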
If you use this model in your research, please cite:
```bibtex
@misc{bae2025mixtureofrecursionslearningdynamicrecursive,
      title={Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation},
      author={Sangmin Bae and Yujin Kim and Reza Bayat and Sungnyun Kim and Jiyoun Ha and Tal Schuster and Adam Fisch and Hrayr Harutyunyan and Ziwei Ji and Aaron Courville and Se-Young Yun},
      year={2025},
      eprint={2507.10524},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.10524},
}
```
This model is released under the MIT License. See the LICENSE file for details.
Authors: Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun

Affiliations: KAIST AI, Mila, Google Cloud, Google DeepMind, Google Research, Université de Montréal
For complete training scripts, evaluation code, and advanced MoR features, please visit the official GitHub repository.
Base model: microsoft/DialoGPT-medium