Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection Paper • 2602.03216 • Published 3 days ago • 12
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7, 2024 • 5
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents Paper • 2602.01053 • Published 5 days ago • 7
Retrospective Sparse Attention for Efficient Long-Context Generation Paper • 2508.09001 • Published Aug 12, 2025 • 2
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 122
Exploring Conditions for Diffusion models in Robotic Control Paper • 2510.15510 • Published Oct 17, 2025 • 40
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 113