Submitted by Jingfeng Yao | Towards Scalable Pre-training of Visual Tokenizers for Generation | MiniMax