🐻 Gumini-1.5B (구미니)

Built with Qwen

~5,700× less training data, lower Korean perplexity.
Gumini-1.5B reaches Korean PPL 8.49 with only ~3.14B training tokens, outperforming Qwen-2.5-1.5B (18T tokens, PPL 8.84).

🔥 Key Results

| Model | Params | Training Tokens | Korean PPL ↓ | Rank |
|---|---|---|---|---|
| Qwen-2.5-7B | 7.62B | 18T | 6.39 | #1 |
| Gemma-2B | 2.0B | 2T | 8.15 | #2 |
| Gumini-1.5B (Ours) | 1.54B | 3.14B | 8.49 | #3 |
| Qwen-2.5-1.5B | 1.5B | 18T | 8.84 | #4 |
| Llama-3.2-3B | 3.21B | 9T | 9.47 | #5 |
| EXAONE-3.5-2.4B | 2.4B | ~6.5T | 9.80 | #6 |

📊 Data Efficiency

| vs Model | Their Tokens | Gumini Tokens | Efficiency |
|---|---|---|---|
| Qwen-2.5 | 18T | 3.14B | 5,732× less |
| Llama-3.2 | 9T | 3.14B | 2,866× less |
| EXAONE-3.5 | ~6.5T | 3.14B | ~2,070× less |

Model Description

Gumini-1.5B (구미니) is a bilingual Korean-English base language model trained using the Inheritune methodology. Starting from Qwen 2.5 3B, the model progressively grew from 10 to 16 layers through 7 training stages, with ~3.14B tokens of continued pretraining on a Korean–English mixed corpus.

This is a BASE model, not instruction-tuned.
It produces text continuations rather than conversational responses.

Training Highlights

Inheritune Progressive Layer Growing

```
Stage 0: 10 layers (1.08B) → 393M tokens
Stage 1: 11 layers (1.15B) → 393M tokens
Stage 2: 12 layers (1.23B) → 393M tokens
Stage 3: 13 layers (1.31B) → 393M tokens
Stage 4: 14 layers (1.39B) → 393M tokens
Stage 5: 15 layers (1.47B) → 393M tokens
Stage 6: 16 layers (1.54B) → 786M tokens ⭐
────────────────────────────────────────────
Total:   16 layers, 1.54B params, ~3.14B tokens
```
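
A minimal sketch of what one growth stage might look like, assuming Qwen2.5-3B as the parent (its hidden size of 2048, 16 attention heads, and 2 KV heads match Gumini's, so inherited layers are dimension-compatible). This only illustrates the Inheritune idea; the actual training script and the initialization of each new layer may differ.

```python
# Illustrative sketch of Inheritune-style progressive layer growing (not the actual training code).
import copy
import torch
from transformers import AutoModelForCausalLM

# Parent model whose layers are inherited; dimensions match Gumini (hidden 2048, 16 heads, 2 KV heads).
parent = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B", torch_dtype=torch.bfloat16)

# Stage 0: keep the embeddings, LM head, and the first 10 decoder layers of the parent.
student = copy.deepcopy(parent)
student.model.layers = student.model.layers[:10]
student.config.num_hidden_layers = 10

def grow_by_one_layer(student, parent):
    """Append one decoder layer, here initialized from the parent layer at the same depth.
    (The exact initialization used for Gumini is an assumption of this sketch.)"""
    depth = len(student.model.layers)
    student.model.layers.append(copy.deepcopy(parent.model.layers[depth]))
    student.config.num_hidden_layers = depth + 1
    return student

# Stages 1-6: grow 10 -> 16 layers; after each growth, continued pretraining resumes
# (~393M tokens per stage, 786M at the final stage, per the schedule above).
for _ in range(6):
    student = grow_by_one_layer(student, parent)
```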

Model Details

| Attribute | Value |
|---|---|
| Researcher | Gumin Kwon (권구민) |
| Base Model | Qwen/Qwen2.5-3B |
| Training Method | Inheritune + Pretraining |
| Parameters | 1.54B |
| Layers | 16 |
| Hidden Size | 2048 |
| Attention Heads | 16 |
| KV Heads | 2 (GQA) |
| Vocab Size | 151,936 |
| Total Tokens Trained | ~3.14B |
| Precision | BF16 |
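
For orientation, the attributes above correspond roughly to the following Qwen2-style configuration. Only fields listed in the table are filled in; anything else (intermediate size, RoPE parameters, etc.) is assumed to be inherited from the Qwen2.5-3B parent.

```python
# Sketch: the architecture attributes above expressed as a transformers Qwen2Config.
# Fields not listed in the table are omitted and assumed to follow the Qwen2.5-3B parent.
from transformers import Qwen2Config

config = Qwen2Config(
    vocab_size=151_936,
    hidden_size=2048,
    num_hidden_layers=16,
    num_attention_heads=16,
    num_key_value_heads=2,   # grouped-query attention (GQA)
)
print(config)
```

In practice, the exact released configuration can simply be loaded with `AutoConfig.from_pretrained("GuminiResearch/Gumini-1.5B-Base")`.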

Training Data

| Dataset | Language | Weight |
|---|---|---|
| FineWeb-Edu (sample-10BT) | English | 20% |
| CulturaX-ko | Korean | 50% |
| Wikipedia-ko | Korean | 30% |

Total: 80% Korean, 20% English
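
One way to realize this mixture is probabilistic interleaving of streaming datasets. The sketch below uses plausible Hub IDs for the three corpora; the exact dataset versions and preprocessing used for Gumini are not specified in this card.

```python
# Hedged sketch of the 50/30/20 Korean-English mixture via datasets.interleave_datasets.
# Dataset IDs/configs are illustrative (CulturaX is gated on the Hub); actual preprocessing may differ.
from datasets import load_dataset, interleave_datasets

culturax_ko = load_dataset("uonlp/CulturaX", "ko", split="train", streaming=True)
wiki_ko     = load_dataset("wikimedia/wikipedia", "20231101.ko", split="train", streaming=True)
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", "sample-10BT", split="train", streaming=True)

# Keep only the text column so the three schemas line up.
sources = [d.select_columns(["text"]) for d in (culturax_ko, wiki_ko, fineweb_edu)]

mixed = interleave_datasets(
    sources,
    probabilities=[0.5, 0.3, 0.2],   # CulturaX-ko 50%, Wikipedia-ko 30%, FineWeb-Edu 20%
    seed=42,
    stopping_strategy="all_exhausted",
)

for example in mixed.take(2):
    print(example["text"][:80])
```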

Optimization

```yaml
learning_rate: 2.0e-4
weight_decay: 0.1
lr_scheduler: cosine
warmup_ratio: 0.01
max_grad_norm: 1.0
precision: bf16
gradient_checkpointing: true
attention: PyTorch SDPA (Flash Attention)
```
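
These settings map directly onto `transformers.TrainingArguments`; a sketch follows, with batch size, gradient accumulation, and step counts left as placeholders since they are not given above.

```python
# Sketch: the optimization config above as transformers TrainingArguments.
# Batch size / accumulation / step count are NOT stated in this card; those values are placeholders.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gumini-stage-out",       # placeholder
    learning_rate=2.0e-4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=8,       # placeholder
    gradient_accumulation_steps=16,      # placeholder
    logging_steps=50,
)

# The SDPA attention backend is selected at model load time, e.g.:
# AutoModelForCausalLM.from_pretrained(..., attn_implementation="sdpa")
```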

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "GuminiResearch/Gumini-1.5B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("GuminiResearch/Gumini-1.5B-Base")

prompt = "저는 구미니입니다."  # "I am Gumini."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    repetition_penalty=1.2,  # base model: keep >= 1.2 to curb repetition (see Limitations)
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="GuminiResearch/Gumini-1.5B-Base",
    torch_dtype="bfloat16",
    device_map="auto",
)

output = generator(
    "저는 구미니입니다.",  # "I am Gumini."
    max_new_tokens=100,
    do_sample=True,          # needed for temperature to take effect
    temperature=0.7,
    repetition_penalty=1.2,
)
print(output[0]["generated_text"])
```

Evaluation

Intermediate checkpoints by training stage:

| Stage | Layers | Parameters |
|---|---|---|
| 0 | 10 | 1.08B |
| 5 | 15 | 1.47B |
| 6 | 16 | 1.54B |
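
The card does not state the perplexity protocol (evaluation corpus, context length, stride), so the following is only a generic sliding-window perplexity sketch of the kind commonly used to produce such numbers; it is not the evaluation script behind the results above.

```python
# Generic sliding-window perplexity sketch; the actual Korean eval corpus and settings are not specified here.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GuminiResearch/Gumini-1.5B-Base"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

def perplexity(text, max_length=2048, stride=512):
    """Strided perplexity over one long document; max_length/stride are placeholder choices."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    seq_len = ids.size(1)
    nll_sum, n_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end                  # only score tokens not covered by the previous window
        input_ids = ids[:, begin:end]
        targets = input_ids.clone()
        targets[:, :-trg_len] = -100              # mask the overlapping prefix
        with torch.no_grad():
            loss = model(input_ids, labels=targets).loss
        nll_sum += loss.item() * trg_len
        n_tokens += trg_len
        prev_end = end
        if end == seq_len:
            break
    return math.exp(nll_sum / n_tokens)

# ppl = perplexity(held_out_korean_text)   # supply your own held-out Korean document
```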

Model Family

| Model | Layers | Params | Tokens | Status |
|---|---|---|---|---|
| Gumini-1B | 10 | 1.08B | 393M | ✅ Released |
| Gumini-1.5B | 16 | 1.54B | 3.14B | This Model |

Limitations

  • Base model: No instruction-tuning or safety alignment
  • High repetition risk: Use repetition_penalty >= 1.2
  • May generate incorrect or outdated information
  • Should not be used in sensitive or safety-critical contexts
  • Knowledge cutoff based on training data

License

Qwen Research License (Non-Commercial)

This model is Built with Qwen and derived from Qwen 2.5 3B.

Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT.
Copyright (c) Alibaba Cloud. All Rights Reserved.

This model is for NON-COMMERCIAL / RESEARCH use only.
For commercial use, contact Alibaba Cloud.

References

Inheritune Paper

```bibtex
@inproceedings{Sanyal2024inheritune,
  title={Inheritune: Training Smaller Yet More Attentive Language Models},
  author={Sunny Sanyal and Ravid Shwartz-Ziv and Alexandros G. Dimakis and Sujay Sanghavi},
  year={2024},
  url={https://arxiv.org/abs/2404.08634}
}
```

Qwen 2.5

```bibtex
@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  year={2024},
  url={https://qwenlm.github.io/blog/qwen2.5/}
}
```

Citation

```bibtex
@misc{gumini2025,
  title={Gumini-1.5B: Bilingual Korean-English Language Model via Inheritune},
  author={Gumin Kwon},
  year={2025},
  note={Built with Qwen. Trained with Inheritune progressive layer growing.},
  url={https://huggingface.co/GuminiResearch/Gumini-1.5B-Base}
}
```

Author

Gumin Kwon (권구민)


Built with Qwen
Gumini - small but smart AI
