
Llama 3.2 3B Quantized (q4_k_m, GGUF)

This repository provides the quantized Llama 3.2 3B model in GGUF format (q4_k_m) for efficient deployment in resource-constrained environments, including mobile devices.

This model is part of the research published in Springer LNNS (ICT4SD 2025) and openly available on arXiv.

This model is a quantized version of Meta's original Llama 3.2 3B model. Please refer to the original model card for full details on its capabilities and limitations.

  • Base Model: Llama 3.2 3B (Meta AI)
  • Quantization: 4-bit Post-Training Quantization (nf4 → q4_k_m; see the reproduction sketch below)
  • Format: GGUF (compatible with llama.cpp and Ollama)
  • Model File: llama_3.2_3b_q4_k_m.gguf
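
The q4_k_m file can be reproduced from a full-precision GGUF export using llama.cpp's quantization tool. A minimal sketch, assuming a fp16 GGUF of the base model already exists (the input file name below is illustrative, and older llama.cpp builds name the binary quantize rather than llama-quantize):

# Quantize a fp16 GGUF down to 4-bit q4_k_m
./llama-quantize ./llama-3.2-3b-f16.gguf ./llama_3.2_3b_q4_k_m.gguf Q4_K_M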

Usage with llama.cpp

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run with the quantized model
# (newer llama.cpp builds name this binary llama-cli instead of main)
./main -m ./llama_3.2_3b_q4_k_m.gguf -p "Hello, how are you?"
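
The same GGUF file also runs under Ollama via a Modelfile. A minimal sketch (the model tag llama3.2-q4 is an arbitrary local name, not something defined by this repository):

# Point a Modelfile at the quantized GGUF file
echo "FROM ./llama_3.2_3b_q4_k_m.gguf" > Modelfile

# Register the model locally, then chat with it
ollama create llama3.2-q4 -f Modelfile
ollama run llama3.2-q4 "Hello, how are you?"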

Download

You can download this model directly via:

git lfs install
git clone https://huggingface.co/Cap4ainN3m0/llama-3.2-3b-q4-k-m

Or programmatically:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="Cap4ainN3m0/llama-3.2-3b-q4-k-m", local_dir="models/llama3-quantized")
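
Once downloaded, the GGUF file can also be loaded directly from Python. A minimal sketch, assuming the llama-cpp-python package (pip install llama-cpp-python) and the local_dir used above:

from llama_cpp import Llama

# Load the quantized model (path matches the snapshot_download call above)
llm = Llama(model_path="models/llama3-quantized/llama_3.2_3b_q4_k_m.gguf", n_ctx=2048)

# Generate a short completion
output = llm("Hello, how are you?", max_tokens=64)
print(output["choices"][0]["text"])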

Project and Code

The full research workflow, Colab notebook, results, and mobile deployment guide are available here:
GitHub Repository


Citation

If you use this model or workflow, please cite:

Yadav, A., & Bhargavi, R. C. (2025).
Optimizing LLMs Using Quantization for Mobile Execution.
Presented at ICT4SD 2025, Goa; DOI pending.