
Llama 3.2 3B Quantized (q4_k_m, GGUF)

This repository provides the quantized Llama 3.2 3B model in GGUF format (q4_k_m) for efficient deployment in resource-constrained environments, including mobile devices.

This model is part of the research published in Springer LNNS (ICT4SD 2025) and openly available on arXiv.

This model is a quantized version of Meta's original Llama 3.2 3B model. Please refer to the original model card for full details on its capabilities and limitations.

  • Base Model: Llama 3.2 3B (Meta AI)
  • Quantization: 4-bit Post-Training Quantization (nf4 → q4_k_m; see the reproduction sketch below)
  • Format: GGUF (compatible with llama.cpp and Ollama)
  • Model File: llama_3.2_3b_q4_k_m.gguf
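
The q4_k_m file can be reproduced from a full-precision GGUF export using llama.cpp's quantization tool. A minimal sketch, assuming a fp16 GGUF of the base model already exists (the input file name below is illustrative, and older llama.cpp builds name the binary quantize rather than llama-quantize):

# Quantize a fp16 GGUF down to 4-bit q4_k_m
./llama-quantize ./llama-3.2-3b-f16.gguf ./llama_3.2_3b_q4_k_m.gguf Q4_K_M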

Usage with llama.cpp

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run with the quantized model
# (newer llama.cpp builds name this binary llama-cli instead of main)
./main -m ./llama_3.2_3b_q4_k_m.gguf -p "Hello, how are you?"
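
The same GGUF file also runs under Ollama via a Modelfile. A minimal sketch (the model tag llama3.2-q4 is an arbitrary local name, not something defined by this repository):

# Point a Modelfile at the quantized GGUF file
echo "FROM ./llama_3.2_3b_q4_k_m.gguf" > Modelfile

# Register the model locally, then chat with it
ollama create llama3.2-q4 -f Modelfile
ollama run llama3.2-q4 "Hello, how are you?"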

Download

You can download this model directly via:

git lfs install
git clone https://huggingface.co/Cap4ainN3m0/llama-3.2-3b-q4-k-m

Or programmatically:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="Cap4ainN3m0/llama-3.2-3b-q4-k-m", local_dir="models/llama3-quantized")
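
Once downloaded, the GGUF file can also be loaded directly from Python. A minimal sketch, assuming the llama-cpp-python package (pip install llama-cpp-python) and the local_dir used above:

from llama_cpp import Llama

# Load the quantized model (path matches the snapshot_download call above)
llm = Llama(model_path="models/llama3-quantized/llama_3.2_3b_q4_k_m.gguf", n_ctx=2048)

# Generate a short completion
output = llm("Hello, how are you?", max_tokens=64)
print(output["choices"][0]["text"])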

Project and Code

The full research workflow, Colab notebook, results, and mobile deployment guide are available here:
GitHub Repository


Citation

If you use this model or workflow, please cite:

Yadav, A., & Bhargavi, R. C. (2025).
Optimizing LLMs Using Quantization for Mobile Execution.
Presented at ICT4SD 2025, Goa; DOI pending.