# TinyLlama Salience - Conversation Memory Model
Fine-tuned TinyLlama-1.1B-Chat for conversation memory and salience detection using the LoCoMo dataset.
## Model Details
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: Percena/locomo-mc10
- Task: Conversation memory question answering
- Training: 3 epochs with 4-bit quantization
## Training Configuration
- LoRA rank: 8
- LoRA alpha: 16
- Learning rate: 3e-4
- Batch size: 4
- Quantization: 4-bit (NF4)
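
Taken together, these hyperparameters correspond roughly to the setup sketched below. This is a minimal sketch assuming the standard `transformers` + `peft` + `bitsandbytes` stack; the target modules, compute dtype, and output directory are assumptions, not values stated in this card.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# above; anything not listed in the card is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, as listed above; compute dtype is assumed
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# LoRA rank 8, alpha 16, as listed above; target modules are assumed
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="tinyllama-salience",   # assumed output directory
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    num_train_epochs=3,
)
```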
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned model and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("thebnbrkr/tinyllama-salience")
tokenizer = AutoTokenizer.from_pretrained("thebnbrkr/tinyllama-salience")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Provide the conversational context, followed by a question
prompt = """Context:
Alice told Bob she's moving to Seattle next month.
Question: Where is Alice moving?
Answer:"""

result = pipe(prompt, max_new_tokens=50)
print(result[0]["generated_text"])
```
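
The base model is a chat model, so if the fine-tune kept its chat template (an assumption; this card does not say how prompts were formatted during training), the same question can also be posed through `apply_chat_template`:

```python
# Hedged alternative: route the prompt through the tokenizer's chat template.
# Whether this matches the training format is an assumption, not confirmed here.
messages = [
    {
        "role": "user",
        "content": "Context:\nAlice told Bob she's moving to Seattle next month.\n"
                   "Question: Where is Alice moving?",
    }
]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
result = pipe(chat_prompt, max_new_tokens=50)
print(result[0]["generated_text"])
```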
## Intended Use
This model is designed to answer questions based on conversational context, identifying salient information from dialogue history.
## Limitations
- Small model (1.1B parameters); may struggle with complex reasoning
- Trained on a limited dataset (the LoCoMo MC10 variant)
- Best suited for short-context conversation memory tasks
## Training Data
Trained on the LoCoMo (Long Context Memory) dataset, specifically the MC10 variant, which tests conversation memory through question answering.
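
For reference, the dataset can be inspected directly from the Hub with the `datasets` library; this assumes the `Percena/locomo-mc10` repo is publicly accessible, and split and field names should be checked rather than assumed:

```python
from datasets import load_dataset

# Inspect the MC10 variant of LoCoMo directly from the Hub
ds = load_dataset("Percena/locomo-mc10")
print(ds)  # shows available splits, row counts, and column names
```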
## License
Apache 2.0