# TinyLlama Salience - Conversation Memory Model
Fine-tuned TinyLlama-1.1B-Chat for conversation memory and salience detection using the LoCoMo dataset.
## Model Details
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: Percena/locomo-mc10
- Task: Conversation memory question answering
- Training: 3 epochs with 4-bit quantization
## Training Configuration
- LoRA rank: 8
- LoRA alpha: 16
- Learning rate: 3e-4
- Batch size: 4
- Quantization: 4-bit (NF4)
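
Taken together, these hyperparameters correspond roughly to the setup sketched below. This is a minimal sketch assuming the standard `transformers` + `peft` + `bitsandbytes` stack; the target modules, compute dtype, and output directory are assumptions, not values stated in this card.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# above; anything not listed in the card is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, as listed above; compute dtype is assumed
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# LoRA rank 8, alpha 16, as listed above; target modules are assumed
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="tinyllama-salience",   # assumed output directory
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    num_train_epochs=3,
)
```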
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned model and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("thebnbrkr/tinyllama-salience")
tokenizer = AutoTokenizer.from_pretrained("thebnbrkr/tinyllama-salience")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Provide the conversational context, followed by a question
prompt = """Context:
Alice told Bob she's moving to Seattle next month.
Question: Where is Alice moving?
Answer:"""

result = pipe(prompt, max_new_tokens=50)
print(result[0]["generated_text"])
```
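
The base model is a chat model, so if the fine-tune kept its chat template (an assumption; this card does not say how prompts were formatted during training), the same question can also be posed through `apply_chat_template`:

```python
# Hedged alternative: route the prompt through the tokenizer's chat template.
# Whether this matches the training format is an assumption, not confirmed here.
messages = [
    {
        "role": "user",
        "content": "Context:\nAlice told Bob she's moving to Seattle next month.\n"
                   "Question: Where is Alice moving?",
    }
]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
result = pipe(chat_prompt, max_new_tokens=50)
print(result[0]["generated_text"])
```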
## Intended Use
This model is designed to answer questions based on conversational context, identifying salient information from dialogue history.
## Limitations
- Small model (1.1B parameters); may struggle with complex reasoning
- Trained on a limited dataset (the LoCoMo MC10 variant)
- Best suited for short-context conversation memory tasks
## Training Data
Trained on the LoCoMo (Long Context Memory) dataset, specifically the MC10 variant, which tests conversation memory through question answering.
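
For reference, the dataset can be inspected directly from the Hub with the `datasets` library; this assumes the `Percena/locomo-mc10` repo is publicly accessible, and split and field names should be checked rather than assumed:

```python
from datasets import load_dataset

# Inspect the MC10 variant of LoCoMo directly from the Hub
ds = load_dataset("Percena/locomo-mc10")
print(ds)  # shows available splits, row counts, and column names
```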
## License
Apache 2.0