TinyLlama Salience - Conversation Memory Model

Fine-tuned TinyLlama-1.1B-Chat for conversation memory and salience detection using the LoCoMo dataset.

Model Details

  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Percena/locomo-mc10
  • Task: Conversation memory question answering
  • Training: 3 epochs with 4-bit quantization

Training Configuration

  • LoRA rank: 8
  • LoRA alpha: 16
  • Learning rate: 3e-4
  • Batch size: 4
  • Quantization: 4-bit (NF4)
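
The hyperparameters above map naturally onto a peft/bitsandbytes setup. The sketch below is a reconstruction, not the actual training script: target_modules, the compute dtype, and any TrainingArguments beyond those listed are assumptions.

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: compute dtype not stated
)

# LoRA rank/alpha from the list above; target_modules is an assumption
# (q_proj/v_proj is a common choice for Llama-family models)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Epochs, learning rate, and batch size from the list above
training_args = TrainingArguments(
    output_dir="tinyllama-salience",
    num_train_epochs=3,
    learning_rate=3e-4,
    per_device_train_batch_size=4,
)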

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# If this repo hosts LoRA adapter weights, transformers needs the peft
# package installed to resolve and load the base model automatically.
model = AutoModelForCausalLM.from_pretrained("thebnbrkr/tinyllama-salience")
tokenizer = AutoTokenizer.from_pretrained("thebnbrkr/tinyllama-salience")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Prompt format for conversation-memory QA: context first, then the question
prompt = """Context:
Alice told Bob she's moving to Seattle next month.

Question: Where is Alice moving?
Answer:"""

result = pipe(prompt, max_new_tokens=50)
print(result[0]["generated_text"])
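
Because the model was fine-tuned with LoRA, the repo may contain adapter weights rather than merged weights. In that case the base model can be loaded explicitly (here in 4-bit NF4, mirroring the training setup) and the adapter applied with peft. A minimal sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the base model in 4-bit NF4, mirroring the training quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Apply the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base, "thebnbrkr/tinyllama-salience")
tokenizer = AutoTokenizer.from_pretrained(base_id)

From here, the pipeline call from the example above works unchanged.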

Intended Use

This model is designed to answer questions based on conversational context, identifying salient information from dialogue history.

Limitations

  • Small model (1.1B parameters); may struggle with complex reasoning
  • Trained on a limited dataset (LoCoMo MC10)
  • Best suited to short-context conversation memory tasks

Training Data

Trained on the LoCoMo (Long Context Memory) dataset, specifically the MC10 variant, which tests conversation memory through question answering.
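
The data can be inspected with the datasets library. The split name below is an assumption; check the dataset card (Percena/locomo-mc10) for the actual schema.

from datasets import load_dataset

ds = load_dataset("Percena/locomo-mc10")
print(ds)              # list the available splits
print(ds["train"][0])  # peek at one example (assumes a "train" split exists)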

License

Apache 2.0
