DDRO Step-2 SFT Reference Policy: NQ + TU (Title/URL)

This repository contains the Step 2 (SFT) checkpoint that serves as both the reference policy (π_ref) and the initialization for the DDRO training stage described in the paper "Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval".

What this is

  • Stage: Step 2 / supervised fine-tuning (SFT)
  • Role in DDRO: reference policy (π_ref)
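
For orientation, the generic DPO-style pairwise objective below shows where a frozen reference policy typically enters this kind of training. It is a sketch under assumptions, not a formula taken from this card: that DDRO uses this exact form, and the symbols q (query), d^+ and d^- (preferred and non-preferred docids), β (scaling factor), and σ (logistic function), are all assumed here. Consult the DDRO paper for the objective actually used.

```latex
\mathcal{L}(\theta) =
  -\,\mathbb{E}_{(q,\,d^{+},\,d^{-})}
  \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_{\theta}(d^{+} \mid q)}{\pi_{\mathrm{ref}}(d^{+} \mid q)}
      \;-\;
      \beta \log \frac{\pi_{\theta}(d^{-} \mid q)}{\pi_{\mathrm{ref}}(d^{-} \mid q)}
    \right)
  \right]
```

In this form π_ref stays fixed while π_θ is updated, and π_θ would start from the same weights as π_ref, which matches the two roles this checkpoint is described as playing.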

Corresponding final DDRO model

  • Post-DDRO (Step 3) checkpoint: kiyam/ddro-nq-tu

How to load

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub ID of this Step-2 SFT checkpoint
repo = "kiyam/ddro-nq-tu-sft"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)
model.eval()  # inference mode (disables dropout)
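
Once the checkpoint is loaded, candidate document identifiers can be produced with beam search. The following is a minimal sketch, not the authors' evaluation pipeline: the helper names `load_checkpoint` and `generate_candidates`, the raw-query input format, and the values of `num_beams` and `max_new_tokens` are all assumptions. It also decodes without constraints, so outputs are not guaranteed to be valid Title/URL identifiers.

```python
def load_checkpoint(repo="kiyam/ddro-nq-tu-sft"):
    """Load tokenizer and model from the Hub (hypothetical helper)."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)
    model.eval()  # inference mode (disables dropout)
    return tok, model


def generate_candidates(query, tok, model, num_beams=10, max_new_tokens=64):
    """Return `num_beams` decoded sequences for one query.

    Assumption: the query is passed as raw text. The checkpoint may
    expect a specific prompt format that this card does not document.
    """
    inputs = tok(query, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        num_beams=num_beams,
        num_return_sequences=num_beams,  # keep every beam as a candidate
        max_new_tokens=max_new_tokens,
    )
    return tok.batch_decode(outputs, skip_special_tokens=True)
```

A call such as `generate_candidates("who wrote hamlet", *load_checkpoint())` would return a list of ten strings. Restricting decoding to known identifiers, for example through the `prefix_allowed_tokens_fn` argument of `generate`, is left out of this sketch.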

Intended use

Research and reproducibility. Please cite the DDRO paper if you use this checkpoint.

Model details

  • Weights format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32