gonzalez-agirre committed
Commit 0abf234 · verified · 1 Parent(s): 6330c67

update readme with gen. params. (#7)

- update readme and generation_config.json with sampling params (f5e42c398db499a7d4e780f7ef2677009b129d6c)
- add sampling params recommendation in usage section (8de97bb1bb0b1f1480232db7c10ae1ec1f4e2ff5)

Files changed (2):
  1. README.md +4 -0
  2. generation_config.json +1 -1
README.md CHANGED
@@ -30,6 +30,9 @@ base_model:
 > [!WARNING]
 > **WARNING:** ALIA-40b-Instruct is an instruction-tuned model with a preliminary alignment process. It has not yet undergone a full alignment procedure to ensure safety. The model may generate biased, factually incorrect, harmful, or inappropriate content. Users should **refer to the Limitations section** and apply additional filtering and alignment processes before deploying this model in production.
 
+> [!NOTE]
+> **Sampling Parameters:** For optimal performance, we recommend temperatures close to zero (0-0.2). We also advise against any form of repetition penalty, as in our experience [it degrades instructed models' responses](https://www.reddit.com/r/LocalLLaMA/comments/1g383mq/repetition_penalties_are_terribly_implemented_a/).
+
 # ALIA-40b-instruct Model Card
 
 The ALIA-40b-instruct model is an instructed variant of a context-extended [base ALIA-40b model](https://huggingface.co/BSC-LT/ALIA-40b), which was pre-trained from scratch on 9.83 trillion tokens of carefully curated data spanning 35 European languages (including code). This instructed version is optimized to follow user prompts and engage in dialogue. It supports a broad range of languages (e.g. Spanish, Catalan, Basque, English) and is capable of text generation, translation, summarization, and question answering in these languages. This version has also gone through a preliminary alignment phase for helpfulness and safety with synthetically generated preference pairs.
@@ -216,6 +219,7 @@ At what temperature does water boil?<|im_end|>
 <|im_start|>assistant
 Water turns into vapor at 100°C.<|im_end|>
 ```
+Loading the model with transformers' `AutoModelForCausalLM` picks up suitable sampling parameters from the shipped `generation_config.json`. If you use an alternative inference library such as vLLM, Ollama, or SGLang, verify the sampling parameters yourself: for optimal results, use **temperatures around 0-0.2** and do not apply any form of repetition penalty.
 
 ---
 
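The note added above recommends temperatures in the 0-0.2 range and no repetition penalty. A minimal sketch of `generate()` keyword arguments consistent with that advice (the helper name and range check are illustrative, not part of the repo):

```python
# Illustrative helper (not from the ALIA repo): build generate() kwargs
# following the model card's advice of temperature 0-0.2 and no
# repetition penalty.

def recommended_generation_kwargs(temperature: float = 0.2) -> dict:
    if not 0.0 <= temperature <= 0.2:
        raise ValueError("recommended temperature range is 0-0.2")
    kwargs = {
        "do_sample": temperature > 0.0,  # temperature 0 -> greedy decoding
        "repetition_penalty": 1.0,       # 1.0 disables the penalty
    }
    if temperature > 0.0:
        kwargs["temperature"] = temperature
    return kwargs
```

Passing these as `model.generate(**recommended_generation_kwargs(0.2), ...)` keeps the settings explicit regardless of what a serving stack defaults to; vLLM, Ollama, and SGLang expose equivalent knobs under their own parameter names.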
generation_config.json CHANGED
@@ -1,7 +1,7 @@
 {
 "_from_model_config": true,
 "bos_token_id": 1,
-"do_sample": true,
+"do_sample": false,
 "eos_token_id": [
 2,
 5
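Flipping `do_sample` to `false` means the shipped defaults now produce greedy decoding unless the caller passes sampling arguments explicitly. A small sketch, assuming only the fields visible in the diff (the `eos_token_id` list is truncated there, so the fragment is illustrative):

```python
import json

# Fragment mirroring the fields shown in the generation_config.json diff;
# the real file may contain more keys and eos token ids.
new_config = json.loads("""{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": false,
  "eos_token_id": [2, 5]
}""")

# With do_sample=false, transformers falls back to greedy decoding,
# matching the near-zero-temperature recommendation in the README.
assert new_config["do_sample"] is False
```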