Remove deprecated Cognitive Bias

#8
Files changed (1) hide show
  1. README.md +1 -16
README.md CHANGED
@@ -943,22 +943,7 @@ The ALIA-40b-instruct model is an instruction-tuned variant with preliminary ali
943
 
944
  ### Bias and Harm:
945
 
946
- Following [Mina et al. (2025)](https://aclanthology.org/2025.coling-main.120/), we examine the models’ robustness against cognitive biases focusing on positional effects and majority class bias. On the one hand, we measure majority class bias with a 4-shot binary classification experiment using the [SST-2](https://huggingface.co/datasets/BSC-LT/cobie_sst2) dataset (Socher et al., 2013). As detailed in the following table, we observe significant effects with a moderate effect size.
947
-
948
- | **Bias** | **Task** | **Cramér’s V coefficient** |
949
- | --- | --- | --- |
950
- | **Majority Class** | **SST-2** | 0.39 |
951
-
952
- On the other hand, for positional effects, we evaluate primacy and recency biases in 0-shot settings leveraging the [ARC](https://huggingface.co/datasets/BSC-LT/cobie_ai2_arc) dataset (Clark et al., 2018). We detect significant, but relatively weak positional effects. This suggests that the model is relatively robust against the examined cognitive biases.
953
-
954
- | **Bias** | **Task** | **φ coefficient** |
955
- | --- | --- | --- |
956
- | **Primacy** | **ARC-Easy** | 0.10 |
957
- | | **ARC-Challenge** | 0.11 |
958
- | **Recency** | **ARC-Easy** | 0.12 |
959
- | | **ARC-Challenge** | 0.17 |
960
-
961
- In addition, we examine the presence of undesired social biases by measuring the performance and bias scores on the [BBQ](https://huggingface.co/datasets/heegyu/bbq) dataset (Parrish et al., 2022) as well as on their adaptations to the Spanish and Catalan contexts ([EsBBQ](https://huggingface.co/datasets/BSC-LT/EsBBQ) and [CaBBQ](https://huggingface.co/datasets/BSC-LT/CaBBQ), Ruiz-Fernández et al., 2025). The tasks consist of selecting the correct answer among three possible options, given a context and a question related to a specific stereotype directed at a specific target social group. We measure the model’s accuracy on the QA task as well as the bias score, which quantifies the degree to which the model systematically relies on social biases to answer the questions. Note that the bias scores are calculated using the metric originally defined for each respective benchmark.
962
 
963
  Performance is high in disambiguated settings —where the correct answer to the question can be easily gleaned from the context. However, the model tends to fail to choose the correct answer in ambiguous settings —where the correct answer is not provided. Note that the range for the bias score is between -1 and 1; however, all bias scores are positive, which indicates a strong reliance and alignment with social biases to solve the task. This reveals that the model may reflect biases present in its training data and may produce stereotyped, offensive, or harmful content, particularly regarding gender, ethnicity, nationality, and other protected attributes.
964
 
 
943
 
944
  ### Bias and Harm:
945
 
946
+ We examine the presence of undesired social biases by measuring the performance and bias scores on the [BBQ](https://huggingface.co/datasets/heegyu/bbq) dataset (Parrish et al., 2022) as well as on their adaptations to the Spanish and Catalan contexts ([EsBBQ](https://huggingface.co/datasets/BSC-LT/EsBBQ) and [CaBBQ](https://huggingface.co/datasets/BSC-LT/CaBBQ), Ruiz-Fernández et al., 2025). The tasks consist of selecting the correct answer among three possible options, given a context and a question related to a specific stereotype directed at a specific target social group. We measure the model’s accuracy on the QA task as well as the bias score, which quantifies the degree to which the model systematically relies on social biases to answer the questions. Note that the bias scores are calculated using the metric originally defined for each respective benchmark.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
947
 
948
  Performance is high in disambiguated settings —where the correct answer to the question can be easily gleaned from the context. However, the model tends to fail to choose the correct answer in ambiguous settings —where the correct answer is not provided. Note that the range for the bias score is between -1 and 1; however, all bias scores are positive, which indicates a strong reliance and alignment with social biases to solve the task. This reveals that the model may reflect biases present in its training data and may produce stereotyped, offensive, or harmful content, particularly regarding gender, ethnicity, nationality, and other protected attributes.
949