### How to run

* Install the libraries using the cell below (`grazie-api-gateway-client` requires adding a custom JetBrains repository first)
* Put the production prompt in the file `data/prod_prompt.txt`
* Environment variables:
    - `GRAZIE_API_JWT_TOKEN` -- JWT token for grazie (check `api_wrappers/grazie_wrapper.py` to adjust the client initialization if necessary)
    - `HF_TOKEN` -- should _not_ be required; however, if it is, set it to a valid Hugging Face token
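
Before running the cells, it can help to confirm that the variables are visible to the kernel. The following is a minimal sketch that assumes only the variable names listed above; the helper `missing_env_vars` is hypothetical and not part of this repository.

```python
import os

def missing_env_vars(required, environ=None):
    """Return the names from `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]

# GRAZIE_API_JWT_TOKEN is required; HF_TOKEN is normally optional, so it is not checked here.
missing = missing_env_vars(["GRAZIE_API_JWT_TOKEN"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.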

In [1]:
# !pip install grazie-api-gateway-client
# !pip install tqdm
# !pip install pandas
# !pip install datasets

In [2]:
from api_wrappers.grazie_wrapper import generate_for_prompt
from api_wrappers.hf_data_loader import load_full_commit_with_predictions_as_pandas
from tqdm import tqdm

tqdm.pandas()

In [3]:
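# Load the production prompt template once.
with open("data/prod_prompt.txt") as f:
    PROD_PROMPT = f.read().strip()

def prod_prompt(diff):
    # Clear `$text` before inserting the diff so a literal "$text" inside the diff is kept.
    return PROD_PROMPT.replace("$text", "").replace("$diff", diff)

def generate_commit_message_prod(diff):
    # Query the model with the filled-in production prompt.
    return generate_for_prompt(prod_prompt(diff))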
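
To illustrate the placeholder substitution, here is a small sketch with a made-up template (the real content of `data/prod_prompt.txt` is not shown in this notebook). Only the placeholder names `$diff` and `$text` come from the cell above; `EXAMPLE_PROMPT` and `fill_prompt` are hypothetical. The sketch clears `$text` before inserting the diff, so a literal `$text` inside a diff is left intact.

```python
# Hypothetical template; the real one lives in data/prod_prompt.txt.
EXAMPLE_PROMPT = "Write a commit message.\n$text\nDiff:\n$diff"

def fill_prompt(template, diff):
    """Blank out `$text` and fill `$diff` with the given diff."""
    # Clear `$text` first so a literal "$text" inside the diff survives.
    return template.replace("$text", "").replace("$diff", diff)

print(fill_prompt(EXAMPLE_PROMPT, "+ print('$text')"))
```

The order of the two `replace` calls matters: inserting the diff first and clearing `$text` afterwards would also strip `$text` from the diff itself.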

In [4]:
generate_commit_message_prod("TEST")

"Certainly! I'll need to see the specific code differences (diffs) you would like to have summarized into a commit message. Please provide the diffs so I can assist you properly."

In [5]:
DATA = load_full_commit_with_predictions_as_pandas()[["mods", "prediction"]].rename(columns={"mods": "diff", "prediction": "prediction_current"})
DATA.head()

Using the latest cached version of the dataset since JetBrains-Research/lca-commit-message-generation couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'commitchronicle-py-long' at cache\JetBrains-Research___lca-commit-message-generation\commitchronicle-py-long\0.0.0\58dcef83a63cccebacd3e786afd73181cc9175e5 (last modified on Sun Apr  7 11:16:22 2024).
Using the latest cached version of the dataset since JetBrains-Research/lca-results couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'cmg_gpt_4_0613' at cache\JetBrains-Research___lca-results\cmg_gpt_4_0613\0.0.0\4b56bbf7243da371b3e0a42a0c9db1f37af98c39 (last modified on Fri May 31 16:00:33 2024).


|   | diff | prediction_current |
|---|------|--------------------|
| 0 | [{'change_type': 'MODIFY', 'old_path': 'cupy/c... | Extend memory management to consider CUDA stre... |
| 1 | [{'change_type': 'MODIFY', 'old_path': 'tests/... | Implement utility methods for parameterized te... |
| 2 | [{'change_type': 'MODIFY', 'old_path': 'numpy/... | Update numpy function imports to use numpy as ... |
| 3 | [{'change_type': 'MODIFY', 'old_path': 'numpy/... | Switch to using internal implementation method... |
| 4 | [{'change_type': 'MODIFY', 'old_path': 'numpy/... | Add type hints and refine array API wrappers\n... |


In [6]:
DATA["prediction_prod"] = DATA.progress_apply(lambda row: generate_commit_message_prod(str(row["diff"])), axis=1)

100%|██████████| 163/163 [11:58<00:00,  4.41s/it]


In [7]:
current_avg_length = DATA["prediction_current"].str.len().mean()
print(f"Current average length: {current_avg_length}")

Current average length: 625.5644171779142


In [8]:
prod_avg_length = DATA["prediction_prod"].str.len().mean()
print(f"Prod average length: {prod_avg_length}")

Prod average length: 352.88957055214723


In [9]:
print(f"Length ratio (current / prod): {current_avg_length / prod_avg_length}")

Length ratio (current / prod): 1.772691712591923
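
A mean can be skewed by a few very long messages, so it may be worth looking at the median as well. The sketch below uses only the standard library and made-up strings rather than the dataset; `length_stats` is a hypothetical helper, not something defined in this notebook.

```python
from statistics import mean, median

def length_stats(messages):
    """Return (mean, median) character length of a list of strings."""
    lengths = [len(m) for m in messages]
    return mean(lengths), median(lengths)

# Made-up messages for illustration only; not taken from the dataset.
current = ["Add feature X\n\nLong explanation of the change.", "Fix bug"]
prod = ["Add feature X", "Fix bug"]

cur_mean, cur_median = length_stats(current)
prod_mean, prod_median = length_stats(prod)
print(f"Mean ratio (current / prod): {cur_mean / prod_mean:.2f}")
print(f"Median ratio (current / prod): {cur_median / prod_median:.2f}")
```

With the notebook's data, the same comparison could be made by passing `DATA["prediction_current"].tolist()` and `DATA["prediction_prod"].tolist()` to the helper.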
