4b392457a071185629b43d746f4d57bb

This model is a fine-tuned version of albert/albert-large-v1 on the nyu-mll/glue dataset. Its evaluation-set results (validation loss, MSE, MAE, and R2) are reported per epoch in the training results table below.

Model description

More information needed


The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
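
As a rough sketch, the hyperparameters above can be written as keyword arguments for `transformers.TrainingArguments`. The mapping assumes that `train_batch_size` and `eval_batch_size` in the list are per-device values and that gradient accumulation was 1 (the card does not list it). The helper `total_batch_size` is a hypothetical name used only for illustration.

```python
# Hyperparameters from the card, keyed by the matching
# transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumed to be the card's train_batch_size
    "per_device_eval_batch_size": 8,   # assumed to be the card's eval_batch_size
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# The device count comes from the launcher (multi-GPU run), not from TrainingArguments.
NUM_DEVICES = 4


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per step across all devices (hypothetical helper)."""
    return per_device * num_devices * grad_accum_steps


total_train = total_batch_size(training_kwargs["per_device_train_batch_size"], NUM_DEVICES)
total_eval = total_batch_size(training_kwargs["per_device_eval_batch_size"], NUM_DEVICES)
print(total_train, total_eval)
```

Under these assumptions both totals agree with the `total_train_batch_size` and `total_eval_batch_size` values listed above. To use the settings, the dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, with an output directory of your choosing.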

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	9.0016	0	1.3998	9.0029	2.5995	-3.0273
No log	1	179	2.7978	0.0078	1.8172	2.7988	1.4154	-0.2520
No log	2	358	3.0951	0.0156	1.6307	3.0957	1.4280	-0.3848
No log	3	537	2.5670	0.0312	1.8107	2.5676	1.3258	-0.1486
No log	4	716	2.5547	0.0625	2.1322	2.5552	1.3305	-0.1430
No log	5	895	2.3377	0.125	2.7865	2.3385	1.3018	-0.0461
0.1464	6	1074	2.1885	0.25	3.9335	2.1893	1.2471	0.0207
2.2043	7	1253	2.2435	0.5	6.4150	2.2443	1.2869	-0.0040
2.1685	8	1432	2.5172	1.0	11.5412	2.5179	1.3199	-0.1264
2.2231	9	1611	2.3999	1.0	11.4455	2.4007	1.3012	-0.0739
2.2465	10	1790	2.3139	1.0	11.6275	2.3147	1.2901	-0.0354
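
To help read the Mse, Mae, and R2 columns, here is a minimal sketch of the standard definitions of these regression metrics. The card does not show the evaluation code, so the function `regression_metrics` and the toy labels are assumptions for illustration only. The last lines apply the identity R2 = 1 - MSE / Var(y) to the first and last table rows to check that the columns are mutually consistent.

```python
def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error, and R2 (standard definitions)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy data (made up): always predicting the label mean gives R2 = 0.
labels = [1.0, 2.0, 3.0, 4.0]
mean_prediction = [2.5, 2.5, 2.5, 2.5]
metrics = regression_metrics(labels, mean_prediction)
print(metrics)

# Label variance implied by the table rows, via Var(y) = MSE / (1 - R2).
implied_var_epoch0 = 9.0029 / (1 - (-3.0273))
implied_var_epoch10 = 2.3147 / (1 - (-0.0354))
print(implied_var_epoch0, implied_var_epoch10)
```

Both implied variances come out near 2.24, so the Mse and R2 columns agree with each other under this definition. An R2 close to zero, as in the later rows, means the predictions are roughly as accurate as always predicting the mean label. A negative R2 means they are worse than that baseline.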

Safetensors model size: 17.7M params, tensor type F32
