End of training

Files changed (6)

README.md +21 -16
logs/learning_rate=1e-05, per_device_train_batch_size=4, warmup_ratio=0.5/events.out.tfevents.1724121660.5f530b1cf724 +3 -0
logs/learning_rate=1e-05, per_device_train_batch_size=4, warmup_ratio=0.5/events.out.tfevents.1724124987.5f530b1cf724 +3 -0
logs/learning_rate=4e-05, per_device_train_batch_size=8, warmup_ratio=0.5/completed.flag +0 -0
model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 It achieves the following results on the evaluation set:
-- eval_enwikippl: 173.0
-- eval_frwikippl: 624.0
-- eval_zhwikippl: 160.0
-- eval_tinystoriesppl: 145.0
-- eval_loss: 1.1443
-- eval_runtime: 12.6089
-- eval_samples_per_second: 47.585
-- eval_steps_per_second: 11.896
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -48,8 +48,8 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate: 4e-05
-- train_batch_size: 8
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
@@ -64,12 +64,17 @@ Peak GPU Memory: 7.9381 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** |  | 43.75 | 61.75 |  |  |  |  | 11.8125 | 19.125 |
-| 0 | 0 | 1176821039104.0 | 72567767433216.0 | 20.1450 | 12.5909 | 47.653 | 11.913 | 3019898880.0 | 12713103196160.0 |
-| 1500 | 0.2020 | 864.0 | 4992.0 | 2.2099 | 12.5621 | 47.763 | 11.941 | 556.0 | 6784.0 |
-| 3000 | 0.4040 | 370.0 | 1720.0 | 1.6174 | 12.5358 | 47.863 | 11.966 | 270.0 | 286.0 |
-| 4500 | 0.6061 | 216.0 | 808.0 | 1.2965 | 12.5792 | 47.698 | 11.924 | 174.0 | 202.0 |
-| 6000 | 0.8081 | 178.0 | 676.0 | 1.1639 | 12.4818 | 48.07 | 12.017 | 149.0 | 162.0 |
-| 7425 | 1.0 | 173.0 | 624.0 | 1.1443 | 12.6089 | 47.585 | 11.896 | 145.0 | 160.0 |
 ### Framework versions
 - Distily 0.2.0

 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 It achieves the following results on the evaluation set:
+- eval_enwikippl: 249.0
+- eval_frwikippl: 988.0
+- eval_zhwikippl: 222.0
+- eval_tinystoriesppl: 194.0
+- eval_loss: 1.4817
+- eval_runtime: 12.5664
+- eval_samples_per_second: 47.746
+- eval_steps_per_second: 11.937
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
+- learning_rate: 1e-05
+- train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** |  | 43.75 | 61.75 |  |  |  |  | 11.8125 | 19.125 |
+| 0 | 0 | 1176821039104.0 | 72567767433216.0 | 20.1450 | 12.5803 | 47.694 | 11.923 | 3019898880.0 | 12713103196160.0 |
+| 1500 | 0.1010 | 21888.0 | 301056.0 | 4.5299 | 12.4618 | 48.147 | 12.037 | 8064.0 | 1056768.0 |
+| 3000 | 0.2020 | 1872.0 | 11648.0 | 2.7341 | 12.4652 | 48.134 | 12.034 | 1128.0 | 119296.0 |
+| 4500 | 0.3030 | 684.0 | 4480.0 | 2.1008 | 12.4698 | 48.116 | 12.029 | 468.0 | 4992.0 |
+| 6000 | 0.4040 | 404.0 | 2160.0 | 1.7866 | 12.4916 | 48.032 | 12.008 | 312.0 | 486.0 |
+| 7500 | 0.5051 | 300.0 | 1296.0 | 1.5620 | 12.5682 | 47.74 | 11.935 | 226.0 | 246.0 |
+| 9000 | 0.6061 | 249.0 | 988.0 | 1.4817 | 12.5664 | 47.746 | 11.937 | 194.0 | 222.0 |
+| 10500 | 0.7071 | 228.0 | 884.0 | 1.3820 | 12.5815 | 47.689 | 11.922 | 179.0 | 193.0 |
+| 12000 | 0.8081 | 220.0 | 892.0 | 1.3587 | 12.5682 | 47.74 | 11.935 | 177.0 | 170.0 |
+| 13500 | 0.9091 | 216.0 | 812.0 | 1.3531 | 12.5051 | 47.981 | 11.995 | 174.0 | 170.0 |
+| 14850 | 1.0 | 215.0 | 804.0 | 1.3510 | 12.5257 | 47.901 | 11.975 | 173.0 | 168.0 |
 ### Framework versions
 - Distily 0.2.0
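
The step and epoch columns in the two tables above are consistent with a fixed number of training samples per epoch. The sketch below checks this arithmetic. It assumes a single device and no gradient accumulation (neither is stated in this diff), so that samples per epoch equals the final step times `train_batch_size`. The helper names `samples_per_epoch` and `epoch_at` are hypothetical and not part of Distily.

```python
# Final step counts and batch sizes, taken from the two README tables.
runs = {
    "old (lr=4e-05, bs=8)": {"final_step": 7425, "batch_size": 8},
    "new (lr=1e-05, bs=4)": {"final_step": 14850, "batch_size": 4},
}


def samples_per_epoch(final_step, batch_size):
    # Assumption: one device and gradient_accumulation_steps == 1.
    return final_step * batch_size


def epoch_at(step, final_step):
    # Fraction of the epoch completed at a given step, rounded like the table.
    return round(step / final_step, 4)


for name, run in runs.items():
    total = samples_per_epoch(run["final_step"], run["batch_size"])
    print(name, total, epoch_at(1500, run["final_step"]))
```

Under these assumptions both runs see the same number of samples per epoch, which is why halving the batch size doubles the step count and halves the epoch fraction reached at step 1500.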

logs/learning_rate=1e-05, per_device_train_batch_size=4, warmup_ratio=0.5/events.out.tfevents.1724121660.5f530b1cf724 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a64afeb80c6132ad26a0af73f5d4118ac711a5221768184a3de0fbc1c9439eb8
+size 7019562

logs/learning_rate=1e-05, per_device_train_batch_size=4, warmup_ratio=0.5/events.out.tfevents.1724124987.5f530b1cf724 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1804357652e7d650451a22c25520c8650f37b04d77a02d8b195ac2dc2850455a
+size 578

logs/learning_rate=4e-05, per_device_train_batch_size=8, warmup_ratio=0.5/completed.flag ADDED

File without changes

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5947fcd031c5672f831689839d515d6379ade26dcaea708601e75b87f9e4d701
 size 248894656

 version https://git-lfs.github.com/spec/v1
+oid sha256:6a825acf3a43304daa7580ff21a1d75a1f78c7e8a9e127591f5460de2017301f
 size 248894656

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b55a604840cd8f97ad8e84c04212bdd9ee1a4a1ab7072e317003d24156fc046
 size 1017899144

 version https://git-lfs.github.com/spec/v1
+oid sha256:88cc1619933743948cebfea8e08b2429299deea5e848f8ade5a142157b32dffc
 size 1017899144
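
Each Git LFS entry in this diff is a three-line pointer file (`version`, `oid`, `size`) that stands in for the real binary. The following is a minimal sketch of reading such a pointer; it is not part of the commit, and the function name `parse_lfs_pointer` is hypothetical. The sample text is the new `model.safetensors` pointer shown above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6a825acf3a43304daa7580ff21a1d75a1f78c7e8a9e127591f5460de2017301f
size 248894656
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Comparing the parsed `oid` values before and after a commit is one way to confirm that a file's contents changed even when its `size` did not, as is the case for `model.safetensors` and `training_args.bin` here.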