FatihJimale committed on
Commit 0bd97f3 · verified · 1 Parent(s): 4f462dc

Upload 7 files

generation_config.json ADDED
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "4.55.2"
+}
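The generation config above pins GPT-2's single special token (id 50256) as both BOS and EOS. A minimal stdlib sketch of reading those fields, with the JSON reproduced inline rather than loaded from the repository:

```python
import json

# generation_config.json as added in this commit (contents reproduced above).
config = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.55.2"
}
""")

# GPT-2 reuses one token id (50256, "<|endoftext|>") for both roles.
assert config["bos_token_id"] == config["eos_token_id"] == 50256
print(config["transformers_version"])
```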
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}
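Every entry in the special-tokens map points at the same `<|endoftext|>` string, including the pad token (GPT-2 has no dedicated padding token, so `<|endoftext|>` is reused). A quick stdlib check of that invariant:

```python
import json

# special_tokens_map.json as added in this commit (contents reproduced above).
special_tokens = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
""")

# All four special roles collapse onto the single GPT-2 end-of-text token.
assert set(special_tokens.values()) == {"<|endoftext|>"}
```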
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,21 @@
+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "50256": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
+  "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
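In the tokenizer config, `added_tokens_decoder` is keyed by the token id as a *string*, and each value describes how that token is treated (stripping, normalization, special status). A small stdlib sketch of navigating that structure:

```python
import json

# tokenizer_config.json as added in this commit (contents reproduced above).
tokenizer_config = json.loads("""
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 1024,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
""")

# Token ids are string keys in added_tokens_decoder, not integers.
eot = tokenizer_config["added_tokens_decoder"]["50256"]
assert eot["content"] == tokenizer_config["eos_token"] == "<|endoftext|>"
assert eot["special"] is True

# GPT-2's context window, as recorded here.
print(tokenizer_config["model_max_length"])
```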
trainer_state.json ADDED
@@ -0,0 +1,2161 @@
+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 2481,
+  "global_step": 14888,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.006717270101430779,
+      "grad_norm": 4.896524906158447,
+      "learning_rate": 5.480984340044743e-06,
+      "loss": 5.1811,
+      "step": 50
+    },
+    {
+      "epoch": 0.013434540202861557,
+      "grad_norm": 2.687243700027466,
+      "learning_rate": 1.1073825503355706e-05,
+      "loss": 4.4733,
+      "step": 100
+    },
+    {
+      "epoch": 0.020151810304292337,
+      "grad_norm": 1.7079490423202515,
+      "learning_rate": 1.6666666666666667e-05,
+      "loss": 3.8,
+      "step": 150
+    },
+    {
+      "epoch": 0.026869080405723115,
+      "grad_norm": 1.7120026350021362,
+      "learning_rate": 2.225950782997763e-05,
+      "loss": 3.3653,
+      "step": 200
+    },
+    {
+      "epoch": 0.03358635050715389,
+      "grad_norm": 1.4521100521087646,
+      "learning_rate": 2.785234899328859e-05,
+      "loss": 3.1308,
+      "step": 250
+    },
+    {
+      "epoch": 0.040303620608584674,
+      "grad_norm": 1.4319677352905273,
+      "learning_rate": 3.3445190156599555e-05,
+      "loss": 2.9757,
+      "step": 300
+    },
+    {
+      "epoch": 0.04702089071001545,
+      "grad_norm": 1.053918719291687,
+      "learning_rate": 3.903803131991052e-05,
+      "loss": 2.8591,
+      "step": 350
+    },
+    {
+      "epoch": 0.05373816081144623,
+      "grad_norm": 0.9901153445243835,
+      "learning_rate": 4.463087248322148e-05,
+      "loss": 2.7635,
+      "step": 400
+    },
+    {
+      "epoch": 0.06045543091287701,
+      "grad_norm": 1.0325074195861816,
+      "learning_rate": 4.999999763367056e-05,
+      "loss": 2.6867,
+      "step": 450
+    },
+    {
+      "epoch": 0.06717270101430778,
+      "grad_norm": 0.7236190438270569,
+      "learning_rate": 4.999840037833225e-05,
+      "loss": 2.6204,
+      "step": 500
+    },
+    {
+      "epoch": 0.07388997111573857,
+      "grad_norm": 0.7032467126846313,
+      "learning_rate": 4.9993845429571105e-05,
+      "loss": 2.5692,
+      "step": 550
+    },
+    {
+      "epoch": 0.08060724121716935,
+      "grad_norm": 0.5874780416488647,
+      "learning_rate": 4.9986333326307276e-05,
+      "loss": 2.5271,
+      "step": 600
+    },
+    {
+      "epoch": 0.08732451131860013,
+      "grad_norm": 0.5380228161811829,
+      "learning_rate": 4.997586495733758e-05,
+      "loss": 2.498,
+      "step": 650
+    },
+    {
+      "epoch": 0.0940417814200309,
+      "grad_norm": 0.5316600203514099,
+      "learning_rate": 4.996244156123031e-05,
+      "loss": 2.4672,
+      "step": 700
+    },
+    {
+      "epoch": 0.10075905152146168,
+      "grad_norm": 0.4045695662498474,
+      "learning_rate": 4.994606472617869e-05,
+      "loss": 2.4403,
+      "step": 750
+    },
+    {
+      "epoch": 0.10747632162289246,
+      "grad_norm": 0.4827907979488373,
+      "learning_rate": 4.9926736389813e-05,
+      "loss": 2.4176,
+      "step": 800
+    },
+    {
+      "epoch": 0.11419359172432324,
+      "grad_norm": 0.4095781743526459,
+      "learning_rate": 4.99044588389713e-05,
+      "loss": 2.4055,
+      "step": 850
+    },
+    {
+      "epoch": 0.12091086182575402,
+      "grad_norm": 0.3878771662712097,
+      "learning_rate": 4.9879234709428855e-05,
+      "loss": 2.3713,
+      "step": 900
+    },
+    {
+      "epoch": 0.12762813192718478,
+      "grad_norm": 0.3336116671562195,
+      "learning_rate": 4.9851066985586316e-05,
+      "loss": 2.3656,
+      "step": 950
+    },
+    {
+      "epoch": 0.13434540202861556,
+      "grad_norm": 0.3157025873661041,
+      "learning_rate": 4.981995900011657e-05,
+      "loss": 2.3446,
+      "step": 1000
+    },
+    {
+      "epoch": 0.14106267213004636,
+      "grad_norm": 0.305578351020813,
+      "learning_rate": 4.978591443357048e-05,
+      "loss": 2.3291,
+      "step": 1050
+    },
+    {
+      "epoch": 0.14777994223147714,
+      "grad_norm": 0.34396952390670776,
+      "learning_rate": 4.9748937313941414e-05,
+      "loss": 2.3196,
+      "step": 1100
+    },
+    {
+      "epoch": 0.15449721233290792,
+      "grad_norm": 0.32559117674827576,
+      "learning_rate": 4.970903201618863e-05,
+      "loss": 2.3039,
+      "step": 1150
+    },
+    {
+      "epoch": 0.1612144824343387,
+      "grad_norm": 0.28197962045669556,
+      "learning_rate": 4.966620326171969e-05,
+      "loss": 2.2978,
+      "step": 1200
+    },
+    {
+      "epoch": 0.16793175253576947,
+      "grad_norm": 0.3120637536048889,
+      "learning_rate": 4.962045611783186e-05,
+      "loss": 2.2814,
+      "step": 1250
+    },
+    {
+      "epoch": 0.17464902263720025,
+      "grad_norm": 0.31267744302749634,
+      "learning_rate": 4.9571795997112506e-05,
+      "loss": 2.2717,
+      "step": 1300
+    },
+    {
+      "epoch": 0.18136629273863103,
+      "grad_norm": 0.3214496374130249,
+      "learning_rate": 4.9520228656798784e-05,
+      "loss": 2.2573,
+      "step": 1350
+    },
+    {
+      "epoch": 0.1880835628400618,
+      "grad_norm": 0.36649203300476074,
+      "learning_rate": 4.946576019809639e-05,
+      "loss": 2.246,
+      "step": 1400
+    },
+    {
+      "epoch": 0.19480083294149259,
+      "grad_norm": 0.31962043046951294,
+      "learning_rate": 4.940839706545777e-05,
+      "loss": 2.2388,
+      "step": 1450
+    },
+    {
+      "epoch": 0.20151810304292336,
+      "grad_norm": 0.395917683839798,
+      "learning_rate": 4.9348146045819585e-05,
+      "loss": 2.2331,
+      "step": 1500
+    },
+    {
+      "epoch": 0.20823537314435414,
+      "grad_norm": 0.3968123495578766,
+      "learning_rate": 4.928501426779974e-05,
+      "loss": 2.2212,
+      "step": 1550
+    },
+    {
+      "epoch": 0.21495264324578492,
+      "grad_norm": 0.3795957863330841,
+      "learning_rate": 4.921900920085394e-05,
+      "loss": 2.2153,
+      "step": 1600
+    },
+    {
+      "epoch": 0.2216699133472157,
+      "grad_norm": 0.39966660737991333,
+      "learning_rate": 4.915013865439197e-05,
+      "loss": 2.2015,
+      "step": 1650
+    },
+    {
+      "epoch": 0.22838718344864647,
+      "grad_norm": 0.3845181465148926,
+      "learning_rate": 4.907841077685372e-05,
+      "loss": 2.204,
+      "step": 1700
+    },
+    {
+      "epoch": 0.23510445355007725,
+      "grad_norm": 0.48438650369644165,
+      "learning_rate": 4.900383405474503e-05,
+      "loss": 2.1953,
+      "step": 1750
+    },
+    {
+      "epoch": 0.24182172365150803,
+      "grad_norm": 0.4857676923274994,
+      "learning_rate": 4.892641731163372e-05,
+      "loss": 2.1815,
+      "step": 1800
+    },
+    {
+      "epoch": 0.2485389937529388,
+      "grad_norm": 0.6407597661018372,
+      "learning_rate": 4.8846169707105525e-05,
+      "loss": 2.1755,
+      "step": 1850
+    },
+    {
+      "epoch": 0.25525626385436956,
+      "grad_norm": 0.5669896006584167,
+      "learning_rate": 4.8763100735680445e-05,
+      "loss": 2.1698,
+      "step": 1900
+    },
+    {
+      "epoch": 0.26197353395580036,
+      "grad_norm": 0.5839398503303528,
+      "learning_rate": 4.867722022568936e-05,
+      "loss": 2.165,
+      "step": 1950
+    },
+    {
+      "epoch": 0.2686908040572311,
+      "grad_norm": 0.6011441946029663,
+      "learning_rate": 4.858853833811119e-05,
+      "loss": 2.1574,
+      "step": 2000
+    },
+    {
+      "epoch": 0.2754080741586619,
+      "grad_norm": 0.562146008014679,
+      "learning_rate": 4.849706556537074e-05,
+      "loss": 2.146,
+      "step": 2050
+    },
+    {
+      "epoch": 0.2821253442600927,
+      "grad_norm": 0.5889083743095398,
+      "learning_rate": 4.840281273009719e-05,
+      "loss": 2.1465,
+      "step": 2100
+    },
+    {
+      "epoch": 0.2888426143615235,
+      "grad_norm": 0.6204415559768677,
+      "learning_rate": 4.8305790983843744e-05,
+      "loss": 2.1457,
+      "step": 2150
+    },
+    {
+      "epoch": 0.2955598844629543,
+      "grad_norm": 0.6884430050849915,
+      "learning_rate": 4.820601180576811e-05,
+      "loss": 2.1367,
+      "step": 2200
+    },
+    {
+      "epoch": 0.30227715456438503,
+      "grad_norm": 0.6785121560096741,
+      "learning_rate": 4.810348700127441e-05,
+      "loss": 2.1377,
+      "step": 2250
+    },
+    {
+      "epoch": 0.30899442466581584,
+      "grad_norm": 0.667234480381012,
+      "learning_rate": 4.7998228700616384e-05,
+      "loss": 2.1279,
+      "step": 2300
+    },
+    {
+      "epoch": 0.3157116947672466,
+      "grad_norm": 0.5685933232307434,
+      "learning_rate": 4.789024935746223e-05,
+      "loss": 2.1165,
+      "step": 2350
+    },
+    {
+      "epoch": 0.3224289648686774,
+      "grad_norm": 0.7456786036491394,
+      "learning_rate": 4.7779561747421106e-05,
+      "loss": 2.1148,
+      "step": 2400
+    },
+    {
+      "epoch": 0.32914623497010814,
+      "grad_norm": 0.7704935669898987,
+      "learning_rate": 4.766617896653162e-05,
+      "loss": 2.1035,
+      "step": 2450
+    },
+    {
+      "epoch": 0.33331094243299525,
+      "eval_loss": 2.031871795654297,
+      "eval_runtime": 1653.4784,
+      "eval_samples_per_second": 72.217,
+      "eval_steps_per_second": 9.028,
+      "step": 2481
+    },
+    {
+      "epoch": 0.33586350507153895,
+      "grad_norm": 0.6733186841011047,
+      "learning_rate": 4.755011442971233e-05,
+      "loss": 2.1033,
+      "step": 2500
+    },
+    {
+      "epoch": 0.3425807751729697,
+      "grad_norm": 0.7341210246086121,
+      "learning_rate": 4.7431381869174574e-05,
+      "loss": 2.1,
+      "step": 2550
+    },
+    {
+      "epoch": 0.3492980452744005,
+      "grad_norm": 0.7517989277839661,
+      "learning_rate": 4.730999533279775e-05,
+      "loss": 2.0965,
+      "step": 2600
+    },
+    {
+      "epoch": 0.35601531537583125,
+      "grad_norm": 0.7430043816566467,
+      "learning_rate": 4.71859691824672e-05,
+      "loss": 2.0893,
+      "step": 2650
+    },
+    {
+      "epoch": 0.36273258547726206,
+      "grad_norm": 0.7112200856208801,
+      "learning_rate": 4.7059318092375016e-05,
+      "loss": 2.0856,
+      "step": 2700
+    },
+    {
+      "epoch": 0.3694498555786928,
+      "grad_norm": 0.8947717547416687,
+      "learning_rate": 4.693005704728384e-05,
+      "loss": 2.0832,
+      "step": 2750
+    },
+    {
+      "epoch": 0.3761671256801236,
+      "grad_norm": 0.8591889142990112,
+      "learning_rate": 4.679820134075395e-05,
+      "loss": 2.0761,
+      "step": 2800
+    },
+    {
+      "epoch": 0.38288439578155437,
+      "grad_norm": 0.933829665184021,
+      "learning_rate": 4.666376657333379e-05,
+      "loss": 2.0781,
+      "step": 2850
+    },
+    {
+      "epoch": 0.38960166588298517,
+      "grad_norm": 0.8115527629852295,
+      "learning_rate": 4.652676865071417e-05,
+      "loss": 2.0758,
+      "step": 2900
+    },
+    {
+      "epoch": 0.3963189359844159,
+      "grad_norm": 0.7666231989860535,
+      "learning_rate": 4.638722378184641e-05,
+      "loss": 2.0693,
+      "step": 2950
+    },
+    {
+      "epoch": 0.4030362060858467,
+      "grad_norm": 0.9425594806671143,
+      "learning_rate": 4.624514847702454e-05,
+      "loss": 2.0592,
+      "step": 3000
+    },
+    {
+      "epoch": 0.4097534761872775,
+      "grad_norm": 0.8913134336471558,
+      "learning_rate": 4.610055954593192e-05,
+      "loss": 2.0595,
+      "step": 3050
+    },
+    {
+      "epoch": 0.4164707462887083,
+      "grad_norm": 0.9630874991416931,
+      "learning_rate": 4.595347409565237e-05,
+      "loss": 2.051,
+      "step": 3100
+    },
+    {
+      "epoch": 0.42318801639013903,
+      "grad_norm": 0.7468298673629761,
+      "learning_rate": 4.5803909528646125e-05,
+      "loss": 2.0541,
+      "step": 3150
+    },
+    {
+      "epoch": 0.42990528649156984,
+      "grad_norm": 0.9688680768013,
+      "learning_rate": 4.565188354069091e-05,
+      "loss": 2.0416,
+      "step": 3200
+    },
+    {
+      "epoch": 0.4366225565930006,
+      "grad_norm": 0.9443581700325012,
+      "learning_rate": 4.549741411878819e-05,
+      "loss": 2.0356,
+      "step": 3250
+    },
+    {
+      "epoch": 0.4433398266944314,
+      "grad_norm": 0.8630342483520508,
+      "learning_rate": 4.534051953903511e-05,
+      "loss": 2.0455,
+      "step": 3300
+    },
+    {
+      "epoch": 0.45005709679586214,
+      "grad_norm": 0.8253282904624939,
+      "learning_rate": 4.518121836446206e-05,
+      "loss": 2.0378,
+      "step": 3350
+    },
+    {
+      "epoch": 0.45677436689729295,
+      "grad_norm": 0.8722043633460999,
+      "learning_rate": 4.501952944283647e-05,
+      "loss": 2.0345,
+      "step": 3400
+    },
+    {
+      "epoch": 0.4634916369987237,
+      "grad_norm": 0.833014965057373,
+      "learning_rate": 4.4855471904432804e-05,
+      "loss": 2.0393,
+      "step": 3450
+    },
+    {
+      "epoch": 0.4702089071001545,
+      "grad_norm": 0.8858036994934082,
+      "learning_rate": 4.468906515976912e-05,
+      "loss": 2.0342,
+      "step": 3500
+    },
+    {
+      "epoch": 0.47692617720158526,
+      "grad_norm": 0.7766168713569641,
+      "learning_rate": 4.452032889731056e-05,
+      "loss": 2.0255,
+      "step": 3550
+    },
+    {
+      "epoch": 0.48364344730301606,
+      "grad_norm": 1.0143591165542603,
+      "learning_rate": 4.434928308113986e-05,
+      "loss": 2.0183,
+      "step": 3600
+    },
+    {
+      "epoch": 0.4903607174044468,
+      "grad_norm": 0.8125320672988892,
+      "learning_rate": 4.417594794859533e-05,
+      "loss": 2.0202,
+      "step": 3650
+    },
+    {
+      "epoch": 0.4970779875058776,
+      "grad_norm": 0.7722158432006836,
+      "learning_rate": 4.4000344007876444e-05,
+      "loss": 2.0114,
+      "step": 3700
+    },
+    {
+      "epoch": 0.5037952576073084,
+      "grad_norm": 0.9473446607589722,
+      "learning_rate": 4.3822492035617404e-05,
+      "loss": 2.0125,
+      "step": 3750
+    },
+    {
+      "epoch": 0.5105125277087391,
+      "grad_norm": 0.854096531867981,
+      "learning_rate": 4.3642413074428964e-05,
+      "loss": 2.0131,
+      "step": 3800
+    },
+    {
+      "epoch": 0.51722979781017,
+      "grad_norm": 0.9034811854362488,
+      "learning_rate": 4.346012843040877e-05,
+      "loss": 2.0056,
+      "step": 3850
+    },
+    {
+      "epoch": 0.5239470679116007,
+      "grad_norm": 0.8732491135597229,
+      "learning_rate": 4.327565967062048e-05,
+      "loss": 2.0074,
+      "step": 3900
+    },
+    {
+      "epoch": 0.5306643380130315,
+      "grad_norm": 1.0670524835586548,
+      "learning_rate": 4.3089028620542094e-05,
+      "loss": 2.0069,
+      "step": 3950
+    },
+    {
+      "epoch": 0.5373816081144622,
+      "grad_norm": 0.8823853135108948,
+      "learning_rate": 4.2900257361483666e-05,
+      "loss": 2.0071,
+      "step": 4000
+    },
+    {
+      "epoch": 0.5440988782158931,
+      "grad_norm": 0.9907485842704773,
+      "learning_rate": 4.2709368227974724e-05,
+      "loss": 2.0083,
+      "step": 4050
+    },
+    {
+      "epoch": 0.5508161483173238,
+      "grad_norm": 1.0381219387054443,
+      "learning_rate": 4.251638380512174e-05,
+      "loss": 2.0028,
+      "step": 4100
+    },
+    {
+      "epoch": 0.5575334184187546,
+      "grad_norm": 0.8755475878715515,
+      "learning_rate": 4.232132692593602e-05,
+      "loss": 1.9974,
+      "step": 4150
+    },
+    {
+      "epoch": 0.5642506885201855,
+      "grad_norm": 1.062873125076294,
+      "learning_rate": 4.212422066863218e-05,
+      "loss": 1.9859,
+      "step": 4200
+    },
+    {
+      "epoch": 0.5709679586216162,
+      "grad_norm": 0.9898818731307983,
+      "learning_rate": 4.19250883538976e-05,
+      "loss": 1.9922,
+      "step": 4250
+    },
+    {
+      "epoch": 0.577685228723047,
+      "grad_norm": 0.9990688562393188,
+      "learning_rate": 4.172395354213331e-05,
+      "loss": 1.9935,
+      "step": 4300
+    },
+    {
+      "epoch": 0.5844024988244777,
+      "grad_norm": 0.8213431239128113,
+      "learning_rate": 4.152084003066636e-05,
+      "loss": 1.9867,
+      "step": 4350
+    },
+    {
+      "epoch": 0.5911197689259086,
+      "grad_norm": 0.913465142250061,
+      "learning_rate": 4.1315771850934295e-05,
+      "loss": 1.9809,
+      "step": 4400
+    },
+    {
+      "epoch": 0.5978370390273393,
+      "grad_norm": 0.84449303150177,
+      "learning_rate": 4.110877326564179e-05,
+      "loss": 1.9707,
+      "step": 4450
+    },
+    {
+      "epoch": 0.6045543091287701,
+      "grad_norm": 0.9786863327026367,
+      "learning_rate": 4.08998687658901e-05,
+      "loss": 1.9753,
+      "step": 4500
+    },
+    {
+      "epoch": 0.6112715792302008,
+      "grad_norm": 1.060813069343567,
+      "learning_rate": 4.06890830682793e-05,
+      "loss": 1.9804,
+      "step": 4550
+    },
+    {
+      "epoch": 0.6179888493316317,
+      "grad_norm": 0.9997825026512146,
+      "learning_rate": 4.047644111198398e-05,
+      "loss": 1.974,
+      "step": 4600
+    },
+    {
+      "epoch": 0.6247061194330624,
+      "grad_norm": 0.8858147859573364,
+      "learning_rate": 4.026196805580253e-05,
+      "loss": 1.9705,
+      "step": 4650
+    },
+    {
+      "epoch": 0.6314233895344932,
+      "grad_norm": 1.018781065940857,
+      "learning_rate": 4.004568927518054e-05,
+      "loss": 1.9666,
+      "step": 4700
+    },
+    {
+      "epoch": 0.6381406596359239,
+      "grad_norm": 0.9801031947135925,
+      "learning_rate": 3.982763035920836e-05,
+      "loss": 1.9634,
+      "step": 4750
+    },
+    {
+      "epoch": 0.6448579297373548,
+      "grad_norm": 0.9946137070655823,
+      "learning_rate": 3.960781710759365e-05,
+      "loss": 1.9661,
+      "step": 4800
+    },
+    {
+      "epoch": 0.6515751998387855,
+      "grad_norm": 0.9870684146881104,
+      "learning_rate": 3.9386275527608845e-05,
+      "loss": 1.9616,
+      "step": 4850
+    },
+    {
+      "epoch": 0.6582924699402163,
+      "grad_norm": 0.9816562533378601,
+      "learning_rate": 3.916303183101405e-05,
+      "loss": 1.969,
+      "step": 4900
+    },
+    {
+      "epoch": 0.665009740041647,
+      "grad_norm": 0.9809496402740479,
+      "learning_rate": 3.8938112430955834e-05,
+      "loss": 1.964,
+      "step": 4950
+    },
+    {
+      "epoch": 0.6666218848659905,
+      "eval_loss": 1.8941781520843506,
+      "eval_runtime": 1652.0834,
+      "eval_samples_per_second": 72.278,
+      "eval_steps_per_second": 9.035,
+      "step": 4962
+    },
+    {
+      "epoch": 0.6717270101430779,
+      "grad_norm": 0.8745808005332947,
+      "learning_rate": 3.871154393884212e-05,
+      "loss": 1.9555,
+      "step": 5000
+    },
+    {
+      "epoch": 0.6784442802445086,
+      "grad_norm": 0.97087562084198,
+      "learning_rate": 3.848335316119369e-05,
+      "loss": 1.9614,
+      "step": 5050
+    },
+    {
+      "epoch": 0.6851615503459394,
+      "grad_norm": 1.2281529903411865,
+      "learning_rate": 3.825356709647252e-05,
+      "loss": 1.9554,
+      "step": 5100
+    },
+    {
+      "epoch": 0.6918788204473701,
+      "grad_norm": 0.9577364921569824,
+      "learning_rate": 3.802221293188748e-05,
+      "loss": 1.9569,
+      "step": 5150
+    },
+    {
+      "epoch": 0.698596090548801,
+      "grad_norm": 1.0820568799972534,
+      "learning_rate": 3.7789318040177636e-05,
+      "loss": 1.9489,
+      "step": 5200
+    },
+    {
+      "epoch": 0.7053133606502318,
+      "grad_norm": 1.0203666687011719,
+      "learning_rate": 3.7554909976373685e-05,
+      "loss": 1.9484,
+      "step": 5250
+    },
+    {
+      "epoch": 0.7120306307516625,
+      "grad_norm": 0.9432654976844788,
+      "learning_rate": 3.731901647453772e-05,
+      "loss": 1.9529,
+      "step": 5300
+    },
+    {
+      "epoch": 0.7187479008530933,
+      "grad_norm": 1.0531628131866455,
+      "learning_rate": 3.708166544448189e-05,
+      "loss": 1.9436,
+      "step": 5350
+    },
+    {
+      "epoch": 0.7254651709545241,
+      "grad_norm": 0.9235082864761353,
+      "learning_rate": 3.6842884968466276e-05,
+      "loss": 1.9427,
+      "step": 5400
+    },
+    {
+      "epoch": 0.7321824410559549,
+      "grad_norm": 1.1586941480636597,
+      "learning_rate": 3.6602703297876276e-05,
+      "loss": 1.9427,
+      "step": 5450
+    },
+    {
+      "epoch": 0.7388997111573856,
+      "grad_norm": 1.0403027534484863,
+      "learning_rate": 3.636114884988004e-05,
+      "loss": 1.9445,
+      "step": 5500
+    },
+    {
+      "epoch": 0.7456169812588164,
+      "grad_norm": 0.9487152099609375,
+      "learning_rate": 3.611825020406631e-05,
+      "loss": 1.9377,
+      "step": 5550
+    },
+    {
+      "epoch": 0.7523342513602472,
+      "grad_norm": 0.9370021224021912,
+      "learning_rate": 3.5874036099063025e-05,
+      "loss": 1.939,
+      "step": 5600
+    },
+    {
+      "epoch": 0.759051521461678,
+      "grad_norm": 1.0592392683029175,
+      "learning_rate": 3.562853542913706e-05,
+      "loss": 1.9331,
+      "step": 5650
+    },
+    {
+      "epoch": 0.7657687915631087,
+      "grad_norm": 0.9699749946594238,
+      "learning_rate": 3.538177724077562e-05,
+      "loss": 1.9332,
+      "step": 5700
+    },
+    {
+      "epoch": 0.7724860616645395,
+      "grad_norm": 1.1636828184127808,
+      "learning_rate": 3.5133790729249585e-05,
+      "loss": 1.9382,
+      "step": 5750
+    },
+    {
+      "epoch": 0.7792033317659703,
+      "grad_norm": 0.9427869915962219,
+      "learning_rate": 3.488460523515927e-05,
+      "loss": 1.9295,
+      "step": 5800
+    },
+    {
+      "epoch": 0.7859206018674011,
+      "grad_norm": 1.0565423965454102,
+      "learning_rate": 3.4634250240963e-05,
+      "loss": 1.9335,
+      "step": 5850
+    },
+    {
+      "epoch": 0.7926378719688318,
+      "grad_norm": 1.0577119588851929,
+      "learning_rate": 3.4382755367488845e-05,
+      "loss": 1.9296,
+      "step": 5900
+    },
+    {
+      "epoch": 0.7993551420702626,
+      "grad_norm": 1.0106233358383179,
+      "learning_rate": 3.413015037043003e-05,
+      "loss": 1.9314,
+      "step": 5950
+    },
+    {
+      "epoch": 0.8060724121716935,
+      "grad_norm": 1.0226566791534424,
+      "learning_rate": 3.387646513682442e-05,
+      "loss": 1.9265,
+      "step": 6000
+    },
+    {
+      "epoch": 0.8127896822731242,
+      "grad_norm": 0.9026387333869934,
+      "learning_rate": 3.362172968151838e-05,
+      "loss": 1.9248,
+      "step": 6050
+    },
+    {
+      "epoch": 0.819506952374555,
+      "grad_norm": 0.881735622882843,
+      "learning_rate": 3.33659741436156e-05,
+      "loss": 1.9239,
+      "step": 6100
+    },
+    {
+      "epoch": 0.8262242224759858,
+      "grad_norm": 1.0480873584747314,
+      "learning_rate": 3.3109228782911125e-05,
+      "loss": 1.9172,
+      "step": 6150
+    },
+    {
+      "epoch": 0.8329414925774166,
+      "grad_norm": 0.9652755260467529,
+      "learning_rate": 3.2851523976311214e-05,
+      "loss": 1.9212,
+      "step": 6200
+    },
+    {
+      "epoch": 0.8396587626788473,
+      "grad_norm": 0.9710230231285095,
+      "learning_rate": 3.2592890214239254e-05,
+      "loss": 1.9129,
+      "step": 6250
+    },
+    {
+      "epoch": 0.8463760327802781,
+      "grad_norm": 1.0008606910705566,
+      "learning_rate": 3.2333358097028284e-05,
+      "loss": 1.9194,
+      "step": 6300
+    },
+    {
+      "epoch": 0.8530933028817089,
+      "grad_norm": 0.9385756850242615,
+      "learning_rate": 3.207295833130049e-05,
+      "loss": 1.9222,
+      "step": 6350
+    },
+    {
+      "epoch": 0.8598105729831397,
+      "grad_norm": 1.0019196271896362,
+      "learning_rate": 3.18117217263342e-05,
+      "loss": 1.9175,
+      "step": 6400
+    },
+    {
+      "epoch": 0.8665278430845704,
+      "grad_norm": 0.9140141010284424,
+      "learning_rate": 3.154967919041859e-05,
+      "loss": 1.9108,
+      "step": 6450
+    },
+    {
+      "epoch": 0.8732451131860012,
+      "grad_norm": 0.8793635368347168,
+      "learning_rate": 3.128686172719684e-05,
+      "loss": 1.908,
+      "step": 6500
+    },
+    {
+      "epoch": 0.879962383287432,
+      "grad_norm": 1.009559988975525,
+      "learning_rate": 3.102330043199787e-05,
+      "loss": 1.9126,
+      "step": 6550
+    },
+    {
+      "epoch": 0.8866796533888628,
+      "grad_norm": 0.9987242817878723,
+      "learning_rate": 3.07590264881573e-05,
+      "loss": 1.915,
+      "step": 6600
+    },
+    {
+      "epoch": 0.8933969234902935,
+      "grad_norm": 1.0023444890975952,
+      "learning_rate": 3.049407116332802e-05,
+      "loss": 1.9125,
+      "step": 6650
+    },
+    {
+      "epoch": 0.9001141935917243,
+      "grad_norm": 0.9578864574432373,
+      "learning_rate": 3.022846580578071e-05,
+      "loss": 1.9116,
+      "step": 6700
+    },
+    {
+      "epoch": 0.9068314636931551,
+      "grad_norm": 0.9548916220664978,
+      "learning_rate": 2.9962241840694872e-05,
+      "loss": 1.9059,
+      "step": 6750
+    },
+    {
+      "epoch": 0.9135487337945859,
+      "grad_norm": 1.1118357181549072,
+      "learning_rate": 2.9695430766440736e-05,
+      "loss": 1.9027,
+      "step": 6800
+    },
+    {
+      "epoch": 0.9202660038960166,
+      "grad_norm": 1.0644875764846802,
+      "learning_rate": 2.942806415085255e-05,
+      "loss": 1.9093,
+      "step": 6850
+    },
+    {
+      "epoch": 0.9269832739974474,
+      "grad_norm": 0.9042929410934448,
+      "learning_rate": 2.9160173627493603e-05,
+      "loss": 1.9007,
+      "step": 6900
+    },
+    {
+      "epoch": 0.9337005440988783,
+      "grad_norm": 0.9487243890762329,
+      "learning_rate": 2.889179089191349e-05,
+      "loss": 1.9017,
+      "step": 6950
+    },
+    {
+      "epoch": 0.940417814200309,
+      "grad_norm": 1.0207464694976807,
+      "learning_rate": 2.862294769789804e-05,
+      "loss": 1.9048,
+      "step": 7000
+    },
+    {
+      "epoch": 0.9471350843017398,
+      "grad_norm": 0.8947831392288208,
+      "learning_rate": 2.8353675853712365e-05,
+      "loss": 1.8949,
+      "step": 7050
+    },
+    {
+      "epoch": 0.9538523544031705,
+      "grad_norm": 1.0309513807296753,
+      "learning_rate": 2.8084007218337467e-05,
+      "loss": 1.8972,
+      "step": 7100
+    },
+    {
+      "epoch": 0.9605696245046014,
+      "grad_norm": 0.9352649450302124,
+      "learning_rate": 2.7813973697700813e-05,
+      "loss": 1.9065,
+      "step": 7150
+    },
+    {
+      "epoch": 0.9672868946060321,
+      "grad_norm": 0.9510485529899597,
+      "learning_rate": 2.754360724090137e-05,
+      "loss": 1.8943,
+      "step": 7200
+    },
+    {
+      "epoch": 0.9740041647074629,
+      "grad_norm": 0.9971142411231995,
+      "learning_rate": 2.7272939836429563e-05,
+      "loss": 1.8983,
+      "step": 7250
+    },
+    {
+      "epoch": 0.9807214348088936,
+      "grad_norm": 0.9219502806663513,
+      "learning_rate": 2.700200350838253e-05,
+      "loss": 1.9019,
+      "step": 7300
+    },
+    {
+      "epoch": 0.9874387049103245,
+      "grad_norm": 1.0496368408203125,
+      "learning_rate": 2.6730830312675182e-05,
+      "loss": 1.8948,
+      "step": 7350
+    },
+    {
+      "epoch": 0.9941559750117552,
+      "grad_norm": 0.9791837930679321,
+      "learning_rate": 2.6459452333247497e-05,
+      "loss": 1.8907,
+      "step": 7400
+    },
+    {
+      "epoch": 0.9999328272989857,
+      "eval_loss": 1.831755518913269,
+      "eval_runtime": 1652.2475,
+      "eval_samples_per_second": 72.271,
+      "eval_steps_per_second": 9.034,
+      "step": 7443
+    },
+    {
+      "epoch": 1.0008060724121717,
+      "grad_norm": 0.8881465196609497,
+      "learning_rate": 2.618790167826851e-05,
+      "loss": 1.8913,
+      "step": 7450
+    },
+    {
+      "epoch": 1.0075233425136025,
+      "grad_norm": 0.9141615033149719,
+      "learning_rate": 2.5916210476337416e-05,
+      "loss": 1.8922,
+      "step": 7500
+    },
+    {
+      "epoch": 1.0142406126150332,
+      "grad_norm": 0.9290676116943359,
+      "learning_rate": 2.5644410872682262e-05,
+      "loss": 1.8921,
+      "step": 7550
+    },
+    {
+      "epoch": 1.020957882716464,
+      "grad_norm": 0.9989870190620422,
+      "learning_rate": 2.5372535025356674e-05,
+      "loss": 1.8913,
+      "step": 7600
+    },
+    {
+      "epoch": 1.027675152817895,
+      "grad_norm": 0.8889585137367249,
+      "learning_rate": 2.5100615101435078e-05,
+      "loss": 1.8875,
+      "step": 7650
+    },
+    {
+      "epoch": 1.0343924229193255,
+      "grad_norm": 0.906791627407074,
+      "learning_rate": 2.4828683273206837e-05,
1111
+ "loss": 1.8811,
1112
+ "step": 7700
1113
+ },
1114
+ {
1115
+ "epoch": 1.0411096930207564,
1116
+ "grad_norm": 0.933408260345459,
1117
+ "learning_rate": 2.4556771714369775e-05,
1118
+ "loss": 1.8829,
1119
+ "step": 7750
1120
+ },
1121
+ {
1122
+ "epoch": 1.047826963122187,
1123
+ "grad_norm": 0.9643042683601379,
1124
+ "learning_rate": 2.4284912596223532e-05,
1125
+ "loss": 1.8757,
1126
+ "step": 7800
1127
+ },
1128
+ {
1129
+ "epoch": 1.054544233223618,
1130
+ "grad_norm": 1.0680428743362427,
1131
+ "learning_rate": 2.4013138083863217e-05,
1132
+ "loss": 1.8896,
1133
+ "step": 7850
1134
+ },
1135
+ {
1136
+ "epoch": 1.0612615033250488,
1137
+ "grad_norm": 0.9255633354187012,
1138
+ "learning_rate": 2.3741480332373772e-05,
1139
+ "loss": 1.885,
1140
+ "step": 7900
1141
+ },
1142
+ {
1143
+ "epoch": 1.0679787734264794,
1144
+ "grad_norm": 0.9109947681427002,
1145
+ "learning_rate": 2.346997148302555e-05,
1146
+ "loss": 1.8732,
1147
+ "step": 7950
1148
+ },
1149
+ {
1150
+ "epoch": 1.0746960435279103,
1151
+ "grad_norm": 0.9063991904258728,
1152
+ "learning_rate": 2.3198643659471493e-05,
1153
+ "loss": 1.8901,
1154
+ "step": 8000
1155
+ },
1156
+ {
1157
+ "epoch": 1.0814133136293411,
1158
+ "grad_norm": 1.0128731727600098,
1159
+ "learning_rate": 2.2927528963946435e-05,
1160
+ "loss": 1.8882,
1161
+ "step": 8050
1162
+ },
1163
+ {
1164
+ "epoch": 1.0881305837307718,
1165
+ "grad_norm": 0.8854324221611023,
1166
+ "learning_rate": 2.2656659473468877e-05,
1167
+ "loss": 1.8848,
1168
+ "step": 8100
1169
+ },
1170
+ {
1171
+ "epoch": 1.0948478538322026,
1172
+ "grad_norm": 1.0122106075286865,
1173
+ "learning_rate": 2.238606723604583e-05,
1174
+ "loss": 1.8738,
1175
+ "step": 8150
1176
+ },
1177
+ {
1178
+ "epoch": 1.1015651239336335,
1179
+ "grad_norm": 1.0020389556884766,
1180
+ "learning_rate": 2.2115784266881022e-05,
1181
+ "loss": 1.8758,
1182
+ "step": 8200
1183
+ },
1184
+ {
1185
+ "epoch": 1.1082823940350641,
1186
+ "grad_norm": 1.013856291770935,
1187
+ "learning_rate": 2.1845842544587014e-05,
1188
+ "loss": 1.879,
1189
+ "step": 8250
1190
+ },
1191
+ {
1192
+ "epoch": 1.114999664136495,
1193
+ "grad_norm": 1.0340920686721802,
1194
+ "learning_rate": 2.157627400740161e-05,
1195
+ "loss": 1.874,
1196
+ "step": 8300
1197
+ },
1198
+ {
1199
+ "epoch": 1.1217169342379256,
1200
+ "grad_norm": 0.9874864220619202,
1201
+ "learning_rate": 2.1307110549409143e-05,
1202
+ "loss": 1.8789,
1203
+ "step": 8350
1204
+ },
1205
+ {
1206
+ "epoch": 1.1284342043393565,
1207
+ "grad_norm": 0.9384549856185913,
1208
+ "learning_rate": 2.1038384016766856e-05,
1209
+ "loss": 1.8736,
1210
+ "step": 8400
1211
+ },
1212
+ {
1213
+ "epoch": 1.1351514744407873,
1214
+ "grad_norm": 1.1180434226989746,
1215
+ "learning_rate": 2.0770126203937057e-05,
1216
+ "loss": 1.8709,
1217
+ "step": 8450
1218
+ },
1219
+ {
1220
+ "epoch": 1.141868744542218,
1221
+ "grad_norm": 1.0236446857452393,
1222
+ "learning_rate": 2.0502368849925268e-05,
1223
+ "loss": 1.8701,
1224
+ "step": 8500
1225
+ },
1226
+ {
1227
+ "epoch": 1.1485860146436488,
1228
+ "grad_norm": 0.8483227491378784,
1229
+ "learning_rate": 2.0235143634525144e-05,
1230
+ "loss": 1.8744,
1231
+ "step": 8550
1232
+ },
1233
+ {
1234
+ "epoch": 1.1553032847450795,
1235
+ "grad_norm": 1.0728415250778198,
1236
+ "learning_rate": 1.9968482174570154e-05,
1237
+ "loss": 1.8707,
1238
+ "step": 8600
1239
+ },
1240
+ {
1241
+ "epoch": 1.1620205548465103,
1242
+ "grad_norm": 0.9894828796386719,
1243
+ "learning_rate": 1.970241602019288e-05,
1244
+ "loss": 1.8722,
1245
+ "step": 8650
1246
+ },
1247
+ {
1248
+ "epoch": 1.1687378249479412,
1249
+ "grad_norm": 0.9536831378936768,
1250
+ "learning_rate": 1.9436976651092144e-05,
1251
+ "loss": 1.875,
1252
+ "step": 8700
1253
+ },
1254
+ {
1255
+ "epoch": 1.1754550950493718,
1256
+ "grad_norm": 0.8805471658706665,
1257
+ "learning_rate": 1.9172195472808457e-05,
1258
+ "loss": 1.8671,
1259
+ "step": 8750
1260
+ },
1261
+ {
1262
+ "epoch": 1.1821723651508027,
1263
+ "grad_norm": 0.9444993734359741,
1264
+ "learning_rate": 1.890810381300831e-05,
1265
+ "loss": 1.8639,
1266
+ "step": 8800
1267
+ },
1268
+ {
1269
+ "epoch": 1.1888896352522336,
1270
+ "grad_norm": 0.9312206506729126,
1271
+ "learning_rate": 1.8644732917777578e-05,
1272
+ "loss": 1.8704,
1273
+ "step": 8850
1274
+ },
1275
+ {
1276
+ "epoch": 1.1956069053536642,
1277
+ "grad_norm": 0.8733024001121521,
1278
+ "learning_rate": 1.838211394792468e-05,
1279
+ "loss": 1.8661,
1280
+ "step": 8900
1281
+ },
1282
+ {
1283
+ "epoch": 1.202324175455095,
1284
+ "grad_norm": 0.9139480590820312,
1285
+ "learning_rate": 1.812027797529372e-05,
1286
+ "loss": 1.8733,
1287
+ "step": 8950
1288
+ },
1289
+ {
1290
+ "epoch": 1.209041445556526,
1291
+ "grad_norm": 0.8931354880332947,
1292
+ "learning_rate": 1.7859255979088268e-05,
1293
+ "loss": 1.8613,
1294
+ "step": 9000
1295
+ },
1296
+ {
1297
+ "epoch": 1.2157587156579566,
1298
+ "grad_norm": 0.8285694122314453,
1299
+ "learning_rate": 1.7599078842206024e-05,
1300
+ "loss": 1.8596,
1301
+ "step": 9050
1302
+ },
1303
+ {
1304
+ "epoch": 1.2224759857593874,
1305
+ "grad_norm": 0.9247407913208008,
1306
+ "learning_rate": 1.7339777347584896e-05,
1307
+ "loss": 1.8789,
1308
+ "step": 9100
1309
+ },
1310
+ {
1311
+ "epoch": 1.229193255860818,
1312
+ "grad_norm": 1.0588265657424927,
1313
+ "learning_rate": 1.708138217456088e-05,
1314
+ "loss": 1.8627,
1315
+ "step": 9150
1316
+ },
1317
+ {
1318
+ "epoch": 1.235910525962249,
1319
+ "grad_norm": 0.8769893646240234,
1320
+ "learning_rate": 1.6823923895238303e-05,
1321
+ "loss": 1.8663,
1322
+ "step": 9200
1323
+ },
1324
+ {
1325
+ "epoch": 1.2426277960636798,
1326
+ "grad_norm": 0.9970062375068665,
1327
+ "learning_rate": 1.6567432970872587e-05,
1328
+ "loss": 1.8632,
1329
+ "step": 9250
1330
+ },
1331
+ {
1332
+ "epoch": 1.2493450661651104,
1333
+ "grad_norm": 0.9387636780738831,
1334
+ "learning_rate": 1.6311939748266282e-05,
1335
+ "loss": 1.8658,
1336
+ "step": 9300
1337
+ },
1338
+ {
1339
+ "epoch": 1.2560623362665413,
1340
+ "grad_norm": 0.9504996538162231,
1341
+ "learning_rate": 1.605747445617851e-05,
1342
+ "loss": 1.8655,
1343
+ "step": 9350
1344
+ },
1345
+ {
1346
+ "epoch": 1.262779606367972,
1347
+ "grad_norm": 0.9407314658164978,
1348
+ "learning_rate": 1.5804067201748526e-05,
1349
+ "loss": 1.863,
1350
+ "step": 9400
1351
+ },
1352
+ {
1353
+ "epoch": 1.2694968764694028,
1354
+ "grad_norm": 0.8854203224182129,
1355
+ "learning_rate": 1.55517479669335e-05,
1356
+ "loss": 1.8636,
1357
+ "step": 9450
1358
+ },
1359
+ {
1360
+ "epoch": 1.2762141465708337,
1361
+ "grad_norm": 0.9684790372848511,
1362
+ "learning_rate": 1.530054660496125e-05,
1363
+ "loss": 1.8571,
1364
+ "step": 9500
1365
+ },
1366
+ {
1367
+ "epoch": 1.2829314166722643,
1368
+ "grad_norm": 0.896827757358551,
1369
+ "learning_rate": 1.5050492836798091e-05,
1370
+ "loss": 1.8549,
1371
+ "step": 9550
1372
+ },
1373
+ {
1374
+ "epoch": 1.2896486867736952,
1375
+ "grad_norm": 0.8551830649375916,
1376
+ "learning_rate": 1.4801616247632455e-05,
1377
+ "loss": 1.8664,
1378
+ "step": 9600
1379
+ },
1380
+ {
1381
+ "epoch": 1.296365956875126,
1382
+ "grad_norm": 0.911526620388031,
1383
+ "learning_rate": 1.4553946283374475e-05,
1384
+ "loss": 1.8639,
1385
+ "step": 9650
1386
+ },
1387
+ {
1388
+ "epoch": 1.3030832269765567,
1389
+ "grad_norm": 0.8315178155899048,
1390
+ "learning_rate": 1.4307512247172077e-05,
1391
+ "loss": 1.854,
1392
+ "step": 9700
1393
+ },
1394
+ {
1395
+ "epoch": 1.3098004970779875,
1396
+ "grad_norm": 0.8541605472564697,
1397
+ "learning_rate": 1.4062343295943998e-05,
1398
+ "loss": 1.8609,
1399
+ "step": 9750
1400
+ },
1401
+ {
1402
+ "epoch": 1.3165177671794184,
1403
+ "grad_norm": 0.8474893569946289,
1404
+ "learning_rate": 1.381846843693002e-05,
1405
+ "loss": 1.8603,
1406
+ "step": 9800
1407
+ },
1408
+ {
1409
+ "epoch": 1.323235037280849,
1410
+ "grad_norm": 0.8238804340362549,
1411
+ "learning_rate": 1.357591652425904e-05,
1412
+ "loss": 1.8509,
1413
+ "step": 9850
1414
+ },
1415
+ {
1416
+ "epoch": 1.3299523073822799,
1417
+ "grad_norm": 0.8796683549880981,
1418
+ "learning_rate": 1.3334716255535146e-05,
1419
+ "loss": 1.8595,
1420
+ "step": 9900
1421
+ },
1422
+ {
1423
+ "epoch": 1.3331765970309666,
1424
+ "eval_loss": 1.7995295524597168,
1425
+ "eval_runtime": 1651.9195,
1426
+ "eval_samples_per_second": 72.285,
1427
+ "eval_steps_per_second": 9.036,
1428
+ "step": 9924
1429
+ },
1430
+ {
1431
+ "epoch": 1.3366695774837107,
1432
+ "grad_norm": 0.8274658918380737,
1433
+ "learning_rate": 1.309489616844225e-05,
1434
+ "loss": 1.8547,
1435
+ "step": 9950
1436
+ },
1437
+ {
1438
+ "epoch": 1.3433868475851414,
1439
+ "grad_norm": 1.110520362854004,
1440
+ "learning_rate": 1.2856484637367655e-05,
1441
+ "loss": 1.8543,
1442
+ "step": 10000
1443
+ },
1444
+ {
1445
+ "epoch": 1.3501041176865722,
1446
+ "grad_norm": 0.991307258605957,
1447
+ "learning_rate": 1.2619509870044926e-05,
1448
+ "loss": 1.8542,
1449
+ "step": 10050
1450
+ },
1451
+ {
1452
+ "epoch": 1.3568213877880029,
1453
+ "grad_norm": 1.0729873180389404,
1454
+ "learning_rate": 1.2383999904216485e-05,
1455
+ "loss": 1.855,
1456
+ "step": 10100
1457
+ },
1458
+ {
1459
+ "epoch": 1.3635386578894337,
1460
+ "grad_norm": 0.9162331819534302,
1461
+ "learning_rate": 1.2149982604316311e-05,
1462
+ "loss": 1.862,
1463
+ "step": 10150
1464
+ },
1465
+ {
1466
+ "epoch": 1.3702559279908644,
1467
+ "grad_norm": 1.0340936183929443,
1468
+ "learning_rate": 1.1917485658173145e-05,
1469
+ "loss": 1.845,
1470
+ "step": 10200
1471
+ },
1472
+ {
1473
+ "epoch": 1.3769731980922952,
1474
+ "grad_norm": 0.9109688401222229,
1475
+ "learning_rate": 1.1686536573734625e-05,
1476
+ "loss": 1.854,
1477
+ "step": 10250
1478
+ },
1479
+ {
1480
+ "epoch": 1.383690468193726,
1481
+ "grad_norm": 0.901867151260376,
1482
+ "learning_rate": 1.1457162675812647e-05,
1483
+ "loss": 1.8607,
1484
+ "step": 10300
1485
+ },
1486
+ {
1487
+ "epoch": 1.3904077382951567,
1488
+ "grad_norm": 0.9229015111923218,
1489
+ "learning_rate": 1.1229391102850428e-05,
1490
+ "loss": 1.8508,
1491
+ "step": 10350
1492
+ },
1493
+ {
1494
+ "epoch": 1.3971250083965876,
1495
+ "grad_norm": 0.8118278384208679,
1496
+ "learning_rate": 1.1003248803711625e-05,
1497
+ "loss": 1.8478,
1498
+ "step": 10400
1499
+ },
1500
+ {
1501
+ "epoch": 1.4038422784980185,
1502
+ "grad_norm": 0.8655158281326294,
1503
+ "learning_rate": 1.0778762534491849e-05,
1504
+ "loss": 1.8583,
1505
+ "step": 10450
1506
+ },
1507
+ {
1508
+ "epoch": 1.410559548599449,
1509
+ "grad_norm": 0.8829045295715332,
1510
+ "learning_rate": 1.0555958855353029e-05,
1511
+ "loss": 1.8536,
1512
+ "step": 10500
1513
+ },
1514
+ {
1515
+ "epoch": 1.41727681870088,
1516
+ "grad_norm": 0.8320277333259583,
1517
+ "learning_rate": 1.0334864127380931e-05,
1518
+ "loss": 1.8546,
1519
+ "step": 10550
1520
+ },
1521
+ {
1522
+ "epoch": 1.4239940888023108,
1523
+ "grad_norm": 0.8472936749458313,
1524
+ "learning_rate": 1.0115504509466244e-05,
1525
+ "loss": 1.8528,
1526
+ "step": 10600
1527
+ },
1528
+ {
1529
+ "epoch": 1.4307113589037415,
1530
+ "grad_norm": 0.9551612734794617,
1531
+ "learning_rate": 9.89790595520956e-06,
1532
+ "loss": 1.846,
1533
+ "step": 10650
1534
+ },
1535
+ {
1536
+ "epoch": 1.4374286290051723,
1537
+ "grad_norm": 0.8682100772857666,
1538
+ "learning_rate": 9.6820942098507e-06,
1539
+ "loss": 1.8541,
1540
+ "step": 10700
1541
+ },
1542
+ {
1543
+ "epoch": 1.4441458991066032,
1544
+ "grad_norm": 0.7923955321311951,
1545
+ "learning_rate": 9.468094807222633e-06,
1546
+ "loss": 1.8502,
1547
+ "step": 10750
1548
+ },
1549
+ {
1550
+ "epoch": 1.4508631692080338,
1551
+ "grad_norm": 0.7568480968475342,
1552
+ "learning_rate": 9.255933066730449e-06,
1553
+ "loss": 1.8622,
1554
+ "step": 10800
1555
+ },
1556
+ {
1557
+ "epoch": 1.4575804393094647,
1558
+ "grad_norm": 0.9057785868644714,
1559
+ "learning_rate": 9.045634090355667e-06,
1560
+ "loss": 1.852,
1561
+ "step": 10850
1562
+ },
1563
+ {
1564
+ "epoch": 1.4642977094108955,
1565
+ "grad_norm": 0.8364023566246033,
1566
+ "learning_rate": 8.837222759686306e-06,
1567
+ "loss": 1.8495,
1568
+ "step": 10900
1569
+ },
1570
+ {
1571
+ "epoch": 1.4710149795123262,
1572
+ "grad_norm": 0.9518344402313232,
1573
+ "learning_rate": 8.630723732972998e-06,
1574
+ "loss": 1.8545,
1575
+ "step": 10950
1576
+ },
1577
+ {
1578
+ "epoch": 1.477732249613757,
1579
+ "grad_norm": 0.8617038130760193,
1580
+ "learning_rate": 8.426161442211552e-06,
1581
+ "loss": 1.8491,
1582
+ "step": 11000
1583
+ },
1584
+ {
1585
+ "epoch": 1.4844495197151877,
1586
+ "grad_norm": 0.9056522846221924,
1587
+ "learning_rate": 8.22356009025225e-06,
1588
+ "loss": 1.8539,
1589
+ "step": 11050
1590
+ },
1591
+ {
1592
+ "epoch": 1.4911667898166185,
1593
+ "grad_norm": 0.7397828102111816,
1594
+ "learning_rate": 8.022943647936315e-06,
1595
+ "loss": 1.8494,
1596
+ "step": 11100
1597
+ },
1598
+ {
1599
+ "epoch": 1.4978840599180492,
1600
+ "grad_norm": 0.7917934656143188,
1601
+ "learning_rate": 7.82433585125977e-06,
1602
+ "loss": 1.8504,
1603
+ "step": 11150
1604
+ },
1605
+ {
1606
+ "epoch": 1.50460133001948,
1607
+ "grad_norm": 0.8637117147445679,
1608
+ "learning_rate": 7.627760198565112e-06,
1609
+ "loss": 1.8502,
1610
+ "step": 11200
1611
+ },
1612
+ {
1613
+ "epoch": 1.511318600120911,
1614
+ "grad_norm": 1.025888204574585,
1615
+ "learning_rate": 7.433239947761095e-06,
1616
+ "loss": 1.8503,
1617
+ "step": 11250
1618
+ },
1619
+ {
1620
+ "epoch": 1.5180358702223415,
1621
+ "grad_norm": 0.7819210886955261,
1622
+ "learning_rate": 7.2407981135709735e-06,
1623
+ "loss": 1.8465,
1624
+ "step": 11300
1625
+ },
1626
+ {
1627
+ "epoch": 1.5247531403237724,
1628
+ "grad_norm": 0.9122748374938965,
1629
+ "learning_rate": 7.050457464809495e-06,
1630
+ "loss": 1.8467,
1631
+ "step": 11350
1632
+ },
1633
+ {
1634
+ "epoch": 1.5314704104252033,
1635
+ "grad_norm": 0.915520429611206,
1636
+ "learning_rate": 6.862240521689011e-06,
1637
+ "loss": 1.8435,
1638
+ "step": 11400
1639
+ },
1640
+ {
1641
+ "epoch": 1.538187680526634,
1642
+ "grad_norm": 0.7819466590881348,
1643
+ "learning_rate": 6.676169553154993e-06,
1644
+ "loss": 1.8476,
1645
+ "step": 11450
1646
+ },
1647
+ {
1648
+ "epoch": 1.5449049506280648,
1649
+ "grad_norm": 0.9129722714424133,
1650
+ "learning_rate": 6.492266574251249e-06,
1651
+ "loss": 1.8534,
1652
+ "step": 11500
1653
+ },
1654
+ {
1655
+ "epoch": 1.5516222207294956,
1656
+ "grad_norm": 0.834613561630249,
1657
+ "learning_rate": 6.310553343515249e-06,
1658
+ "loss": 1.8522,
1659
+ "step": 11550
1660
+ },
1661
+ {
1662
+ "epoch": 1.5583394908309263,
1663
+ "grad_norm": 0.8443191647529602,
1664
+ "learning_rate": 6.131051360403731e-06,
1665
+ "loss": 1.844,
1666
+ "step": 11600
1667
+ },
1668
+ {
1669
+ "epoch": 1.5650567609323571,
1670
+ "grad_norm": 0.858828604221344,
1671
+ "learning_rate": 5.953781862748983e-06,
1672
+ "loss": 1.8446,
1673
+ "step": 11650
1674
+ },
1675
+ {
1676
+ "epoch": 1.571774031033788,
1677
+ "grad_norm": 0.7843521237373352,
1678
+ "learning_rate": 5.778765824246099e-06,
1679
+ "loss": 1.8465,
1680
+ "step": 11700
1681
+ },
1682
+ {
1683
+ "epoch": 1.5784913011352186,
1684
+ "grad_norm": 0.9034683704376221,
1685
+ "learning_rate": 5.6060239519714565e-06,
1686
+ "loss": 1.8485,
1687
+ "step": 11750
1688
+ },
1689
+ {
1690
+ "epoch": 1.5852085712366493,
1691
+ "grad_norm": 0.8224745392799377,
1692
+ "learning_rate": 5.435576683932758e-06,
1693
+ "loss": 1.844,
1694
+ "step": 11800
1695
+ },
1696
+ {
1697
+ "epoch": 1.5919258413380803,
1698
+ "grad_norm": 0.8487153649330139,
1699
+ "learning_rate": 5.267444186650908e-06,
1700
+ "loss": 1.8399,
1701
+ "step": 11850
1702
+ },
1703
+ {
1704
+ "epoch": 1.598643111439511,
1705
+ "grad_norm": 0.9923579692840576,
1706
+ "learning_rate": 5.101646352773973e-06,
1707
+ "loss": 1.8438,
1708
+ "step": 11900
1709
+ },
1710
+ {
1711
+ "epoch": 1.6053603815409416,
1712
+ "grad_norm": 1.0088372230529785,
1713
+ "learning_rate": 4.938202798723632e-06,
1714
+ "loss": 1.8533,
1715
+ "step": 11950
1716
+ },
1717
+ {
1718
+ "epoch": 1.6120776516423725,
1719
+ "grad_norm": 0.9827088713645935,
1720
+ "learning_rate": 4.777132862374201e-06,
1721
+ "loss": 1.8482,
1722
+ "step": 12000
1723
+ },
1724
+ {
1725
+ "epoch": 1.6187949217438034,
1726
+ "grad_norm": 0.7875204086303711,
1727
+ "learning_rate": 4.618455600764701e-06,
1728
+ "loss": 1.846,
1729
+ "step": 12050
1730
+ },
1731
+ {
1732
+ "epoch": 1.625512191845234,
1733
+ "grad_norm": 0.8284263610839844,
1734
+ "learning_rate": 4.462189787844101e-06,
1735
+ "loss": 1.8462,
1736
+ "step": 12100
1737
+ },
1738
+ {
1739
+ "epoch": 1.6322294619466649,
1740
+ "grad_norm": 0.9832016229629517,
1741
+ "learning_rate": 4.308353912250077e-06,
1742
+ "loss": 1.8467,
1743
+ "step": 12150
1744
+ },
1745
+ {
1746
+ "epoch": 1.6389467320480957,
1747
+ "grad_norm": 0.8272432088851929,
1748
+ "learning_rate": 4.156966175121524e-06,
1749
+ "loss": 1.8507,
1750
+ "step": 12200
1751
+ },
1752
+ {
1753
+ "epoch": 1.6456640021495264,
1754
+ "grad_norm": 0.8643631935119629,
1755
+ "learning_rate": 4.008044487945087e-06,
1756
+ "loss": 1.851,
1757
+ "step": 12250
1758
+ },
1759
+ {
1760
+ "epoch": 1.6523812722509572,
1761
+ "grad_norm": 0.821950376033783,
1762
+ "learning_rate": 3.861606470435939e-06,
1763
+ "loss": 1.8422,
1764
+ "step": 12300
1765
+ },
1766
+ {
1767
+ "epoch": 1.659098542352388,
1768
+ "grad_norm": 0.8796666860580444,
1769
+ "learning_rate": 3.717669448453126e-06,
1770
+ "loss": 1.8456,
1771
+ "step": 12350
1772
+ },
1773
+ {
1774
+ "epoch": 1.6658158124538187,
1775
+ "grad_norm": 0.8530693650245667,
1776
+ "learning_rate": 3.5762504519496365e-06,
1777
+ "loss": 1.8408,
1778
+ "step": 12400
1779
+ },
1780
+ {
1781
+ "epoch": 1.666487539463962,
1782
+ "eval_loss": 1.7879396677017212,
1783
+ "eval_runtime": 1652.1563,
1784
+ "eval_samples_per_second": 72.275,
1785
+ "eval_steps_per_second": 9.035,
1786
+ "step": 12405
1787
+ },
1788
+ {
1789
+ "epoch": 1.6725330825552496,
1790
+ "grad_norm": 0.836791455745697,
1791
+ "learning_rate": 3.437366212957502e-06,
1792
+ "loss": 1.8465,
1793
+ "step": 12450
1794
+ },
1795
+ {
1796
+ "epoch": 1.6792503526566804,
1797
+ "grad_norm": 0.7426266670227051,
1798
+ "learning_rate": 3.3010331636081387e-06,
1799
+ "loss": 1.8475,
1800
+ "step": 12500
1801
+ },
1802
+ {
1803
+ "epoch": 1.685967622758111,
1804
+ "grad_norm": 0.8120975494384766,
1805
+ "learning_rate": 3.167267434188173e-06,
1806
+ "loss": 1.847,
1807
+ "step": 12550
1808
+ },
1809
+ {
1810
+ "epoch": 1.692684892859542,
1811
+ "grad_norm": 0.7585613131523132,
1812
+ "learning_rate": 3.0360848512309887e-06,
1813
+ "loss": 1.8452,
1814
+ "step": 12600
1815
+ },
1816
+ {
1817
+ "epoch": 1.6994021629609728,
1818
+ "grad_norm": 0.8330528736114502,
1819
+ "learning_rate": 2.907500935644203e-06,
1820
+ "loss": 1.8455,
1821
+ "step": 12650
1822
+ },
1823
+ {
1824
+ "epoch": 1.7061194330624034,
1825
+ "grad_norm": 0.998671293258667,
1826
+ "learning_rate": 2.781530900873305e-06,
1827
+ "loss": 1.8499,
1828
+ "step": 12700
1829
+ },
1830
+ {
1831
+ "epoch": 1.712836703163834,
1832
+ "grad_norm": 0.8430862426757812,
1833
+ "learning_rate": 2.6581896511016614e-06,
1834
+ "loss": 1.8425,
1835
+ "step": 12750
1836
+ },
1837
+ {
1838
+ "epoch": 1.7195539732652652,
1839
+ "grad_norm": 0.9203202724456787,
1840
+ "learning_rate": 2.537491779487147e-06,
1841
+ "loss": 1.8374,
1842
+ "step": 12800
1843
+ },
1844
+ {
1845
+ "epoch": 1.7262712433666958,
1846
+ "grad_norm": 1.0353131294250488,
1847
+ "learning_rate": 2.419451566435532e-06,
1848
+ "loss": 1.8439,
1849
+ "step": 12850
1850
+ },
1851
+ {
1852
+ "epoch": 1.7329885134681264,
1853
+ "grad_norm": 0.8823174834251404,
1854
+ "learning_rate": 2.3040829779108985e-06,
1855
+ "loss": 1.8408,
1856
+ "step": 12900
1857
+ },
1858
+ {
1859
+ "epoch": 1.7397057835695573,
1860
+ "grad_norm": 0.8599026799201965,
1861
+ "learning_rate": 2.19139966378325e-06,
1862
+ "loss": 1.8518,
1863
+ "step": 12950
1864
+ },
1865
+ {
1866
+ "epoch": 1.7464230536709882,
1867
+ "grad_norm": 0.8370893597602844,
1868
+ "learning_rate": 2.081414956213526e-06,
1869
+ "loss": 1.8489,
1870
+ "step": 13000
1871
+ },
1872
+ {
1873
+ "epoch": 1.7531403237724188,
1874
+ "grad_norm": 0.7799704670906067,
1875
+ "learning_rate": 1.9741418680762013e-06,
1876
+ "loss": 1.845,
1877
+ "step": 13050
1878
+ },
1879
+ {
1880
+ "epoch": 1.7598575938738497,
1881
+ "grad_norm": 0.8190951943397522,
1882
+ "learning_rate": 1.8695930914196664e-06,
1883
+ "loss": 1.8458,
1884
+ "step": 13100
1885
+ },
1886
+ {
1887
+ "epoch": 1.7665748639752805,
1888
+ "grad_norm": 0.8451455235481262,
1889
+ "learning_rate": 1.7677809959645548e-06,
1890
+ "loss": 1.8446,
1891
+ "step": 13150
1892
+ },
1893
+ {
1894
+ "epoch": 1.7732921340767112,
1895
+ "grad_norm": 0.8454949855804443,
1896
+ "learning_rate": 1.6687176276402261e-06,
1897
+ "loss": 1.8373,
1898
+ "step": 13200
1899
+ },
1900
+ {
1901
+ "epoch": 1.780009404178142,
1902
+ "grad_norm": 0.9424586296081543,
1903
+ "learning_rate": 1.572414707159553e-06,
1904
+ "loss": 1.8441,
1905
+ "step": 13250
1906
+ },
1907
+ {
1908
+ "epoch": 1.7867266742795729,
1909
+ "grad_norm": 0.8610557913780212,
1910
+ "learning_rate": 1.4788836286321606e-06,
1911
+ "loss": 1.8398,
1912
+ "step": 13300
1913
+ },
1914
+ {
1915
+ "epoch": 1.7934439443810035,
1916
+ "grad_norm": 0.9130797982215881,
1917
+ "learning_rate": 1.3881354582163525e-06,
1918
+ "loss": 1.8398,
1919
+ "step": 13350
1920
+ },
1921
+ {
1922
+ "epoch": 1.8001612144824344,
1923
+ "grad_norm": 0.9039607048034668,
1924
+ "learning_rate": 1.3001809328097914e-06,
1925
+ "loss": 1.8472,
1926
+ "step": 13400
1927
+ },
1928
+ {
1929
+ "epoch": 1.8068784845838652,
1930
+ "grad_norm": 0.8576995730400085,
1931
+ "learning_rate": 1.2150304587791873e-06,
1932
+ "loss": 1.8481,
1933
+ "step": 13450
1934
+ },
1935
+ {
1936
+ "epoch": 1.8135957546852959,
1937
+ "grad_norm": 0.9080734252929688,
1938
+ "learning_rate": 1.1326941107290351e-06,
1939
+ "loss": 1.8402,
1940
+ "step": 13500
1941
+ },
1942
+ {
1943
+ "epoch": 1.8203130247867265,
1944
+ "grad_norm": 0.8428635597229004,
1945
+ "learning_rate": 1.053181630309666e-06,
1946
+ "loss": 1.846,
1947
+ "step": 13550
1948
+ },
1949
+ {
1950
+ "epoch": 1.8270302948881576,
1951
+ "grad_norm": 0.890152633190155,
1952
+ "learning_rate": 9.765024250646238e-07,
1953
+ "loss": 1.8529,
1954
+ "step": 13600
1955
+ },
1956
+ {
1957
+ "epoch": 1.8337475649895882,
1958
+ "grad_norm": 0.803615391254425,
1959
+ "learning_rate": 9.026655673176454e-07,
1960
+ "loss": 1.8444,
1961
+ "step": 13650
1962
+ },
1963
+ {
1964
+ "epoch": 1.8404648350910189,
1965
+ "grad_norm": 0.8416168093681335,
1966
+ "learning_rate": 8.316797930992465e-07,
1967
+ "loss": 1.8413,
1968
+ "step": 13700
1969
+ },
1970
+ {
1971
+ "epoch": 1.8471821051924497,
1972
+ "grad_norm": 0.9183295369148254,
1973
+ "learning_rate": 7.635535011131178e-07,
1974
+ "loss": 1.8438,
1975
+ "step": 13750
1976
+ },
1977
+ {
1978
+ "epoch": 1.8538993752938806,
1979
+ "grad_norm": 0.8079262375831604,
1980
+ "learning_rate": 6.982947517424315e-07,
1981
+ "loss": 1.8446,
1982
+ "step": 13800
1983
+ },
1984
+ {
1985
+ "epoch": 1.8606166453953112,
1986
+ "grad_norm": 0.7838028073310852,
1987
+ "learning_rate": 6.35911266096173e-07,
1988
+ "loss": 1.845,
1989
+ "step": 13850
1990
+ },
1991
+ {
1992
+ "epoch": 1.867333915496742,
1993
+ "grad_norm": 0.8391366004943848,
1994
+ "learning_rate": 5.764104250956165e-07,
1995
+ "loss": 1.8483,
1996
+ "step": 13900
1997
+ },
1998
+ {
1999
+ "epoch": 1.874051185598173,
2000
+ "grad_norm": 0.9014426469802856,
2001
+ "learning_rate": 5.197992686010511e-07,
2002
+ "loss": 1.8438,
2003
+ "step": 13950
2004
+ },
2005
+ {
2006
+ "epoch": 1.8807684556996036,
2007
+ "grad_norm": 0.9228802919387817,
2008
+ "learning_rate": 4.660844945788501e-07,
2009
+ "loss": 1.8474,
2010
+ "step": 14000
2011
+ },
2012
+ {
2013
+ "epoch": 1.8874857258010345,
2014
+ "grad_norm": 0.7464848756790161,
2015
+ "learning_rate": 4.1527245830901563e-07,
2016
+ "loss": 1.8436,
2017
+ "step": 14050
2018
+ },
2019
+ {
2020
+ "epoch": 1.8942029959024653,
2021
+ "grad_norm": 0.771722674369812,
2022
+ "learning_rate": 3.6736917163322505e-07,
2023
+ "loss": 1.8438,
2024
+ "step": 14100
2025
+ },
2026
+ {
2027
+ "epoch": 1.900920266003896,
2028
+ "grad_norm": 0.7891733050346375,
2029
+ "learning_rate": 3.2238030224356897e-07,
2030
+ "loss": 1.8423,
2031
+ "step": 14150
2032
+ },
2033
+ {
2034
+ "epoch": 1.9076375361053268,
2035
+ "grad_norm": 0.8450707197189331,
2036
+ "learning_rate": 2.803111730119545e-07,
2037
+ "loss": 1.843,
2038
+ "step": 14200
2039
+ },
2040
+ {
2041
+ "epoch": 1.9143548062067577,
2042
+ "grad_norm": 0.9002699255943298,
2043
+ "learning_rate": 2.4116676136033135e-07,
2044
+ "loss": 1.8433,
2045
+ "step": 14250
2046
+ },
2047
+ {
2048
+ "epoch": 1.9210720763081883,
2049
+ "grad_norm": 0.8603649735450745,
2050
+ "learning_rate": 2.049516986717931e-07,
2051
+ "loss": 1.8396,
2052
+ "step": 14300
2053
+ },
2054
+ {
2055
+ "epoch": 1.927789346409619,
2056
+ "grad_norm": 0.8522045612335205,
2057
+ "learning_rate": 1.7167026974261313e-07,
2058
+ "loss": 1.8498,
2059
+ "step": 14350
2060
+ },
2061
+ {
2062
+ "epoch": 1.93450661651105,
2063
+ "grad_norm": 0.8563119769096375,
2064
+ "learning_rate": 1.4132641227528054e-07,
2065
+ "loss": 1.841,
2066
+ "step": 14400
2067
+ },
2068
+ {
2069
+ "epoch": 1.9412238866124807,
2070
+ "grad_norm": 0.8482009172439575,
2071
+ "learning_rate": 1.1392371641262001e-07,
2072
+ "loss": 1.8502,
2073
+ "step": 14450
2074
+ },
2075
+ {
2076
+ "epoch": 1.9479411567139113,
2077
+ "grad_norm": 0.8035847544670105,
2078
+ "learning_rate": 8.946542431300942e-08,
2079
+ "loss": 1.8467,
2080
+ "step": 14500
2081
+ },
2082
+ {
2083
+ "epoch": 1.9546584268153424,
2084
+ "grad_norm": 0.8062044382095337,
2085
+ "learning_rate": 6.795442976679501e-08,
2086
+ "loss": 1.8438,
2087
+ "step": 14550
2088
+ },
2089
+ {
2090
+ "epoch": 1.961375696916773,
2091
+ "grad_norm": 0.7858136296272278,
2092
+ "learning_rate": 4.939327785390691e-08,
2093
+ "loss": 1.8431,
2094
+ "step": 14600
2095
+ },
2096
+ {
2097
+ "epoch": 1.9680929670182037,
2098
+ "grad_norm": 0.7691812515258789,
2099
+ "learning_rate": 3.37841646427417e-08,
2100
+ "loss": 1.8442,
2101
+ "step": 14650
2102
+ },
2103
+ {
2104
+ "epoch": 1.9748102371196345,
2105
+ "grad_norm": 0.8699848055839539,
2106
+ "learning_rate": 2.1128936930320254e-08,
2107
+ "loss": 1.8388,
2108
+ "step": 14700
2109
+ },
2110
+ {
2111
+ "epoch": 1.9815275072210654,
2112
+ "grad_norm": 0.7812600135803223,
2113
+ "learning_rate": 1.142909202380138e-08,
2114
+ "loss": 1.8403,
2115
+ "step": 14750
2116
+ },
2117
+ {
2118
+ "epoch": 1.988244777322496,
2119
+ "grad_norm": 0.856472373008728,
2120
+ "learning_rate": 4.6857775633152305e-09,
2121
+ "loss": 1.837,
2122
+ "step": 14800
2123
+ },
2124
+ {
2125
+ "epoch": 1.994962047423927,
2126
+ "grad_norm": 0.7702723741531372,
2127
+ "learning_rate": 8.997913861857888e-10,
2128
+ "loss": 1.8449,
2129
+ "step": 14850
2130
+ },
2131
+ {
2132
+ "epoch": 1.9997984818969572,
2133
+ "eval_loss": 1.786035418510437,
2134
+ "eval_runtime": 1652.2214,
2135
+ "eval_samples_per_second": 72.272,
2136
+ "eval_steps_per_second": 9.035,
2137
+ "step": 14886
2138
+ }
2139
+ ],
2140
+ "logging_steps": 50,
2141
+ "max_steps": 14888,
2142
+ "num_input_tokens_seen": 0,
2143
+ "num_train_epochs": 2,
2144
+ "save_steps": 2481,
2145
+ "stateful_callbacks": {
2146
+ "TrainerControl": {
2147
+ "args": {
2148
+ "should_epoch_stop": false,
2149
+ "should_evaluate": false,
2150
+ "should_log": false,
2151
+ "should_save": true,
2152
+ "should_training_stop": true
2153
+ },
2154
+ "attributes": {}
2155
+ }
2156
+ },
2157
+ "total_flos": 3.539345213296214e+18,
2158
+ "train_batch_size": 2,
2159
+ "trial_name": null,
2160
+ "trial_params": null
2161
+ }
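The `log_history` above mixes training records (with `loss`/`grad_norm`) and evaluation records (with `eval_loss`). A minimal sketch of pulling the headline numbers out of such a file, assuming it has been downloaded as `trainer_state.json` (the `summarize_log` helper name and the tiny inline demo dict are ours, not part of the upload; the demo values are copied from the last two records above):

```python
import json

def summarize_log(state: dict) -> dict:
    """Extract final metrics from a trainer_state.json-style log_history."""
    history = state["log_history"]
    # Eval records carry "eval_loss"; training records carry "loss".
    evals = [e for e in history if "eval_loss" in e]
    trains = [e for e in history if "loss" in e]
    return {
        "final_eval_loss": evals[-1]["eval_loss"] if evals else None,
        "final_train_loss": trains[-1]["loss"] if trains else None,
        "last_step": history[-1]["step"],
    }

# Stand-in for the real file; in practice use:
#   state = json.load(open("trainer_state.json"))
demo = {
    "log_history": [
        {"epoch": 1.994962047423927, "loss": 1.8449, "step": 14850},
        {"epoch": 1.9997984818969572, "eval_loss": 1.786035418510437,
         "step": 14886},
    ]
}
print(summarize_log(demo))
```

On the full file this reports the final eval loss of about 1.786 at step 14886.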
vocab.json ADDED
The diff for this file is too large to render. See raw diff