bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3 • Viewer • 115k • 50
bluelightai-dev/clt-mixed-eval-data • Viewer • 60k • 25
bluelightai-dev/clt-mixed-data-tokenized-Qwen3 • Viewer • 2.6M • 113
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256 • Viewer • 194k • 85
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024 • Viewer • 2.52M • 122
bluelightai-dev/clt-pretrain-data-v2-dedup • Preview • 62
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024 • Viewer • 2.44M • 73
bluelightai-dev/clt-pretrain-data-v2 • Preview • 53
bluelightai-dev/MathPile_Commercial-formatted • Viewer • 389k • 23
bluelightai-dev/clt_posttrain_data_tokenized • Viewer • 1.34M • 12
bluelightai-dev/common-corpus-sample-open-web • Viewer • 4.8M • 10
bluelightai-dev/common-corpus-sample-open-source • Viewer • 2.02M • 6
bluelightai-dev/common-corpus-sample-open-science • Viewer • 284k • 7
bluelightai-dev/common-corpus-sample-open-government • Viewer • 373k • 15 • 1
bluelightai-dev/common-corpus-sample-open-culture • Viewer • 462k • 2
bluelightai-dev/clt_posttrain_data_tokenized_test_1000 • Viewer • 1.22k • 3
bluelightai-dev/dclm-full-deduped-sample • Viewer • 4.92M • 5
bluelightai-dev/the-stack-dedup-sample • Viewer • 474k • 5
bluelightai-dev/pythia_clt_pretrain_data_tokenized • Viewer • 3.5M • 14
bluelightai-dev/clt_eval_data_qwen3_tokenized_256 • Viewer • 245k • 13
bluelightai-dev/clt_pretrain_data_qwen_tokenized • Viewer • 16.7M • 346
bluelightai-dev/clt_posttrain_data_qwen_tokenized • Viewer • 1.34M • 35
bluelightai-dev/clt_pretrain_data • Viewer • 6.12M • 555
bluelightai-dev/clt_posttrain_data • Viewer • 935k • 135