Preliminary experiment for LLM distillation and pretraining
This experiment verifies the effectiveness of various methods from recent papers.

Language Model on TinyStories

About this project

What is this project?

What is the significance and purpose of this project?

What are the limitations of this project?

The distillation and training process

About the model structure

How to distill and train the model

What error has been corrected?

Figure 1: Each loss component during training. After 2000 steps, loss_soft and loss_mimic are no longer used.
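
For readers who want the gated objective in concrete form, here is a minimal sketch assuming loss_soft is a temperature-scaled KL divergence against the teacher's logits and loss_mimic is an MSE between student and teacher hidden states; the weights alpha and beta and the temperature T are illustrative placeholders, not values confirmed by this project.

```python
import torch.nn.functional as F

# Minimal sketch of the combined objective implied by Figure 1. The loss
# weights (alpha, beta), temperature T, and the exact forms of loss_soft
# and loss_mimic are assumptions for illustration, not confirmed details.

SOFT_MIMIC_CUTOFF = 2000  # per Figure 1, loss_soft and loss_mimic stop here

def combined_loss(step, student_logits, teacher_logits,
                  student_hidden, teacher_hidden, labels,
                  alpha=0.5, beta=0.5, T=2.0):
    # Hard loss: cross-entropy against the ground-truth next tokens.
    loss_hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    if step >= SOFT_MIMIC_CUTOFF:
        # After the cutoff, only the hard loss remains in the objective.
        return loss_hard

    # loss_soft: KL divergence between temperature-scaled distributions.
    loss_soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # loss_mimic: match student hidden states to the teacher's
    # (assumes matching hidden sizes; otherwise project first).
    loss_mimic = F.mse_loss(student_hidden, teacher_hidden)

    return loss_hard + alpha * loss_soft + beta * loss_mimic
```

Turning the soft and mimic terms off after 2000 steps lets the student finish training on the hard cross-entropy objective alone, as the caption above describes.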

Figure 2: Learning rate schedule for distillation and training.
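
The exact schedule in Figure 2 is not stated in the text, so the sketch below only illustrates a common warmup-then-cosine-decay shape; every constant in it is a placeholder assumption.

```python
import math

# Hedged sketch of a warmup-plus-cosine learning rate schedule of the general
# shape one might use here. The warmup length, peak rate, and total steps
# below are placeholders; Figure 2 does not state the actual values.

def lr_at(step, peak_lr=3e-4, warmup_steps=500, total_steps=20_000):
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

Calling lr_at(step) inside the training loop and assigning the result to each optimizer parameter group would realize this schedule.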


Work in progress

Why are the results not better, and what can be done to improve them?

Potential problems

What work is currently in progress?

What future work is planned?