Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


- Including reasoning "chains of thought" (CoT) in the model output significantly improves its quality, but it increases inference cost.
- Distillation transfers reasoning knowledge from an expensive teacher model to a more economical student, reducing overall inference cost.

Each example in the original dataset pairs a question with:

1. A human expert's chain of thought.
2. The final answer.

We expanded this dataset by adding:

- Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
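As a rough illustration of this expansion step, the sketch below queries DeepSeek R1 through an OpenAI-compatible chat API and stores the returned reasoning chain on each record. The endpoint URL, model identifier, record field names, and the `<think>`-tag parsing are assumptions made for illustration, not details from this write-up.

```python
# Sketch: attach a synthetic DeepSeek R1 reasoning chain to each dataset record.
# Endpoint, model name, field names, and <think>-tag parsing are assumptions.
import re
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

def add_synthetic_cot(record: dict) -> dict:
    """Ask the teacher model for an answer and keep its reasoning chain."""
    response = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-r1",  # assumed model identifier
        messages=[{"role": "user", "content": record["question"]}],
        max_tokens=2048,
    )
    text = response.choices[0].message.content
    # DeepSeek R1 typically emits its reasoning between <think> tags; fall back to
    # the full completion if the serving stack strips or relocates the reasoning.
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    record["r1_cot"] = match.group(1).strip() if match else text
    return record
```

Running this over the whole dataset yields the expanded records used for the Synthetic R1 CoT training variant described next.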

We then fine-tuned three versions of the model (using LoRA on llama-3.1-8B-instruct), each with a different training target:

- Direct Answer Only: generate the final answer without revealing reasoning.
- Human Expert CoT: generate the final answer alongside a reasoning chain resembling the human expert's.
- Synthetic R1 CoT: generate the final answer along with DeepSeek R1's synthetic reasoning chain.

The table below summarizes average accuracy and reasoning length:

- Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation techniques, not on beating other models.
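To make the fine-tuning setup concrete, here is a minimal sketch of how the Synthetic R1 CoT variant could be run with Hugging Face transformers and peft. The hyperparameters, prompt format, and dataset field names are illustrative assumptions; the write-up does not specify them.

```python
# Sketch: LoRA fine-tuning of llama-3.1-8B-instruct on the "Synthetic R1 CoT" target.
# Rank, learning rate, prompt format, and field names are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Low-rank adapters on the attention projections; the rank/alpha used in the study are not given.
lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora_config)

def format_example(example):
    # Training target: the synthetic reasoning chain followed by the final answer.
    text = (f"{example['question']}\n"
            f"<think>{example['r1_cot']}</think>\n"
            f"{example['answer']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=4096)

dataset = load_dataset("json", data_files="train.jsonl")["train"]
dataset = dataset.map(format_example, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-r1-cot", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The other two variants would differ only in what `format_example` emits: the final answer alone, or the human expert's CoT in place of the synthetic one.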

In this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at improving performance, albeit at a higher inference cost due to their greater length.

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can dramatically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine might just out-teach the human.