Distillation with Reasoning: can DeepSeek R1 Teach Better Than Humans?
jaygiltner6791 edited this page 2025-02-10 20:37:48 +08:00


Including reasoning "chains of thought" (CoT) in a model's output substantially improves its quality, but it also increases inference cost.

  1. A human expert's chain of thought.
  2. The final answer.

    We expanded this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
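The augmentation step can be sketched as follows. This is a minimal illustration; the field names and the toy arithmetic example are assumptions, not the article's actual schema.

```python
# Sketch: augment each dataset record with a synthetic CoT from DeepSeek R1.
# Field names ("question", "human_cot", etc.) are hypothetical.
def augment_record(question, human_cot, final_answer, r1_cot):
    """Return one record holding both reasoning traces alongside the answer."""
    return {
        "question": question,
        "human_cot": human_cot,        # the human expert's chain of thought
        "synthetic_cot": r1_cot,       # CoT generated by DeepSeek R1
        "answer": final_answer,
    }

record = augment_record(
    question="What is 7 * 6?",
    human_cot="Multiply 7 by 6 to get 42.",
    final_answer="42",
    r1_cot="We need 7 * 6. 7 * 6 = 7 * 5 + 7 = 35 + 7 = 42.",
)
```

Keeping both traces in one record makes it easy to train the different variants described next from the same underlying data.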

    Then, we fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

    Direct Answer Only: generate the final answer without showing any reasoning.

    Human Expert CoT: generate the final answer together with a reasoning chain resembling the human expert's.

    Synthetic R1 CoT: generate the final answer together with DeepSeek R1's synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:
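The three training targets above differ only in what completion text the model learns to emit. A minimal sketch, assuming hypothetical field names and an "Answer:" separator that the article does not specify:

```python
# Sketch: build the completion text for each fine-tuning variant.
# The record schema and the "Answer:" separator are illustrative assumptions.
record = {
    "question": "What is 7 * 6?",
    "human_cot": "Multiply 7 by 6 to get 42.",
    "synthetic_cot": "We need 7 * 6. 7 * 6 = 7 * 5 + 7 = 35 + 7 = 42.",
    "answer": "42",
}

def build_target(rec, variant):
    """Return the text a given fine-tuning variant is trained to generate."""
    if variant == "direct":
        # Direct Answer Only: no reasoning shown
        return rec["answer"]
    if variant == "human_cot":
        # Human Expert CoT: the expert's reasoning chain, then the final answer
        return rec["human_cot"] + "\nAnswer: " + rec["answer"]
    if variant == "r1_cot":
        # Synthetic R1 CoT: DeepSeek R1's reasoning chain, then the final answer
        return rec["synthetic_cot"] + "\nAnswer: " + rec["answer"]
    raise ValueError(f"unknown variant: {variant}")
```

All three variants share the same prompt; only the target completion changes, which isolates the effect of the reasoning trace on downstream accuracy.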

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to differences in evaluation setup. The key focus is on comparing relative performance across distillation approaches, not on beating other models.

    From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at boosting accuracy, albeit at a higher inference cost due to their greater length.
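The inference-cost tradeoff follows directly from output length: decoding cost grows roughly linearly with the number of generated tokens. A back-of-the-envelope sketch, with purely illustrative token counts (not the article's measurements):

```python
# Sketch: relative decoding cost grows roughly linearly with output length.
# Token counts below are illustrative assumptions, not measured values.
def output_tokens(cot_tokens, answer_tokens):
    """Total generated tokens: the reasoning chain plus the final answer."""
    return cot_tokens + answer_tokens

direct_only = output_tokens(0, 20)     # no reasoning chain
with_r1_cot = output_tokens(400, 20)   # a long synthetic reasoning chain
ratio = with_r1_cot / direct_only      # how many times more tokens are decoded
```

A longer CoT thus buys accuracy at a proportional cost in generated tokens, which is the tradeoff the study quantifies.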

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please get in touch to explore options.

    Conclusions

    By incorporating reasoning-based data through distillation, organizations can drastically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it an effective teacher model, showing that, in some cases, the machine may just out-teach the human.