Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


Including reasoning "chains of thought" (CoT) in a model's output substantially improves answer quality, but it also increases inference cost.

Each training example contained:

1. A human expert's chain of thought.
2. The final answer.

We expanded this dataset by adding:

Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
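For illustration, a single training record might look like the following. The field names are hypothetical; the original post does not specify a schema:

```python
# Hypothetical training record combining both reasoning sources.
# Field names are illustrative, not the schema actually used in the study.
record = {
    "question": "A patient presents with ...",        # task input
    "human_cot": "Step 1: consider ... Step 2: ...",  # expert's chain of thought
    "r1_cot": "<think>First, note that ...</think>",  # CoT generated by DeepSeek R1
    "answer": "Diagnosis: ...",                       # final answer (shared target)
}
```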

We then fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

- Direct Answer Only: generate the final answer without showing any reasoning.
- Human Expert CoT: generate the final answer along with a reasoning chain resembling the human expert's.
- Synthetic R1 CoT: generate the final answer along with R1's synthetic reasoning chain.
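As a rough sketch of how such a LoRA fine-tune could be set up with Hugging Face `transformers` and `peft` (the post does not specify its training stack, so the hyperparameters, target modules, and the `build_target` helper below are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Illustrative LoRA settings; the study's actual rank/modules are not given.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# The three variants differ only in the training target text, e.g.:
def build_target(example, variant):
    if variant == "direct":     # Direct Answer Only
        return example["answer"]
    if variant == "human_cot":  # Human Expert CoT
        return example["human_cot"] + "\n" + example["answer"]
    return example["r1_cot"] + "\n" + example["answer"]  # Synthetic R1 CoT
```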

The table below summarizes average accuracy and reasoning length:

- Note: The accuracy of the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is the relative comparison across distillation approaches, not beating other models.
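For reference, the two reported metrics could be tallied along these lines (a sketch only; the post does not describe its evaluation harness, so the `outputs` format assumed here is hypothetical):

```python
# Sketch of the two reported metrics: average accuracy and reasoning length.
# `outputs` is assumed to be a list of (generated_cot, generated_answer,
# gold_answer) triples produced by some evaluation harness.
def summarize(outputs, tokenizer):
    correct = sum(ans.strip() == gold.strip() for _, ans, gold in outputs)
    accuracy = correct / len(outputs)
    avg_cot_tokens = sum(
        len(tokenizer.encode(cot)) for cot, _, _ in outputs
    ) / len(outputs)
    return accuracy, avg_cot_tokens
```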

From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at improving performance, albeit at a higher inference cost due to their greater length.

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore your options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can significantly improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine may just out-teach the human.