Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


Including reasoning "chains of thought" (CoT) in a model's output significantly improves its quality, but it also increases inference cost.

The experiment below starts from a dataset in which each example contains:

1. A human expert's chain of thought.
2. The final answer.

We extended this dataset by adding:

- Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1 (a sketch of this step follows below).
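To illustrate how such synthetic reasoning traces might be collected, here is a minimal sketch in Python. It assumes an OpenAI-compatible chat endpoint serving DeepSeek R1; the base URL, model identifier, and dataset field names are illustrative assumptions, not details taken from this page.

```python
# Sketch: augment a QA dataset with synthetic reasoning chains from DeepSeek R1.
# The endpoint and model id below are assumptions (any OpenAI-compatible host works).
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

def add_synthetic_cot(example: dict) -> dict:
    """Ask the teacher model to reason step by step and store its CoT."""
    response = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-r1",  # assumed model id
        messages=[{
            "role": "user",
            "content": f"Think step by step, then answer.\n\n{example['question']}",
        }],
    )
    text = response.choices[0].message.content
    # R1-style outputs often wrap reasoning in <think>...</think> tags;
    # fall back to the raw text when the tags are absent.
    if "</think>" in text:
        cot, _answer = text.split("</think>", 1)
        example["synthetic_cot"] = cot.replace("<think>", "").strip()
    else:
        example["synthetic_cot"] = text.strip()
    return example

# Each record already holds the human expert's CoT and the final answer;
# this pass adds a third field with R1's reasoning.
dataset = [{"question": "...", "human_cot": "...", "answer": "..."}]
dataset = [add_synthetic_cot(ex) for ex in dataset]
```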

We then fine-tuned three variants of the model (using LoRA on llama-3.1-8B-instruct), each with a different training target, as sketched after the list below:

1. Direct Answer Only: Generate the final answer without revealing any reasoning.
2. Human Expert CoT: Generate the final answer alongside a reasoning chain resembling the human expert's.
3. Synthetic R1 CoT: Generate the final answer alongside DeepSeek R1's synthetic reasoning chain.
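The snippet below is a rough sketch of how the three variants could be constructed: one rendered completion per training target, plus a LoRA adapter attached with peft. The prompt template, field names, and LoRA hyperparameters are illustrative assumptions rather than the configuration behind these results.

```python
# Sketch: three training targets from the same record, plus a LoRA adapter.
# Hyperparameters and templates are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_target(example: dict, variant: str) -> str:
    """Render the completion the model is trained to produce for one variant."""
    if variant == "direct_answer":
        return example["answer"]
    if variant == "human_cot":
        return f"{example['human_cot']}\n\nAnswer: {example['answer']}"
    if variant == "synthetic_r1_cot":
        return f"{example['synthetic_cot']}\n\nAnswer: {example['answer']}"
    raise ValueError(f"unknown variant: {variant}")

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A common LoRA setup: low-rank adapters on the attention projections only.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # train one adapter per variant
```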

The table below summarizes average accuracy and reasoning length:

- Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.

From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in improving performance, albeit at a higher inference cost due to their greater length.

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

Conclusions

By incorporating reasoning-based data through distillation, organizations can drastically improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in many cases, the machine might just out-teach the human.