Distill information

1. Overview

DeepSeek-R1 is an open-source AI model developed by a Chinese company, focusing on reasoning capabilities and designed to compete directly with OpenAI o1. In various tests, DeepSeek-R1 demonstrated performance on par with or exceeding OpenAI o1, especially in mathematical tasks and multi-step reasoning.

However, OpenAI o1 holds an advantage in certain areas such as programming and general knowledge. Notably, DeepSeek-R1 is significantly more cost-effective and allows community customization, whereas OpenAI o1 remains a proprietary model.

2. Detailed Test Results

2.1. Mathematics

TestDeepSeek-R1OpenAI o1-1217Remarks
AIME 202479.8%79.2%DeepSeek-R1 excels in complex mathematical reasoning.
MATH-50097.3%96.4%DeepSeek-R1 performs exceptionally well in high school-level mathematics.

2.2. Programming

TestDeepSeek-R1OpenAI o1-1217Remarks
Codeforces96.3%96.6%OpenAI o1 has a slight edge, but the difference is minor.
SWE-bench Verified49.2%48.9%DeepSeek-R1 has a slight advantage in software engineering reasoning.

2.3. General Knowledge

TestDeepSeek-R1OpenAI o1-1217Remarks
GPQA Diamond71.5%75.7%OpenAI o1 leads in factual knowledge.
MMLU90.8%91.8%OpenAI o1 has a slight advantage in multi-domain knowledge.

- DeepSeek-R1 can explain its reasoning process, which is crucial for fields requiring transparency, such as research and complex decision-making.

- DeepSeek-R1 is open-source, allowing developers and researchers to customize and use it as needed, whereas OpenAI's model is proprietary.

- DeepSeek-R1 is significantly more cost-effective than OpenAI o1.

3. Distilled Models Performance

image

Purpose of Distillation:

  • Distillation in AI refers to the process of creating smaller, more efficient models from larger ones, retaining most of their reasoning capabilities while reducing computational requirements. DeepSeek has applied this technique to create a series of distilled models from R1, utilizing the Qwen and Llama architectures.

Distilled Models Based on Qwen:

  • DeepSeek-R1-Distill-Qwen-1.5B:
    • The smallest distilled model, achieving 83.9% on MATH-500.
    • Low performance on LiveCodeBench (16.9%), indicating limited programming capabilities.
    • Suitable for basic mathematical tasks but not strong in coding.
  • DeepSeek-R1-Distill-Qwen-7B:
    • High score on MATH-500 (92.8%), demonstrating strong mathematical reasoning.
    • Moderate performance on GPQA Diamond (49.1%), balancing math and factual knowledge.
    • Not optimized for complex coding tasks (LiveCodeBench 37.6%, CodeForces 1189 rating).
  • DeepSeek-R1-Distill-Qwen-14B:
    • Strong performance on MATH-500 (93.9%), handling complex mathematical problems.
    • Score of 59.1% on GPQA Diamond, showing reasoning based on factual knowledge.
    • Performance on LiveCodeBench (53.1%) and CodeForces (1481 rating) indicates room for improvement in coding tasks.
  • DeepSeek-R1-Distill-Qwen-32B:
    • The largest Qwen-based model, achieving the highest score on AIME 2024 (72.6%), excelling in advanced multi-step reasoning.
    • Outstanding results on MATH-500 (94.3%) and GPQA Diamond (62.1%), showing strength in both mathematical reasoning and factual knowledge.
    • Results on LiveCodeBench (57.2%) and CodeForces (1691 rating) indicate flexibility but not optimal coding performance.

Distilled Models Based on Llama:

  • DeepSeek-R1-Distill-Llama-8B:
    • Strong performance on MATH-500 (89.1%) and moderate on GPQA Diamond (49.0%).
    • Lower scores on coding benchmarks like LiveCodeBench (39.6%) and CodeForces (1205 rating), indicating limitations in programming tasks.
  • DeepSeek-R1-Distill-Llama-70B:
    • The largest distilled model, excelling in MATH-500 (94.5%), the best among all distilled models.
    • Achieved 86.7% on AIME 2024, making it an excellent choice for advanced mathematical reasoning.
    • Strong coding performance (LiveCodeBench 57.5%, CodeForces 1633 rating), comparable to OpenAI o1-mini or GPT-4o.
  • Qwen-based distilled models focus on **efficiency and scalability**, balancing performance and computational cost.
  • Llama-based distilled models prioritize **high performance** and advanced reasoning, excelling in tasks requiring mathematical and factual accuracy.

4. Operating Cost

Cost TypeDeepSeek-R1OpenAI o1
Cached Input$0.14 per million tokens$7.50 per million tokens
Input$0.55 per million tokens$15.00 per million tokens
Output$2.19 per million tokens$60.00 per million tokens
  • DeepSeek-R1 is significantly cheaper across all cost categories.
  • Example: Output costs for DeepSeek-R1 are $2.19 per million tokens, compared to $60.00 per million tokens for OpenAI o1.
  • This makes DeepSeek-R1 a better option for cost-sensitive applications, including large-scale AI projects or startups with limited budgets.

5. DeepSeek Deployment Requirements

Modeln_paramweight sizevRAM/ neuron RAMSpeed (response_token/second)
DeepSeek-R1671B~1,342 GB
DeepSeek-R1-Distill-Qwen-1.5B1.8B1.1 GB~0.7 GB218.42 (RTX 6000 ada)
DeepSeek-R1-Distill-Qwen-7B7.6B4.7 GB~3.3 GB
DeepSeek-R1-Distill-Qwen-14B14.8B9.0 GB~6.5 GB
DeepSeek-R1-Distill-Qwen-32B32.8B19 GB~14.9 GB
DeepSeek-R1-Distill-Llama-8B8.0B4.9 GB~3.7 GB
DeepSeek-R1-Distill-Llama-70B70.6B42 GB~32.7 GB