What is DeepSeek?

DeepSeek is a Chinese startup specializing in the development of open-source AI models and applications. Currently, there are two primary applications under the DeepSeek umbrella:

  1. DeepSeek-V3: A large language model that forms the foundation for DeepSeek’s AI capabilities.

  2. DeepSeek-R1: A reasoning model built on top of DeepSeek-V3, designed to enhance logical and analytical tasks.

Together, DeepSeek-V3 (the large language model) and DeepSeek-R1 (the reasoning model) represent the core of DeepSeek’s innovative AI solutions.

What does open-source mean for DeepSeek-R1?

DeepSeek-R1's weights are freely accessible, allowing developers worldwide to use and modify them. Because the model can be hosted on private servers, organizations retain data sovereignty: no data is transmitted to China when the model is operated outside Chinese jurisdiction.

Does DeepSeek require supervised fine-tuning?

Unlike traditional AI models, which rely on a supervised fine-tuning stage, DeepSeek challenged its necessity by developing R1-Zero, a variant of R1 that skips supervised fine-tuning entirely and is trained with reinforcement learning alone.
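Training without supervised fine-tuning relies on automatically checkable reward signals rather than human-labeled examples. The sketch below shows a rule-based reward of that general kind: the tag names, answer format, and reward values are illustrative assumptions, not DeepSeek's exact rules.

```python
import re

def reward(completion: str, reference_answer: str) -> float:
    """Score a model completion with simple, automatically checkable rules:
    a small format reward for wrapping reasoning in <think>...</think> tags,
    plus an accuracy reward when the boxed final answer matches the reference.
    (Tag names, answer format, and reward magnitudes are assumptions.)"""
    r = 0.0
    # Format reward: reasoning enclosed in think tags.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        r += 0.1
    # Accuracy reward: final answer given as \boxed{...} and matching exactly.
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    if m and m.group(1).strip() == reference_answer:
        r += 1.0
    return r

print(reward("<think>2 + 2 = 4</think> \\boxed{4}", "4"))
```

Because such rewards are computed mechanically, no labeled fine-tuning dataset is needed; the model improves purely from reinforcement signals.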

Is pre-training essential for AI?

DeepSeek questions conventional pre-training, which demands vast datasets and computational resources. As experts predict a future scarcity of quality training data, alternative methods become crucial.

What were the training costs for DeepSeek?

Training required 2.788 million H800 GPU hours, costing approximately $5.576 million at an assumed rate of $2 per GPU hour. DeepSeek notes, however, that this figure covers only the official training of V3, excluding the costs of prior research and experiments on architecture, algorithms, and data (DeepSeek-V3, p. 5).
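The headline figure follows directly from the GPU-hour count and the assumed rental rate, as a quick check confirms:

```python
# Reproduce the reported training-cost estimate from the GPU-hour figures.
gpu_hours = 2_788_000   # 2.788 million H800 GPU hours (DeepSeek-V3 report)
rate_per_hour = 2.00    # assumed rental rate of $2 per GPU hour
cost = gpu_hours * rate_per_hour
print(f"${cost:,.0f}")  # total estimated cost in dollars
```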

What makes DeepSeek unique?

  • Uses a Mixture-of-Experts (MoE) architecture instead of a single massive dense model, activating only a small subset of experts per token, so far fewer parameters run per inference step.
  • Implements quantization, cutting weights from 32-bit to 8-bit precision, improving memory and compute efficiency.
  • Tokenizes multi-word phrases as single tokens where possible, improving processing speed.
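The first point, expert routing, can be sketched in a few lines. This is a generic top-k gating scheme, not DeepSeek's actual router; the expert count, scores, and k below are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_scores, k=2):
    """Select the top-k experts for one token and renormalize their gate
    weights to sum to 1. Only these k experts run a forward pass, which is
    why an MoE model activates far fewer parameters than it stores."""
    probs = softmax(token_scores)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in topk)
    return [(i, probs[i] / z) for i in topk]

# One token's router scores over 8 hypothetical experts:
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
print(route(scores))  # two (expert index, gate weight) pairs
```

With 8 experts and k=2, only a quarter of the expert parameters participate in each token's computation; the same principle lets a very large model keep per-token compute modest.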


By itnews