DeepSeek

From Wikipedia, the free encyclopedia

DeepSeek
Native name: 杭州深度求索人工智能基础技术研究有限公司
Company type: Private
Industry: Information technology
Founded: May 2023
Founder: Liang Wenfeng
Headquarters: Hangzhou, Zhejiang, China
Key people: Liang Wenfeng (CEO)
Owner: High-Flyer
Website: deepseek.com

DeepSeek (Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ) is a Chinese artificial intelligence (AI) company based in Hangzhou that develops a family of large language models (LLMs) of the same name. It was founded by, and is backed by, the Chinese hedge fund High-Flyer. It has released its models as open source. The latest version, DeepSeek-V3, is competitive with other LLMs released in 2024, such as those from Qwen and OpenAI.

Background

In 2015, High-Flyer was set up by three engineers from Zhejiang University who had begun trading as students during the 2007–2008 financial crisis. The firm used machine learning to trade stocks.[1] In 2019, it established High-Flyer AI, which was dedicated to research on AI algorithms and their basic applications.[2] By 2021, all of High-Flyer's strategies were using AI, which drew comparisons to Renaissance Technologies.[3]

In April 2023, High-Flyer announced it would form a new independent body to research artificial general intelligence. It would not be used for stock trading and would be separate from High-Flyer's financial business.[4] In May 2023, the company was launched as DeepSeek.[2] DeepSeek's development is funded by High-Flyer.[3]

After releasing DeepSeek-V2 in May 2024, which offered strong performance at a low price, DeepSeek became known as the catalyst for China's AI model price war. It was quickly dubbed the "Pinduoduo of AI", and major tech companies such as ByteDance, Tencent, Baidu, and Alibaba also had to start cutting the prices of their AI models. Despite its low prices, DeepSeek was profitable, whereas its rivals were losing money.[5]

So far, DeepSeek has focused solely on research and has no detailed plans for commercialization.[5]

Release history

On 2 November 2023, DeepSeek unveiled its first model, DeepSeek Coder, which was free for commercial use and fully open source.[6]

On 29 November 2023, DeepSeek launched DeepSeek LLM, a large language model that scaled up to 67B parameters. It was developed to compete with other LLMs available at the time, with performance approaching that of GPT-4. However, it faced challenges in computational efficiency and scalability.[6] A chat version of the model, DeepSeek Chat, was also released.[7]

In May 2024, DeepSeek-V2 was launched. The Financial Times reported that it was cheaper than its peers, at a price of 2 RMB per million output tokens. The University of Waterloo Tiger Lab's leaderboard ranked DeepSeek-V2 seventh in its LLM ranking.[3]

In November 2024, DeepSeek R1-Lite-Preview was released. It was designed to excel in tasks requiring logical inference, mathematical reasoning, and real-time problem-solving. DeepSeek claimed that it exceeded the performance of OpenAI o1 on benchmarks such as the American Invitational Mathematics Examination (AIME) and MATH.[8] However, The Wall Street Journal reported that when it tested both models on 15 problems from the 2024 edition of AIME, OpenAI o1 reached the solutions faster than DeepSeek R1-Lite-Preview.[9]

In December 2024, DeepSeek-V3 was launched. It has 671 billion parameters and was trained in around 55 days at a cost of US$5.58 million, using significantly fewer resources than its peers. It was trained on a dataset of 14.8 trillion tokens. Benchmark tests showed that it outperformed Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet.[10][11][12] DeepSeek's optimization on limited resources highlighted potential limits of US sanctions on China's AI development.[13]

The model is a mixture-of-experts Transformer with Multi-head Latent Attention. It has 256 routed experts and 1 shared expert, and each token activates 37B parameters.[14]
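
The routing idea behind such a layer can be illustrated with a toy sketch. This is a generic top-k mixture-of-experts layer, not DeepSeek's implementation: only the expert counts come from the text above, while the number of routed experts chosen per token (`TOP_K`), the softmax gating, the hidden size, and the scale-and-shift "experts" are assumptions made for illustration.

```python
import math
import random

NUM_ROUTED = 256  # routed experts (from the text above)
NUM_SHARED = 1    # shared experts (from the text above)
TOP_K = 8         # ASSUMPTION: routed experts activated per token
DIM = 4           # ASSUMPTION: toy hidden size

rng = random.Random(0)


def make_expert():
    """Hypothetical toy expert: a scale-and-shift standing in for a feed-forward block."""
    scale, shift = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return lambda x: [scale * v + shift for v in x]


routed_experts = [make_expert() for _ in range(NUM_ROUTED)]
shared_experts = [make_expert() for _ in range(NUM_SHARED)]
# One router weight vector per routed expert (random here, learned in practice).
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_ROUTED)]


def route(token):
    """Return (expert index, gate weight) pairs for the TOP_K best-scoring routed experts."""
    scores = [sum(w * v for w, v in zip(row, token)) for row in router]
    top = sorted(range(NUM_ROUTED), key=scores.__getitem__, reverse=True)[:TOP_K]
    # ASSUMPTION: softmax over the selected scores, so the gates sum to 1.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(token):
    """Sum the always-on shared experts and the gated outputs of the selected routed experts."""
    out = [0.0] * DIM
    for expert in shared_experts:
        out = [o + y for o, y in zip(out, expert(token))]
    for index, gate in route(token):
        out = [o + gate * y for o, y in zip(out, routed_experts[index](token))]
    return out


token = [0.5, -1.0, 0.25, 2.0]
selected = route(token)
output = moe_layer(token)
print(f"experts run for this token: {len(selected) + NUM_SHARED} of {NUM_ROUTED + NUM_SHARED}")
```

Because only the selected routed experts and the shared expert run for a given token, the number of parameters active per token can be far smaller than the model's total parameter count.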

| Stage | Cost (thousand GPU-hours) | Cost (million US$) |
|---|---|---|
| Pretraining | 2664 | 5.328 |
| Context extension | 119 | 0.24 |
| Finetuning | 5 | 0.01 |
| Total | 2788 | 5.576 |
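
The arithmetic in the table can be checked with a short script. The sketch below copies the figures from the table and derives the price per GPU-hour they imply. That rate is inferred from the figures rather than stated in the text, and all names are illustrative.

```python
# (thousand GPU-hours, million US$) per stage, copied from the table above.
STAGES = {
    "pretraining": (2664, 5.328),
    "context extension": (119, 0.24),
    "finetuning": (5, 0.01),
}
REPORTED_TOTAL = (2788, 5.576)

total_gpu_khours = sum(hours for hours, _ in STAGES.values())


def implied_rate(gpu_khours, cost_musd):
    """US$ per GPU-hour implied by a (thousand GPU-hours, million US$) pair."""
    return (cost_musd * 1_000_000) / (gpu_khours * 1_000)


rates = {name: implied_rate(*row) for name, row in STAGES.items()}
total_rate = implied_rate(*REPORTED_TOTAL)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} US$ per GPU-hour")
print(f"total: {total_gpu_khours} thousand GPU-hours at {total_rate:.3f} US$ per GPU-hour")
```

With these figures, each row works out to about US$2 per GPU-hour. The context-extension row comes out slightly higher, presumably because its cost is rounded to two decimal places.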

References

  1. "Billions Going to China's Quants Takes Fight to Global Funds". Bloomberg News. 31 May 2020. Archived from the original on 25 May 2022. Retrieved 28 December 2024.
  2. Ottinger, Lily (9 December 2024). "Deepseek: From Hedge Fund to Frontier Model Maker". ChinaTalk. Archived from the original on 28 December 2024. Retrieved 28 December 2024.
  3. McMorrow, Ryan; Olcott, Eleanor (9 June 2024). "The Chinese quant fund-turned-AI pioneer". Financial Times. Archived from the original on 17 July 2024. Retrieved 28 December 2024.
  4. Yu, Xu (17 April 2023). "[Exclusive] Chinese Quant Hedge Fund High-Flyer Won't Use AGI to Trade Stocks, MD Says". Yicai Global. Archived from the original on 31 December 2023. Retrieved 28 December 2024.
  5. Schneider, Jordan (27 November 2024). "Deepseek: The Quiet Giant Leading China's AI Race". ChinaTalk. Retrieved 28 December 2024.
  6. Se, Ksenia (28 August 2024). "Inside DeepSeek Models". Turing Post. Archived from the original on 18 September 2024. Retrieved 28 December 2024.
  7. Sharma, Shubham (1 December 2023). "Meet DeepSeek Chat, China's latest ChatGPT rival with a 67B model". VentureBeat. Archived from the original on 23 December 2024. Retrieved 28 December 2024.
  8. Franzen, Carl (20 November 2024). "DeepSeek's first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance". VentureBeat. Archived from the original on 22 November 2024. Retrieved 28 December 2024.
  9. Huang, Raffaele (24 December 2024). "Don't Look Now, but China's AI Is Catching Up Fast". The Wall Street Journal. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
  10. Jiang, Ben (27 December 2024). "Chinese start-up DeepSeek's new AI model outperforms Meta, OpenAI products". South China Morning Post. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
  11. Sharma, Shubham (26 December 2024). "DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch". VentureBeat. Archived from the original on 27 December 2024. Retrieved 28 December 2024.
  12. Wiggers, Kyle (26 December 2024). "DeepSeek's new AI model appears to be one of the best 'open' challengers yet". TechCrunch.
  13. Shilov, Anton (27 December 2024). "Chinese AI company's AI model breakthrough highlights limits of US sanctions". Tom's Hardware. Archived from the original on 28 December 2024. Retrieved 28 December 2024.
  14. DeepSeek-AI; Liu, Aixin; Feng, Bei; Xue, Bing; Wang, Bingxuan; Wu, Bochao; Lu, Chengda; Zhao, Chenggang; Deng, Chengqi (27 December 2024). DeepSeek-V3 Technical Report. doi:10.48550/arXiv.2412.19437. Retrieved 30 December 2024.