A Chinese Version of ChatGPT: Hype or Hope?
KEYWORDS: ChatGPT, Google, Microsoft, AI, Baidu, Alibaba, AIGC, Bytedance, ChineseChatGPT, Data, Computing
ChatGPT has become a global phenomenon. Launched as a chatbot by OpenAI in November 2022, it became the fastest-growing consumer application in history, reaching 100 million monthly active users within two months.
On February 6, Google announced its own chatbot, Bard. The next day, Microsoft launched the new Bing, a search engine with ChatGPT built in.
As ChatGPT gained momentum, the Chinese Internet giants followed suit and joined the battlefield.
On February 7, Baidu (Nasdaq: BIDU) officially confirmed that its ChatGPT-like project is named Wenxin Yiyan (文心一言), or ERNIE Bot in English.
The next day, more players joined the ChatGPT battle, with Alibaba (Nasdaq: BABA), Bytedance, JD.com (Nasdaq: JD) and others announcing one after another that they have developed or plan to develop ChatGPT-like products.
Who will be first to launch the most popular Chinese version of ChatGPT has now become a hot topic of public discussion.
1. Is ChatGPT enough for the Chinese market?
Well, we doubt it.
First of all, the current ChatGPT may not work well in a Chinese context.
This is because OpenAI’s API business, at least so far, has not been friendly to Chinese users and does not officially support API calls from China.
Moreover, even if Chinese companies could call the API, ChatGPT and GPT-3 were trained mainly on English data, so their performance in Chinese still lags behind their performance in English.
As a result, ChatGPT gives plausible-looking but wrong answers to a number of knowledge-based questions in a Chinese context.
For instance, when we asked ChatGPT to write a continuation of the last forty chapters of “Dream of the Red Chamber”, the result departed markedly from the novel. ChatGPT even got the names of characters in the story wrong.
When asked which verses in a Chinese poem describe a certain scene, ChatGPT answered confidently, but the two lines it quoted do not exist in the poem.
When asked to write a seven-character quatrain about Internet technology, it did not appear to understand the basic rules of the form (four lines of seven characters each).
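The form rule mentioned above (four lines, seven characters each) is simple enough to check mechanically. The following is a minimal Python sketch of such a check, not a tool used for this article. The function name and the choice to count only characters in the basic CJK Unified Ideographs range are assumptions made for illustration, and the check ignores rhyme and tonal patterns, which a real quatrain must also satisfy.

```python
import re

# One character from the basic CJK Unified Ideographs block.
# Assumption: rarer ideographs outside U+4E00..U+9FFF are not counted.
HAN = re.compile(r"[\u4e00-\u9fff]")

# Line breaks plus common Chinese and Western sentence punctuation.
LINE_BREAKS = re.compile(r"[\n，。！？；,.!?;]+")


def is_seven_character_quatrain(poem: str) -> bool:
    """Return True if the poem has exactly four lines of seven Han characters.

    Only the outer form is checked; rhyme and tone rules are ignored.
    """
    lines = [seg for seg in LINE_BREAKS.split(poem) if seg.strip()]
    if len(lines) != 4:
        return False
    return all(len(HAN.findall(line)) == 7 for line in lines)


# A classical seven-character quatrain (Li Bai) and a five-character one.
SEVEN = "朝辞白帝彩云间，千里江陵一日还。两岸猿声啼不住，轻舟已过万重山。"
FIVE = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"

print(is_seven_character_quatrain(SEVEN))  # True
print(is_seven_character_quatrain(FIVE))   # False
```

A check like this only catches the most basic failure, a wrong line count or line length, which is the kind of error described above.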
Today’s Chinese market needs not only more advanced international AI tools but also tools that are better suited to Chinese users.
After all, most previous AI bots were built around English, and although they had Chinese language capabilities, their overall performance in Chinese and familiarity with the language were inferior to those of products from Chinese companies.
2. From Data to Computing, how is China’s AI performing?
ChatGPT relies on large amounts of computing power and data for training, so it is worth examining how China’s AI sector and related companies are performing.
The 2022–2023 China Artificial Intelligence Computing Power Development Assessment Report, jointly published by IDC and Inspur, points out that China’s AI computing power continues to grow rapidly, reaching 268 EFLOPS (268 × 10^18 floating-point operations per second) in 2022 and surpassing the scale of general-purpose computing power.
IDC forecasts that China’s intelligent computing scale will reach 1271.4 EFLOPS by 2026, a compound annual growth rate of 52.3% over the five-year forecast period, compared with 18.5% for general-purpose computing over the same period.
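To see how these figures relate, the compound annual growth rate can be worked through directly. The sketch below uses only the numbers quoted above (268 EFLOPS in 2022, 1271.4 EFLOPS in 2026, 52.3% over five years); the helper names are made up for illustration. It also assumes the five-year window ends in 2026, since the text does not state the base year.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that
    turns `start` into `end` after `years` years."""
    return (end / start) ** (1 / years) - 1


def implied_start(end: float, rate: float, years: int) -> float:
    """Starting value implied by an end value, a yearly rate and a period."""
    return end / (1 + rate) ** years


# Growth rate implied by the two quoted data points (2022 -> 2026, 4 years).
rate_2022_2026 = cagr(268.0, 1271.4, 4)

# Base implied by the quoted 52.3% rate, assuming the five years end in 2026.
base_five_years_back = implied_start(1271.4, 0.523, 5)

print(f"Implied CAGR 2022-2026: {rate_2022_2026:.1%}")
print(f"Implied base five years before 2026: {base_five_years_back:.1f} EFLOPS")
```

Under these assumptions the 2022-to-2026 figures imply roughly 47.6% a year, while the quoted 52.3% corresponds to a base of about 155 EFLOPS five years before 2026. That suggests the five-year window starts before 2022 rather than that the two figures conflict.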
The “intelligence” of AI comes from the continuous training of the underlying data sources and therefore requires high-quality native content to learn.
According to a research note from BOC International, large platform companies with data and capital advantages are expected to benefit more than AI software companies.
According to information provided by Alibaba, Alibaba DAMO Academy has leading technical capabilities in AI, large models and other underlying technologies required for ChatGPT. It has previously launched the M6 large model, whose 10-trillion-parameter scale it says exceeds that of models from Google and Microsoft, as well as the open-source AI model community “Model Scope”.
Baidu’s Wenxin (文心) series of large models, with a parameter scale of 260 billion, includes what is billed as the world’s largest single Chinese-language model and has been widely applied across industries.
In addition, Baidu has built one of the world’s largest knowledge graphs, with over 5 billion entities and 550 billion facts, which is constantly being developed and updated.
JD Cloud & AI, from JD.com, interacts with users 10 million times a day, allowing the algorithm to be iteratively updated in a timely manner.
3. Chinese giants reveal their ChatGPT Roadmap
Baidu says that its ChatGPT-like project, ERNIE Bot (文心一言), will complete internal testing and open to the public in March. The product is currently in its pre-launch sprint.
Bytedance’s Artificial Intelligence Lab (AI Lab) has been conducting research and development on ChatGPT-like technology and AIGC, and may provide technical support for PICO in the future.
According to sources familiar with the matter, PICO’s business development has so far fallen short of expectations, and for this reason Bytedance AI Lab will explore VR content creation further. Bytedance AI Lab was reportedly established in 2016, and its research areas mainly involve natural language processing, data mining, machine learning, and speech and audio.
Alibaba’s ChatGPT-style chatbot is under development and is currently being tested internally. It will be deeply integrated with DingTalk.
Tencent said that it has already mapped out plans in the relevant areas and that dedicated research is progressing steadily. Tencent will continue to invest in the research and development of cutting-edge technologies such as AI, and will carry out further research and application exploration based on its existing technical reserves in large-scale AI models, machine learning algorithms and NLP.
On February 10, JD Cloud & AI announced that its Yanxi (言犀) AI application platform will launch an industrial version of ChatGPT, ChatJD, to be integrated into its products and services to promote the industrial implementation of AI.
NetEase Youdao may launch products built on the same kind of technology as ChatGPT in the future, with application scenarios centered on online education.
Founder Securities pointed out in its research report that China is currently a world leader in natural language understanding and related AI technologies, with major domestic AI players such as Baidu, Tencent Youtu, Alibaba, SenseTime, Kuaishou, Bytedance, NetEase and iFLYTEK increasing their investment in the AIGC field. With the rapid development of AI technology, AI technology providers, especially leading NLP (natural language processing) providers, will be the first to benefit.
It is well known that algorithms, computing power and data were the “three pillars” of the first stage of deep learning’s development. With ChatGPT, we are now entering a new stage, that of “large language models (LLMs)”. In this new stage, whoever has massive data and computing resources will be able to build LLMs.
After LLMs built on English-language data, Chinese-language data will undoubtedly be the next battleground, so let’s wait and see.