Beijing Academy of Artificial Intelligence (Zhiyuan Research Institute) Jointly Builds Chinese Internet Corpus CCI to Provide Resources for the Big Data and Artificial Intelligence Industries


At the 2024 Beijing Cultural Forum, the Beijing Academy of Artificial Intelligence (BAAI) officially announced the release of CCI3.0, the next generation of its Chinese Internet corpus (Chinese Corpora Internet, CCI), as a further step in promoting data co-construction and sharing. CCI3.0 comprises a 1000GB dataset and a 498GB high-quality subset, CCI3.0-HQ. It is the latest major update to the corpus, following the initial open-source release of CCI1.0 in November 2023 and the release of CCI2.0 in April 2024.
Sunshine Intelligent Technology Co., Ltd. has a registered capital of 700 million yuan. Its scope of business includes big data services, internet security services, and the development of artificial intelligence application software. The company is wholly owned by Sunshine Insurance, and its business scope covers multiple AI-related lines.
Big data and large models are areas of focus for Worth Buying, which is developing the 'Worth Buying Consumption Content Large Model' on top of a general-purpose large model. The company aims to use big data to make search and content distribution on its platform more efficient. Its product database has recorded nearly 220,000 brands and 11.23 million aggregated products. The company plans to apply the 'Worth Buying Consumption Content Large Model' across its various products.
DingTalk has launched a new AI hardware product, the DingTalk A1Pro, priced at 1,299 CNY. It is positioned as a professional AI audio card designed for frequent business travelers. The device is only 6.4mm thick, supports magnetic attachment, has a touchscreen, and is equipped with a professional-grade MEMS directional microphone. It combines "AI Office + Emergency Power Supply" functions in one device, expanding the boundaries of DingTalk's integrated software and hardware services.
As generative AI sweeps through the programming field, the Zig open-source project has introduced a strict policy in the opposite direction: contributions may not include any code or comments generated by large language models. Following commentary on the policy by Simon Willison, it sparked a community discussion about the trade-off between technical efficiency and talent development, with the core conflict being the choice between producing code and growing contributors. The Zig maintainers have redefined 'contributions,' emphasizing originality and the learning process.