
Artificial Intelligence-Generated Content (AIGC) White Paper [Excerpt]


This white paper from the China Academy of Information and Communications Technology, a key research institution advising the government on science and technology issues, explores the potential impacts of generative AI. Written before the launch of ChatGPT, the paper focuses on applications around the consumer experience – in terms of e-commerce, film and TV, and news and broadcasting.


III. AIGC application scenarios


Against the backdrop of the protracted and recurring COVID-19 pandemic, demand for digital content in various industries has surged, and there is an urgent need to narrow the gap between the consumption and supply of content in the digital world. With its verisimilitude, diversity, controllability, and composability, AIGC is expected to help enterprises improve the efficiency of content production and provide them with richer, more dynamic, and more interactive content. It may also help highly digitalized industries with abundant demand for content, such as media, e-commerce, film and television, and entertainment, achieve significant innovation-driven development.


Figure 2.  AIGC applications


Source: CAICT

(i) AIGC + media: Production based on human-computer collaboration is promoting media convergence


As the global level of informatization has accelerated in recent years, the integration of AI with the mass media industry has continued to deepen. AIGC is a new method of content production that comprehensively empowers media content production. Applications such as writing bots, interview assistants, video subtitle generation, audio-visual broadcasting, video compilation, and AI-synthesized hosts and anchors have appeared one after another and have penetrated the whole process from gathering and editing to broadcasting. They have profoundly changed how media content is generated, becoming an important force driving media convergence.


Collection and editing: (1) Automated transcription of recorded speech has improved the work experience of media workers. Transcribing recordings into text with the help of speech recognition technology effectively compresses the repetitive recording and organizing work in the article production process, further assuring the timeliness of news. During the 2022 Winter Olympics, iFlytek’s smart voice recorder helped reporters produce articles in as little as two minutes through cross-language voice transcription. (2) Intelligent news writing has improved the timeliness of news information. Algorithm-based automatic news writing automates some of the labor-intensive gathering and editing work, helping media produce content faster, more accurately, and more intelligently. For example, Quakebot, a robot reporter at the Los Angeles Times, wrote and published a story only three minutes after an earthquake hit Los Angeles in March 2014. Wordsmith, an intelligent writing platform used by the Associated Press, can write 2,000 reports per second. The China Earthquake Networks Center’s writing robot finished compiling and distributing news within seven seconds of the 2017 Jiuzhaigou earthquake. Yicai Media Group’s “DT Draft King” can write 1,680 words a minute.1 (3) Intelligent video editing has improved the value of video content. Intelligent video editing tools such as video subtitle generation, video compilation, video topic segmentation, and video super-resolution efficiently save manpower and time costs and maximize the value of copyrighted content.
During the 2020 National People’s Congress, the People’s Daily used an “intelligent cloud video editor” to quickly generate videos, and was able to achieve technical operations such as automatic subtitle matching, real-time character tracking, fixing of picture flutter, and speedy switching between horizontal and vertical screen orientations, so as to meet the requirements of multi-platform distribution.2 During the 2022 Winter Olympics, CCTV Video used an “AI intelligent content production editing system” to efficiently produce and distribute video highlights of the Winter Olympics ice and snow events, creating more possibilities for the in-depth value development of copyrighted sports media content.
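Template-based automated news writing of the kind Quakebot pioneered can be illustrated with a minimal sketch: structured event data is slotted into a prewritten story template. The event fields, values, and template below are invented for illustration and are not the actual Quakebot system.

```python
# Minimal sketch of template-based automated news writing, in the style
# of earthquake-report bots such as Quakebot. The event schema and
# template text here are hypothetical illustrations.

TEMPLATE = (
    "A magnitude {magnitude} earthquake struck {distance} km from "
    "{place} at {time}, according to preliminary data. "
    "No reports of damage have been received so far."
)

def write_report(event: dict) -> str:
    """Fill the story template from structured event data."""
    return TEMPLATE.format(**event)

# Hypothetical structured feed item, e.g. parsed from a seismic data service.
event = {
    "magnitude": 4.4,
    "distance": 9,
    "place": "Westwood, California",
    "time": "6:25 a.m. local time",
}

print(write_report(event))
```

Because the template is fixed and the data arrives in structured form, a story can be drafted within seconds of the triggering event, which is what makes the "three minutes" and "seven seconds" turnaround times above possible.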


In terms of dissemination, AIGC applications are concentrated mainly in areas such as news broadcasting, with AI-synthesized anchors as the core application. AI-synthesized anchors have set a precedent for real-time voice and character animation synthesis in the news field. One need only enter the text to be broadcast, and the computer generates a corresponding news video of an AI-synthesized anchor reporting it, ensuring that the character’s voice, expressions, and lip movements remain naturally consistent and convey information as effectively as a real anchor. Looking at the application of AI-synthesized anchors in the media, three characteristics are apparent. (1) The scope of application is expanding. At present, Xinhua News Agency, CCTV, People’s Daily, and other national media, along with Hunan TV and other provincial and municipal media, have begun actively deploying AI-synthesized anchors, launching virtual news hosts including “Xin Xiaowei” and “Little C.” They have also pushed the technology from news broadcasting into a wider range of scenarios such as special show hosting, reporting, and weather forecasting, profoundly empowering the dissemination of major events such as the National People’s Congress, the Winter Olympics, and the Winter Paralympics. (2) Application scenarios are being upgraded. In addition to regular news broadcasting, a series of AI-synthesized anchors have begun supporting multilingual broadcasting and sign language broadcasting.
During the 2020 National People’s Congress, a multilingual virtual anchor used Chinese, Korean, Japanese, English, and other languages for news reporting, realizing the broadcasting of one voice in multiple languages, delivering Chinese news to the world, and conforming to the trend of information sharing in the information age.3 During the 2022 Winter Olympics, Baidu, Tencent, and other enterprises launched broadcasting with sign language by “digital humans” to provide sign language commentary for millions of hearing-impaired users, further promoting the progress of barrier-free watching of the games. (3) Application forms are being perfected. In terms of imagery, there has been a gradual expansion from 2D to 3D; in terms of actuation, the range has begun to extend from the mouth to facial expressions, limbs, fingers, and background content material; and in terms of content construction, the direction is from supporting SaaS-based platform tool construction to exploring intelligentized production. For example, Tencent’s “Lingyu” 3D digital sign language interpreter achieves the generation of lip movements, facial expressions, body movements, finger movements, etc., and works with a visualized action editing platform which supports the fine-tuning of sign language actions.


AIGC is having a profound impact on media organizations, practitioners, and audiences. For media organizations, incorporating AIGC into the news production process greatly improves production efficiency and brings new visual and interactive experiences; it enriches the forms of news reporting, accelerates the digital transformation of media, and promotes the shift to smart media. For media practitioners, AIGC can help produce news of greater humanistic, social, and economic value. It can also automate part of the laborious work of gathering, editing, and broadcasting, letting them focus on work that requires deep thinking and creativity, such as news features, in-depth reports, and special reports, areas that draw on distinctly human strengths in precise analysis and the sensitive handling of emotional elements. For media audiences, AIGC lets them obtain news content in richer and more diverse forms in a shorter time, improving the timeliness and convenience of news consumption. It also lowers the media industry’s technical barriers, giving audiences more opportunities to participate in content production and greatly enhancing their sense of participation.


(ii) AIGC + e-commerce: Promoting the blending of virtual and real, and creating an immersive experience


With the development and application of digital technology and the upgrading and acceleration of consumption, e-commerce is developing toward immersive shopping experiences. AIGC is accelerating the construction of 3D merchandise models, virtual brand representatives, and even virtual showrooms, and, combined with new technologies like augmented reality (AR) and virtual reality (VR), it achieves an immersive shopping experience with audio-visual and other multi-sensory interaction.


Generation of 3D product models for product display and virtual try-on is enhancing the online shopping experience. Based on images of products from different angles, 3D geometric models and textures of merchandise are automatically generated with the help of visual generation algorithms. Supplemented by online virtual “seeing and trying on,” they provide a differentiated online shopping experience close to the real thing, helping to boost user conversion efficiently. Baidu, Huawei, and other enterprises have launched automated 3D product modeling services that complete product photography and model generation in minutes, with millimeter-level accuracy. Unlike traditional 2D displays, 3D models show a product’s appearance from every angle, which can significantly reduce the time users spend selecting products and communicating with merchants, enhance the user experience, and quickly facilitate transactions. The 3D product models thus generated can also be used for online try-on, closely replicating the in-store experience of trying out goods or services and giving consumers a fuller sense of a product’s true value. An example is Alibaba’s 3D Tmall Home Furnishing City, which went online in April 2021. By providing merchants with 3D design tools and AI-based 3D product model generation services, it helps them quickly build a 3D shopping space. It also supports consumers in doing their own home furnishing design, providing an immersive “cloud shopping” experience. Data show that the average conversion rate of 3D shopping is 70%, nine times the industry average; the average order value increased by more than 200% compared with normal guided transactions, while the rate of returns and exchanges fell significantly. Many brands have also begun exploring virtual try-on.
Examples include Uniqlo’s virtual fitting, Adidas’s virtual shoe try-on, Chow Tai Fook’s virtual jewelry try-on, Gucci’s virtual try-on for watches and glasses, Ikea’s virtual furniture matching, and Porsche’s virtual test drive.4 Although these still rely on traditional manual modeling, more consumer-level tools are expected to emerge as AIGC technology progresses, gradually lowering the threshold and cost of 3D modeling and facilitating the large-scale commercialization of virtual try-on applications.


Creating virtual hosts is empowering interactive livestream marketing (“live shopping”). Creating virtual livestream hosts based on visual, voice, and text generation technology provides audiences 24-hour uninterrupted product recommendations, introductions, and online service, and lowers the barriers to livestreaming for merchants. Compared to a livestream shopping “studio” with real people, virtual hosts have three major advantages. First, virtual hosts can fill the gaps left by real hosts so that the livestream can run non-stop, not only giving users greater viewing-time flexibility and a more convenient shopping experience, but also creating greater business growth for participating merchants. For example, the virtual hosts of brands like L’Oreal, Philips, and Perfect Diary generally go online at midnight and livestream for nearly nine hours, forming a 24-hour seamless livestreaming service together with human hosts. Second, the virtualization of brand representatives can accelerate the rejuvenation of a store or brand, narrow the distance to new consumer groups, and shape a store’s image for the metaverse era; in the future, it can be extended to more diverse virtual scenarios in the metaverse to achieve multi-sphere dissemination. For example, the makeup brand Carslan launched its own virtual brand image and introduced it into its livestream as the daily virtual host and shopping guide of its Tmall flagship store. Traditional enterprises that already have a virtual brand IP image can also directly transform that existing image into a virtual brand representative. For example, during Haier’s livestream promotion in May 2020, the familiar “Haier Brothers” virtual IP characters joined the livestream and interacted with the human host and fans, drawing ten million views. Third, the persona of a virtual host is more stable and controllable.
Whereas a human brand ambassador’s availability is limited and their public persona can collapse in scandal, a virtual representative’s persona, words, and deeds are controlled by the brand, offering stronger controllability and security than a real star. Brands need not worry that a virtual image’s persona will collapse and bring them negative news, bad reviews, and financial losses.


Empowering online malls and offline showrooms to accelerate their evolution and provide consumers with new shopping scenarios. Virtual showrooms can be built rapidly, cheaply, and at scale by reconstructing the 3D geometric structure of scenes from 2D images. For businesses, this effectively lowers the barriers and costs of building 3D shopping spaces; for industries that have relied heavily on offline stores, it opens room to imagine online-offline fusion; and for consumers, it provides a new consumer experience fusing online and offline. Some brands have already begun creating virtual spaces. For example, for its 100th anniversary celebration, luxury house Gucci moved its offline Gucci Garden Archetypes exhibition to the Roblox online game platform, launching a two-week virtual exhibition with five themed halls whose contents corresponded to the real exhibition. In July 2021, Alibaba showcased its “Buy+” VR project for the first time, offering a 360° virtual shopping site open for visitors to experience. In November 2021, Nike partnered with Roblox to launch the Nikeland virtual world, open to all Roblox users. Following the successful application of image-based 3D reconstruction in Google Maps’ immersive view feature, the automated construction of virtual showrooms will see broader application and development in the future.


(iii) AIGC + film and television: Expanding the creative space and enhancing the quality of creative works


With the rapid development of the film and TV industry, problems have emerged across the production process, from early creation and shooting to post-production. There are growing pains such as a relative shortage of high-quality scripts, high production costs, and the poor quality of some works, so structural upgrading is urgently needed. AIGC technology can stimulate ideas for script creation, expand the space for character and scene creation, and greatly improve the quality of post-production, thereby helping to maximize the cultural and economic value of film and TV works.


AIGC provides new ideas for script creation. By analyzing and summarizing massive script data and quickly producing scripts in preset styles, which creators then screen and rework, it inspires creators, broadens their creative thinking, and shortens the creative cycle. Other countries took the lead in such attempts. As early as June 2016, New York University researchers used AI to write the screenplay for the film Sunspring, which, after shooting and production, was shortlisted in the top ten of Sci-Fi London’s 48-Hour Film Challenge.5 In 2020, students at Chapman University in the United States used OpenAI’s GPT-3 large language model to write a screenplay and produced the short film Solicitors. Some domestic vertical technology companies have begun offering intelligent script production services. Haima Qingfan’s “novel to screenplay” intelligent writing function, for example, has served more than 30,000 episodes of series scripts and more than 8,000 theatrical or made-for-streaming film screenplays, including hits like Hi, Mom and The Wandering Earth, as well as over five million online novels.


AIGC expands the space for character and scene creation. First, through AI synthesis of faces, voices, and other content, it is possible to “digitally resurrect” deceased actors, replace disgraced actors, synchronize audio and video in multi-language translations, have actors play roles across age spans, and synthesize difficult stunts, reducing the impact of actors’ limitations on film and TV productions. For example, for the CCTV documentary China Reinvents Itself, CCTV and iFlytek used AI algorithms to learn from recordings of past documentaries narrated by the late voice artist Li Yi and synthesized narration from the documentary’s script; coupled with post-processing and optimization, this ultimately reproduced Li Yi’s voice. During the 2020 broadcast of the TV show Healer of Children, an academic scandal involving the lead actor adversely affected publicity and distribution; intelligent video face-swapping technology was then used to replace him, reducing the production’s losses. In 2021, the British company Flawless launched the visualization tool TrueSync to address unsynchronized lip movements in multilingual translations; using AI-based deep video synthesis, it precisely adjusts actors’ facial features so that their lip movements match dubbing or subtitles in different languages. Second, physical scenes are being synthesized through AI to generate scenes that are impossible or too costly to film in real life, greatly broadening the imaginative boundaries of film and TV works and giving audiences better visual and auditory experiences. In the 2017 hit Detective Samoyeds, for example, a large number of scenes were virtually generated with AI technology.
Workers collected a large amount of scene information early on, and special effects personnel performed digital modeling to produce simulated shooting scenes, while the actors performed in a green screen studio. By combining real-time matting technology, the actor’s movements were fused with the virtual scenes to finally generate the footage.6


AIGC empowers film and TV editing and boosts post-production. First, it repairs and restores film and TV footage, improving the clarity of image materials and assuring the picture quality of film and TV works. For example, the China Film Digital Production Base and the University of Science and Technology of China jointly developed the “China Film Shensi” AI image processing system, which has successfully restored productions such as Amazing China and Street Angel. With the Shensi system, the time needed to restore a film can be cut by three-quarters and the cost halved. Meanwhile, streaming platforms such as iQIYI, Youku, and Xigua Video have begun developing AI-based restoration of classic movies and TV shows as a new growth area. Second, it generates film and TV trailers. After learning audiovisual techniques from hundreds of thriller trailers, IBM’s Watson AI system produced a six-minute trailer by selecting scenes from the 90-minute film Morgan that fit the characteristics of a thriller trailer. Although production staff reworked the trailer before it was finalized, the process shrank the trailer production cycle from about a month to 24 hours. Third, it automates the conversion of film and TV content from 2D to 3D. The AI-backed automatic 3D content production platform “Zhengrong” launched by DreamWld Tech supports the dimensional conversion of film and TV works and improves the efficiency of theater-level 3D conversion more than a thousandfold.
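To make concrete what super-resolution models improve on, here is a toy sketch of the classical baseline: enlarging a low-resolution image by simply repeating pixels. Real AI restoration instead hallucinates plausible detail learned from data; the tiny two-by-two "image" below is invented for illustration.

```python
# Toy illustration of image upscaling, the classical baseline that AI
# super-resolution improves on. A grayscale "image" is a 2D list of
# pixel intensities (0-255); nearest-neighbor upscaling repeats each
# pixel to enlarge the grid by an integer factor, adding no new detail.

def upscale_nearest(image, factor):
    """Enlarge a 2D pixel grid by repeating each pixel factor x factor times."""
    out = []
    for row in image:
        stretched = [px for px in row for _ in range(factor)]
        out.extend([stretched[:] for _ in range(factor)])  # copy each row
    return out

image = [
    [0, 255],
    [255, 0],
]

for row in upscale_nearest(image, 2):
    print(row)
```

The blocky result shows why learned restoration matters: interpolation can only redistribute existing pixels, whereas an AI model can synthesize textures and edges consistent with the content.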


(iv) AIGC + entertainment: Expanding boundaries and gaining development momentum


In the digital economy era, entertainment not only brings consumers closer to products and services but also indirectly satisfies modern people’s increasingly important desire for a sense of belonging. With AIGC technology, by generating interesting images, audio, and video, creating virtual idols, and developing digital avatars for consumers, the entertainment industry can rapidly expand its boundaries in ways consumers readily accept, thereby gaining new growth momentum.


Generating interesting images, audio, and video stimulates user participation and enthusiasm. In image and video generation, AIGC applications exemplified by AI-based face swapping satisfy users’ appetite for novelty and have become tools for standing out. For example, the image and video synthesis apps FaceApp, ZAO, and Avatarify each went viral immediately upon launch, topping the App Store free download chart; an interactive app launched by the People’s Daily New Media Center for the 70th National Day anniversary, which generates portraits of users in the dress of China’s 56 ethnic groups, promptly swept across users’ social networks, with more than 738 million photographs synthesized; and in March 2020, Tencent launched an in-game photo activity with the Chinese idol girl group Rocket Girls 101 inside the avatar-driven game Game for Peace. Such interactive content has greatly stimulated users’ emotions and achieved rapid breakthroughs in social communication. In voice synthesis, voice modification adds interactive fun. For example, QQ and many other social media apps, as well as Game for Peace and many other games, have built-in voice modification functions, letting users experience a variety of voices, such as “big uncle” and “cute girl,” making communication a joyful game.


Creating virtual idols releases IP value. First, it allows the co-creation of synthesized songs with users, continuously deepening fan engagement. “Virtual singers” such as Hatsune Miku and Luo Tianyi are virtual characters built on the VOCALOID voice synthesis engine: real people provide the voice source, software synthesizes the singing voice, and fans can participate deeply in co-creating the virtual singer. Take Luo Tianyi as an example: anyone who writes lyrics and music for the voice library can achieve the effect of “Luo Tianyi singing a song.” In the ten years since Luo Tianyi’s debut on July 12, 2012, musicians and fans have created more than 10,000 works for her, giving users more room for imagination and creativity while building deeper connections with fans. Second, AI-synthesized audio and video animation supports virtual idol content monetization in more diverse scenarios. As audio and video synthesis, holographic projection, AR, VR, and other technologies mature, the scenarios for monetizing virtual idols have diversified: virtual idols can now be monetized through concerts, music albums, advertising endorsements, livestreaming, and derivative products. At the same time, as the commercial value of virtual idols becomes apparent, brands are increasingly willing to link up with virtual IP. For example, “Ling” [翎 Ling], an internet celebrity created jointly by Xmov and Next Generation, debuted in May 2020 and has since cooperated with VOGUE, Tesla, GUCCI, and other brands.


Developing consumer-end avatars and laying out the consumer metaverse. Since Apple released Animoji on the iPhone in 2017, avatar technology has iterated from single cartoon animal avatars to AI-automated generation of cartoon images of real people, giving users more creative autonomy and a more lifelike image library. Major technology companies are actively exploring avatar applications, accelerating their positioning for a future in which the virtual digital world and the real world merge. For example, at the 2020 World Internet Conference, Baidu demonstrated the design of dynamic virtual characters based on AI technologies such as 3D virtual image generation and virtual image actuation: take one photo, and within seconds a virtual image is generated that can mimic your expressions and movements. In the developer exhibition area of the 2021 Apsara Conference, Alibaba Cloud demonstrated its latest technology, the Cartoon Smart Drawing project, which became a conference hit, attracting nearly 2,000 visitors. Cartoon Smart Drawing adopts a latent-variable mapping technology solution: with pictures of a person’s face as input, it automatically generates virtual images with personal characteristics, picking up distinctive features such as eye size and nose shape, while also tracking the user’s facial expressions to generate real-time animation, giving ordinary people the chance to create their very own cartoon images. In the foreseeable future, avatars, as the user’s personal identity and vehicle for interaction in the virtual world, will be further integrated into people’s work and daily lives and will drive the development of a virtual goods economy.


(v) AIGC + other: Promoting digital-real integration and accelerating industrial upgrading


In addition to the above industries, AIGC applications are also developing rapidly in fields such as education, finance, healthcare, and manufacturing. In education, AIGC is breathing new life into teaching materials. Compared with traditional methods such as reading and lectures, AIGC gives educators new tools to make abstract, flat textbooks concrete and three-dimensional, delivering knowledge to students in more vivid and convincing ways. For example, videos can be made of historical figures talking directly to students, injecting vitality into an otherwise unappealing lecture, and realistic virtual teachers can be synthesized to make digital teaching more interactive and interesting. In finance, AIGC is helping cut costs and raise efficiency. On one hand, AIGC can automate the production of financial news and product-introduction videos, improving the efficiency of financial institutions’ content operations; on the other, it can be used to build virtual digital human customer service with both audio and visual channels, bringing greater warmth to financial services. In medicine, AIGC is empowering the whole diagnosis and treatment process. In assisted diagnosis, AIGC can improve the quality of medical images, enter electronic medical records, and so on, freeing up doctors’ attention and energy for core clinical work and thereby raising the capabilities of the medical profession. In rehabilitation therapy, AIGC can synthesize speech audio for people who have lost their voices, synthesize limb projections for people with disabilities, and synthesize non-threatening medical companionship for patients with mental illnesses, comforting patients in a humanized way, soothing their emotions, and accelerating their recovery.
In manufacturing, AIGC is raising industrial efficiency and value. First, incorporated into computer-aided design (CAD), it greatly shortens the engineering design cycle. By automating the repetitive, time-consuming, and low-level tasks in engineering design, AIGC can shrink design work that would otherwise take thousands of hours down to minutes. It also supports the generation of derived designs to inspire engineers and designers, and it supports introducing variations into designs for dynamic simulation. For example, for its BMW VISION NEXT 100 concept car, BMW used AIGC-assisted design to develop the car’s interior and its dynamic, functional exterior skin. Second, it accelerates the construction of digital twin systems: by quickly transforming digital geometry captured from physical environments into real-time parametric 3D modeling data, digital twins of real-world factories, industrial equipment, production lines, and so on can be created efficiently. Overall, AIGC is developing into a horizontal layer deeply integrated with many other industries, and its applications are accelerating their penetration into all aspects of the economy and society.

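The "derived design" idea in CAD can be sketched as a toy parameter search: the computer enumerates candidate designs against an engineering constraint and ranks the survivors, and the engineer screens the results. The beam model, units, and threshold below are invented purely for illustration, not any vendor's generative design tool.

```python
import itertools

# Toy sketch of generative ("derived") design: enumerate candidate
# rectangular beam cross-sections, keep those meeting a stiffness
# constraint, and rank survivors by material used. The beam model and
# constraint values are invented for illustration only.

def stiffness(width_mm, height_mm):
    """Proxy for bending stiffness: second moment of area I = w*h^3/12."""
    return width_mm * height_mm ** 3 / 12

def generate_designs(widths, heights, min_stiffness):
    candidates = itertools.product(widths, heights)
    feasible = [(w, h) for w, h in candidates if stiffness(w, h) >= min_stiffness]
    # Prefer designs that satisfy the constraint with the least material.
    return sorted(feasible, key=lambda wh: wh[0] * wh[1])

designs = generate_designs(
    widths=range(10, 60, 10),
    heights=range(10, 60, 10),
    min_stiffness=100_000,  # mm^4, arbitrary threshold
)
print(designs[:3])  # lightest feasible cross-sections, e.g. [(10, 50), (20, 40), (20, 50)]
```

Real generative design replaces this brute-force sweep with optimization and simulation over far richer geometry, which is why it can compress thousands of hours of design exploration into minutes, but the division of labor is the same: the machine proposes, the engineer disposes.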

IV. Problems facing AIGC’s development


Now that AI technology development has entered the fast lane, AIGC is playing important roles in all aspects of social production and life because of its rapid response capability, lively knowledge output, and abundant application scenarios. At the same time, however, AIGC’s key technologies, the core capabilities of enterprises, and relevant laws and regulations have not yet been perfected, and disputes around fairness, responsibility, and safety are proliferating, triggering a series of problems in urgent need of solution.


Key technologies are not yet fully mature, and difficulties remain in large-scale promotion and implementation. At present, AIGC technology is constantly being upgraded to further unlock content productivity, but key AI technologies still have limitations that impede the industry's development. First, AI algorithms have inherent flaws. AI algorithms have yet to overcome technical limitations in terms of transparency, robustness, bias, and discrimination, leading to numerous problems in algorithm application. Transparency: Due to the black-box operation mechanisms of algorithm models, their operation rules and causal logic are not obvious to developers. This characteristic makes the generation mechanisms of AI algorithms difficult for humans to understand and interpret, and once an algorithm makes an error, the lack of transparency will undoubtedly hinder external observers from correcting and removing errors. Robustness: Algorithm operation is prone to interference from data, models, training methods, and other factors, giving rise to non-robustness. For example, when the amount of training data is insufficient, an algorithm that performs well in testing on a specific dataset may be thrown off by slight perturbations from small amounts of random noise once it is put into application, causing the model to reach incorrect conclusions. When content is updated with online data, the algorithm's performance is likely to drift, which may lead to system failure. Bias and discrimination: Algorithms use data as raw material. If the data used initially contains biases, those biases may persist over time, invisibly affect the results of AI algorithm operation, and ultimately lead to bias or discrimination in the content generated by algorithms, triggering user disputes about the fairness of the algorithms. Second, AIGC content editing and creation technology is imperfect.
AI-enabled content editing and creation technology is still constrained by shortcomings, resulting in technical barriers to industry development. Text generation: Enterprises face bottlenecks in natural language understanding technology. Very often, templates are simplistically applied to generate mechanistic filler, resulting in similar and monotonous text structures. Moreover, it is difficult for text generation to truly produce emotional, anthropomorphic expression, falling short of users' expectations that text synthesis products be easy to read and of high quality. Speech synthesis: Problems such as a lack of fluency in speech expression and strongly mechanical-sounding voices are conspicuous. Emotional embedding in speech requires large volumes of data to support training, and the modeling requirements are very high, which increases the complexity of use and makes the corresponding costs difficult to control, restricting enterprises from unlocking the technology's value. Visual generation: Problems remain, such as less-than-ideal intelligent image processing results and insufficiently accurate real-time motion capture. In applications, because large vision models cannot complete multiple visual perception tasks simultaneously, the accuracy, fidelity, and realism of machine vision are imperfect, so manual labeling is still required at a later stage. Thus, the problems of high technical barriers and low production efficiency have not been well solved.
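The non-robustness described above, where a slight perturbation causes a deployed model to reach a wrong conclusion, can be illustrated with a toy example. This sketch is not from the white paper: the nearest-centroid classifier, the centroid positions, and the perturbation values are all hypothetical, chosen only to show how an input near a decision boundary changes class under a small amount of noise.

```python
import numpy as np

def classify(x, centroid_a, centroid_b):
    """Assign x to whichever class centroid is nearer (toy classifier)."""
    if np.linalg.norm(x - centroid_a) < np.linalg.norm(x - centroid_b):
        return "A"
    return "B"

# Two hypothetical class centroids.
centroid_a = np.array([0.0, 0.0])
centroid_b = np.array([1.0, 0.0])

x = np.array([0.49, 0.0])            # clean input, slightly nearer centroid A
x_noisy = x + np.array([0.02, 0.0])  # tiny perturbation crosses the boundary

print(classify(x, centroid_a, centroid_b))        # A
print(classify(x_noisy, centroid_a, centroid_b))  # B
```

The prediction flips even though the perturbation is far smaller than the distance between the classes, which is the kind of fragility the text attributes to under-trained or noise-sensitive algorithms.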


The core capabilities of enterprises are uneven, threatening the healthy and secure development of the web content ecosystem. With the open-sourcing and openness of digital technology, AIGC technology R&D barriers and production costs have been continuously lowered, flooding the market with platform enterprises of uneven quality, and the dearth of core competencies among these enterprises poses serious obstacles to the construction of a healthy network ecosystem. First, content moderation ability needs to be improved. In recent years, AIGC enterprises, as entities with primary responsibility for internet content governance, have implemented their responsibility by establishing content moderation mechanisms, and "machine moderation + human moderation" has become their basic moderation method. In terms of machine moderation, the moderation accuracy rate is affected by the type of moderation, the multiplicity of content violation variants, and the intensification of countermeasure efforts by illegal and gray-area industries, resulting in a high rate of false alarms and the need for overlaid manual moderation. As for human moderation, the use of human moderation outsourcing services has become mainstream in the market, but the performance of different human moderation teams varies with respect to personnel management, business process management, moderation capabilities, and so on, and the industry has not formed a unified standard. Overall, the lack of qualified moderators may lead to an outpouring of illegal and illicit content containing fake and undesirable information, seriously harming the industry and even the entire network ecosystem. Second, enterprises need to further build their technology management ability. Because AIGC technology is becoming more and more complex, and its application in enterprises is often highly dynamic, enterprises need corresponding technology management capabilities if they are to serve as technology designers and service providers.
However, enterprises are commercial in nature, and where resources are limited, this often means they will tend to satisfy their own interests first and invest insufficiently in technological security and institutional safeguards. In this respect, the gaps between enterprises are very obvious. Enterprises with "deep pockets" and long development histories are more likely to have better levels of technological protection and management, and vice versa. Many small enterprises entering the market will put AIGC into application before their technology management ability is up to standard, providing a hotbed for plagiarism, infringement, content faking, malicious marketing, and other illegal and gray-area industry chains. Third, enterprises have yet to perfect their risk governance capacity. The "Guiding Opinions on Strengthening the Comprehensive Governance of Network Information Service Algorithms" clearly proposes strengthening the requirement for enterprises to shoulder primary responsibility. Enterprises should build and perfect AI management capacity and effectively prevent all risks in the process of AI development. However, current AIGC technology is still in the early stages of development, and its risks are characterized by unknowns and complexity. Many enterprises have not yet perfected their risk prediction, prevention, and emergency response abilities, and the risk management concept has not been implemented in engineering and technology practices. This makes it likely that enterprises will miss opportunities to nip risks in the bud. Left in a passive position in the complex game of network security, an enterprise that suffers internal threats or external attacks can very easily trigger security risks to the network information content ecosystem.
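The "machine moderation + human moderation" arrangement described above can be sketched as a simple triage: machine scores that clear a high threshold are handled automatically, scores below a low threshold pass, and the uncertain middle band is routed to human reviewers — the "overlaid manual moderation" the text mentions. This is an illustrative sketch, not the paper's or any platform's actual pipeline; the `triage` function, thresholds, and sample items are all hypothetical.

```python
def triage(items, block_at=0.9, pass_below=0.3):
    """Route each (item_id, violation_score) pair to a moderation queue.

    Scores >= block_at are blocked automatically, scores < pass_below
    pass, and the uncertain middle band goes to human reviewers.
    """
    queues = {"blocked": [], "passed": [], "human_review": []}
    for item_id, score in items:
        if score >= block_at:
            queues["blocked"].append(item_id)
        elif score < pass_below:
            queues["passed"].append(item_id)
        else:
            queues["human_review"].append(item_id)  # overlaid manual moderation
    return queues

# Hypothetical machine-moderation scores for three posts.
sample = [("post-1", 0.95), ("post-2", 0.10), ("post-3", 0.55)]
print(triage(sample))
```

Lowering `pass_below` or raising `block_at` widens the human-review band, trading moderation cost against the false-alarm rate the text describes.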


The relevant rules and guidelines still need to be improved, and there is a mismatch problem between development and governance. AI industry rules and guidelines have been launched continually in recent years, and the governance system has begun to take shape, but as the progress of science and technology accelerates, institutional construction may not always keep up with it. This in turn gives rise to a mismatch between technological innovation on one hand and policy support and legal regulation on the other. First, policies to support the industry's development need to be implemented. China's 14th Five-Year Plan, issued in March 2021, put forward the policy of "forging new digital economy advantages" and emphasized the important value of AI and other emerging digital industries in improving national competitiveness. Guided by this planning, and faced with the rapid development of AIGC-related industries, especially digital cultural industries, the central government has issued a number of policies to promote the development of new digital cultural industries. The latest, the "Opinions on Promoting the Implementation of the National Cultural Digitization Strategy," issued in May 2022, calls for the study and formulation of industrial policies to support cultural digitization, emphasizing that localities should formulate specific implementation plans according to local conditions, and that relevant agencies should refine their policies and measures. In the future, the strength of support, promotion of implementation, and dynamic adjustment of policies by different localities and agencies will determine the degree of mutual construction between technology and society, which will play an important role in the development of AIGC technology in social contexts. Second, the copyrightability of AIGC has yet to be clarified.
Currently, China’s Copyright Law stipulates that the objects of copyright are “works.” On the legal text alone, China’s current intellectual property law system stipulates that a subject of law is a person who enjoys rights, has obligations, and bears responsibilities, so it is difficult for non-human-produced intelligentized content to obtain copyright protection through the logic of “work-creation-author.”7 This view was supported in a 2019 judgment by the Beijing Internet Court. However, in the 2020 case of Tencent v. WDZJ.com, the Shenzhen Nanshan District Court held that an article written by AI is a copyright-protected work if it meets the requirement of originality. Ambiguity in the legal concepts has produced conflicting judicial decisions, leading to the real-life dilemma of unclear copyright attribution for AIGC works. This problem may not only leave works created using AIGC technology unable to obtain copyright protection, keeping AI technology from realizing its creative value; given the massive copying behavior of AI, it may also dilute the originality of existing rights holders’ works, threatening the legitimate rights and interests of others. Third, the new technology makes supervision more difficult. In recent years, with the continuous maturation of AI technology, the content generated by computers after deep learning has become increasingly realistic, achieving the effect of “confusing fake and real.” At the same time, the threshold for applying the technology is falling: anyone can easily “swap faces,” “modify voices,” and even join an “internet troll army.” Because of the universal “seeing is believing” cognitive trait among the public, when the technology is misused, fake content is likely to reach users through the internet instantly and in highly credible ways, causing the public’s judgment to fail in the contest of ideas, given the difficulty of identifying trolls and false information.
This in turn involves another real problem. That is, due to the virtual identity cloak provided by the internet and the development of related technologies, the producers of fake content are decentralized, mobile, large in number, and hidden, which makes tracking them an ever more difficult and complex task. Coupled with the vagueness and lagging nature of rules and guidelines, defining the boundaries for borderline counterfeiting behaviors poses a real dilemma, and this undoubtedly creates serious obstacles to the regulation of content.



Cite This Page

"Artificial Intelligence-Generated Content (AIGC) White Paper [Excerpt] [人工智能生成内容(AIGC)白皮书]", CSIS Interpret: China, original work published in China Academy of Information and Communications Technology (CAICT) [中国信息通信研究院], September 1, 2022
