NVIDIA, the trillion-dollar tech giant, has reportedly been accused in a class-action lawsuit of directly contacting Anna's Archive in an attempt to obtain up to 500TB of pirated e-book data for training its large language models. The move has drawn strong opposition from book authors, who argue that it not only infringes their copyrights but also shows how far the company was willing to go under competitive pressure.
Anna's Archive is a well-known repository of pirated e-books. According to the lawsuit, NVIDIA approached it to speed up model training despite repeated warnings that its data was illegally obtained. The authors cite internal NVIDIA communications indicating that the company tried to work with Anna's Archive to include these pirated books in the pre-training data for its large language models.
Over the past few years, NVIDIA has not only been a major player in the graphics card market but has also trained its own AI models, such as NeMo and Retro-48B. To catch up with competitors such as OpenAI, maker of ChatGPT, NVIDIA rushed to showcase its latest large model at a developer day in the fall of 2023, and, according to the complaint, turned to pirated material as a shortcut to get there.
NVIDIA initially denied the infringement allegations, arguing that its use of the data constitutes fair use, but the situation has grown more complicated as the lawsuit has progressed. The authors contend that competitive pressure pushed NVIDIA toward piracy. They also allege that NVIDIA not only contacted Anna's Archive but also downloaded books from other pirate sites, including LibGen, Sci-Hub, and Z-Library.
Anna's Archive, meanwhile, faces mounting legal trouble, and its future is uncertain. NVIDIA's standing, by contrast, does not appear to have suffered much despite the scrutiny it faces in the lawsuit. The tech community will keep watching the case to see how this clash between AI and copyright unfolds.
