Technical News

Adobe targeted by proposed class action, accused of abusing authors’ work in AI training

Like almost every other tech company, Adobe has leaned heavily on AI over the past few years. The software company has launched a number of AI services since 2023, including Firefly, its AI-based media generation suite. Now, however, that embrace of the technology may have landed it in legal trouble: a new lawsuit claims Adobe used pirated books to train one of its AI models.

A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an Oregon author, claims that Adobe used pirated copies of numerous books, including her own, to train the company's SlimLM models.

Adobe describes SlimLM as a series of small language models "optimized for document assistance tasks on mobile devices." It says SlimLM was pre-trained on SlimPajama-627B, an "open-source, multi-corpora, deduplicated dataset" released by Cerebras in June 2023. Lyon, who has written a number of guides on non-fiction writing, says some of her works were included in a pre-training dataset Adobe used.

Lyon’s lawsuit, first reported by Reuters, says her writings were included in a subset of a derived dataset that formed the basis of Adobe’s models: “The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including the Books3 copy),” the lawsuit says. “Thus, because it is a derived copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and Class Members.”

“Books3” – a collection of roughly 191,000 books used to train generative AI systems – has been a recurring source of legal trouble for the tech industry, and RedPajama has been named in a number of disputes as well. In September, a lawsuit against Apple claimed the company used copyrighted material to train its Apple Intelligence models; that complaint cited RedPajama and accused the company of copying copyrighted works “without consent and without credit or compensation.” In October, a similar lawsuit claimed Salesforce also used RedPajama for training.

Unfortunately for the tech industry, such lawsuits have become commonplace. AI models are trained on massive datasets, and in some cases those datasets have reportedly included pirated material. In September, Anthropic agreed to pay $1.5 billion to settle a class action brought by authors who accused it of using pirated copies of their works to train its chatbot, Claude. The case was seen as a potential turning point in the many ongoing legal battles over copyrighted material in AI training data.
