by Paul Barker

Platform lets creators monetize their content for use in LLM training

news

Jul 17, 2024 · 5 mins

Artificial Intelligence

Avail’s Corpus tool ‘flies in the face’ of comments made by head of Microsoft AI, says analyst.

Avail, an AI research firm that focuses on the media industry, today launched Corpus, a platform it said enables creators and media rights holders to license their work to AI model developers.

Corpus, the Brooklyn, New York-based firm said in a release, enables “rights holders to seek compensation for both catalog content and real-time answers derived from their work.”

A company FAQ describes it as a “monetization platform for creators, media companies and rights holders of all kinds. We connect content owners with AI companies interested in licencing their work for training purposes or real-time chatbot answer retrieval.” The Corpus homepage contains a valuation calculator that provides creators an estimate of their catalog’s worth based on recent benchmarks, Avail said.

Avail states on its site that it has partnered with OpenAI, Anthropic, film production and distribution company 30West, AI-based wealth management firm Range, and venture capitalists General Catalyst and Seven Seven Six.

Bill Wong, AI research fellow at Info-Tech Research Group, viewed the launch of Corpus as a positive move for creators, and necessary in order to reset “expectations that Big Tech vendors have regarding their use of copyrighted data.”

An initiative such as this, he said, has the potential to benefit not only content creators but also the firms that train AI models. However, “there will be challenges in resetting expectations and making this work in an efficient manner. The advantage of accessing curated data is that it provides a higher quality of data to train the model. However, the administration of this may be a challenge, such as calculating the right costs, perhaps implementing new types of watermarks, etc.”

Wong added that Avail’s Corpus tool “flies in the face” of comments made by Mustafa Suleyman, the CEO of Microsoft AI, in an interview at the recent Aspen Ideas Festival. “While attempting to define what kind of content is protected by publishers, he proceeded to say: ‘With respect to content already on the open web, the social contract of that content since the 1990s has been that it is fair use. Anyone can copy it, recreate it, or reproduce it. That has been freeware, if you like; that’s been the understanding.’”

Had a tool like Corpus been available in the 1990s, said Wong, “I am sure content creators would have been properly acknowledged and compensated for their content. Today, the jury is still assessing whether copyright data for LLM training should fall under ‘fair use,’ but accessing data in real-time should be recognized as of value to both users and vendors, and this content should not be considered freeware.”

Today, he said, the US Copyright Office has not prevented “LLM vendors from using copyrighted data to train their models. The vendors typically state that the use of the copyrighted data falls under the legal concept of ‘fair use,’ which allows people/companies to use limited portions of the work for non-commercial, educational, or transformative uses.”

According to Wong, “It is the ‘transformative’ use the vendors argue that is how the LLMs are using the data. Ingested data is not simply reproduced by the LLM; the content is transformed and used to generate new content for new uses. However, I don’t believe that when the ‘fair use’ doctrine was first defined, they considered a program that would ingest all the data, be used for commercial purposes, and disrupt the industry of the creators.”

The launch of Corpus follows an announcement late last month that seven companies that license music, images, videos, and other data used for training AI systems have formed a trade association to promote responsible and ethical licensing of intellectual property. The primary goals of the group, to be known as the Dataset Providers Alliance (DPA), are to standardize the licensing of intellectual property for AI and ML datasets, facilitate industry collaboration, advocate for content creators’ rights, and protect intellectual property.

What can happen if an organization is caught violating copyright? Consider: in March, France’s competition authority fined Google, its parent company Alphabet, and two subsidiaries a total of €250 million ($271 million) for breaching a previous agreement on using copyrighted content for training its Bard AI service, now known as Gemini.

The Autorité de la concurrence said that the search giant failed to comply with a June 2022 settlement over the use of news stories in its search results, News and Discover pages. Google avoided a fine at that point by pledging to enter into good-faith negotiations with news providers over compensation for their content, among other actions.

Paul Barker is a freelance journalist whose work has appeared in a number of technology magazines and online, with subject matter ranging from cybersecurity issues and the evolving world of edge computing to information management and artificial intelligence advances.
