ChatGPT accused of scraping news articles to enrich its database

Do AI models infringe on protected content during training? In the United States, the matter may well be settled in court. Several US media outlets allege that OpenAI, the company behind ChatGPT, used their news articles during the model's training phase.

To produce capable AI models, algorithms must train on astronomical amounts of information. This step, called machine (or deep) learning, is vital to creating a viable and competent artificial intelligence. But can an AI absorb all content on the web, including material protected by law? During its training, ChatGPT processed a significant share of the web pages on the Internet, officially up to 2021.

Tech news journalist Francesco Marconi asked ChatGPT about its training sources and whether it had used any articles produced by the media. Somewhat surprisingly, the chatbot then produced a list of 20 news sites it said it had learned from. Reuters, the New York Times, The Guardian, BBC News, CNN and many other online newspapers were reportedly among them. However, as Francesco Marconi points out, extracting data without the publishers' authorization could constitute a violation of their terms of use.

CNN and the Wall Street Journal might react

The financial daily the Wall Street Journal, which appears on that list, has not concluded any agreement with OpenAI, according to Jason Conti, general counsel of the Dow Jones unit of News Corporation, the paper's parent company, as reported by Bloomberg. “Anyone who wants to use the work of journalists from the Wall Street Journal to train artificial intelligence should obtain the necessary rights from Dow Jones. We take the misuse of the work of our journalists seriously, and we are looking into this situation,” he says.

For its part, CNN plans to contact OpenAI to request payment for a license. As things stand, ChatGPT's use of the site's articles violates its terms of service, according to a source familiar with the matter.

In France, when asked about its sources, the AI says it has been trained on “a wide variety of information sources, including news articles, books, websites, forums, scientific publications, databases and many other types of text”. As in the United States, the chatbot draws up a fairly long list of French-language media: Le Monde, Le Figaro, Libération, Les Echos, L’Express, Le Parisien, France 24, BFM TV, RFI, Europe 1, The Huffington Post, Mediapart and Agence France-Presse (AFP).
