Why ChatGPT Threatens Journalists' and Publishers' Business Models
ChatGPT just got fired. The artificial brain run by OpenAI was briefly hired as a reporter last week by a news agency. Its first assignment was this: "Create news story using this source material, add names, place, when it happened."

At first sight ChatGPT's write-up of the brutal mugging of an elderly woman in Buenos Aires, Argentina, looked passable, certainly impressive for a computer that worked out for itself how to structure a news report with a punchy intro, short sentences and a coherent structure conforming to the inverted pyramid taught in most journalism schools.
But the detailed feedback from the news editor raised enough alarm bells for the agency concerned (which did not want to be named in this piece) to scrap its experiment in artificial intelligence:
- You got the name of the victim wrong
- The age of the victim wrong
- The location of the crime wrong
- The description of the video didn't match the footage
- Said the perpetrator was unidentified when we have a name and an age
- Said the perpetrator was at large when he was arrested
- Appear to have fabricated a quotation by the mayor of Buenos Aires
- Got the name of the mayor of Buenos Aires wrong
- Appear to have fabricated a quotation by the victim's children
Although numerous publishers are looking at using ChatGPT technology for limited content production purposes using strictly controlled source material, the AI bot is not coming for journalists' jobs yet – at least not directly. As the above road test illustrates, it would be foolhardy in the extreme to let ChatGPT loose in the wild as a reporter.
But ChatGPT does present a huge threat to publisher business models as a new form of search that answers users' questions without citing sources or providing links back.
The first great online unbundling dealt a blow to newspapers as readers could find specific information, from cinema times to job advertisements, without needing the whole print package. This second phase could see key information unbundled from journalism completely, with the AI bot reading the news for you and then sharing key information minus the need to look at adverts or pay a subscription.
Bing has already incorporated ChatGPT into its service (and has a millions-strong waiting list for those wanting to trial it). If Google follows suit, an army of lawyers can be expected to beat a path to the door of the search engines.
The lack of transparency from ChatGPT about how it finds answers does not just threaten traffic for publishers (by removing the need to click on links in order to answer search questions), it also raises huge questions about how misinformation and deep fakes can be spotted in future.
News Corp chief executive Robert Thomson revealed this week that his company has already entered discussions with an AI company about receiving compensation for use of its copyright content.
And lawyers are already on the case of ChatGPT at the company's UK subsidiary News UK, where chief lawyer Angus McBride wrote in The Times this week urging new regulator the Digital Markets Unit to urgently grapple with the issue of AI-written news.
He said: "A future of algorithmically-generated news – in particular, opinion scraped from uncited sources – as the predominant method of delivery is dystopian. The impact will result in enormous damage to the essential business of journalism itself."
UK publishers are hopeful that the Digital Markets Unit will finally get up and running this year and lead to Australia-style regulation of the media to support news producers at the expense of Google and Facebook (who together take more than £10bn a year out of the UK ad market).
Despite revenue so far in the tens of millions, the company behind ChatGPT is already worth around $29bn according to the Wall Street Journal.
The unsourced answers it provides to user questions are already based on reading and exploiting publishers' copyrighted news content. OpenAI has asked no publishers for permission and provided no payments.
When asked where it gets its news information from and what publishers it reads, ChatGPT told Press Gazette: "As an AI language model, I do not have a preferred source of news or publishers that I read. Instead, my training data comes from a wide range of sources, including books, articles, websites and other text-based materials that are publicly available on the internet.

"That being said, I do not actively seek out or monitor current news events on my own. Instead, I rely on users like you to ask me questions about current events or to provide me with specific information to respond to. I can then use my understanding of natural language processing and my vast database of information to provide you with the most accurate and relevant response possible."

After varying the same question in a few different ways, the chatbot revealed: "Some of the news publishers and websites that contribute to my training data include CNN, BBC, Reuters, The New York Times, The Washington Post, The Guardian, Al Jazeera and many others."
When asked directly it said that it also analyses text from The Times, The Sun and even Press Gazette. So it seems fair to assume that most reputable news publishers are contributing to the success of ChatGPT without recompense.
ChatGPT and news: What the lawyers say
Whether publishers have a right in law to charge ChatGPT for analysing and exploiting their content is a moot point. News Corp certainly thinks they do but legal opinion is far from set on this. Unless new legislation specifically addresses the issue, it may take a test case to decide.
Lindsay Gledhill, head of IP at Harper James, said: "Intellectual property law always lags behind technological advance.

"Our copyright laws were drafted to deal with the invention of the printing press. We updated them to deal with computers, and then again to handle the internet – though some say copyright hasn't caught up with that yet.

"But copyright judges, who are supposed to be able to say whether a 'substantial part' of a source has been copied, are going to be physically unable to do their jobs where ChatGPT has scanned thousands of source materials – we're going to need another big update on the law to handle this."
Alan Harper, a partner in the intellectual property team at Walker Morris, said: "It will only be a matter of time until someone brings a test case, as Getty has done [with AI-generated images], which will test established principles against new technology."

Harry Jewson, an associate at Burges Salmon, agreed: "Litigation is expected on many fronts across multiple jurisdictions. Regulators are already actively considering their response.

"Publishers will need to be aware of the differences between copyright laws in key territories, particularly around exceptions: the UK, for example, currently permits text and data analysis for non-commercial purposes and has proposed a wide-ranging exception allowing text and data mining even for commercial purposes."

Andrew Wilson-Bushell from Simkins said OpenAI's decision to train ChatGPT on pre-2021 data and not connect it to the internet may mitigate the legal risk. And he noted that publishers will be closely watching ChatGPT's integration with Bing.

"Once ChatGPT is commercially integrated into Bing, rights holders may be rightly concerned that this could result in a reduction in click-through rate. This will depend on how the technology is implemented and is a recurring trend in the industry as search engines seek to keep users on-page (such as via Google's 'Featured Snippets')."
As has been the case with Google, the future of publishersâ relationship with ChatGPT may end with negotiation rather than litigation.
SEO expert Luke Budka at PR agency Definition said that if ChatGPT turns out to be the "traffic killer" for publishers which it seems likely to be, they could simply block it from crawling their work.

"If it is, we'd anticipate publishers blocking Bing and Google from harvesting their content because it needs to be a two-way street. The most obvious way of doing that would be to prevent content from being included in the five datasets that GPT-3.5 [the current iteration of ChatGPT] is trained on.

"The two datasets that crawl the internet are Common Crawl and WebText2. There's uncertainty about how to prevent inclusion in WebText2 because it's based on Reddit upvotes, but the Common Crawl bot is called CCBot.

"Given the ongoing global legal actions involving Google and publishers regarding Google 'stealing' their content, you'd have thought blocking CCBot will quickly rise up the agenda for publishers if they start to see an erosion of organic traffic.

"It's a slippery slope for Bing and Google though. They need great content for their indices – they, therefore, need to encourage and reward content creators. Otherwise, the snake starts eating its tail as the internet becomes full of AI-generated secondhand content."
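For publishers weighing Budka's suggestion, blocking Common Crawl is typically done through a site's robots.txt file. A minimal sketch, assuming a site wants to exclude CCBot (the user-agent Common Crawl's crawler publicly identifies itself with) while still allowing conventional search engines:

```text
# robots.txt – placed at the root of the publisher's domain
# Block Common Crawl's bot (its data feeds AI training sets)
User-agent: CCBot
Disallow: /

# Leave ordinary search crawlers unaffected
User-agent: *
Allow: /
```

Note that robots.txt is a voluntary convention: it only deters crawlers that choose to honour it, and it cannot retroactively remove content already captured in past Common Crawl snapshots.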