Why ChatGPT Threatens Journalists' and Publishers' Business Models
ChatGPT just got fired. The artificial brain run by OpenAI was briefly hired as a reporter last week by a news agency. Its first assignment was this: "Create news story using this source material, add names, place, when it happened."

At first sight ChatGPT's write-up of the brutal mugging of an elderly woman in Buenos Aires, Argentina, looked passable, certainly impressive for a computer that worked out for itself how to structure a news report with a punchy intro, short sentences and a coherent structure conforming to the inverted pyramid taught in most journalism schools.
But the detailed feedback from the news editor raised enough alarm bells for the agency concerned (which did not want to be named in this piece) to scrap its experiment in artificial intelligence:
- You got the name of the victim wrong
- The age of the victim wrong
- The location of the crime wrong
- The description of the video didn't match the footage
- Said the perpetrator was unidentified when we have a name and an age
- Said the perpetrator was at large when he was arrested
- Appear to have fabricated a quotation by the mayor of Buenos Aires
- Got the name of the mayor of Buenos Aires wrong
- Appear to have fabricated a quotation by the victim's children
Although numerous publishers are looking at using ChatGPT technology for limited content production purposes using strictly controlled source material, the AI bot is not coming for journalists' jobs yet – at least not directly. As the above road test illustrates, it would be foolhardy in the extreme to let ChatGPT loose in the wild as a reporter.
But ChatGPT does present a huge threat to publisher business models as a new form of search that answers users' questions without citing sources or providing links back.
The first great online unbundling dealt a blow to newspapers as readers could find specific information, from cinema times to job advertisements, without needing the whole print package. This second phase could see key information unbundled from journalism completely, with the AI bot reading the news for you and then sharing key information minus the need to look at adverts or pay a subscription.
Bing has already incorporated ChatGPT into its service (and has a millions-strong waiting list for those wanting to trial it). If Google follows suit, an army of lawyers can be expected to beat a path to the door of the search engines.
The lack of transparency from ChatGPT about how it finds answers does not just threaten traffic for publishers (by removing the need to click on links in order to answer search questions), it also raises huge questions about how misinformation and deep fakes can be spotted in future.
News Corp chief executive Robert Thomson revealed this week that his company has already entered discussions with an AI company about receiving compensation for use of its copyright content.
And lawyers are already on the case of ChatGPT at the company's UK subsidiary News UK, where chief lawyer Angus McBride wrote in The Times this week urging new regulator the Digital Markets Unit to urgently grapple with the issue of AI-written news.
He said: "A future of algorithmically-generated news – in particular, opinion scraped from uncited sources – as the predominant method of delivery is dystopian. The impact will result in enormous damage to the essential business of journalism itself."
UK publishers are hopeful that the Digital Markets Unit will finally get up and running this year and lead to Australia-style regulation of the media to support news producers at the expense of Google and Facebook (who together take more than £10bn a year out of the UK ad market).
Despite revenue so far in the tens of millions, the company behind ChatGPT is already worth around $29bn according to the Wall Street Journal.
The unsourced answers it provides to user questions are already based on reading and exploiting publishers' copyrighted news content. OpenAI has asked no publishers for permission and provided no payments.
When asked where it gets its news information from and what publishers it reads, ChatGPT told Press Gazette: "As an AI language model, I do not have a preferred source of news or publishers that I read. Instead, my training data comes from a wide range of sources, including books, articles, websites and other text-based materials that are publicly available on the internet.

"That being said, I do not actively seek out or monitor current news events on my own. Instead, I rely on users like you to ask me questions about current events or to provide me with specific information to respond to. I can then use my understanding of natural language processing and my vast database of information to provide you with the most accurate and relevant response possible."

After varying the same question in a few different ways, the chatbot revealed: "Some of the news publishers and websites that contribute to my training data include CNN, BBC, Reuters, The New York Times, The Washington Post, The Guardian, Al Jazeera and many others."
When asked directly it said that it also analyses text from The Times, The Sun and even Press Gazette. So it seems fair to assume that most reputable news publishers are contributing to the success of ChatGPT without recompense.
ChatGPT and news: What the lawyers say
Whether publishers have a right in law to charge ChatGPT for analysing and exploiting their content is a moot point. News Corp certainly thinks they do but legal opinion is far from set on this. Unless new legislation specifically addresses the issue, it may take a test case to decide.
Lindsay Gledhill, head of IP at Harper James, said: "Intellectual property law always lags behind technological advance.

"Our copyright laws were drafted to deal with the invention of the printing press. We updated them to deal with computers, and then again to handle the internet – though some say copyright hasn't caught up with that yet.

"But copyright judges, who are supposed to be able to say whether a 'substantial part' of a source has been copied, are going to be physically unable to do their jobs where ChatGPT has scanned thousands of source materials – we're going to need another big update on the law to handle this."
Alan Harper, a partner in the intellectual property team at Walker Morris, said: "It will only be a matter of time until someone brings a test case, as Getty has done [with AI-generated images], which will test established principles against new technology."

Harry Jewson, an associate at Burges Salmon, agreed: "Litigation is expected on many fronts across multiple jurisdictions. Regulators are already actively considering their response.

"Publishers will need to be aware of the differences between copyright laws in key territories, particularly around exceptions: the UK, for example, currently permits text and data analysis for non-commercial purposes and has proposed a wide-ranging exception allowing text and data mining even for commercial purposes."

Andrew Wilson-Bushell from Simkins said OpenAI's decision to train ChatGPT on pre-2021 data and not connect it to the internet may mitigate the legal risk. And he noted that publishers will be closely watching ChatGPT's integration with Bing.

"Once ChatGPT is commercially integrated into Bing, rights holders may be rightly concerned that this could result in a reduction in click-through rate. This will depend on how the technology is implemented and is a recurring trend in the industry as search engines seek to keep users on-page (such as via Google's 'Featured Snippets')."
As has been the case with Google, the future of publishersâ relationship with ChatGPT may end with negotiation rather than litigation.
SEO expert Luke Budka at PR agency Definition said that if ChatGPT turns out to be the "traffic killer" for publishers which it seems likely to be, they could simply block it from crawling their work.

"If it is, we'd anticipate publishers blocking Bing and Google from harvesting their content because it needs to be a two-way street. The most obvious way of doing that would be to prevent content from being included in the five datasets that GPT-3.5 [the current iteration of ChatGPT] is trained on.

"The two datasets that crawl the internet are Common Crawl and WebText2. There's uncertainty about how to prevent inclusion in WebText2 because it's based on Reddit upvotes, but the Common Crawl bot is called CCBot.

"Given the ongoing global legal actions involving Google and publishers regarding Google 'stealing' their content, you'd have thought blocking CCBot will quickly rise up the agenda for publishers if they start to see an erosion of organic traffic.

"It's a slippery slope for Bing and Google though. They need great content for their indices – they, therefore, need to encourage and reward content creators. Otherwise, the snake starts eating its tail as the internet becomes full of AI-generated secondhand content."
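For publishers weighing Budka's suggestion, blocking Common Crawl is typically done through a site's robots.txt file. A minimal sketch, assuming a site wants to exclude CCBot (the user-agent Common Crawl's crawler publicly identifies itself with) while still allowing conventional search engines:

```text
# robots.txt – placed at the root of the publisher's domain
# Block Common Crawl's bot (its data feeds AI training sets)
User-agent: CCBot
Disallow: /

# Leave ordinary search crawlers unaffected
User-agent: *
Allow: /
```

Note that robots.txt is a voluntary convention: it only deters crawlers that choose to honour it, and it cannot retroactively remove content already captured in past Common Crawl snapshots.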