Unmasking Undisclosed AI Content in Scholarly Papers: A Growing Concern

Undisclosed AI-generated content in scholarly papers, including ChatGPT-assisted manuscripts, raises ethical concerns, challenges peer review, and threatens research integrity in scientific publishing.

In a development that underscores the evolving landscape of scientific publishing, undisclosed AI-generated content is causing a stir within the academic community. On August 9th, Physica Scripta, a reputable physics journal, published a paper that carried an unnoticed trace of artificial intelligence. The giveaway was spotted by Guillaume Cabanac, a computer scientist and scientific sleuth at the University of Toulouse in France, who noticed the phrase "Regenerate response" sitting on the manuscript's third page. That phrase, it turns out, is the label of a button in ChatGPT, the widely used AI chatbot that generates fluent text in response to user queries.

The authors, it emerged, had used ChatGPT while drafting the manuscript without telling the journal's editorial board. Kim Eggleton, head of peer review and research integrity at IOP Publishing, which publishes Physica Scripta, confirmed this with the authors themselves. Because they had not declared their use of the tool when submitting the work for peer review, the journal retracted the paper for violating its ethical policies. The corresponding author, Abdullahi Yusuf, who holds affiliations with Biruni University in Istanbul and the Lebanese American University in Beirut, did not respond to Nature's inquiries.

This incident is not an isolated case of a ChatGPT-assisted manuscript slipping into a peer-reviewed journal without the required declaration. Since April, Cabanac has flagged more than a dozen journal articles bearing unmistakable ChatGPT fingerprints, such as the phrases 'Regenerate response' or 'As an AI language model, I...'. He posts these findings on PubPeer, a platform where scientists discuss published research. Notably, many publishers, including Elsevier and Springer Nature, explicitly permit authors to use ChatGPT and similar large language model (LLM) tools when composing manuscripts, provided the use is openly acknowledged.

Cabanac's method, however, catches only the clumsiest cases, in which authors forget to excise the telltale phrases. The true extent of undisclosed ChatGPT involvement in peer-reviewed papers is likely far greater, since more careful authors edit out these conspicuous markers. ChatGPT's interface has also changed: after a recent update, the button once labelled 'Regenerate response' now reads simply 'Regenerate', so the older fingerprint will grow rarer over time.
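For illustration, the kind of screening Cabanac describes can be approximated with a simple text search. The sketch below is a hypothetical script, not Cabanac's actual tooling; it scans plain-text files for the marker phrases quoted in this article, and a real screening pipeline would need full-text access and a much larger, regularly updated phrase list.

```python
import sys
from pathlib import Path

# Telltale phrases quoted in this article. Illustrative only: real
# screening tools maintain broader, regularly updated marker lists.
MARKERS = [
    "regenerate response",
    "as an ai language model",
]

def scan(path: Path) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain a marker phrase."""
    hits = []
    for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        lowered = line.lower()
        if any(marker in lowered for marker in MARKERS):
            hits.append((n, line.strip()))
    return hits

if __name__ == "__main__":
    # Usage: python scan_markers.py paper1.txt paper2.txt ...
    for filename in sys.argv[1:]:
        for n, line in scan(Path(filename)):
            print(f"{filename}:{n}: {line}")
```

A plain substring match like this is deliberately crude: it flags obvious residue but says nothing about carefully edited AI text, which is exactly the limitation described above.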

Physica Scripta is not alone: Cabanac has also uncovered undisclosed ChatGPT use in papers published in Elsevier journals, widening the scope of the problem. In one recent case, he examined a paper published on August 3rd in Resources Policy that explored the impact of e-commerce on fossil-fuel efficiency in developing countries. Some of the equations struck him as nonsensical, but the clinching evidence sat above a table, where the text stated: 'Please note that as an AI language model, I am unable to generate specific tables or conduct tests...'. A spokesperson for Elsevier said the publisher was aware of the issue and was investigating. The paper's authors, affiliated with Liaoning University in Shenyang, China, and the Chinese Academy of International Trade and Economic Cooperation in Beijing, did not respond to Nature's requests for comment.

Papers written partly or wholly by software without disclosure are nothing new. They typically show subtle but detectable traces, such as peculiar language patterns or mistranslated phrases, that set them apart from human-written text. But Matt Hodgkinson, research integrity manager at the UK Research Integrity Office, argues that if authors conscientiously remove the boilerplate ChatGPT phrases, the chatbot's more sophisticated output becomes virtually indistinguishable from human writing. The result is an escalating contest between those trying to game the system and the gatekeepers working to preserve its integrity.

Beyond peer-reviewed journals, ChatGPT's undisclosed use has also surfaced in conference papers and preprints, manuscripts that bypass formal peer review. When these cases have been raised on PubPeer, some authors have admitted to using ChatGPT without declaring it.

Elisabeth Bik, a microbiologist and independent research-integrity consultant, foresees a troubling trend. The rapid rise of ChatGPT and similar generative AI tools, she argues, will embolden paper mills, the unscrupulous companies that produce and sell counterfeit manuscripts to researchers looking to pad their publication records. In her estimation, the problem could grow exponentially, making such papers ever harder to detect.

A deeper concern underlies all of this: overburdened peer reviewers, pressed for time, often lack the bandwidth to examine manuscripts thoroughly for red flags. David Bimler, a veteran researcher who unmasks fraudulent papers under the pseudonym Smut Clyde, laments the relentless "publish or perish" pressure within the scientific ecosystem. The sheer volume of content being produced, he contends, has outpaced the capacity of gatekeepers to scrutinize it adequately.

Notably, ChatGPT and other LLMs are prone to producing erroneous references, a tell that peer reviewers can use to spot their involvement in manuscripts. Fictitious references, Hodgkinson points out, are a glaring red flag. Retraction Watch, for instance, reported on a preprint about millipedes that had apparently been written with ChatGPT; it was flagged by a researcher cited in the work who noticed that the references were fake.

Rune Stensvold, a microbiologist at the State Serum Institute in Copenhagen, encountered a variant of this problem when a student asked him for a copy of a 2006 paper listing him as a co-author. No such paper existed. It emerged that the student had asked an AI chatbot to suggest papers on Blastocystis, a genus of intestinal parasite, and the chatbot had simply invented a reference and attributed it to Stensvold. The lesson, Stensvold reflected, is that a close look at a manuscript's references section can serve as a first filter against AI-generated content.
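That filter can be partly automated. The sketch below is an illustrative script, not a tool mentioned in the article: it checks whether each DOI in a reference list resolves against the public Crossref API. It catches only references carrying bogus DOIs; fabricated citations that omit a DOI, like the one attributed to Stensvold, still require manual checking.

```python
import requests

def doi_exists(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the DOI is registered with Crossref."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks polite clients to identify themselves;
        # the contact address here is a placeholder.
        headers={"User-Agent": "ref-checker/0.1 (mailto:editor@example.org)"},
        timeout=timeout,
    )
    return resp.status_code == 200

if __name__ == "__main__":
    # The second DOI is deliberately fake, for demonstration.
    for doi in ("10.1038/s41586-020-2649-2", "10.9999/invented.reference"):
        verdict = "registered" if doi_exists(doi) else "NOT FOUND, check manually"
        print(f"{doi}: {verdict}")
```

A missing DOI is evidence, not proof, of fabrication: registration lags, typos, and older literature all produce false alarms, so flagged entries should go back to a human reviewer.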

In conclusion, the emergence of AI-assisted content in scholarly publications, particularly when undisclosed, represents a significant ethical and integrity challenge for the scientific community. The situation not only places added pressure on already overworked peer reviewers but also raises concerns about the potential proliferation of fraudulent manuscripts. As the scientific publishing landscape continues to evolve, it becomes increasingly imperative to establish robust mechanisms for detecting and addressing AI-generated content while preserving the integrity of the research ecosystem.
