(The?) 3 kinds of papermills

Adam Day
Oct 31, 2022

TL;DR: Join me at ConTech Live to hear about a recent project with Open Credo to see if we could detect unusual co-authorships in a dataset created by Anna Abalkina. Sign up here!

Papermilling has a few definitions in circulation. Sometimes it’s “organised manipulation of the peer-review process”, or it might be “manufacturing of fraudulent research papers”, and so on.

I described the general roles involved in papermilling in a previous blog post.

But there’s actually a lot of nuance in how different papermills operate. Consider what happens when we have multiple agents, forgers and authors operating in a network.

I’m going to describe 3 quite different things which get called ‘papermills’. Perhaps there are more than 3 main classes of papermill? Add a comment if you can think of another one!

1. Fabrication from scratch

This covers any organisation that fabricates research papers from scratch and sells them. People have been fabricating research for as long as research has been a thing, but it’s the organised part that makes this new. It happens at a vastly greater scale than it used to, and the problem continues to grow.

There are definitely templates and re-used images and text in these papers, but I think ‘from scratch’ is a fair distinction between this and the other types.

All those fake western blots? That’s this. Those cancer genetics papers from obscure hospital research departments? That’s this too. Take a look at cases highlighted by Jennifer Byrne, Elisabeth Bik, Cheshire, Smut Clyde and others for examples.

My Papermill Alarm API, mentioned in previous blog posts, is mostly trained to highlight this kind of papermilling.

2. Fabrication by plagiarism

Fabrication from scratch requires skill. What if our forger doesn’t know how to write a fake paper from scratch? This is a sad forger:

I’ve made him look sad by rotating his mouth 180°. Image manipulation is not my forte.

So what can he do? How can he turn that frown upside-down?

The idea is simple. If he can’t write a fake paper, he can just copy a real one! Or he can get creative, copy bits of several papers and paste them into a new manuscript. If he’s worried about getting caught by anti-plagiarism software, he can always drop the text into a paraphrasing tool. This is where we get “Tortured Phrases” (see the great work of Guillaume Cabanac et al. on this subject).

The difference between organised plagiarism and papermilling-from-scratch is simply that, here, the forger starts with 1 or more existing papers and converts them into a ‘new’ one.
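Tortured phrases are one of the more mechanical fingerprints of this kind of papermilling: a paraphrasing tool swaps an established term for an odd synonym. Below is a minimal sketch of the simplest possible check, assuming we just match text against a small, hand-picked list of tortured phrases. The dictionary and the function name `find_tortured_phrases` are hypothetical and purely illustrative; they are not the method used by Cabanac et al. or by the Papermill Alarm, and a real screener would rely on a much larger curated list.

```python
import re

# Hypothetical, illustrative list: each tortured phrase maps to the
# established term it most likely replaced after automated paraphrasing.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "colossal information": "big data",
    "irregular woodland": "random forest",
    "bosom peril": "breast cancer",
}


def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected term) pairs in order of appearance."""
    lowered = text.lower()  # case-insensitive matching
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # \b word boundaries stop matches inside longer words
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), phrase, expected))
    hits.sort()  # order by position in the text
    return [(phrase, expected) for _, phrase, expected in hits]


if __name__ == "__main__":
    sample = "We apply profound learning and an irregular woodland to colossal information."
    print(find_tortured_phrases(sample))
```

A phrase list like this only catches substitutions somebody has already spotted, which is why it complements, rather than replaces, human sleuthing and broader classifiers.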

3. Authorship brokering

This is where authorship slots on papers, fabricated or not, are sold. This is actually just the ‘agent’ part of any papermilling operation, but sometimes we see it as a standalone service.

A famous case of a brokerage advertising authorships for sale was brought to light by Retraction Watch some years ago. Anna Abalkina, a researcher at Freie Universität Berlin, investigated this matter by tracking down the papers advertised on that site. It’s a very interesting case and well worth reading about.

It’s fortunate that the papers for sale on that site were advertised so openly. Consider now that anyone can sell authorship on any paper privately. How can we detect that? I’ve learned recently that it might be possible to spot these brokered authorships.

Join me at ConTech Live!

Earlier this year, SAGE Publishing worked with Open Credo, a software development consultancy here in London, on a project to see if we could detect unusual co-authorships using the dataset created by Anna Abalkina in her work. If you’d like to know more, I suggest signing up for our forthcoming ConTech presentation!


Adam Day

Creator of Clear Skies, the Papermill Alarm and other tools clear-skies.co.uk #python #machinelearning #ai #researchintegrity