Keystone: detecting bad actors with network analysis

Adam Day
3 min read · Apr 3, 2024

TL;DR: We can detect individuals with a high probability of being involved in milling papers. The question is: how should we respond?

Keystone connects any paper to the network of milled papers and calculates that article’s level of connection to papermilling. This image shows the mathematical ‘centre’ of the coauthorship part of the network. I like to imagine there’s some sort of papermill-godfather figure in there somewhere, or at least a Del Boy.
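Keystone’s internals aren’t public, so here is only a minimal sketch of the general idea, not Clear Skies’ actual method: score each author by proximity to verified papermill output in the coauthorship graph, and find the ‘centre’ with a standard centrality measure. The toy graph, the labels and the distance-based score are all assumptions for illustration, using networkx.

```python
# Minimal sketch (not Keystone itself): proximity to known papermill
# authors in a coauthorship network, plus a simple 'centre' calculation.
import networkx as nx

# Toy coauthorship graph: nodes are authors, edges mean "coauthored a paper".
G = nx.Graph()
G.add_edges_from([
    ("author_a", "author_b"),
    ("author_b", "author_c"),
    ("author_c", "author_d"),
    ("author_d", "author_e"),
])

# Hypothetical label set: authors with at least one verified milled paper.
known_mill_authors = {"author_a"}

def mill_distance(graph, author, bad_actors):
    """Shortest-path distance to the nearest known bad actor.

    Smaller distance = stronger connection to papermilling; None = no path.
    """
    lengths = nx.single_source_shortest_path_length(graph, author)
    return min((d for node, d in lengths.items() if node in bad_actors),
               default=None)

for author in G.nodes:
    print(author, mill_distance(G, author, known_mill_authors))

# One way to read the image's 'mathematical centre': the highest-centrality
# node in the milled-paper coauthorship component.
centrality = nx.betweenness_centrality(G)
print("centre:", max(centrality, key=centrality.get))
```

A production version would presumably weight edges (e.g. by number of shared papers) and mix in the other relationship types described below, but the shape of the computation is the same.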

A few years ago, my bike was stolen. It was an organised job. The thieves arrived after dark, cut a hefty lock clean off, and my bike disappeared silently forever. I imagined it dismantled and sold off into some vast black-market network. Poor bike. The police admitted that they weren’t going to do anything. I guess that, even if they catch someone stealing a bike, there isn’t a lot they can do anyway. It isn’t a serious enough crime to lock someone up.

If a crime is easy to commit, pays well, and carries no punishment, then that crime is endemic. It isn’t going away, because the people who do it have incentives and face no disincentives.

Organised research fraud fits a similar niche.

It’s encouraging to see so much movement in recent months to change how we respond to research misconduct, with initiatives like United2Act and working groups organised by the STM Association.

It’s clear that historic practices for responding to misconduct are not the right ones for today. (If they were, there wouldn’t be so much of it.) So discussion about the way forward is very welcome.

There’s a problem with retractions. It’s not that we shouldn’t do them; they help to clean up the literature. But if you mill fake papers for other people under a pseudonym and those fake papers get retracted, the retractions don’t necessarily affect you. So, much of the time, retraction isn’t a disincentive.

I spoke last year about the importance of KYA — Know Your Author. If we know who our authors are, we can build relationships and trust. That gives us confidence in dealing with people we know and tells us when we need to get to know people better.

Because it’s all about relationships.

“Keystone” is a network analysis built on OpenAlex that I’ve been running in one form or another since shortly after the Papermill Alarm was released in 2022. We analyse the relationships between papers, individuals, institutions and other signals to learn about the papermill problem. Last year, Keystone went through some substantial development. It is now integrated into the Papermill Alarm and is already in use by a number of publishers.
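OpenAlex exposes the raw material for this kind of network through its public API. As a hypothetical example of the harvesting step, this pulls one page of works and turns each paper’s author list into coauthorship edges; the filter and everything downstream are my assumptions, not Keystone’s pipeline.

```python
# Sketch: collecting coauthorship edges from the public OpenAlex API.
import itertools
import requests

resp = requests.get(
    "https://api.openalex.org/works",
    params={"filter": "publication_year:2023", "per-page": 25},
    timeout=30,
)
resp.raise_for_status()

edges = set()
for work in resp.json()["results"]:
    # Each authorship carries an author record with a stable OpenAlex ID.
    author_ids = sorted(
        a["author"]["id"]
        for a in work.get("authorships", [])
        if a.get("author") and a["author"].get("id")
    )
    # Every pair of coauthors on a paper becomes one edge in the network.
    edges.update(itertools.combinations(author_ids, 2))

print(f"{len(edges)} coauthorship edges from one page of results")
```

Repeat over the full corpus (OpenAlex supports cursor paging) and you have the coauthorship layer; citation and institutional links build the other layers the same way.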

I’ve spent a lot of time with Keystone and I’m seeing some interesting things.

  • We can tell which researchers are most likely to be running papermill operations.
  • Some Keystone authors’ papers are real. This puts an interesting spin on investigations. That ‘KYA’ principle applies: an author only needs one verified fake in their history for me to be concerned about all of their output (sketched in code after this list).
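That second point boils down to a simple triage rule, worth spelling out. A hypothetical sketch:

```python
# Sketch of the KYA rule above: one verified fake in an author's history
# flags *all* of that author's papers for investigation.
# Paper IDs and the data layout are hypothetical.
verified_fakes = {"W001"}  # paper IDs confirmed as milled

papers_by_author = {
    "author_a": {"W001", "W002", "W003"},  # W002, W003 may be real work
    "author_b": {"W004"},
}

flagged = {
    paper
    for papers in papers_by_author.values()
    if papers & verified_fakes   # author has at least one verified fake...
    for paper in papers          # ...so everything they wrote gets flagged
}
print(flagged)  # {'W001', 'W002', 'W003'} (set order may vary)
```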

That raises the question: if papermill detection is ‘done’ to the degree that we can identify the current generation of bad actors, how should we respond so that there isn’t another generation like this one?

I’ve said before that we don’t prescribe how to act on alerts from Clear Skies’ services. But if it were my journal, and I had cases to investigate, I would start with Keystone. Doing so targets the source of the problem and disincentivises practising researchers from getting involved in milling. It’s bang for buck.

Would you buy an authorship slot from someone with a dozen retractions on their record?

--

Adam Day

Creator of Clear Skies, the Papermill Alarm and other tools. clear-skies.co.uk #python #machinelearning #ai #researchintegrity