The COVID-surge in research papers: explaining the gender disparity
Edit: since writing this post, I have been able to confirm that rejected article tracking data shows a surge in the publication of rejected articles in journals which coincides with the timing of the pandemic. My best guess is that, in lockdown, people simply had more time to submit their old rejected manuscripts.
It has been reported in numerous places that the proportion of journal submissions with female authors has fallen since the start of COVID restrictions in March (1,2,3,4).
A common explanation proposed for this is that women have taken on greater domestic responsibilities during these times and this has prevented them from doing research. I have no doubt that this is true. But there’s more to the story — there’s something else going on here which I think is interesting.
It is well known among academic publishers that, as soon as COVID lockdowns started across the globe in early March, the number of research papers being submitted for peer review surged. A recent preprint reported that Elsevier saw increases as high as 58% over the previous year. This agrees with the surge in submissions that we saw at SAGE (my employer).
At the time, this seemed like good news. While many businesses were suffering as a result of the virus, ours was actually seeing an increase in activity. But where were these papers coming from? Here are some obvious possible explanations:
- The new papers were about COVID-19 and so the virus caused some acceleration in scientific output.
- Lockdown gave academics more time to write papers resulting in an increase in scientific production.
- The papers were already written and just waiting to be submitted somewhere. In fact, they were predominantly rejected articles, i.e. articles which had been written some time ago, submitted to journals, rejected, and were then awaiting submission to some other venue.
While the first two of these explanations are no doubt true to an extent, I believe that the third explanation accounts for most of the surge.
My $0.02
The best explanation I can see here is that lockdown didn’t significantly increase the rate of scientific production, but it did give researchers an opportunity to submit a backlog of rejected articles.
- There wasn’t really time to start writing new papers between the start of March and the start of the COVID-surge. The effect seems too quick to be explained by new research.
- This explanation accounts for the rise in submissions and something else: a rise in rejection rate. It appears that the COVID-surge included a lot of lower-quality papers.
Furthermore, if we look at submissions to ArXiv, there is no COVID-surge: nothing close to the ~50% rise in submissions seen at Elsevier and SAGE. ArXiv, as a preprint server, represents the rate of scientific production more closely than journal submissions do.
(There may be some growth in ArXiv submissions after June 2020, but bearing in mind that a percentage of ArXiv is post-print, this could be caused by COVID-surge papers being archived after peer review.)
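To make the comparison concrete, here is a minimal Python sketch of a year-over-year percentage change calculation. The monthly counts are made up purely for illustration (they are not Elsevier, SAGE or ArXiv figures), and the function name `year_over_year_change` is my own.

```python
def year_over_year_change(current, previous):
    """Return the percent change for each month present in both years."""
    return {
        month: 100.0 * (current[month] - previous[month]) / previous[month]
        for month in current
        if month in previous and previous[month] > 0
    }


# Made-up monthly submission counts, for illustration only.
journal_2019 = {"Mar": 1000, "Apr": 1000, "May": 1000}
journal_2020 = {"Mar": 1100, "Apr": 1500, "May": 1580}
preprint_2019 = {"Mar": 2000, "Apr": 2000, "May": 2000}
preprint_2020 = {"Mar": 2040, "Apr": 2100, "May": 2060}

journal_change = year_over_year_change(journal_2020, journal_2019)
preprint_change = year_over_year_change(preprint_2020, preprint_2019)

for month in journal_change:
    print(
        f"{month}: journal {journal_change[month]:+.1f}%, "
        f"preprint {preprint_change[month]:+.1f}%"
    )
```

A surge in journal submissions that is driven by a backlog, rather than by new research, would show up as a large change in the first series and a small one in the second.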
This raises some new questions
If women’s scientific production had fallen during COVID times, I think we would see a change in the trend in submissions to ArXiv. But perhaps the change is just too small to see, or obscured somehow.
Nevertheless, it seems more reasonable to conclude that the drop in the proportion of journal submissions written by women has some other cause. This also seems to be consistent with thorough research findings like these. But why?
- Frankly, I don’t know. To hazard a guess: since men often have longer careers in science, perhaps they have more rejected articles sitting around waiting to be re-submitted. This would explain the change in the proportion of submissions to journals from women, and it also agrees with the other data shown here.
- But, as I say, it’s just a guess based on incomplete data. Maybe there’s another reason why we see a gender disparity in the COVID-surge. What do you think?
Rejected articles are interesting, aren’t they? My team at SAGE recently released an open-source Python package for tracking rejected articles. At some point, if I have time, I will use this package to check the average time between rejection and publication during COVID-times. If I’m right, then we’ll see a drop in the average time to publication for rejected articles.
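As a rough idea of what that check could look like, here is a minimal sketch in plain Python. It does not use the package mentioned above. The records are invented, the 1 March 2020 cut-off is an assumption, and grouping articles by their publication date (rather than their rejection date) is just one possible choice.

```python
from datetime import date, timedelta
from statistics import mean

LOCKDOWN_START = date(2020, 3, 1)  # assumed cut-off for "COVID-times"


def make_record(rejected, lag_days):
    """Build a hypothetical (rejected, published) pair with a known lag."""
    return rejected, rejected + timedelta(days=lag_days)


# Invented records, for illustration only.
records = [
    make_record(date(2018, 6, 1), 400),
    make_record(date(2019, 1, 15), 300),
    make_record(date(2019, 12, 1), 150),
    make_record(date(2020, 2, 1), 100),
    make_record(date(2020, 4, 1), 110),
]


def mean_lag_days(records, cutoff=LOCKDOWN_START):
    """Mean rejection-to-publication lag, split by publication date."""
    before = [(pub - rej).days for rej, pub in records if pub < cutoff]
    after = [(pub - rej).days for rej, pub in records if pub >= cutoff]
    return (
        mean(before) if before else None,
        mean(after) if after else None,
    )


lag_before, lag_after = mean_lag_days(records)
print(f"Mean lag before cut-off: {lag_before} days")
print(f"Mean lag after cut-off:  {lag_after} days")
```

With real data, the two means would need to be compared against the spread of lags in earlier years before reading anything into the difference.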
Appendix: Some other data on publishers’ output
I did wonder if I was seeing a difference between ArXiv and Elsevier because they cover different topic areas. ArXiv is more physicsy and Elsevier is more health-sciences. However, what we see across most publishers is a rise in the rate of DOI-registrations (a proxy for publications) a few months after the start of lockdown. This would be consistent with those publishers experiencing a surge in submissions beginning in March and this being reflected in publications later in the year (due to the delay caused by peer-review). We can see this in the publications of physics publishers like the American Physical Society and the American Institute of Physics. Beware that the data here can be very noisy.
And here are some other publishers so that you can see that the COVID-surge was a widespread phenomenon and not limited to a small number of publishing houses.
I also looked at IOP Publishing, the largest physics publisher (and a former employer of mine). However, I did not see an upswing in their output since March. That said, there are a lot of other things going on here.
- IOP publishes a large proportion of conference proceedings (which are commissioned, so not likely to be rejected articles).
- There is also significant recent growth, which may obscure the COVID-surge.
Finally, I did take a quick look at genders of first names of authors on ArXiv submissions. The data is obviously horrendous, with the most common category being ‘unknown’. Better methods of determining gender from names are available, but I chose this one because it is quick. For what it’s worth, I see no change in the trendline for women’s names, but the uncertainty in the data is large.
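For readers curious about why ‘unknown’ dominates, here is a minimal sketch of a quick first-name lookup. It is not the method I used: the tiny `NAME_GENDER` table and the author names are invented for illustration, and a real analysis would need a far larger name dataset.

```python
from collections import Counter

# Tiny hypothetical lookup table, for illustration only.
NAME_GENDER = {
    "maria": "female",
    "anna": "female",
    "john": "male",
    "david": "male",
}


def guess_gender(author):
    """Guess gender from the first token of an author name."""
    parts = author.strip().split()
    if not parts:
        return "unknown"
    first = parts[0].rstrip(".").lower()
    # Initials such as "J." carry no usable information.
    if len(first) <= 1:
        return "unknown"
    return NAME_GENDER.get(first, "unknown")


# Invented author names, for illustration only.
authors = [
    "Maria Rossi",
    "J. Smith",
    "John Doe",
    "Wei Zhang",
    "A. Kumar",
    "Anna Berg",
]

counts = Counter(guess_gender(a) for a in authors)
known = counts["female"] + counts["male"]
share_female_of_known = counts["female"] / known if known else None

print(counts)
print(share_female_of_known)
```

Initials and names missing from the table both fall into ‘unknown’, which is why any share computed from the remaining names carries a large uncertainty.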