Context: On this page I have captured this article to illustrate the JIF gaming that, it turns out, also happens abroad. Journals negotiate the divisor (the number of articles) with ISI (now Clarivate) to obtain their JIF. In the end, the JIF is expected to increase authors' interest in submitting articles.
<aside> 💡 Original link: https://quantixed.org/2016/01/05/the-great-curve-ii-citation-distributions-and-reverse-engineering-the-jif/
</aside>
There have been calls for journals to publish the distribution of citations to the papers they publish (1 2 3). The idea is to turn the focus away from just one number – the Journal Impact Factor (JIF) – and to look at all the data. Some journals have responded by publishing the data that underlie the JIF (EMBO J, Peer J, Royal Soc, Nature Chem). It would be great if more journals did this. Recently, Stuart Cantrill from Nature Chemistry actually went one step further and compared the distribution of cites at his journal with other chemistry journals. I really liked this post and it made me think that I should just go ahead and harvest the data for cell biology journals and post it.
This post is in two parts. First, I’ll show the data for 22 journals. They’re broadly cell biology, but there’s something for everyone with Cell, Nature and Science all included. Second, I’ll describe how I “reverse engineered” the JIF to get to these numbers. The second part is a bit technical but it describes how difficult it is to reproduce the JIF and highlights some major inconsistencies for some journals. Hopefully it will also be of interest to anyone wanting to do a similar analysis.
Citation distributions for 22 cell biology journals
The JIF for 2014 (published in the summer of 2015) is worked out by counting the total number of 2014 cites to articles in that journal that were published in 2012 and 2013. This number is divided by the number of “citable items” in that journal in 2012 and 2013. There are other ways to look at citation data, and different windows to analyse, but this method is used here because it underlies the impact factor. I plotted histograms to show the citation distributions at these journals from 0-50 citations; the inset shows the frequency of papers with 50-1000 cites.
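To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. The function name `impact_factor`, the optional `citable_items` argument and the toy citation counts are all made up for illustration; this is not the code used for the analysis in this post.

```python
def impact_factor(cites_per_paper, citable_items=None):
    """Mean number of cites per citable item.

    cites_per_paper: 2014 cites received by each 2012-2013 paper.
    citable_items:   denominator to use; defaults to the number of
                     papers in the list (an assumption of this sketch).
    """
    total_cites = sum(cites_per_paper)
    denominator = len(cites_per_paper) if citable_items is None else citable_items
    if denominator <= 0:
        raise ValueError("need at least one citable item")
    return total_cites / denominator


# Hypothetical journal with six papers, one of them highly cited.
toy_cites = [0, 1, 2, 3, 4, 50]
print(impact_factor(toy_cites))                   # 60 cites / 6 items
print(impact_factor(toy_cites, citable_items=5))  # same cites, smaller denominator
```

The second call shows why the denominator matters: the same 60 cites give a higher number when divided by fewer citable items.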


As you can see, the distributions are highly skewed and so reporting the mean is very misleading. Typically ~70% of papers pick up fewer than the mean number of citations. Reporting the median is safer and is shown below. It shows how similar most of the journals are in this field in terms of citations to the average paper in that journal. Another metric, which I like, is the H-index for journals. Google Scholar uses this as a journal metric (using citation data from a 5-year window). For a journal, this is the largest number, h, such that h papers each got >=h citations. A plot of h-indices for these journals is shown below.
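For anyone who wants to compute these summaries themselves, the following is a small sketch in Python. The helper names `h_index` and `fraction_below_mean` and the toy list of citation counts are assumptions for illustration only, and the h-index here is computed over whatever list you pass in rather than over Google Scholar's 5-year window.

```python
from statistics import mean, median


def h_index(cites):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(cites, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def fraction_below_mean(cites):
    """Share of papers that received fewer cites than the mean."""
    m = mean(cites)
    return sum(1 for c in cites if c < m) / len(cites)


# Hypothetical, deliberately skewed distribution of ten papers.
toy_cites = [0, 1, 1, 2, 2, 3, 4, 5, 12, 70]
print("mean:", mean(toy_cites))
print("median:", median(toy_cites))
print("h-index:", h_index(toy_cites))
print("fraction below mean:", fraction_below_mean(toy_cites))
```

In this toy example a single paper with 70 cites drags the mean far above the median, which is the same effect seen in the real distributions.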

Here’s a summary table of all of this information together with the “official JIF” data, which is discussed below.
Reverse engineering the JIF
The analysis shown above was straightforward. However, getting the data to match Thomson-Reuters’ calculations for the JIF was far from easy.
I downloaded the citation data from Web of Science for the 22 journals. I limited the search to “articles” and “reviews”, published in 2012 and 2013. I took the citation data from papers published in 2014 with the aim of plotting out the distributions. As a first step I calculated the mean citation for each journal (a.k.a. impact factor) to see how it compared with the official Journal Impact Factor (JIF). As you can see below, some were correct and others were off by some margin.
For most journals there was a large difference between this number and the official JIF (see below, left). This was not a huge surprise, as I’d found previously that the JIF was very hard to reproduce (see also here). To try and understand the difference, I looked at the total citations in my dataset vs those from the official JIF. As you can see from the plot (right), my numbers are pretty much in agreement with those used for the JIF calculation. This meant that the difference comes from the denominator: the number of citable items.

What the plots show is that, for most journals in my dataset, there are fewer papers considered as citable items by Thomson-Reuters. This is strange. I had filtered the data to leave only journal articles and reviews (which are citable items), so non-citable items should have been removed.
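The reasoning can be sketched in a few lines of Python: if the total citations match, then dividing that total by the official JIF gives the number of citable items that must have been used. The function name `implied_citable_items` and every number below are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from any of the 22 journals.

```python
def implied_citable_items(total_cites, official_jif):
    """Denominator implied by a JIF, given the total cites in the numerator."""
    if official_jif <= 0:
        raise ValueError("JIF must be positive")
    return total_cites / official_jif


# Hypothetical journal: my filtered dataset vs the official figure.
my_papers = 500        # articles + reviews in my dataset
my_total_cites = 5000  # assumed to agree with the official numerator
official_jif = 12.5

my_mean = my_total_cites / my_papers
implied = implied_citable_items(my_total_cites, official_jif)

print("my mean cites per paper:", my_mean)
print("implied citable items:", implied)
print("papers in my dataset not counted:", my_papers - implied)
```

With these made-up numbers, a JIF higher than my mean implies that fewer items were counted in the denominator than are in my dataset, which is the pattern described above.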
Now, it’s no secret that the papers cited in the sum on the top of the impact factor calculation are not necessarily the same as the papers counted on the bottom (see here, here and here). This inconsistency actually makes plotting a distribution impossible. However, I thought that using the same dataset, filtering and getting to the correct total citation number meant that I had the correct list of citable items. So, what could explain this difference?