Context: On this page I have captured this article to illustrate the JIF game, which turns out to be played abroad as well. Journals negotiate the divisor (the number of articles) with ISI (now Clarivate) to arrive at a JIF figure. Ultimately, that JIF figure is expected to increase authors' interest in submitting articles to the journal.

<aside> 💡 Original link: https://www.frontiersin.org/articles/10.3389/fnhum.2018.00037/full

</aside>

Introduction

The most groundbreaking, transformative research results deserve a broad readership and a large audience. Therefore, scientists submit their best work to the journals with the largest audience. While the number of scientists has been growing exponentially over the last decades, the number of journals with a large audience has not kept up, and neither has the number of articles published per journal. Consequently, rejection rates at the most prestigious journals now exceed 90%, and the labor of rejecting submissions has become these journals’ largest cost item. Assuming that this exclusivity allows the journals to separate the wheat from the chaff, successful publication in these journals is treated as a quality signal in hiring, promotion and funding decisions. If anything, these developments have fueled the circularity of this relationship: today, publishing ground-breaking science in a high-ranking journal is important not only for science to advance but also for an author’s career to advance.

Even before science became hypercompetitive at every level, results published in prestigious journals were now and again later found to be false. This is the nature of science. Science is difficult, complicated and perpetually preliminary. Science is self-correcting, and better experimentation will continue to advance it to the detriment of previous experiments. Today, however, fierce competition exacerbates this trait and renders it a massive problem for scholarly journals. It has now become their task to find the ground-breaking among the too-good-to-be-true data submitted by desperate scientists who face unemployment and/or laboratory closure without the next high-profile publication. This is a monumental task, given that it sometimes takes decades to discover that one result or another rests on flimsy grounds. How is our hierarchy of more than 30,000 journals holding up?

At first glance, it appears as if our journals fail miserably. Evaluating retractions, the capital punishment for articles found to be irreproducible, it was found that the most prestigious journals boast the largest number (Fang and Casadevall, 2011) and that most of these retractions are due to fraud (Fang et al., 2012). However, data on retractions suffer from two major flaws which make them rather useless for answering questions about the contribution of journals to the reliability (or lack thereof) of our scholarly literature: (1) retractions cover only about 0.05% of the literature; and (2) they are confounded by error-detection variables that are hard to trace. So maybe our journals are not doing so horribly after all?

Journal Ranking

The most widely used metric to rank journals is Clarivate Analytics’ “Impact Factor” (IF), a measure based loosely on citations. Despite the numerous flaws described (e.g., Moed and van Leeuwen, 1996; Seglen, 1997; Saha et al., 2003; Rossner et al., 2007; Adler et al., 2009; Hernán, 2009; Vanclay, 2011; Brembs et al., 2013), the IF is an excellent and consistent descriptor of subjective journal hierarchy, i.e., the level of prestige scientists ascribe to the journals in their respective fields (Gordon, 1982; Saha et al., 2003; Yue et al., 2007; Sønderstrup-Andersen and Sønderstrup-Andersen, 2008). That a measure so flawed still conforms to the expectations of the customers expected to pay for it is remarkable in its own right. Due to this consistency, the IF is used here as a measure for the subjective ranking of journals by the scientists using these journals: to what extent is this subjective notion of prestige warranted, based on the available evidence? Are prestigious journals really better at detecting the real breakthrough science in the sea of seemingly breakthrough science than average journals?
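For reference, the two-year IF (or JIF) underlying this ranking is, in essence, a simple ratio; its denominator, the count of “citable items”, is precisely the quantity the context note above describes as negotiable. A minimal sketch of the standard definition:

```latex
% Two-year Journal Impact Factor of a journal for year y:
% citations received in year y to items the journal published in years y-1 and y-2,
% divided by the number of "citable items" (articles, reviews) published in y-1 and y-2.
\mathrm{JIF}_y \;=\; \frac{C_y(y-1) + C_y(y-2)}{N_{y-1} + N_{y-2}}
```

Because citations to front matter such as editorials and news items can count toward the numerator while those items may be excluded from the denominator, the classification of what counts as a “citable item” is where negotiation can move the resulting figure (a point raised, e.g., by Rossner et al., 2007).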

Retractions and Error Detection

If anything, one could tentatively interpret what scant data there are on retractions as suggesting that increased scrutiny may play only a minor role in the combination of factors leading to more retractions in higher ranking journals.

For instance, there are low ranking journals with high retraction rates (Fang et al., 2012), showing that the involved parties are motivated to retract articles even in low ranking journals. In fact, in absolute terms, most retracted articles come from low-ranking journals. This would be difficult to explain if low ranking journals were less willing to retract and/or scholars less motivated to pursue retractions from these journals. On the other hand, one can make the claim that the numbers would show even more retractions in low ranking journals, if the motivation and willingness to retract were equal. As neither willingness of journals to retract nor motivation in individuals to force a retraction can be quantified, all that the data can show is that it is not an all-or-nothing effect: there is both willingness and motivation to retract also for articles in lower ranking journals.

Another reason why scrutiny might be assumed to be higher in more prestigious journals is that readership is higher, leading to more potential for error detection: more eyes are more likely to detect potential errors. Reasonable and plausible as this factor is, its consequence is difficult to test empirically. However, one can make an analogous, more easily testable claim: increased readership should raise the potential not only for retractions but also for citations, since more eyes are also more likely to detect a finding worth citing. In fact, if anything, citations ought to correlate better with journal rank than retractions do, because citing an article in a leading journal is not only technically easier than forcing a retraction, it also benefits one’s own research by elevating the perceived importance of one’s own field. However, the opposite is the case: the coefficient of determination for citations with journal rank currently lies around 0.2, while for retractions and journal rank it lies just under 0.8 (Brembs et al., 2013). So while there may be a small effect of scrutiny/motivation, the evidence seems to suggest that it is a relatively minor effect, if there is one at all.
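To make the comparison concrete, below is a minimal sketch of how such journal-level coefficients of determination could be computed. The journal metrics and values are invented placeholders for illustration only; they are not the data analyzed in Brembs et al. (2013).

```python
# Hypothetical illustration: coefficient of determination (R^2) between a
# journal-level metric and the Impact Factor. All numbers are made up.
import numpy as np

impact_factor   = np.array([2.1, 4.5, 9.8, 14.7, 31.4, 41.6])   # journal rank proxy
retraction_rate = np.array([0.2, 0.4, 1.1,  1.9,  3.8,  4.9])   # retractions per 10k articles
mean_citations  = np.array([3.0, 4.1, 6.5,  5.2, 12.0,  9.3])   # citations per article

def r_squared(x, y):
    """R^2 of a simple linear fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print("R^2 (IF vs. retraction rate):", round(r_squared(impact_factor, retraction_rate), 2))
print("R^2 (IF vs. mean citations): ", round(r_squared(impact_factor, mean_citations), 2))
```

The article’s point is simply that, on real data, the second of these two numbers is the larger one, contrary to what the readership argument would predict.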

Taken together, there is currently no strong case to be made either way as to whether the likely increased scrutiny and readership of highly-ranked journals is a major factor driving retractions. If it were the major factor, the apparent increased unreliability of high-ranking journals would merely be an artifact of that scrutiny, combined with an increased willingness of these journals to correct the scientific record. At least two lines of inquiry did not turn up any conclusive evidence for such an argument. With such unclarified confounds in such a tiny section of the literature, it is straightforward to disregard retractions as extreme outliers and focus instead on the 99.95% of unretracted articles in order to estimate the reliability of highly ranked journals.

The Other 99.95% of the Literature

In the literature covering unretracted, peer-reviewed articles, one can identify at least eight lines of evidence suggesting that articles published in higher ranking journals are methodologically either no stronger than or, indeed, weaker than those in lower ranking journals. In contrast, there is essentially no evidence that articles published in higher ranking journals are methodologically stronger; methodology here refers to several measures of experimental and statistical rigor with a potential bearing on subsequent replication or re-use. The one exception is a single article reporting that higher ranking journals are better at detecting duplicated images (Bik et al., 2016).

In the following, I will quickly review the lines of evidence in the order of decreasing evidential strength.

Crystallographic Quality

The quality of computer models of molecular structures derived from crystallographic work can be quantified by a method that includes the deviations from known atomic distances and other factors (Brown and Ramaswamy, 2007). When this quality metric is averaged for each journal, high-ranking journals such as Cell, Molecular Cell, Nature, EMBO Journal and Science turn out to publish structures of significantly below-average quality (Figure 1, courtesy of Dr. Ramaswamy, methods in Brown and Ramaswamy, 2007). Neither the molecular complexity nor the difficulty of the crystallographic work can explain this finding, as these factors are incorporated in the computation of the quality metric.

Figure 1. Ranking journals according to crystallographic quality reveals high-ranking journals with the lowest quality work. The quality metric (y-axis) is computed as a deviation from perfect. Hence, lower values denote higher quality work. Each dot denotes a single structure. The quality metric was normalized to the sample average and journals ranked according to their mean quality. Asterisks denote significant difference from sample average. Figure courtesy of Dr. Ramaswamy, methods in Brown and Ramaswamy (2007).
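The aggregation behind such a ranking is simple; below is a minimal sketch, with invented journal names and scores, of the normalize-average-rank procedure the figure caption describes. It illustrates the general approach only and is not the actual analysis of Brown and Ramaswamy (2007).

```python
# Hypothetical sketch of per-journal aggregation of a structure-quality metric.
# Lower scores = higher quality (smaller deviation from ideal geometry); data invented.
from collections import defaultdict

structures = [            # (journal, raw quality score for one deposited structure)
    ("Nature", 1.8), ("Nature", 2.3), ("Science", 2.0), ("Science", 1.6),
    ("Acta Cryst. D", 0.7), ("Acta Cryst. D", 0.9), ("J. Mol. Biol.", 1.0),
]

sample_mean = sum(score for _, score in structures) / len(structures)

by_journal = defaultdict(list)
for journal, score in structures:
    by_journal[journal].append(score / sample_mean)    # normalize to the sample average

ranking = sorted(
    (sum(scores) / len(scores), journal) for journal, scores in by_journal.items()
)
for mean_quality, journal in ranking:                   # best (lowest deviation) first
    print(f"{journal:15s} mean normalized quality deviation = {mean_quality:.2f}")
```

The significance testing against the sample average mentioned in the caption would be layered on top of this per-journal averaging.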