Brian Leiter criticizes the new Google Scholar Metrics, which uses h-index and various similar measures to assess journals. He writes that "since it doesn't control for frequency of publication, or the size of each volume, its results are worthless." Some of my friends on Facebook are wondering why he's saying this, so I'll try to offer a helpful toy example here.

Consider two journals: Philosophical Quality, which publishes 25 papers a year, all of which are good and well-cited; and the Journal of Universal Acceptance, which publishes 25 equally good and well-cited papers a year as well as 975 bad papers that nobody ever cites. Google gives both journals the same score on all its metrics. The h-index is the largest number h such that a journal has h papers with at least h citations each, so tacking extra uncited papers onto a journal changes nothing: JUA's additional bad papers make no impact on the h-index (or on Google's other measures defined in terms of it, like the h-median or h-core). But if you're looking at someone's CV and they've got a paper in one of these journals, you should be more impressed by a Phil Quality paper than by a JUA paper. The Phil Quality paper is likely to be good, while a JUA paper is likely to be bad. Still, Google will see them as equal.
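To make the toy example concrete, here's a minimal sketch of the h-index calculation for the two made-up journals (the citation counts are invented purely for illustration):

```python
# h-index: the largest h such that h papers have at least h citations each.
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

phil_quality = [40] * 25        # 25 good, well-cited papers
jua = [40] * 25 + [0] * 975     # the same 25 good papers plus 975 uncited ones

print(h_index(phil_quality))    # 25
print(h_index(jua))             # 25 -- the extra uncited papers change nothing
```

Both journals come out at 25, even though their average paper quality couldn't be more different.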



Synthese and Phil Studies seem to be benefitting from this phenomenon, as they publish lots more papers than other journals and get higher rankings in the Google metrics. They're good journals! But they're definitely not #1 and #2, which is how Google has them. Meanwhile, the consensus #1 journal in Brian's polls, Philosophical Review, publishes only about 15 papers a year and ranks 17th on Google. (I'll admit that I'm kind of rooting for Phil Review, as my paper on the Humean Theory of Motivation came out there. And now it's part of their h-core! Hopefully that hasn't biased me into giving a bad argument.) 

This is what happens when you rank journals in the same way you rank the output of individuals. If two people have the same number of papers that get cited a lot, but one has a lot of other papers that nobody cares about, I wouldn't say the person with more papers has worse research output. (Depending on the situation, I might see that person as an equally good researcher with an additional weird hobby.) But if a journal accepts a lot of papers that nobody cares about, it makes publishing there less prestigious.

This isn't to say that journals should accept fewer papers — in fact, I think they should accept more, given that a lot of good philosophy is going unpublished these days for lack of journal space. There's plenty of good stuff out there that's taking forever to find a home. The lack of publications is holding back debates, and a serious backlog is developing. Phil Review could probably publish three times as many papers with no reduction in average quality. But my point is just that we shouldn't look at a measurement that's indifferent to low average quality, and use it as we would use an average-quality measurement.

I don't think Google Scholar Metrics is totally worthless, though. If you're comparing journals with the same number of papers, it could be helpful. Also, it might be useful if you care about something other than the average quality of a paper in a particular journal. Maybe you're a librarian and you're trying to figure out what a journal subscription is worth. Phil Quality isn't any better than JUA from that standpoint — either way, you get 25 good papers! So maybe Google Scholar Metrics would help librarians. And maybe someone can come up with a clever mathy fix for h-index that corrects for the effects of accepting more papers. (Update: I see that Kate Devitt is working on something like that.) But for figuring out how to score a job candidate's CV, Google hasn't yet given us anything to rely on. 


27 responses to “Google Scholar Metrics Doesn’t Tell You The Average Quality Of Papers In Philosophy Journals”

  1. Rachel

    Or perhaps we should be less obsessed with ranking things like journals and papers?


  2. Mike

    It’s hard to see how the example isn’t question-begging. Let me modify it. Suppose journal Phil Quality accepts 25 mediocre papers that get cited all the time, while journal Acceptance accepts 25 mediocre papers that get cited all the time and 900 good papers that never get cited. Do we conclude that the journals are equally good? I don’t think so. You suggest that we need to make the index sensitive to acceptance rates to eliminate the chances that a mediocre journal with a high acceptance rate gets ranked too high. I say that we should not make the index sensitive to acceptance rates to eliminate the chances that a good journal with a high acceptance rate gets ranked too low. I frankly think that neither of the examples shows much about the merit of the Google criteria.


  3. Moti Mizrahi

    What is the evidence for the claim that papers published in Phil Review are better (on average) than papers published in Phil Studies or Synthese? This is not a rhetorical question. I am wondering why we should accept the claim that the top spot is reserved for Phil Review and any ranking system that suggests otherwise (e.g., Google Scholar) must be rejected as faulty.


  4. rankings ranking

    Yes, the h-index clearly does not tell you the average quality of any one paper or predict the success of the next paper. So maybe people will not want to use it when judging a CV full of individual papers. However, h does tell you the likelihood that a journal will have really good papers in it (here “good” is going to be stipulated in terms of citations over 5 years, I know, stick with me) and thus might be useful for other purposes.
    A lot of people don’t seem to like the h-index because it doesn’t punish journals for lowly cited papers; it only rewards them for highly cited papers. If that’s your worry, use impact factor. It’s doubtful whether the numbers in the toy example above actually reflect a real-world example in philosophy in the Google rankings. It’s also a really interesting empirical question just how much Phil Studies benefits from volume, or how much h scores in general really are a function of volume. In any event, here’s another way to look at it. Suppose that out of the 15 papers at Phil Review and 100 at Phil Studies per year, 10 and 50 respectively are good in terms of level of citation. That’s a way better ratio for Phil Review, and many share the intuition that rankings should reflect that. On the other hand, while Phil Review was being really selective, Phil Studies gave our discipline 40 extra just-as-good (as stipulated) articles that year! In other words, maybe a small push away from that level of hyperselectivity could be a good thing, and having multiple measures of journal ranking at the disciplinary level helps us to see this other perspective too?


  5. Neil Sinhababu

    Rachel, even if we’re too obsessed with rankings, we shouldn’t have bad rankings.
    Mike, I don’t say anything about making the rankings sensitive to acceptance rates. If some journal accepts every paper because every paper is good, and all its papers get heavily cited, that’s great and it should be ranked highly. Also, Google is making a pretty strong assumption that paper quality is correlated with how much the papers get cited. If things typically go as they do in your example, Google’s rankings are in big trouble for a totally different reason.
    rankings ranking, I think I agree with most of your comment. If Google has the effect of muddling our sense for average paper quality, but getting the smaller journals to move up to accepting 50 papers a year instead of just 25, that might be good on balance. Hyperselectivity by journals is a problem.


  6. Mike

    Neil,
    While publishing papers that have citations makes a journal better (we can agree), publishing papers that have no citations does not make a journal worse. It just leaves it neutral. You seem to deny the latter.


  7. Neil Sinhababu

    Mike, throughout this post I’m granting Google its assumption that citations correlate with quality. If this is false, everything Google is doing collapses, because they’re running a citation-based system. (In fact I think this correlation is very rough and breaks down in many ways.) But making Google’s assumption, it will be true that publishing lots of papers that don’t get cited reduces the average quality of papers in a journal, since it means that the journal is publishing lots of bad papers.


  8. Neil Sinhababu

    Sorry that your comment got caught in the spam filter for a while, Moti.
    I’m just expressing my own judgment up there, but I do think the average Phil Review paper presents more novel views and interesting arguments (and engages with more literature) than the average Phil Studies or Synthese paper. Obviously there are lots of exceptions in both directions — there are a few Phil Review papers that I think shouldn’t have been published anywhere without massive revisions, but as far as I can recall I think that less frequently for Phil Review than for other journals.


  9. Neil

    It would be extremely surprising if Phil Rev were not of a higher quality than Phil Stud. Here’s why: the first is widely held to be better, and these kinds of judgments tend to be self-fulfilling. Given these beliefs, one would expect authors to direct their best papers to PR and referees to apply higher standards.


  10. BLS Nelson

    Doubtful. If we need to adopt fine grained standards in order to whittle down the pile, that does not mean the standards we choose shall be ‘higher’. It just means we will choose some standard or other.


  11. Jonathan Birch

    In the real world, the Journal of Universal Acceptance would not have a high h-index. It would be a laughing stock, good people would not publish there, and no one would cite the papers it published.
    The point here is that Phil. Studies and Synthese must be doing something right to get such a high h-index. They are not just publishing everything they receive. They are publishing as much as they can without damaging their reputations as places in which philosophical work worth reading and citing is published, thereby maximizing their total impact on the field. That’s worth knowing, isn’t it? It tells the editors they have got the trade-off more or less right, if maximizing the journal’s total impact is their goal.
    By contrast, Phil. Review is concerned with maintaining its prestige to such an extent that it publishes very little. The upshot is that the journal ends up having less impact, in total, on philosophical debates, if one takes the h-index to be a decent measure of total impact. And that too is worth knowing, isn’t it? It might even persuade the editors to reconsider their approach and publish a bit more, if they care about the journal’s total impact.
    In short, I find Google’s list quite refreshing. The whole point is to rank journals not by their unquantifiable aura of prestige but by their quantifiable total impact. I think that’s a worthwhile exercise, provided we don’t make the mistake of conflating total impact with prestige.


  12. Anon grad student

    If you switch to the physics tab in the Google rankings, you’ll notice the prevalence of arXiv.
    (1) This shows that a place in the ranking does not have to correlate with the quality of the journal (arXiv is an open archive, not a journal; there is no selection except for rough fit with the subcategory). And so (imagine the evul administrator in his cubicle obsessing over the new big-data, corporate-buzzword-based metric for impact) it suggests that the very use of the ranking to assess journals is problematic, at least in the case of fields with a healthy culture of preprints and information exchange.
    (2) Jonathan’s point about rankings capturing maximization of total impact is a good one: arXiv has extreme impact, and the Google ranking captures that.
    (3) It’s a pity there’s no PhilPapers or PhilSci-Archive on the list 😦


  13. Neil

    So you referee by reference to some standard or other? Good to know. Remind me never to ask you to referee a paper.
    More seriously, I have often received comments along the following lines: While this paper is interesting, it is not good enough for a journal like X. And indeed many top journals advise that in the light of their acceptance rates, referees should apply especially strict standards. So long as people are somewhat capable of following this advice – and I have no doubt that very many are – we will get the self-fulfilling prophecy.


  14. Michael Kremer

    Suggestion: divide the h5-index by the number of articles published over the same five-year period. (Do not count book reviews; that makes this a less than totally mechanical process, so I am not going to implement it now.) I wonder what the results would then be?
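    A minimal sketch of what this might look like, assuming you already have each journal’s h5-index and a hand count of its articles; every number below is invented purely for illustration:

    ```python
    # Hypothetical sketch of the suggestion above: divide a journal's h5-index by
    # the number of articles (book reviews excluded) it published over the same
    # five years. Journal names and counts are invented for illustration only.
    journals = {
        # name: (h5_index, articles_published_over_five_years)
        "Philosophical Quality": (25, 125),
        "Journal of Universal Acceptance": (25, 5000),
    }

    for name, (h5, articles) in journals.items():
        print(f"{name}: h5 = {h5}, h5 per article = {h5 / articles:.3f}")
    ```

    On this normalized measure the two toy journals from the post come apart sharply, even though their raw h5 values are identical.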


  15. philosopher

    Incidentally, if you look at the Google Scholar sub-category of journals in “Epistemology and Scientific History” you will see that there is an archive on the list. See below, Item 12. Whether there is an archive or not on the list is a function of the epistemic culture and practices in the sub-field or discipline.
    1. Synthese
    2. Noûs
    3. Philosophy of Science
    4. The British Journal for the Philosophy of Science
    5. Erkenntnis
    6. The Journal of Philosophy
    7. Journal of Philosophical Logic
    8. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics
    9. The Review of Symbolic Logic
    10. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences
    11. Studies in History and Philosophy of Science Part A
    12. arXiv History and Philosophy of Physics (physics.hist-ph)
    13. Foundations of Science


  16. Anon

    I initially completely misread the example journal titles, interpreting “Journal of Universal Acceptance” as “Journal that everyone thinks is the best because everyone thinks it’s the best” and “Quality Journal” as “Journal that is truly good even though it’s not recognized as such.”
    But that’s just crazy. How could such journals possibly exist?


  17. CJ

    I suppose you could say that a high h-index for, say, Nous is not good evidence that P(x is a good paper|x is published in Nous) is high, but it is good evidence that P(x is published in Nous|x is a good paper) is high compared to other journals. That is, knowing the h-index alone doesn’t tell you much about the average quality of the papers in Nous, but it might tell you that by not reading Nous you’d be missing out on a good proportion of the class of all papers worth reading.
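    For instance (with numbers invented purely for illustration), here’s a quick sketch of how those two conditional probabilities can come apart:

    ```python
    # Invented numbers: suppose the field produces 200 papers "worth reading" in
    # a year, and one high-h-index journal publishes 400 papers that year, 50 of
    # which are worth reading.
    worth_reading_in_field = 200
    papers_in_journal = 400
    worth_reading_in_journal = 50

    p_good_given_journal = worth_reading_in_journal / papers_in_journal       # 0.125
    p_journal_given_good = worth_reading_in_journal / worth_reading_in_field  # 0.25

    print(p_good_given_journal, p_journal_given_good)
    ```

    The first probability is modest, but the second says that skipping the journal would mean missing a quarter of the papers worth reading that year.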


  18. BLS Nelson

    The first-person plural was meant as a reference to the profession, which includes those who will claim to be appealing to higher standards. The worry is just that, at some point, you have to grasp at straws when the demographic conditions require it. There is no good reason to believe that the genuine good faith of everyone involved will mitigate that basic worry.


  19. Marcus Arvan

    Hi Neil: Thanks for your interesting post. In response to Moti’s comment, you write: “I’m just expressing my own judgment up there, but I do think the average Phil Review paper presents more novel views and interesting arguments (and engages with more literature) than the average Phil Studies or Synthese paper. Obviously there are lots of exceptions in both directions — there are a few Phil Review papers that I think shouldn’t have been published anywhere without massive revisions, but as far as I can recall I think that less frequently for Phil Review than for other journals.”
    But of course empirical psychology has shown such judgments to often be deeply infected by biases. For instance, the mere fact that you KNOW you are reading Phil Review might affect your judgments of paper quality, just as consumers’ beliefs about how much a wine costs are known to affect their judgments of its quality.
    I say this because, although I can’t recall where I came across it, I seem to recall someone doing a blind study in another discipline (I think it was psychology) where they anonymously sent papers that appeared in a top journal and papers from a lower-ranked journal to people in the field…and people, on average, didn’t rate the articles in the top journal as superior.
    I, at any rate, do not share your impressions. My feeling is that some top journals in philosophy (I won’t name which) tend to be far more conservative than lower-ranked journals, and not in a good way.


  20. philosopher

    Marcus,
    You probably mean this article:
    http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=6577844&fileId=S0140525X00011183
    It is important not to draw conclusions beyond what the study supports. The sample size is small. The study was conducted with psychology journals. Caution is in order.
    Indeed, it is worth reading the study with care.


  21. Kate Devitt

    Hi Anon,
    I have ranked the leading philosophy of science journals (as picked by Leiter) alongside regular philosophy journals using Google Scholar data here: http://mnemosynosis.livejournal.com/31341.html
    [If you’re interested, I combine Leiter and Google Scholar data here: http://mnemosynosis.livejournal.com/31062.html]


  22. Marcus Arvan

    Philosopher: Thanks for the link! Yes, of course, one must read the study with care. But my real point was that we know far too much about implicit biases–across a wide variety of cases–to just trust our impressions about things. Some people may have the impression that “top journals” tend to publish better stuff than lower ranked journals. Others (such as myself) may have the impression that they don’t. My full impression, for what it is worth, is that top-ranked journals publish great pieces of work more often than lower-ranked journals, but good pieces of work no more often (and perhaps less often) than lower-ranked journals. But again, these are just my impressions. My more general point is that impressions aren’t worth the paper (or blog posts) they’re printed on. More studies like the one you linked to would be a far better way to determine what the truth is. Someone in philosophy should carry some out!


  23. philosopher

    In my opinion, philosophers are generally not the best suited to carrying out such studies. They generally lack the training. Whatever study is done should meet the standards of peer review in bibliometrics (yes, there is such a field). There are people (even philosophers) who do this type of work. But it is also very time consuming. The particular design of the study I linked to is also ethically questionable. One is deceiving journal editors and referees who are very overworked.
    A junior faculty member, especially one without tenure, should focus on publishing philosophy in philosophy journals.


  24. Marcus Arvan

    Philosopher: all fair points!


  25. loopine

    FYI the British Philosophical Association have published a position statement on the role of metrics in evaluating research (in the UK context). It’s here.


  26. Neil Sinhababu

    Regarding the journal titles, I was just chasing silly puns. Philosophical Quality sounds enough like Philosophical Quarterly, and Journal of Universal Acceptance is sort of the reverse of the famed Journal of Universal Rejection. http://www.universalrejection.org/
    Marcus, your points about bias are entirely reasonable. I don’t know what the right position is between total skepticism about assessments of journal quality, and full confidence in my immediate judgments.


  27. Philipp Blum

    I think it would be very helpful if philosophy journals made much more information on their acceptance rates and their submission statistics available. I don’t see any reason why they don’t. At dialectica, we have been doing this for 14 years:
    http://www.philosophie.ch/dialectica/dialectica_statistics.pdf
    Any comments, esp. on the very low rate of submissions by female philosophers, very welcome.

