Benoit and Marsh on excellence (or not)

In a paper just published in the ESR, Benoit and Marsh confirm that research excellence is measurable — even for political scientists, some of whom argue that reality is constructed. They show that research quality varies considerably. Should research budgets be cut, there is now a basis for cuts that minimise damage to quality.

Some of you will want to bitch that Benoit and Marsh feature as numbers 1 and 3 in their own ranking. This is nonsense. The correlation between the various indices is high: the same people come out on top regardless of the quality measure used, and people in the know already had a rough idea of who would do well. This exercise primarily serves the community, and the authors invested time that they could have used to publish in a more prestigious journal.

10 replies on “Benoit and Marsh on excellence (or not)”

This is a very interesting paper. No matter what one may think of such attempts to use impact as measured by bibliometric data as a proxy for research quality, such practices are likely to have increasing significance. The paper enhances understanding of the strengths and limitations of such methods.
Though noting that politics is not the only discipline where the messenger is highly ranked, I agree with Richard that the transparency of the methodology means that any bias in the approach would be clear to the reader.
Arguably a more substantial point is the organization of the ranking around political science departments. This has the effect of including staff from other disciplines (e.g. economics) who happen to be in political science departments, and excluding those whose work straddles the concerns of political science but who are located in, for example, schools of business, social policy or law. The difficulty with adopting any other mechanism for defining scope is that, in contrast with economics, there is little agreement on which core journals one must publish in to count as a member of the discipline.
As to Richard’s suggestion that the outcomes of this exercise could be used to make resource allocation decisions (like the research selectivity exercises in the UK), I would question whether there would be sufficient confidence in it. Whilst it might approximate quite well to what knowledgeable people think, there remain issues of confidence in bibliometric data. There is, for example, a bias towards high-activity sub-fields in generating citations, and a risk that excellent research in low-citation fields might be regarded as of poor quality. We are not yet at a point where the quality of university activity is measured exclusively by impact. Though the UK Research Excellence Framework is developing in such a way as to place greater emphasis on impact, quality (assessed through peer review) appears likely to retain a central place in its measurements.

Excellent points, but I’d rather see a discriminating cut on the basis of imperfect data than an indiscriminate cut. We’re heading for the latter.

It is indeed hard to compare output and impact across different disciplines, fields, and subfields. We should resist, however, the tendency of the lazy and incompetent to define their “specialisation” as so unique that it is incomparable and therefore excellent. If you’re doing research that is appreciated by five people on the planet, and cited by only two of them, then perhaps one can rightfully doubt your contribution to the greater good.

I’m reminded of Amartya Sen’s take on inequality in debates like this. Sen argued that looking for a complete ordering of income distributions on the basis of how much inequality they contain was too demanding. Inequality is multidimensional and so the relation “is more unequal than” should be regarded as incomplete.

Sen proposed an “intersection quasi-ordering” as a way of dealing with this. Let there be a set of quasi-orderings, indexed 1 to m, each of which is complete. Each quasi-ordering in this set is a “reasonable” representation of the true “is more unequal than” relation. Sen then proposed that we take the intersection of this set of quasi-orderings: one distribution is ranked above another only if every quasi-ordering in the set agrees. The outcome is likely to be an incomplete quasi-ordering. (Essentially this is a Pareto-like construction.)

Although completeness is given up, the procedure identifies the uncontroversial pairwise comparisons and is silent about the rest. It is transitive, of course.

For academics, the index set could be the usual suspects: citations, publications in leading journals, h-index etc.
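The intersection idea can be sketched in a few lines of Python. This is a minimal illustration, assuming every index is a score where higher is better; the index names and the three hypothetical academics (`alice`, `bob`, `carol`) are made up for the example and do not come from the Benoit and Marsh paper.

```python
from typing import Dict, Optional

# Hypothetical index set; each index is a score where higher is better.
INDICES = ("citations", "top_journal_pubs", "h_index")

Scores = Dict[str, float]


def at_least_as_good(a: Scores, b: Scores) -> bool:
    """a R b under the intersection: a scores >= b on every index."""
    return all(a[k] >= b[k] for k in INDICES)


def compare(a: Scores, b: Scores) -> Optional[str]:
    """Return 'a', 'b' or 'tie' when all indices agree, else None."""
    a_ge_b = at_least_as_good(a, b)
    b_ge_a = at_least_as_good(b, a)
    if a_ge_b and b_ge_a:
        return "tie"  # equal on every index
    if a_ge_b:
        return "a"
    if b_ge_a:
        return "b"
    # The indices disagree, so the intersection quasi-ordering is silent.
    return None


# Illustrative (made-up) scores for three hypothetical academics.
alice = {"citations": 900, "top_journal_pubs": 6, "h_index": 14}
bob = {"citations": 400, "top_journal_pubs": 3, "h_index": 9}
carol = {"citations": 1200, "top_journal_pubs": 2, "h_index": 12}

print(compare(alice, bob))    # a     (alice is ahead on every index)
print(compare(alice, carol))  # None  (carol has more citations, alice more top-journal papers)
print(compare(bob, carol))    # None
```

Because “at least as good on every index” is transitive, the partial ranking never contradicts itself; the price is that pairs such as alice and carol are simply left unranked.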

To put the same point differently: although some economic historians are better than some theorists, does that mean that every historian can be compared to every theorist? Pockets of comparability do not imply universal comparability.

Fair enough. It is hard to compare economic historians to game theorists. But it is easy to notice that economic historians in Ireland make a bigger international splash than game theorists in Ireland.


I have a variety of criticisms of the utility of this approach, and the approach itself.

1 The stated objective is the identification of research to be kept/culled under budgetary constraints.
The authors leap nimbly to the conclusion that individual authors should secure or lose funding, rather than faculties, fields of research, schools of thought, subjects, etc. There is no discussion of how such an approach would compare to some metric based on social utility, economic benefit, predictive power (of theories), etc. of the subject/school/faculty.
So they recommend selecting researchers irrespective of the utility, benefit, or predictive power of their work.

2 The approach in identifying ‘productive’ authors again makes no reference to the usefulness of the work; e.g. a series of oft-cited literature-review papers might outperform some original work.
Further, Garfield himself warned that the journal impact factor was intended to help libraries select the journals with the most impact, and that its use in evaluating individuals was justified only when faced with large numbers of people to examine in a short space of time, as in this respect it is quite a rough guide. The same caution applies to the h-index, which was proposed later by Hirsch.

3 The authors, surprisingly, completely fail to mention alternative ranking algorithms, or the pros and cons of the h-index and other systems (g-index, Advogato, PageRank, Erdős number, …).
Criticisms of the h-index:
* Michael Nielsen: “…the h-index contains little information beyond the total number of citations, and is not properly regarded as a new measure of impact at all”.
* is bounded by the total number of publications.
* does not consider the context of citations.
* does not account for confounding factors such as “gratuitous authorship”.
* is a natural number and thus lacks discriminatory power.
* does not account for singular successful publications.
* is affected by the accuracy of the citation data bases from which it is computed.
* does not take into account the presence of self-citations.
* does not account for the number of authors of a paper.
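To make some of these criticisms concrete, here is a minimal Python sketch of the h-index alongside the g-index (one of the alternatives mentioned in point 3), following Egghe’s definition. The citation counts are invented for illustration, and this version caps g at the number of papers listed rather than padding the record with uncited papers.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later paper can qualify
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers together have >= g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


print(h_index([10, 8, 5, 4, 3]))  # 4
# Bounded by the number of publications: two landmark papers give h = 2.
print(h_index([1000, 1000]))  # 2
# Blind to a single blockbuster: both records give h = 2 ...
print(h_index([5000, 3, 2]), h_index([3, 3, 2]))  # 2 2
# ... whereas the g-index separates them.
print(g_index([5000, 3, 2]), g_index([3, 3, 2]))  # 3 2
```

The g-index rewards a highly cited paper because it sums citations over the top papers, which is exactly the information the h-index throws away once a paper clears the threshold.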

4 Omitted from the paper was a discussion of the suitability of Google Scholar (their preferred source of citation data). Among other issues, the journals indexed by GS and the frequency of its updates are not published, presumably for reasons of commercial competition; note also that whether GS search robots can reach a journal depends on online availability and access, which in turn can be manipulated by various network administrators.

5 The proposed method relies blindly and uncritically on peer review, which has attracted a number of criticisms:

“While passing the peer-review process is often considered in the scientific community to be a certification of validity, it is not without its problems. Drummond Rennie, deputy editor of Journal of the American Medical Association is an organizer of the International Congress on Peer Review and Biomedical Publication, which has been held every four years since 1986. He remarks, “There seems to be no study too fragmented, no hypothesis too trivial, no literature too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Richard Horton, editor of the British medical journal The Lancet, has said that “The mistake, of course, is to have thought that peer review was any more than a crude means of discovering the acceptability — not the validity — of a new finding. Editors and scientists alike insist on the pivotal importance of peer review. We portray peer review to the public as a quasi-sacred process that helps to make science our most objective truth teller. But we know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong.”

6 Not addressed in the paper is the problem of “confirmation bias”, where authors seek out and cite only work that supports their own views, however malformed.

(a) Bruno S. Frey, CESifo, Zurich, March 2009
“Academic economists today are caught in a “Publication Impossibility Theorem System” or PITS. To further their careers, they are required to publish in A-journals, but this is impossible for the vast majority because there are few slots open in such journals. Such academic competition may be useful to generate hard work; however, there may be serious negative consequences: the wrong output may be produced in an inefficient way, the wrong people may be selected..”

(b) Confirmational Response Bias and the Quality of the Editorial Processes Among American Social Work Journals
William M. Epstein
University of Nevada, Las Vegas
Research on Social Work Practice, Vol. 14, No. 6, 450-458 (2004)

Objective: To experimentally test for confirmational response bias among social work journals and to assess the timeliness and quality of the referee review process. Method: A positive and a negative version of two stimulus articles were sent to two randomized groups of 31 social work journals; journals were stratified by prestige; the timeliness of journal responses were recorded; four judges rated the quality of referee reviews against a high-quality referee review from a prestigious clinical psychology journal. Results: The differences in acceptance rates between positive and negative versions of the stimulus articles were significant in one case and not significant in the other. Combining the results of this experiment with a previous experiment produced significant results overall; the quality of 73.5% of the referee reviews were inadequate. Conclusion: There are substantial problems of bias, timeliness, and quality in the editorial decisions and review processes of social work journals.

(c) How we know—and sometimes misjudge—what others know: Imputing one’s own knowledge to others.
Nickerson, Raymond S.
Psychological Bulletin. Vol 125(6), Nov 1999, 737-759.

To communicate effectively, people must have a reasonably accurate idea about what specific other people know. An obvious starting point for building a model of what another knows is what one oneself knows, or thinks one knows. This article reviews evidence that people impute their own knowledge to others and that, although this serves them well in general, they often do so uncritically, with the result of erroneously assuming that other people have the same knowledge.

7 Lastly, I found a paper that discusses the gamesmanship and cliqueishness in the symbiosis between citation rankings, peer review, journals, and academic politics:

Frederic S. Lee, “The Ranking Game, Class, and Scholarship in American Mainstream Economics”
Australasian Journal of Economics Education, Vol. 3, Numbers 1 and 2, 2006


Together, the journal and department ranking studies establish that top departments publish in quality economic journals and quality journals publish economists from the top departments. This symbiotic relationship existed in the 1950s (Clearly and Edwards, 1960; and Yotopoulos, 1961) and, as noted above, in the 1960s. Moreover, it has replicated itself to the present day as economists in the top twenty-four departments repeatedly directed their efforts to publish in those quality journals and end up contributing over fifty percent of the articles and pages, although this significantly under-estimates their dominance of the top journals. For the period 1974 to 1994, the top 15 departments contributed 40% of the pages to the top journals, while the top 24 departments contributed 51%. Moreover, the degree of concentration of the top 24 departments in the top 50 departments for pages produced in top journals for the period 1971 to 1983 is 72%. In addition, for the period 1985 to 1990, the top 15 departments contributed nearly 75% of the total pages contributed by American economic departments to the American Economic Review, Econometrica, Economic Journal, Journal of Political Economy, and Quarterly Journal of Economics. Finally, from 1977 to 1997, the top 24 departments contributed more than 51% of the American-based authors to the top fifteen journals; and in 1995 the top 24 departments contributed more than 54% of the American-based authors in the top fifteen journals and 51% to the top thirty journals. [Graves, Marchand, and Thompson, 1982; Hirsch, Randall, Brooks, and More, 1984; Laband, 1985a; Bairam, 1994; Scott and Mitias, 1996; Hodgson and Rothman, 1999; and Kocher and Sutter, 2001]

Adopting the publishing values of their professors, Ph.D. graduates of the top twenty-four departments contributed more than 70% of the articles and pages in the top journals. From 1970 to 1979, more than 70% of the pages in the American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics were contributed by graduates of the top 24 departments; from 1975 to 1984, 84% of the articles and pages in the top journals contributed by authors who earned Ph.D.s from eighty American doctoral programs from 1975 to 1984 came from graduates of the top 24 departments; from 1977 to 1997 more than 67% of all authors and 83% of the authors with American Ph.Ds. in the top fifteen journals graduated from the top 24 departments; and in 1995 70% of the authors in the top thirty journals with American Ph.Ds., came from the top fifteen departments. [Hogan, 1986; Laband, 1986; Hodgson and Rothman, 1999; Collins, Cox, and Stango, 2000; and Kocher and Sutter, 2001; also see Cox and Chung, 1991]

Since the top departments have historically employed each others’ graduates and exported their huge surplus to lower ranking departments (while importing very few of their graduates), their faculty are relatively homogeneous in terms of their graduate training, the graduate training they offer, and their publishing expectations. In addition, the lower ranking departments have increasingly become clones of the top ranked departments. Finally, over 40% of the editors of the top journals obtained their Ph.Ds. from the top twenty-four departments while 43% of the editors resided in them, which implies that nearly all of the departments have an editor from a top journal (Yoels, 1974; and Hodgson and Rothman, 1999).

With the symbiotic relationship between quality journals and top departments combined with the homogeneity of graduate training and publishing expectations and the dispersion of journal editors across the top departments, economists in these departments have all the right social characteristics to be successful: they have the right training, employment location, and social connections. In short, the top departments and their faculty form, it would seem, a class with distinct social characteristics that ensure them access to publishing in the top journals, the control of the journals themselves, and the prestige to have their work be taken more seriously than others not of their class. This is evident in the case of articles written by economists not affiliated with top ranked departments that appear in top ranked journals in that they tend to receive fewer citations than economists affiliated with top departments whose articles appear in the same journals. Hence department (or academic-class) affiliation affects perceptions of the significance of research rather than the research itself. [Oromaner, 1983]

Having said all that, though, the paper is probably dumbed-down enough to both impress and convince the slack-jawed knuckle-dragging buffoons that appear to be in charge of the country.

If you had done your homework properly, then you would have discovered that although everything you say is true, nothing of what you say would substantially affect the ranking.

Many people have raised the same objections as you do. Some people have tried to correct the standard ranking methods — and they invariably found that this increases the effort but does not affect the result.

Imperfect information beats no information.


My point was that any simplistic ranking will have had all interesting information removed from it, and that any really useful ranking will need much information contained outside the paper per se.

Yes, the h-index is a rough-and-ready guide, BUT it is highly susceptible to gaming (compare web “search-rank optimisation”, which games rankings similar to h) and it doesn’t guide funding triage in a really useful way.

Fernand Braudel opines that the existence of universities in Europe made the real difference between the evolving polity of early medieval Europe and the caliphates of the same era, and that these institutions permitted the concentration of learning that enabled most of what came after feudalism.

Allowing research funding to be decided by ‘KPIs’, rather than by people who actually know what they’re doing, is about as clever as letting off fireworks in the Library of Alexandria, IMO.

Anything can be gamed, but citations, and thus the h-index, are hard to game. This is my experience as someone who tries to game everything. Besides, there are a number of studies that explicitly tested (and rejected) your hypothesis. Just browse through recent issues of Scientometrics.

Your reference to Medieval Europe is neither here nor there. Universities have a completely different role in society in 2009 than in 1969, let alone in 1088. Criteria that may have been suitable then are not now.

Opinions like “we can’t measure quality” are typically used to hide and protect low quality.

If you don’t like the work of B&M, it is your duty as an academic to improve upon it, rather than bitch about it.

Although this post is off the topic, it is…

This is the saddest piece of academic navel-gazing that I have come across in a long time.

Here we are in a state – the governance of which leaves a lot to be desired. Yet two political scientists spend their time assessing how to assess their own work.

It reinforces the view I formed during the Irish Political Reform conference in TCD in June – the political scientists here have a lot to do to demonstrate the kind of relevance to the rest of us that the economists are demonstrating. The striking thing about the economists is that they did the same work during the 1980s and almost certainly during the 1950s too.

I will gladly change my view if anyone – political scientist or otherwise – can show me a consistent body of work, by political scientists working in our 3rd level institutions/research institutes, on how to enhance our way of governing ourselves through the creation of checks and balances on how power is exercised in this state.

IMO, bad management, inefficiency and corruption thrive where there is secrecy. All three are material to the functioning of the economy in Ireland.

If anyone doubts me, check out the response of the political science community to the restrictions on Freedom of Information brought in after the 2002 election and reinforced at every possible step since then.
