From the inbox (Data is Plural 2019.08.28 edition):
Citations and self-citations. A team led by meta-research pioneer John Ioannidis has developed a dataset of citation metrics for science’s 100,000 most-cited authors. The dataset includes each author’s name, institutional affiliation, number of publications, total citations, “h-index,” and more. For each citation metric, there’s a second version that excludes self-citations. Related: “Hundreds of extreme self-citing scientists revealed in new database” (Nature).
So, of course, I went down that rabbit hole and haven't gotten much else done today. The Nature article begins like this:
The world’s most-cited researchers, according to newly released data, are a curiously eclectic bunch. Nobel laureates and eminent polymaths rub shoulders with less familiar names, such as Sundarapandian Vaidyanathan from Chennai in India. What leaps out about Vaidyanathan and hundreds of other researchers is that many of the citations to their work come from their own papers, or from those of their co-authors.
First, I wondered if there were any economists in these data. The answer is yes. Ioannidis et al. (2016) use Scopus and have the researchers categorized in 176 fields. These fields include 'agricultural economics & policy', 'econometrics', 'economics' and 'economic theory'. There are field ('name1') and subfield ('name2') columns in the data. In the Ioannidis et al. (2019) data there are n=949 scientists with one of these key words in name1 and n=802 with one of these key words in name2. There are n=1401 researchers in the full data (n=105,000 in the 2018 spreadsheet) who have economics in name1 or name2 (this is my broad definition of economist -- some are obviously not really economists but I've left them in the data). Of these 1401 economists, 350 have one of these key words in both name1 and name2.
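For anyone who wants to replicate the counting, here is a rough sketch in Python of how the name1/name2 filter might look. The rows are made up, and exact case-insensitive matching on the four field labels is my assumption; the real spreadsheet may need a looser match.

```python
# Field labels quoted in the post; matching is case-insensitive (an assumption).
ECON_FIELDS = {
    "agricultural economics & policy",
    "econometrics",
    "economics",
    "economic theory",
}


def is_econ(label):
    """True if a field label is one of the four economics fields."""
    return label.strip().lower() in ECON_FIELDS


def classify(rows):
    """Count authors matching in name1, in name2, in either, and in both."""
    counts = {"name1": 0, "name2": 0, "either": 0, "both": 0}
    for row in rows:
        in1, in2 = is_econ(row["name1"]), is_econ(row["name2"])
        counts["name1"] += in1
        counts["name2"] += in2
        counts["either"] += in1 or in2
        counts["both"] += in1 and in2
    return counts


# Hypothetical rows, not taken from the dataset.
sample = [
    {"author": "A", "name1": "Economics", "name2": "Econometrics"},
    {"author": "B", "name1": "Economics", "name2": "Finance"},
    {"author": "C", "name1": "Ecology", "name2": "Agricultural Economics & Policy"},
    {"author": "D", "name1": "Oncology", "name2": "Immunology"},
]
counts = classify(sample)
print(counts)
```

The four counts are tied together by either = name1 + name2 - both, which is a handy sanity check. The numbers above pass it: 949 + 802 - 350 = 1401.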
The Tukey's Hinges on 'self-citation percentage' are 2.35% (25th percentile) and 8.06% (75th percentile). The top five self-citers have percentages of 28%, 28%, 35%, 46% and a whopping 79%. Here is the box and whisker plot (click the image for a larger version):
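If Tukey's Hinges are unfamiliar: they are the medians of the lower and upper halves of the sorted data, with the overall median counted in both halves when n is odd. A minimal sketch, using made-up percentages rather than the actual data, and the usual 1.5 x spread rule for the whiskers (I'm assuming that is the rule behind the plot):

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge) of a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # the median sits in both halves when n is odd
    return median(xs[:half]), median(xs[n - half:])


# Hypothetical self-citation percentages, not the real ones.
pcts = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 79.0]
lower, upper = tukey_hinges(pcts)
spread = upper - lower

# Points beyond 1.5 spreads from a hinge are drawn as outliers in a box plot.
outliers = [x for x in pcts if x < lower - 1.5 * spread or x > upper + 1.5 * spread]
print(lower, upper, outliers)  # 3.0 21.0 [79.0]
```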
Here is another visual aid with 'self-citation percentage' on the vertical axis (click the image for a larger version):
Considering the rankings (from the 2019 article):
Here, we used Scopus data to compile a database of the 100,000 most-cited authors across all scientific fields based on their ranking of a composite indicator that considers six citation metrics (total citations; Hirsch h-index; coauthorship-adjusted Schreiber hm-index; number of citations to papers as single author; number of citations to papers as single or first author; and number of citations to papers as single, first, or last author)
The top 10 economists in these data are:
- Ostrom, Elinor
- Acemoglu, Daron
- Teece, David J.
- Fama, Eugene F.
- Heckman, James J.
- Porter, Michael E.
- Shleifer, Andrei
- Stiglitz, Joseph E.
- Camerer, Colin F.
- Folke, Carl
None of these folks have self-citation percentages outside of the Tukey's Hinges.
The correlation between the economist rankings with and without self-citations is 0.98 within the sample of n=1401. But it is still possible to increase your ranking with self-citations. For example, the economist with 79% self-citations is ranked 702nd including self-citations but last (1401st) once self-citations are taken out. The five economists who gained the most from self-citations moved up 334, 334, 427, 699 and 787 spots. Tukey's Hinges for the gain in ranking are -37 (25th percentile) and 18 (75th percentile). The 90th percentile is 74 and the 95th percentile is a 130 unit gain in rankings. Here is the histogram (click the image for a larger version):
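To make the "gain in ranking" concrete, here is a small sketch of how it can be computed: rank everyone within the sample twice, then subtract. The scores are hypothetical stand-ins for the composite indicator, and breaking ties by original order is my assumption.

```python
def ranks(scores):
    """Rank 1 = highest score; ties are broken by original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        out[i] = position
    return out


# Hypothetical composite scores for five authors.
with_self = [9.0, 8.0, 7.5, 6.0, 5.0]
without_self = [8.8, 7.9, 4.0, 5.9, 4.9]

rank_with = ranks(with_self)        # [1, 2, 3, 4, 5]
rank_without = ranks(without_self)  # [1, 2, 5, 3, 4]

# Positive gain = self-citations moved the author up the list.
gain = [wo - w for w, wo in zip(rank_with, rank_without)]
print(gain)  # [0, 0, 2, -1, -1]
```

With this sign convention the economist ranked 702nd with self-citations and 1401st without has a gain of 1401 - 702 = 699. Because both rankings are re-computed within the same sample, the gains always sum to zero: one author's gain is someone else's loss.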
A regression of the ranking excluding self-citations on rankings including self-citations has a slope of 0.98 (see the correlation above) with a standard error of 0.005. The 90% confidence interval does not include 1. The constant is 11 with a 90% confidence interval of 4.84, 15.7. This is the estimate of (something like) the mean of the gain in rankings from self-citations.
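For readers who want to see the mechanics, here is a bare-bones sketch of the simple regression and a 90% confidence interval. The ten ranks are made up, and I use the large-sample normal critical value 1.645 instead of the exact t value, which is an approximation that matters little at n=1401.

```python
import math


def ols(x, y):
    """Simple OLS of y on x: slope, intercept and their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mx * mx / sxx))
    return slope, intercept, se_slope, se_intercept


def ci90(estimate, se, z=1.645):
    """Approximate 90% confidence interval (normal critical value)."""
    return estimate - z * se, estimate + z * se


# Hypothetical ranks for ten authors (x: with self-citations, y: without).
rank_with = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_without = [1, 2, 3, 5, 4, 6, 7, 10, 8, 9]
slope, intercept, se_slope, se_intercept = ols(rank_with, rank_without)
print(slope, ci90(slope, se_slope))

# Applying the same interval to the numbers reported in the post.
lo, hi = ci90(0.98, 0.005)
print(lo, hi)
```

Plugging in the reported slope and standard error gives roughly 0.972 to 0.988, which is why the interval does not include 1.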
Here is the graph (click on the image for a larger version):
The variable on the vertical axis is the ranking without self-citations. The variable on the horizontal axis is the ranking with self-citations. All of the dots above the line are economists who have increased their ranking with self-citations. Of course, some of these increased rankings are trivial (especially considering that not all self-citations are flagrant, e.g., when your n+1st paper builds on your nth paper). The standardized residual ranges from -1.64 to 10.9 so there are statistical outliers in the dots above the line. I count 27 cases with a standardized residual greater than 3. These economists might be my list of the most egregious self-citers. Their average rank without self-citations is 1189 (out of 1401). With self-citations their average rank is 880. Note that 8 of these 27 do not have a field of economics (economics, ag econ, metrics and theory) in name1 and only 8 have a field of economics in both name1 and name2. In other words, there may be more non-economists in this list of egregious self-citers than it appears at first glance/a.
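Here is a sketch of the outlier screen described above: fit the line, divide each residual by the residual standard deviation, and flag anything above 3. I'm assuming the plain version of the standardized residual (residual over the root mean squared error, no leverage adjustment). The data are hypothetical, with one point deliberately pushed 20 places off the line.

```python
import math


def standardized_residuals(x, y):
    """Residuals from a simple OLS fit, divided by sqrt(SSE / (n - 2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [r / s for r in resid]


# Hypothetical values (not true ranks): y equals x except for one author
# who sits 20 places lower once self-citations are removed.
x = list(range(1, 31))
y = list(x)
y[14] += 20

z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if zi > 3]
print(flagged)  # [14]
```

Only large positive residuals are flagged, since those are the dots far above the line, i.e., the authors whose rank depends most on self-citations.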
a/Some other day I might take a closer look at the sample.
References
Ioannidis, John P.A. ; Baas, Jeroen; Klavans, Richard; Boyack, Kevin (2019), “Supplementary data tables for "A standardized citation metrics author database annotated for scientific field" (PLoS Biology 2019)”, Mendeley Data, v1
http://dx.doi.org/10.17632/btchxktzyw.1
Ioannidis JPA, Baas J, Klavans R, Boyack KW (2019) A standardized citation metrics author database annotated for scientific field. PLOS Biology 17(8): e3000384. https://doi.org/10.1371/journal.pbio.3000384
Ioannidis JP, Klavans R, Boyack KW (2016) Multiple Citation Indicators and Their Composite across Scientific Disciplines. PLOS Biology 14(7): e1002501. https://doi.org/10.1371/journal.pbio.1002501