MESUR publications

Weighted Betweenness, normalized

Rank  Value  Journal
1     0.035  SCIENCE
2     0.032  NATURE
3     0.020  PNAS
4     0.017  LNCS
5     0.006  LANCET

Weighted Closeness, normalized

Rank  Value  Journal
1     0.670  SCIENCE
2     0.665  NATURE
3     0.644  PNAS
4     0.591  LNCS
5     0.587  BIOCHEM BIOPH RES CO

Two things to note. First, the “alternative” network metrics such as PageRank, closeness and betweenness centrality do pretty well. Just eyeballing their rankings, it is easy to see that they may even do a better job than the Impact Factor at identifying highly popular and prestigious journals, e.g. Science and Nature. Second, the usage metrics also do an excellent job of ranking journals according to their popularity or prestige. In fact, their results aren’t all that different from those of the citation metrics. Of course, this will always tend to be true for the top 5 journals. The interesting differences will be found in the medium to lower rankings.


Therefore, rather than eyeballing the top rankings, we can calculate the similarities between the rankings produced by a pair of metrics in terms of rank-order correlation coefficients. Here’s an example. The graph on the right shows the scatterplot of journals’ Impact Factor and PageRank values. The rank-order correlation is moderately positive (0.609), which means that a journal’s Impact Factor (x-axis) and PageRank value (y-axis) are correlated (one tends to go up when the other goes up, and vice versa), but there are nevertheless significant differences. We can calculate such correlations for the rankings produced by each pair of metrics.
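To make the idea of a rank-order correlation concrete, here is a minimal sketch of Spearman's coefficient, computed as the Pearson correlation of the two rank vectors. Spearman is one common rank-order coefficient; the text above does not say which one was actually used, so treat that choice as an assumption. The journal values are made up for illustration and are not MESUR data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical metric values for six journals (same journal order in both lists).
impact_factor = [30.0, 26.7, 9.6, 0.5, 25.8, 2.6]
pagerank = [0.016, 0.017, 0.015, 0.003, 0.002, 0.004]

rho = spearman(impact_factor, pagerank)
print(round(rho, 3))
```

With these toy values the two rankings agree on most journals but disagree strongly on one of them, and the coefficient comes out at 0.6: correlated, but clearly not interchangeable.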


We calculated 47 metrics in total: 23 for the citation graph, 23 for the usage graph, and the Impact Factor. So calculating correlation coefficients for each pair leads to a matrix of 47 x 47 correlations (actually, only (47 x 47 - 47) / 2 = 1081 distinct values, because the matrix is symmetric and each metric correlates perfectly with itself). This matrix provides a full picture of how the rankings produced by all our citation and usage metrics relate to each other. It is sufficient information to produce a rough map like the one I discussed above. The map will lay out the positions of the metrics so that the spatial distances on the map respect the calculated correlations. Therefore metrics that express a similar aspect of “impact” will be clustered in the map, whereas those that express differing aspects of “impact” will be further apart.
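The bookkeeping behind the correlation matrix can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the metric names and scores are invented, and the rank correlation uses the textbook Spearman shortcut formula, which is only valid when there are no tied values.

```python
from itertools import combinations


def ranks(values):
    """Rank positions (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no ties allowed."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def correlation_matrix(scores):
    """Symmetric metric-by-metric matrix; each pair is computed only once."""
    names = list(scores)
    # Start with 1.0 everywhere: the diagonal stays 1.0, the rest is overwritten.
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        rho = spearman_no_ties(scores[a], scores[b])
        matrix[a][b] = matrix[b][a] = rho
    return names, matrix


# Hypothetical scores of five journals under four hypothetical metrics.
scores = {
    "impact_factor": [5.1, 3.2, 9.8, 1.4, 2.7],
    "citation_pagerank": [0.30, 0.20, 0.25, 0.05, 0.10],
    "usage_pagerank": [0.22, 0.28, 0.15, 0.12, 0.23],
    "usage_closeness": [0.61, 0.66, 0.52, 0.49, 0.63],
}

names, matrix = correlation_matrix(scores)
unique_pairs = len(list(combinations(names, 2)))  # m * (m - 1) / 2 for m metrics
pairs_for_47 = (47 * 47 - 47) // 2  # the count quoted in the text

print(unique_pairs, pairs_for_47)
print(matrix["impact_factor"]["citation_pagerank"])
```

Four metrics give 6 distinct pairs; 47 metrics give 1081. Only those pairs need to be computed, and the rest of the matrix follows from symmetry.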


The actual mathematical technique to do this is called “principal component analysis” (PCA). PCA attempts to determine a set of underlying components that best explain the variation in the similarities and dissimilarities among a set of items. The components are ranked according to how well they explain the variation in the item similarities, so when we select the 2 top-ranked (hence “principal”) components we have a 2D model that most accurately maps the items according to their similarities. The result is the map shown below.
The x-axis is given by the first component, i.e. the one that explains the highest amount of variance in the metric correlations. The y-axis is given by the second component, the one that explains the second highest amount of variance. As expected, the x-axis splits the metric results nicely into the usage (left) and citation metrics (right); it’s the most distinctive separation between the sets of metrics. The y-axis is a little more complicated because it corresponds to a secondary source of variation. The citation metrics split into three main groups. From the top: closeness, degree (with the Impact Factor) and betweenness. The latter leads to results close to PageRank, which is not all that surprising if you think about their definitions. The Impact Factor sits among the degree metrics, which is also not surprising since it amounts to a normalized in-degree. The usage metrics are much less separated and seem to cluster rather strongly. Still, we find a similar vertical distribution. From the top: degree and closeness, followed by PageRank and betweenness.
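For readers who want to see the mechanics, here is a small sketch of how two principal components can be extracted from a correlation matrix. It uses power iteration with deflation rather than a library eigensolver, purely to stay dependency-free; it is not the procedure used to produce the map. The four metric names and their correlations are invented.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=1000):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    norm = math.sqrt(sum(v * v for v in vector))
    vector = [v / norm for v in vector]
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:
            break
        vector = [p / norm for p in product]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(v * p for v, p in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def principal_components(matrix, count=2):
    """Top `count` eigenpairs; each found component is subtracted out (deflation)."""
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        value, vector = top_eigenpair(work)
        pairs.append((value, vector))
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= value * vector[i] * vector[j]
    return pairs


# Hypothetical correlations: two citation metrics and two usage metrics.
names = ["citation_a", "citation_b", "usage_a", "usage_b"]
corr = [
    [1.0, 0.9, 0.3, 0.3],
    [0.9, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.8],
    [0.3, 0.3, 0.8, 1.0],
]

(l1, v1), (l2, v2) = principal_components(corr)

# Map coordinates: each metric's loading on the two components.
coords = {
    name: (math.sqrt(l1) * v1[i], math.sqrt(l2) * v2[i])
    for i, name in enumerate(names)
}
explained = (l1 + l2) / len(corr)  # the trace of a correlation matrix equals its size

for name, (x, y) in coords.items():
    print(f"{name:12s} {x:+.3f} {y:+.3f}")
print(round(explained, 3))
```

In this toy matrix the two components together account for 0.925 of the total variance. All correlations here are positive, so the first component mostly captures what the four metrics share, and the citation and usage metrics land on opposite sides of the second component. Which axis carries the usage/citation split depends on the data.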


The most distinctive feature of the map

 

[Figure: map of the metrics along the first two principal components. Region labels: Usage, Citation. Cluster labels: Closeness, Degree, Impact Factor, PageRank, Betweenness.]

Articles:

* Johan Bollen, Marko A. Rodriguez and Herbert Van de Sompel. MESUR: usage-based metrics of scholarly impact (poster). In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007.

* Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Lyudmilla Balakireva, and Aric Hagberg. The Largest Scholarly Semantic Network...Ever (poster). In Proceedings of the 16th International World Wide Web Conference, May 2007.

* Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage. In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007.

* Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), pages 1-14 (cs.DL/0610154).


Lectures and slides:

  1. November 5-9, 2008 - IPAM workshop on Social Data Mining and Knowledge Building. Organizing Committee and lecture. Johan Bollen.

  2. November 8, 2007 - UCLA Information Studies Colloquium Series. “Scholarly Assessment from Usage Data: A New Perspective on Impact”. Johan Bollen.

  3. November 1-2, 2007 - NISO Workshop on “Understanding the Data Around Us: Gathering and Analyzing Usage Data”, Dallas, Texas. Plenary lecture and lecture on metrics for scholarly evaluation. Johan Bollen.

  4. June 21-22, 2007 - 2007 ICSTI Public Conference, Nancy, France. “MESUR: Assessing scholarly status from usage data”. Johan Bollen.

  5. June, 2007 - LANL Research Library public lecture. “An RDF/RDFS/OWL tutorial”. Marko A. Rodriguez. http://www.soe.ucsc.edu/~okram/papers/talks/rdfrdfsowl.pdf

  6. June 18, 2007 - Joint Conference on Digital Libraries, Vancouver, Canada. “MESUR: usage-based metrics of scholarly impact”. Johan Bollen, Marko A. Rodriguez and Herbert Van de Sompel. http://mesur.lanl.gov/JCDL07poster_bollen.pdf

  7. June 6th, 2007 - Society of Scholarly Publishing, 29th meeting, San Francisco: Imagining the future: scholarly communication 2.0. Johan Bollen.

  8. May 11th, 2007 - World Wide Web Conference 2007, Banff, Canada. Poster: “The Largest Scholarly Semantic Network... Ever” (http://www.mesur.org/WWW07_jbollen.pdf). Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Lyudmilla Balakireva, Wenzhong Zhao and Aric Hagberg.

  9. April 18th, 2007 - CERN Workshop on Innovations in Scholarly Communication (OAI5), Geneva. “MESUR: metrics from scholarly usage of resources”. Johan Bollen.

  10. March 30th, 2007 - Santa Fe Institute Seminar, Santa Fe. “MESUR: Modeling and Analysis of the Scholarly Community”. Johan Bollen.

  11. March 27th, 2007 - ACS National Meeting, Chicago. “The Evolving Network of Scientific Communication”. “Modeling the scholarly community from usage data”. Johan Bollen.

  12. March 13th, 2007 - EUSIDIC Annual Conference 2007, Roskilde University, Denmark. “The MESUR project: semantic networks for scholarly assessment”. Johan Bollen.