Mo(o)re(-Sloan) fun with Academic Lineage

While we were at the Moore-Sloan Data Science Environments community build, Jevin West continued our work analyzing academic lineage with Myria.

On the citation graph we’re using, InfoMap identifies 5,292 unique paper clusters. For each cluster, we took the paper ranked highest by Eigenfactor and used these 5,292 papers as the seed set for the Least Common Ancestor (LCA) query, which we ran on Myria. The query took a little over 4 hours, but it finished successfully.
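For readers who want something concrete, here is a minimal Python sketch of what an LCA query over a citation graph could look like. It is not the Myria query we ran. It assumes one plausible definition: the LCA of two papers is the paper reachable from both by following citations that minimizes the larger of the two hop counts, with ties broken by paper id. The names `ancestor_depths`, `least_common_ancestor`, `lca_for_seed_pairs`, and the toy graph are all made up for illustration.

```python
from collections import deque
from itertools import combinations


def ancestor_depths(cites, start):
    """Breadth-first search along citation edges.

    Returns {ancestor: number of citation hops from start}.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        for cited in cites.get(paper, ()):
            if cited not in depths:
                depths[cited] = depths[paper] + 1
                queue.append(cited)
    del depths[start]  # assumption: a paper is not its own ancestor
    return depths


def least_common_ancestor(cites, a, b):
    """Return the common ancestor of a and b under the assumed definition, or None."""
    depth_a = ancestor_depths(cites, a)
    depth_b = ancestor_depths(cites, b)
    common = depth_a.keys() & depth_b.keys()
    if not common:
        return None
    # Assumed ranking: smallest worst-case distance, then paper id as tie-break.
    return min(common, key=lambda p: (max(depth_a[p], depth_b[p]), p))


def lca_for_seed_pairs(cites, seeds):
    """Compute the LCA for every unordered pair of seeds, keeping only the hits."""
    result = {}
    for a, b in combinations(sorted(seeds), 2):
        lca = least_common_ancestor(cites, a, b)
        if lca is not None:
            result[(a, b)] = lca
    return result


# Toy citation graph (hypothetical ids): paper -> papers it cites.
toy_cites = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "F": []}
toy_result = lca_for_seed_pairs(toy_cites, ["A", "B", "F"])
```

On the toy graph, only the pair (A, B) has a common ancestor; F cites nothing, so its pairs are misses. Running a search per pair like this would be far too slow for millions of pairs, which is part of why a parallel system is attractive for the real query.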

Of the nearly 14 million pairs of papers, 7.1M have a common ancestor, a hit rate of about 50%. Among these 7.1M LCAs, here are the ten most frequent papers (and their frequencies):

  1. (47,129) Some Methods for Strengthening the Common χ² Tests (Cochran, 1954)
  2. (35,585) The Evolution of Reciprocal Altruism (Trivers, 1971)
  3. (34,195) On the Mathematical Foundations of Theoretical Statistics (Fisher, 1922)
  4. (34,093) The Tragedy of the Commons (Hardin, 1968)
  5. (32,067) Some Difficulties of the Determination Problem (Harrison, 1933)
  6. (29,458) Diverse Doctrines of Evolution, Their Relation to the Practice of Science and of Life (Jennings, 1927)
  7. (28,149) An Analysis of Transformations (Box, 1964)
  8. (26,000) Fitting the Negative Binomial Distribution to Biological Data (Bliss, 1953)
  9. (25,410) A Method for Cluster Analysis (Edwards, 1965)
  10. (24,611) A Theory of the Allocation of Time (Becker, 1965)
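As a sanity check on the numbers above, here is a small Python sketch. It assumes the "nearly 14 million pairs" are the unordered pairs of the 5,292 seed papers, and it uses the rounded 7.1M figure, so the hit rate is approximate. The tally at the end shows how a frequency ranking like the one above can be produced; the list `toy_lcas` and its ids are invented for illustration.

```python
from collections import Counter
from math import comb

seed_count = 5292
pair_count = comb(seed_count, 2)  # unordered pairs of seed papers (assumption)

pairs_with_lca = 7.1e6  # rounded figure from the post, not an exact count
hit_rate = pairs_with_lca / pair_count

print(f"{pair_count:,} pairs, hit rate about {hit_rate:.0%}")

# Ranking LCAs by frequency: one entry per pair, None where no ancestor exists.
toy_lcas = ["p3", "p1", "p3", None, "p2", "p3", "p1"]  # hypothetical ids
ranking = Counter(p for p in toy_lcas if p is not None).most_common(2)
print(ranking)
```

The pair count comes out to 13,999,986, which matches "nearly 14 million".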

We hope to dig into this more next week, but looking at the results is already pretty fascinating:

  • These papers are generally highly cited, but (paraphrasing Jevin) may not be currently recognized as the very top according to standard citation metrics.
  • The time range is pretty interesting: early to mid-20th century all around.
  • Mathematics and statistics seem to dominate this part of the list. We do see some more fundamental life science papers nearby, the first of which is The Gene (Goldschmidt, 1928) at #17.
  • The top hit represents 0.66% of the results (1 in 151), and the 10th hit represents 0.34% (1 in 289).
  • And, of course, I have not directly come across any of these papers in my work, but I should go read them!

What are your thoughts: Do you recognize these papers? Have you read them? What else should we think about?
