by Dr. Tahir Yaqoob on May 4, 2012

In this post I’ll talk about some interesting aspects of exoplanet research paper statistics and sociology. There is a nice tool on the physics/astrophysics preprint server http://arxiv.org that allows you to search the database of research papers for a keyword or phrase. It’s very basic and there is no documentation. I mean none. It’s not even obvious whether the tool searches the title of a paper only or the entire text. As it happens, it is the entire text, although it would be useful if there were an option to search titles only. Nevertheless, we can do some interesting things with it, because searching the whole text gives you an idea of how important the subject of the search phrase is to the astronomical community in general, and therefore, presumably, to astrophysics (with the human “bandwagon” effect convolved into it).

Actually it’s not that simple, because once a topic is recognized to be important in the long term, funding is skewed towards it, which spawns more research on that topic. True, funding is affected by peer review, but at the top level (e.g., an actual mission) the politics are much more complicated. Anyway, let’s say that what we are looking for here are trends rather than absolutes, since trends are easier to interpret and potentially more interesting. As it happens, when using the search tool I was surprised to learn something that I hadn’t thought of.

But first, let’s look at a couple of controls. The graph below is what I got when I typed in the word telescope.

Percentage of research papers in astrophysics on arXiv that mention the word telescope

For one thing, this proves that the tool is not just searching the title because clearly the word telescope does not appear in the title of more than 40% of papers. (Note: this and all other search results were obtained for the category Astrophysics; the other categories are various branches of physics.) In this and all other graphs, the percentage values before 1997 or so should be ignored because it took some time for people to routinely submit their preprints after the arXiv was first introduced. The telescope curve shows a slight rise that is greater than the fluctuations by a few percent but I’m not sure if we can infer anything significant from that.

The next graph shows another control, what I got when I entered the word detector. Clearly flat, with any variation less than the fluctuations. Actually, I’m struck by the fluctuations being so small. It seems to be a robust statement to say that about 20% of the papers mention the word detector. I also infer that the percentage of observational papers as opposed to theory papers is not less than 20%. Theory papers almost never mention the word detector, but they do sometimes use the word telescope. Many observational papers don’t mention the word detector or telescope. So it’s hard to say what the ratio of observational to theory papers is.

Percentage of research papers in astrophysics on arXiv that mention the word detector
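For readers who want to make the "is the rise bigger than the fluctuations?" judgment less of an eyeball exercise, here is a minimal sketch of one way to do it: fit a straight line to the yearly percentages and compare the total change along the line with the scatter of the points around it. The function name `trend_vs_scatter` and the numbers in the two lists are made-up placeholders for illustration only; they are not values read from the arXiv tool or from the graphs above.

```python
from statistics import mean, pstdev


def trend_vs_scatter(years, percentages):
    """Least-squares straight-line fit.

    Returns (total_change, scatter): the change along the fitted line
    between the first and last year, and the standard deviation of the
    points around that line.
    """
    x_bar = mean(years)
    y_bar = mean(percentages)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, percentages))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(years, percentages)]
    total_change = slope * (max(years) - min(years))
    scatter = pstdev(residuals)
    return total_change, scatter


# Hypothetical placeholder data, NOT taken from the graphs in this post.
years = [1997, 1998, 1999, 2000, 2001]
flat_curve = [20.0, 21.0, 19.5, 20.5, 20.0]    # wobbles around 20%
rising_curve = [40.0, 41.0, 42.0, 43.0, 44.0]  # steady rise of 1 point per year

for name, curve in [("flat", flat_curve), ("rising", rising_curve)]:
    change, scatter = trend_vs_scatter(years, curve)
    verdict = "trend exceeds scatter" if abs(change) > scatter else "consistent with flat"
    print(f"{name}: change = {change:+.2f} points, scatter = {scatter:.2f} -> {verdict}")
```

A curve like the detector one would fall in the "consistent with flat" case, while a slight rise like the telescope one would only count as a trend if the fitted change clearly exceeded the scatter. This is a rough check, not a proper significance test.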

So now, moving on to exoplanets. I typed in the phrases exoplanets and extrasolar planets for plotting on the same graph. This does not capture all papers on exoplanets but at least it would give a lower bound. The graph below shows the results.

Percentage of research papers in astrophysics that mention exoplanets and extrasolar planets

As expected, the share of astrophysics papers mentioning exoplanets or extrasolar planets has risen continuously from almost nil to nearly 8% (the eyeball sum of the red and blue curves at 2011-2012) which in itself is remarkable.

However, there is something else fascinating about the curves, and it’s sociological. I have always wondered about the relative frequencies of the use of the terms exoplanet versus extrasolar planet, and the curves give some interesting insights. Before 2002 extrasolar planet was overwhelmingly popular. Between 2002 and 2009 the term exoplanet rose in popularity, and the two curves meet in 2009. Between 2009 and now (May 2012) exoplanet has been more popular and its popularity is still increasing, the ratio now being nearly 2:1. The Kepler mission was launched in March 2009 and I don’t think it is a coincidence that this was the turning point for exoplanet to become more popular, although I’m not sure what it means exactly. Were people working on Kepler science already using exoplanet, so that the spread of their results, and references to those results in other papers and the media, had a knock-on effect? But the gap was already closing between 2002 and 2009, before Kepler. On the other hand, the extrasolar planet curve appears to have flattened out since 2009, but the fluctuations are large. Another piece of relevant data: the Google Keyword Tool shows that the terms exoplanets and extrasolar planets appear in search phrases 18,100 and 14,800 times per month (globally). The difference is not large: the ratio is about 1.2, a lot less than the roughly 2:1 that the research graphs show. So perhaps this reflects a difference in usage between the general public and the scientific community.
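The arithmetic behind that comparison is short enough to spell out. In the sketch below the two search counts are the Google Keyword Tool figures quoted above, while `arxiv_ratio` is only the approximate 2:1 value eyeballed from the graph, not a measured number; the variable names are mine.

```python
# Monthly global search counts quoted from the Google Keyword Tool.
google_exoplanets = 18_100
google_extrasolar_planets = 14_800

# Approximate exoplanet : extrasolar planet ratio in arXiv papers,
# eyeballed from the graph (an assumption, not a measured value).
arxiv_ratio = 2.0

google_ratio = google_exoplanets / google_extrasolar_planets
print(f"Google searches: {google_ratio:.2f} to 1")  # prints "Google searches: 1.22 to 1"
print(f"arXiv papers:    about {arxiv_ratio:.0f} to 1")
```

So the general public prefers exoplanets by only about 20%, whereas in research papers the preference is closer to a factor of two.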

As another control, I typed in the word quasar and the resulting graph is shown below.

Percentage of research papers in astrophysics on arXiv that mention the word quasar

Now, quasars were discovered 50 years ago and they were all the rage in astrophysics in the 1960s and 1970s. The graph shows that between 2000 and 2012 there was a drop in mentions of about 40%. While we have certainly learned a lot about quasars in the last 12 years, there is still a long way to go to understand them. Having been personally involved with quasar research for over 25 years, I had a hunch that there has been a significant exodus from quasar research, because people think either that quasars are boring or that the problems have been solved. The graph confirms my suspicions. It is not a coincidence that the drop started in 2000. By that time dark energy (discovered in 1998) was all the rage and people were lost to that cause. If you want to amuse yourself further, go to the culturomics tool and type in dark matter and dark energy, displaying both on the same graph. There is some fascinating sociology behind the curves that you will see.

