Understanding the outliers in any endeavor is imperative because any general explanation of something (a distribution, a correlation, etc.) must absolutely be able to account for the outliers. The outliers can make or break a general interpretation, explanation, or theory. Outliers are potentially more interesting than the rest of the group because there may be new and/or poorly understood phenomena at work, and the challenge of understanding how a general interpretation needs to be modified can be interesting. On the other hand, outliers demand more rigorous scrutiny and sanity-checking because you have to be extra careful that a mistake hasn’t been made.

Exoplanet properties have their fair share of outliers, and here I will focus on just one property, the average density. This is a critical parameter because it is often the *only* physical parameter that currently gives any handle on, or constraint on, the composition of an exoplanet. The average density is not even available for about two-thirds of all confirmed exoplanets because you need both a mass and a radius. Now, if we naively look at a plot of density versus radius, we might get the impression that there are some very extreme exoplanets lurking in there. Such a plot is shown below, which is from a snapshot when the confirmed exoplanet population was at 777/623/105 [= total confirmed planets/number of planetary systems/number of multiplanet systems]. Out of 777, only 241 have a density measurement. (By the way, in case you haven’t checked recently, the total confirmed population shot up from 777 to 823 in early September 2012.)

The exoplanet densities and sizes are given in terms of the corresponding values for Earth. We see that one of the objects that sticks out prominently is the one with the highest density, Kepler-25b. To show just how much of an outlier it is, the same plot is shown again below, this time with a linear density scale.

The apparent density of Kepler-25b, 251 times that of Earth, would be problematic if it were real because such high-density matter would need an unusual mechanism to produce and maintain it. Indeed the very identity of Kepler-25b as a planet would be questionable. Also, the surface gravity would be so high (655 times that of Earth) that a person would weigh tens of tons on the planet. Moreover, masses of exoplanets are usually *lower limits*, which would imply that the density shown is also a lower limit and could be even higher. Indeed the Kepler-25b page in the Extrasolar Planets Encyclopedia (EPE) says that the *minimum mass* of Kepler-25b is 12.7 times that of Jupiter (greater than 4000 times the mass of the Earth). The size of Kepler-25b is about 2.5 times that of Earth, so the planet would be an extreme outlier in density. There is only one research paper cited for the planet, and upon investigation we find that the mass listing in EPE is wrong: the number is *not* a lower limit, it is an *upper limit*. Why? Because the Kepler-25b mass is not estimated in the usual way: there are no Doppler follow-up measurements to the Kepler transit results. Rather, the mass is based on a calculation which asks how large the mass could be before the planet’s orbit becomes unstable. So the mass of Kepler-25b could be way smaller, even as small as Earth’s. So it is not really likely to be an outlier! At least for the moment it should not be regarded as such.
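To make the arithmetic concrete, here is a minimal sketch of how the density and surface gravity follow from mass and radius when everything is expressed in Earth units. The Jupiter-to-Earth mass ratio of 317.8 is a standard approximate value that I am assuming here, and the radius of 2.5 is the rounded figure quoted above, so the results land close to, but not exactly on, the 251 and 655 from the plot data. The function names are just illustrative.

```python
# With mass and radius in Earth units, density and surface gravity
# also come out in Earth units (Earth = 1 for both).

M_JUPITER_IN_EARTH_MASSES = 317.8  # assumed approximate ratio, not from the EPE


def density_earth_units(mass_earth, radius_earth):
    """Mean density relative to Earth: mass / radius^3."""
    return mass_earth / radius_earth ** 3


def surface_gravity_earth_units(mass_earth, radius_earth):
    """Surface gravity relative to Earth: mass / radius^2."""
    return mass_earth / radius_earth ** 2


radius = 2.5  # Kepler-25b radius in Earth radii (rounded value from the text)

# Case 1: treat the catalog mass of 12.7 Jupiter masses at face value.
mass_listed = 12.7 * M_JUPITER_IN_EARTH_MASSES
density_listed = density_earth_units(mass_listed, radius)
gravity_listed = surface_gravity_earth_units(mass_listed, radius)

# Case 2: the listed mass is only an upper limit, so try one Earth mass.
density_low = density_earth_units(1.0, radius)
gravity_low = surface_gravity_earth_units(1.0, radius)

print(f"listed mass: {mass_listed:.0f} Earth masses")
print(f"face value : density {density_listed:.0f}, gravity {gravity_listed:.0f}")
print(f"1 Earth mass: density {density_low:.3f}, gravity {gravity_low:.2f}")
```

The second case shows how far the point can drop: with the same radius and an Earth-like mass, the density falls from a few hundred times Earth's to a few percent of it.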

Moreover, it is not just Kepler-25b that appears at face value to be a problem: looking at the first plot shown earlier, anything more than a couple of times the density of Earth should be scrutinized and questioned. Upon further investigation we find that ALL of the points in the plot above the green horizontal line are Kepler planets with no follow-up mass measurements, so they could all come down in density. *By itself, Kepler cannot estimate exoplanet masses*. So for the moment everything above the green line, and possibly some more planets below it, is not worrying in terms of extreme high densities.

Now, what about the extremes to the left of the diagram? There are a handful of planets that appear to be smaller in size than Earth and between about 1 and 10 times the density of Earth. If their shown parameters are real they would be extremely *interesting* objects. Let’s take the most extreme one, the smallest planet, which turns out to be another Kepler object (KOI 55: KOI = Kepler Object of Interest). Upon further investigation we find that in fact there is no radius measurement: the radius is calculated based on many assumptions which may not be correct. The paper by Charpinet et al. (2011) describing these calculations says that the radius is “essentially unknown”. But the EPE happily shows a value of 0.85 times that of Earth without any shown uncertainty, or any indication that it is, for all practical purposes, just a random value selected from the wide range of possibilities. Obviously this could be misleading to the unsuspecting browser. In fact, everything to the left of the orange vertical line could be moved to the right of the line for similar reasons, so at the moment we should not be worried about these apparent outliers.

Most of the points in the diagram, the ones forming the concentrated cluster towards the bottom-right, are the low-density hot Jupiters, but the group does have a tail extending to potentially worrying tiny densities, a couple of percent of the density of Jupiter. However, since these densities *are* calculated from *lower mass limits* the points at the lower extreme end can move up. Aside from the fact that the densities of some of the hot Jupiters in the main cluster of points are difficult to explain in the context of the current understanding of the structure of hot Jupiters, the apparent outliers at the three extremities of the density versus radius distribution are not yet a cause for concern.

The bottom line to take away? The extremities of the density versus radius distribution (and indeed in other exoplanet distributions) are currently extremely fuzzy and should not be taken seriously at face value. Moreover, the data in the EPE are likely entered by young people (I don’t have anything against youth, I wish I still had it myself). Or at least it seems that the data entry is not done by people who understand the science behind the data and the data entry is not checked adequately by someone who does. Mistakes creep in (I see them all the time in the EPE). And omission of uncertainties is lethal. In some cases you may as well choose a random number. So exercise caution in interpreting any plots or results that just use the raw data without scrutiny.

The data for the density plots cannot be directly downloaded from the EPE; you have to do some filtering and calculation. So I am making the Excel file that I constructed available for download, which might be useful if you want to use the data for teaching, or if you just want to play around with it yourself. I have also included a surface gravity column in the file.
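If you would rather redo the filtering and calculation yourself, the steps can be sketched roughly as follows. This is only an illustration under stated assumptions: I am assuming the catalog lists masses and radii in Jupiter units, the column names (`mass_mjup`, `radius_rjup`) are hypothetical, the conversion factors 317.8 and 11.2 are approximate, and the rows are placeholders rather than real catalog entries.

```python
# Keep only planets that have BOTH a mass and a radius, convert to Earth
# units, then derive density and surface gravity relative to Earth.

M_JUP_IN_EARTH = 317.8  # approximate Jupiter mass in Earth masses (assumed)
R_JUP_IN_EARTH = 11.2   # approximate Jupiter radius in Earth radii (assumed)

# Placeholder rows with hypothetical column names; None marks a missing value.
rows = [
    {"name": "placeholder-1", "mass_mjup": 1.0, "radius_rjup": 1.0},
    {"name": "placeholder-2", "mass_mjup": 0.5, "radius_rjup": None},
    {"name": "placeholder-3", "mass_mjup": None, "radius_rjup": 0.2},
]


def add_derived_columns(rows):
    """Return new rows with density and gravity in Earth units,
    skipping any planet that lacks a mass or a radius."""
    derived = []
    for row in rows:
        mass, radius = row["mass_mjup"], row["radius_rjup"]
        if mass is None or radius is None:
            continue  # no density is possible without both values
        mass_earth = mass * M_JUP_IN_EARTH
        radius_earth = radius * R_JUP_IN_EARTH
        derived.append({
            "name": row["name"],
            "mass_earth": mass_earth,
            "radius_earth": radius_earth,
            "density_earth": mass_earth / radius_earth ** 3,
            "gravity_earth": mass_earth / radius_earth ** 2,
        })
    return derived


table = add_derived_columns(rows)
for entry in table:
    print(entry["name"],
          round(entry["density_earth"], 3),
          round(entry["gravity_earth"], 2))
```

Note that this sketch carries no uncertainties and no flag for whether a mass is a measurement, a lower limit, or an upper limit, which is exactly the information you need before trusting any point at the extremes.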

I’d like to take this opportunity to thank readers for tweeting and otherwise notifying cyberspace of these posts; it really does make a difference to the outreach that is possible. I hope the Excel file is useful. I also had someone make a funny book trailer video; the theme is that apparently aliens don’t like the Kindle. Enjoy!