I love graphs – my eyes quickly glaze over at a table of numeric data, but a graph, used correctly, can quickly and easily tell the whole story.
I’m going to look at an example that touches on a few graphical and statistical concepts near and dear to my heart, as well as carbon geochemistry.
Already this is a warning sign – the comparisons the author makes are entirely qualitative, apparently based upon eyeballing the graph. However, trend lines are created by a statistical process called a linear regression, which comes with a caveat: it will fit a trend line to ANY data given to it, linear or nonlinear. Fortunately, there are also ways of evaluating how good a trend line is.
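To make the "ANY data" caveat concrete, here is a minimal sketch in plain Python. The helper `fit_line` and both data series are made up for illustration; they are not the CO2 data or the tool used for the figures. The point is only that least squares hands back a slope and an intercept whether or not a line is a sensible description.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = list(range(11))
straight = [3 * x + 2 for x in xs]  # invented data that really is linear
curved = [x ** 2 for x in xs]       # invented data that obviously is not

straight_fit = fit_line(xs, straight)
curved_fit = fit_line(xs, curved)

# Both calls succeed; nothing in the output warns that the second
# data set is a parabola.
print("straight:", straight_fit)
print("curved:  ", curved_fit)
```

The regression is equally happy in both cases, which is why the fit alone tells you nothing about whether "linear" was the right word to use.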
It’s not that we don’t expect there to be any residuals when the phenomenon being modeled is in fact linear – real life data are noisy; we’d expect there to be variation that a linear model doesn’t account for. But if the only nonlinear component of the data was noise, that component would look, well, noisy. (See Figure 3)
Yet, when we look at the residuals from the atmospheric CO2 data, they look anything but noisy:
What we see is a clear validation of our observation that the CO2 concentrations show curvature – and significant curvature at that (the p-value for the quadratic fit shown is ~10^-16).
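One crude way to put a number on "noisy" versus "anything but noisy" is to count how often the residuals flip sign. The sketch below is my own toy illustration, not the test behind the p-value quoted above: the helper names, the hand-written scatter values, and both series are assumptions. It uses the same sign convention as this post (Linear Fit minus Data).

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Residuals with the convention used here: Linear Fit - Data."""
    slope, intercept = fit_line(xs, ys)
    return [(slope * x + intercept) - y for x, y in zip(xs, ys)]


def sign_changes(values):
    """Count how often consecutive values switch sign."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


xs = list(range(12))
# Hand-written scatter standing in for measurement noise (invented values).
noise = [0.4, 0.3, -0.6, 0.5, -0.3, -0.2, 0.3, -0.5, 0.6, 0.2, -0.4, -0.3]
noisy_line = [2 * x + e for x, e in zip(xs, noise)]  # linear plus scatter
curve = [x ** 2 for x in xs]                         # pure curvature

noisy_changes = sign_changes(residuals(xs, noisy_line))
curve_changes = sign_changes(residuals(xs, curve))
print("sign changes, noisy line:", noisy_changes)
print("sign changes, curve:     ", curve_changes)
```

Residuals from the curved series change sign only twice: they sit on one side of zero at both ends and on the other side in the middle, which is the arch shape that gives curvature away. Residuals from the genuinely linear series hop back and forth far more often.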
Now that we’ve got a model for the residuals, we can combine it with our original linear fit to get a much better description of the data. The reasoning is that, since:
Residuals = Linear Fit – Data
And since we have a quadratic model for the residuals:
Quadratic Fit = Linear Fit – Data
We can build a nonlinear model for the data:
Nonlinear Model = Linear Fit – Quadratic Fit
When we do this, we get a curve which describes the data much better than the linear regression:
In fact, this nonlinear model looks an awful lot like the light gray curve in Fig 1! Apparently C3 fit the data with some sort of nonlinear model to create the image, but completely ignored it otherwise!
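The recipe above (fit a line, fit a quadratic to the residuals, subtract) can be sketched in plain Python. Everything here is an assumption made for illustration: the series is invented rather than the actual CO2 record, and `polyfit` and `polyval` are small hand-rolled helpers, not the software used for the figures.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Augmented normal-equation matrix [A | b].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Invented series: gentle curvature plus a small deterministic wiggle.
xs = list(range(50))  # years since the start of a hypothetical record
ys = [315 + 0.8 * x + 0.012 * x ** 2 + 0.3 * math.sin(1.7 * x) for x in xs]

linear = polyfit(xs, ys, 1)
# Residuals = Linear Fit - Data
residuals = [polyval(linear, x) - y for x, y in zip(xs, ys)]
# Quadratic Fit models the residuals
quad = polyfit(xs, residuals, 2)
# Nonlinear Model = Linear Fit - Quadratic Fit
model = [polyval(linear, x) - polyval(quad, x) for x in xs]

sse_linear = sum(r ** 2 for r in residuals)
sse_model = sum((m - y) ** 2 for m, y in zip(model, ys))
print("sum of squared errors, linear fit:     ", sse_linear)
print("sum of squared errors, nonlinear model:", sse_model)
```

Because least squares is a linear operation and a straight line is itself a (degenerate) quadratic, this two-step construction should land on the same curve you would get by fitting a quadratic to the data directly. The detour through the residuals is useful mainly because it makes the curvature visible.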
Thus far, what we have learned is that there is significant curvature in these CO2 data. We’re justified in describing it with a second-degree polynomial model:
Y = a*X^0 + b*X^1 + c*X^2
This quadratic model is consistent with an exponential model. An exponential function is actually a sort of infinite-degreed polynomial. If the data are exponential, our quadratic model is describing the first few terms of this infinite polynomial:
e^X = a*X^0 + b*X^1 + c*X^2 + d*X^3 + … + a_n*X^n + …
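For the plain exponential e^X, the coefficients of that infinite polynomial are known exactly: the X^n term carries 1/n!. The short sketch below (the function name `taylor_exp` is my own) compares the series cut off after the X^2 term with the real thing, to show how well a quadratic can imitate an exponential over a short stretch and how it falls behind over a longer one.

```python
import math


def taylor_exp(x, degree):
    """Sum the power series for e^x up to and including the x^degree term."""
    return sum(x ** n / math.factorial(n) for n in range(degree + 1))


for x in (0.1, 0.5, 1.0, 2.0):
    quadratic = taylor_exp(x, 2)  # 1 + x + x^2/2
    exact = math.exp(x)
    print(f"x={x}: quadratic={quadratic:.4f}  exp={exact:.4f}  "
          f"shortfall={exact - quadratic:.4f}")
```

Close to X = 0 the two are nearly indistinguishable, and the gap widens as X grows. That is the sense in which a good quadratic fit is consistent with exponential growth without proving it.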
Distinguishing between the cases of quadratic, exponential, and superexponential growth requires more complicated statistical techniques, such as those described in (Husler & Sornette 2011). It’s a harder question, and to be honest I’m still thinking about it. What’s easy to see is that these data are most certainly NOT linear.
Actually, if you’ve read through some of my earlier discussions, you’ve seen one reason to doubt the linearity of CO2 data. Remember back when we were talking about the rate of change in CO2 levels, and we found that the rate has increased over time? A linear growth in CO2 is characterized by a constant rate of change; if the rate is increasing, CO2 growth can’t be linear! And though the curvature may appear small, 1.5% of the variation in CO2 concentrations, we’ve already seen that over several decades it really adds up.
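The rate-of-change argument is easy to check numerically. In this sketch both series and their coefficients are invented for illustration, not measured values; the helper `yearly_changes` just takes year-over-year differences.

```python
def yearly_changes(series):
    """Year-over-year differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


years = list(range(10))
linear_co2 = [315 + 1.0 * t for t in years]                   # invented linear growth
curved_co2 = [315 + 0.8 * t + 0.012 * t ** 2 for t in years]  # invented curved growth

linear_rates = yearly_changes(linear_co2)
curved_rates = yearly_changes(curved_co2)

print("linear rates:", linear_rates)
print("curved rates:", [round(r, 3) for r in curved_rates])
```

The linear series changes by the same amount every year, while the curved series changes by a little more each year than the year before. An increasing rate of change and a straight line simply cannot coexist.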
When I first was reading the C3 article, my reaction was more or less: who cares? Sure, the CO2 concentrations were clearly nonlinear, but why does it really matter whether they’re superexponential, or “merely” exponential? It’s great to have a model for data, but in this case you’re not going to use it to interpolate, since the data are already fairly complete, and you don’t need a model to fill in gaps. It would also seem questionable to me to use such a model to reliably extrapolate into the future. It might be good to define a business as usual scenario, but the actual course of future CO2 concentrations is not set by our choice of curve fitting; it in fact depends strongly on human agency.
I found the answer in “Evidence for super-exponentially accelerating atmospheric carbon dioxide growth” (Husler & Sornette 2011), the paper that sparked the Joe Romm blag that produced the WattsUpWithThat blagoblag that spawned the C3 blagoblagoblag that led to the current blagoblagoblagoblag. What they did was build a model relating quantities like economic production, technological sophistication, and emissions. Then they used the model and historical data to characterize the economy of world carbon emissions. The authors write:
“The coexistence of a quasi-exponential growth of human population with a super-exponential growth of carbon dioxide content in the atmosphere is a diagnostic that, until now, improvements in carbon efficiency per unit of production worldwide has been dramatically insufficient. [...] This statement may appear shocking and counter-factual for developed countries. But, at the scale of the whole planet, one can observe that improvement in carbon emissions (i.e., decrease per unit of output) in the developed countries are counteracted by the increases of carbon emissions in some major developing countries, such as China, India and Brazil, which use carbon emission inefficient technologies (for instance heavily based on coal burning). ”
There’s a few more lulz in the graph/data department over at C3. Check back soon for Part II.
* Why only January values are used is not made clear.
Statistical processing by ZunZun
Andreas D. Hüsler & Didier Sornette (2011). Evidence for super-exponentially accelerating atmospheric carbon dioxide growth. arXiv:1101.2832v3