Category: stats


Today in LabLulz, I’m going to walk through a recent preparation I did in my chemistry lab: increasing and measuring the concentration of hydrogen peroxide.

WARNING: This procedure involves heat and the end product is a powerful oxidizer. Don’t get burned and don’t get it on yourself – wear gloves, splash-resistant goggles, and an apron. I had a spill of ~15%, all over everything, including myself. It was okay, but only because I followed safety protocols. I didn’t have the apron though, and I had to get pantsless.

Hydrogen peroxide is an interesting substance; its formula is H2O2, meaning that it is composed of two hydrogen atoms bonded to two oxygen atoms.

Figure 1. Behold, the hydrogen peroxide molecule!

It is a powerful oxidizer, decaying into water and free oxygen. This is because the bond between the two oxygen atoms, called the peroxide bond, is unstable. Some substances which contain the peroxide bond are even explosive, like triacetonetriperoxide. Because it’s an explosive precursor, and somewhat dangerous on its own, concentrated hydrogen peroxide can be difficult to come by. The weak 3% solution found in drugstores is all that is available to DIYers, hobbyists, and other scientists outside of the mainstream chemical supply chain.

Fortunately, it is relatively trivial to increase the concentration from 3% to around 30%. There are several tutorials on the subject on YouTube (TheChemLife, zhmapper, nerdalert226), so I’m going to focus on measuring the concentration of the end product, a procedure which the videos tend to treat very qualitatively. I hope this tutorial will be informative and useful, even outside of punklabs; the process is easily generalized, and density is important in many fields, including medicine and winemaking.

The concentrating procedure is pretty simple: pour about 500 mL of the 3% solution into a beaker and heat it, forcing the excess water to evaporate until there is a tenth as much liquid left (peroxide boils at 150 C, compared to 100 C for water.) There are only a couple of tricky points: the liquid must NOT boil, only steam – if it starts boiling, the peroxide will decay. Bits of dust and dirt will also cause disintegration, so the equipment must be kept very clean and free from scratches.
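As a rough sanity check on that tenfold reduction, here is a back-of-the-envelope sketch. It assumes (generously) that no peroxide decomposes or leaves with the steam, and it treats the percentages as grams per 100 mL, so the result is an upper bound rather than a prediction. The function name is my own.

```python
def concentration_after_evaporation(start_percent, start_ml, end_ml):
    """Upper-bound concentration (% w/v) if only water leaves the beaker."""
    # Grams of H2O2 in the starting liquid, taking "percent" as g per 100 mL (assumption).
    peroxide_g = start_percent / 100.0 * start_ml
    # The same mass of peroxide is now dissolved in a smaller volume.
    return 100.0 * peroxide_g / end_ml


upper_bound = concentration_after_evaporation(3.0, 500.0, 50.0)
print(f"At most about {upper_bound:.0f}% if nothing is lost")
```

Any real run will come in at or below this figure, which is one more reason to measure the product instead of trusting the arithmetic.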

Okay, so after a few hours, I have about 50 mL of liquid. I drop a bit into a solution of corn starch and potassium iodide, and the mixture turns black, a positive test for oxidizers. I add a squirt to some sulfuric acid and copper wire, and the metal wire begins bubbling and the solution begins to turn blue with copper sulfate*. This reaction is faster and more vigorous than when I try it with the 3% solution, so I’ve clearly succeeded in increasing the concentration, but to what level? To answer that question, I’m going to measure the density of the solution.

The Setup

Figure 2. The Densitometry Setup. Note the safety equipment. Note also the lab notebook, which is essential. Other sights include a bit of iron oxalate, tongs, and a desiccator.

Here’s my setup. I don’t have a nice buret with a stopcock, so instead I have repurposed a graduated medical pipette (I picked up a huge box of these at the Scrap Exchange). This is controlled with a valve and a syringe plunger. It’s a little drippy and derpy, not great for dispensing a planned volume, but it works fine to measure the amount that has been dispensed, which is sufficient for our purposes. The milligram scale was lent to me by a friend after armed thugs stole my old one (thanks, B!) The brown glass vial is good for containing peroxide, since light speeds up its decomposition.**

Once the room and the beaker were at the same temperature (20 deg C), I drew about 8 mL of my peroxide up into the pipette and started adding it to the vial a bit at a time. By the time I had figured out the fluidics system, I’d added about 3.0 mL, so that’s where the data start. From there, I would squirt a bit of peroxide into the vial, note the volume and the mass, and repeat. I took 8 different measurements this way.

Then, after I put everything away and cleaned up, I sat down with a cup of coffee for a bit of data entry.

Figure 3. Data, in analog and digital form. This is a page from my lab notebook, and the spreadsheet in gnumeric (inset).

I usually store my data in spreadsheets and do my processing and analysis with Python. Once the spreadsheet file is ready, I get started and load up my data.

$ ipython --pylab

In [1]: import xlrd
In [2]: phial = xlrd.open_workbook('densityData.xls')
In [3]: data = phial.sheet_by_index(0)
In [4]: raw_volume = data.col_values(1)[5:]
In [5]: raw_mass = data.col_values(3)[5:]

Next, I take a quick peek at the data just to make sure that everything is as expected.

In [6]: plot(raw_volume, raw_mass, 'bo')

Figure 4. Plotting the raw data.

Looking good; the data are nice and linear, meaning that the slope (and therefore the density) is well defined. But these are the raw data, which include the mass of the bottle. I also didn’t start at zero on my pipette; I just wrote down the volume the liquid was at when I took each mass measurement. It doesn’t matter too much, but strictly speaking, we want to compare just the mass of the liquid in the bottle to the volume of liquid drained from the pipette. Let’s go ahead and calculate that (it’s a lot like calculating a temperature anomaly).

In [12]: volume = array(raw_volume) - 3.0*ones_like(raw_volume)
In [12]: mass = array(raw_mass) - 9.988*ones_like(raw_mass)
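The same bookkeeping in plain Python, as a minimal sketch on invented readings (these numbers are placeholders, not my notebook data): subtract the starting pipette reading from every volume and the empty-vial mass from every mass.

```python
# Invented example readings; substitute your own.
raw_volume = [3.0, 3.6, 4.1, 4.9]        # mL, pipette reading at each step
raw_mass = [9.988, 10.66, 11.22, 12.11]  # g, vial plus liquid at each step

start_volume = raw_volume[0]   # pipette reading when the data begin
empty_vial_mass = 9.988        # g, tare mass of the vial

# Volume dispensed and mass of liquid alone, relative to the starting point.
volume = [v - start_volume for v in raw_volume]
mass = [m - empty_vial_mass for m in raw_mass]

for v, m in zip(volume, mass):
    print(f"{v:.2f} mL  {m:.3f} g")
```

With NumPy arrays, subtracting the scalar directly (array(raw_volume) - 3.0) gives the same result, since the scalar is broadcast across the array.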

Much better. Now, to get some basic statistics on these data, let’s apply a linear regression.

In [13]: from scipy import stats
In [16]: (m, b, r, p, std) = stats.linregress(volume, mass)
In [17]: (m, b, r, r**2, p, std)

Out[17]:
(1.1189534883720933,
0.067450581395348319,
0.99932418729044048,
0.99864883130369941,
7.7125664942037092e-10,
0.01680292147514929)

It’s mostly the slope, m = 1.12 g/mL, which we’re interested in.
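For anyone curious what a linear regression is doing under the hood, here is a minimal pure-Python sketch of ordinary least squares. The data points are invented for illustration, and fit_line is my own helper name, not part of SciPy.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx               # slope
    b = mean_y - m * mean_x     # intercept
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    return m, b, r


volume = [0.0, 1.0, 2.0, 3.0, 4.0]   # mL (invented)
mass = [0.1, 1.2, 2.3, 3.4, 4.5]     # g (invented), exactly 1.1*V + 0.1
m, b, r = fit_line(volume, mass)
print(f"slope = {m:.3f} g/mL, intercept = {b:.3f} g, r = {r:.4f}")
```

When mass is plotted against volume, the slope carries units of g/mL, which is why it can be read directly as a density.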

In [16]: linFit = polyval([m,b], volume)
In [29]: plot(volume, linFit, 'r-')
In [30]: plot(volume, mass, 'bo')
In [31]: title('Mass vs. Volume for H2O2')
In [32]: xlabel('Volume (mL)')
In [33]: ylabel('Mass (g)')
In [36]: text(1.25, 4.5, 'M = %fg/mL*V + %fg'%(m,b))

Figure 4. The mass of peroxide is plotted as a function of its volume, and a linear regression applied.

Now that we have the density, we can use this graph from H2O2.com. (It’s derived from Easton et al. 1952. The paper actually reports density measurements at 0, 10, 25, 50, and 96 degrees, so these are probably interpolated curves.) We’ll draw a horizontal line at the measured density (~1.12 g/mL) until we hit the curve corresponding to the temperature (20 C). Then we draw a vertical line and read off the concentration where it crosses the horizontal axis. Result? The concentration is about 32% (Figure 5).

Figure 5.
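If you would rather not squint at a chart, the same read-off can be sketched as linear interpolation between tabulated points. The curve below is a PLACEHOLDER I made up to show the mechanics; it is not taken from Easton et al. or from H2O2.com, and the density passed in is arbitrary too. Substitute values read from a real table at your working temperature.

```python
def concentration_from_density(density, curve):
    """Interpolate a list of (concentration %, density g/mL) points to find concentration."""
    points = sorted(curve, key=lambda p: p[1])  # order by density
    for (c_lo, d_lo), (c_hi, d_hi) in zip(points, points[1:]):
        if d_lo <= density <= d_hi:
            # Fraction of the way from the lower to the upper tabulated density.
            frac = (density - d_lo) / (d_hi - d_lo)
            return c_lo + frac * (c_hi - c_lo)
    raise ValueError("density is outside the range of the curve")


# PLACEHOLDER calibration points (concentration %, density g/mL): illustrative only.
placeholder_curve = [(0.0, 1.00), (20.0, 1.07), (40.0, 1.15), (60.0, 1.24)]

estimate = concentration_from_density(1.11, placeholder_curve)
print(f"Interpolated concentration: {estimate:.1f}%")
```

Straight-line interpolation between neighbouring points is only as good as the spacing of the table, so a finer table gives a better read-off.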

You may wonder why I went to so much trouble, taking multiple data points and calculating a trend line. If density is defined as mass per volume, then surely I can measure it in one go, by massing a single sample of known volume. Right? The problem is that any one such measurement might be a little wonky. Maybe one fewer drop than usual wiggled out, or a draft of air was pushing down a little on the scale. Look back at Figure 4; see the data point third from the end, visibly above the trend line? If I were only taking one measurement, I might get unlucky enough to get an outlier like that one. By using a linear regression, I can aggregate the data, and hopefully all those small outside factors will tend to cancel out.

Another reason is that the linear regression allows us to calculate the uncertainty in the slope of the line, and therefore in the density. There are good online explanations of the underlying equations, so I won’t churn through them here: the fifth value returned by linregress (which I named std above) is already the standard error of the slope.

In [26]: std
Out[26]: 0.01680292147514929

We can thus report the density as 1.119 +/- 0.017 g/mL. If we wanted, we could use the uncertainty in the density to calculate the uncertainty in concentration the same way we calculated the concentration estimate: plot lines at 1.119 + 0.017 g/mL for the upper bound and 1.119 - 0.017 g/mL for the lower bound, and work backwards to get a confidence interval of concentrations. In our case though, I didn’t think it was worth propagating.
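For reference, here is a minimal sketch of where a slope uncertainty comes from, in plain Python on invented data: take the residual standard deviation s of the fit and scale it by the spread of the x values. The two expressions computed below are algebraically the same quantity; the helper name is mine.

```python
from math import sqrt


def slope_uncertainty(x, y):
    """Return (slope, residual_std, se_a, se_b) for a least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # Residual standard deviation: scatter of the points about the fitted line,
    # with n - 2 degrees of freedom because two parameters were fitted.
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    # Standard error of the slope, written two equivalent ways.
    se_a = s * sqrt(n / (n * sum(xi ** 2 for xi in x) - sum(x) ** 2))
    se_b = s / sqrt(sxx)
    return m, s, se_a, se_b


volume = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # mL (invented)
mass = [0.1, 1.0, 2.3, 3.2, 4.6, 5.4]     # g (invented)
m, s, se_a, se_b = slope_uncertainty(volume, mass)
print(f"slope = {m:.3f} +/- {se_a:.3f} g/mL (residual std {s:.3f} g)")
```

The slope uncertainty shrinks when the points scatter less about the line or when the volumes span a wider range.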

I feel comfortable just calling this 32% peroxide. Success!

* An easier, quicker test is to add peroxide to a catalyst that causes it to decay, but I didn’t have any manganese dioxide or horseradish on hand. Also, everyone loves copper sulfate.

** This is really a moot point, since I am keeping this sample in the fridge, and there is not even a lightbulb to philosophize about. But it’s also why you buy peroxide in opaque bottles.

~~~

Easton, M., Mitchell, A., & Wynne-Jones, W. (1952). The behaviour of mixtures of hydrogen peroxide and water. Part 1. Determination of the densities of mixtures of hydrogen peroxide and water. Transactions of the Faraday Society, 48. DOI: 10.1039/TF9524800796

I love graphs – my eyes quickly glaze over at a table of numeric data, but a graph, used correctly, can quickly and easily tell the whole story.

‘Used correctly’ is the key phrase – for all their power, graphs are infamously easy to bungle, and when used incorrectly they can misinform – or lie outright.

I’m going to look at an example that touches on a few graphical and statistical concepts near and dear to my heart, as well as carbon geochemistry.

Fig. 1: An image from C3Headlines; the 3 C's are "Climate, Conservative, Consumer". Oh, and the article is titled "The Left/Liberal Bizarro, Anti-Science Hyperbole Continues". It sure would be tragic if they made obvious n00b mistakes after using such language. Click for link!

Coming from an article on the website C3Headlines, this image claims that carbon dioxide concentrations have ‘Linear, Not Exponential Growth’, thereby ‘expos[ing] the lunacy of typical left/liberal/progressive/Democrat anti-science’. The author has reached this conclusion by graphing January CO2 levels* and fitting a linear trendline to them.

Already this is a warning sign – the comparisons the author makes are entirely qualitative, apparently based upon eyeballing the graph. However, trend lines are created by a statistical process called a linear regression, which comes with a caveat: it will fit a trend line to ANY data given to it, linear or nonlinear. Fortunately, there are also ways of evaluating how good a trend line is. View full article »
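To see that caveat in action, here is a small sketch that fits a straight line to a curve I know to be exponential. The series is invented (it is not CO2 data), and fit_line is my own helper. The line fits with a high r-squared, yet the residuals are not scattered at random: they are positive at both ends and negative in the middle, the signature of a curve forced onto a line.

```python
from math import exp


def fit_line(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


years = list(range(51))
curve = [300.0 * exp(0.004 * t) for t in years]  # invented, exactly exponential

m, b = fit_line(years, curve)
residuals = [c - (m * t + b) for t, c in zip(years, curve)]

# Coefficient of determination for the straight-line fit.
mean_c = sum(curve) / len(curve)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((c - mean_c) ** 2 for c in curve)
r_squared = 1.0 - ss_res / ss_tot

print(f"r-squared of the straight-line fit: {r_squared:.4f}")
print(f"residual at start, middle, end: "
      f"{residuals[0]:+.2f}, {residuals[25]:+.2f}, {residuals[-1]:+.2f}")
```

A high r-squared alone therefore says little about whether the underlying growth is linear; plotting the residuals is a quick first check.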

temperature aNOMalies

If you are new to climate science, you might be wondering what, exactly, this ‘temperature anomaly’ thing is that you keep hearing about. I know I was a bit confused at first! This post explains the concept, using a real-world example.

Absolute temperatures (yearly averaged) from two sites in the UK: one urban (St. James Park, green) and one rural (Rothamsted, red). Although the urban site is consistently warmer, the two sites show the same warming trend. But is there a way to compare them directly? Data from Jones et al. 2008, kindly provided by Dr. Jones.

Cities tend to be warmer than their surrounding countrysides, a fact known as the urban heat island effect (UHI). This is occasionally offered as an alternative to greenhouse gases as an explanation for observed warming, but it fails on closer inspection. We can use data from Jones et al. (2008) [PDF] to see one reason UHI can’t explain observed warming. One time series is from St. James Park, in the city of London; the other is from nearby Rothamsted, a rural site some tens of miles away. As you can see, the urban location is consistently about 2 C warmer; however, the warming is nearly identical at both sites (a strongly significant 0.03 deg C/year). Jones et al. note:

“… the evolution of the time series is almost identical. As for trends since 1961 all sites give similar values …  in terms of anomalies from a common base period, all sites would give similar values.”

This gives us a hint about what a temperature anomaly is: View full article »
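As a minimal sketch of the idea, assuming the anomaly is defined as each reading minus that site's own average over a common base period: the two series below are invented (they are not the Jones et al. data), with the "urban" one simply 2 C warmer than the "rural" one.

```python
def anomalies(series, base_years):
    """Subtract the site's mean over the first base_years readings from every reading."""
    baseline = sum(series[:base_years]) / base_years
    return [t - baseline for t in series]


# Invented yearly mean temperatures in deg C.
rural = [9.0, 9.1, 8.9, 9.3, 9.4, 9.6]
urban = [t + 2.0 for t in rural]   # same shape, constant 2 C offset

rural_anom = anomalies(rural, base_years=3)
urban_anom = anomalies(urban, base_years=3)

for r_a, u_a in zip(rural_anom, urban_anom):
    print(f"rural {r_a:+.2f}   urban {u_a:+.2f}")
```

Because each site is compared with its own baseline, the constant offset between the sites drops out and only the change over time remains.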

Last time, we saw that some mathematical systems are so sensitive to initial conditions that even very small uncertainties in their initial state can snowball, causing even very similar states to evolve very differently. The equations describing fluid turbulence are examples of such a system; Lorenz’s discovery of extreme sensitivity to initial conditions ended hopes for long term weather forecasting. Because the state of the weather can only be known so well, the small errors and uncertainties will quickly build up, rendering weather simulations useless for looking more than a few days ahead of time.

But Lorenz’s discovery doesn’t have much impact on climate modelling, contrary to the claims of some climate skuptix. Climate is not weather, and modelling is not forecasting.

Weather refers to the state of the atmosphere at a particular time and place: What temperature is it? Is it raining? How hard is the wind blowing, and in which direction? Climate, on the other hand, is defined in terms of the statistical behavior of these quantities:

“Climate in a narrow sense is usually defined as the average weather, or more rigorously, as the statistical description in terms of the mean and variability of relevant quantities over a period of time ranging from months to thousands or millions of years. [...] Climate change refers to a change in the state of the climate that can be identified (e.g., by using statistical tests) by changes in the mean and/or the variability of its properties, and that persists for an extended period, typically decades or longer. ” IPCC

Many climate skuptik talking points derive from confusing these two quantities, in much the same way that a gambler might win a few hands of poker and decide that they are on a roll.

Although it is generally not possible to predict a specific future state of the weather (there is no telling what temperature it will be in Oregon on December 21 2012), it is still possible to make statistical claims about the climate (it is very likely that Oregon’s December 2012 temperatures will be colder than its July 2012 temperatures). It is very likely that the reverse will be true in New Zealand. It is safe to conclude that precipitation will be more frequent in the Amazon than in the Sahara, even if you can’t tell exactly when and where that rain will fall.

In fact, Lorenz’s groundbreaking paper, ‘Deterministic Nonperiodic Flow’, would seem to endorse this sort of statistical approach to understanding fluid dynamics:

“Because instantaneous turbulent flow patterns are so irregular, attention is often confined to the statistics of turbulence, which, in contrast to the details of turbulence, often behave in a regular well-organized manner.” (Lorenz 1963)

Let’s take a closer look.

Fig. 1. Three solutions of the Lorenz equations, starting at virtually identical points. Although the solutions are similar at first, they rapidly decouple around T=12.

The Lorenz equations involve three variables describing turbulent fluid flow (X, Y, and Z), and three controlling parameters (r, b, and s). The equations are differential equations, meaning that a variable is described in terms of how it changes over time: saying ‘Johnny is driving west at 60 miles per hour’ is a simple differential equation. In order to solve a DiffEq, you need an initial condition – “Johnny started in Chicago” is an initial condition; without knowing that, you can’t say where he will be after driving for three hours. View full article »
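Here is a minimal sketch of that sensitivity, integrating the Lorenz equations with a hand-rolled fourth-order Runge-Kutta step. The parameter values (s = 10, r = 28, b = 8/3) are the classic ones from Lorenz (1963) and are an assumption here, since the text above doesn't state them; the starting point, the size of the nudge, and the step size are arbitrary choices of mine.

```python
from math import sqrt


def lorenz(state, s=10.0, r=28.0, b=8.0 / 3.0):
    """Time derivatives (dX/dt, dY/dt, dZ/dt) of the Lorenz system."""
    x, y, z = state
    return (s * (y - x), x * (r - z) - y, x * y - b * z)


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(v + scale * dt * ki for v, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, 0.5))
    k3 = lorenz(shifted(k2, 0.5))
    k4 = lorenz(shifted(k3, 1.0))
    return tuple(v + dt / 6.0 * (a + 2 * b2 + 2 * c + d)
                 for v, a, b2, c, d in zip(state, k1, k2, k3, k4))


def distance(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


run_a = (1.0, 1.0, 1.0)
run_b = (1.0 + 1e-8, 1.0, 1.0)   # same start, nudged in X by one part in 10^8
dt = 0.01
separation = []
for _ in range(4000):            # integrate out to T = 40
    run_a = rk4_step(run_a, dt)
    run_b = rk4_step(run_b, dt)
    separation.append(distance(run_a, run_b))

print(f"separation after the first step: {separation[0]:.2e}")
print(f"largest separation for T between 35 and 40: {max(separation[3500:]):.2f}")
```

The initial condition is the only thing that differs between the two runs, so any separation that builds up is the nudge being amplified by the equations themselves.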

A part of my John Everett series – read more: 0/I - II.0 - II.5 - II.75 -  III.0 - III.3 - IV.0 - IV.4 - IV.8 - V - VII - VIII - Full Report 

The CO2 scenarios are literally falling flat and need revision. The observational trend line shows monotonic growth – pretty much a straight line as in the chart below of global marine CO2 measurements (NOAA data)4, while the IPCC scenarios used in most research rely on an accelerating growth. Certainly the predicted rapid acceleration of the IS92a model (see solid black line in middle of the figure on the right) is missing from the NOAA data plotted below. In fact, if the last 8 or 12 years are representative of the future, we might imagine a downward slope in the growth rate.

Last time, we looked at one claim Dr. Everett makes in this paragraph: that the measured rate of change in atmospheric carbon dioxide is inconsistent with the emissions scenarios used to predict future ocean acidification. To do this, he plays fast and loose with quantities and their derivatives (the rates at which they change). The imprecision extends even to his quoted numbers: in the previous paragraph, he gives a growth rate as “3.05 ppm”. That’s not a growth rate; it’s a concentration. He means 3.05 ppm per year. His projection is an extrapolation of “the average rate of increase for the past 10 years (1.87/year)…” 1.87 WHAT per year? I know that he means 1.87 ppm/year, but a lot of people wouldn’t, and I shouldn’t have to make assumptions. If Everett is being sloppy with his units, he’s being sloppy with his science.
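To make the distinction concrete, here is a small sketch that turns a concentration series (ppm) into a growth rate (ppm per year) and then into the change in that growth rate (ppm per year per year). The numbers are invented for illustration; they are not NOAA measurements.

```python
years = [2000, 2001, 2002, 2003, 2004]
co2_ppm = [369.0, 370.5, 372.5, 375.0, 376.5]   # concentration, ppm (invented)

# First difference: how fast the concentration changes, in ppm per year.
growth_rate = [(c1 - c0) / (y1 - y0)
               for (y0, c0), (y1, c1) in zip(zip(years, co2_ppm),
                                             zip(years[1:], co2_ppm[1:]))]

# Second difference: how fast the growth rate itself changes, in ppm per year per year.
acceleration = [g1 - g0 for g0, g1 in zip(growth_rate, growth_rate[1:])]

print("growth rate (ppm/yr):", growth_rate)
print("change in growth rate (ppm/yr^2):", acceleration)
```

Each step down the chain changes the units, which is exactly why a bare "3.05 ppm" can't be a growth rate.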

The other claim that Dr. Everett draws from the rate of change in CO2 is that “the growth rate seems to be leveling off, if not declining [...] In fact, if the last 8 or 12 years are representative of the future, we might imagine a downward slope in the growth rate. ” Look at the graph of the growth rate again. It goes up and down- a lot.

The record of changes in atmospheric carbon dioxide since 1980. It's got its ups and downs. Click to see the full record back to 1959.
