For the past three months there has been real excitement in the physics community. Oh, the direct detection of gravitational waves was huge, but it was sort of expected. Gravitational radiation had been seen indirectly for some time, and upgrades to the LIGO detector were going to be sensitive enough that detection was likely if Einstein was to be believed. A great discovery and probably a Nobel Prize, but more importantly a new branch of astrophysics and cosmology has been opened - we have an entirely new kind of telescope. Wonderful, but the real excitement has been a bump in the diphoton spectrum recorded at the ATLAS and CMS detectors at CERN at 750 GeV. In December it was 2.6 sigma at ATLAS and now is around 3.6 at both detectors. Gibberish and blather for most, but for a few others an electric excitement linked to passion.
So why mention this...?
Our economy increasingly depends on information. Information, at least contextually meaningful information, has become a commodity. Recently Charlie pointed out that some disciplines that should value carefully vetted information don't. His comment and the current excitement in physics suggested an opportunity to talk about tempering excitement with care while engaging in some incredible play. I'll jump into a bit of the sausage making in physics, but will try and stay away from the technical bits. First some motivation.
In the mid 70s it became clear that a theoretical model called the Standard Model explained a good deal of fundamental physics. It nicely unified the strong, weak and electromagnetic forces - only gravity was left out. It made a number of predictions, including the Higgs field, which partly explains how mass comes about. But although it has been enormously successful, it has some serious issues. For example, it doesn't shed any light on why gravity is so weak - why is it 100,000,000,000,000,000,000,000,000,000,000 times weaker than the weak force? This turns out to be part of the hierarchy problem - the electron is about 200 times lighter than the muon and 3,500 times lighter than the tau. Likewise for the quarks: the top quark is 75,000 times heavier than the up quark. Nature's Lego set has blocks that vary widely in size. The hint is that some other undiscovered particles lurk. If something called spontaneously broken supersymmetry exists (I'll stay away from talking about that for now), many of the pieces fall into place.
A hunt for supersymmetric particles has been underway for a few decades. One of the reasons for building the Large Hadron Collider and its detectors was to find and learn about the Higgs field, but searching for supersymmetry as well as entirely new physics is at least as important. To date supersymmetry hasn't panned out. There are physicists who have spent a few decades focusing on this in theory and experiment, but hard work and wishing don't impress Nature. The great thing about science is that ultimately Nature is the only standard. It isn't quite time to call it quits on supersymmetry, but it has been rather quiet.
So a few months ago word leaked out of a small bump - a resonance indicating a particle might exist with a mass of about 750 billion electron volts - about six times as massive as the Higgs boson and about 800 times as massive as the everyday proton. The evidence wasn't great - 2.6 sigma at one of the detectors - but it was seen in both detectors at the same mass.
In particle physics the standard for claiming a new particle exists is five standard deviations - five sigma. Many two and three sigma bumps have faded upon closer inspection. The game is to have enough collisions to get good statistics and to understand the signal at a very deep level. The fact that this isn't a discovery yet doesn't mean it isn't exciting. Experimentalists will look more carefully when the collider is running again, and in the meantime everyone is looking for flaws in what has been done so far. Unlike some fields, you can gain considerable stature by showing a claimed result is false. This is exactly what happened with the primordial gravitational waves announced by the BICEP2 collaboration two years ago. The result electrified the physics world - it was such a beautiful story - but they didn't understand the signal's noise background deeply enough, so they had to retract.
People are past the wtf stage. Now the experimentalists are trying to understand the background (these signals are embedded in piles of noise that necessitate the use of multiple filters - you need to make sure you aren't fooling yourself) and preparing for future runs. Meanwhile the theorists are engaged in serious play, seeing if there are theoretical frameworks that can explain the signal reported to date and perhaps make a prediction or two - experimentally testable hypotheses. Today I looked at the arXiv - over 300 papers trying to explain what is going on. Many theory papers are somewhat polished versions of the neat idea you can't dismiss after a few weeks of work. Papers at this level are a kind of communal work in progress. Sometimes the preprints have the feel of a weekly colloquium.
Particle physics has an open atmosphere where people communicate by preprint. When I was a grad student in the late 70s the practice was at the point where fax machines were running around the clock and the main reason for looking at the final journals was to get references for your papers. This openness led to the need for doing something similar on the Internet, and Tim Berners-Lee's project led to something rather useful called the web.1
Back to this statistical significance thing... what is five sigma?
Five sigma corresponds to a probability or p-value of 3 x 10⁻⁷, or 1 in 3.5 million. Now for a few important comments. This is not the only standard for discovery! It happens to be one that is important, but you'd be deluding yourself if you relied on it alone. Also, it doesn't mean the Higgs boson does or doesn't exist. It would be sweet if you could talk about the probability that something exists, but there is a tricky 'if' in the real statement. It is the probability that, if the particle does not exist, the data taken at the Large Hadron Collider would be at least as extreme as what was observed.
A researcher will use p-values to test the likelihood of hypotheses. In an experiment comparing some phenomenon A to phenomenon B, researchers construct two hypotheses: that A and B are not correlated - the null hypothesis - and that A and B are correlated - the research hypothesis. Now she assumes the null hypothesis (it's the most conservative supposition, intellectually) and calculates the probability of obtaining data at least as extreme as what is experimentally observed, given that there is no relationship between A and B. This probability is the p-value, and it can be computed with any of several different statistical tests.
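To make that logic concrete, here is a minimal sketch in Python applied to a hypothetical coin-flip experiment (the 61-of-100 numbers are invented for illustration): the null hypothesis is a fair coin, and the one-sided p-value is the chance of seeing a result at least as extreme if the null is true.

```python
from math import comb

def one_sided_p_value(heads, flips, p_null=0.5):
    """Probability of observing at least `heads` heads in `flips`
    tosses if the null hypothesis (a fair coin) is true."""
    return sum(comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)
               for k in range(heads, flips + 1))

# Hypothetical experiment: 61 heads in 100 flips.
p = one_sided_p_value(61, 100)
print(f"p = {p:.4f}")  # about 0.018 - well past 0.05, nowhere near 5 sigma
```

A result like this would clear medicine's p = 0.05 bar while falling far short of the particle physics standard - which is rather the point.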
A low p-value means there is only a small chance (p = 0.01 is one percent) that the data would have been observed by chance in the absence of the correlation. Different fields of science have established their own gold standards. Much of medical and social science uses p = 0.05.2
Sigma is just a standard deviation - a measure of how wide a distribution of data points is about its mean. It takes on a very specific meaning for a normal (gaussian) distribution - its usual context. A large sigma means the data are spread out - a wide curve rather than a spike. You can use it to measure how much of the data lies within a certain range. Again, for a normal distribution about 68% of data points lie within one sigma of the mean, 95% within two sigma, and so on. If something is five sigma out it sits beyond almost all of the curve - fewer than one in a million points land that far from the mean counting both tails. The numbers quoted in particle physics are one-sided - all on one side of the curve - which is where the 1 in 3.5 million comes from. Pretty rare.3 The initial diphoton excess that got people's attention was 2.6 sigma, corresponding to a probability of 4.7 x 10⁻³ or about 1 in 215. A sigma of 3.6 yields 1.6 x 10⁻⁴, or about one in 6,300. Not good enough, but part of the added information is that the signal was seen at both detectors at the same energy - another bit of probability to play with.
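Under the normal-distribution assumption, the sigma-to-probability conversion is a one-liner with the complementary error function. A sketch in Python, using the sigma values quoted above:

```python
from math import erfc, sqrt

def one_sided_p(sigma):
    """Upper-tail probability of a standard normal beyond `sigma`
    standard deviations - the one-sided p-value physicists quote."""
    return 0.5 * erfc(sigma / sqrt(2))

for s in (2.6, 3.6, 5.0):
    p = one_sided_p(s)
    print(f"{s} sigma -> p = {p:.1e} (about 1 in {1 / p:,.0f})")
```

Running it reproduces the numbers in the text: roughly 1 in 215 at 2.6 sigma, 1 in 6,300 at 3.6, and 1 in 3.5 million at five.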
This process of finding the sigma of a signal isn't direct or easy. There are tens of thousands of data channels and filters, each making its own contribution. Each of these needs to be understood - that means almost constant verification and calibration. All of this goes into a big number grinder, and everyone who touches a component has his or her fingerprints on the final result. People err on the side of being conservative.
OK - that's one very cautious and conservative community when it comes to understanding information. What about the rest of us? How well do you understand the filters on social media? How about the regular news? How about information that is mission critical to business decisions you make? Who has their fingers on the information as it is processed, and exactly what is happening to it? How do you verify it? Do you verify it? How deep is your trust? Some of these questions are qualitative, others are quantitative, and some are in between...
I've been asked to look at a fair amount of data visualization and processing over the years. Some of it is great, but much of it - including work from many data scientists - is very low quality. Sadly, in the minds of many, it makes up for that by being pretty and by certifying a required result... If information is a commodity, what can be done about counterfeit coin?
__________
1 The spinoff from the pursuit of physics has been remarkable. Most medical instrumentation had its beginnings in physics. Having people trying to measure at a level where they must invent their own apparatus is a great way to discover and invent new technologies. Even dealing with massive amounts of information has led to fundamental contributions to computer science and statistics.
2 This weak standard (by the measure of the experimental 'hard' sciences), plus a reliance on p-values alone, is an artifact of poor repeatability - a serious problem in medical and social science, to the point where it can be debated whether they are really science. I have quite a bit to say on that, but not now. Some of medical science is based on single studies, often with drug company funding, that are about as good as flipping a coin. I use a commonly prescribed drug based on this level of confidence.
3 A friend is very tall - her height is about five sigma out from the mean for women's height.
__________
Recipe Corner
There are several approaches to Phat Phrik Khing... As a vegetarian I've made my own curry paste, as most commercial varieties have shrimp paste or fish sauce. Thai Kitchen has a vegan red curry paste that is widely distributed (supermarkets and Amazon) and is an OK base for throwing something together quickly. You can also make your own paste with a visit to a Southeast Asian market for galangal and makrut lime leaves (also available online). If you make your own, the difference in flavor between using a mortar and pestle and a food processor is huge, so do it by hand. If you can find fresh tofu - many Asian markets offer it - go for it, as it is vastly superior to the stuff you find in supermarkets. The rough guide below has a homemade paste, but you can substitute a store-bought one and, of course, you can substitute meat for the tofu if you like. You can use vegetable or coconut oil. (The base recipe was copied from an article while waiting in a dental office a decade ago.)
Phat Phrik Khing
Ingredients
° 6 dried guajillo, California, or pasilla chilies, stemmed and seeded
° 6 medium garlic cloves, roughly chopped
° 2 medium shallots, peeled and roughly chopped
° a few red Thai bird chilies, roughly chopped - I like a mild heat so 2 or 3 for me. If you are into heat use more!
° 1 large bunch cilantro stems, roughly chopped
° 3 makrut (kaffir) lime leaves fresh or dried, discard the hard central stem, leaves roughly chopped
° 1 stalk fresh lemongrass, bottom 4 to 5 inches only, discard the tough outer leaves and slice the core thinly
° 1 inch-ish knob of galangal, peeled
° 1/2 teaspoon freshly ground pepper
° Kosher salt
° 3 tablespoons vegetable oil or coconut oil.
° 1 340g (12 ounce) block of firm tofu, cut into 1 by 1 by 1/2 inch pieces, pressed firmly between paper towels to dry
° 1 pound green beans, trimmed and cut into one and a half inch lengths
° 1 tablespoon sugar
° 1 tablespoon soy sauce
° Your favorite steamed rice for serving
Technique
(homemade sauce)
° Place chilies in bowl and cover with boiling water. Cover the bowl and set aside for 10 minutes.
° Put the garlic, shallots, chilies, cilantro, lime leaves, lemongrass, galangal, ground pepper, and about a tablespoon of salt in a mortar and pestle and pound away 'till you get a rough paste. Drain chilies, add to mortar, and continue pounding until well distributed - it doesn't need to be super smooth.
(commercial sauce)
° Use enough red curry paste - about 8 to 12 ounces depending on your taste and the amount of tofu and beans you have
____
° Heat a tablespoon of oil in a wok over medium-high heat until shimmering. Add tofu, spread into a single layer, and cook, occasionally shaking pan gently, until crisp on one side, (3 or 4 minutes).
° Flip tofu and continue cooking until second side is crisp. Transfer to a bowl and set aside.
° Add a tablespoon of oil to the wok and crank heat to high. When the oil is hot - close to smoking - add beans and cook, stirring and tossing occasionally, until blistered and tender (2 to 4 minutes depending on how hot the oil is and how brave/crazy you are). Transfer to bowl with tofu.
° Add the remaining tablespoon of oil to your wok and return to high heat. When it is hot, add all of the curry paste and cook, stirring constantly, until sizzling (about 1 minute). Return tofu and beans to the pan, along with the sugar and soy sauce. Stir and toss to combine and coat the tofu and beans in curry paste. Season to taste.
° Serve immediately with steamed rice.