I was asked to put together a guest post for the blog of my friend Juliette's organization, the Vibrant Data Project. I'll write more about them later - they're interested in data and our personal rights. I decided to write a short piece on information and the education we need to be literate in the current age. It is a rich subject and I have only scratched its surface.
Here is the draft - it also appears on the wethedata blog under a more clever title.
The clock told me twenty minutes had gone by, but something was very wrong. I was expecting the wonderful odor that comes with the Maillard reaction - that dance of chemistry between amino acids and reducing sugars that brings out many of the rich flavors we love in roasting and barbecuing. The poor sweet potato fries were warm, but the reaction had mostly failed. How could this be?
I had set the oven for 400°F - hot enough to trigger the reaction, but not so hot as to ruin the expensive olive oil. A quick check of the oven with an infrared thermometer gave a clue. Even though the controls told me it was regulating at 400°F, it was only 310°F around the measuring probe and 280°F on the pan. I cranked the oven to 550°F and watched it like a hawk so I wouldn’t burn my fries. In the end they weren’t great and I ordered a new temperature sensor for the oven with overnight delivery.
There is a common notion that suggests a progression from data to information to knowledge and finally to wisdom. The “data” in this case would have been the temperature reading on the oven, the signal that it was regulating and the cooking time. In fact there was some hidden information - namely the oven had stopped measuring temperature accurately enough for my purposes. The “data” had additional information attached. I didn’t have a rich enough understanding of this simple system when I closed the oven door.
My sport is physics - the simplest and most basic science. The idea is that you watch Nature and ask very carefully phrased questions, and by observation and experimentation you get an answer. You may not like what she is telling you, but if you were careful at every step of the process and asked a clean enough question, someone else will be able to repeat what you did - often taking a somewhat different path to the same question - and, mirabile dictu, they get the same response. Getting all of the bits right can be fantastically difficult. You have to question everything you do, as it is very easy to fool yourself. Everything is suspect. If you must use data, information, knowledge and wisdom, perhaps the order should be changed: you need wisdom to school yourself and acquire knowledge of the system you are studying, so you can understand the fundamental information sources and manipulations well enough to know whether any of the upfront information deserves to be called “data” in your community.
None of your measurements will be black or white. Everything will have some error, and this turns out to be fundamental. So that you don’t fool yourself, you calculate the size of these errors. Multiple measurements mean you need to understand how to calculate an error for your system - you need to know how these errors propagate through your calculations. And finally you need to publish your result with a confidence level. This gives other physicists an idea of how good your result is - how well it expresses what Nature has told you. And what you have said isn’t really taken seriously until others can repeat it.
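For the curious, here is a minimal sketch of what "errors propagate through your calculations" means for the simplest case: two independent measurements combined by addition or multiplication. This is standard propagation-of-error arithmetic, not something from the original post, and the function names and numbers are mine.

```python
import math

def add_with_error(a, da, b, db):
    """Sum of two independent quantities: absolute errors add in quadrature."""
    return a + b, math.sqrt(da**2 + db**2)

def multiply_with_error(a, da, b, db):
    """Product of two independent quantities: relative errors add in quadrature."""
    p = a * b
    rel = math.sqrt((da / a)**2 + (db / b)**2)
    return p, abs(p) * rel

# An invented example: a length of 4.0 ± 0.1 m times a width of 3.0 ± 0.2 m.
area, darea = multiply_with_error(4.0, 0.1, 3.0, 0.2)
print(f"area = {area:.2f} ± {darea:.2f} m^2")
```

Note the error on the product is dominated by the less precise measurement - one reason physicists obsess over the weakest link in a measurement chain.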
A physicist will use the term “data”, but it is very contextual. It means information produced by a process that is solidly understood. Without that understanding it is just information, carrying the baggage that whatever is attached to it is not well understood and could very well be wrong.
We’re entering this wonderful period where there is a lot of information online and, through the efforts of organizations and citizens alike, it is rapidly expanding. Unfortunately the accuracy of large amounts of it is not well understood. It is very useful - no - it is critical - that those of us who produce and use this information have a basic understanding of its accuracy. When you mash up information from a variety of sources, can you state your confidence level numerically?
Some bits of information are relatively straightforward and are generally measured to good enough accuracy for your purpose; others aren’t. There need to be mechanisms for you to find this out.
Anyone dealing with this sort of information needs to become educated in its manipulation. I’ve been recommending that the requirements for high school graduation include a year of probability and statistics. If you are a bit rusty in this area, it is part of your responsibility as someone who interprets these information streams to learn on your own. Fortunately it isn’t difficult and good resources exist. Over the past year I tutored a young woman who has only a high school education, but she is now more adroit at the sport of statistics than many MBAs and computer scientists I’ve worked with over the years.
Many organizations worry about buzzword compliance, and doing “big data” is something that seems to echo everywhere. I’ve had the good fortune to examine a few projects and it is immediately clear that some have failed badly, as the people who architected them, and those who use them, have a poor understanding of the sources of error and their propagation.
There is great potential out there, but there is also great potential for making an utter fool of yourself and your organization. Good citizenship in this emerging world demands a new sort of literacy, so if you aren’t there yet I encourage you to cowboy or cowgirl up and learn the language.
There is much else to talk about and I promised to keep this short, but there’s one more thing.
The science behind understanding climate change has been excellent. Bits and pieces of it are simple physics, but much of it is part of a very complex system that is exceedingly difficult to sort out. Dozens of teams have approached it in their own way and, although individual confidence levels are often not fantastic, one can estimate a global confidence level for the community and it is much better than almost any medical test you can imagine.
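The claim that dozens of modest individual results can add up to a much better global confidence level can be sketched with inverse-variance weighting, a standard way to combine independent measurements of the same quantity. The numbers below are invented purely for illustration:

```python
import math

def combine(results):
    """Combine independent (value, sigma) results into one estimate.

    Each result is weighted by 1/sigma^2; the combined sigma is always
    smaller than the best individual sigma.
    """
    weights = [1.0 / s**2 for _, s in results]
    mean = sum(w * v for (v, _), w in zip(results, weights)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))
    return mean, sigma

# Three hypothetical teams measuring the same quantity, none very precise.
teams = [(2.9, 0.4), (3.2, 0.5), (3.0, 0.3)]
mean, sigma = combine(teams)
print(f"combined: {mean:.2f} ± {sigma:.2f}")  # tighter than any single result
```

Real climate syntheses are vastly more sophisticated than this, of course - correlated errors alone make naive weighting wrong - but the basic intuition that independent lines of evidence reinforce one another is the same.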
Sadly the public is largely illiterate when it comes to understanding this point, and any scientist knows enough not to project perfect confidence. The result has been tragic. Certain interests have been able to manipulate public opinion and force a political stalemate in some countries during a very critical, high-leverage time. This leads me to something else the information community and, in particular, the scientific community needs to understand.
Storytelling is something that appears to be fundamentally important to many people. Scientists, including this author, are terrible storytellers and now that information and science are important elements that feed into policy, things break. Storytelling can be used for good or for evil. Those of us who play with information need to gain skills and, perhaps more importantly, partner with those who understand this art. The stories need to be solid - they have to trace back to a fundamental confidence level - but if you mean to change the world you need to be speaking in the most direct voice.
This is a brief introduction that skips much of the detail. For example, people are fantastic pattern recognition engines and tend to find patterns where none exist. I've touched on it in several earlier posts, but need to go much deeper in future posts, as apophenia is deeply wired into the human species and is central to how we filter sensory information and perceive the world.1 I've had a look at a few "big data" operations in the corporate world. Some appear to be well thought out while others are complete disasters. Over-determination is common and is often used to support pre-conceived notions. Visualization often is substituted for rigor - hey - if it looks good, it must be right, eh?
Yesterday I was thinking a bit about education upon reading a post by another reader of this group. It wasn't central to his approach, but I found myself thinking about the Malvina Reynolds classic, Little Boxes. Rather than hunt through my ancient library I searched for a performance to remind myself of some of the lyrics and found this delightful performance by the Canadian group Walk off the Earth.
A sweet version of Malvina Reynolds' "Little Boxes" by the Canadian group Walk off the Earth
For completeness here is Malvina who, as Pete Seeger observed, is just like everyone else except she woke up and wrote a new song every morning before breakfast...
I love it when a serious organization has a bit of a laugh and dresses it in formal clothing - here the New England Journal of Medicine has a very unscientific piece on one of my favorite treats. There is real evidence that dark chocolate in small quantities can be very good for you, but this is a great example of passing around low quality information with an inaccurate conclusion that happens to justify what all of us want to see.
So for your chocolate urges here is a nice recipe I've been using for years. It is a riff on a recipe whose source I no longer know. I currently like to make it with a shortbread crust with chocolate mixed into the crust, but here is an easier version with a standard chocolate-wafer-cookie crust. For bonus points you can top it with shaved bittersweet chocolate.
Simple, But Rich Chocolate Cheesecake
° about two dozen Nabisco Chocolate Wafer Cookies (the 9 ounce package)
° 1 tablespoon of white cane sugar
° 1/4 cup (a half stick) of butter, melted
° 275 grams high quality 70% bittersweet chocolate, chopped. I have been using Lindt 70% or the large (something under 10 ounces) Scharffen Berger bar.
° 4 eight ounce packages of cream cheese warmed to room temperature
° 275 grams white cane sugar (about 1-1/4c plus 2 tablespoons)
° 30 grams (1/4 cup) unsweetened cocoa powder - Hershey's works well (Hershey's cocoa powder is surprisingly good and inexpensive)
° 4 large eggs
° preheat oven to 350°F and butter a 9" springform pan
° process the cookies in a food processor until finely ground and blend in the sugar. Add the butter and process until blended.
° press the crumbs into the bottom, but not the sides of the pan.
° bake until it starts to set - about five minutes
° melt the chopped chocolate in a double boiler - remove the bowl and allow to cool to a point where it is slightly warm, but still pourable.
° in the food processor blend the cream cheese, sugar and cocoa powder until smooth. Blend in the eggs one at a time.
° mix in the slightly warm chocolate and pour the filling onto the crust in the springform pan.
° bake at 350°F until the center is just set and starts to appear dry - about an hour for me
° cool for about 10 minutes and run a knife around the edges to loosen the cake.
° refrigerate at least six hours.
1 Visual and auditory apophenia exist and a case can be made for the other senses.