In Japan there is even a word for the limits on innovative thinking. Taga, which literally means the metal hoops that keep a tight hold on the wooden boards of a barrel, is used to describe the current state of Japanese innovation. Taga is what causes organizations to decide unconsciously and automatically what is possible and what is not, based on current circumstances rather than future predictions, hopes or opportunities. It completely stops a company from adopting a positive attitude towards any change or new idea. Taga is usually fostered by tacit agreement to, or an unspoken understanding of, customary rules or organizational paradigms within a company. When new people join a company (usually in the hope that they will bring new ideas), they tend to quickly and unconsciously become accustomed to thinking along the lines of the existing organizational paradigm. This means it can be extremely difficult for a company to notice when taga is limiting creativity and the implementation of new ideas within its own organization.
Spectrum: If we could turn now to the subject of big data, a theme that runs through your remarks is that there is a certain fool’s gold element to our current obsession with it. For example, you’ve predicted that society is about to experience an epidemic of false positives coming out of big-data projects.
Michael Jordan: When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.
Spectrum: How so?
Michael Jordan: In a classical database, you have maybe a few thousand people in it. You can think of those as the rows of the database. And the columns would be the features of those people: their age, height, weight, income, et cetera.
Now, the number of combinations of these columns grows exponentially with the number of columns. So if you have many, many columns—and we do in modern databases—you’ll get up into millions and millions of attributes for each person.
Now, if I start allowing myself to look at all of the combinations of these features—if you live in Beijing, and you ride a bike to work, and you work in a certain job, and are a certain age—what’s the probability you will have a certain disease or you will like my advertisement? Now I’m getting combinations of millions of attributes, and the number of such combinations is exponential; it gets to be the size of the number of atoms in the universe.
Those are the hypotheses that I’m willing to consider. And for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. If I just look at all the people who have a heart attack and compare them to all the people that don’t have a heart attack, and I’m looking for combinations of the columns that predict heart attacks, I will find all kinds of spurious combinations of columns, because there are huge numbers of them.
So it’s like having billions of monkeys typing. One of them will write Shakespeare.
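The effect Jordan describes can be reproduced on pure noise. The following is a minimal sketch, not anything from the interview: the function name, the table size and the rule family ("the outcome is 1 exactly when two columns differ") are all hypothetical choices made for illustration. Every column and the outcome are independent coin flips, so any rule that fits perfectly does so by chance alone.

```python
import random
from itertools import combinations


def find_spurious_rules(n_people=8, n_columns=200, seed=0):
    """Search pure noise for two-column rules that fit the outcome perfectly.

    Returns (rules tested, list of perfect rules, columns, outcome).
    All sizes are illustrative assumptions, not values from the interview.
    """
    rng = random.Random(seed)
    # Pure noise: every feature and the outcome are independent coin flips.
    columns = [[rng.randint(0, 1) for _ in range(n_people)]
               for _ in range(n_columns)]
    outcome = [rng.randint(0, 1) for _ in range(n_people)]

    tested = 0
    perfect = []
    for i, j in combinations(range(n_columns), 2):
        tested += 1
        # Hypothetical rule: outcome is 1 exactly when columns i and j differ.
        prediction = [a ^ b for a, b in zip(columns[i], columns[j])]
        if prediction == outcome:
            perfect.append((i, j))
    return tested, perfect, columns, outcome


if __name__ == "__main__":
    tested, perfect, _, _ = find_spurious_rules()
    print(f"rules tested: {tested}")
    print(f"rules that fit the random outcome perfectly: {len(perfect)}")
    # Each rule matches all 8 people by chance with probability 2**-8.
    print(f"expected by chance alone: {tested / 2**8:.1f}")
    # Subsets of only 300 yes/no columns already outnumber a commonly quoted
    # rough estimate (10**80) of the atoms in the observable universe.
    print(2**300 > 10**80)
```

With more people in the table each individual rule is less likely to fit by accident, but the number of candidate rules grows far faster as columns are added, which is the imbalance the answer above points to.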
Spectrum: Do you think this aspect of big data is currently underappreciated?
Michael Jordan: Definitely.
Spectrum: What are some of the things that people are promising for big data that you don’t think they will be able to deliver?
Michael Jordan: I think data analysis can deliver inferences at certain levels of quality. But we have to be clear about what levels of quality. We have to have error bars around all our predictions. That is something that’s missing in much of the current machine learning literature.
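One common way to attach error bars to an estimate is the percentile bootstrap. The sketch below is an illustration only: Jordan does not name a method here, and the function name, the resample count and the example data (70 correct predictions out of 100) are assumptions made up for the example.

```python
import random
import statistics


def bootstrap_interval(values, stat=statistics.mean, n_resamples=2000,
                       level=0.95, seed=0):
    """Percentile-bootstrap interval for stat(values).

    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(values, k=n))
                       for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lo = estimates[int(alpha * n_resamples)]
    hi = estimates[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return stat(values), lo, hi


if __name__ == "__main__":
    # Hypothetical data: 1 = prediction was right, 0 = prediction was wrong.
    correct = [1] * 70 + [0] * 30
    point, lo, hi = bootstrap_interval(correct)
    print(f"accuracy {point:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the point estimate makes it visible when a headline number rests on too little data to support the claim.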
Some port areas have extremely high pollution levels caused by ships and diesel trucks. In some areas, regulation is beginning to have an effect, but real change demands different energy sources. Siemens is installing a trolley-like catenary system for diesel-electric hybrids in Southern California at the Los Angeles and Long Beach ports. Their animation offers an explanation.
Ultra high energy cosmic rays create a shower of particles when they encounter the Earth's atmosphere. The shower is large enough that distributed detector arrays are used to detect them. You need very large arrays and/or a lot of time to obtain a meaningful sample of events.
Greg noted an interesting approach that would use smartphones. The camera sensor can detect some types of shower particles if they happen to hit it. Of course the probability is small and you couldn't tell much from single events, but smartphones know where they are and what time it is and have a link to the Internet. There are also a lot of them. The scheme is to use them when they're idle but plugged into a power source (so they don't drain the battery). When they detect energy being deposited in the camera sensor, they send a report with the type of phone, time, place and rough amount of energy. The individual points are mapped by the experimenters, who look for shower footprints that are correlated in time and space.
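The search for time- and space-correlated footprints can be sketched as a simple coincidence finder. This is a rough illustration under stated assumptions, not the experimenters' actual reconstruction: the `Hit` record, the flat local map in kilometres, and the window, radius and phone-count thresholds are all hypothetical placeholders.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Hit:
    phone_id: str
    time_s: float  # report timestamp, in seconds
    x_km: float    # position on a flat local map, in km (assumed)
    y_km: float


def find_coincidences(hits, window_s=1e-3, radius_km=1.0, min_phones=3):
    """Group hits that fall within window_s and radius_km of an anchor hit.

    A group is kept only if it contains at least min_phones distinct phones.
    All thresholds are placeholder values chosen for illustration.
    """
    ordered = sorted(hits, key=lambda h: h.time_s)
    used = set()
    groups = []
    for idx, anchor in enumerate(ordered):
        if idx in used:
            continue
        members = [idx]
        for k in range(idx + 1, len(ordered)):
            other = ordered[k]
            if other.time_s - anchor.time_s > window_s:
                break  # sorted by time, so no later hit can match
            near = hypot(other.x_km - anchor.x_km,
                         other.y_km - anchor.y_km) <= radius_km
            if k not in used and near:
                members.append(k)
        if len({ordered[m].phone_id for m in members}) >= min_phones:
            used.update(members)
            groups.append([ordered[m] for m in members])
    return groups


if __name__ == "__main__":
    reports = [
        Hit("a", 100.0000, 0.0, 0.0),
        Hit("b", 100.0002, 0.3, 0.1),
        Hit("c", 100.0005, 0.2, 0.4),
        Hit("d", 250.0000, 0.1, 0.1),    # nearby, but at a different time
        Hit("e", 100.0003, 50.0, 50.0),  # same instant, but far away
    ]
    for group in find_coincidences(reports):
        print(sorted(h.phone_id for h in group))
```

Isolated hits such as "d" and "e" are discarded, which is why single events say little and the value lies in having many phones close together.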
A neat idea, but it would take a lot of cooperation to build a meaningful array. The problem is more social than technical in nature at this point.
Observing Ultra-High Energy Cosmic Rays with Smartphones
Daniel Whiteson (1), Michael Mulhearn (2), Chase Shimmin (1), Kyle Brodie (1), and Dustin Burns (2)
(1) Department of Physics and Astronomy, University of California, Irvine, CA 92697
(2) Department of Physics, University of California, Davis, CA
We propose a novel approach for observing cosmic rays at ultra-high energy (> 10^18 eV) by repurposing the existing network of smartphones as a ground detector array. Extensive air showers generated by cosmic rays produce muons and high-energy photons, which can be detected by the CMOS sensors of smartphone cameras. The small size and low efficiency of each sensor is compensated by the large number of active phones. We show that if user adoption targets are met, such a network will have significant observing power at the highest energies.
The source of ultra-high energy cosmic rays (UHECR), those with energy above 10^18 eV, remains a puzzle even many decades after their discovery, as does the mechanism behind their acceleration. Their high energy leaves them less susceptible to bending by magnetic fields between their source and the Earth, making them excellent probes of the cosmic accelerators which produce them [1, 2]. But the mechanism and location of this enormous acceleration is still not understood, despite many theoretical conjectures [3–6].
When incident on the Earth’s atmosphere, UHECRs produce extensive air showers, which can be detected via the particle flux on the ground, the fluorescence in the air, or the radio and acoustic signatures. A series of dedicated detectors [7–9] have detected cosmic rays at successively higher energies, culminating in observation up to 3 × 10^20 eV. The flux of particles drops precipitously above 10^18 eV, due to the suppression via interaction with the cosmic microwave background [10, 11], making observation of these particles challenging.
To accumulate a sufficient number of observed showers requires either a very long run or a very large area. Constructing and maintaining a new detector array with a large effective area presents significant obstacles. Current arrays with large, highly-efficient devices (Auger, AGASA) cannot grow dramatically larger without becoming much more expensive. Distributed detector arrays with small, cheaper devices (ERGO, etc.) have the potential to grow very large, but have not achieved the size and density required to probe air showers, potentially due to the organizational obstacles of production, distribution and maintenance of their custom-built devices.
It has been previously shown that smartphones can detect ionizing radiation [15, 16]. In this paper, we demonstrate that a dense network of such devices has power sufficient to detect air showers from the highest energy cosmic rays. We measure the particle-detection efficiency of several popular smartphone models, which is necessary for the reconstruction of the energy and direction of the particle initiating the shower. With sufficient user adoption, such a distributed network of devices can observe UHECRs at rates at least comparable to conventional cosmic ray observatories. Finally, we describe the operating principles, technical design and expected sensitivity of the CRAYFIS (Cosmic RAYs Found In Smartphones) detector array. Preliminary applications for Android and iOS platforms are available for testing.