A brief introduction to the math recognition problem and automatic math recognition using modern artificial intelligence and pattern recognition methods. Includes a call for data. About 14 minutes.
(C) 2018 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
Sabine Hossenfelder’s Lost in Math: How Beauty Leads Physics Astray (Basic Books, June 2018) is a critical account of the disappointing progress in fundamental physics, primarily particle physics and cosmology, since the formulation of the “standard model” in the 1970’s. It focuses on the failure to find new physics at CERN’s $13.25 billion Large Hadron Collider (LHC) and many questionable predictions that super-symmetric particles, hidden dimensions, or other exotica beloved of theoretical particle physicists would be found at LHC when it finally turned on. In many ways, this lack of progress in fundamental physics parallels and perhaps underlies the poor progress in power and propulsion technologies since the 1970s.
The main premise of Lost in Math is that theoretical particle physicists like the author have been led astray by an unscientific obsession with mathematical “beauty” in selecting and also refusing to abandon theories, notably super-symmetry (usually abbreviated as SUSY in popular physics writing), despite an embarrassing lack of evidence. The author groups together several different issues under the rubric of “beauty” including the use of the terms beauty and elegance by theoretical physicists, at least two kinds of “naturalness,” the “fine tuning” of the constants in a theory to make it consistent with life, the desire for simplicity, dissatisfaction with the complexity of the standard model (twenty-five “fundamental” particles and a complex Lagrangian that fills two pages of fine print in a physics textbook), doubts about renormalization (an ad hoc procedure for removing otherwise troubling infinities) in Quantum Field Theory (QFT), and questions about “measurement” in quantum mechanics. Although I agree with many points in the book, I feel the blanket attack on “beauty” is too broad, conflating several different issues, and misses the mark.
In Defense of “Beauty”
As the saying goes, beauty is in the eye of the beholder. The case for simplicity, or more accurately falsifiability, in mathematical models rests on a sounder, more objective basis than beauty, however. In many cases a complex model with many terms and adjustable parameters can fit many different data sets. Some models are highly plastic. They can fit almost any data set, not unlike the way saran wrap can fit almost any surface. These models are wholly unfalsifiable.
A mathematical model which can match any data set cannot be disproven. It is not falsifiable. A theory that predicts everything, predicts nothing.
Some models are somewhat plastic, able to fit many but not all data sets, not unlike a rubber sheet. They are hard to falsify — somewhat unfalsifiable. Some models are quite rigid, like a solid piece of stone fitting into another surface. These models are fully falsifiable.
A simple well known example of this problem is a polynomial with many terms. A polynomial with enough terms can match any data set. In general, the fitted model will fail to extrapolate, to predict data points outside the domain of the data set used in the model fitting (the training set in the terminology of neural networks for example). The fitted polynomial model will frequently interpolate, predict data points within the domain of the data set used in the model fitting — points near and in-between the training set data points, correctly. Thus, we can say that a polynomial model with enough terms is not falsifiable in the sense of the philosopher of science Karl Popper because it can fit many data sets, not just the data set we actually have (real data).
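The behavior described above can be reproduced in a few lines. The following is a minimal sketch, not a rigorous study: it assumes a made-up training set of eight samples of sin(x) on the interval [0, 3], and it uses Lagrange interpolation as one convenient way to build a polynomial with exactly as many terms as there are data points. The function name `lagrange_fit` and all of the numbers are illustrative choices of mine.

```python
import math

def lagrange_fit(xs, ys):
    """Return the polynomial of degree len(xs) - 1 that passes through every point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Hypothetical training set: 8 evenly spaced samples of sin(x) on [0, 3].
train_x = [3.0 * k / 7 for k in range(8)]
train_y = [math.sin(x) for x in train_x]
model = lagrange_fit(train_x, train_y)

# Error on the training points, at a point inside the training domain,
# and at a point far outside the training domain.
fit_error = max(abs(model(x) - y) for x, y in zip(train_x, train_y))
interp_error = abs(model(1.5) - math.sin(1.5))
extrap_error = abs(model(9.0) - math.sin(9.0))

# The same machinery fits arbitrary made-up values just as exactly,
# which is the sense in which the model is hard to falsify.
fake_y = [5.0, -3.0, 8.0, 0.0, -7.0, 2.0, 9.0, -1.0]
fake_model = lagrange_fit(train_x, fake_y)
fake_fit_error = max(abs(fake_model(x) - y) for x, y in zip(train_x, fake_y))

print("max error on training points:", fit_error)
print("error at x = 1.5 (interpolation):", interp_error)
print("error at x = 9.0 (extrapolation):", extrap_error)
print("max error fitting made-up data:", fake_fit_error)
```

In this sketch the fit to the training points and the interpolation error are tiny, while the extrapolation error is larger than the entire range of the sine function by orders of magnitude. The fit to the made-up values is equally exact, so a perfect fit by itself says nothing about whether the model is right.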
This problem with complex mathematical models was probably first encountered with models of planetary motion in antiquity, the infamous epicycles of Ptolemy and his predecessors in ancient Greece and probably Babylonia/Sumeria (modern Iraq). Pythagoras visited both Babylonia and Egypt. The early Greek accounts of his life suggest he brought back the early Greek math and astronomy from Babylonia and Egypt.
Early astronomers, probably first in Babylonia, attempted to model the motion of Mars and other planets through the Zodiac as uniform circular motion around a stationary Earth. This was grossly incorrect in the case of Mars which backs up for about two months about every two years. Thus the early astronomers introduced an epicycle for Mars. They speculated that Mars moved in uniform circular motion around a point that in turn moved in uniform circular motion around the Earth. With a single epicycle they could reproduce the biennial backing up with some errors. To achieve greater accuracy, they added more and more epicycles, producing an ever more complex model that had some predictive power. Indeed the state of the art Ptolemaic model in the sixteenth century was better than Copernicus’ new heliocentric model which also relied on uniform circular motion and epicycles.
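A single epicycle is easy to simulate. The sketch below is my own illustration rather than a reconstruction of any historical model: it assumes round modern values for Mars (a deferent period of 1.88 years, an epicycle period of one year, and an epicycle radius of 0.656 times the deferent radius) and counts the days on which the planet's longitude as seen from Earth decreases.

```python
import math

def geocentric_longitude(t_years, epicycle_radius,
                         deferent_period=1.88, epicycle_period=1.0,
                         deferent_radius=1.0):
    """Longitude of the planet seen from Earth for one deferent plus one epicycle."""
    a = 2 * math.pi * t_years / deferent_period   # angle of the epicycle center
    b = 2 * math.pi * t_years / epicycle_period   # angle of the planet on the epicycle
    x = deferent_radius * math.cos(a) + epicycle_radius * math.cos(b)
    y = deferent_radius * math.sin(a) + epicycle_radius * math.sin(b)
    return math.atan2(y, x)

def retrograde_days(total_days, epicycle_radius):
    """Count the days on which the longitude moves backward."""
    count = 0
    prev = geocentric_longitude(0.0, epicycle_radius)
    for day in range(1, total_days + 1):
        cur = geocentric_longitude(day / 365.25, epicycle_radius)
        # Wrap the daily change into [-pi, pi) so the jump at +/-180 degrees is ignored.
        delta = (cur - prev + math.pi) % (2 * math.pi) - math.pi
        if delta < 0:
            count += 1
        prev = cur
    return count

# One full cycle of the two circular motions, roughly 780 days with these periods.
cycle_days = round(365.25 / (1 - 1 / 1.88))

days_backward_no_epicycle = retrograde_days(cycle_days, epicycle_radius=0.0)
days_backward_with_epicycle = retrograde_days(cycle_days, epicycle_radius=0.656)

print("cycle length in days:", cycle_days)
print("retrograde days, plain circle:", days_backward_no_epicycle)
print("retrograde days, one epicycle:", days_backward_with_epicycle)
```

With the plain circle the planet never backs up. With one epicycle and these assumed values it backs up for a bit over two months once per cycle of a little more than two years, which is the qualitative behavior the early astronomers needed to reproduce.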
The Ptolemaic model of planetary motion is difficult to falsify because one can keep adding more epicycles to account for discrepancies between the theory and observation. It also has some predictive power. It is an example of a “rubber sheet” model, not a “saran wrap” model.
In the real world, falsifiability is not a simple binary criterion. A mathematical model is not either falsifiable and therefore good or not falsifiable and therefore bad. Rather falsifiability falls on a continuum. In general, extremely complex theories are hard to falsify and not predictive outside of the domain of the data used to infer (fit) the complex theory. Simpler theories tend to be easier to falsify and if correct are sometimes very predictive as with Kepler’s Laws of Planetary Motion and subsequently Newton’s Law of Gravitation, from which Kepler’s Laws can be derived.
Unfortunately, this experience with mathematical modeling is known but has not been quantified in a rigorous way by mathematicians and scientists. Falsifiability remains a slogan primarily used against creationists, parapsychologists, and other groups rather than a rigorous criterion to evaluate theories like the standard model, supersymmetry, or superstrings.
A worrying concern with the standard model with its twenty-five fundamental particles, complex two-page Lagrangian (mathematical formula), and seemingly ad hoc elements such as the Higgs particle and Kobayashi-Maskawa matrix is that it is matching real data entirely or in part due to its complexity and inherent plasticity, much like the historical epicycles or a polynomial with many terms. This concern is not just about subjective “beauty.”
Sheldon Glashow’s original formulation of what became the modern standard model was much simpler: it did not include the Higgs particle, did not include the charm, top, or bottom quarks, and lacked a number of other elements (S.L. Glashow (1961). “Partial-symmetries of weak interactions”. Nuclear Physics. 22 (4): 579–588.). Much as epicycles were added to the early theories of planetary motion, these elements were added on during the 1960’s and 1970’s to achieve agreement with experimental results and theoretical prejudices. In evaluating the seeming success and falsifiability of the standard model, we need to consider not only the terms that were added over the decades but also the terms that might have been added to salvage the theory.
Theories with symmetry have fewer adjustable parameters and are less plastic and flexible, and so less able to match the data regardless of what data is presented. This forms an objective but poorly quantified basis for intuitive notions of the “mathematical beauty” of symmetry in physics and other fields.
The problem is that although we can describe this known weakness of mathematical models, poor falsifiability or plasticity (at the extreme, an ability to fit any data set), qualitatively with words such as “beauty,” “symmetry,” or “simplicity,” we cannot yet express it in rigorous quantitative terms.
Big Science and Big Bucks
Much of the book concerns the way the Large Hadron Collider and its huge budget warped the thinking and research results of theoretical physicists, rewarding some like Nima Arkani-Hamed who could produce catchy arguments that new physics would be found at the LHC and encouraging many more to produce questionable arguments that super-symmetry, hidden dimensions or other glamorous exotica would be discovered. The author recounts how her Ph.D. thesis supervisor redirected her research to a topic “Black Holes in Large Extra Dimensions” (2003) that would support the LHC.
Particle accelerators and other particle physics experiments have a long history of huge cost and schedule overruns — which are generally omitted or glossed over in popular and semi-popular accounts. The not-so-funny joke that I learned in graduate school was “multiply the schedule by pi (3.14)” to get the real schedule. A variant was “multiply the schedule by pi for running around in a circle.” Time is money and the huge delays usually mean huge cost overruns. Often these have involved problems with the magnets in the accelerators.
The LHC was no exception to this historical pattern. It went substantially over budget and schedule before its first turn on in 2008, when around a third of the magnets in the multi-billion dollar accelerator exploded, forcing expensive and time consuming repairs (see CERN’s whitewash of the disaster here). LHC faced significant criticism over the cost overruns in Europe even before the 2008 magnet explosion. The reported discovery of the Higgs boson in 2012 has substantially blunted the criticism; one could argue LHC had to make a discovery. 🙂
The cost and schedule overruns have contributed to the cancellation of several accelerator projects including ISABELLE at the Brookhaven National Laboratory on Long Island and the Superconducting Super Collider (SSC) in Texas. The particle physics projects must compete with much bigger, more politically connected, and more popular programs.
The frequent cost and schedule overruns mean that pursuing a Ph.D. in experimental particle physics often takes much longer than advertised and is often quite disappointing as happened to large numbers of LHC graduate students. For theorists, the pressure to provide a justification for the multi-billion dollar projects is undoubtedly substantial.
While genuine advances in fundamental physics may ultimately produce new energy technologies or other advances that will benefit humanity greatly, the billions spent on particle accelerators and other big physics experiments are certain, here and now. The aging faculty at universities and senior scientists at the few research labs like CERN who largely control the direction of particle physics cannot easily retrain for new fields unlike disappointed graduate students or post docs in their twenties and early thirties. The hot new fields like computers and hot high tech employers such as Google are noted for their preference for twenty-somethings and hostility to employees even in their thirties. The existing energy industry seems remarkably unconcerned about alleged “peak oil” or climate change and empirically invests little if anything in finding replacement technologies.
Is there a way forward?
Sabine, who writes on her blog that she is probably leaving particle physics soon, offers some suggestions to improve the field, primarily focusing on learning about and avoiding cognitive biases. This reminds me a bit of the unconscious bias training that Google and other Silicon Valley companies have embraced in a purported attempt to fix their seeming avoidance of employees from certain groups — with dismal results so far. Responding rationally if perhaps unethically to clear economic rewards is not a cognitive bias and almost certainly won’t respond to cognitive bias training. If I learn that I am unconsciously doing something because it is in my economic interest to do so, will I stop?
Future progress in fundamental physics probably depends on finding new informative data that does not cost billions of dollars (for example, a renaissance of table top experiments), reanalysis of existing data, and improved methods of data analysis such as putting falsifiability on a rigorous quantitative basis.
This is an edited video of my presentation on “Automating Complex Data Analysis” to the Bay Area SAS Users Group (BASAS) on August 31, 2017 at Building 42, Genentech in South San Francisco, CA.
The demonstration of the Analyst in a Box prototype starts at 14:10 (14 minutes, 10 seconds). The demo is a video screen capture with high quality audio.
Unfortunately there was some background noise from a party in the adjacent room starting about 12:20 until 14:10 although my voice is understandable.
Complex data analysis attempts to solve problems with one or more inputs and one or more outputs related by complex mathematical rules, usually a sequence of two or more non-linear functions applied iteratively to the inputs and intermediate computed values. A prominent example is determining the causes and possible treatments for poorly understood diseases such as heart disease, cancer, and autism spectrum disorders where multiple genetic and environmental factors may contribute to the disease and the disease has multiple symptoms and metrics, e.g. blood pressure, heart rate, and heart rate variability.
Another example is macroeconomic models predicting employment levels, inflation, economic growth, foreign exchange rates and other key economic variables for investment decisions, both public and private, from inputs such as government spending, budget deficits, national debt, population growth, immigration, and many other factors.
A third example is speech recognition where a complex non-linear function somehow maps from a simple sequence of audio measurements — the microphone sound pressure levels — to a simple sequence of recognized words: “I’m sorry Dave. I can’t do that.”
State-of-the-art complex data analysis is labor intensive, time consuming, and error prone — requiring highly skilled analysts, often Ph.D.’s or other highly educated professionals, using tools with large libraries of built-in statistical and data analytical methods and tests: SAS, MATLAB, the R statistical programming language and similar tools. Results often take months or even years to produce, are often difficult to reproduce, difficult to present convincingly to non-specialists, difficult to audit for regulatory compliance and investor due diligence, and sometimes simply wrong, especially where the data involves human subjects or human society.
This talk discusses the current state-of-the-art in attempts to automate complex data analysis. It discusses widely used tools such as SAS and MATLAB and their current limitations. It discusses what the automation of complex data analysis may look like in the future, possible methods of automating complex data analysis, and problems and pitfalls of automating complex data analysis. The talk will include a demonstration of a prototype system for automating complex data analysis including automated generation of SAS analysis code.
I attended a “Machine Learning at Google” event at the Google Quad 3 building off Ellis in Mountain View last night (August 23, 2017). This seemed to be mostly a recruiting event for some or all of Google’s high profile Machine Learning/Deep Learning groups, notably the team responsible for TensorFlow.
Woman Opens Event
I had no trouble finding the registration table when I arrived and getting my badge. All the presentations seemed to run on time or nearly on time. There was free food, a cute bag with Google gewgaws, and plenty of seating (about 280 seats with attendance about 240 I thought).
The event invitation that I received was rather vague and it did not become clear this was a recruiting event until well into the event. It had the alluring title:
An Exclusive Invite | Machine Learning @ Google
Ooh, exclusive! Aren’t I special! Along with 240 other attendees as it turned out. 🙂
Andrew Zaldivar (see below) explicitly called it a recruiting event in the Q&A panel at the end. It would have been good to know this as I am not looking for a job at Google. That does not mean the event wasn’t interesting to me for other reasons, but Google and other companies should be up front about this.
Although I think the speakers were on a low platform, they weren’t up high enough for me to see them well, even though I was in the front. This was particularly true of Jasmine Hsu who was short. I managed to get one picture of her not fully or mostly obscured by someone’s head. Probably a higher platform for the presenters would have helped.
A good looking woman who seemed to be some sort of public relations or marketing person opened the event at 6:30 PM. She went through all the usual event housekeeping and played a slick Madison Avenue style video on the coming wonders of machine learning. Then she introduced the keynote speaker Ravi Kumar.
Ravi Kumar Keynote
Ravi was followed by a series of “lightning talks” on machine learning and deep learning at Google by Sandeep Tata, Heng-Tze Cheng, Ian Goodfellow, James Kunz, Jasmine Hsu, and Andrew Zaldivar.
The presentations tended to blur together. The typical machine learning/deep learning presentation is an extremely complex model that has been fitted to a very large data set. Giant companies like Google and Facebook have huge proprietary data sets that few others can match. The presenters tend to be very confident and assert major advances over past methods and often claim to match or exceed human performance. It is often impossible to evaluate these claims without access to both the huge data sets and vast computing power. People who try to duplicate the reported dramatic results with more modest resources often report failure.
The presentations often avoid the goodness-of-fit statistics, robustness, and overfitting issues that experts in mathematical modeling worry about with such complex models. A very complex model such as a polynomial with thousands of terms can always fit a data set but it will usually fail to extrapolate outside the data set correctly. Polynomials, for example, always blow up to plus or minus infinity as the largest power term dominates.
In fact one Google presenter mentioned a “training-serving skew” problem where the field data would frequently fail to match the training data used for the model. If I understood his comments, this seemed to occur almost every time, supposedly for different reasons for each model. This sounded a lot like the frequent failure of complex models to extrapolate to new data correctly.
Ravi Kumar’s keynote presentation appeared to be a maximum likelihood estimation (MLE) of a complex model of repeat consumption by users: how often, for example, a user will replay the same song or YouTube video. MLE is not a robust estimation method and it is vulnerable to outliers in the data, almost a given in real data, yet there seemed to be no discussion of this issue in the presentation.
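The sensitivity to outliers is easy to see in the simplest possible case. This is not the model from the keynote, which I do not have. It is a minimal sketch that assumes a Gaussian model, for which the maximum likelihood estimate of the mean is just the sample average, and compares it with the median, a standard robust alternative. The data values are made up.

```python
import statistics

# Hypothetical measurements clustered around 4.0.
clean = [3.9, 4.1, 4.0, 4.2, 3.8, 4.0, 4.1, 3.9, 4.0, 4.0]

# The same data with a single wild outlier, such as a logging glitch.
with_outlier = clean + [400.0]

# Under a Gaussian model the MLE of the mean is the sample average.
mle_clean = statistics.fmean(clean)
mle_dirty = statistics.fmean(with_outlier)

# The median is a robust estimate of location.
median_clean = statistics.median(clean)
median_dirty = statistics.median(with_outlier)

print("Gaussian MLE of the mean, clean data:", mle_clean)
print("Gaussian MLE of the mean, one outlier:", mle_dirty)
print("median, clean data:", median_clean)
print("median, one outlier:", median_dirty)
```

One bad point out of eleven moves the Gaussian estimate by a factor of about ten while the median does not move at all. How much a particular MLE suffers depends on the assumed distribution, which is why I would have liked to hear how the model in the keynote handles this.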
Often when researchers and practitioners from other fields that make heavy use of mathematical modeling such as statistics or physics bring up these issues, the machine learning/deep learning folks either circle the wagons and deny the issues or assert dismissively that they have the issues under control. Move on, nothing to see here.
Sandeep Tata
Heng-Tze Cheng
Ian Goodfellow on Deep Learning Research at Google
Jasmine Hsu on Robotics and Computer Vision
James Kunz
Andrew Zaldivar on SPAM Fighting with Machine Learning
Andrew Zaldivar introduced the Q&A panel for which he acted as moderator. Instead of having audience members take the microphone and ask their questions uncensored as many events do, he read out questions supposedly submitted by e-mail or social media.
Andrew Zaldivar Introduces the Panel
Q and A Panel
The Q&A panel was followed by a reception from 8-9 PM to “meet the speakers.” It was difficult to see how this would work with about thirty (30) audience members for each presenter. I did not stay for the reception.
I found the presentations interesting but they did not go into most of the deeper technical questions such as goodness-of-fit, robustness, and overfitting that I would have liked to hear about. I feel Google should have been clearer about the purpose of the event up front.
One of the most common arguments for learning math (or computer programming or chess or <insert your favorite subject here>) is that math teaches you to think. This argument has a long history of failing to convince skeptical students and adults especially where more advanced mathematics such as algebra and calculus is concerned.
The “math teaches you to think” argument has several problems. Almost any intellectual activity including learning many sports teaches you to think. Reading Shakespeare teaches you to think. Playing Dungeons and Dragons teaches you to think. What is so special about math?
Math teaches ways of thinking about quantitative problems that can be very powerful. As I have argued in a previous post, Why Should You Learn Mathematics?, mathematics is genuinely needed to make informed decisions about pharmaceuticals and medical treatments, finance and real estate, important public policy issues such as global warming, and other specialized but important areas. The need for mathematics skills and knowledge beyond the basic arithmetic level is growing rapidly due to the proliferation, use, and misuse of statistics and mathematical modeling in recent years.
Book Smarts Versus Street Smarts
However, most math courses and even statistics courses such as AP Statistics teach ways of thinking that do not work well or even work at all for many “real world” problems, social interactions, and human society.
This is not a new problem. One of Aesop’s Fables (Aesop lived circa 620–564 BC) is The Astronomer which tells the tale of an astronomer who falls into a well while looking up at the stars. The ancient mathematics of the Greeks, Sumerians, and others had its roots in ancient astronomy and astrology.
Proof of the Pythagorean Theorem from 1200 A.D.
Why does mathematical thinking often fail in the “real world?” Most mathematics education other than statistics teaches that there is one right answer which can be found by precise logical and mathematical steps. Two plus two is four and that is it. The Pythagorean Theorem is proven step by step by rigorous logic starting with Euclid’s Postulates and Definitions. There is no ambiguity and no uncertainty and no emotion.
If a student tries to apply this type of rigorous, exact thinking to social interactions, human society, even walking across a field where underbrush has obscured a well as in Aesop’s Fable of the Astronomer, the student will often fail. Indeed, the results can be disastrous as in the fable.
In fact, at the K-12 level and even college, liberal arts such as English literature, history, debate, and law do a much better job than math in teaching students the reality that in many situations there are many possible interpretations. The liberal arts deal with people, and even the most advanced mathematics has failed to duplicate the human mind.
In dealing with other people, we can’t read their minds. We have to guess (estimate) what they are thinking to predict what they may do in the future. We are often wrong. Mathematical models of human behavior generally don’t predict human behavior reliably. Your intuition from personal experience, learning history, and other generally non-quantitative sources is often better.
The problem is not restricted to human beings and human society. When navigating in a room or open field, some objects will be obscured by other objects or we won’t happen to be looking at them. Whether we realize it or not, we are making estimates — educated guesses — about physical reality. A bush might be just a bush or it might hide a dangerous well that one can fall into.
The Limits of Standard Statistics Courses
It is true that statistics courses such as AP Statistics and more advanced college and post-graduate statistics courses address these problems to some degree, unlike basic arithmetic, algebra, and calculus. The famous Bayes Theorem gives a mathematical framework for estimating the probability that a hypothesis is true given the data/observations/evidence. It allows us to make quantitative comparisons between competing hypotheses: just a bush versus a bush hiding a dangerous well.
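Here is the bush-versus-well comparison worked through with Bayes Theorem. Every number is invented for illustration: I assume that one bush in a hundred hides a well and that the evidence is sunken-looking ground next to the bush, which is much more likely if a well is there.

```python
def posterior_probability(prior, likelihood_if_true, likelihood_if_false):
    """Bayes Theorem for a hypothesis and its single alternative."""
    evidence_if_true = likelihood_if_true * prior
    evidence_if_false = likelihood_if_false * (1.0 - prior)
    return evidence_if_true / (evidence_if_true + evidence_if_false)

# Hypothetical numbers, chosen only to illustrate the calculation.
prior_well = 0.01            # assumed fraction of bushes that hide a well
p_sunken_given_well = 0.80   # assumed chance of sunken ground if a well is there
p_sunken_given_bush = 0.05   # assumed chance of sunken ground for a plain bush

posterior_well = posterior_probability(
    prior_well, p_sunken_given_well, p_sunken_given_bush)

print("probability of a hidden well before looking:", prior_well)
print("probability of a hidden well after seeing sunken ground:", posterior_well)
```

With these assumed numbers the probability of a hidden well rises from 1 percent to roughly 14 percent. That is still unlikely, but it is high enough that walking around the bush is the sensible choice. The answer is only as good as the prior and the likelihoods, which in the real world are themselves guesses.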
However, many students at the K-12 level and even college get no exposure to statistics or very little. How many students understand Bayes Theorem? More importantly, there are significant unknowns in the interpretation and proper application of Bayes Theorem to the real world. How many students or even practicing statisticians properly understand the complex debates over Bayes Theorem, Bayesian versus frequentist versus several other kinds of statistics?
All or nearly all statistics that most students learn is based explicitly or implicitly on the assumption of independent identically distributed random variables. These are cases like flipping a “fair” coin where the probability of the outcome is the same every time and is not influenced by the previous outcomes. Every time someone flips a “fair” coin there is the same fifty percent chance of heads and the same fifty percent chance of tails. The coin flips are independent. It does not matter whether the previous flip was heads or tails. The coin flips are identically distributed. The probability of heads or tails is always the same.
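A quick simulation shows what independent and identically distributed means in practice. This sketch uses Python's pseudo-random number generator as a stand-in for a fair coin. The seed and the number of flips are arbitrary choices of mine.

```python
import random

rng = random.Random(42)      # fixed seed so the run is repeatable
n_flips = 100_000

# True means heads. Every flip uses the same probability of 0.5.
flips = [rng.random() < 0.5 for _ in range(n_flips)]

# Identically distributed: the overall fraction of heads is close to one half.
p_heads = sum(flips) / n_flips

# Independent: the fraction of heads right after a head is also close to one half.
after_heads = [cur for prev, cur in zip(flips, flips[1:]) if prev]
p_heads_after_heads = sum(after_heads) / len(after_heads)

print("fraction of heads overall:", p_heads)
print("fraction of heads following a head:", p_heads_after_heads)
```

Both fractions come out close to one half, so knowing the previous flip tells you nothing about the next one.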
The assumption of independent, identically distributed random variables is accurate or very nearly accurate for flipping coins, most “fair” games of chance used as examples in statistics courses, radioactive decay, and some other natural phenomena. It is generally not true for human beings and human society. Human beings learn from experience and change over time. Various physical things in the real world also change over time.
Although statistical thinking is closer to the “real world” than many other commonly taught forms of mathematics, it still in practice deviates substantially from everyday experience.
Teaching Students When to Think Mathematically
Claims that math (or computer programming or chess or <insert your favorite subject here>) teaches thinking should be qualified with what kind of thinking is taught, what its strengths and weaknesses are, and what problems it is good for solving.
The image of a Latin proof of the Pythagorean Theorem with diagrams is from Wikimedia Commons and is in the public domain. The original source is a manuscript from 1200 A.D.
Mathematician with Calipers from The School of Athens fresco by Raphael (1509-1511)
Why should you learn mathematics? By mathematics, I am not referring to basic arithmetic: addition, subtraction, multiplication, division, and raising a number to a power — for example for an interest calculation in personal finance. There is little debate that in the modern world the vast majority of people need to know basic arithmetic to buy and sell goods and services and perform many other common tasks. By mathematics I mean more advanced mathematics such as algebra, geometry, trigonometry, calculus, linear algebra, and college level statistics.
I am not referring to highly specialized advanced areas of mathematics such as number theory or differential geometry generally taught after the sophomore year in college or in graduate school.
A number of educators such as Eloy Ortiz Oakley, the chancellor of California’s community colleges, have embraced a similar view, even arguing that abolishing the algebra requirement is a civil rights issue since some minority groups fail the algebra requirement at higher rates than white students. Yes, he did say it is a civil rights issue:
The second thing I’d say is yes, this is a civil rights issue, but this is also something that plagues all Americans — particularly low-income Americans. If you think about all the underemployed or unemployed Americans in this country who cannot connect to a job in this economy — which is unforgiving of those students who don’t have a credential — the biggest barrier for them is this algebra requirement. It’s what has kept them from achieving a credential.
At present, few jobs, including the much ballyhooed software development jobs, require more than basic arithmetic as defined above. For example, the famous code.org “What Most Schools Don’t Teach” video on coding features numerous software industry luminaries assuring the audience how easy software development is and how little math is involved. Notably Bill Gates at one minute and forty-eight seconds says: “addition, subtraction…that’s about it.”
Bill Gates’s assessment of the math required in software development today is largely true unless you are one of the few percent of software developers working on highly mathematical software: video codecs, speech recognition engines, gesture recognition algorithms, computer graphics for games and video special effects, GPS, Deep Learning, FDA drug approvals, and other exotic areas.
Thus, the question arises why people who do not use mathematics professionally ought to learn mathematics. I am not addressing the question of whether there should be a requirement to pass algebra to graduate high school or for a college degree such as a veterinary degree where there is no professional need for mathematics. The question is whether people who do not need mathematics professionally should still learn mathematics, whether it is required or not.
People should learn mathematics because they need mathematics to make informed decisions about their health care, their finances, public policy issues that affect them such as global warming, and engineering issues such as the safety of buildings, aircraft, and automobiles — even though they don’t use mathematics professionally.
The need to understand mathematics to make informed decisions is increasing rapidly with the proliferation of “big data” and “data science” in recent years: the use and misuse of statistics and mathematical modeling on the large, rapidly expanding quantities of data now being collected with extremely powerful computers, high speed wired and wireless networks, cheap data storage capacity, and inexpensive miniature sensors.
Health and Medicine
An advanced knowledge of statistics is required to evaluate the safety and effectiveness of drugs, vaccines, medical treatments and devices including widely used prescription drugs. A study by the Mayo Clinic in 2013 found that nearly 7 in 10 Americans (70%) take at least one prescription drug. Another study published in the Journal of the American Medical Association (JAMA) in 2015 estimated about 59% of Americans are taking a prescription drug. Taking a prescription drug can be a life and death decision as the horrific case of the deadly pain reliever Vioxx discussed below illustrates.
The United States and the European Union have required randomized clinical trials and detailed sophisticated statistical analyses to evaluate the safety and effectiveness of drugs, medical devices, and treatments for many decades. Generally, these analyses are performed by medical and pharmaceutical companies who have an obvious conflict of interest. At present, doctors and patients often find themselves outmatched in evaluating the claims for the safety and effectiveness of drugs, both new and old.
The FDA has instituted the FDA Adverse Event Reporting System (FAERS) for doctors and other medical professionals to report deaths and serious health problems, such as hospitalization, suspected of being caused by adverse reactions to drugs. In 2014, 123,927 deaths and 807,270 serious health problems were reported to FAERS. Of course, suspicion is not proof and a report does not necessarily mean the reported drug was the cause of the adverse event.
Vioxx (generic name rofecoxib) was a pain-killer marketed by the giant pharmaceutical company Merck (NYSE:MRK) between May of 1999 when it was approved by the United States Food and Drug Administration (FDA) and September of 2004 when it was withdrawn from the market. Vioxx was marketed as a “super-aspirin,” allegedly safer and implicitly more effective than aspirin and much more expensive, primarily to elderly patients with arthritis or other chronic pain. Vioxx was a “blockbuster” drug with sales peaking at about $2.5 billion in 2003 [1] and about 20 million users [2]. Vioxx probably killed between 20,000 and 100,000 patients between 1999 and 2004 [3].
Faulty blood clotting is thought to be the main cause of most heart attacks and strokes. Unlike aspirin, which lowers the probability of blood coagulation (clotting) and therefore heart attacks and strokes, Vioxx increased the probability of blood clotting and the probability of strokes and heart attacks by about two to five times.
Remarkably, Merck proposed and the FDA approved Phase III clinical trials of Vioxx with too few patients to show that Vioxx was actually safer than the putative 3.8 deaths per 10,000 patients rate (16,500 deaths per year according to a controversial study used to promote Vioxx) from aspirin and other non-steroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen (the active ingredient in Advil and Motrin), naproxen (the active ingredient in Aleve), and others.
The FDA guideline, Guideline for Industry: The Extent of Population Exposure to Assess Clinical Safety: For Drugs Intended for Long-Term Treatment of Non-Life-Threatening Conditions (March 1995), only required enough patients in the clinical trials to reliably detect a risk of death of about 0.5 percent (50 deaths per 10,000) in patients treated for six months or less (roughly equivalent to a one percent death rate for one year, assuming a constant risk level) and about 3 percent (300 deaths per 10,000) in patients treated for one year. The guideline recommended about 1,500 patients for six months or less and about 100 patients for at least one year, without giving supporting statistical power computations or assumptions in the guideline document.
The implicit death rate detection threshold in the FDA guideline was well above the risk from aspirin and other NSAIDs and at the upper end of the rate of cardiovascular “events” caused by Vioxx. The FDA did not tighten these requirements for Vioxx even though the only good reason for the drug was improved safety compared to aspirin and other NSAIDs. In general, the randomized clinical trials required by the FDA for drug approval have too few patients (insufficient statistical power, in statistics terminology) to detect these rare but deadly events [4].
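To give a rough feel for what “too few patients” means, the sketch below computes the chance that a trial sees even one fatal event at the putative NSAID rate of 3.8 deaths per 10,000 patients. This is a deliberately simplified illustration, not the FDA’s or Merck’s actual power calculation: it assumes every patient independently faces the same risk over the trial period, and the function names are hypothetical. Showing that a drug is safer than this baseline would take far more patients than merely observing one event.

```python
import math


def prob_at_least_one_event(rate, n_patients):
    """Probability of observing one or more events among n_patients,
    assuming each patient independently has the same event probability."""
    return 1.0 - (1.0 - rate) ** n_patients


def patients_needed(rate, confidence=0.95):
    """Smallest trial size with at least `confidence` probability of
    observing one or more events at the given per-patient rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


# Figures quoted in the text; treating the rate as the per-patient risk
# over the trial period is an assumption made for this illustration.
nsaid_rate = 3.8 / 10_000
trial_size = 1_500

p_seen = prob_at_least_one_event(nsaid_rate, trial_size)
n_95 = patients_needed(nsaid_rate)

print(f"Chance of seeing any death in {trial_size} patients: {p_seen:.1%}")
print(f"Patients needed for a 95% chance of seeing one death: {n_95}")
```

Under these assumptions a 1,500-patient trial would more often than not record no deaths at all at the baseline rate, and several thousand patients would be needed before a single event became likely.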
To this day, most doctors and patients lack the statistical skills and knowledge to evaluate the safety level that can be inferred from the FDA required clinical trials. There are many other advanced statistical issues in evaluating the safety and effectiveness of drugs, vaccines, medical treatments, and devices.
Finance and Real Estate
Mathematical models have spread far and wide in finance and real estate, often behind the scenes and invisible to casual investors. A particularly visible example is Zillow’s Zestimate of the value of homes, consulted by home buyers and sellers every day. Zillow is arguably the leading online real estate company. In March 2014, Zillow had over one billion page views, beating competitors Trulia.com and Realtor.com by a wide margin; Zillow has since acquired Trulia.
Zillow’s algorithm for valuing homes is proprietary and Zillow does not disclose the details and/or the source code. Zillow hedges by calling the estimate an “estimate” or a “starting point.” It is not an appraisal.
However, Zillow is large and widely used, claiming estimates for about 110 million homes in the United States. That is almost the total number of homes in the United States. This raises the question of whether it is so large and influential that it can effectively set the market price.
Zillow makes money by selling advertising to realty agents. Potential home buyers don’t pay for the estimates. Home sellers and potential home sellers don’t pay directly for the estimates either. This raises the question whether the advertising business model might have an incentive for a systematic bias in the estimates. One could argue that a lower valuation would speed sales and increase commissions for agents.
Zillow was recently sued in Illinois over the Zestimate by a homeowner (real estate lawyer Barbara Andersen 🙂) who claimed the estimate undervalued her home and therefore made it difficult to sell. The suit argues that the estimate is in fact an appraisal, despite claims to the contrary by Zillow, and therefore subject to Illinois state regulations regarding appraisals. Andersen has reportedly dropped this suit and moved on to a class-action lawsuit by home builders in Chicago, again alleging that the Zestimate is an appraisal and undervalues homes.
On the other hand, Zillow CEO Spencer Rascoff’s Seattle home reportedly sold for $1.05 million on Feb. 29, 2016, 40 percent less than the Zestimate of $1.75 million shown on its property page a day later (March 1, 2016). 🙂
As in the example of Vioxx and other FDA drug approvals, it is actually a substantial statistical analysis project to independently evaluate the accuracy of Zillow’s estimates. What do you do if Zillow substantially undervalues your home when you need to sell it?
Murky mathematical models of the value of mortgage backed securities played a central role in the financial crash in 2008. In this case, the models were hidden behind the scenes and invisible to casual home buyers or other investors. Even if you are aware of these models, how do you properly evaluate their effect on your investment decisions?
Public Policy
Misleading and incorrect statistics have a long history in public policy and government. Darrell Huff’s classic How to Lie With Statistics (1954) is mostly concerned with misleading and false polls, statistics, and claims from American politics in the 1930’s and 1940’s. It remains in print, popular and relevant today. Increasingly, however, political controversies involve opaque computerized mathematical models rather than the relatively simple counting statistics debunked in Huff’s classic book.
Huff’s classic and the false or misleading counting statistics in it generally required only basic arithmetic to understand. Modern political controversies such as Value Added Models for teacher evaluation and the global climate models used in the global warming controversy go far beyond basic arithmetic and simple counting statistics.
The Misuse of Statistics and Mathematics
Precisely because many people are intimidated by mathematics and struggled with, or even failed, high school or college mathematics classes, statistics and mathematics are often used to exploit and defraud people. Often the victims are the poor, marginalized, and poorly educated. Mathematician Cathy O’Neil gives many examples of this in her recent book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (2016).
The misuse of statistics and mathematics is not limited to poor victims. Bernie Madoff successfully conned large numbers of wealthy, highly educated investors in both the United States and Europe using the arcane mathematics of options as a smokescreen. These sophisticated investors were often unable to perform the sort of mathematical analysis that would have exposed the fraud.
Rich and poor alike need to know mathematics to protect themselves from this frequent and growing misuse of statistics and mathematics.
Algebra and College Level Statistics
The misleading and false counting statistics lampooned by Darrell Huff in How to Lie With Statistics do not require algebra or calculus to understand. In contrast, the college level statistics often encountered in more complex issues today does require a mastery of algebra and sometimes calculus.
For example, one of the most common probability distributions encountered in real data and mathematical models is the Gaussian, better known as the Normal Distribution or Bell Curve. This is the common expression for the Gaussian in algebraic notation:
[latex]P(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}[/latex]
[latex]x[/latex] is the position of the data point. [latex]\mu[/latex] is the mean of the distribution. If I have a data set obeying the Normal Distribution, most of the data points will be near the mean [latex]\mu[/latex] and fewer further away. [latex]\sigma[/latex] is the standard deviation — loosely the width — of the distribution. [latex]\pi[/latex] is the ratio of the circumference of a circle to the diameter. [latex]e[/latex] is Euler’s number (about 2.718281828459045).
This is a histogram of simulated data following the Normal Distribution/Bell Curve/Gaussian with a mean [latex]\mu[/latex] of zero (0.0) and a standard deviation [latex]\sigma[/latex] of one (1.0):
Simulated Data Following the Normal Distribution
To truly understand the Normal Distribution you need to know Euler’s number e and algebraic notation and symbolic manipulation. It is very hard to express the Normal Distribution with English words or basic arithmetic. The Normal Distribution is just one example of the use of algebra in college level statistics. In fact, an understanding of calculus is needed to have a solid understanding and mastery of college level statistics.
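As a small illustration of how the algebra turns into something you can compute, here is a minimal Python sketch that evaluates the standard Normal Distribution density and simulates data like that in the histogram. It uses only the Python standard library; the function name, the sample size, and the random seed are arbitrary choices made for this example rather than details of the original simulation.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Normal Distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


random.seed(0)  # arbitrary seed so repeated runs give the same data
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Crude text histogram: count samples in unit-wide bins from -4 to +4.
bin_edges = list(range(-4, 5))
counts = [0] * (len(bin_edges) - 1)
for s in samples:
    idx = math.floor(s) + 4
    if 0 <= idx < len(counts):
        counts[idx] += 1

for left, count in zip(bin_edges, counts):
    print(f"[{left:+d}, {left + 1:+d}) {'#' * (count // 100)} {count}")

# Fraction of simulated points within one standard deviation of the mean.
fraction_within_one_sigma = sum(1 for s in samples if abs(s) <= 1.0) / len(samples)
print(f"Within one sigma of the mean: {fraction_within_one_sigma:.1%}")
```

Most of the simulated points fall in the bins nearest the mean, and the fraction within one standard deviation should come out close to the roughly 68 percent predicted by the formula.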
People should learn mathematics — meaning subjects beyond basic arithmetic such as algebra, geometry, trigonometry, calculus, linear algebra, and college level statistics — to make informed decisions about their health care, personal finances and retirement savings, important public policy issues such as teacher evaluation and public education, and other key issues such as evaluating the safety of buildings, airplanes, and automobiles.
There is no doubt that many people experience considerable difficulty learning mathematics whether due to poor teaching, inadequate learning materials or methods, or other causes. There is and has been heated debate over the reasons. These difficulties are not an argument for not learning mathematics. Rather they are an argument for finding better methods to learn and teach mathematics to everyone.
End Notes
[1] “How did Vioxx debacle happen?” by Rita Rubin, USA Today, October 12, 2004: “The move was a stunning denouement for a blockbuster drug that had been marketed in more than 80 countries with worldwide sales totaling $2.5 billion in 2003.”
[3] A “blockbuster” drug is pharmaceutical industry jargon for a drug with at least $1 billion in annual sales. Like Vioxx, it need not be a “wonder drug” that cures or treats a fatal or very serious disease or condition.
Received: 9 February 2012 Accepted: 30 July 2012 Published: 20 August 2012
The premarketing clinical trials required for approval of a drug primarily guard against type 1 error. RCTs are usually statistically underpowered to detect the specific harm either by recruitment of a low-risk population or low intensity of ascertainment of events. The lack of statistical significance should not be used as proof of clinical safety in an underpowered clinical trial.
The image of an ancient mathematician or engineer with calipers, often identified as Euclid or Archimedes, is from The School of Athens fresco by Raphael by way of Wikimedia Commons. It is in the public domain.
On October 13, 1601 the famous astronomer and astrologer Tycho Brahe (1546-1601) — new friend, confidant, and adviser to the Holy Roman Emperor Rudolf II, one of the most powerful men in Europe — became unexpectedly and gravely ill at a banquet in the Imperial capital of Prague. Tycho was a colorful, athletic and brilliant Danish nobleman who had defied convention by taking a commoner as his wife and by pursuing the study of astronomy instead of the more common pastimes of his fellow nobles in Denmark which he pointedly disdained as frivolous and unimportant.
Tycho Brahe
Tycho suffered horribly for about a week, seemed to begin recovering, and then died unexpectedly on October 24, 1601. Tycho was noted for his good health and vigor. His death came as a surprise to his family, friends, and colleagues.
The Imperial court was a hotbed of intrigue and filled with peculiar and often ambitious men who frequently worked with highly toxic chemicals: astrologers, alchemists and magicians in the employ of the Holy Roman Emperor Rudolf II (1552-1612) who hoped to unlock the secrets of the universe by funding a research program that might be called the Manhattan Project of its time.
From the start there were rumors Tycho had been poisoned, and some suspected his assistant, the young, equally brilliant mathematician and astronomer Johannes Kepler, remembered today as the author of Kepler’s Three Laws of Planetary Motion.
Tycho was buried with great pomp and circumstance in Prague on November 4, 1601, with some of his friends and colleagues casting pointed aspersions in Kepler’s direction during the ceremonies. In time, his beloved wife Kirsten Barbara Jørgensdatter was buried beside him.
Tycho and Kepler
By 1600, a year before his death, Tycho had accumulated over a lifetime by far the most accurate measurements of the positions of the planets over time, especially the planet Mars, thought by astrologers and kings to influence the occurrence and outcomes of wars and conflict. After years of lavish royal patronage in Denmark, Tycho had a falling out with the new king and fled to the mostly German-speaking Holy Roman Empire of Rudolf II. Here, with funding from Rudolf II, he hoped to analyze his data and confirm his own novel theory of the solar system, the known universe at the time, in which the Earth was the center with the Sun and Moon orbiting the Earth and all the other planets orbiting the Sun.
Tycho hired the brilliant young up-and-coming astronomer and mathematician Johannes Kepler (1571-1630) to analyze his data. Kepler hoped to use Tycho’s data to confirm his own Theory of Everything based on the hot new Sun-centered theory of Nicolaus Copernicus (1473-1543).
Johannes Kepler (1610)
Tycho and Kepler had a stormy working relationship until Tycho’s untimely death in 1601, which left Kepler with the access to Tycho’s data that he desired. In the chaos accompanying Tycho’s death, Kepler quietly walked off with Brahe’s notebooks containing his data on Mars. The ensuing controversy with Brahe’s family was eventually resolved more-or-less amicably, but in the meantime Kepler had the data he had sought.
In one of the great ironies of scientific history, Kepler proceeded to discover that his pet theory, the other variants of Copernicus’s Sun-centered system, the traditional Earth-centered system of Claudius Ptolemy, and Tycho’s hybrid Earth-Sun centered system were all wrong, although he was never able to fully accept that the data ruled out his system as well.
The models were all mathematically equivalent although they had different physical interpretations. All incorrectly assumed that the motions of Mars and the other planets were built up from uniform circular motion, the infamous epicycles.
Kepler’s analysis of Tycho’s data on the planet Mars took about five years, including his work in 1600 and 1601 when he had limited access to the full data set. In 1605, while taking a break during the Easter holiday, Kepler had his Eureka moment. He realized that the orbit of Mars was elliptical with the Sun at one focus of the ellipse and that the speed of the planet varied inversely with distance from the Sun so that the line joining the planet to the Sun swept out the same area in the same time. These two insights are now known as Kepler’s First and Second Laws, and they ensured the fame of both Brahe and Kepler to the present day.
Did Kepler Murder Tycho?
In 1991, soon after the end of the Cold War, the National Museum in Prague gave a somewhat peculiar goodwill gift to Denmark, a small box with a six centimeter long sample of Tycho Brahe’s mustache hair, acquired years earlier when Tycho’s crypt was refurbished in 1901. Tycho and his wife’s skeletons were examined and then reburied in 1901, but a few samples were taken and given to the National Museum in Prague.
The gift reopened the old question of whether Kepler or someone else had poisoned Tycho in 1601. Kepler had been a seemingly deeply religious man who had given up a comfortable teaching job in Graz rather than abandon his Lutheran faith and convert to Catholicism. He was later excommunicated from the Lutheran Church for publicly rejecting the Lutheran doctrine of ubiquity with no apparent gain to himself. This latter doctrine was an esoteric theological issue nonetheless of paramount importance in the conflict between the Lutherans, Calvinists, and Catholics that would soon lead to the horrific Thirty Years War (1618-1648).
This same stubbornness in holding to his views and perhaps the jealousy of his colleagues had led Kepler into bitter clashes with Tycho and others during his career. Could such a man have committed murder for the lucrative position of Imperial Mathematician in Rudolf II’s court, fame, or even a fanatical desire to extend human knowledge whatever the cost?
The hair from Tycho’s mustache was examined using modern forensic techniques by Bent Kaempe, Director of the Department of Forensic Chemistry at the Institute of Forensic Medicine at the University of Copenhagen and one of the leading toxicologists in Europe, and potentially lethal levels of mercury were detected. Kaempe concluded that:
Tycho Brahe’s uremia can probably be traced to mercury poisoning, most likely due to Brahe’s experiments with his elixir 11-12 days before his death.
Mercury and mercury compounds, some extremely toxic, were widely used in alchemy. Tycho himself used a mercury compound at low doses, potentially deadly at higher doses, for his health — following the alchemical ideas of Paracelsus (1493-1541): the elixir mentioned by Kaempe.
Some experts argued the mercury measurements demonstrated that Tycho had been poisoned and murdered with a mercury compound. Others suggested that the mercury was due to the embalming process or some other contamination of Tycho’s remains.
The controversy led to the exhumation of Tycho’s skeleton in 2010 in an attempt to settle the issue. The analysis of Tycho’s remains seemingly ruled out lethal levels of mercury as the cause of death in 2012 and seems to have been generally consistent with natural causes, a bladder infection.
The Limits of Forensic Science
After more than four hundred years, it seems unlikely that we will ever know for sure if Tycho Brahe was poisoned and, if so, by whom. Even today, people in their fifties die unexpectedly from heart attacks and other causes at rates substantially higher than people in their twenties, thirties, and forties. Medicine was very limited in Tycho’s time and was in fact often more dangerous than doing nothing. Modern sanitation measures were almost non-existent even at an Imperial court.
On the other hand, Rudolf II had recruited and gathered around himself in Prague some of the most brilliant, highly educated, ambitious, and strange men of his time, many of them experts, like Tycho, in the toxic chemicals used in alchemy and medicine. Many were probably familiar with plants and herbs available in Renaissance Europe, some of which could have been used as deadly poisons as well. He offered these men enormous wealth at a time when most people in Europe lived in dire poverty.
Kepler’s own mother was accused of witchcraft and imprisoned, although she was eventually released. She was specifically accused of poisoning another woman with a magic potion. Kepler himself was a highly educated and brilliant man. It is quite conceivable that he could have known much about poisons, perhaps even ones unknown or rarely used today. He had close access to Tycho, his boss.
The mercury measurements of Tycho’s mustache hair are one of many examples of overconfidence in forensic science. This overconfidence often takes the form of an explicit or implicit claim that forensic techniques, if done right, can give an absolutely certain or almost certain answer (for example, the one in many trillion odds often quoted for DNA matches in criminal cases).
This false certainty is a claim made by governments, prosecutors, scientists who should know better, and many others. It is heavily promoted in the popular media with television shows like CSI (2000-2015), Numb3rs (2005-2010), Quincy (1976-1983), blockbuster movies like Silence of the Lambs (1991), and many others.
Numerous cases in recent years have demonstrated the uncertainty of forensic methods in the real world. These include the Brandon Mayfield case for fingerprint analysis, the questionable use of DNA “profiling” in the Amanda Knox murder case in Italy, the failure of DNA analysis in the Jaidyn Leskie case in Australia, and many more.
In the case of DNA, the astronomical DNA match odds frequently quoted by prosecutors are highly misleading because they do not include a valid statistical model for the probability of contamination of the samples in the field, at the crime scene by investigators, or at the forensic laboratory where the DNA match is performed. Almost certainly the odds of a false match due to some sort of contamination scenario are much higher than the one in several trillion odds often cited by prosecutors.
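The arithmetic behind this point can be sketched in a few lines of Python. The contamination rate used here is purely hypothetical, chosen only to illustrate the effect; real rates vary by laboratory and case and are often unknown. The sketch also assumes that a coincidental match and a contamination error are independent events, and the function name is made up for this example.

```python
def false_match_probability(p_random_match, p_contamination):
    """Probability of a misleading match from coincidence or contamination,
    assuming the two error sources are independent."""
    return p_random_match + p_contamination - p_random_match * p_contamination


p_random = 1e-12   # "one in a trillion" style random-match odds
p_contam = 1e-3    # HYPOTHETICAL one-in-a-thousand contamination or handling error

p_total = false_match_probability(p_random, p_contam)

print(f"Random match alone:        {p_random:.3e}")
print(f"Including contamination:   {p_total:.3e}")
print(f"Ratio of the two:          {p_total / p_random:.3e}")
```

Whatever the true contamination rate is, the combined probability of a misleading match can never be smaller than the larger of the two error sources, so the headline random-match odds say little if contamination is even modestly likely.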
Contamination in the field is a likely explanation for the mercury levels in Tycho’s mustache. The mercury may have come from the embalming process. Perhaps Tycho or someone near him somehow spilled some mercury compound on his mustache while he was ill and dying. Tycho worked with and used mercury compounds frequently and they were likely present in his home.
The reality is that there is limited data in many crimes and possible crimes like Tycho’s death. There are usually many interpretations possible for that data, some more likely than others, some improbable but not impossible. In many cases, we don’t even know the prior probability of those interpretations. In the case of Tycho, we don’t know the probability that his mustache was contaminated with mercury by embalming or some other cause.
Mathematically, we now know there are an infinite number of mathematical models that can match any finite set of data with a desired level of accuracy. In an early example of this, Kepler was able to show that the traditional Ptolemaic Earth-centered model of the Solar System, the hot new Copernican Sun-centered model, and the hybrid model of Tycho were mathematically equivalent and matched the data equally well — predicting the future position of Mars in the Zodiac to about one percent accuracy, a few degrees.
Most of this infinity of mathematical models matching a finite data set are extremely complicated. We typically throw out the more complicated models to get a finite set of arguably plausible choices.
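A minimal sketch of why infinitely many models can match a finite data set: take any curve through the data points and add a term that is exactly zero at every data point. The example below does this with polynomials in pure Python. The four data points are made up for illustration, and the helper names are hypothetical; this is one simple construction, not the only way such families of models arise.

```python
def lagrange_model(points):
    """Return the lowest-degree polynomial passing through all the points."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model


def perturbed_model(points, c):
    """Add c * (x - x1)(x - x2)...(x - xn) to the base polynomial.
    The added term vanishes at every data point, so the fit is unchanged."""
    base = lagrange_model(points)

    def model(x):
        bump = c
        for xi, _ in points:
            bump *= (x - xi)
        return base(x) + bump
    return model


# Four made-up data points, standing in for four measurements.
data = [(0.0, 1.0), (1.0, 1.5), (2.0, 1.2), (3.0, 4.0)]

# Each value of c gives a different model; any real number would work.
models = [perturbed_model(data, c) for c in (0.0, 0.5, -0.5, 2.0)]

for m in models:
    fits_all_points = all(abs(m(x) - y) < 1e-9 for x, y in data)
    print(fits_all_points, round(m(1.5), 3))
```

Every model reproduces the four data points, yet they disagree about the value between the points, which is exactly where an analyst usually needs an answer.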
Historically, this method of selecting mathematical models has proven remarkably powerful in physics and astronomy. Kepler discovered that a simple mathematical model of non-uniform elliptical motion explained the seemingly extremely complex motions of Mars and the other planets. Newton, Maxwell, Einstein, and others have repeated this success with other data and phenomena including gravitation, electromagnetism, and radioactive decay.
Many Models for the Same Data
This infinity of possible mathematical models for a finite data set is the mathematical explanation for the many possible interpretations of the data from a crime scene. Even if we exclude extremely complicated and implausible models and interpretations a priori, we are still typically left with a number of possibilities, notably including contamination scenarios with DNA and other forensic methods such as the measurements of the mercury in Tycho’s mustache hair.
The figure above illustrates the many models problem in a forensic context. It shows a simulation with four simulated data points. These data points could be, for example, the strength of a DNA signal or the mercury level in different parts of Tycho’s body. The high point on the right is the mercury level in his mustache hair. The problem is the level could be lower in his body — too low to cause death. The other points could be measurements of the mercury level in various bones from his skeleton.
We don’t actually have the soft tissues where the putative poison would have done its deadly work. These have decayed away. The analyst must therefore infer the mercury level in those tissues over four hundred years ago from measurements today. The red line represents, for example, the threshold for a lethal level of mercury in Tycho’s body. Thus, depending on which model is chosen, mercury did or did not kill Tycho. In reality the forensic analysis is often much more complex and difficult to perform than this simple simulated example and illustration.
In conclusion, the certainty or near certainty often claimed or implied in many forensic analyses is frequently illusory.
The image of the planet Mars is from the NASA Jet Propulsion Laboratory (JPL) and is in the public domain. It is a mosaic of images taken by observation satellites in orbit around Mars. It shows the giant Valles Marineris canyon on Mars front and center. It is one of the most popular images of Mars.