The Myth of Falsifiability Article

NOTE: This is an updated version of my presentation “The Myth of Falsifiability.” I have added a few comments on the application of falsifiability and falsifiability metrics to models of the COVID-19 pandemic. The main focus is on the safety and effectiveness of drugs and medical treatments and financial models of investments, but the relevance to COVID-19 models should be obvious. A video version of this presentation is available at https://youtu.be/6y6_6x_kmlY

The article starts with a discussion of the myth of falsifiability, a commonly cited doctrine often used to exclude certain points of view and evidence from consideration as “not scientific”. It discusses the glaring problems with the popular versions of this doctrine and the lack of a rigorous quantitative formulation of a more nuanced concept of falsifiability as originally proposed, but not developed, by the philosopher Karl Popper. The article concludes with a brief accessible presentation of our work on a rigorous quantitative falsifiability metric useful in practical science and engineering.

The scientific doctrine of falsifiability is key in practical problems such as confirming the accuracy and reliability of epidemiological models of the COVID-19 pandemic, the safety and effectiveness of pharmaceuticals, and the reliability of financial models. How confident can we be of unprecedented world-changing policies ostensibly based on a plethora of conflicting models of the COVID-19 pandemic combined with highly incomplete and rapidly changing data?

How confident can we be that FDA approved drugs are safe, let alone effective? How confident can we be of AAA ratings for securities based on mathematical models?

In practice falsifiability is commonly cited to exclude certain points of view and evidence from consideration as “not scientific”.

The Encyclopedia Britannica gives a typical example of the popular version of falsifiability:

Criterion of falsifiability, in the philosophy of science, a standard of evaluation of putatively scientific theories, according to which a theory is genuinely scientific only if it is possible in principle to establish that it is false.

Encyclopedia Britannica

In practice, this popular version of falsifiability gives little guidance in evaluating whether an epidemiological model is reliable, a drug is safe or effective, or a triple-A rated security is genuinely low risk. In actual scientific and engineering practice, we need a reliable estimate of how likely it is that the apparent agreement between model and data is due to flexibility in the model: adjustable parameters, ad hoc changes to the mathematical model, and other causes such as data selection procedures. I will discuss this in more detail later in this article.

Karl Popper and The Logic of Scientific Discovery

The Austrian philosopher Karl Popper developed and presented a theory of falsifiability in his book The Logic of Scientific Discovery. This book is often cited and rarely read. My copy is 480 pages of small type.

Popper was especially concerned with rebutting the ostensibly scientific claims of Marxism and other ideologies. Popper was a deep thinker and understood that there were problems with a simple concept of falsifiability as I discuss next.

Falsifiability is largely encountered in disputes about religion and so-called pseudo-science, for example parapsychology. It is notably common in disputes over teaching evolution and creationism in schools. Creationism is the notion that God created the universe, life, and human beings in some way; in the United States, it often refers to a literal or nearly literal interpretation of the Book of Genesis in the Bible.

This is a typical example from the RationalWiki arguing that creationism is not falsifiable and therefore is not science.

RationalWiki Example of the Common Use of Falsifiability

Remarkably, the doctrine of falsifiability is very rarely invoked in the scholarly scientific peer-reviewed literature, almost never outside of rare articles specifically rebutting the legitimacy of topics such as creationism and alleged pseudo-science. For example, a search of the arxiv.org preprint archive (1.6 million articles) turned up only eight matches for falsifiability and Popper as shown here.

Scientific and Engineering Citation of Falsifiability is Extremely Rare

In fact, there are many historical examples of scientific theories that could not be falsified but have been confirmed.

The existence of black swans, eventually discovered in Australia, is one example. No matter how long one fails to find a single black swan, this failure does not prove that they do not exist.

Stones falling from the sky, meteorites, were rejected by science for over a century despite many historical and anecdotal accounts of these remarkable objects.

A Black Swan and a Meteorite

What experiment could we reasonably now perform that would falsify the existence of black swans and meteorites? Does this mean they are not scientific even though they exist?

The Hebrew Bible

Divine creation of the world and the existence of God are both examples of propositions that are impossible to falsify or disprove, but they could be verified by evidence that nearly all people would accept as all but conclusive.

For example, if we were to discover the Hebrew text of the Bible encoded in a clear way in the DNA of human beings, this would be strong – not absolutely conclusive – evidence for divine creation.

If the Sun were to stop in its course for an hour tomorrow and a voice were to boom out from the Heavens, “This is God. I created the world and human beings. Make love, not war,” this would reasonably be accepted as nearly conclusive evidence of God and creation.

The Matrix: The World is a Computer Simulation

Of course, any evidence for God or any other remarkable or unexpected phenomenon can be explained by invoking other extreme possibilities such as time travel, super-advanced space aliens or inter-dimensional visitors, or a computer simulation reality as in The Matrix movie.

I am not endorsing any religion or divine creation in making this point. I am simply pointing out the deep flaws in the doctrine of falsifiability as generally invoked.

Fritz Zwicky and the Velocity Curves for the Triangulum Galaxy (Messier 33 or M33)

Let’s leave the world of religion and theology behind and take a look at the problems with falsifiability in mainstream scientific cosmology, including the scientific account of creation, the Big Bang Theory.

In the 1930s Fritz Zwicky, shown on the left, an astronomer at the California Institute of Technology (Caltech), found that the galaxies in the Coma cluster were moving far too fast to be held together by the gravity of the visible matter, whether computed with Newton’s theory of gravity or Einstein’s more recent General Theory of Relativity. Later measurements revealed a related anomaly within galaxies, including our own Milky Way: the orbital velocities of stars fail to decline with distance from the galactic center as both theories predict.

The plot on the right shows a similar dramatic discrepancy in a nearby galaxy, the Triangulum Galaxy, also known as Messier 33 (M33).

These observations would appear to falsify both Newton’s and Einstein’s theories of gravity in a dramatic way. Did scientists forthrightly falsify these theories, as RationalWiki and other popular versions of falsifiability claim they would?

NO. They did not. Instead, they postulated a mysterious “dark matter” that could not be observed directly but that fixed the gross discrepancy between theory and observation.

In the last century, numerous additional discrepancies at the larger scales of clusters and super-clusters of galaxies have been observed, leading to the introduction of additional types of dark matter to get the theory to match the observations. None of these hypothetical dark matter candidates have ever been observed despite many searches.

Hubble Space Telescope

Einstein’s General Theory of Relativity originally included an additional term, usually known as the cosmological constant, to prevent the expansion of the universe. Einstein is reported to have called this term his “greatest blunder” after observations by Edwin Hubble showed otherwise unexplained extragalactic redshifts that could be explained as caused by the expansion of the universe, what is now called the Big Bang Theory.

The observation of the red shifts appeared to falsify Einstein’s theory. Einstein quickly dropped the cosmological constant term, achieving agreement with the data.

In the late 1990s, observations of distant supernovae, made in part with the Hubble Space Telescope, produced evidence that the expansion of the universe is accelerating, something the General Theory of Relativity without a cosmological constant had not predicted.

The Cosmological Term

Did scientists falsify the General Theory at this point? NO. Einstein had chosen the value of the cosmological constant to exactly cancel the predicted expansion, which initially contradicted both observations and theoretical prejudice. By choosing a different value of the cosmological constant, modern cosmologists could reproduce the acceleration found in the supernova observations.

Einstein, right even when he was wrong! Modern cosmologists attribute the non-zero cosmological constant to a mysterious dark energy permeating the universe. So far the dark energy, like the dark matter before it, has never been directly observed.

The modern Big Bang Theory incorporates other as yet unobserved entities such as “inflation” as well.

The Martian Epicycle

In practice, it is almost always possible to salvage a scientific theory by postulating undetected and perhaps unmeasurable entities such as dark matter, dark energy, inflation, and the original Ptolemaic epicycles.

In the Ptolemaic Earth-centered solar system, Mars orbits the Earth. Mars is observed to back up in the Zodiac for about two months every two years. This retrograde motion clearly contradicted the simple Earth-centered model. The gross discrepancy was largely fixed by introducing an epicycle in which Mars orbits an invisible point which in turn orbits the Earth, as shown in the plot on the right. The ancients interpreted Mars as a god or angel and justified the epicycles as complex dance moves dictated by the king of the gods or a monotheistic God.

In mathematical terms, a rigorous quantitative theory such as the General Theory of Relativity or Newton’s Theory of Gravity is a mathematical formula or expression. Discrepancies between these theories and observation can be resolved by adding, subtracting, or modifying different terms in the formula, such as the cosmological constant term. These modified terms often correspond to hypothetical entities such as dark energy.
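For example, the cosmological constant enters Einstein’s field equations as a single additional term:

\[ G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^{4}} T_{\mu\nu} \]

Setting the constant \(\Lambda\) to zero removes the term; choosing a small positive value produces an accelerating expansion. One adjustable constant is enough to change the large-scale predictions of the theory.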

MOND Alternative to General Relativity with Dark Matter

Many alternative theories to general relativity exist. MOND or Modified Newtonian Dynamics is the leading competitor at the moment. It can explain many (not all) observations without resorting to unobserved dark matter.

In fact, many complex mathematical theories such as those produced by modern machine learning and deep learning methods can “explain” the observations in scientific cosmology.

This is not surprising: complex theories with many adjustable parameters, like the cosmological constant, are plastic and can fit a wide range of data; in extreme cases they can fit almost any data set, much as saran wrap can fit almost any solid surface.

A simple example of this saran-wrap-like behavior of complex mathematical formulae is the Taylor polynomial. A Taylor polynomial with enough terms can approximate almost any smooth function arbitrarily well over a limited region.
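For the sine function used in the plots below, the Taylor polynomials about zero are

\[ \sin x \approx x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots \]

and a general polynomial of degree n fitted to data has n + 1 freely adjustable coefficients, which is the source of its plasticity.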

The Fourth (4th) Degree Taylor Polynomial Fitted to Periodic Data

The plot here shows a Taylor polynomial approximating a periodic function, the trigonometric sine, better and better as the degree (the number of terms) increases.

Sixth (6th) Degree Taylor Polynomial Fitted to the Same Periodic Data
Eighth (8th) Degree Taylor Polynomial Fitted to the Same Periodic Data
Tenth (10th) Degree Taylor Polynomial Fitted to the Same Periodic Data
All the Taylor Polynomial Models (Degrees 4,6,8, and 10) and Data in One Plot

The region of interest (ROI), containing the data used in the fit, is the region between the red triangle on the left and the red triangle on the right.

Notice that the agreement with the data in the Region of Interest improves as the degree (the number of terms) increases. R SQUARED is roughly the fraction of the variation in the data explained by the model. Notice also that the agreement for the Taylor Polynomial actually worsens outside the Region of Interest as the number of terms increases.

In general the Taylor Polynomial will predict new data within the Region of Interest well but new data outside the ROI poorly.
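A minimal Python sketch of this behavior, assuming NumPy is available; the degrees match the plots above, while the sample points, noise level, and ROI limits are illustrative choices rather than the exact values used in the figures:

import numpy as np

rng = np.random.default_rng(0)

# Region of interest (ROI) used for fitting, and a wider region for testing.
x_roi = np.linspace(-np.pi, np.pi, 50)
x_all = np.linspace(-2 * np.pi, 2 * np.pi, 200)
y_roi = np.sin(x_roi) + 0.05 * rng.normal(size=x_roi.size)  # noisy data inside the ROI
y_all = np.sin(x_all)                                       # the true function over a wider region

def r_squared(y, y_hat):
    """Coefficient of determination: roughly the fraction of variation explained."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

for degree in (4, 6, 8, 10):
    coeffs = np.polyfit(x_roi, y_roi, degree)   # fit the polynomial to ROI data only
    r2_inside = r_squared(y_roi, np.polyval(coeffs, x_roi))
    r2_outside = r_squared(y_all, np.polyval(coeffs, x_all))
    print(f"degree {degree:2d}: R^2 inside ROI = {r2_inside:.3f}, "
          f"R^2 over wider region = {r2_outside:.3f}")

Inside the ROI the R SQUARED climbs toward 1.0 as the degree increases; over the wider region it typically collapses, often to large negative values.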

If agreement is poor, simply add more terms – like the cosmological constant – until agreement is acceptable.

This is why the Ptolemaic theory of planetary motion with epicycles could not be falsified.

Falsifiability Metric Table for Cosmology

Is Scientific Cosmology Falsifiable?

In real scientific practice, falsifiability is too vaguely defined and is not quantitative.

Falsifiability is not a simple binary, yes-or-no criterion in actual practice. Rather, some theories are highly plastic and difficult to falsify; others are less plastic, stiffer, and easier to falsify. Falsifiability, or plasticity, is a continuum, not a simple binary yes or no, 0 or 1.

Falsifiability in Drug Approvals

Nonetheless, the question of falsifiability is of great practical importance. For example, many drugs are advertised as scientifically proven or strongly implied to be scientifically proven to reduce the risk of heart attacks and extend life, to slow the progression of cancer and extend life for cancer patients, and to effectively treat a range of psychological disorders such as paranoid schizophrenia, clinical depression, and Attention Deficit Hyperactivity Disorder (ADHD).

All of these claims have been questioned by a minority of highly qualified medical doctors, scientists, and investigative reporters.

Are these claims falsifiable? If not, are they therefore not scientific? How sure can we be that these drugs work? Does the doctrine of falsifiability give any insight into these critical questions?

Somehow we need to adjust the seeming agreement of models with data for the plasticity of the models – their ability to fit a wide range of data sets due to complexity.


In pharmaceutical drug approvals, the scientific theory being tested is that a drug is both safe and effective. Can erroneous claims of safety or effectiveness by pharmaceutical companies be falsified? Not always, it seems.

The VIOXX scandal involved a new pain reliever marketed as a much more expensive “super-aspirin” that was supposedly safer than aspirin and other traditional pain relievers, which can cause sometimes fatal gastrointestinal bleeding after prolonged use. Scientists omitted several heart attacks, strokes, and deaths from the reported tallies for the treatment group.

This omission is similar to omitting the cosmological constant term in General Relativity. Indeed the ad hoc assumptions used to omit the injuries and deaths could be expressed mathematically as additional terms in a mathematical model of mortality as a function of drug dose.
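As a purely hypothetical illustration (a toy model, not the actual VIOXX analysis), suppose the reported event rate at dose d is written as

\[ R_{\text{reported}}(d) = \alpha + \beta d - \gamma E(d) \]

where \(\alpha\) and \(\beta\) describe the underlying dose-response and \(E(d)\) counts the events removed by the ad hoc exclusion criteria. The exclusion behaves exactly like one more adjustable term, and tuning it can erase an inconvenient discrepancy much as the cosmological constant can.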

Surveys of patients treated with VIOXX after approval showed higher heart attack, stroke, and death rates than patients treated with traditional pain relievers. Merck was nearly bankrupted by lawsuit settlements.

Vioxx: The Killer Pain Reliever Safer Than Aspirin
Merck Withdraws Vioxx from Market in 2004
Merck Stock Drops

Falsifiability in Financial Risk Models

Moving from the world of drug risks to finance: the 2008 housing and financial crash was caused in part by reliance on financial risk models that underestimated the risk of home price declines and mortgage defaults.

Many of these models roughly assumed the popular Bell Curve, also known as the Normal or Gaussian distribution. The Bell Curve is familiar from grading school work, but it tends to underestimate the probability of extreme events, and therefore the risk of financial investments.
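A quick sketch of why the choice of distribution matters, using SciPy; the four-unit threshold and the Student-t and Cauchy alternatives are illustrative, not a model of any particular security:

from scipy import stats

# Probability of a standardized return falling more than 4 scale units below
# the center under the Bell Curve versus two heavier-tailed alternatives.
threshold = -4.0

p_normal = stats.norm.cdf(threshold)        # Bell Curve (Normal)
p_student = stats.t.cdf(threshold, df=3)    # Student-t with 3 degrees of freedom
p_cauchy = stats.cauchy.cdf(threshold)      # Cauchy-Lorentz

print(f"Normal:    {p_normal:.1e}")   # about 3e-05
print(f"Student-t: {p_student:.1e}")  # about 1e-02
print(f"Cauchy:    {p_cauchy:.1e}")   # about 8e-02

The Bell Curve assigns such a drop odds of roughly one in thirty thousand; the heavier-tailed models put it at a few percent. A model that understates the tails understates the risk.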

Are financial models falsifiable? Not always, it seems.

Falsifiability of Coronavirus COVID-19 Pandemic Models

The public response to the current (April 12, 2020) Coronavirus COVID-19 Pandemic has been shaped by frequently complex, sometimes contradictory, and changing epidemiological models such as the widely cited Imperial College model from the group headed by Professor Neil Ferguson, a competing model from Oxford, and many other models as well. There has been considerable well-justified controversy and confusion over these models.

Can we “falsify” these models in the popular binary “yes” or “no” sense of falsifiability? They are certainly imperfect and have failed various predictions, hence various revisions. Many key parameters such as the actual mortality rate broken down by age, sex, race, pre-existing medical conditions, and other risk factors have not been measured. The Imperial College Model is reportedly quite complex and may well be very “plastic” (not very falsifiable).

In fact, all or most of the models have been “falsified” in the binary falsification sense in real time as they have made predictions that failed and have been revised in various ways. Obviously a more nuanced measure, such as the falsifiability metric discussed below, is needed to evaluate the reliability of the models and compare them.

Falsifiability in Math Recognition

This is an example of the falsifiability problem in our work at Mathematical Software. We have a large, growing database of known mathematics, functions such as the Bell Curve and the Cauchy-Lorentz function shown here. Our math recognition software identifies the best candidate mathematical models for the data from this database.

The math recognizer yields an ordered list of candidate models ranked by goodness of fit; in this example, the goodness of fit is the coefficient of determination, loosely the percent of agreement with the data.

The plot is an analysis of some financial data. On the vertical axis we have the percent agreement of the model with the data; one hundred percent is perfect agreement. Technically the value on the vertical axis is the coefficient of determination, often referred to as R squared.

On the horizontal axis is the probability of getting a return on investment less than the risk free return, the return from investing in a Treasury bond, about two (2) percent per year. This probability varies dramatically from model to model. It is a key metric for investment decisions.

Our best model is the Cauchy-Lorentz model, beating out the popular Bell Curve. BUT what if the Cauchy-Lorentz is more plastic (less falsifiable) than the Bell Curve? The better agreement may be spurious. The difference in risk is enormous: Cauchy-Lorentz means a high-risk investment and the Bell Curve means a low-risk investment.
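A minimal sketch of the comparison, assuming SciPy and simulated returns; the 2 percent risk-free rate matches the figure, while the data and fitted parameters here are made up for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Fake heavy-tailed annual returns, standing in for the real financial data.
returns = stats.t.rvs(df=3, scale=0.1, size=2000, random_state=rng)

risk_free = 0.02  # roughly the Treasury (risk-free) return

# Fit both candidate models to the same data.
mu, sigma = stats.norm.fit(returns)       # Bell Curve
loc, scale = stats.cauchy.fit(returns)    # Cauchy-Lorentz

# Key risk metric: probability of earning less than the risk-free return.
p_norm = stats.norm.cdf(risk_free, mu, sigma)
p_cauchy = stats.cauchy.cdf(risk_free, loc, scale)

print(f"Bell Curve:     P(return < risk-free) = {p_norm:.3f}")
print(f"Cauchy-Lorentz: P(return < risk-free) = {p_cauchy:.3f}")

The point is not the particular numbers but that the two fitted models can imply very different probabilities of a poor outcome, so a raw goodness-of-fit comparison alone is not enough to choose between them.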

This problem has been encountered many times in statistics, data analysis, artificial intelligence, and many other related fields. A wide variety of ad hoc attempts to solve it have been offered in the scientific and engineering literature. For example, there are many competing formulae to correct the coefficient of determination R**2 (R SQUARED), but there does not appear to be a rigorous or generally accepted solution or method. These adjusted R**2 formulae include Wherry’s formula, McNemar’s formula, Lord’s formula, and Stein’s formula (see graphic below).

Various Ad Hoc Adjustments for the Flexibility of Mathematical Models
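For reference, the most common form of the adjusted R**2 (the one most statistics packages report; the formulae above are variations on the same idea) is

\[ \bar{R}^{2} = 1 - (1 - R^{2})\,\frac{n - 1}{n - p - 1} \]

where n is the number of data points and p is the number of adjustable parameters. Notice that the only information about the model used by the adjustment is the parameter count p.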

The formulae do not, for example, take into account that different functions with the same number of adjustable parameters can have different degrees of plasticity/falsifiability.

In many fields, only the raw coefficient of determination R**2 is reported.

A Prototype Falsifiability Metric

This is an example of a prototype falsifiability metric illustrated with the Taylor Polynomials.

The metric consists of an overall falsifiability measure for the function, the value F in the title of each plot, and a function or curve adjusting the raw goodness of fit, the coefficient of determination or R SQUARED in this case, for each model.

The plots show the Taylor polynomial A*X + B in the upper left, the Taylor polynomial A*X**2 + B*X + C in the upper right, the sixth-degree Taylor polynomial in the lower left, and the tenth-degree Taylor polynomial in the lower right.

The red marker shows the adjusted value of an R SQUARED value of 0.9 or ninety percent.

As terms are added to the model the falsifiability decreases. It is easier for the more complex models to fit data generated by other functions! The Taylor Polynomials of higher degree are more and more plastic. This is reflected in the decreasing value of the falsifiability metric F.

In addition, the goodness of fit metric, R SQUARED here, is adjusted to compensate for the higher raw values of R SQUARED that a less falsifiable, more plastic function yields. An unfalsifiable function will always give an R SQUARED of 1.0, the extreme case. The adjusted R**2 enables us to compare the goodness of fit for models with different numbers of terms and parameters, that is, different levels of falsifiability.
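The details of our metric are beyond the scope of this article, but the underlying idea can be illustrated by measuring how well a model family fits data it should not fit at all. A rough Python sketch of that idea (an illustration of the concept only, not our actual metric F):

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 40)

def median_r2_on_random_data(degree, trials=200):
    """How well a polynomial of the given degree fits pure noise, on average.

    A very plastic (hard to falsify) model family fits even random data well,
    so a high value here means low falsifiability.
    """
    r2_values = []
    for _ in range(trials):
        y = rng.normal(size=x.size)          # data with no structure at all
        coeffs = np.polyfit(x, y, degree)
        y_hat = np.polyval(coeffs, x)
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        r2_values.append(1.0 - ss_res / ss_tot)
    return float(np.median(r2_values))

for degree in (1, 2, 6, 10):
    print(f"degree {degree:2d}: median R^2 on random data = "
          f"{median_r2_on_random_data(degree):.3f}")

As the degree rises, the median R SQUARED achieved on pure noise rises with it; a falsifiability metric can then discount the raw R SQUARED of a real fit according to how much of it the model family could achieve through plasticity alone.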

Conclusion

Conclusion Slide

In conclusion, a simple “yes” or “no” binary falsifiability as commonly defined (e.g. in the Encyclopedia Britannica) does not hold up in real scientific and engineering practice. It is too vaguely defined and not quantitative. It also excludes scientific theories that can be verified but not ruled out. For example, in the present (April 12, 2020) crisis, it is clearly useless in evaluating the many competing COVID-19 pandemic models and their predictions.

Falsifiability does reflect an actual problem. Scientific and engineering models — whether verbal conceptual models or rigorous quantitative mathematical models — can be and often are flexible or plastic, able to match many different sets of data and, in the worst case such as the Taylor polynomials, essentially any data set. Goodness of fit statistics such as R**2 are boosted by this plasticity/flexibility of the models, making evaluation of performance and comparison of models difficult or impossible at present.

A reliable quantitative measure is needed. What is the (presumably Bayesian) probability that the agreement between a model and data is due to this flexibility of the model as opposed to a genuine “understanding” of the data? We are developing such a practical falsifiability measure here at Mathematical Software.

(C) 2020 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

The Myth of Falsifiability

Is that COVID-19 model true? An intro to quantitative falsifiability metrics for confirming the safety and effectiveness of drugs and medical treatments, the reliability of mathematical models used in complex derivative securities and other practical applications. It starts with a discussion of the myth of falsifiability, a commonly cited doctrine often used to exclude certain points of view and evidence from consideration as “not scientific”. It discusses the glaring problems with the popular versions of this doctrine and the lack of a rigorous quantitative formulation of a more nuanced concept of falsifiability as originally proposed, but not developed, by the philosopher Karl Popper. The video concludes with a brief accessible presentation of our work on rigorous quantitative falsifiability metrics for practical science and engineering.

The video is also available at: https://www.bitchute.com/video/naogjPPRGkaG/

UPDATE: The Myth of Falsifiability Article

It is generally faster to read the article than watch the video.


Video of “Automating Complex Data Analysis” Presentation to the Bay Area SAS Users Group


This is an edited video of my presentation on “Automating Complex Data Analysis” to the Bay Area SAS Users Group (BASAS) on August 31, 2017 at Building 42, Genentech in South San Francisco, CA.

The demonstration of the Analyst in a Box prototype starts at 14:10 (14 minutes, 10 seconds). The demo is a video screen capture with high quality audio.

Unfortunately, there was some background noise from a party in the adjacent room from about 12:20 until 14:10, although my voice is understandable.

Updated slides for the presentation are available at: https://goo.gl/Gohw87

You can find out more about the Bay Area SAS Users Group at http://www.basas.com/

Abstract:

Complex data analysis attempts to solve problems with one or more inputs and one or more outputs related by complex mathematical rules, usually a sequence of two or more non-linear functions applied iteratively to the inputs and intermediate computed values. A prominent example is determining the causes and possible treatments for poorly understood diseases such as heart disease, cancer, and autism spectrum disorders where multiple genetic and environmental factors may contribute to the disease and the disease has multiple symptoms and metrics, e.g. blood pressure, heart rate, and heart rate variability.

Another example is macroeconomic models predicting employment levels, inflation, economic growth, foreign exchange rates, and other key economic variables for investment decisions, both public and private, from inputs such as government spending, budget deficits, national debt, population growth, immigration, and many other factors.

A third example is speech recognition where a complex non-linear function somehow maps from a simple sequence of audio measurements — the microphone sound pressure levels — to a simple sequence of recognized words: “I’m sorry Dave. I can’t do that.”

State-of-the-art complex data analysis is labor intensive, time consuming, and error prone — requiring highly skilled analysts, often Ph.D.’s or other highly educated professionals, using tools with large libraries of built-in statistical and data analytical methods and tests: SAS, MATLAB, the R statistical programming language and similar tools. Results often take months or even years to produce, are often difficult to reproduce, difficult to present convincingly to non-specialists, difficult to audit for regulatory compliance and investor due diligence, and sometimes simply wrong, especially where the data involves human subjects or human society.

A widely cited report from the McKinsey management consulting firm suggests that the United States may face a shortage of 140,000 to 190,000 such human analysts by 2018: http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation.

This talk discusses the current state-of-the-art in attempts to automate complex data analysis. It discusses widely used tools such as SAS and MATLAB and their current limitations. It discusses what the automation of complex data analysis may look like in the future, possible methods of automating complex data analysis, and problems and pitfalls of automating complex data analysis. The talk will include a demonstration of a prototype system for automating complex data analysis including automated generation of SAS analysis code.

(C) 2017 John F. McGowan, Ph.D.
