In coverage of the coronavirus COVID-19 pandemic, one often sees a value of 0.1 percent for the infection fatality rate of the "flu" (also known as the actual mortality rate, which is different from the "case fatality rate"), meaning that one in 1,000 people infected dies. "Infected" includes anyone who is actually infected, even if never detected: people who are asymptomatic or who have only mild cases. It is often explicitly or implicitly argued that if the infection fatality rate of COVID-19 is only 0.1 percent, as suggested by a recent study by Stanford researchers (https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1.full.pdf), we can relax and go back to work. Unfortunately, it is probably not that simple.
We can use our prevalence estimates to approximate the infection fatality rate from COVID-19 in Santa Clara County. As of April 10, 2020, 50 people have died of COVID-19 in the County, with an average increase of 6% daily in the number of deaths. If our estimates of 48,000-81,000 infections represent the cumulative total on April 1, and we project deaths to April 22 (a 3 week lag from time of infection to death), we estimate about 100 deaths in the county. A hundred deaths out of 48,000-81,000 infections corresponds to an infection fatality rate of 0.12-0.2%. If antibodies take longer than 3 days to appear, if the average duration from case identification to death is less than 3 weeks, or if the epidemic wave has peaked and growth in deaths is less than 6% daily, then the infection fatality rate would be lower. These straightforward estimations of infection fatality rate fail to account for age structure and changing treatment approaches to COVID-19. Nevertheless, our prevalence estimates can be used to update existing fatality rates given the large upwards revision of under-ascertainment.
COVID-19 Antibody Seroprevalence in Santa Clara County, California by Bendavid et al
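The arithmetic behind the quoted projection is easy to check. Below is a minimal Python sketch that reproduces it using only figures stated in the passage above: 50 deaths as of April 10, 6 percent daily growth in deaths, a 12 day projection forward to April 22, and the 48,000-81,000 infection estimate. Nothing here is new data.

# Back-of-the-envelope IFR projection using the figures quoted from Bendavid et al.
deaths_apr10 = 50          # reported COVID-19 deaths in Santa Clara County as of April 10, 2020
daily_growth = 1.06        # 6% average daily growth in deaths
days_forward = 12          # April 10 to April 22 (3-week lag after the April 1 infection estimate)

projected_deaths = deaths_apr10 * daily_growth ** days_forward
print(f"Projected deaths by April 22: {projected_deaths:.0f}")          # about 100

for infections in (48_000, 81_000):
    ifr = projected_deaths / infections
    print(f"Implied IFR with {infections:,} infections: {ifr:.2%}")      # about 0.21% and 0.12%

This is where the 0.12-0.2% range in the quoted passage comes from.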
It is thought that the vast majority of adults get at least two symptomatic "colds" or "flus" (in the common usage of those words) each year. (The CDC states on its web site that adults get two to three "common colds" per year and children more, which matches common experience.) These are caused by a wide variety of viruses and bacteria, and sometimes by chemical toxins. The culprits include rhinoviruses, various coronaviruses other than the "novel" SARS-CoV-2 coronavirus, and many others, including the category of viruses known as "influenza" or "influenza viruses".
With a total US population of about 330 million, we can estimate at least 660 million individual cases and separate infections of these “cold” or “flu” organisms (either viruses or bacteria) each year. This gives a naive effective infection fatality rate averaged over the population and different diseases of:
188,000 divided by 660 million is 0.028 percent (0.000285).
55,000 divided by 660 million is 0.008 percent (0.0000833).
This is of course much less than 0.1 percent (one in 1000).
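For readers who want to check the division, here is a minimal Python sketch of the same naive calculation. The 660 million figure is the rough back-of-the-envelope infection count from above, not a measured number, and the two death totals are the weekly-data and leading-causes figures discussed throughout this article.

# Naive effective infection fatality rate averaged over all "cold"/"flu" infections.
infections_per_year = 660_000_000     # rough estimate: 330 million people x 2 infections per year

for deaths in (188_000, 55_000):      # weekly-data total vs. leading-causes total
    rate = deaths / infections_per_year
    print(f"{deaths:,} / {infections_per_year:,} = {rate:.3%}")
# 188,000 gives about 0.028 percent; 55,000 gives about 0.008 percent.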
What gives?
In common English usage, the terms “cold” and “flu” are often used interchangeably. The use of the terms “flu” and “influenza” to describe respiratory illnesses that vary in incidence seasonally predates the discovery of the influenza viruses, a category of viruses that can cause these symptoms. Influenza is Italian, from the Latin “influentia,” for “influence,” referring to the baleful influence of the stars that the ancients blamed for the disease.
The CDC hopelessly blurs the distinctions, if any, between “common cold”, “cold”, “flu”, “influenza”, “influenza like illness,” “influenza associated,” “pneumonia,” and other terms in its promotional and “scientific” materials.
Influenza as in the influenza viruses is rarely listed as a cause of death on death certificates. The weekly “pneumonia and influenza” death numbers from the National Center for Health Statistics (NCHS) only list about 8,000 deaths from influenza in 2017. The CDC cites several different reasons for claiming there is massive underdiagnosis and underreporting of influenza (THE VIRUS) deaths, dating back to at least 2005 and persisting despite the CDC’s extensive educational efforts.
The CDC uses a mysterious model to estimate about 55,000 annual deaths from influenza (THE VIRUS). Presumably the number of deaths from “influenza and pneumonia” in the leading causes of death is this number or something closely related — but this is not clear. Incidentally, in this age of the Internet and pervasive computing, the CDC could publish the actual source code for their model in a free open-source language such as Python on their web site for all to see and review.
Part of this model is an estimate of how many “colds” are caused by an influenza virus. Presumably this number is about 55 million to get the widely quoted 0.1 percent (one in 1000) infection fatality rate for influenza. This is an example of the CDC’s estimates from https://www.cdc.gov/flu/about/burden/index.html:
Thus, the CDC estimates that about 55 million of the more than 660 million annual "cold" cases in the United States are caused by "influenza disease" or "influenza" or "flu," presumably meaning cases caused by influenza viruses. This is probably less than ten percent of all "colds." The CDC also estimates about 55,000 deaths from influenza viruses. Dividing 55,000 deaths by 55 million cases presumably gives the roughly 0.1 percent (one in 1,000) figure widely quoted in the media.
Everyone should understand that a 0.1 percent (one in 1,000) infection fatality rate is much higher than the effective infection fatality rate averaged over all the diseases that cause deaths attributed to "pneumonia and influenza" and that typically give healthy adults two "common colds" or "flus" each year.
Even accepting the CDC's estimate that influenza viruses (THE VIRUS) cause less than ten percent of all "common colds," the COVID-19 coronavirus may be able to kill more people than the influenza viruses at the same infection fatality rate (e.g. one in 1,000, or 0.1 percent) if it spreads more easily. We also need to know how the SARS-CoV-2 coronavirus spreads and how quickly.
If everyone in the United States were infected with the COVID-19 coronavirus, a 0.1 percent infection fatality rate (one in 1,000) would probably mean somewhat less than 330,000 additional deaths on top of the roughly 188,000 deaths from "pneumonia and influenza" (or is it 55,000 from "influenza and pneumonia"?). The total would be somewhat less than 330,000 because there would be some overlap between COVID-19 coronavirus deaths and deaths of susceptible, mostly elderly persons that would have happened anyway due to conventional non-COVID diseases, including the influenza viruses.
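As a sanity check on the scale of this hypothetical worst case, here is a short Python sketch. The 0.1 percent rate, the 330 million population, and the two pneumonia/influenza baselines are the figures already quoted in this article; the scenario of literally everyone being infected is an upper bound for illustration, not a prediction.

# Hypothetical upper bound: everyone in the US infected at a 0.1% infection fatality rate.
us_population = 330_000_000
ifr = 0.001                                   # 0.1 percent, one in 1,000

covid_deaths = us_population * ifr
print(f"Deaths if everyone were infected: {covid_deaths:,.0f}")      # 330,000

for baseline in (188_000, 55_000):            # the two CDC pneumonia/influenza figures discussed above
    print(f"Ratio to a {baseline:,} pneumonia/influenza baseline: {covid_deaths / baseline:.1f}x")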
Conclusion
There is a remarkable lack of key measurements in the current coronavirus COVID-19 pandemic. These include the actual mortality rate (aka infection fatality rate) broken down by age, sex, race, pre-existing medical conditions, ambient temperature, sunlight levels, pollution levels, and other risk factors; the false positive and false negative rates of the tests for the disease, both the tests for an active infection such as the RT-PCR tests and the tests for past infection such as the antibody tests; and the methods and rates of transmission of the disease. Aerosol transmission probably occurs at least at a low level and is virtually unstoppable.
The CDC and the National Security bioweapons defense programs should have been set up to quickly and efficiently collect these key data and parameters as soon as a possible outbreak or attack was detected, independent of warnings and information provided by a potential adversary such as China or from the World Health Organization (WHO).
The confusing language and numbers on pneumonia and influenza on the CDC web site and in various official reports and documents seem to be primarily for marketing the flu vaccines rather than enabling informed decisions by patients and doctors or supporting external scientific research into the influenza viruses or other diseases.
(C) 2020 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
Addressing COVID-19 is a pressing health and social concern. To date, many epidemic projections and policies addressing COVID-19 have been designed without seroprevalence data to inform epidemic parameters. We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County.
Methods
On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer’s data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both.
Results
The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%). Under the three scenarios for test performance characteristics, the population prevalence of COVID-19 in Santa Clara ranged from 2.49% (95CI 1.80-3.17%) to 4.16% (2.58-5.70%). These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases.
Conclusions
The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases. Population prevalence estimates can now be used to calibrate epidemic and mortality projections.
This is of course a preliminary study.
It is worth noting that officially only about sixty (60) people have died of coronavirus COVID-19 in Santa Clara County, CA, a county of almost two million residents with very close ties to China. If true, the preliminary study results would indicate that the actual mortality rate in Santa Clara County is well below one percent of those infected (for example, 60/48,000 is 0.125 percent).
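A quick Python check of that back-of-the-envelope rate, using the roughly 60 reported deaths and the study's 48,000-81,000 infection range; again, nothing here is new data.

# Implied infection fatality rate in Santa Clara County from the figures above.
reported_deaths = 60
for infections in (48_000, 81_000):
    print(f"{reported_deaths} / {infections:,} = {reported_deaths / infections:.3%}")
# about 0.125 percent and 0.074 percent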
This is a video showing the CDC Cold vs Flu web page on April 16, 2020, shortly after I published my article "Uncounted COVID Deaths? The CDC's Contradictory Pneumonia and Influenza Death Numbers," in which I discussed the contradictory language and claims on the CDC's Cold vs Flu web page. The video was recorded to further support my discussion in the Uncounted COVID article/presentation and because I think it likely the web page will change as the CDC fields hard questions about its influenza and pneumonia web pages, reports, and other documentation.
Astonishingly, the CDC gives two radically different numbers of deaths from pneumonia and influenza: about 55,000 "influenza and pneumonia" deaths in the leading causes of death table in the "Final Deaths" report for 2017, the latest year available, and about 188,000 in the data on weekly "pneumonia and influenza" deaths, over THREE TIMES the leading-causes-of-death number.
An in-depth interview on the Coronavirus COVID-19 Pandemic and the proper response. Dr. Katz repeatedly notes the need for much better data on who has been or is infected and the actual mortality rate broken down by risk factors including age, sex, pre-existing medical conditions and so forth.
I have added a video showing the CDC Pneumonia and Influenza Weekly Deaths web site as it is (was) on April 15, 2020. In this video I show the different web site sections I have discussed, download the NCHSData14.csv weekly deaths data file, go through the analysis briefly in a spreadsheet, and show the difference between the 2017 numbers in the file and the 2017 Final Deaths (leading causes of death) numbers.
The weekly pneumonia and influenza deaths data show fewer deaths in weeks one through thirteen, the latest week in the file ending March 28, 2020, than in the comparable weeks of 2019 (last year). This is despite the COVID-19 pandemic, the lack of testing in the United States, asymptomatic carriers, and other issues.
The weekly pneumonia and influenza deaths data also show about 188,000 deaths from pneumonia and influenza in 2017, over THREE TIMES the roughly 55,000 deaths listed as "influenza and pneumonia" in the 2017 leading causes of death.
NOTE: If you are concerned about these odd numbers, please consider sharing the original post and/or this one by e-mail, a link on your web site or blog, or other methods in addition to advertising-funded and other big company social media. My original post of this on Hacker News soared for a few hours and then was flagged and shut down, for example. I have also encountered social media mobs that engage in name calling and do not address the substantive issues.
It seems likely to me that the CDC web site will change in response to questions about the confusing numbers and language. Hopefully, the CDC will clarify the language and numbers in an open, "transparent," and genuinely honest way that survives critical scrutiny, especially given the life-and-death situation.
This presentation discusses the CDC’s contradictory weekly and annual pneumonia and influenza deaths. Even the latest (as of April 14, 2020) weekly death numbers show fewer deaths in 2020 than comparable weeks last year (2019) despite the Coronavirus COVID-19 pandemic. Given asymptomatic carriers and inadequate testing in the United States, one would expect a surge in reported pneumonia and influenza deaths.
Remarkably, summing the weekly pneumonia and influenza deaths gives about 188,000 annual deaths from pneumonia and influenza, over THREE TIMES the widely cited 55,000 "influenza and pneumonia" deaths from the annual leading causes of death report.
These numbers raise troubling questions about the CDC and its collection, analysis, and reporting of pneumonia and influenza numbers. The low number of weekly deaths compared to last year could indicate that there are many uncounted COVID deaths, that the disease is much less deadly than popular reports suggest, or several other possibilities with substantially different public health implications. The numbers need to be clarified as soon as possible.
Both a video version and a written PDF version are provided below. The written version is generally faster to read and includes references and some additional technical details.
NOTE: If you are concerned about these odd numbers, please consider sharing this post by e-mail, a link on your web site or blog, or other methods in addition to advertising-funded and other big company social media. My post of this on Hacker News soared for a few hours and then was flagged and shut down, for example. I have also encountered social media mobs that engage in name calling and do not address the substantive issues.
NOTE: This is an updated version of my presentation "The Myth of Falsifiability." I have added a few comments on the application of falsifiability and falsifiability metrics to models of the COVID-19 pandemic. The main focus is on the safety and effectiveness of drugs and medical treatments and financial models of investments, but the relevance to COVID-19 models should be obvious. A video version of this presentation is available at https://youtu.be/6y6_6x_kmlY
The article starts with a discussion of the myth of falsifiability, a commonly cited doctrine often used to exclude certain points of view and evidence from consideration as “not scientific”. It discusses the glaring problems with the popular versions of this doctrine and the lack of a rigorous quantitative formulation of a more nuanced concept of falsifiability as originally proposed, but not developed, by the philosopher Karl Popper. The article concludes with a brief accessible presentation of our work on a rigorous quantitative falsifiability metric useful in practical science and engineering.
The scientific doctrine of falsifiability is key in practical problems such as confirming the accuracy and reliability of epidemiological models of the COVID-19 pandemic, the safety and effectiveness of pharmaceuticals and the safety and the reliability of financial models. How confident can we be of unprecedented world-changing policies ostensibly based on a plethora of conflicting models of the COVID-19 pandemic combined with highly incomplete and rapidly changing data?
How confident can we be that FDA approved drugs are safe, let alone effective? How confident can we be of AAA ratings for securities based on mathematical models?
In practice falsifiability is commonly cited to exclude certain points of view and evidence from consideration as “not scientific”.
The Encyclopedia Britannica gives a typical example of the popular version of falsifiability:
Criterion of falsifiability, in the philosophy of science, a standard of evaluation of putatively scientific theories, according to which a theory is genuinely scientific only if it is possible in principle to establish that it is false.
Encyclopedia Britannica
In practice, this popular version of falsifiability gives little guidance in evaluating whether an epidemiological model is reliable, a drug is safe or effective, or a triple-A rated security is genuinely low risk. In actual scientific and engineering practice, we need a reliable estimate of how likely it is that the apparent agreement of model and data is due to flexibility in the model from adjustable parameters, ad hoc changes to the mathematical model, and other causes such as data selection procedures. I will discuss this in more detail later in this article.
The Austrian philosopher Karl Popper developed and presented a theory of falsifiability in his book The Logic of Scientific Discovery. This book is often cited and rarely read. My copy is 480 pages of small type.
Popper was especially concerned with rebutting the ostensibly scientific claims of Marxism and other ideologies. Popper was a deep thinker and understood that there were problems with a simple concept of falsifiability as I discuss next.
Falsifiability is largely encountered in disputes about religion and so-called pseudo-science, for example parapsychology. It is notably common in disputes over teaching evolution and creationism (the notion that God created the universe, life, and human beings in some way) in schools. In the United States, creationism often refers to a literal or nearly literal interpretation of the Book of Genesis in the Bible.
This is a typical example from the RationalWiki arguing that creationism is not falsifiable and therefore is not science.
Remarkably, the doctrine of falsifiability is very rarely invoked in the scholarly scientific peer-reviewed literature, almost never outside of rare articles specifically rebutting the legitimacy of topics such as creationism and alleged pseudo-science. For example, a search of the arxiv.org preprint archive (1.6 million articles) turned up only eight matches for falsifiability and Popper as shown here.
In fact, there are many historical examples of scientific theories that could not be falsified but have been confirmed.
The existence of Black Swans, discovered in Australia. No matter how long one fails to find a single black swan, this does not mean they do not exist.
Stones falling from the sky, meteorites, were rejected by science for over a century despite many historical and anecdotal accounts of these remarkable objects.
What experiment could we reasonably now perform that would falsify the existence of black swans and meteorites? Does this mean they are not scientific even though they exist?
Divine creation of the world and the existence of God are both examples of propositions that are impossible to falsify or disprove, but they can be almost completely verified by evidence that would be accepted by nearly all people as almost conclusive.
For example if we were to discover the Hebrew text of the Bible encoded in a clear way in the DNA of human beings, this would be strong – not absolutely conclusive – evidence for divine creation.
If the Sun were to stop in its course for an hour tomorrow and a voice boom out from the Heavens: “This is God. I created the world and human beings. Make love not war.” this would be reasonably accepted as nearly conclusive evidence of God and creation.
Of course, any evidence for God or any other remarkable or unexpected phenomenon can be explained by invoking other extreme possibilities such as time travel, super-advanced space aliens or inter-dimensional visitors, or a computer simulation reality as in The Matrix movie.
I am not endorsing any religion or divine creation in making this point. I am simply pointing out the deep flaws in the doctrine of falsifiability as generally invoked.
Let’s leave the world of religion and theology behind and take a look at the problems with falsifiability in mainstream scientific cosmology including the scientific account of creation, the Big Bang Theory.
In the 1930s Fritz Zwicky, shown on the left, an astronomer at the California Institute of Technology (Caltech), noticed that galaxies in the Coma cluster were moving far too fast to be held together by their visible matter under both Newton's theory of gravity and Einstein's more recent General Theory of Relativity. Later measurements showed a related anomaly within galaxies, including our own Milky Way: the orbital velocities of stars fail to decline with distance from the galactic center as those theories predict.
The plot on the right shows a similar dramatic discrepancy in a nearby galaxy, the Triangulum Galaxy, also known as Messier 33 (M33).
These observations would appear to falsify both Newton's and Einstein's theories of gravity in a dramatic way. Did scientists forthrightly falsify these theories, as RationalWiki and other popular versions of falsifiability claim they would?
NO. They did not. Instead, they postulated a mysterious "dark matter," which could not be observed, to fix the gross discrepancy between theory and observation.
In the last century, numerous additional discrepancies at the larger scales of clusters and super-clusters of galaxies have been observed, leading to the introduction of additional types of dark matter to get the theory to match the observations. None of these hypothetical dark matter candidates have ever been observed despite many searches.
Einstein's General Theory of Relativity originally included an additional term, usually known as the cosmological constant, to allow a static universe that neither expands nor collapses. Einstein is reported to have called this term his "greatest blunder" after observations by Edwin Hubble showed otherwise unexplained extragalactic redshifts that could be explained as caused by the expansion of the universe, the starting point of what is now called the Big Bang Theory.
The observation of the red shifts appeared to falsify Einstein’s theory. Einstein quickly dropped the cosmological constant term, achieving agreement with the data.
Observations of distant supernovae in the late 1990s, some made with the Hubble Space Telescope, revealed evidence that the expansion of the universe is accelerating, something the General Theory of Relativity as then applied failed to predict.
Did scientists falsify the General Theory at this point? NO. Einstein had originally chosen the value of the cosmological constant to exactly balance the expansion otherwise predicted by the theory, which contradicted the observations and theoretical prejudices of his time. By choosing a different value of the cosmological constant, modern scientists could reproduce the acceleration found in the supernova data.
Einstein, right even when he was wrong! Modern cosmologists attribute the non-zero cosmological constant to a mysterious dark energy permeating the universe. So far the dark energy, like the dark matter before it, has never been directly observed.
The modern Big Bang Theory incorporates other as yet unobserved entities such as “inflation” as well.
In practice, it is almost always possible to salvage a scientific theory by postulating undetected and perhaps unmeasurable entities such as dark matter, dark energy, inflation, and the original Ptolemaic epicycles.
In the Ptolemaic Earth-centered solar system Mars orbits the Earth. Mars is observed to back up in the Zodiac for about two months every two years. This clearly contradicted the Earth-centered model. This gross discrepancy was largely fixed by introducing an epicycle in which Mars orbits an invisible point which in turn orbits the Earth as shown in the plot on the right. The ancients interpreted Mars as a god or angel and justified the epicycles as complex dance moves dictated by the king of the gods or a monotheistic God.
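To make the epicycle point concrete, here is a small Python sketch of a deferent-plus-epicycle model of Mars as seen from a fixed Earth. The round-number periods and radii are modern values chosen for illustration, not Ptolemy's actual parameters, yet even this crude two-circle model reproduces Mars periodically backing up in the sky, which is exactly why the geocentric theory was so hard to falsify.

# Toy deferent-plus-epicycle model of Mars seen from a fixed Earth.
import math

deferent_period = 687.0     # days, roughly Mars's orbital period
epicycle_period = 365.25    # days, roughly one year
deferent_radius = 1.52      # arbitrary units
epicycle_radius = 1.00

def geocentric_longitude(t):
    """Apparent longitude of Mars at time t (days) in this toy model."""
    a1 = 2 * math.pi * t / deferent_period
    a2 = 2 * math.pi * t / epicycle_period
    x = deferent_radius * math.cos(a1) + epicycle_radius * math.cos(a2)
    y = deferent_radius * math.sin(a1) + epicycle_radius * math.sin(a2)
    return math.atan2(y, x)

# Mars is "retrograde" whenever its apparent longitude decreases from one day to the next.
previous = geocentric_longitude(0)
retrograde_days = 0
total_days = 1560                          # about two synodic periods of Mars
for day in range(1, total_days + 1):
    longitude = geocentric_longitude(day)
    change = (longitude - previous + math.pi) % (2 * math.pi) - math.pi
    if change < 0:
        retrograde_days += 1
    previous = longitude
print(f"Retrograde on {retrograde_days} of {total_days} days")   # roughly 70-odd days per ~780-day cycle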
In mathematical terms, a rigorous quantitative theory such as the General Theory of Relativity or Newton’s Theory of Gravity is a mathematical formula or expression. Discrepancies between these theories and observation can be resolved by adding, subtracting, or modifying different terms in the formula, such as the cosmological constant term. These modified terms often correspond to hypothetical entities such as dark energy.
Many alternative theories to general relativity exist. MOND or Modified Newtonian Dynamics is the leading competitor at the moment. It can explain many (not all) observations without resorting to unobserved dark matter.
In fact, many complex mathematical theories such as those produced by modern machine learning and deep learning methods can “explain” the observations in scientific cosmology.
This is not surprising because complex theories with many adjustable parameters, like the cosmological constant, are plastic and can fit a wide range of data; in extreme cases, like saran wrap, they can fit almost any solid surface.
A simple example of this saran wrap like behavior of complex mathematical formulae is the Taylor polynomial. A Taylor polynomial with enough terms can approximate almost any function arbitrarily well.
The plot here shows a Taylor polynomial approximating a periodic function, the trigonometric sine, better and better as the degree, number of terms, increases.
The region of interest (ROI), containing the data used in the fit, is the region between the red triangle on the left and the red triangle on the right.
Notice the agreement with the data in the Region of Interest improves as the degree, the number of terms, increases. R SQUARED is roughly the fraction of the data explained by the model. Notice also the agreement for the Taylor Polynomial actually worsens outside the Region of Interest as the number of terms increases.
In general the Taylor Polynomial will predict new data within the Region of Interest well but new data outside the ROI poorly.
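Here is a minimal Python/numpy sketch of this behavior, using an ordinary least-squares polynomial fit as a stand-in for the Taylor polynomials in the plots. It fits the sine inside a region of interest and then scores the same fit outside it; note that R SQUARED can go negative outside the ROI, meaning the model does worse than simply guessing the average.

# Polynomial "plasticity" demo: better fit inside the ROI, worse and worse outside it.
import numpy as np

def r_squared(y, y_fit):
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

x_roi = np.linspace(-np.pi, np.pi, 200)       # region of interest used for fitting
x_out = np.linspace(np.pi, 3 * np.pi, 200)    # outside the region of interest
y_roi, y_out = np.sin(x_roi), np.sin(x_out)

for degree in (1, 3, 6, 10):
    coeffs = np.polyfit(x_roi, y_roi, degree)             # least-squares polynomial fit
    inside = r_squared(y_roi, np.polyval(coeffs, x_roi))
    outside = r_squared(y_out, np.polyval(coeffs, x_out))
    print(f"degree {degree:2d}: R^2 inside ROI = {inside:.3f}, outside ROI = {outside:.1f}")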
If agreement is poor, simply add more terms – like the cosmological constant – until agreement is acceptable.
This is why the Ptolemaic theory of planetary motion with epicycles could not be falsified.
Is Scientific Cosmology Falsifiable?
In real scientific practice, falsifiability is too vaguely defined and is not quantitative.
Falsifiability is not a simple binary, yes or no criterion in actual practice. Rather some theories are highly plastic and difficult to falsify. Some are less plastic, stiffer and easier to falsify. Falsifiability or plasticity is a continuum, not a simple binary yes or no, 0 or 1.0.
Falsifiability in Drug Approvals
Nonetheless, the question of falsifiability is of great practical importance. For example, many drugs are advertised as scientifically proven or strongly implied to be scientifically proven to reduce the risk of heart attacks and extend life, to slow the progression of cancer and extend life for cancer patients, and to effectively treat a range of psychological disorders such as paranoid schizophrenia, clinical depression, and Attention Deficit Hyperactivity Disorder (ADHD).
All of these claims have been questioned by a minority of highly qualified medical doctors, scientists, and investigative reporters.
Are these claims falsifiable? If not, are they therefore not scientific? How sure can we be that these drugs work? Does the doctrine of falsifiability give any insight into these critical questions?
Somehow we need to adjust the seeming agreement of models with data for the plasticity of the models – their ability to fit a wide range of data sets due to complexity.
In Pharmaceutical Drug Approvals, the scientific theory being tested is that a drug is both safe and effective. Can erroneous claims of safety or effectiveness by pharmaceutical companies be falsified? Not always, it seems.
In the VIOXX scandal, involving a new pain reliever marketed as a much more expensive "super-aspirin" that was supposedly safer than aspirin and other traditional pain relievers (which can cause sometimes fatal gastrointestinal bleeding after prolonged use), scientists omitted several heart attacks, strokes, and deaths from the reported tallies for the treatment group.
This omission is similar to omitting the cosmological constant term in General Relativity. Indeed the ad hoc assumptions used to omit the injuries and deaths could be expressed mathematically as additional terms in a mathematical model of mortality as a function of drug dose.
Surveys of patients treated with VIOXX after approval showed higher heart attack, stroke, and death rates than patients treated with traditional pain relievers. Merck was nearly bankrupted by lawsuit settlements.
Falsifiability in Financial Risk Models
Moving from the world of drug risks to finance: the 2008 housing and financial crash was caused in part by reliance on financial risk models that underestimated the risk of home price declines and mortgage defaults.
Many of these models roughly assumed the popular Bell Curve, also known as the Normal or Gaussian distribution. The Bell Curve is frequently used in grading school work. It also tends to underestimate the risk of financial investments.
Are financial models falsifiable? Not always, it seems.
Falsifiability of Coronavirus COVID-19 Pandemic Models
The public response to the current (April 12, 2020) Coronavirus COVID-19 Pandemic has been shaped by frequently complex, sometimes contradictory, and changing epidemiological models such as the widely cited Imperial College model from the group headed by Professor Neil Ferguson, a competing model from Oxford, and many other models as well. There has been considerable well-justified controversy and confusion over these models.
Can we “falsify” these models in the popular binary “yes” or “no” sense of falsifiability? They are certainly imperfect and have failed various predictions, hence various revisions. Many key parameters such as the actual mortality rate broken down by age, sex, race, pre-existing medical conditions, and other risk factors have not been measured. The Imperial College Model is reportedly quite complex and may well be very “plastic” (not very falsifiable).
In fact, all or most of the models have been “falsified” in the binary falsification sense in real time as they have made predictions that failed and have been revised in various ways. Obviously a more nuanced measure, such as the falsifiability metric discussed below, is needed to evaluate the reliability of the models and compare them.
Falsifiability in Math Recognition
This is an example of the falsifiability problem in our work at Mathematical Software. We have a large, growing database of known mathematics, functions such as the Bell Curve and the Cauchy-Lorentz function shown here. Our math recognition software identifies the best candidate mathematical models for the data from this database.
The math recognizer yields an ordered list of candidate models ranked by goodness of fit, in this example the coefficient of determination, loosely the percent of agreement with the data.
The plot is an analysis of some financial data. On the vertical axis we have the percent agreement of the model with the data; one hundred percent is perfect agreement. Technically the value on the vertical axis is the coefficient of determination, often referred to as R squared.
On the horizontal axis is the probability of getting a return on investment less than the risk free return, the return from investing in a Treasury bond, about two (2) percent per year. This probability varies dramatically from model to model. It is a key metric for investment decisions.
Our best model is the Cauchy-Lorentz model, beating out the popular Bell Curve. BUT, what if the Cauchy-Lorentz is more plastic (less falsifiable) than the Bell Curve? The better agreement may be spurious. The difference in risk is enormous! Cauchy-Lorentz means a high risk investment and the Bell Curve means a low risk investment.
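The practical stakes are easy to illustrate. The sketch below compares the probability of earning less than the risk-free return under a Bell Curve (normal) model and under a Cauchy-Lorentz model. The location and scale parameters are hypothetical round numbers, not the fitted values from the plot, and in a real analysis each distribution would be fitted to the data separately; the point is only that the heavy-tailed Cauchy-Lorentz assigns a much larger probability to the shortfall.

# Shortfall probability under a thin-tailed vs. a heavy-tailed return model.
from scipy.stats import norm, cauchy

risk_free = 0.02            # roughly 2% per year Treasury return
loc, scale = 0.08, 0.05     # hypothetical location and scale of annual returns

p_normal = norm.cdf(risk_free, loc=loc, scale=scale)
p_cauchy = cauchy.cdf(risk_free, loc=loc, scale=scale)
print(f"P(return < {risk_free:.0%}) under the Bell Curve:  {p_normal:.1%}")    # about 11.5%
print(f"P(return < {risk_free:.0%}) under Cauchy-Lorentz:  {p_cauchy:.1%}")    # about 22.1%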
This problem has been encountered many times in statistics, data analysis, artificial intelligence, and many other related fields. A wide variety of ad hoc attempts to solve it have been offered in the scientific and engineering literature. For example, there are many competing formulae to correct the coefficient of determination R**2 (R SQUARED), but there does not appear to be a rigorous and/or generally accepted solution or method. These adjusted R**2 formulae include Wherry's formula, McNemar's formula, Lord's formula, and Stein's formula (see graphic below).
The formulae do not, for example, take into account that different functions with the same number of adjustable parameters can have different degrees of plasticity/falsifiability.
In many fields, only the raw coefficient of determination R**2 is reported.
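For reference, the most widely used member of this family of adjustments (the exact attribution varies from source to source) simply shrinks R**2 according to the number of data points and fitted parameters. The sketch below also shows why it cannot capture the problem described above: two models with the same parameter count but very different plasticity get exactly the same adjustment.

# Common textbook adjusted R**2; it depends only on the parameter count, not on plasticity.
def adjusted_r_squared(r2, n_points, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - n_params - 1)

print(adjusted_r_squared(0.90, n_points=20, n_params=2))     # about 0.888
print(adjusted_r_squared(0.90, n_points=20, n_params=10))    # about 0.789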
A Prototype Falsifiability Metric
This is an example of a prototype falsifiability metric illustrated with the Taylor Polynomials.
The metric consists of an overall falsifiability measure for the function, the value F in the title of each plot, and a function or curve adjusting the raw goodness of fit, the coefficient of determination or R SQUARED in this case, for each model.
The plots show the Taylor Polynomial A times X + B in the upper left, the Taylor Polynomial A times X squared plus B times X + C in the upper right, the 6th degree Taylor Polynomial in the lower left, and the tenth degree Taylor Polynomial in the lower right.
The red marker shows the adjusted value of an R SQUARED value of 0.9 or ninety percent.
As terms are added to the model the falsifiability decreases. It is easier for the more complex models to fit data generated by other functions! The Taylor Polynomials of higher degree are more and more plastic. This is reflected in the decreasing value of the falsifiability metric F.
In addition, the goodness of fit metric, R SQUARED here, is adjusted to compensate for the higher raw values of R SQUARED that a less falsifiable, more plastic function yields. An unfalsifiable function will always give an R SQUARED of 1.0, the extreme case. The adjusted R**2 enables us to compare the goodness of fit for models with different numbers of terms and parameters, that is, different levels of falsifiability.
Conclusion
In conclusion, a simple "yes" or "no" binary falsifiability as commonly defined (e.g. in the Encyclopedia Britannica) does not hold up in real scientific and engineering practice. It is too vaguely defined and not quantitative. It also excludes scientific theories that can be verified but not ruled out. For example, in the present (April 12, 2020) crisis, it is clearly useless in evaluating the many competing COVID-19 pandemic models and their predictions.
Falsifiability does reflect an actual problem. Scientific and engineering models, whether verbal conceptual models or rigorous quantitative mathematical models, can be and often are flexible or plastic, able to match many different sets of data and, in the worst case such as the Taylor Polynomials, essentially any data set. Goodness of fit statistics such as R**2 are boosted by this plasticity/flexibility of the models, making evaluation of performance and comparison of models difficult or impossible at present.
A reliable quantitative measure is needed. What is the (presumably Bayesian) probability that the agreement between a model and data is due to this flexibility of the model as opposed to a genuine “understanding” of the data? We are developing such a practical falsifiability measure here at Mathematical Software.
Is that COVID-19 model true? An intro to quantitative falsifiability metrics for confirming the safety and effectiveness of drugs and medical treatments, the reliability of mathematical models used in complex derivative securities and other practical applications. It starts with a discussion of the myth of falsifiability, a commonly cited doctrine often used to exclude certain points of view and evidence from consideration as “not scientific”. It discusses the glaring problems with the popular versions of this doctrine and the lack of a rigorous quantitative formulation of a more nuanced concept of falsifiability as originally proposed, but not developed, by the philosopher Karl Popper. The video concludes with a brief accessible presentation of our work on rigorous quantitative falsifiability metrics for practical science and engineering.
It is generally faster to read the article than watch the video.
As of March 21, 2020, the United States Centers for Disease Control (CDC) has posted weekly death numbers for pneumonia and influenza (P&I) that are substantially lower than the weekly death numbers for the matching weeks last year (2019) despite the Coronavirus COVID-19 Pandemic. This is remarkable given that with the lack of widespread testing many deaths caused by the pandemic would be expected to appear as a surge in deaths attributed to pneumonia and influenza. One can also argue that deaths caused by COVID-19, where known, should be included in the pneumonia and influenza death tally as well.
NOTE: The latest numbers, through the week ending March 21, 2020, were posted last Friday, April 3, 2020.
The weekly numbers for 2017 and previous years also sum to a total number of annual deaths due to pneumonia and influenza that is about three times larger than the widely quoted numbers from the 2017 and earlier leading causes of death reports.
I have done a number of video posts on the seeming absence of COVID-19 from reports through March 21, 2020. Remarkably, the latest raw (?) data file NCHSData13.csv from https://www.cdc.gov/flu/weekly/#S2 (click on View Chart Data below the plot) shows 40,002 total deaths in week 12 of 2020 and 57,086 total deaths in week 12 of 2019 (see screenshot below): much lower in 2020 despite the pandemic. The file shows total pneumonia and influenza deaths of 3,261 in week 12 of 2020 and 4,276 in week 12 of 2019 (last year). Again, many more deaths last year.
Remarkably the weekly death numbers attributed to pneumonia and influenza have been running below last year’s numbers for the same weeks for almost all weeks since the beginning of 2020 and well below what might be expected from simple modeling of the long term trend and seasonal variation.
In the plot below, the green plus signs are the data from the NCHSData13.csv file. The red line is the long term trend and the blue line is the full model, with a roughly sinusoidal model of the seasonal variation in deaths added. One can see that the weekly death numbers are lower this year than last year and also fall well below the model prediction.
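For readers who want to reproduce this kind of plot, here is a minimal Python sketch of the model described above: a linear long-term trend plus an annual sinusoid, fit by ordinary least squares to the weekly pneumonia and influenza deaths. The column names ("Pneumonia Deaths" and "Influenza Deaths") are my assumption about the layout of the NCHSData13.csv file and may need adjusting.

# Long-term trend plus seasonal sinusoid fit to weekly pneumonia and influenza deaths.
import numpy as np
import pandas as pd

df = pd.read_csv("NCHSData13.csv")
deaths = (df["Pneumonia Deaths"] + df["Influenza Deaths"]).to_numpy(dtype=float)   # assumed column names
t = np.arange(len(deaths), dtype=float)                                            # week index

# Design matrix: intercept, linear trend, annual sine and cosine (about 52.18 weeks per year).
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / 52.18),
    np.cos(2 * np.pi * t / 52.18),
])
coef, *_ = np.linalg.lstsq(X, deaths, rcond=None)
fitted = X @ coef
residual = deaths - fitted
print("Last 10 weeks, actual minus model:", residual[-10:].round(0))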
There are many possible explanations for this remarkable shortfall in deaths. No doubt the CDC is fielding hard questions from web site visitors, analysts, and others. Fox News host Tucker Carlson included the discrepancy in his commentary on the COVID-19 crisis on April 7 (from 3:24 to 3:50 in the linked video):
The CDC appears to have updated its FluView web site with information on how complete the numbers for the last three weeks are — how many death certificates have been collected. They appear to have added a table at the bottom with the weekly numbers:
The final right-most column labeled “Percent Complete” seems to refer to how complete the numbers are, although this is not clear. Hovering the mouse pointer over the question mark to the right of “Percent Complete” brings up a legal disclaimer and not a clear explanation of what “Percent Complete” means. The final week (week 12, ending March 21, 2020) is listed as 85.4 % complete. Oddly, the previous two weeks (week 10 and week 11) are listed as (> 100 %) complete — note the greater than sign. Since one-hundred percent (100%) means COMPLETE, it is especially difficult to understand the use of the greater than sign > in the table. 🙂
In the FluView application/web page (today, April 9, 2020) the CDC seems to be claiming the numbers up to week 11 ending March 14, 2020 are in fact complete. The remarkable absence of COVID-19 deaths up to March 14, 2020 cannot be attributed to delays in collecting and processing death certificates or reports. A number of legal disclaimers such as the popup shown seem to have appeared recently (last few days) on the CDC web site.
As I have noted there are many possible explanations for this remarkable reported decline in deaths during a purported pandemic. It may be that people have been extra careful during the pandemic, staying home, avoiding risky behaviors, thus resulting in a drop in deaths both in general and from pneumonia and influenza causes other than COVID-19. It could be there are errors or omissions caused by the crisis response that are making the numbers unreliable. It could be pneumonia and influenza deaths from other causes are being incorrectly labeled as COVID-19 and omitted from the numbers; this is why it would be best to include COVID-19 as part of the P&I deaths. It could be that COVID-19, despite the headlines, is not unusually deadly.
NOTE: Total deaths in Europe have risen sharply in the latest weekly numbers from EuroMOMO, consistent with an unusually deadly new cause of death, after many weeks of remarkably showing no sign of the COVID-19 pandemic.
The CDC Weekly Pneumonia and Influenza Death Numbers are Three Times the Widely Reported Annual Death Numbers
Astonishingly the weekly death numbers in the NCHSData13.csv file — which go back to 2013 as shown in the plot above — indicate that about three times as many people in the United States have died from pneumonia and influenza in 2017 and preceding years as reported in the National Vital Statistics.
For example, the National Vital Statistics Report Volume 69, Number 8 dated June 24, 2019: “Deaths: Final Data for 2017” gives 55,672 deaths from “Influenza and pneumonia” in its table (Table B) of leading causes of deaths. “Influenza and pneumonia” is the eighth leading cause of death in 2017.
Note that the report uses the phrase “Influenza and pneumonia” whereas the weekly death web site uses the language “Pneumonia and influenza (P&I)”. As I will explain below this may be a clue to the reason for the huge discrepancy.
In contrast, summing the weekly death numbers for 2017 in NCHSData13.csv gives 188,286 deaths for the entire year. This is OVER three times the number in the “Deaths: Final Data for 2017” (June 24, 2019).
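The check itself takes only a few lines once the file is loaded. Here is a minimal Python/pandas sketch; as before, the column names ("Year", "Pneumonia Deaths", "Influenza Deaths") are my assumption about the file layout.

# Sum the weekly pneumonia and influenza deaths for 2017 and compare with the leading-causes figure.
import pandas as pd

df = pd.read_csv("NCHSData13.csv")
d2017 = df[df["Year"] == 2017]
weekly_total = int((d2017["Pneumonia Deaths"] + d2017["Influenza Deaths"]).sum())
print(f"Weekly-file P&I deaths summed over 2017: {weekly_total:,}")   # the article reports about 188,286
print("Leading-causes 'influenza and pneumonia' deaths, 2017: 55,672")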
It is worth noting that the web site, the NCHSData13.csv file, and the report appear intended for the general public, in part for educational and informational purposes — as well as doctors and other professionals who have limited time to dig into the numbers. Most people would interpret deaths due to “Influenza and pneumonia” in one report as the same or nearly the same (except for minor technical issues) number as “Pneumonia and influenza” in another report, data file, or web site.
What gives?
In 2005, Peter Doshi, an associate editor with the British Medical Journal (BMJ), one of the most prestigious medical journals in the world, wrote a highly critical, though short, article on the CDC’s pneumonia and influenza numbers: “Are US flu death figures more PR than science?”
BMJ. 2005 Dec 10; 331(7529): 1412. PMCID: PMC1309667
US data on influenza deaths are a mess. The Centers for Disease Control and Prevention (CDC) acknowledges a difference between flu death and flu associated death yet uses the terms interchangeably. Additionally, there are significant statistical incompatibilities between official estimates and national vital statistics data. Compounding these problems is a marketing of fear—a CDC communications strategy in which medical experts “predict dire outcomes” during flu seasons.
The CDC website states what has become commonly accepted and widely reported in the lay and scientific press: annually “about 36 000 [Americans] die from flu” (www.cdc.gov/flu/about/disease.htm) and “influenza/pneumonia” is the seventh leading cause of death in the United States (www.cdc.gov/nchs/fastats/lcod.htm). But why are flu and pneumonia bundled together? Is the relationship so strong or unique to warrant characterising them as a single cause of death?
BMJ. 2005 Dec 10; 331(7529): 1412. PMCID: PMC1309667
Peter Doshi goes on in this vein for a couple of pages (see the linked article above). Peter Doshi and other sources online seem to suggest that CDC estimates a large number of pneumonia deaths that are attributed to secondary effects of influenza such as a bacterial pneumonia infection caused by influenza. Influenza is rarely detected in actual tests of actual patients and only a small fraction of deaths reported in the weekly statistics are attributed in NCHSData13.csv to influenza (the virus, NOT “flu” as used in popular language which can mean any disease with similar symptoms to influenza — the scientific term).
The influenza and pneumonia deaths number in the National Vital Statistics Report may be this estimate that Doshi is describing in his critical article in the BMJ. The other (many more!) weekly “pneumonia and influenza” deaths presumably are assigned to some other categories in the annual leading causes of death report.
Presumably CDC can give some explanation for this vast discrepancy between two numbers that most of us would expect to be the same. “Influenza and pneumonia” and “pneumonia and influenza” mean the same thing in common English usage. They almost certainly mean the same thing to most doctors and other health professionals as well.
Conclusion
These pneumonia and influenza death numbers need to be clarified in an open and transparent manner. The next set of numbers will probably be posted tomorrow Friday April 10, 2020. Hopefully these new numbers and accompanying commentary will explain the situation in an open and transparent manner that survives critical scrutiny.
The proper response to the COVID-19 pandemic depends on knowing a range of parameters: the actual mortality rate broken down by age, sex, race, obesity, other medical conditions, and whatever else can be measured quickly and accurately; the actual rates and modes of transmission; and the false positive and false negative rates for the various tests, both for active infection and for past infection. These are mostly not known.
Most of us are experiencing the instinctive fight or flight response which degrades higher cognitive function, aggravated by the 24/7 Internet/social media fear barrage. It is important to calm down, collect actual data in a genuinely open, transparent way that will yield broad public support, and think carefully.
UPDATE (February 13, 2021):
We have received some questions about more up to date information on the issues raised in this article. Our most recent and comprehensive article on the CDC’s historical influenza and pneumonia death numbers and their current COVID-19 death numbers is:
This article argues that the US Centers for Disease Control (CDC)'s April 2020 guidance for filling out death certificates for possible COVID-19 related deaths strongly encourages, if not requires, assigning COVID-19 as the underlying cause of death (UCOD) in any death where COVID-19 or the SARS-CoV-2 virus may be present. This appears to differ from common historical practice for pneumonia and influenza deaths, where pneumonia was frequently treated as a "complication," a cause of death but not the underlying cause of death.
This means the number of COVID deaths should be compared to a count of death certificates where pneumonia and influenza were listed as a cause of death or even a lesser contributing factor, a historical number which appears to have been at least 188,000 per year based on the CDC FluView web site. The proper comparison number may be even larger if deaths that historically were listed as heart attacks, cancer or other causes than pneumonia or influenza are also being reassigned due to the April 2020 guidance.
Here are some earlier articles and references:
This is a more recent article/video on the long standing problems with the pneumonia and influenza death numbers:
“How Reliable are the US Centers for Disease Control’s Death Numbers” (October 14, 2020)
The second article on the Santa Clara County death numbers includes a detailed section on the changes in the standard on assigning the underlying cause of death for COVID cases from the CDC’s April 2020 “guidance” document, which probably boost the COVID death numbers substantially. This section is broken out and edited into this article:
We are looking at the CDC’s excess death numbers which appear to be highly questionable. The CDC follows a non-standard procedure of zeroing out data points that are negative in summing the excess deaths. See this article by Tam Hunt:
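A toy example makes the effect of that zeroing-out procedure clear; the weekly numbers below are made up purely for illustration.

# Effect of zeroing out negative weekly excess deaths before summing.
weekly_excess = [120, -80, 200, -150, 90, 300, -60]    # observed minus expected deaths, per week (made up)

plain_sum = sum(weekly_excess)
zeroed_sum = sum(max(week, 0) for week in weekly_excess)
print(f"Plain sum of weekly excess deaths:   {plain_sum}")    # 420
print(f"Sum with negative weeks zeroed out:  {zeroed_sum}")   # 710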
The luxury apartment project across the street from where I live has paused in the Coronavirus COVID-19 Pandemic. I thank those responsible and briefly discuss the proper steps to deal with the pandemic based on knowledge and data rather than fear and panic.