[Video] Can AI Boost Cracker Barrel’s Stagnant Sales?

Uncensored Video: BitChute NewTube Rumble Odysee

Short video on using AI to estimate how to boost the Cracker Barrel restaurant and gift shop chain’s sales by improving advertising expenditures.

Short Article: http://wordpress.jmcgowan.com/wp/article-can-ai-boost-cracker-barrels-stagnant-sales/

Short Link: bit.ly/3ojdV9t

About Us:

Main Web Site: https://mathematical-software.com/
Censored Search: https://censored-search.com/
A search engine for censored Internet content. Find the answers to your problems censored by advertisers and other powerful interests!

Subscribe to our free Weekly Newsletter for articles and videos on practical mathematics, Internet Censorship, ways to fight back against censorship, and other topics by sending an email to: subscribe [at] mathematical-software.com

Avoid Internet Censorship by Subscribing to Our RSS News Feed: http://wordpress.jmcgowan.com/wp/feed/

Legal Disclaimers: http://wordpress.jmcgowan.com/wp/legal/

Support Us:
PATREON: https://www.patreon.com/mathsoft
SubscribeStar: https://www.subscribestar.com/mathsoft

Rumble (Video): https://rumble.com/c/mathsoft
BitChute (Video): https://www.bitchute.com/channel/HGgoa2H3WDac/
Brighteon (Video): https://www.brighteon.com/channels/mathsoft
Odysee (Video): https://odysee.com/@MathematicalSoftware:5
NewTube (Video): https://newtube.app/user/mathsoft
Minds (Video): https://www.minds.com/math_methods/
Archive (Video): https://archive.org/details/@mathsoft

(C) 2023 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Article] How to Display Grid Lines in MatPlotLib

"""
Short demo How to Display Grid Lines in MatPlotLib
(C) 2022 by Mathematical Software Inc.
http://www.mathematical-software.com/
"""
# Python Standard Library
import os
import sys
import time
# NumPy and MatPlotLib add-on Python packages/modules
import numpy as np
import matplotlib.pyplot as plt
XRANGE = 5.0
CUBE_CONST = 1.5
ACCELERATION = 9.8
VELOCITY = -20.0
x = np.linspace(-XRANGE, XRANGE, 200)
y = CUBE_CONST*x**3 + 0.5*ACCELERATION*x**2 + VELOCITY*x
# simple MatPlotLib plot
f1 = plt.figure()
ax = plt.axes() # get plot axes
ax.set_facecolor('lightgray') # background color of plot
plt.plot(x, y, 'g-')
plt.title('Grid Lines in MatPlotLib DEMO')
plt.xlabel('X')
plt.ylabel(f'Y = {CUBE_CONST:.2f}*X**3 + {0.5*ACCELERATION:.2f}*X**2'
           f' + {VELOCITY:.2f}*X')
plt.grid(which='major', color='black')
plt.grid(which='minor', color='gray')
plt.minorticks_on() # need this to see the minor grid lines
f1.savefig('how_to_display_grid_lines_in_matplotlib.jpg',
           dpi=300)  # save before show(); some backends release the figure when the window closes
plt.show(block=True)
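
The plt.grid call passes standard Matplotlib line-style keywords through to the grid lines, so the major and minor grids can be styled independently. Below is a minimal optional sketch, separate from the demo above, that draws solid major grid lines and dashed, semi-transparent minor grid lines; linestyle, linewidth, and alpha are ordinary Matplotlib line properties.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5.0, 5.0, 200)
y = x**3

f2 = plt.figure()
plt.plot(x, y, 'g-')
plt.grid(which='major', color='black', linewidth=0.8)
plt.grid(which='minor', color='gray', linestyle='--',
         linewidth=0.5, alpha=0.5)  # dashed, semi-transparent minor grid
plt.minorticks_on()  # minor grid lines require minor ticks
plt.show(block=True)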

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] New Ways to Lie with Statistics

Uncensored Video Links: BitChute NewTube Odysee

Introduction to the new and improved ways to lie with statistics enabled by modern computers and technology.

Some References:

Example of Fishy Numbers: http://wordpress.jmcgowan.com/wp/video-article-the-cdcs-grossly-contradictory-death-numbers/

Deep Dive: https://www.authorea.com/users/425106/articles/547336-improving-cdc-data-practices-recommendations-for-improving-the-united-states-centers-for-disease-control-cdc-data-practices-for-pneumonia-influenza-and-covid-19-v-1-1

Serial Killers: http://wordpress.jmcgowan.com/wp/murder-math-do-executions-deter-serial-killers/

About Us:

Main Web Site: https://mathematical-software.com/
Censored Search: https://censored-search.com/
A search engine for censored Internet content. Find the answers to your problems censored by advertisers and other powerful interests!

Subscribe to our free Weekly Newsletter for articles and videos on practical mathematics, Internet Censorship, ways to fight back against censorship, and other topics by sending an email to: subscribe [at] mathematical-software.com

Avoid Internet Censorship by Subscribing to Our RSS News Feed: http://wordpress.jmcgowan.com/wp/feed/

Legal Disclaimers: http://wordpress.jmcgowan.com/wp/legal/

Support Us:
PATREON: https://www.patreon.com/mathsoft
SubscribeStar: https://www.subscribestar.com/mathsoft

Rumble (Video): https://rumble.com/c/mathsoft
BitChute (Video): https://www.bitchute.com/channel/HGgoa2H3WDac/
Brighteon (Video): https://www.brighteon.com/channels/mathsoft
Odysee (Video): https://odysee.com/@MathematicalSoftware:5
NewTube (Video): https://newtube.app/user/mathsoft
Minds (Video): https://www.minds.com/math_methods/
Archive (Video): https://archive.org/details/@mathsoft


(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

Murder Math: Do Executions Deter Serial Killers?

Introduction

The 1980s were the decade of serial killers. Serial killers were in the news, Hollywood movies, bestselling novels like Thomas Harris’s Red Dragon and The Silence of the Lambs, and true crime books. The serial killer craze overlapped with fears of missing children, hysteria about the Dungeons and Dragons role-playing game, and tales of Satanic Ritual Abuse — themes exploited in Netflix’s hit 80s nostalgia show Stranger Things.

The craze started in the 1970s with high-profile serial killer cases such as those of Ted Bundy and John Wayne Gacy and continued into the early 1990s with the hit movie The Silence of the Lambs starring Jodie Foster (the fifth highest-grossing movie of 1991) and the notorious Jeffrey Dahmer cannibal serial killer case.

The Silence of the Lambs Movie Poster (1991)

In the last three decades serial killers have waned in the news and public consciousness. The 1980s are often remembered for a Satanic Panic, a supposedly entirely unfounded hysteria about Satanism, ritual abuse in day care centers, and Satanic serial killers. Henry Lee Lucas’s once widely repeated claims to have killed six hundred people as a contract killer for a Satanic cult have been largely dismissed as fantasy.

A US Senate committee and the FBI produced an estimate in 1983 of about 3,600 Americans possibly murdered in 1981 by serial killers, rounded up to 4,000-5,000 in some news media reports. In 1984 the FBI rolled the number back to about ten percent of all murders, or about 540 Americans possibly murdered by serial killers in a year. Scholars challenged the FBI and government figures as grossly exaggerating the number of murders by serial killers. (See Using Murder: The Social Construction of Serial Homicide by Philip Jenkins for an in-depth discussion of the statistics.)

Was the 1980s Serial Killer Wave Real?

Was there even a wave of serial killer cases or was the serial killer wave the product of news coverage and clever marketing by the FBI’s Behavioral Sciences Unit? Have serial killer cases declined in frequency and — if so — why?

Wikipedia’s List of serial killers in the United States purports to be a comprehensive list of known serial killers in the United States with names, dates, and numbers of proven and possible victims from the 1700s to the present (data table downloaded on August 21, 2022).

Indeed this Wikipedia database shows a dramatic surge in serial killers peaking in the early-to-mid 1980s and declining back to historically very low levels from the 1990s through the present. The surge is clear in all cases taken together, in serial killers with five or more known victims, and even among the rare serial killers with ten or more known victims such as Jeffrey Dahmer.

WHY?

The serial killer surge occurred during a period of almost no executions, with a complete cessation for four years after the 1972 Furman versus Georgia Supreme Court case. As executions ramped back up in the 1990s, serial killer cases dropped.

This plot shows executions of serial killers between 1950 and 2020, with about 115 executed out of a total of 553 from the Wikipedia list. These are particularly heinous crimes often involving sadistic torture of victims. Serial killers face a much higher probability of execution than the typical murderer.

The next plot shows several key data series from 1950 to 2020. The red line shows the execution rate: executions per 150 million Americans. The counts of executions in the United States are from the ESPY file at deathpenaltyinfo.org and from the US executions from 2003 to 2020 reported by Statista (URL: https://www.statista.com/statistics/629845/number-of-executions-per-year-in-the-us-since-2000/).

One can see executions had dropped to almost zero by 1966, well before the 1972 Furman versus Georgia Supreme Court case that completely stopped executions until 1976. Many of the few executions during the 1970s were of murderers who wanted, or claimed to want, to be executed, such as Gary Gilmore in Utah. Executions began to rise slowly in 1982, reaching a post-1972 peak in 1998.

The green line shows the overall United States Homicide Rate in homicides per million Americans. The homicide rate is usually quoted as homicides per 100,000 (100K); per million is used here to make the scale comparable to the other key data series in the plot. The overall murder rate doubled during this period, dropping back to historical levels in the 1990s as executions rose back to late-1950s and early-1960s levels.

The US Homicide rate is from two sources: https://www.infoplease.com/us/crime/homicide-rate-1950-2014 and https://www.macrotrends.net/countries/USA/united-states/murder-homicide-rate

The light blue bars show the proven victims of named serial killers from the Wikipedia list. The shorter dark blue bars show the number of active named serial killers. The list often gives a range of years when the killer was active, e.g. 1982-1986. For simplicity, the proven victims and the serial killer are assigned to the midpoint of that range, e.g. 1984. The number of active serial killers by year, defined in this way, is inversely correlated with the execution rate: more executions, fewer serial killers.
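
As a concrete illustration of the midpoint convention just described, a small helper like the following (a sketch, not the code actually used to build the plots) turns an activity range such as '1982-1986' into the single year 1984:

def activity_midpoint(years_active):
    """Return the midpoint year of a range like '1982-1986' or a single year like '1984'."""
    # note: actual Wikipedia entries may use an en dash rather than '-'
    parts = years_active.split('-')
    start = int(parts[0])
    end = int(parts[-1])
    return (start + end) // 2

print(activity_midpoint('1982-1986'))  # 1984
print(activity_midpoint('1984'))       # 1984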

The yellow bars show the number of victims of mass shootings from Wikipedia’s List of mass shootings in the United States (data tables downloaded on August 24, 2022). The number of mass shooting victims has climbed as the number of executions has dropped since the 1998 peak while the number of serial killer victims has remained low.
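
For reference, the execution rate used in these plots is simply the yearly execution count divided by the US population expressed in units of 150 million people. A minimal sketch of the conversion, using made-up illustrative numbers rather than the actual ESPY/Statista counts:

import numpy as np

# hypothetical illustrative values -- substitute the actual yearly counts
years = np.array([1960, 1980, 1998])
executions = np.array([56, 0, 68])            # executions in each year
population = np.array([180e6, 227e6, 276e6])  # approximate US population

# executions per 150 million Americans
execution_rate = executions / (population / 150.0e6)
print(execution_rate)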

The plot below shows the US named serial killer proven victims per year versus the execution rate, executions per 150 million Americans. We can see that the maximum number of proven victims drops exponentially with the execution rate. At low execution rates, there is significant unexplained variation in the number, suggesting other factors are at play when executions are rare or nonexistent.

The plot below shows the US Murder Victims per 100,000 Americans per year versus the execution rate, again executions per 150 million Americans. We can see the murder rate drops exponentially with the execution rate. Again, at low execution rates, there is significant unexplained variation, indicating factors other than the execution rate play a role when executions are rare or nonexistent. R**2, known as “R squared,” is the coefficient of determination and is roughly the fraction of variation in the data explained by the model.
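
A minimal sketch of the kind of fit described here: regress the logarithm of the murder rate on the execution rate (an exponential model is a straight line in log space) and report R**2. The numbers below are made-up placeholders, not the actual data behind the plots.

import numpy as np
from sklearn.metrics import r2_score

# hypothetical placeholder data
execution_rate = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])  # per 150 million
murder_rate = np.array([9.8, 8.5, 7.9, 6.4, 5.6, 4.7])     # per 100K

# fit murder_rate ~ a*exp(-b*execution_rate) as a straight line in log space
slope, intercept = np.polyfit(execution_rate, np.log(murder_rate), 1)
model = np.exp(intercept + slope * execution_rate)

print("a =", np.exp(intercept), " b =", -slope)
print("R**2 =", r2_score(murder_rate, model))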

There appears to be a strong relationship between the execution rate and the number of murders — both the general murder rate and the serial killer murder rate. Popular news and academic articles often profess mystification both about the reasons for the sharp late-1960s rise in murders above the historical 1950s and early-1960s levels and about the dramatic drop in murders in the 1990s. It is true that the execution rate is probably highly correlated with other “tough on crime” measures over time.

Alternative Theories

Nonetheless, there are several prominent attempts to explain the sharp drop in murders in the 1990s without crediting any “tough on crime” policies, let alone the execution rate.

One attempt to explain the drop in the general murder rate in the 1990s is the removal of lead paint and piping in poor, often Black, areas — and yet lead would probably have been an even more serious problem in the 1950s and early 1960s, when murder rates and serial murder rates were quite low. Most named serial killers are white, not Black. Contrary to old claims from the FBI, some serial killers are Black, but white serial killers still dominate the statistics. Yet serial killer murders declined as well in the 1990s as the execution rate climbed to the 1998 peak.

The economist Steven D. Levitt of Freakonomics fame tried to explain the 1990s drop as a delayed effect of the 1973 legalization of abortion in Roe v. Wade. He has also suggested other explanations — but not executions.

Here, for example, is a 2004 article by Steven D. Levitt ruling out increased use of capital punishment as the cause of the 1990s decline in murders. He claims:

“given the rarity with which executions are carried out in this country and the long delays in doing so, a rational criminal should not be deterred by the threat of execution” (emphasis added)

https://www.ojp.gov/ncjrs/virtual-library/abstracts/understanding-why-crime-fell-1990s-four-factors-explain-decline-and

This is a rather odd statement for a scientific data analysis, where one ought to look at empirical data. Who said murderers are rational? Most serial killers are not rational as normally defined. They are often found legally sane, which is a narrower concept than common notions of rationality. Indeed the historical data suggests relatively low levels of execution have deterred both highly irrational serial killers and presumably somewhat more rational ordinary murderers.

The Conspiracy Question

The stereotype of serial killers is that they are lone nuts. However, the Wikipedia list of US serial killers actually lists at least 57 serial killers active between 1950 and 2020 with an accomplice or accomplices noted, out of 553 active serial killers, for a total of 89 killers including the accomplices. This is about 10.3% (57/553) to 16.1% (89/553) of the serial killers, depending on how one counts accomplices. In common usage, these murders are the work of a conspiracy: two or more perpetrators.

The list appears to identify only accomplices convicted in court cases. For example, there was strong forensic and eyewitness evidence that Randy Kraft, one of the three seemingly independent California “Freeway Killers” of the 70s and 80s, had at least one accomplice. Police suspected one of his roommates but were unable to find enough evidence or secure a confession. The Wikipedia list does not mention any accomplices for Randy Kraft. Thus, the list probably understates the number of cases with actual accomplices.

It is true that many of these conspiracy cases are pairs such as the so-called “Toolbox Killers,” Lawrence Bittaker and Roy Norris. Nonetheless, there are several cases of three or more “serial killers” working together. Another California “Freeway Killer,” William Bonin, had an astonishing four accomplices — all convicted or confessed. The Briley Brothers were three brothers plus an accomplice, Duncan Eric Meekins — another total of four. Dean Corll had at least two accomplices, both convicted: Elmer Wayne Henley and David Brooks. The so-called “Ripper Crew,” a Satanic cult in Chicago, included at least four (4) convicted members: Robin Gecht, Andrew Kokoraleis, Thomas Kokoraleis, and Edward Spreitzer. Four people — Manuel Moore, Larry Green, Jessie Lee Cooks, and J. C. X. Simon — were convicted of the so-called Zebra Murders. That is five cases and twenty (20) killers out of 553 active identified serial killers between 1950 and 2020 with three or more clearly identified — convicted or confessed — conspirators, about one to four percent depending on how one counts the cases, accomplices, and serial killers.

Only small serial murder conspiracies have been demonstrated in court or by rigorous forensic evidence. There is, however, a popular literature alleging that some or a large fraction of serial killer cases are the work of a larger conspiracy or conspiracies, such as Dave McGowan’s Programmed to Kill (no relation) and Maury Terry’s The Ultimate Evil. These works blame the serial murders on Satanic cults, neo-Nazis, elite pedophile networks, and CIA MK-ULTRA-like mind control programs, often combined into a single super-conspiracy and often overlapping with the Satanic Ritual Abuse allegations of the 1980s — which in fact continue to the present.

The late Dave McGowan, author of Programmed to Kill (left).

It is more difficult to identify and convict murderers in larger conspiracies such as street gang violence, the “Mafia,” and other higher-level organized crime. Many unsolved murders — often in inner city Black neighborhoods — are attributed to street gang violence. Larger conspiracies are more effective at intimidating witnesses and corrupting investigations than lone killers or pairs of killers. Larger conspiracies can be long-lived and technically sophisticated, better at destroying forensic evidence, disposing of bodies, etc. Proving that a gang leader or organized crime official has ordered a murder can be difficult or impossible.

The serial killer super-conspiracy theories invoke confessions by some serial killers, such as the “Son of Sam” David Berkowitz, claiming to have been part of a larger conspiracy, and various anomalies in some serial killer cases, some quite odd and suspicious. For example, the home of serial killer Bob Berdella, who owned Bob’s Bizarre Bazaar, a boutique that sold artifacts of the occult, was purchased by Kansas City multi-millionaire Delbert Dunmire — a former bank robber — who eventually destroyed the home and presumably any remaining evidence.

Cary Stayner, convicted of killing four women around Yosemite National Park in California, is the older brother of Steven Stayner, a high-profile victim of abduction by child molester Kenneth Parnell — the subject of national news stories and later a TV mini-series.

Police largely failed to investigate a series of disappearances of teenage boys, many from the same junior high school, in Dean Corll’s neighborhood in Houston, Texas, despite pleas from parents, some of whom hired private investigators and posted flyers throughout the neighborhood — until the boys were found buried in Corll’s rented boathouse after accomplice Elmer Wayne Henley killed Corll and called the police.

It is unclear how to evaluate such anomalies. Serial killers are quite unusual. The cases frequently attract unusually high levels of publicity. The cases often overlap with prostitution and other illegal activities as well as legal but often socially taboo activities such as homosexuality or occult practices. In some cases, the police may be paid off to “look the other way” or even involved in these activities, which may explain some instances of remarkably inept policing.

Some of these theories emphasize the military background of some serial killers, suggesting they were specially trained or even brainwashed by MK-ULTRA-like mind control programs during military service. Reviewing the fifty-seven (57) entries with identified accomplices active from 1950 to 2020 in the Wikipedia list — a total of 89 killers including the accomplices — only nine (9, or 15.8%) appear to have served in the US military: Doug Clark, Gary Lewingdon, John Allen Muhammad, Leonard Lake, Manuel Pardo, Roy Norris, and William Bonin. The FiveThirtyEight statistics web site published an article in 2015 estimating that about 13.4% of US males have served in the US military.

Since most serial killers are men, there is little evidence that US military veterans are over- or under-represented among serial killers — at least among those with identified accomplices. One might expect serial killers in a super-conspiracy to be over-represented among serial killers with identified accomplices.
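
A quick back-of-the-envelope check of that comparison: with only 57 killers in the sample, the observed 15.8% military share is well within one standard error of the 13.4% baseline, so the difference is not statistically meaningful. A sketch of the arithmetic using the numbers quoted above:

import numpy as np

n = 57            # serial killers with identified accomplices
k = 9             # of those, reported US military service
baseline = 0.134  # FiveThirtyEight estimate for US males

observed = k / n
# standard error of a sample proportion under the baseline rate
std_err = np.sqrt(baseline * (1.0 - baseline) / n)
z = (observed - baseline) / std_err

print(f"observed {observed:.3f} vs baseline {baseline:.3f}, z = {z:.2f}")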

Relevant to the causation of the serial killer wave and the more recent mass shooting wave, some of these theories, notably Dave McGowan’s Programmed to Kill, argue that the serial killer cases were manufactured in part to frighten the US public into embracing oppressive “tough on crime” policies that increase the power of the CIA, FBI, and other police and security agencies. In this view, there is no deterrent effect from executions but rather a wave of “false flag” operations designed to undercut more liberal policies such as reducing or eliminating use of the death penalty.

Conclusion

The Eighties serial killer wave was real, not wholly a product of media hype or manipulated statistics from the FBI Behavioral Sciences Unit or other official sources, despite both government and media exaggerations such as the Henry Lee Lucas case and the very high claimed numbers of Americans killed by serial killers during the early 1980s.

The major contributing factor to the wave was probably the dramatic drop in the execution rate and/or associated “tough on crime” measures in the mid 1960s.

The evidence for a super-conspiracy behind the serial killer wave, such as that proposed by the late Dave McGowan in Programmed to Kill, is quite weak but not non-existent. The anomalies cited in such theories can probably be explained by the unusual nature of the crimes and perpetrators, the extreme levels of publicity, unidentified accomplices enlarging the small conspiracies that already comprise 10-16% of the cases, and overlaps with illegal activities such as prostitution and with police corruption.

A similar rise — though smaller, only a factor of two — occurred in general US murders, which are generally less horrific and less likely to receive the death penalty than serial killings.

The significant variation in murder rates, both general US and serial killer, at lower execution rates indicates other factors come into play as well when executions are rare or do not occur.

We may now be seeing a surge of mass shootings as the execution rate has declined since 1998, much as the surge of serial killers followed the near cessation of executions in the 1960s and 1970s, culminating in a total cessation for four years after the 1972 Furman versus Georgia Supreme Court decision.

Note that these conclusions are not an endorsement of capital punishment, nor do they address a range of other issues regarding capital punishment such as wrongful convictions, racial and other discrimination in the application of capital punishment, and the relative effectiveness of life imprisonment without possibility of parole.

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] Can Nuclear War Get You Reelected?

Uncensored Video Links: NewTube ARCHIVE BitChute

What does the history of Presidential approval ratings and wars tell us about whether a nuclear war could get a failing President reelected?

Article: http://wordpress.jmcgowan.com/wp/article-can-nuclear-war-get-you-reelected/

About Us:

Main Web Site: https://mathematical-software.com/
Censored Search: https://censored-search.com/
A search engine for censored Internet content. Find the answers to your problems censored by advertisers and other powerful interests!

Subscribe to our free Weekly Newsletter for articles and videos on practical mathematics, Internet Censorship, ways to fight back against censorship, and other topics by sending an email to: subscribe [at] mathematical-software.com

Avoid Internet Censorship by Subscribing to Our RSS News Feed: http://wordpress.jmcgowan.com/wp/feed/

Legal Disclaimers: http://wordpress.jmcgowan.com/wp/legal/

Support Us:
PATREON: https://www.patreon.com/mathsoft
SubscribeStar: https://www.subscribestar.com/mathsoft

Rumble (Video): https://rumble.com/c/mathsoft
BitChute (Video): https://www.bitchute.com/channel/HGgoa2H3WDac/
Brighteon (Video): https://www.brighteon.com/channels/mathsoft
Odysee (Video): https://odysee.com/@MathematicalSoftware:5
NewTube (Video): https://newtube.app/user/mathsoft
Minds (Video): https://www.minds.com/math_methods/
Archive (Video): https://archive.org/details/@mathsoft

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] How to Analyze Data Using a Baseline Linear Model in Python

https://www.bitchute.com/video/b1D2KMk4kGKH/

Other Uncensored Video Links: NewTube Odysee

YouTube

Video on how to analyze data using a baseline linear model in the Python programming language. A baseline linear model is often a good starting point and reference for developing and evaluating more advanced, usually non-linear, models of data.

Article with source code: http://wordpress.jmcgowan.com/wp/article-how-to-analyze-data-with-a-baseline-linear-model-in-python/

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Article] How to Analyze Data with a Baseline Linear Model in Python

This article shows Python programming language source code to perform a simple linear model analysis of time series data. Most real-world data is not linear, but a linear model provides a common baseline and starting point for comparison with more advanced, generally non-linear models.

Simulated Nearly Linear Data with Linear Model
"""
Standalone linear model example code.

Generate simulated data and fit model to this simulated data.

LINEAR MODEL FORMULA:

OUTPUT = MULT_T*DATE_TIME + MULT_1*INPUT_1 + MULT_2*INPUT_2 + CONSTANT + NOISE

set MULT_T to 0.0 for simulated data.  Asterisk * means MULTIPLY
from grade school arithmetic.  Python and most programming languages
use * to indicate ordinary multiplication.

(C) 2022 by Mathematical Software Inc.

Point of Contact (POC): John F. McGowan, Ph.D.
E-Mail: ceo@mathematical-software.com

"""

# Python Standard Library
import os
import sys
import time
import datetime
import traceback
import inspect
import glob
# Python add on modules
import numpy as np  # NumPy
import pandas as pd  # Python Data Analysis Library
import matplotlib.pyplot as plt  # MATLAB style plotting
from sklearn.metrics import r2_score  # scikit-learn
import statsmodels.api as sm  # OLS etc.

# STATSMODELS
#
# statsmodels is a Python module that provides classes and functions for
# the estimation of many different statistical models, as well as for
# conducting statistical tests, and statistical data exploration. An
# extensive list of result statistics are available for each
# estimator. The results are tested against existing statistical
# packages to ensure that they are correct. The package is released
# under the open source Modified BSD (3-clause) license.
# The online documentation is hosted at statsmodels.org.
#
# statsmodels supports specifying models using R-style formulas and pandas DataFrames. 


def debug_prefix(stack_index=0):
    """
    return <file_name>:<line_number> (<function_name>)

    REQUIRES: import inspect
    """
    the_stack = inspect.stack()
    lineno = the_stack[stack_index + 1].lineno
    filename = the_stack[stack_index + 1].filename
    function = the_stack[stack_index + 1].function
    return (str(filename) + ":"
            + str(lineno)
            + " (" + str(function) + ") ")  # debug_prefix()


def is_1d(array_np,
          b_trace=False):
    """
    check if array_np is 1-d array

    Such as array_np.shape:  (n,), (1,n), (n,1), (1,1,n) etc.

    RETURNS: True or False

    TESTING: Use DOS> python -c "from standalone_linear import *;test_is_1d()"
    to test this function.

    """
    if not isinstance(array_np, np.ndarray):
        raise TypeError(debug_prefix() + "argument is type "
                        + str(type(array_np))
                        + " Expected np.ndarray")

    if array_np.ndim == 1:
        # array_np.shape == (n,)
        return True
    elif array_np.ndim > 1:
        # (2,3,...)-d array
        # with only one axis with more than one element
        # such as array_np.shape == (n, 1) etc.
        #
        # NOTE: np.array.shape is a tuple (not a np.ndarray)
        # tuple does not have a shape
        #
        if b_trace:
            print("array_np.shape:", array_np.shape)
            print("type(array_np.shape:",
                  type(array_np.shape))
            
        temp = np.array(array_np.shape)  # convert tuple to np.array
        reference = np.ones(temp.shape, dtype=int)

        if b_trace:
            print("reference:", reference)

        mask = np.zeros(temp.shape, dtype=bool)
        for index, value in enumerate(temp):
            if value == 1:
                mask[index] = True

        if b_trace:
            print("mask:", mask)
        
        # number of axes with one element
        axes = temp[mask]
        if isinstance(axes, np.ndarray):
            n_ones = axes.size
        else:
            n_ones = axes
            
        if n_ones >= (array_np.ndim - 1):
            return True
        else:
            return False
    # END is_1d(array_np)


def test_is_1d():
    """
    test is_1d(array_np) function  works
    """

    assert is_1d(np.array([1, 2, 3]))
    assert is_1d(np.array([[10, 20, 33.3]]))
    assert is_1d(np.array([[1.0], [2.2], [3.34]]))
    assert is_1d(np.array([[[1.0], [2.2], [3.3]]]))
    
    assert not is_1d(np.array([[1.1, 2.2], [3.3, 4.4]]))

    print(debug_prefix(), "PASSED")
    # test_is_1d()


def is_time_column(column_np):
    """
    check if column_np is consistent with a time step sequence
    with uniform time steps. e.g. [0.0, 0.1, 0.2, 0.3,...]

    ARGUMENT: column_np -- np.ndarray with sequence

    RETURNS: True or False
    """
    if not isinstance(column_np, np.ndarray):
        raise TypeError(debug_prefix() + "argument is type "
                        + str(type(column_np))
                        + " Expected np.ndarray")

    if is_1d(column_np):
        # verify if time step sequence is nearly uniform
        # sequence of time steps such as (0.0, 0.1, 0.2, ...)
        #
        delta_t = np.zeros(column_np.size-1)
        for index, tval in enumerate(column_np.ravel()):
            if index > 0:
                previous_time = column_np[index-1]
                if tval > previous_time:
                    delta_t[index-1] = tval - previous_time
                else:
                    return False

        # now check that the time steps are almost the same
        delta_median = np.median(delta_t)
        delta_range = np.max(delta_t) - np.min(delta_t)
        delta_pct = delta_range / delta_median  # fractional spread of the time steps
        
        print(debug_prefix(),
              "INFO: delta_pct is:", delta_pct, flush=True)
        
        if delta_pct > 1e-6:
            return False
        else:
            return True  # steps are almost the same
    else:
        raise ValueError(debug_prefix() + "argument has more"
                         + " than one (1) dimension.  Expected 1-d")
    # END is_time_column(array_np)


def validate_time_series(time_series):
    """
    validate a time series NumPy array

    Should be a 2-D NumPy array (np.ndarray) of float numbers

    REQUIRES: import numpy as np

    """
    if not isinstance(time_series, np.ndarray):
        raise TypeError(debug_prefix(stack_index=1)
                        + " time_series is type "
                        + str(type(time_series))
                        + " Expected np.ndarray")

    if not time_series.ndim == 2:
        raise TypeError(debug_prefix(stack_index=1)
                        + " time_series.ndim is "
                        + str(time_series.ndim)
                        + " Expected two (2).")

    for row in range(time_series.shape[0]):
        for col in range(time_series.shape[1]):
            value = time_series[row, col]
            if not isinstance(value, np.float64):
                raise TypeError(debug_prefix(stack_index=1)
                                + "time_series[" + str(row)
                                + ", " + str(col) + "] is type "
                                + str(type(value))
                                + " expected float.")

    # check if first column is a sequence of nearly uniform time steps
    #
    if not is_time_column(time_series[:, 0]):
        raise TypeError(debug_prefix(stack_index=1)
                        + "time_series[:, 0] is not a "
                        + "sequence of nearly uniform time steps.")

    return True  # validate_time_series(...)


def fit_linear_to_time_series(new_series):
    """
    Fit multivariate linear model to data.  A wrapper
    for ordinary least squares (OLS).  Include possibility
    of direct linear dependence of the output on the date/time.
    Mathematical formula:

    output = MULT_T*DATE_TIME + MULT_1*INPUT_1 + ... + CONSTANT

    ARGUMENTS: new_series -- np.ndarray with two dimensions
                             with multivariate time series.
                             Each column is a variable.  The
                             first column is the date/time
                             as a float value, usually a
                             fractional year.  Final column
                             is generally the suspected output
                             or dependent variable.

                             (time)(input_1)...(output)

    RETURNS: fitted_series -- np.ndarray with two dimensions
                              and two columns: (date/time) (output
                              of fitted model)

             results --
                 statsmodels.regression.linear_model.RegressionResults

    REQUIRES: import numpy as np
              import pandas as pd
              import statsmodels.api as sm  # OLS etc.

    (C) 2022 by Mathematical Software Inc.

    """
    validate_time_series(new_series)

    #
    # a data frame is a package for a set of numbers
    # that includes key information such as column names,
    # units etc.
    #
    input_data_df = pd.DataFrame(new_series[:, :-1])
    input_data_df = sm.add_constant(input_data_df)

    output_data_df = pd.DataFrame(new_series[:, -1])

    # statsmodels Ordinary Least Squares (OLS)
    model = sm.OLS(output_data_df, input_data_df)
    results = model.fit()  # fit linear model to the data
    print(results.summary())  # print summary of results
                              # with fit parameters, goodness
                              # of fit statistics etc.

    # compute fitted model values for comparison to data
    #
    fitted_values_df = results.predict(input_data_df)

    fitted_series = np.vstack((new_series[:, 0],
                               fitted_values_df.values)).transpose()

    assert fitted_series.shape[1] == 2, \
        str(fitted_series.shape[1]) + " columns, expected two(2)."

    validate_time_series(fitted_series)

    return fitted_series, results  # fit_linear_to_time_series(...)


def test_fit_linear_to_time_series():
    """
    simple test of fitting  a linear model to simple
    simulated data.

    ACTION: Displays plot comparing data to the linear model.

    REQUIRES: import numpy as np
              import matplotlib.pyplot as plt
              from sklearn.metrics import r2_score  (scikit-learn)

    NOTE: In mathematics a function f(x) is linear if:

    f(x + y) = f(x) + f(y)  # function of sum of two inputs
                            # is sum of function of each input value

    f(a*x) = a*f(x)         # function of constant multiplied by
                            # an input is the same constant
                            # multiplied by the function of the
                            # input value

    (C) 2022 by Mathematical Software Inc.
    """

    # simulate roughly monthly data for years 2010 to 2022 (120 samples)
    time_steps = np.linspace(2010.0, 2022.0, 120)
    #
    # set random number generator "seed"
    #
    np.random.seed(375123)  # make test reproducible
    # make random walks for the input values
    input_1 = np.cumsum(np.random.normal(size=time_steps.shape))
    input_2 = np.cumsum(np.random.normal(size=time_steps.shape))

    # often awe inspiring Greek letters (alpha, beta,...)
    mult_1 = 1.0  # coefficient or multiplier for input_1
    mult_2 = 2.0   # coefficient or multiplier for input_2
    constant = 3.0  # constant value  (sometimes "pedestal" or "offset")

    # simple linear model
    output = mult_1*input_1 + mult_2*input_2 + constant
    # add some simulated noise
    noise = np.random.normal(loc=0.0,
                             scale=2.0,
                             size=time_steps.shape)

    output = output + noise

    # bundle the series into a single multivariate time series
    data_series = np.vstack((time_steps,
                             input_1,
                             input_2,
                             output)).transpose()

    #
    # np.vstack((array1, array2)) vertically stacks
    # array1 on top of array 2:
    #
    #  (array 1)
    #  (array 2)
    #
    # transpose() to convert rows to vertical columns
    #
    # data_series has rows:
    #    (date_time, input_1, input_2, output)
    #    ...
    #

    # the model fit will estimate the values for the
    # linear model parameters MULT_T, MULT_1, and MULT_2

    fitted_series, \
        fit_results = fit_linear_to_time_series(data_series)

    assert fitted_series.shape[1] == 2, "wrong number of columns"

    model_output = fitted_series[:, 1].flatten()

    #
    # Is the model "good enough" for practical use?
    #
    # Compute R-SQUARED, also known as R**2
    # coefficient of determination, a goodness of fit measure
    # roughly percent agreement between data and model
    #
    r2 = r2_score(output,  # ground truth / data
                  model_output  # predicted values
                  )

    #
    # Plot data and model predictions
    #

    model_str = "OUTPUT = MULT_1*INPUT_1 + MULT_2*INPUT_2 + CONSTANT"

    f1 = plt.figure()
    # set light gray background for plot
    # must do this at start after plt.figure() call for some
    # reason
    #
    ax = plt.axes()  # get plot axes
    ax.set_facecolor("lightgray")  # confusingly use set_facecolor(...)
    # plt.ylim((ylow, yhi))  # debug code
    plt.plot(time_steps, output, 'g+', label='DATA')
    plt.plot(time_steps, model_output, 'b-', label='MODEL')
    plt.plot(time_steps, data_series[:, 1], 'cd', label='INPUT 1')
    plt.plot(time_steps, data_series[:, 2], 'md', label='INPUT 2')
    plt.suptitle(model_str)
    plt.title(f"Simple Linear Model (R**2={100*r2:.2f}%)")

    ax.text(1.05, 0.5,
            model_str,
            rotation=90, size=7, weight='bold',
            ha='left', va='center', transform=ax.transAxes)

    ax.text(0.01, 0.01,
            debug_prefix(),
            color='black',
            weight='bold',
            size=6,
            transform=ax.transAxes)

    ax.text(0.01, 0.03,
            time.ctime(),
            color='black',
            weight='bold',
            size=6,
            transform=ax.transAxes)

    plt.xlabel("YEAR FRACTION")
    plt.ylabel("OUTPUT")
    plt.legend(fontsize=8)
    # add major grid lines
    plt.grid()
    plt.show()

    image_file = "test_fit_linear_to_time_series.jpg"
    if os.path.isfile(image_file):
        print("WARNING: removing old image file:",
              image_file)
        os.remove(image_file)

    f1.savefig(image_file,
               dpi=150)

    if os.path.isfile(image_file):
        print("Wrote plot image to:",
              image_file)

    # END test_fit_linear_to_time_series()


if __name__ == "__main__":
    # MAIN PROGRAM

    test_fit_linear_to_time_series()  # test linear model fit

    print(debug_prefix(), time.ctime(), "ALL DONE!")
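
To apply the fit to your own data instead of the simulated series, load a table whose first column is the date/time as a fractional year and whose last column is the suspected output, then pass the values to fit_linear_to_time_series. A minimal sketch, assuming the code above is saved as standalone_linear.py and that my_time_series.csv is a hypothetical CSV file in that layout:

import numpy as np
import pandas as pd
from standalone_linear import fit_linear_to_time_series

df = pd.read_csv("my_time_series.csv")   # columns: time, input_1, ..., output
data = df.to_numpy(dtype=np.float64)     # first column: nearly uniform time steps

fitted_series, results = fit_linear_to_time_series(data)
print(results.params)                    # fitted CONSTANT and multipliers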

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] How to Extract Data from Images of Plots

Free Speech Video Links: Odysee RUMBLE NewTube

Short video on how to extract data from images of plots using WebPlotDigitizer, a free, open-source program available for Windows, Mac OS X, and Linux platforms.

WebPlotDigitizer web site: https://automeris.io/WebPlotDigitizer/

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] Ukraine COVID and Biden Approval Ratings Deeper Dive

Uncensored Video Links: BitChute Odysee Rumble

Short video discussing results of analyzing President Biden’s declining approval ratings and the possible effect of the COVID pandemic and Ukraine crises on the approval ratings.

A detailed longer explanation of the analysis discussed can be found in the previous video “How to Analyze Simple Data Using Python” available on all of our video channels.

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

[Video] How to Analyze Simple Data Using Python

Uncensored Video Links: BitChute NewTube ARCHIVE Brighteon Odysee

Video on how to analyze simple data using the Python programming language using President Biden’s approval ratings as an example.

(C) 2022 by John F. McGowan, Ph.D.

About Me

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).