Short (seven and one half minute) video showing how to evaluate scientifically if advertising boosts profits using mathematical modeling and statistics with a pitch for our free open source AdEvaluator™ software and a teaser for our non-free AdEvaluator Pro™software — coming soon.
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
John Wanamaker, (attributed) US department store merchant (1838 – 1922)
Between $190 billion and $270 billion is spent on advertising in the United States each year (depending on source). It is often hard to tell whether the advertising boosts sales and profits. This is caused by the unpredictability of individual sales and in many cases the other changes in the business and business environment occurring in addition to the advertising. In technical terms, the evaluation of the effect of advertising on sales and profits is often a multidimensional problem.
Many common metrics such as the number of views, click through rates (CTR), and others do not directly measure the change in sales or profits. For example, an embarrassing or controversial video can generate large numbers of views, shares, and even likes on a social media site and yet cause a sizable fall in sales and profits.
Because individual sales are unpredictable, it is often difficult or impossible to tell whether a change in sales is caused by advertising, simply due to chance alone or some combination of advertising and luck.
The plot below shows the simulated daily sales for a product or service with a price of $90.00 per unit. Initially, the business has no advertising, relying on word of mouth and other methods to acquire and retain customers. During this “no advertising” period, an average of three units are sold per day. The business then contracts with an advertising service such as Facebook, Google AdWords, Yelp, etc. During this “advertising” period, an average of three and one half units are sold per day.
The raw daily sales data is impossible to interpret. Even looking at the thirty day moving average of daily sales (the black line), it is far from clear that the advertising campaign is boosting sales.
Taking the average daily sales over the “no advertising” period, the first six months, and over the “advertising” period (the blue line), the average daily sales was higher during the advertising period.
Is the increase in sales due to the advertising or random chance or some combination of the two causes? There is always a possibility that the sales increase is simply due to chance. How much confidence can we have that the increase in sales is due to the advertising and not chance?
This is where statistical methods such as Student’s T test, Welch’s T test, mathematical modeling and computer simulations are needed. These methods compute the effectiveness of the advertising in quantitative terms. These quantitative measures can be converted to estimates of future sales and profits, risks and potential rewards, in dollar terms.
Measuring the Difference Between Two Random Data Sets
In most cases, individual sales are random events like the outcome of flipping a coin. Telling whether sales data with and without advertising is the same is similar to evaluating whether two coins have the same chances of heads and tails. A “fair” coin is a coin with an equal chance of giving a head or a tail when flipped. An “unfair” coin might have a three fourths chance of giving a head and only a one quarter chance of giving a tail when flipped.
If I flip each coin once, I cannot tell the difference between the fair coin and the unfair coin. If I flip the two coins ten times, on average I will get five heads from the fair coin and seven and one half (seven or eight) heads from the unfair coin. It is still hard to tell the difference. With one hundred times, the fair coin will average fifty heads and the unfair coin seventy-five heads. There is still a small chance that the seventy five heads came from a fair coin.
The T statistics used in Student’s T test (Student was a pseudonym used by statistician William Sealy Gossett) and Welch’s T test, a more advanced T test, are measures of the difference in a statistical sense between two random data sets, such as the outcome of flipping coins one hundred times. The larger the T statistic the more different the two random data sets in a statistical sense.
Student’s T test and Welch’s T test convert the T statistics into probabilities that the difference between the two data sets (the “no advertising” and “advertising” sales data in our case) is due to chance. Student’s T test and Welch’s T test are included in Excel and many other financial and statistical programs.
The plot below is a histogram (bar chart) of the number of simulations with a Welch’s T statistic value. In these simulations, the advertising has no effect on the daily sales (or profits). The advertising has no effect is the null hypothesis in the language of classical statistics.
Welch was able to derive a mathematical formula for the expected distribution — shape of this histogram — using calculus. The mathematical formula could then be evaluated quickly with pencil and paper or an adding machine, the best available technology of his time (the 1940’s).
To derive his formula using calculus, Welch had to assume that the data had a Bell Curve (Normal or Gaussian) distribution. This is at best only approximately true for the sales data above. The distribution of daily sales in the simulated data is actually the Poisson distribution. The Poisson distribution is a better model of sales data and approximates the Bell Curve as the number of sales gets larger. This is why Welch’s T test is often approximately valid for sales data.
Many methods and tests in classical statistics assume a Bell Curve (Normal or Gaussian) distribution and are often approximately correct for real data that is not Bell Curve data. We can compute better, more reliable results with computer simulations using the actual or empirical probability distributions — shown below.
More precisely, naming one data set the reference data and the other data set the test data, the T test computes the probability that the test data is due to a chance variation in the process that produced the reference data set. In the advertising example above, the “no advertising” period sales data is the reference data and the “advertising” sales data is the test data. Roughly this probability is the fraction of simulations in the Welch’s T statistic histogram that have a T statistic larger (or smaller for a negative T statistic) than the measured T statistic for the actual data. This probability is known as a p-value, a widely used statistic pioneered by Ronald Fisher.
The p-value has some obvious drawbacks for a business evaluating the effectiveness of advertising. At best it only tells us the probability that the advertising boosted sales or profits, not how large the boost was nor the risks. Even if on average the advertising boosts sales, what is the risk the advertising will fail or the sales increase will be too small to recover the cost of the advertising?
Fisher worked for Rothamsted Experimental Station in the United Kingdom where he wanted to know whether new breeds of crops, fertilizers, or other new agricultural methods increased yields. His friend and colleague Gossett worked for the Guinness beer company where he was working on improving yields and quality of beer. In both cases, they wanted to know whether a change in the process had a positive effect, not the size of the effect. Without modern computers — using only pencil and paper and adding machines — it was not practical to perform simulations as we can easily today.
Welch’s T statistic has a value of -3.28 for the above sales data. This is in fact lower than nearly all the simulations in the histogram. It is very unlikely the boost in sales is due to chance. The p-value from Welch’s T test for the advertising data above — computed using Welch’s mathematical formula — is only 0.001 (one tenth of one percent). Thus it is very likely the boost in sales is caused by the advertising and not random chance. Note that this does not tell us if the size of the boost, whether the advertising is cost effective, or the risk of the investment.
Sales and Profit Projections Using Computer Simulations
We can do much better than Student’s T test and Welch’s T test by using computer simulations based on the empirical probabilities of sales from the reference data — the “no advertising” period sales data. The simulations use random number generators to simulate the random nature of individual sales.
In these simulations, we simulate one year of business operations with advertising many times — one-thousand in the examples shown — using the frequency of sales from the period with advertising. We also simulate one year of business operations without the advertising, using the frequency of sales from the period without advertising in the sales data.
We compute the annual change in the profit relative to the corresponding period — with or without advertising — in the sales data for each simulated year of business operations.
The simulations show that we have an average expected increase in profit of $5,977.66 over one year (our annual advertising cost is $6,000.00). It also shows that despite this there is a risk of a decrease in profits, some greater than the possible decreases with no advertising.
A business needs to know both the risks — how much money might be lost in a worst case — and the rewards — the average and best possible returns on the advertising investment.
Since sales are a random process like flipping a coin or throwing dice, there is a risk of a decline in profits or actual losses without the advertising. The question is whether the risk with advertising is greater, smaller, or the same. This is known as differential risk.
The Problem with p-values
This is a concrete example of the problem with p-values for evaluating the effectiveness of advertising. In this case, the advertising increases the average daily sales from 100 units per day to 101 units per day. Each unit costs one dollar (a candy bar for example).
The p-value from Welch’s T test is 0.007 (seven tenths of one percent). The advertising is almost certainly effective but the boost in sales is much less than the cost of the advertising:
The average expected decline in profits over the simulations is $5,128.84.
The p-value is not a good estimate of the potential risks and rewards of investing in advertising. Sales and profit projections from computer simulations based on a mathematical model derived from the reference sales data are a better (not perfect) estimate of the risks and rewards.
Multidimensional Sales Data
The above examples are simple cases where the only change is the addition of the advertising. There are no price changes, other advertising or marketing expenses, or other changes in business or economic conditions. There are no seasonal effects in the sales.
Student’s T test, Welch’s T test, and many other statistical tests are designed and valid only for simple controlled cases such as this where there is only one change between the reference and test data. These tests were well suited to data collected at the Rothamsted Experimental Station, Guinness breweries, and similar operations.
Modern businesses purchasing advertising from Facebook, other social media services, and modern media providers (e.g. the New York Times) face more complex conditions with many possible input variables (unit price, weather, unemployment rate, multiple advertising services, etc.) changing frequently or continuously.
For these, financial analysts need to extract predictive multidimensional mathematical models from the data and then perform similar simulations to evaluate the effect of advertising on sales and profits.
AdEvaluator™ is designed for cases with a single product or service with a constant unit price during both periods. AdEvaluator™ needs a reference period without the new advertising and a test period with the new advertising campaign. The new advertising campaign should be the only significant change between the two periods. AdEvaluator™ also assumes that the probability of the daily sales is independent and identically distributed during each period. This is not true in all cases. Exercise your professional business judgement whether the results of the simulations are applicable to your business.
This program comes with ABSOLUTELY NO WARRANTY; for details use -license option at the command line or select Help | License… in the graphical user interface (GUI). This is free software, and you are welcome to redistribute it under certain conditions.
We are developing a professional version of AdEvaluator™ for multidimensional cases. This version uses our Math Recognition™ technology to automatically identify good multidimensional mathematical models.
The Math Recognition™ technology is applicable to many types of data, not just sales and advertising data. It can for example be applied to complex biological systems such as the blood coagulation system which causes heart attacks and strokes when it fails. According the US Centers for Disease Control (CDC) about 633,000 people died from heart attacks and 140,000 from strokes in 2016.
Conclusion
It is often difficult to evaluate whether advertising is boosting sales and profits, despite the ready availability of sales and profit data for most businesses. This is caused by the unpredictable nature of individual sales and frequently by the complex multidimensional business environment where price changes, economic downturns and upturns, the weather, and other factors combine with the advertising to produce a confusing picture.
In simple cases with a single change, the addition of the new advertising, Student’s T test, Welch’s T test and other methods from classical statistics can help evaluate the effect of the advertising on sales and profits. These statistical tests can detect an effect but provide no clear estimate of the magnitude of the effect on sales and profits and the financial risks and rewards.
Sales and profit projections based on computer simulations using the empirical probability of sales from the actual sales data can provide quantitative estimates of the effect on sales and profits, including estimates of the financial risks (chance of losing money) and the financial rewards (typical and best case profits).
(C) 2018 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
This is a short article on how to control the order of slides in a slideshow on the Microsoft Windows 10 operating system. Slideshows can be quickly launched in Windows 10 using the Windows File Explorer by selecting the Manage tab and clicking on the Slideshow Icon.
Usually, Windows 10 will display the picture files in the folder in the order displayed in the file explorer: alphabetically if Name is selected, by date if Date is selected, by file size if Size is selected, etc. In my experience on my system, this occasionally does not happen and the files are displayed alphabetically even though another view is selected. Thus, it is probably best to use alphabetical file names to ensure that the files display as desired.
Note that on Windows (and many computer systems) the numbers 0-9 come before A-Z, thus files that start with a number such as 000my_file_name.jpg will display before files that start with a letter such as my_file_name.jpg. In the example below, I use the prefix 000 to display the picture of George Washington first.
To display the Presidents in chronological order, I add a numeric prefix to each file in the folder. George Washington is the first President of the United States. John Adams is the second. Thomas Jefferson third. Andrew Jackson seventh. Abraham Lincoln sixteenth. Theodore Roosevelt twenty-sixth. Donald Trump forty-fifth.
By default, Windows 10 plays the slide show in Loop mode with Shuffle mode off. In this mode, the slides are displayed in order.
Right clicking with the mouse or other pointing device during the slide show brings up a popup menu with the Loop and Shuffle modes as well as other controls.
In the shuffle mode, the first slide is always displayed first. I will still get George Washington first in my example. All subsequent slides are displayed in random order. This seems like a bug; I would prefer the first slide to also be random.
NOTE: If for some reason you do not like the first slide displayed every time in shuffle mode, add a prefix to a picture file that you would prefer to be first to place it alphabetically before all other picture files in the folder.
Once shuffle mode or other controls (slow or fast for example) are selected, the selections remain in force for subsequent slide shows until changed.
That is how to control the order of slides in a slide show in Microsoft Windows 10.
This is a short video on how to control the order of slides in a slide show on Windows 10:
(C) 2018 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
In this video, I discuss how to defeat soft or stealth censorship of the Internet by giant social media companies such as Facebook, Google/YouTube, Apple, Twitter and the government using the proven, established, widely used free open-source decentralized technology RSS (Rich Site Summary also known as Really Simple Syndication). I demonstrate this using the RSS reader in the free open-source Thunderbird e-mail program available for Microsoft Windows, Mac OS X, and most Linux platforms.
The government has many potential ways to apply pressure on social media companies in the absence of explicit censorship laws:
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
A brief introduction to the math recognition problem and automatic math recognition using modern artificial intelligence and pattern recognition methods. Includes a call for data. About 14 minutes.
(C) 2018 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
Microsoft recently (April 2018) added a new feature to Windows 10 known as Focus Assist to block distracting notifications on Windows 10. Remarkably, Focus Assist by default turns off notifications when the display is duplicated (for example, you plug your laptop into a high definition giant screen at the office every day — just like I do) ANDposts its’ own annoying notification to the notifications visible in the lower right corner of Windows 10!!!
By default this notification is generated and posted every time a laptop is plugged into a large screen display to duplicate the screen. Every time I come to work for example. Clicking on the lower right corner notification icon displays the full notification from Focus Assist:
Thus, after spending months figuring out how to disable a range of annoying notifications on my computer, I was now getting an annoying notification every day from a tool supposedly intended to help me focus 🙂
It is possible to turn this off by disabling the default option to turn on focus assist when the screen is duplicated:
Instead:
Just Block All Notifications Instead:
Just to be clear, the issue for me is that although I have Focus Assist on already, I still got the notification — the icon in the lower right corner updates and changes causing a visual distraction — when I plug my laptop into my large screen. This happened every morning and sometimes several times a day depending on my schedule and appointments. In general I don’t want notifications from Focus Assist distracting me either!!!
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
Many online web sites and services, including many that have useful information, have become highly distracting and addictive — wasting many hours of precious time and clouding our judgement leading to bad purchases and other critical decisions. This increasing level of distraction is probably due to a combination of increasing integration of persuasive technology and advances in recommendation engines and other algorithms.
I personally have had significant problems with lost time and distractions from YouTube and Hacker News. Both of these sites have useful information as well as large amounts of distracting dreck. I find both addictive. I substantially reduced the amount of time wasted on Hacker News by switching from the web site to the Hacker News RSS feed in my Thunderbird email program.
Below is the Hacker News RSS feed in Thunderbird.
Thunderbird has a feature to subscribe to and manage RSS feeds. As can be seen, I have subscribed to Hacker News and Slashdot. Although Slashdot is similar to Hacker News in a number of respects, I have consistently found Hacker News much more distracting and addictive.
There is also an “End of Month” folder. I have configured message filters in Thunderbird to move important but distracting articles from Hacker News and Slashdot to the “End of Month” folder. This includes for example articles on some political topics that tend to get my blood boiling.
Hacker News has a “social” system of user and article scores, upvotes, downvotes , comments and other decorations. This “socialization” of the new articles seems to be a major factor in why the web site is substantially more addictive and distracting than the RSS feed. In addition, as noted, I am able to filter out articles that tend to distract me, putting them aside for a planned time to deal with distracting topics.
Many web sites have RSS feeds including Hacker News, Slashdot, and Tech Crunch (not shown here). This method can be applied to many distracting web sites to reduce the unwanted distractions and lost time while still keeping up with useful information. Message Filters can be configured to delete dreck, set aside articles on important but distracting topics, and highlight articles of special interest. With message filters, you — the reader/user — are in control instead of mysterious machine learning algorithms and recommendation engines.
How to Set Up RSS in Thunderbird
The Thunderbird web site provides detailed instruction on how to set up a Feed Account and subscribe to RSS feeds here.
Step 1: Create a Feed Account
First you must create an account in Thunderbird for your feeds.
1. In the Menu Bar, click File > New > Feed Account. The Feed Account Wizard window appears.
2. Type a name for your Feed account in the Account Name box, then click Next.
3. Click Finish. Your new account will now appear in Thunderbird’s folder pane.
I gave my account the name Blogs and News Feeds:
There is a main dashboard for Blogs and News Feeds in Thunderbird:
Click on Manage subscriptions to add an RSS feed. You will need the URL for the feed. The picture below shows the feed dialog in Thunderbird.
How to set up message filters
Select the Message Filters menu item from the Tools drop down menu:
This brings up a dialog for creating and managing message filters:
Click on the New button to create a new message filter. For example:
The Distraction Economy
Smartphones and the Internet have become more and more distracting and addictive over the last several years with no signs of the trend reversing. This translates into many hours of lost time per week, month, and year. Even using the federal minimum wage of $7.25 per hour, five or ten hours per week lost to cat videos on YouTube, software industry gossip on Hacker News, or a million other online distractions translates into $36 to $72 per week, which is a lot for someone earning the minimum wage. Of course most readers of this article probably should value their time at $15 to $100 per hour.
A dollar estimate does not capture the lost real world social, personal, and professional opportunities. Outrage inducing videos and articles are often addictive but they are certainly not pleasant entertainment either.
Many of these distracting web sites, apps, and services seek to persuade us to buy products we don’t need, vote for public policies that don’t benefit us, and have other hidden costs that are difficult to measure — unlike lost time.
Many of these distracting web sites, apps, and services are also tightly integrated with a growing system of mass surveillance which, thanks to new technologies, is unprecedented in human history even in extreme dictatorships like Nazi Germany or Stalin’s Soviet Union. Extremely high bandwidth wireless networks, inexpensive high resolution video cameras, remarkable advances in video compression, huge disk drives, and ultra-fast computers have enabled levels of monitoring far beyond the dystopian future in George Orwell’s 1984.
Fears of terrorism and an implied Mad Max scenario of global economic collapse due to peak oil have contributed to a public acceptance of these highly questionable developments, along with shrewd marketing of social media and smartphones.
Waiting for companies obsessed with quarterly earnings and politicians beholden to wealthy campaign contributors to roll back or reform these developments is unlikely to work. People can take effective action — both individually and acting together — to reduce the level of distraction in their lives, regain valuable free time, and think more clearly, such as switching to RSS feeds and away from distracting web sites.
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
I recently switched from an aging Apple Macbook Air to a shiny new, substantially lighterLG gram running Windows 10. This involved switching from Apple Mail to the free open-source Mozilla Thunderbird. In principle, Apple Mail can export my address book (contacts) in the human-readable vcard or VCF, Virtual Contact File, format and Thunderbird can import an address book in vcard format. BUT, as it happens, Thunderbird failed to import the telephone numbers for my contacts, a long standing problem with Thunderbird.
To export the Apple Mail contacts, bring up the Apple Contacts application, select All Contacts, and select File | Export | Export vCard…
Then save the vCard file:
To import the Apple Mail contacts with the phone numbers successfully, I wrote a Python 3 script to convert the VCF file to a comma separated values (CSV) file that Thunderbird could import with the phone numbers. I used Python 3.6.4 installed as part of the Anaconda Python distribution. Python and Anaconda are both available for Windows, Mac OS X, and most major flavors of the free open-source GNU/Linux operating system. In principle, the Python script should run correctly on any of these platforms.
By default the script (below) assumes the vcard file is named jfm_contacts.vcf and writes the Thunderbird compliant CSV to tbird_imports.vcf
To run the script using ipython (installed by Anaconda) and override these defaults:
C:\Users\John McGowan\Code>ipython
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: run convert_vcf_to_csv.py mytest.vcf -o tbird_mytest.csv
Reading vcard (vcf) file: mytest.vcf
WARNING: VCARD 19 ( Apple Inc. ) 1-800-MY-APPLE MAY NOT BE A VALID PHONE NUMBER
WARNING: VCARD 301 ( Name ) Mobile MAY NOT BE A VALID PHONE NUMBER
WARNING: VCARD 301 ( Name ) Home MAY NOT BE A VALID PHONE NUMBER
WARNING: VCARD 301 ( Name ) Work MAY NOT BE A VALID PHONE NUMBER
WARNING: VCARD 301 ( Name ) Fax MAY NOT BE A VALID PHONE NUMBER
Processed 503 vcards
Wrote Thunderbird Compliant CSV file with phone numbers to: tbird_mytest.csv
ALL DONE
DISCLAIMER: Note that this script is provided “AS IS” (see license terms for more details). Giant corporations like Apple work long and hard to lock users into their “ecosystems” by, for example, using obfuscated non-standard “standard” formats for key contacts and other critical information stored in their products. Make sure to keep backups of your address books and contacts before using this script or similar software.
convert_vcf_to_csv.py
"""
convert Apple Mail VCF archive to CSV file for Mozilla Thunderbird (tbird)
tbird cannot read phone numbers from Apple Mail VCF file
"""
import sys
import os.path # os.path.isfile(fname)
import re # regular expressions
import phone # my phone number validation module
VERBOSE_FLAG = False # debug trace flag
# CSV file header generated by exporting contacts from Mozilla Thunderbird 52.6.0
TBIRD_ADR_BOOK_HEADER = 'First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes' # was carriage return here
# John Nada from John Carpenter's THEY LIVE
DUMMY_CONTACT = 'John,Nada,John Nada,Nada,nada@nowhere.com,nada@cable54.com,NADA,999-555-1212,888-555-1234,777-555-6655,111-555-1234,111-555-9876,123 Main Street, Apt 13, Los Angeles, CA, 91210,USA,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes'
# break into values
FIELD_NAMES = TBIRD_ADR_BOOK_HEADER.split(',')
FIELD_VALUES_START = DUMMY_CONTACT.split(',')
for index, value in enumerate(FIELD_VALUES_START):
FIELD_VALUES_START[index] = '' # try single space
# build dictionary to map from field name to index
FIELD_INDEX = {}
for index, field_name in enumerate(FIELD_NAMES):
FIELD_INDEX[field_name] = index
if VERBOSE_FLAG:
print(FIELD_INDEX)
print(TBIRD_ADR_BOOK_HEADER)
print(DUMMY_CONTACT)
if len(sys.argv) < 2:
VCARD_FILE = 'jfm_contacts.vcf'
else:
VCARD_FILE = sys.argv[1] # 0 is script name
def usage(cmd):
""" usage message """
print("Usage: ", cmd, " [-license] [-o output_file.csv] ")
print(" -- generate Thunderbird Compliant CSV file with importable telephone numbers ")
print(" -- from Apple Mail generated .vcf (vcard) file")
print(" -- (C) 2018 by John F. McGowan, Ph.D.")
print(" ")
print(" -license -- print license terms")
print(" ")
print("In Mozilla Thunderbird 52.6.0, Tools | Import | Address Books | Text file(LDIF,csv,tab,txt) | choose output file from this program.")
print(" ")
print("Tested with Python 3.6.4 installed by/with Anaconda")
def license_terms():
""" license terms """
license_msg = """Copyright 2018 John F. McGowan, Ph.D.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
print(license_msg)
if VCARD_FILE == "--help" or VCARD_FILE == "-h"\
or VCARD_FILE == "-help" or VCARD_FILE == "-?":
usage(sys.argv[0])
sys.exit(0)
if VCARD_FILE == "--license" or VCARD_FILE == "-license":
license_terms()
sys.exit(0)
OUTPUT_FILENAME = 'tbird_imports.csv'
OUTPUT_FLAG = False
for arg_index, argval in enumerate(sys.argv):
if OUTPUT_FLAG:
OUTPUT_FILENAME = argval
OUTPUT_FLAG = False
if argval == "-o":
OUTPUT_FLAG = True
# write import file with one dummy contact (John Nada from THEY LIVE)
OUTPUT_FILE = open(OUTPUT_FILENAME, 'w')
OUTPUT_FILE.write(TBIRD_ADR_BOOK_HEADER)
OUTPUT_FILE.write('\n')
OUTPUT_FILE.write(DUMMY_CONTACT)
OUTPUT_FILE.write('\n')
COMMA_DELIM = r"\,"
VCARD_COUNT = 0
b_processing = False # processing a vcard
# check if input file exists
if os.path.isfile(VCARD_FILE):
print("Reading vcard (vcf) file: ", VCARD_FILE)
else:
print("Input vcard (vcf) file ", VCARD_FILE, " does not exist (missing)!")
sys.exit(0)
input_file = open(VCARD_FILE)
for line in input_file:
values = line.split(':')
if values[0] == 'BEGIN':
if len(values) > 1:
if values[1] == 'VCARD\n':
VCARD_COUNT = VCARD_COUNT + 1
b_processing = True
FIELD_VALUES = FIELD_VALUES_START.copy()
if b_processing:
tag = values[0]
if tag == 'END': # reached end of vcard
if VERBOSE_FLAG:
print("END OF VCARD ", VCARD_COUNT)
b_processing = False
# process non-dummy contact
if FIELD_VALUES[FIELD_INDEX['Display Name']] != FIELD_VALUES_START[FIELD_INDEX['Display Name']]:
contact_record = ','.join(FIELD_VALUES)
if VERBOSE_FLAG:
print(contact_record)
OUTPUT_FILE.write(contact_record)
OUTPUT_FILE.write('\n')
# parse info for the contact
if tag == 'N':
contact_name = values[1].strip().replace(';', ' ')
if COMMA_DELIM in contact_name:
contact_name = contact_name.split(COMMA_DELIM)[0]
if isinstance(contact_name, str):
name_parts = contact_name.split()
if len(name_parts) > 1:
contact_first_name = name_parts[1]
contact_last_name = name_parts[0]
else:
contact_first_name = ''
contact_last_name = ''
FIELD_VALUES[FIELD_INDEX['First Name']] = contact_first_name
FIELD_VALUES[FIELD_INDEX['Last Name']] = contact_last_name
if tag == 'FN': # FN (full name) is usually first_name last_name
contact_fullname = values[1].strip().replace(';', ' ')
if COMMA_DELIM in contact_fullname:
contact_fullname = contact_fullname.split(COMMA_DELIM)[0]
if isinstance(contact_fullname, str):
FIELD_VALUES[FIELD_INDEX['Display Name']] = contact_fullname
name_parts = contact_fullname.split()
contact_first_name = name_parts[0]
if len(name_parts) > 1:
contact_last_name = name_parts[1]
else:
contact_last_name = ''
else:
contact_first_name = ''
contact_last_name = ''
FIELD_VALUES[FIELD_INDEX['First Name']] = contact_first_name
FIELD_VALUES[FIELD_INDEX['Last Name']] = contact_last_name
#print(contact_fullname)
if tag == 'ORG': # ORG (organization)
contact_org = values[1].strip().replace(';', ' ')
# Apple vcard uses semicolon as embedded delimiter
FIELD_VALUES[FIELD_INDEX['Organization']] = contact_org
if tag == 'NOTE': # NOTE (notes) in VCF
contact_notes = values[1].strip().replace(r'\n', ' ')
FIELD_VALUES[FIELD_INDEX['Notes']] = 'NOTE: ' + contact_notes
if tag == 'TITLE': # TITLE
contact_title = values[1].strip()
FIELD_VALUES[FIELD_INDEX['Job Title']] = 'TITLE: ' + contact_title
if tag.startswith('EMAIL'): #process emails
contact_email = values[1].strip()
FIELD_VALUES[FIELD_INDEX['Primary Email']] = contact_email
if tag.startswith('TEL'): # process phone numbers
contact_phone = values[1].strip()
# remove special characters and other noise
contact_phone = re.sub('[^A-Za-z0-9() -]+', ' ', contact_phone)
contact_phone = contact_phone.strip() # remove leading/trailing whitespace
if not phone.is_valid_phone(contact_phone):
print("WARNING: VCARD ", VCARD_COUNT, " (", contact_fullname, ") ", \
contact_phone, " MAY NOT BE A VALID PHONE NUMBER")
if "HOME" in tag:
FIELD_VALUES[FIELD_INDEX['Home Phone']] = contact_phone
elif "WORK" in tag:
FIELD_VALUES[FIELD_INDEX['Work Phone']] = contact_phone
elif "MAIN" in tag:
FIELD_VALUES[FIELD_INDEX['Work Phone']] = contact_phone
elif "CELL" in tag:
FIELD_VALUES[FIELD_INDEX['Mobile Number']] = contact_phone
elif "OTHER" in tag:
FIELD_VALUES[FIELD_INDEX['Custom 1']] = 'OTHER PHONE: ' + contact_phone
else:
FIELD_VALUES[FIELD_INDEX['Work Phone']] = contact_phone
if tag.startswith('ADR'): # physical addresses
contact_address = values[1].strip().strip(';')
contact_address = contact_address.replace(r'\n', ';')
if "HOME" in tag:
FIELD_VALUES[FIELD_INDEX['Home Address']] = contact_address
elif "WORK" in tag:
FIELD_VALUES[FIELD_INDEX['Work Address']] = contact_address
elif "OTHER" in tag:
FIELD_VALUES[FIELD_INDEX['Custom 2']] = 'OTHER ADDRESS: ' + contact_address
else:
FIELD_VALUES[FIELD_INDEX['Home Address']] = contact_address
# just ^URL;....:url
if tag.startswith('URL'): # url
contact_url = values[1].strip()
index = FIELD_INDEX['Web Page 1']
if not FIELD_VALUES[index]:
FIELD_VALUES[index] = contact_url
else:
FIELD_VALUES[FIELD_INDEX['Web Page 2']] = contact_url
if tag.startswith('item1.URL'): # item1.URL...:https:remaining_url
contact_url = ':'.join(values[1:])
contact_url = contact_url.strip()
if contact_url[:4] != 'http':
contact_url = 'http://' + contact_url
index = FIELD_INDEX['Web Page 1']
if not FIELD_VALUES[index]:
FIELD_VALUES[index] = contact_url
else:
FIELD_VALUES[FIELD_INDEX['Web Page 2']] = contact_url
if tag.startswith('item2.URL'): # item2.URL...:https:remaining_url
contact_url = ':'.join(values[1:])
contact_url = contact_url.strip()
if contact_url[:4] != 'http':
contact_url = 'http://' + contact_url
index = FIELD_INDEX['Web Page 1']
if not FIELD_VALUES[index]:
FIELD_VALUES[index] = contact_url
else:
FIELD_VALUES[FIELD_INDEX['Web Page 2']] = contact_url
print("Processed ", VCARD_COUNT, " vcards")
OUTPUT_FILE.close()
print("Wrote Thunderbird Compliant CSV file with phone numbers to: ", OUTPUT_FILENAME)
print('ALL DONE')
convert_vcf_to_csv.py expects a module phone.py which contains code to check if a phone number is valid. The convert_vcf_to_csv.py script will print warning messages if it encounters a phone number that may be invalid although it still inserts the suspect phone number in the CSV file.
phone.py
'''
validate phone number module
(C) 2018 by John F. McGowan, Ph.D.
'''
import re
def is_valid_phone(phone_number):
''' determine if argument is a valid phone number '''
result = re.match(r'\d?[ -]*(\d{3}|\(\d{3}\))?[ -]*\d{3}[- ]*\d{4}', phone_number)
return bool(result != None)
Usage message
convert_vcf_to_csv.py --help
Usage: convert_vcf_to_csv.py [-license] [-o output_file.csv]
-- generate Thunderbird Compliant CSV file with importable telephone numbers
-- from Apple Mail generated .vcf (vcard) file
-- (C) 2018 by John F. McGowan, Ph.D.
-license -- print license terms
In Mozilla Thunderbird 52.6.0, Tools | Import | Address Books | Text file(LDIF,csv,tab,txt) | choose output file from this program.
Tested with Python 3.6.4 installed by/with Anaconda
By default, convert_vcf_to_csv.py writes an output file tbird_imports.csv which can be imported into the Thunderbird Address Book as follows:
(1) Bring up the Mozilla Thunderbird Address Book by clicking on the Address Bookbutton in Thunderbird:
(2) Select Tools | Import
(3) This brings up an Importdialog. Select the Address Books option in the Importdialog.
(4) Select Nextbutton. This brings up a File Type Selection Dialog. Select Text File (LDIF, .tab, .csv, .txt)
(5) Select Nextbutton. This brings up the Select address book filedialog. By default this displays and imports LDIF format address book. Select comma separated values (CSV) instead:
(6) Now open the Thunderbird compliant CSV file, default name tbird_imports.csv:
(7) The new address book will now be imported into Mozilla Thunderbird complete with phone numbers. The new address book will appear in the list of address books displayed but the individual contacts may not be displayed immediately. Switch to another address book and back to see the new contacts or try searching for a new contact.
NOTE: Tested with Python 3.6.4 installed by Anaconda, Mozilla Thunderbird 52.6.0 on LG gram with Windows 10, and VCF contacts file exported from Apple Contacts Version 10.0 (1756.20) on a 13 inch Macbook Air (about 2014 vintage) running Mac OS X version 10.12.6 (macOS Sierra).
(C) 2018 by John F. McGowan, Ph.D.
About Me
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).
I find Facebook useful for keeping in touch with friends and family that I can’t see in person regularly. I live in California and many of my relatives live on the East Coast of the United States. Similarly my busy life makes keeping in touch with some friends and acquaintances even in California in person difficult. However, Facebook became very distracting for me the last few years, primarily due to political posts during the 2016 Presidential election and even worse after Donald Trump won. I found Facebook was contributing heavily to distractions and wasted time.
Here are the steps that I have taken to largely eliminate the Facebook distractions in my life:
Remove the Facebook and Facebook Messenger apps from my smartphone entirely; only check Facebook on my laptop and desktop computers.
Configure my Facebook account to only send absolutely essential email and other notifications. No marketing or promotional notifications, no “someone liked this post” notifications, etc.
Install Matt Kruse’s Social Fixer add-on for Facebook and enable its’ built-in politics filter as well as add some custom filters for “Trump,” etc. I’ll say more about Social Fixer below.
Use SelfControl on the Mac and ColdTurkey on Windows to block Facebook entirely during my work day as well as sometimes at home.
Social Fixer
I have been using Social Fixer for about three months with a dramatic reduction in mostly political distracting posts. Social Fixer is a Javascript add on for Facebook available for both the Safari web browser on the Mac and the Firefox Web browser on a number of platforms. It comes with a built-in politics filter as well as user customizable filters and many other features to enable fine control over what Facebook shows you.
The politics filter proved quite good although occasionally something will slip through. This enables me to keep in touch with friends who are freaking out over Trump (for example) or other hot button topics without being inundated with a continuous stream of distracting political posts.
At least so far, I have found Social Fixer is a better option than unfollowing a friend on Facebook, where you lose all of their posts whether distracting (e.g. politics) or not.
Don’t Get Your News from Facebook
Facebook, YouTube and many other social media services appear to be using recently developed — we might say unproven, mostly untried — methods such as Deep Learning and Machine Learning to recommend, prioritize, and otherwise manage a wide range of posts, notably posts with political content. As I discussed in my previous post on reducing YouTube distractions, what these methods appear to do frequently is promote posts that generate strong often irrational instinctive reactions such as our “fight or flight” response. This often overrides our higher cognitive function which we need to use for most (not all) political issues. If you really care about politics or humanity, as I do, you want to avoid this sort of content so that you can think calmly and rationally about important issues.
Set aside some time each day or week depending on your schedule when you are calm and collected to study current events and the issues dispassionately.
Avoid your “Ideological Echo Chamber.” Identify a range of web sites or other sources that discuss the issues deeply and carefully from many points of view, not just your own. If you are a conservative, you should be following at least a few liberal and left-wing sources. If you are a liberal, you should be following at least a few conservative and right-wing sources. You should also be following some “fringe” sources that don’t fit neatly into the traditional right-left paradigm.
Fact-check and check the context of quotes and “facts” on all sides. A genuine fact can be highly misleading if other facts are omitted. Search engines such as Google and other Internet services make this much easier than years ago, when access to a top-notch library was generally needed.
Remember that Wikipedia is not reliable on “controversial” subjects. There are many examples of interest groups and activists capturing Wikipedia pages or bogging them down in flame wars.
Wherever possible use primary sources: read the actual memo, watch the unedited long form video, etc. Wikipedia is not a primary source.
Consider finding or organizing a dedicated forum — online or real-world — to share your concerns with friends, neighbors, colleagues and others rather than broadcasting your concerns with posts on Facebook or other general purpose social media platforms.
Conclusion
In my experience, it is possible to largely eliminate the distractions from Facebook using these methods:
Remove the Facebook and Facebook Messenger apps from my smartphone entirely; only check Facebook on my laptop and desktop computers.
Configure my Facebook account to only send absolutely essential email and other notifications. No marketing or promotional notifications, no “someone liked this post” notifications, etc.
Install Matt Kruse’s Social Fixer add-on for Facebook and enable its’ built-in politics filter as well as add some custom filters for “Trump,” etc.
Use SelfControl on the Mac and ColdTurkey on Windows to block Facebook entirely during my work day as well as sometimes at home.
(C) 2018 by John F. McGowan, Ph.D.
About the author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).