No, I am not looking for a job at Google!

 

I have been contacted a number of times in the last few months by recruiters, or by people who turned out to be recruiters, from Google.  For the record, I am not currently looking for a job and I am specifically not looking for a job at Google.  🙁

I am developing tools and algorithms for automating complex data analysis, reducing costs and increasing results.  I am interested in conversations with potential customers and interested parties.  You should have a sincere, genuine interest in my work if you contact me.

(C) 2017 by John F. McGowan, Ph.D.

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech).

STEM Employment Related Articles

Inside the Growing Guest Worker Program Trapping Indian Students in Virtual Servitude

An article in the left-wing Mother Jones magazine on Indian students and the OPT program, using students at the University of Central Missouri as examples.

STEM Worker High Turnover Rates

http://www.businessinsider.com/employee-retention-rate-top-tech-companies-2017-8

An article in Business Insider on the possible high turnover rate of many tech companies.  It does not clearly distinguish between the turnover rate and the average duration of employment at a company.  A company that is growing rapidly can have a low turnover rate and a low average duration of employment simply because so many employees are new.  If a company doubles in size in two years, half its employees will have no more than two years of employment at the company.

Apple, for example, has been growing and hiring rapidly over the last several years.  Many employees are new, which will pull down the average employment time.  Having worked at Apple from 2014 to 2016, I suspect it does have a high turnover rate, but that is hard to prove due to the apparent rapid growth of the company.
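The difference between turnover and tenure can be illustrated with a small simulation.  This is only a toy sketch, not data from Apple or any other real company: it assumes zero turnover and smooth exponential growth that doubles headcount every 24 months, and the function names (`simulate_growth`, `summarize`) are made up for the illustration.  Headcounts are fractional because the growth curve is continuous.

```python
def simulate_growth(initial_headcount, doubling_months, total_months):
    """Toy model: nobody ever leaves, headcount doubles every `doubling_months`.

    Returns a list of (tenure_in_months, number_of_employees) cohorts as seen
    at the end of the simulation.
    """
    def headcount(month):
        return initial_headcount * 2 ** (month / doubling_months)

    # The original staff have been there for the whole simulated period.
    cohorts = [(total_months, headcount(0))]
    for month in range(1, total_months + 1):
        hires = headcount(month) - headcount(month - 1)
        cohorts.append((total_months - month, hires))
    return cohorts


def summarize(cohorts, recent_months):
    """Return (fraction hired within `recent_months`, mean tenure in years)."""
    total = sum(count for _, count in cohorts)
    recent = sum(count for tenure, count in cohorts if tenure < recent_months)
    mean_tenure_months = sum(tenure * count for tenure, count in cohorts) / total
    return recent / total, mean_tenure_months / 12.0


# Hypothetical company: 1,000 employees, doubling every 2 years for 10 years.
cohorts = simulate_growth(initial_headcount=1000, doubling_months=24, total_months=120)
recent_fraction, mean_tenure_years = summarize(cohorts, recent_months=24)
print(f"Hired in the last two years: {recent_fraction:.0%}")
print(f"Mean tenure: {mean_tenure_years:.2f} years")
```

Even with a turnover rate of exactly zero, half the simulated workforce was hired in the last two years and the mean tenure comes out under three years, so a short average tenure by itself says little about turnover.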

Alleged Age Discrimination in STEM

http://www.bbc.com/future/story/20170828-the-amazing-fertility-of-the-older-mind

An article from the BBC on the considerable ability of older people to learn new things contrary to a common stereotype.

https://www.computerworld.com/article/3090087/it-careers/google-age-discrimination-lawsuit-may-become-monster.html

An article by Patrick Thibodeau at Computerworld on the Google age discrimination class action lawsuit.

Race and Sex Discrimination in STEM

https://www.theguardian.com/technology/2017/aug/07/silicon-valley-google-diversity-black-women-workers

An article in The Guardian questioning the explanations that Google and other Silicon Valley employers give for the low numbers of some groups in their companies, pointing to the large number and percentage of African-American employees in software engineering in the Washington DC area, generally at government agencies such as NASA and at government contractors.

It should be noted that the DC metro area is about 25 percent African-American whereas California as a whole is about 6.5 percent African-American.  Of course, as the article points out, Google and many other tech companies recruit worldwide.

However, Hispanics with visible American Indian ancestry almost certainly make up over 30 percent of California and the San Francisco Bay Area’s population, a comparable or even larger fraction than African-Americans in the DC metro area.  The US Census claims that 38.9 percent of people in California in 2016 were Hispanic-Latino.  Probably 80 to 90 percent of these have visible American Indian ancestry.

The US Census relies on self-identification for race rather than visible appearance.  Hispanics self-identify as white, mixed race, “other race,” and sometimes American Indian/Native American.  My personal impression is that genuine discrimination tends to follow visible appearance and accent/spoken dialect of English.

Hispanic is not a racial category; it includes people who are entirely European and indeed Northern European in appearance.  At least in my personal experience, most (though not all) Hispanics in leadership and engineering positions at high tech companies like Google are European in appearance.  On its diversity web site, Google claims that 4 percent of its workforce in 2017 is Hispanic.

UPDATE (added September 11, 2017)

“At Google, Employee-Led Effort Finds Men Are Paid More Than Women,” by Daisuke Wakabayashi, New York Times, September 8, 2017

The article discusses an internal Google spreadsheet set up by a now former Google employee with self-reported salary and bonus information from Google employees showing women paid less than men.  There is also discussion of the current Labor Department investigation into disparities in salaries between men and women at Google as well as activist investors pressuring Google to disclose information on the salaries of men and women at Google.

 

Articles Questioning STEM Shortage Claims

http://www.techrepublic.com/article/so-much-for-the-stem-shortage/

Tech industry’s persistent claim of worker shortage may be phony, by Michael Hiltzik, Los Angeles Times, August 1, 2015

An article noting the obvious inconsistency between the many layoff announcements in high tech and the claims of a shortage of STEM workers, often by the same employers.

The Open Office Nightmare

Apple staffers reportedly rebelling against open office plan at new $5 billion HQ

An article claiming discontent over the new open office plans at Apple’s new headquarters — the “Spaceship” — in Cupertino.

(C) 2017 John F. McGowan, Ph.D.


 

“Introduction to Automating Complex Data Analysis” Video Published

(C) 2017 by John F. McGowan, Ph.D.


 

Video of “Automating Complex Data Analysis” Presentation to the Bay Area SAS Users Group

 

This is an edited video of my presentation on “Automating Complex Data Analysis” to the Bay Area SAS Users Group (BASAS) on August 31, 2017 at Building 42, Genentech in South San Francisco, CA.

The demonstration of the Analyst in a Box prototype starts at 14:10 (14 minutes, 10 seconds). The demo is a video screen capture with high quality audio.

Unfortunately there was some background noise from a party in the adjacent room from about 12:20 until 14:10, although my voice remains understandable.

Updated slides for the presentation are available at: https://goo.gl/Gohw87

You can find out more about the Bay Area SAS Users Group at http://www.basas.com/

Abstract:

Complex data analysis attempts to solve problems with one or more inputs and one or more outputs related by complex mathematical rules, usually a sequence of two or more non-linear functions applied iteratively to the inputs and intermediate computed values. A prominent example is determining the causes and possible treatments for poorly understood diseases such as heart disease, cancer, and autism spectrum disorders where multiple genetic and environmental factors may contribute to the disease and the disease has multiple symptoms and metrics, e.g. blood pressure, heart rate, and heart rate variability.

Another example is a macroeconomic model that predicts employment levels, inflation, economic growth, foreign exchange rates and other key economic variables for investment decisions, both public and private, from inputs such as government spending, budget deficits, national debt, population growth, immigration, and many other factors.

A third example is speech recognition where a complex non-linear function somehow maps from a simple sequence of audio measurements — the microphone sound pressure levels — to a simple sequence of recognized words: “I’m sorry Dave. I can’t do that.”

State-of-the-art complex data analysis is labor intensive, time consuming, and error prone — requiring highly skilled analysts, often Ph.D.’s or other highly educated professionals, using tools with large libraries of built-in statistical and data analytical methods and tests: SAS, MATLAB, the R statistical programming language and similar tools. Results often take months or even years to produce, are often difficult to reproduce, difficult to present convincingly to non-specialists, difficult to audit for regulatory compliance and investor due diligence, and sometimes simply wrong, especially where the data involves human subjects or human society.

A widely cited report from the McKinsey management consulting firm suggests that the United States may face a shortage of 140,000 to 190,000 such human analysts by 2018: http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation.

This talk discusses the current state-of-the-art in attempts to automate complex data analysis. It discusses widely used tools such as SAS and MATLAB and their current limitations. It discusses what the automation of complex data analysis may look like in the future, possible methods of automating complex data analysis, and problems and pitfalls of automating complex data analysis. The talk will include a demonstration of a prototype system for automating complex data analysis including automated generation of SAS analysis code.

(C) 2017 John F. McGowan, Ph.D.


 

Automating Complex Data Analysis Presentation to Bay Area SAS Users Group

I will be giving a presentation (about 30 minutes) to the Bay Area SAS Users Group (BASAS) this Thursday, August 31, 2017 (12:30 PM – 4 PM) at Genentech in South San Francisco, CA: “Automating Complex Data Analysis for Fun, Profit, and the Greater Good.”
 
Speaker (John F. McGowan, Ph.D.)

(C) 2017 John F. McGowan, Ph.D.


Machine Learning at Google Event

I attended a “Machine Learning at Google” event at the Google Quad 3 building off Ellis in Mountain View last night (August 23, 2017).  This seemed to be mostly a recruiting event for some or all of Google’s high profile Machine Learning/Deep Learning groups, notably the team responsible for TensorFlow.

Token Good Looking Woman Opens Event

I had no trouble finding the registration table when I arrived and getting my badge.  All the presentations seemed to run on time or nearly on time.  There was free food, a cute bag with Google gewgaws, and plenty of seating (about 280 seats with attendance about 240 I thought).

The event invitation that I received was rather vague and it did not become clear this was a recruiting event until well into the event.  It had the alluring title:

An Exclusive Invite | Machine Learning @ Google

Ooh, exclusive!  Aren’t I special!  Along with 240 other attendees as it turned out.  🙂

Andrew Zaldivar (see below) explicitly called it a recruiting event in the Q&A panel at the end.  It would have been good to know this as I am not looking for a job at Google. That does not mean the event wasn’t interesting to me for other reasons, but Google and other companies should be up front about this.

Although I think the speakers were on a low platform, they weren’t up high enough to be seen well, even though I was in the front.  This was particularly true of Jasmine Hsu, who was short.  I managed to get one picture of her not fully or mostly obscured by someone’s head.  A higher platform for the presenters would probably have helped.

A good looking woman who seemed to be some sort of public relations or marketing person opened the event at 6:30 PM.  She went through all the usual event housekeeping and played a slick Madison Avenue style video on the coming wonders of machine learning.  Then she introduced the keynote speaker Ravi Kumar.

Ravi Kumar Keynote

Ravi was followed by a series of “lightning talks” on machine learning and deep learning at Google by Sandeep Tata, Heng-Tze Cheng, Ian Goodfellow, James Kunz, Jasmine Hsu, and Andrew Zaldivar.

The presentations tended to blur together.  The typical machine learning/deep learning presentation describes an extremely complex model that has been fitted to a very large data set.  Giant companies like Google and Facebook have huge proprietary data sets that few others can match.  The presenters tend to be very confident; they assert major advances over past methods and often claim to match or exceed human performance.  It is often impossible to evaluate these claims without access to both the huge data sets and vast computing power.  People who try to duplicate the reported dramatic results with more modest resources often report failure.

The presentations often avoid the goodness-of-fit statistics, robustness, and overfitting issues that experts in mathematical modeling worry about with such complex models.  A very complex model such as a polynomial with thousands of terms can always fit a data set but it will usually fail to extrapolate outside the data set correctly.  Any non-constant polynomial, for example, blows up to plus or minus infinity far from the data as the largest power term dominates.
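The extrapolation problem is easy to reproduce on a small scale.  The sketch below is a toy illustration and has nothing to do with any model shown at the event: the data are made up (a straight line y = x with alternating errors of plus or minus 0.1), the polynomial is only degree 5 rather than thousands of terms, and the helper names `lagrange_eval` and `fit_line` are hypothetical.  It uses plain Lagrange interpolation for the "complex" model and ordinary least squares for the simple one.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data: the true relationship is y = x, with small alternating errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
x_new = 8.0  # outside the range of the data
poly_prediction = lagrange_eval(xs, ys, x_new)
line_prediction = slope * x_new + intercept

print(f"true value at x = {x_new}: {x_new}")
print(f"degree-5 polynomial:  {poly_prediction:.1f}")
print(f"straight line:        {line_prediction:.2f}")
```

The degree-5 polynomial reproduces every data point exactly, yet its prediction at x = 8 is off by roughly 100, while the straight line, which fits the data less perfectly, stays within about 0.1 of the true value.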

In fact one Google presenter mentioned a “training-serving skew” problem where the field data would frequently fail to match the training data used for the model.  If I understood his comments, this seemed to occur almost every time, supposedly for different reasons for each model.  This sounded a lot like the frequent failure of complex models to extrapolate to new data correctly.

Ravi Kumar’s keynote presentation appeared to be a maximum likelihood estimation (MLE) of a complex model of repeat consumption by users: how often, for example, a user will replay the same song or YouTube video.  MLE is not a robust estimation method and it is vulnerable to outliers in the data, almost a given in real data, yet there seemed to be no discussion of this issue in the presentation.
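A minimal sketch of the outlier issue follows.  It does not use the repeat consumption model from the keynote, which I do not have; it assumes the simplest textbook case instead, where the data are modeled as Gaussian and the maximum likelihood estimate of the location is just the sample mean.  The data values and the function name `gaussian_mle_location` are made up for the illustration, and the median is shown only as one common robust alternative.

```python
import statistics


def gaussian_mle_location(data):
    """MLE of the mean under an assumed Gaussian model: the sample mean."""
    return statistics.mean(data)


# Made-up measurements clustered around 10, plus one wild outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [1000.0]

print("Gaussian MLE, clean data:       ", gaussian_mle_location(clean))
print("Gaussian MLE, with one outlier: ", gaussian_mle_location(contaminated))
print("Median, with one outlier:       ", statistics.median(contaminated))
```

A single bad value out of six drags the Gaussian estimate from about 10 to about 175, while the median barely moves.  How badly a maximum likelihood fit is affected depends on the distribution assumed in the likelihood, which is why I would have liked to hear the issue discussed.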

Often when researchers and practitioners from other fields that make heavy use of mathematical modeling such as statistics or physics bring up these issues, the machine learning/deep learning folks either circle the wagons and deny the issues or assert dismissively that they have the issues under control.  Move on, nothing to see here.

Sandeep Tata
Heng-Tze Cheng
Ian Goodfellow on Deep Learning Research at Google
Jasmine Hsu on Robotics and Computer Vision
James Kunz
Andrew Zaldivar on SPAM Fighting with Machine Learning

Andrew Zaldivar introduced the Q&A panel for which he acted as moderator.  Instead of having audience members take the microphone and ask their questions uncensored as many events do, he read out questions supposedly submitted by e-mail or social media.

Andrew Zaldivar Introduces the Panel
Q and A Panel

The Q&A panel was followed by a reception from 8-9 PM to “meet the speakers.”  It was difficult to see how this would work with about thirty (30) audience members for each presenter.  I did not stay for the reception.

Conclusion

I found the presentations interesting, but they did not go into most of the deeper technical questions such as goodness-of-fit, robustness, and overfitting that I would have liked to hear about.  I feel Google should have been clearer about the purpose of the event up front.

(C) 2017 John F. McGowan, Ph.D.


A Personal Note: Hacking Traffic Lights in Sunnyvale

This is another brief followup to my earlier post “A Personal Note: Mysterious Accident” about my odd accident in Sunnyvale, CA about a month ago.

Remarkably, in June 2005, SFGate published an article “SUNNYVALE / Trickster is trifling with traffic / Police on lookout for culprit skilled in resetting signals” by Chuck Squatriglia, Chronicle Staff Writer (Published 4:00 am, Wednesday, June 22, 2005) reporting:

Police in Sunnyvale are keeping an eye out for a highly skilled and frustratingly elusive prankster who has been tampering with city traffic lights for more than three months, authorities said Tuesday.

The article gives more details on a series of traffic light tampering incidents in Sunnyvale in the spring of 2005.

Traffic lights went out on Mathilda in Sunnyvale today.  See this tweet:

 


 

Note to “tricksters.” It is quite easy to kill someone by tampering with traffic lights!

(C) 2017 John F. McGowan, Ph.D.


A Personal Note: Hacking Traffic Lights

This is a brief followup to my previous post “A Personal Note: Mysterious Accident.”

UPDATE: August 24, 2017

There are some known cases of hacking/tampering with traffic lights.

According to the Los Angeles Times, in 2009, two Los Angeles traffic engineers pleaded guilty to hacking into the city’s signal system and slowing traffic at key intersections as part of a labor protest in 2006.

According to the San Francisco Chronicle, in June 2005, police in Sunnyvale, California (where my accident occurred) requested assistance from the public to find a suspected sophisticated “trickster” who had been tampering with traffic lights for several months.

END UPDATE: August 24, 2017

It is possible to hack traffic lights and there has been some published research into how to do it for some traffic light systems.  Here are some links to articles and videos on the subject, mostly from 2014:

http://thehackernews.com/2014/08/hacking-traffic-lights-is-amazingly_20.html

https://www.wired.com/2014/04/traffic-lights-hacking/

https://www.schneier.com/blog/archives/2014/08/hacking_traffic.html

https://www.technologyreview.com/s/530216/researchers-hack-into-michigans-traffic-lights/

Talk by Cesar Cerrudo at DEFCON 22 on Hacking Traffic Lights

Someone with sufficient physical access to the traffic lights could always modify the hardware even if a computer-style “hack” were impossible.

I was driving a 1995 Nissan 200SX SE-R with minimal built-in electronics by modern car standards.  It would be difficult to hack my car without physical access and it was either with me, in a brightly lit parking lot at my office, or in a secured parking garage at my apartment building.

Just to be clear, I am not saying my accident was caused by hacking of the traffic lights, only that it is possible.  As noted in my previous post, there are other possible explanations: an accidental failure of the traffic lights or a remarkable mental lapse on my part.  None of the three explanations seems likely to me.

(C) 2017 John F. McGowan, Ph.D.
