I would say there are a number of inaccuracies in your article that are worth addressing:

1. R was literally founded by, created by, and is maintained by the academic statistics community. Always has been from day one. Leaving R out of the “Statistics” category is incorrect.

2. Stating that SAS is the primary tool of statisticians is incorrect. SAS is still used in certain sectors of the healthcare, insurance, and finance industries, but purely for historical reasons. Unlike R or Python, SAS is really the only scientific computing software whose default operating setup on a local computer is not in-memory. When running an analysis on a large dataset locally on a single machine, with R or Python you are limited by the size of your RAM (~16GB). With SAS, you are instead effectively limited by the size of your computer’s hard drive, which is far larger (~1TB). For this reason, SAS had a huge advantage in the space through the entire 80s and 90s, and weaseled its way into government agencies (FDA, Fed, etc). In the era of cloud computing and datasets too large to fit on a single computer’s hard drive, let alone in RAM, SAS has completely lost its competitive edge. Unfortunately for SAS, it also hasn’t innovated or improved its product in decades. Its business model is obscene, and it got away with murder for a long time (~$15,000 USD for an annual license). The only thing still keeping SAS alive is government agencies, given the enormous tech debt from SAS being integrated into their processes since the 80s. I saw this personally when I was a research fellow at a large federal agency while in grad school. Unless they plan on working for the government, statisticians coming up through the ranks shouldn’t waste their time learning SAS.

3. Your description of the responsibilities of a Statistician is very “analyst” based and represents only a small slice of the field. For example, there’s the entire subfield of Statistical Learning. It’s worth keeping in mind that 90% of the methodologies and theory in Machine Learning were developed by Statisticians, not Computer Scientists. The CS community has been a wonderful ambassador for the ML field and highly successful, from a branding perspective, at applying ML to great use-cases, but that does not change where the methodology and theory came from. Your comment about statistician roles not requiring ML knowledge is completely incorrect.

4. I found your comment about “forming a question, forming a problem statement, creating a process for answering that problem, presenting findings…” extremely concerning. Not because these are not important skills, but because they are the single most important set of technical skills of ANY scientist in ANY field. Your statement is an almost word-for-word colloquial rephrasing of the scientific method itself. These are not “non-technical” skills to be picked up through practice. They are skills that should be deeply ingrained in your technical training as a scientist, regardless of the domain, and obviously Statisticians are included in this. If you don’t have this skill set deeply ingrained in you, you’re not a “scientist” of any kind.
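
To make the in-memory versus on-disk distinction from point 2 concrete, here is a minimal Python sketch. It says nothing about how SAS is implemented internally; it only contrasts loading a whole column into RAM with streaming it from disk one row at a time. The function names, the `value` column, and the demo file are all hypothetical and exist only for illustration.

```python
import csv
import os
import tempfile


def mean_in_memory(path, column):
    """Load every value into a list first: memory use grows with file size."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return sum(values) / len(values)


def mean_streaming(path, column):
    """Read one row at a time: memory use stays roughly flat as the file grows."""
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
            count += 1
    return total / count


def write_demo_csv(n_rows):
    """Write a small, made-up dataset to a temporary CSV and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i % 10])
    return path


if __name__ == "__main__":
    demo_path = write_demo_csv(1000)
    print(mean_in_memory(demo_path, "value"))
    print(mean_streaming(demo_path, "value"))
    os.remove(demo_path)
```

Both functions return the same answer. The difference is that the first is bounded by available RAM, while the second is bounded only by how much data the disk can hold, which is the trade-off described in point 2.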


Principal Data/ML Scientist @ The Cambridge Group | Harvard trained Statistician and Machine Learning Scientist | Expert in Statistical ML & Causal Inference

Andrew Rothman
