A McKinsey interview (sorry, it's behind a registration wall, but the registration is worth it if you're interested in business topics or "futurism") with Google's chief economist Hal Varian has an interesting quote:
I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s? The ability to take data--to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it--that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it. I think statisticians are part of it, but it's just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills--of being able to access, understand, and communicate the insights you get from data analysis--are going to be extremely important. Managers need to be able to access and understand the data themselves.
I'm sure everyone reading this blog will feel warmer and fuzzier
now. :) But the teaching of introductory statistics really has to
convey how to:
- capture data relevant to the problems
- visualize, communicate and effectively utilize the data
- access, understand, and communicate the insights of data analysis
It would definitely turn fewer students off than the usual package full of integrals, density functions, t-tests and p-values.





> It would definitely turn fewer students off
I would not be so sure - I was at first surprised at students' reactions to such conceptual material,
but they know how to play the game by being able to solve the puzzles (integrals, p-values, etc.).
They get really really worried about how they will be tested on the conceptual material,
especially when they realize it's much easier for instructors to write and mark tests involving just puzzle solving.
Not insurmountable, but perhaps worth some thought.
Anonymous
I agree with you that capturing relevant data and learning how to communicate that data is important. But as a stated class goal, I think any introduction to statistics class should focus on statistical applications in the real world. Real-world examples prepare students for career roles better than a theory class would.
Hell Yes!! The only part of the quote I would quibble with is "I think statisticians are part of it, but it's just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively." I think that visualizing and communicating data is part of our job.
I think statistical ability has been consistently sexy from an employer's perspective for ages. I don't think it's new that the bottleneck is "the ability to understand that data and extract value from it." If anything, I think the "ubiquity of data" (here meaning the efficiency of its collection and access over geography) may further reduce the need for statistical literacy at the grassroots level. The smarter the people at the top of McDonalds, WalMart, and the army can be about statistical analysis, for example, the less literate the grassroots population can afford to be about quantitatively analyzing and forecasting inventory needs.
Being a statistician may have its fad moment, like computer programming did, but I think demand for statistical ability as a tangible asset (rather than as a signal of work discipline or general intelligence) will remain concentrated at the top of the ability pyramid.
I think that the time of playing with integrals, density functions and demonstrations is over. I haven't taught that in many years and regret that I myself spent so much time on them in my youth.
What matters nowadays is to be able to: a) use with full knowledge of statistical implications; b) interpret results and confront them with the underlying theory.
[My English may sound weird - it is my third language]
Varian is an economist and, as such, given the way economists analyze and present their data in ugly tables, he might think that statisticians and other quantitative types do the same. Thus, in his view, data visualization and communication should be done by other people, not by statisticians. It is interesting how economists can conceive of real data analysis without visualization...
I'm curious, what are your thoughts on teaching statistics to non-statistics, -epidemiology, -psychometrics, or -econometrics graduate students? I ask because as a PhD student in a public health program (non-epi track) I've taken stats/quantitative methods courses taught in polisci, public policy, and psychometrics departments. The focus of these courses ranged from almost no emphasis on the mathematics (and strong emphasis on the applied) to almost full emphasis on the mathematics, and I can't say that one focus was better or more useful than the other.
'I think that the time of playing with integrals, density functions and demonstrations is over. I haven't taught that in many years and regret that I myself spent so much time on them in my youth.'
I absolutely agree. I think the 'technical' aspects of a stats class (i.e., aside from conceptual intuitive stuff) should focus on coding and implementation. Let's start teaching with pseudo-code, rather than equations. I also think a fair amount of time should be spent on data manipulation, clean-up, and documentation (metadata).
I totally agree that statistics should be considered from the applied perspective, with a focus on programming and implementation rather than on maths. A great many people need this, and conditioning their learning on math just distracts them from the main point of answering really important questions in their respective fields (possibly except for economists, who put math as a prerequisite for everything). However, if one is particularly serious about stats one should deal with some math at a certain point, right?
Dan G: I wrote an article on the topic. (The article was pretty much put together from blog entries.)
I'd say that the usefulness of mathematics is limited for applied researchers. I don't know anyone under the age of 45 who bases his work on theoretical demonstrations; they take formulas in hand only when they have to write the first paragraph of their article. It seems to me that being able to use a certain technique appropriately does not require knowing the math behind it.
As regards teaching, I am always hoping to be asked to show, for example, why a test statistic is built that way or how one derives the Student t density from the Normal and Chi-square. I doubt that that day will ever come. There is already so much to convey to the students.
On the other hand, I don't have the least idea of how to clean a data set...
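For readers curious about the derivation mentioned above, here is a sketch of the standard construction. The symbols Z, V and ν are notation introduced here for illustration, not taken from the comment: Z is a standard Normal variable, V an independent Chi-square variable with ν degrees of freedom.

```latex
% Student t as a ratio of an independent Normal and a scaled Chi-square
T = \frac{Z}{\sqrt{V/\nu}}, \qquad Z \sim N(0,1), \quad V \sim \chi^2_{\nu}, \quad Z \perp V

% Integrating the joint density of (Z, V) over V gives the Student t density
f_T(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
         \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
```

In the one-sample t-test, Z plays the role of the standardized sample mean and V that of the scaled sample variance, which is why the test statistic "is built that way".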
One problem that a lot of employers are running into right now (I know we are!) is that students know too much about the "standard" material ("integrals, density functions, t-tests and p-values"), but not enough about when the simplified stuff they learned in school won't cut it. They also tend to be absolutely clueless when asked to figure out what they would need to know to figure out a difficult problem - it's easy to receive a problem statement that's carefully set up with all the data you need to calculate a certain quantity, but it's a different skill altogether to be presented with a business need and actually figure out what data you'll need to crunch and what quantities you'll need to calculate to solve the problem.
Visualization is key here, and it's something that a lot of recent grads are terrible at. Further, they usually assume that everything is normally distributed, and that a linear model will always be the right tool for the job, which was true for maybe 95% of the problems they worked in school. But in many fields, that type of assumption is a one-way ticket to failure, to the point that oftentimes people coming from physics or pure math (or even CS!) perform far better than recent statistics grads, because they approach things with fewer preconceptions about what techniques actually work, and tend to assume that the relations they are trying to ferret out are a bit less simple than statistics majors do.
Disclaimer: I work in sports betting, which to be fair requires a different kind of modeling than most statistical fields, but I imagine things are just as bad with recent recruits in finance. Also, my complaints primarily apply to undergrad students; people with graduate statistics degrees tend to do much better because they've had more time to solve problems that aren't already set up for easy success.
I'm not sure I'm qualified to comment on this, but since that has never stopped me before, here goes... :)
I have a background in computer science (with a minor in mathematics) and I never managed to understand stats while studying.
We had two mandatory introductory classes in this field: one in probability theory and one in statistics. I know they are really not that separate areas, but that is what the classes were called anyway.
The probability theory class was pure math, and while it isn't the most enjoyable class I've had, at least I felt comfortable with that.
The stats class I never got my head around. It was taught in a way similar to the probability theory class -- so lots of proofs of this and that -- but all the problems boiled down to modelling some thought up data with this or that given model and then getting a p-value out of it.
It seemed very arbitrary to me to conclude anything from that.
Worse, all exam questions were a matter of pattern matching. If you read the description of the problem, you would recognize which model you should use, and then just do the calculations. That, in particular, didn't make sense to me.
I just got the impression that stats was pretending to be math -- with right or wrong answers to questions; answers you could work out mathematically -- but why a certain model should be used in a given context was never clear to me.
It wasn't until years later that I finally started to "get it". I was working as a postdoc in a stats department, and only then did I start realising that statistics is not about the mathematics as such but about trying to model data. Not finding the "right" model, but "some" model. The models, and the math, are only a way to quantitatively get an idea about how good your model is and what it tells you about the data.
I feel now, that the way I was taught stats did more harm than good to me.
If only someone had told me, back then, that it is not about right or wrong models, but about useful and less useful models...
This is the key thing I try to get through when I teach now.
My personal bias would be to start with decision making and work back to information needs. This could move from descriptive stats to inferential.
> I have a background in computer science (with a minor in mathematics) and I never managed to understand stats while studying.
Thomas
Agree with much of what you say, but some additional points
An (unknown) amount of math is necessary
I have been told that many Stat faculty soon learn they need to pretend to be mathematicians (especially in Math & Stat depts)
And something Ken Iverson said when presenting an introduction to calculus using J (modern APL)
"This is just what I wished someone would have pointed out to me when I first studied calculus - it may be something that didn't need to be pointed out to you and for those new students who may have this pointed out to them, they will later realize they needed something different pointed out to them!"
But at least someone is now giving a grad course in teaching statistics (maybe even some "real" empirical research in the future?)
Oh don't get me wrong, I am not saying that the math is not necessary. Some very involved math can be necessary for some models, I just think that the modelling was sort of left out of my class back then.
I love the quote, though. It is unlikely that we need the same things pointed out, but as a teacher I will tend to point out the things I needed pointed out :)
We need to define who the audience is for this intro to statistics class. The Varian quote unfortunately indicated confusion: it started out by referring to statisticians as the "sexy job" and ended up talking about managers needing to work with data. Those are two different audiences!
For managers whose primary job is not data analysis, we need to focus on statistical concepts - variability, margin of error, interaction, causation and correlation, etc. - and then rudimentary spreadsheet skills. I'm finishing a book with this in mind right now. For managers, data analysis will always be only a tool and they will spend most of their time on "more important" things.
For statisticians who will be hired to assist the aforementioned managers, we need to focus on applications, computing, visualization, etc. Also, how about internships and class projects with real companies and organizations to improve consulting skills? This group should also be exposed to the mathematics; one can't apply techniques without understanding the limitations.
Enh, I'm not feeling warmer and fuzzier. I'm feeling colder and pricklier because I'm miffed that he doesn't know the difference between complimentary and complementary.
I am teaching an introduction to statistics for undergraduates in a political science department this term. I've been quite happy (so far) with a new (and in progress) textbook by Daniel Kaplan:
Introduction to Statistical Modeling (by Daniel Kaplan)
It seems to de-emphasize much of what confuses people, without adding confusion by dumbing things down (for example, he uses geometry to talk about (X^{T}X)^{-1}X^{T}y rather than just avoiding multivariate problems).
I am not sure it does a deep job regarding insight into social and political questions, but then, the book is meant to be as useful for biology majors as it might be for political science majors (I think Kaplan uses the book himself with folks in their first year of college). This means that much of my job this term is linking the insights in the book to the kinds of exciting and worrying questions that get people into the social sciences in the first place.
Anyway, this is my first time with the book, so I'll know more about how it works in a couple of months. Right now, a few weeks into the semester, I'm quite happy with it and hopeful that it strikes the right kind of balance for such classes. And I'm posting about it here in case other folks are casting about for a textbook --- whether as a textbook per se or as a model/proposal of how introduction to statistics ought to be taught.
Jake
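The geometric reading of (X^{T}X)^{-1}X^{T}y mentioned in the comment above can be sketched in a few lines of Python with NumPy. This is only an illustration on simulated data, not an example from Kaplan's book; the variable names, sample size, and simulated coefficients are all assumptions made here.

```python
import numpy as np

# Simulated data (hypothetical): an intercept column plus one predictor.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Least-squares coefficients: solve (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values are the projection of y onto the column space of X.
y_hat = X @ beta_hat
residual = y - y_hat

# Geometry: the residual is orthogonal to every column of X,
# so these inner products are zero up to floating-point error.
print("coefficients:", beta_hat)
print("X^T residual:", X.T @ residual)
```

The point of the sketch is that the formula is nothing more than "drop a perpendicular from y onto the space spanned by the predictors", which is the kind of picture the comment describes.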
1) I agree that good statistical training should involve education in data manipulation, machine learning, modeling, visualization, simulation, etc. There's not enough of this sort of thing in econometrics, IMHO. Statistics (these days at least) seems to be much broader.
2) Ubs, I guarantee that I know the difference between "complimentary" and "complementary". The quote is a transcription of an interview, and apparently the transcriber did not hear the difference. Think of the "i" as an error term...