Exploratory Versus Confirmatory Data Analysis?

In 1977, John Tukey published a book called Exploratory Data Analysis. It introduced many new ways of analyzing data, all relatively simple. Most of the new ways involved plotting your data. A few involved transforming your data. Tukey’s broad point was that statisticians (taught by statistics professors) were missing a lot: Conventional statistics focussed too much on confirmatory data analysis (testing hypotheses) to the omission of exploratory data analysis — data analysis that might show you something new. Here are some tools to help you explore your data, Tukey was saying.

No question the new tools are useful. I have found great benefits from plotting and transforming my data. No question that conventional statistics textbooks place far too little emphasis on graphs and transformations. But I no longer agree with Tukey’s exploratory versus confirmatory distinction. The distinction that matters — at least to historians, if not to data analysts — is between low-status and high-status. A more accurate title of Tukey’s book would have been Low-Status Data Analysis. Exploratory data analysis already had a derogatory name: Descriptive data analysis. As in mere description. Graphs and transformations are low-status. They are low-status because graphs are common and transformations are easy. Anyone can make a graph or transform their data. I believe they were neglected for that reason. To show their high status, statistics professors focused their research and teaching on more difficult and esoteric stuff — like complicated regression. That the new stuff wasn’t terribly useful (compared to graphs and transformations) mattered little. Like all academics — like everyone — they cared enormously about showing high status. It was far more important to be impressive than to be useful. As Veblen showed, it might have helped that the new stuff wasn’t very useful. “Applied” science is lower status than “pure” science.

That most of what statistics professors have developed (and taught) is less useful than graphs and transformations strikes me as utterly clear. My explanation is that in statistics, just as in every other academic area I know about, desire to display status led to a lot of useless, highly visible work. (What Veblen called conspicuous waste.) Less visibly, it led to the best tools being neglected. Tukey saw the neglect (underdevelopment and underteaching of graphs, for example) but perhaps misdiagnosed the cause. Tukey’s exploratory versus confirmatory distinction was misleading because the tools he promoted for exploration also improve confirmation. They are neglected everywhere. For example:

1. Graphs improve confirmatory data analysis. If you do a t test (or compute a p value in any way) but don’t make an associated graph, there is room for improvement. A graph will show whether the assumptions of the computation are reasonable. Often they aren’t.

2. Transformations improve confirmatory data analysis. Many people know that a good transformation will make the assumptions of the test more reasonable. What few people seem to know is that a good transformation will also make the statistical test more sensitive: if a difference exists, the test will be more likely to detect it. This is like increasing your sample size at no extra cost.

3. Exploratory data analysis is sometimes thought of as going beyond the question you started with to find other structure in the data — to explore your data. (Tukey saw it this way.) But to answer the question you started with as well as possible you should find all the structure in the data. Suppose my question is whether X has an effect.  I should care whether Y and Z have an effect in order to (a) make my test of X more sensitive (by removing the effects of Y and Z) and (b) assess the generality of the effect of X (does it interact with Y or Z?).
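The sensitivity claim in point 2 can be checked with a small simulation. What follows is only a sketch under assumptions that are mine, not from any particular study: two groups of 30 right-skewed (lognormal) measurements whose means differ by 0.5 on the log scale, Welch's t statistic, and |t| > 2 as a rough stand-in for p < 0.05. The names welch_t and detection_rates are made up for the illustration. It counts how often the difference is detected on the raw scale and after taking logs.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


def detection_rates(n=30, shift=0.5, sigma=1.0, reps=1000, crit=2.0, seed=1):
    """Simulate `reps` two-group experiments with lognormal data.

    Returns the fraction of experiments in which |t| exceeds `crit`,
    first on the raw scale and then after a log transformation.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    raw_hits = 0
    log_hits = 0
    for _ in range(reps):
        a = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        b = [rng.lognormvariate(shift, sigma) for _ in range(n)]
        if abs(welch_t(a, b)) > crit:
            raw_hits += 1
        log_a = [math.log(x) for x in a]
        log_b = [math.log(x) for x in b]
        if abs(welch_t(log_a, log_b)) > crit:
            log_hits += 1
    return raw_hits / reps, log_hits / reps


raw_rate, log_rate = detection_rates()
print(f"detected on raw scale: {raw_rate:.2f}, on log scale: {log_rate:.2f}")
```

With data this skewed, the log-scale test should detect the difference noticeably more often than the raw-scale test, even though both use the same observations. How large the gain is depends on the assumed skew and effect size, so treat the numbers as an illustration rather than a general result.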

Most statistics professors and their textbooks have neglected all uses of graphs and transformations, not just their exploratory uses. I used to think exploratory data analysis (and exploratory science more generally) needed different tools than confirmatory data analysis and confirmatory science. Now I don’t. A big simplification.

Exploration (generating new ideas) and confirmation (testing old ideas) are outputs of data analysis, not inputs. To explore your data and to test ideas you already have you should do exactly the same analysis. What’s good for one is good for the other.

Likewise, Freakonomics could have been titled Low-status Economics. That’s essentially what it was, the common theme. Levitt studied all sorts of things other economists thought were beneath them to study. That was Levitt’s real innovation — showing that these questions were neglected. Unsurprisingly, the general public, uninterested in the status of economists, found the work more interesting than high-status economics. I’m sensitive to this because my self-experimentation was extremely low-status. It was useful (low-status), cheap (low-status), small (low-status), and anyone could do it (extremely low status).

More Andrew Gelman comments. Robin Hanson comments.

34 Replies to “Exploratory Versus Confirmatory Data Analysis?”

  1. I spoke with a statistics professor at Berkeley about this book. On her website, it says she studies “multilevel and latent variable modeling.” She said Tukey’s book is “not important” and she mentioned something funny about Tukey’s life. She did say, “Are you interested in statistics?” Thanks, Seth, for making me seem smart to these Berkeley profs!! I always talk to them about something I learned from you and your blog!!

  2. what was the funny thing about Tukey’s life?

    Tukey’s book was really important to me because it stressed two things (graphs & transformations) that my other statistics textbooks did not. They turned out to be incredibly useful.

  3. lattice plots allow more than three variables to be visualized. For example, one scatterplot shows X vs Y. And a 2 x 2 matrix of X-Y scatterplots shows how that relationship varies with W (rows) and Z (columns). So you get up to 4 dimensions easily enough. More than 4 dimensions is hard. Gotta use ANOVA to figure out what graphs to make.

  4. jay, thanks for the link. Nice post. Although psychiatric research has done little for the general public, it has done wonders for the status of the psychiatrists who publish it (within their profession). That’s why it’s not a bubble. It really pays off — just not for the rest of us. Basically the same situation as most statistics research, I agree.

  5. I believe that for bipolar disorder, there haven’t been a lot of new medications in the last ten years. Or at least, not to my knowledge. That is probably true for schizophrenia medication (antipsychotics) as well. In addition, some of them can be hard to tolerate, all have side effects, etc. It is too bad, but some people have no other choice and have to take these psychiatric medications. I think that’s why there are therapists and psychologists who devote their careers to helping the mentally ill cope with these things. They’ve developed different approaches such as CBT and ACT. Together with the psych meds, these can be more powerful than psych meds alone.

  6. LemmusLemmus, my self-experimentation was published in a high-status journal. That didn’t change the basic picture. I believe that most econ profs in high-rated depts believed that determining the income of drug dealers was low-status. The abortion stuff, not so low-status. Sumo wrestlers = low status.

  7. What is interesting to me about the low-status/high-status distinction when you extend it to academic work is that much of the low-status academic work could also offer more value to society than the high-status work that is instead undertaken — like more lessons from self-experimentation would be of value than expensive clinical trials, or research on prevention of disease might be of greater social value than high-status pharmaceutical or surgical interventions.

    Is much of the research in Freakonomics also of value to society, or is it simply popular in the way that other novelties are popular? Information on Britney Spears’ personal life, for example, is popular but not valuable to society. I didn’t read Freakonomics because in the multiple excerpts I read the only valuable insights I found were the relationship between abortion and later crime rates and the low incomes of drug dealers on the street. Both were fascinating and potentially valuable, but covered thoroughly in other sources. Was the ‘sumo wrestlers’ research, for instance, useful? I don’t recall reading about it.

  8. Michael, the broad point of Freakonomics — that data is useful, that it can change your mind — is quite useful. Whether this is a low-status point to make I’m not entirely sure but many famous economists have been far less interested in data collection than Levitt.

  9. Steven Levitt is a Clark Medalist from the University of Chicago, easily the most influential Economics department in the world. The idea that he is some outsider doing low status work that the rest of the field disdains is nonsense. His prominence in the field is what allows him to study cheating in sumo wrestling and ghetto baby names, instead of unemployment and inflation. That is to say, exactly the kind of esoteric and impractical status signaling work you deplore in every other academic.

  10. I think this post is great and agree in general. But I’ll chime in and agree that Steven Levitt is about as “high status” as you can get within the academic economics community. The Clark Medal is only given out once every two years (as opposed to one Nobel a year). I think one of the reasons that he is high status is that he has used fairly conventional econometric techniques to “colonize” areas not traditionally the realm of economists. (E.g., what would have been considered the realm of sociology.)

  11. Socktopi & M, you make a good point that perhaps I should have made. (In an earlier draft, I did.) It’s like Nixon and China. That his anti-communist credentials were secure made it easier for him to go to China. Long before Exploratory Data Analysis, John Tukey’s very high status was assured. He was a co-inventor of the Fast Fourier Transform, for example. I’m sure he was utterly unconcerned how EDA would affect his perceived status. (As L says, it didn’t help. It really did get a scornful reception from some high-status statistics professors.) Likewise with Levitt. Just as you say, Socktopi, Levitt’s very high status made it easier for him to do low-status research.

    I wouldn’t call the stuff Levitt studied “esoteric”. For the field of economics, they are esoteric topics but for the general public they are common concerns: What to name our baby? for example. You could say that by doing such research, Levitt signaled his extremely high status — just as Tukey did, just as Nixon signalled his extreme anti-communism by going to China. In practice I don’t think it works that way. I don’t think the motive for the work is signaling. I don’t think Nixon went to China to show how incredibly anti-communist he was. Nor did Tukey write EDA to show how incredibly high status he was.

  12. Seth,

    I’m not buying the revised version of the Levitt’s-work-as-low-status view either. According to his CV, he got the John Bates Clark Medal in 2003. The papers that went into Freakonomics are (based on Wikipedia’s chapter overview, plus memory – i.e., I may have overlooked stuff):

    Cheating teachers – 2002, QJE, Brookings-Wharton Papers on Urban Affairs
    Cheating sumo wrestlers – 2002, AER
    Drug-selling gang’s finances – 2000, QJE
    Abortion and crime – 2001, QJE
    SES and names – 2004, QJE

    The established economists that vote for the Bates Clark Medal clearly liked the “low-status” stuff that went into Freakonomics.

  13. LemmusLemmus, I’m not saying all economists think alike. Lots of people were glad Nixon went to China. Enough prominent statisticians liked Tukey’s emphasis on graphics that the whole area has become more popular. Nor am I saying that Levitt’s work was simply low-status. It was also well done, just as Tukey’s work wasn’t merely low-status. Tukey also introduced important new ways of making graphs. Disdain for what Levitt has studied has been publicly expressed by Heckman, one of his colleagues. But I agree with you to this extent: Levitt had technical skills that made his work on low-status questions more acceptable to his profession. People are far more concerned about their own status than other people’s. Professor X, who would never study something low-status, might be quite happy that Levitt did so.

  14. Seth, I’m glad you mentioned psychiatry. I may have posted this link in the past, but here is an excellent book about the psychiatric establishment and how it does more to harm patients than help them:

    Mad in America: Bad Science, Bad Medicine, and the Enduring Mistreatment of the Mentally Ill, by Robert Whitaker.

    Whitaker (the author) is also coming out with a new book soon:

    Anatomy of an Epidemic: Magic Bullets, Psychiatric Drugs, and the Astonishing Rise of Mental Illness in America.

    It should be out in April. I’ve pre-ordered it. If it’s anything like the previous book, it should be excellent.

  15. Also agree with everything, except the last paragraph. I don’t know if Levitt’s work is high status or low status, but one thing it’s not is economics, nor is it useful, correct, or insightful. It is cute though.

  16. Seth is 100% right about the paradox of Levitt. Levitt himself is high-status, and that high status allowed him to do low-status data exploration, e.g., baby names, the economics of drug dealers, etc.

    (I should mention that before Levitt, Steven Landsburg and David Friedman were also writing economics books in a similar vein. However, I don’t believe they did extensive research into some of these low-status subjects, generally only doing theoretical exploration of them, e.g., why does popcorn cost so much in movie theaters?)

    I know more than a few macro-economists who sniffed (jealously) at Levitt’s massive mainstream success, commenting on how “un-serious” and “unimportant” his work was.

  17. Turning again to the economics profession, a good example of the high/low status problem is the gap between economists who work on policy (say in think tanks or government) and academics. Relatively simple analysis of data is an essential input for policymakers and top decision-makers. And government is an important part of our economy. So, in this sense, policy economists do very useful work. (I’m not saying they are all good, just that they have an important role.) The majority of the person-hours expended by academic economists has nothing to do with improving policy analysis.

    We also seem to have a system in the US (and other countries) where some of the very top economist positions in government are filled by those who first made their name as academics. In other words, they had to spend a long time demonstrating their high status to other academics (in not very useful ways) before getting the chance to employ relatively “simple” analysis in the public service.

  18. “What would be helpful for the mentally ill?” Helping people with a personal stake — they have the problem, or a loved one has the problem — do research. Helping them publish the results. Shift resources from those whose main goals are status and career advancement to those whose main goal is useful progress.

  19. Seth, I’ve never heard you talk about this before. This idea is big. I was wondering if you could go into more detail on your blog, if you have time.

  20. Seth, the economics professor said that he agrees that Freakonomics is low-status research. He said, “But I would use the word ‘popular’ [instead of low-status]” and “it’s not real economics research.”

    I am in contact with a lot of professors at Cal every day and I enjoy speaking with them about your blog entries!

Comments are closed.