I haven’t been interested in the work of John Ioannidis because it seems unrelated to discovery. Ioannidis says too many papers are “wrong”. I don’t know how the fraction of “wrong” papers is related to the rate of discovery. For example, what percentage of “wrong” papers produces the most discovery? Ioannidis doesn’t seem to think about this. Yet that is the goal of science — better understanding. Not “right” papers.

Almost all important health discoveries are discoveries of new cause-effect relationships. If you do X, Y happens. My view of the problem with modern health science is nothing like what Ioannidis and other critics (such as the “couldn’t replicate Finding X” critics) say. It is lack of progress on major health questions (e.g., what causes depression?), emphasized every year by awarding of the Nobel Prize in Medicine to research of little or no practical value. Almost every year, the Nobel Prize press office says the honored research will be useful in the future. The lack of progress shows no sign of ending.

The best that can be said about recent critics of science, such as Ioannidis and Danny Kahneman, my former colleague, is they see there’s a problem. The worst that can be said about them is they fail to understand the cause of the problem. This is why their proposed solutions could easily make the problem worse.

Whenever you do an experiment — psychology and the health sciences are almost all experimental — you “use up” the effect you are studying (X causes Y). You can do an experiment to learn if X causes Y only so many times. After that, you know the answer and a new experiment is pointless. Professional scientists are only able to test ideas (cause-effect statements) that are fairly plausible. With such ideas, a publishable outcome is likely enough to be worth the cost of testing. They are unable to test implausible ideas, because such experiments are not likely enough to produce a publishable outcome. With limited resources, they must generate a certain number of published papers per year, at least if they want a career.

To have a viable system, you need to generate new plausible ideas at least as fast as you use them up. Otherwise you will run out. You must design your experiments so that they accomplish this. Not necessarily every experiment, but your experiments in aggregate. It isn’t easy to find new plausible ideas. If you think “I’ll just get on with my career, generating papers as fast as possible, and leave it to someone else to come up with new ideas worth testing”, then your field will run downhill as plausible ideas are used up and not replaced. This is what has happened in several fields, including mine (animal learning). In psychology, much greater concern about fraud and much greater concern about lack of replicability started at about the same time. I believe both (more fraud, more lack of replicability) stem from the increasing difficulty of honest (or more honest) research.

A friend who is a psychology professor agreed with me that psychologists (or at least he himself) didn’t know how to generate new ideas worth testing. “Do you?” he asked. I said I did:

1. They [= psychologists] should modify their data collection. In my experience, new ideas almost always come from carefully collected data. They don’t come from introspection, talking to friends, reading the newspaper, watching TV, going to talks, etc.

2. Finding new ideas worth testing means finding new ideas that are plausible enough to be worth the cost of testing. To find new ideas with sufficient plausibility to test you need to test implausible ideas. A small fraction will pass the test, gaining plausibility. They will become sufficiently plausible to be worth testing.

3. To test implausible ideas in a career-consistent manner, you need to be able to test them very cheaply. Few if any psychologists have thought about this. They don’t realize how important it is.


When you have very cheap tests, you can test far more ideas than you can if you only have expensive tests. You need a “test set”: very cheap tests, cheap tests, almost-cheap tests, and so on. Ideas that pass a very cheap test become worth testing with a cheap test, those that pass a cheap test become worth testing with an almost-cheap test, and so on. With current methods (all tests are expensive), perhaps social psychology professors who want to publish have a set of 50 ideas that are plausible enough to be worth the cost of testing. Those ideas get tested over and over, using them up. Were cheap tests available, perhaps the same professors could choose which ideas to test from a set of 1000. Of those 1000 ideas, 950 are too implausible to test with expensive tests. Among those 950, I believe, would be some ideas that, when tested, would seem to be true.
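
The arithmetic behind a test set can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any real field: the `Stage` class, the `run_stages` function, the costs, the pass rates, and the guess that 2% of the 950 implausible ideas are true are all invented for the example. Only the figure of 950 ideas comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cost: float           # cost of testing one idea at this stage (arbitrary units)
    pass_if_true: float   # chance a true idea passes this stage
    pass_if_false: float  # chance a false idea passes this stage


def run_stages(n_true, n_false, stages):
    """Push ideas through the stages in order.

    Returns (total cost, expected true ideas left, expected false ideas left).
    """
    total_cost = 0.0
    for stage in stages:
        # Every idea that reached this stage is tested once.
        total_cost += (n_true + n_false) * stage.cost
        n_true *= stage.pass_if_true
        n_false *= stage.pass_if_false
    return total_cost, n_true, n_false


# All numbers below are made up for illustration, except the 950 ideas.
N_TRUE, N_FALSE = 19, 931  # assumes 2% of the 950 implausible ideas are true

EXPENSIVE = Stage("expensive", 1000, 0.8, 0.05)
TEST_SET = [
    Stage("very cheap", 1, 0.8, 0.1),
    Stage("cheap", 10, 0.8, 0.1),
    Stage("almost cheap", 100, 0.8, 0.1),
    EXPENSIVE,
]

if __name__ == "__main__":
    for label, stages in [("expensive only", [EXPENSIVE]), ("test set", TEST_SET)]:
        cost, true_left, false_left = run_stages(N_TRUE, N_FALSE, stages)
        print(f"{label}: cost={cost:.0f}, true ideas found={true_left:.1f}, "
              f"cost per true idea={cost / true_left:.0f}")
```

With these made-up numbers, running all 950 ideas through the expensive test alone costs 950,000 units, about 62,500 per true idea found. Running them through the test set first costs about 14,800 units in total, roughly 1,900 per true idea found. The cheap stages do lose some true ideas along the way (about 7.8 survive instead of 15.2), but each survivor is far cheaper, which is what makes implausible ideas affordable to test at all.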

I came to these beliefs trying to understand why my self-experimentation did a good job of finding new ideas worth testing. I concluded that the secret was this: I was able to test implausible ideas very cheaply — thousands of times more cheaply than professional scientists. Self-tracking — keeping track of my sleep, for example, and looking for outliers — was a very cheap way of getting new ideas about what controls sleep. Self-experimentation was a slightly more expensive (but still very cheap) way to test ideas that self-tracking came up with.

Many people have complained about a replicability problem in psychology, including my friend and co-author Hal Pashler. An obvious solution is to raise the bar for publication: require better (= stronger) evidence. Sure, this will improve the quality of testing, but how will it affect the rate of production of plausible new ideas? My cost-of-test proposal suggests it will reduce that rate of production. I am saying that cheap tests are all-important. Raising the publication bar will make the only test you have more expensive. What if the replication problem is a response to a lack of plausible new ideas? Then this solution to the problem would make the problem worse.


  1. Good points about the importance of science being on a path to doing something useful, though pure science can pay off in the long run. Who’d have thought that observing the movements of the moon and planets would pay off centuries later in communications and observation satellites?

    Back to medicine…. I think you’re pointing at a serious problem, but if the field didn’t have so much fraud and incompetence, there’d be more good information (and less bad pseudo-information) to base hypotheses on.

    1. “Back to medicine…. I think you’re pointing at a serious problem, but if the field didn’t have so much fraud and incompetence, there’d be more good information (and less bad pseudo-information) to base hypotheses on.”

      That’s what a friend of mine says. I think he’s wrong. I think the problem lies elsewhere — failure to discover big new effects.

  2. Aside from discovery, medical science also must support the pledge all doctors take to “first, do no harm”. This is the main danger with wrong studies. We assume that the results are right because the P-value is small and then we go out and treat millions of patients based on it. Years later we find out we were killing thousands of people.

    1. “Years later we find out we were killing thousands of people.”

      One big reason medical treatments do harm is that the theory behind them is wrong. For example, cholesterol theory, chemical-imbalance theory.

  8. Seth, my apologies if you are already aware of this connection, but you should know that the way that you talk about idea-generation in science is *extremely* similar to the way people in the “lean startup” movement talk about starting businesses.

    The core idea of lean startup is that instead of investing the time and money to start a business based on a hunch about whether there is demand for its product, you should progress through a sequence of progressively more expensive stages of data gathering to validate your idea. So first do the cheapest possible thing that you can do to test an idea, such as basic analysis and discussions. Then do the next most expensive thing you can do to test it further (such as customer surveys, or trial ad campaigns). Proceed along this path, ramping up your investment gradually, in a way shaped by the evidence you are discovering. People in the lean world like to talk about “de-risking” the process by proceeding in this incremental and empirical way.

    This is just like the progression you describe, from cheap idea generation, to cheap tests via self-experimentation, to more expensive research trials.

    I mention this not only because it’s interesting, but because the lean startup movement is large and quite influential in certain circles. It might help you to spread your ideas in some quarters if you pointed out this parallel.

    I think this is one of the main books used to describe this approach:

  9. I think Ioannidis serves the important role of deconstructing the temple. I don’t think he is trying to say that too many papers are wrong; rather, he is saying that we underestimate how many are wrong, and that this leads to counterproductive practice.

    He is not trying to point out the solution, he is only pointing out the errors. There’s nothing wrong with that. Ioannidis is the garbageman of medical science.

    1. He’s pointing out the wrong errors, thus distracting attention from the important errors. Far more damaging than “wrong” published papers — which can be ignored — are papers that are not published but should have been.

