Science is confronted with an apparent paradox. Although almost every researcher agrees that falsification is the key to progress and that publication of null results is pivotal to the cumulative process of building knowledge, many more “positive” results are published than would be expected based on unbiased estimates of effect sizes. Obviously, there are many reasons for this discrepancy. A couple of days ago, Anne Scheel posted a very thoughtful blog post about why we should love null results. If you haven’t read it yet, make sure to do so; it is certainly worth the 10-minute read. While I second many of her conclusions, I don’t think having the same love for null effects that we have for significant effects will help in tackling publication bias unless there is a level playing field. Perhaps this pessimism is due to the take-home message of several classics from literature classes back in school: think of Romeo and Juliet or Intrigue and Love, both showing that love is at times not enough to overcome a rigid system of family lineage, class, and seniority rule. Any association with academia is of course purely coincidental.
I’ve heard it before, but I’m still not fully convinced that publication bias has much to do with a bias in the researcher’s perception. When I talk to undergraduate or graduate students about these topics, I don’t see much of an inherent bias in favor of significant results. Sure, if you run an experiment because you have developed a convincing hypothesis on a pressing question and you end up finding “nothing”, you might be disappointed for a while. However, if you have put extensive thought into an experiment, worked hard to collect quality data, and double-checked every entry including your scripts, and the results still don’t turn out to be confirmatory, I have rarely encountered the reaction that reporting the absence of an effect would be less worthwhile. That makes me wonder whether such biases primarily arise from being socialized in today’s, or maybe even yesterday’s, academic environment.
Surely God loves the .06 nearly as much as the .05.
Rosnow & Rosenthal (1989)
As an experimental psychologist, I have, in the end, mostly trusted my data more than my hypotheses. Okay, sometimes even basic sanity checks fail and you learn nothing other than what the wrath of Murphy’s law feels like. Honestly, I would be just as happy to write about any outcome of a proper study. The truth, however, is that my work as an early career researcher (ECR) is not evaluated in terms of producing scientific evidence that will stand the test of time. What counts is impact, a rather crude proxy for quality work.
The bottleneck for null effects is not love, but time and “impact”
So, the problem I am facing as an ECR is quite simple. I have no preference for positive or negative results. Results are just results. If you do research properly, you have no control over the exact numbers you will get. Even well-powered studies on true population effects will sometimes be “negative”. Nevertheless, I have also learned that negative results are much harder to get published within a reasonable time frame. Realistically speaking, you have to aim for a lower-impact-factor journal, and it will usually take much longer, because peer review takes longer and chances are high that one reviewer (a previous author?) will not like the results or will come up with reasons why you obviously must have used the wrong design.
That’s exactly why registered reports are so important, but they are a rather recent introduction in many journals, and I haven’t seen many RRs in biomedicine yet. I am positive that this will be a game changer in the coming years. There is hope. The transition will take time to materialize, though. Still, a lot of very valuable data has been collected without preregistration in my field of fMRI research, and I would like to see as much of it published as possible regardless of the p-[hacked] value. Given the cost of scanning, we need to make the most of what we already have. But what is keeping us from doing this at a much bigger scale? Is it all about the love?
My guess is that it primarily comes down to time and opportunity costs. Let’s assume the same love for a result, regardless of the p-value we obtain for H1. But now we factor in the lower incentive and the longer delay until the paper gets published. What you get is a publication system that essentially punishes you for doing the right thing. While you might not care much if you have just started grad school or landed a permanent contract, it gets very frustrating when you are competing for a small number of positions or grants that are evaluated based on how many high-impact papers you have published, not on how many of your hyped findings have been successfully replicated by others.
Limited space creates pressure of selection
Due to this pressure of selection, strategic considerations start to kick in once you have more data on your hands than you can turn into papers. It might take just a few weeks to write a draft of a paper, but formatting, revising, submitting, re-formatting, re-revising, re-submitting, etc. can take ages. Thus, working on too many papers that are difficult to “sell” is a risk to your productivity as a battling ECR. Yes, you can upload a preprint and make your work instantly visible. That’s nice, but again, I have learned that not everyone has the same love for preprints. By the way, has anyone ever studied whether love for negative results and love for preprints are associated? I could make an educated guess by now.
On top of that, when you have worked in fMRI research for a while, there are so many options to make use of the high dimensionality of the accumulating data from different modalities. This forces you to focus on a few selected operationalizations of the concept you are interested in. New methods for improved data analysis appear by the minute, and the pressure to always do something novel is enormous. As a result, you are also pressured to select the most promising route and may not follow up on things that don’t seem to “work”.
The classic paper might not be a good format for negative results
Taken together, all of this makes me wonder whether we should think more outside the box to effectively open the many file drawers. Mandatory publishing of protocols and data is great for evaluating reproducibility because it allows other researchers to take a second look at your gem. Still, leaving aside the incentive of shooting down the pet theory of your competitors, this might not be enough to turn the tide of publishing waves.
Instead, we may need to develop a new fast-track format for negative results or replications in general. Yes, there are journals committed to publishing such work, but my experiences so far are mixed. I believe the more fundamental question is this: if the classic paper was developed to communicate and promote a positive result given very limited space in print, maybe we need a new format for the digital age, with its abundance of big-data resources and virtually unlimited storage, that is better aligned with the goal of rapid and unbiased dissemination.
My vision would be to post something similar to a formalized blog post, more like a brief report, linked to a published study protocol and directly to the original paper (both ways!), combined with open peer review, similar to F1000Research. Maybe it could be a two-stage procedure, as in a registered report, where reviewers don’t get to see the results during the first round. However, for a format like that to gain momentum, it would need to be a) indexed properly, b) more balanced in terms of reviewer choice, and c) free to publish, or at least not incur substantial article processing charges. At a faculty of medicine, for example, anything without an assigned impact factor does not really count toward evaluating my scientific output. Probably, this was determined in the spirit of Galileo Galilei’s “Measure what is measurable, and make measurable what is not so.”
Same love, different formats?
Arguably, we should have the same love for all our results, but maybe we simply don’t need the same format for all of them, at least while we are in a phase of transition in how we do science. I know this sounds a bit contradictory when I have just written that results shouldn’t be treated differently in the first place. But in fact, they are treated differently for several reasons that are hard to overcome with stats education alone, and they can be a heavy burden on a young researcher’s shoulders. Let’s find or build ourselves a new home where every honest p-value is welcome until the RR becomes the new default.
Importantly, there is one more imperative lesson from those literature classes in school. You’d better stick to what is right and to whom or what you love if you want your actions to be remembered for generations to come. Keep that in mind when other people talk about maximizing impact. They might be referring to something that will not last very long.