Open Science and the Marketplace of Ideas

February 24th, 2017, David Mellor


Our mission at COS is to make science more reproducible and dependable. Our strategy to achieve that mission is to make the research process more transparent so that those who follow can understand and then build upon your discovery. One reason why transparency into the research process increases reproducibility is the simple clarity that comes from documenting important materials that are all too often lost. Preserved data, code, and methods allow others to stand on your shoulders and to push knowledge into new areas.


However, the other way transparency increases reproducibility is by bringing clarity not to objects, but to decisions. The decisions we make when analyzing a data set, and the timing of those decisions, affect our ability to make an inference from the results.


One common example of the need for this transparency comes from undisclosed flexibility in data analysis. Any sufficiently large dataset offers many possible ways to measure the relationship between a predictor and an outcome. A great demonstration of the dangers of unreported flexibility comes from Simmons, Nelson, and Simonsohn's "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant." Using data collected from real research participants, they assert that "people were nearly a year-and-a-half younger after listening to 'When I'm Sixty-Four' (adjusted M = 20.1 years) rather than to 'Kalimba' (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040." They then reveal that they had collected many other variables and repeatedly tested for significance until one of the many tests performed came back surprisingly significant. If you want to see how different combinations of variables in a large dataset can lead to different and surprising results, play with this excellent interactive tool from FiveThirtyEight.
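The inflation at work here is easy to see numerically. Below is a minimal sketch in Python (not a reproduction of the paper's actual study) of a researcher who measures twenty unrelated outcomes under a true null and reports whichever comparison reaches significance; the sample sizes and variable counts are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 5_000   # simulated studies, all with no true effect
n_per_group = 30        # participants per condition (illustrative)
n_outcomes = 20         # unrelated outcome variables available to test
alpha = 0.05

hits = 0
for _ in range(n_experiments):
    # Both groups come from the same distribution: every effect is null.
    group_a = rng.normal(size=(n_per_group, n_outcomes))
    group_b = rng.normal(size=(n_per_group, n_outcomes))
    _, p = stats.ttest_ind(group_a, group_b, axis=0)  # one t-test per outcome
    if (p < alpha).any():   # report whichever comparison "worked"
        hits += 1

print(f"'Significant' result found in {hits / n_experiments:.0%} of null studies")
# Roughly 1 - 0.95**20 ≈ 64%, far above the nominal 5% false-positive rate.
```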


Besides unreported flexibility in data analysis, letting a dataset shape how a hypothesis will be tested can also invalidate any meaningful inference from your research. In this situation, the precise hypothesis being tested is subtly altered by the incoming data. Known as "hypothesizing after results are known" (HARKing; Kerr, 1998), any such assertion becomes mired in circular reasoning: a trend seen in a sample is "confirmed" by that same sample, so the hypothesis suggested by the data cannot be used to make more general inferences about another population.
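A small simulation can make the circularity concrete. In this hedged sketch (all parameters are illustrative), the "hypothesis" is simply whichever of twenty null comparisons looks strongest in a sample; "confirming" it on the same sample succeeds most of the time, while an honest test on freshly collected data succeeds only at the nominal error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_experiments = 5_000
n_per_group = 30
n_outcomes = 20
alpha = 0.05

same_sample = 0   # hypothesis picked and "confirmed" on one sample
fresh_sample = 0  # same hypothesis re-tested on newly collected data
for _ in range(n_experiments):
    group_a = rng.normal(size=(n_per_group, n_outcomes))
    group_b = rng.normal(size=(n_per_group, n_outcomes))
    _, p = stats.ttest_ind(group_a, group_b, axis=0)
    best = int(np.argmin(p))     # the trend the data "suggest"
    if p[best] < alpha:          # circular: the same data confirm it
        same_sample += 1
    # An honest confirmatory test collects new data for that one hypothesis.
    _, p_new = stats.ttest_ind(rng.normal(size=n_per_group),
                               rng.normal(size=n_per_group))
    if p_new < alpha:
        fresh_sample += 1

print(f"Confirmed on the same sample: {same_sample / n_experiments:.0%}")  # ~64%
print(f"Confirmed on a fresh sample:  {fresh_sample / n_experiments:.0%}")  # ~5%
```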


However, even knowing that these data-led decisions affect the credibility of our results, few of us can clearly recall when each individual decision was made as we worked through a tough problem. Even if our memories were perfect, the context of those decisions will be lost to future scholars if it is not documented. Of course, our memories are not perfect, and we are each subject to motivated reasoning and hindsight bias, which cloud our ability to distinguish data-led exploration from precise tests specified a priori.


Preregistration documents the process. Preregistration keeps you honest with yourself, and as Richard Feynman reminds us, the easiest person to fool is yourself.


When creating a preregistration, you create time-stamped documentation of your ideas as they exist at that moment. Including an analysis plan ensures that those ideas are precisely and accurately documented. Creating that document makes clear when later decisions are made; it does not prevent you from making or implementing them.
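To make that mechanism concrete, here is a minimal sketch in Python of the core idea: freeze the analysis plan before seeing any data and record a tamper-evident fingerprint of it. The plan text is a hypothetical example, and in practice a registry such as OSF stores the document and its timestamp for you.

```python
import hashlib
from datetime import datetime, timezone

# The analysis plan is written and frozen before any data are seen.
# (Hypothetical plan text, invented for illustration.)
plan = """Hypothesis: listening condition affects reported mood.
Primary outcome: mood score (1-7 Likert scale).
Test: two-sided Welch t-test, alpha = .05, n = 60 per group.
Exclusions: participants failing the attention check."""

# Fingerprint plus timestamp: any later edit to the plan changes the digest,
# so deviations from the registered analysis are visible rather than hidden.
digest = hashlib.sha256(plan.encode("utf-8")).hexdigest()
registered_at = datetime.now(timezone.utc).isoformat()
print(f"Registered {registered_at}")
print(f"Plan digest: {digest}")
```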


The most frequent concern I hear about preregistration is that it will stifle exploration; that data-led analyses are how we push knowledge into new areas. I agree that exploration is critical. Preregistration simply creates the line in the sand where confirmation and exploration meet. Crossing that line is a signal to you and to your peers that you are in new, unexpected territory. Perhaps the effect you are measuring only occurs on certain days; if so, that explanation deserves to be put to the test.


If preregistration were widely implemented prior to data collection, the result would be a more functional marketplace of ideas. As any economist will tell you, a properly functioning marketplace requires transparency so that the individual players can accurately value the items in it. The ideas in the marketplace of science are the results either of hypothesis-testing, confirmatory analyses or of hypothesis-generating, exploratory analyses. Though both have value, their values are not equal. Right now, no one can accurately judge the value of most ideas in the published literature: not the reader, not the peer reviewers, and not even the original author.


In On Liberty, John Stuart Mill laid out the rationale for allowing the marketplace of ideas to exist (though I do not think that term was yet in use). His rationale for fostering a truly free and open debate of ideas and counterarguments is threefold: 1) it allows false ideas to be countered, 2) it allows true ideas to be strengthened through the exercise of argument, and, most important of all, 3) it allows partially true concepts to be improved. This rationale lays out why no idea should be stifled, except through counterargument. Our vision for open science mirrors this rationale: ideas must be debated, and transparency into the process of science allows that debate to happen.


Preregistration allows the argument to have meaning. Under the status quo, the credibility of most new ideas is hard to judge: are the reported assertions the result of confirmatory hypothesis tests, or are they the result of data exploration, deserving of more study? We envision a future where scholarly communication is more than just the advertisement at the end of the study: a place where ideas can be freely tested and the work can be used by the community to advance knowledge.

If you want to be part of that future, start your preregistration now.


Kerr, N. L. (1998). HARKing: Hypothesizing After the Results Are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632

