I was reading a psychology paper that promised to change how I thought about willpower.
The study claimed that ego depletion—the idea that willpower is a limited resource that gets exhausted—had been proven through rigorous experiments. Hundreds of studies supported it. It was taught in psychology courses. It was in textbooks.
I built my productivity system around this concept. I scheduled important decisions for the morning. I avoided making choices when I was tired. I believed willpower worked like a muscle that could be depleted.
Then I learned that when other researchers tried to replicate the ego depletion studies, they couldn’t reproduce the results.
The effect disappeared.
This wasn’t an isolated incident. This was part of something much bigger and more troubling: The Replication Crisis.
And it’s not just academic drama—it affects every piece of psychology research you’ve ever read, every self-help book based on “science,” every intervention claimed to work.
Let me explain why psychology research is in crisis, and what it means for how you should think about scientific claims.
What Is the Replication Crisis?
The replication crisis is the alarming discovery that many published research findings cannot be reproduced when other scientists try to repeat the experiments.
The Moment It Became Undeniable
In 2015, a massive collaborative project called the Reproducibility Project: Psychology tried to replicate 100 psychology studies published in top journals.
The results were devastating:
- Only 36% of the studies replicated successfully
- Many effect sizes (how strong the phenomenon is) were much smaller when replicated
- Some famous findings disappeared entirely
Think about that. 64% of published findings in prestigious psychology journals couldn’t be reproduced.
This wasn’t because the replicators were incompetent. The original authors were often involved. The methodology was rigorous.
The original findings were just… wrong.
Not Just Psychology
While psychology has gotten the most attention, the crisis extends to:
Medicine:
- Only 11% of landmark cancer research findings could be replicated (Begley & Ellis, 2012)
- Many drug studies fail to replicate
Economics:
- Replication rates around 60-70%
- Many behavioral economics findings are fragile
Social Sciences Generally:
- Political science, sociology, education research all affected
- Varying replication rates, but all concerning
Even Hard Sciences (to a lesser extent):
- Biology and chemistry have replication issues
- Physics is most reliable, but not immune
But psychology is the poster child for the crisis, and understanding why reveals deep problems in how science is done.
How Did This Happen? The Perverse Incentives of Academic Publishing
The replication crisis isn’t about a few bad apples. It’s a systemic problem created by how academic research is incentivized, published, and rewarded.
Problem #1: Publish or Perish
Academic careers depend almost entirely on publishing papers in prestigious journals.
The pressure:
- Tenure depends on publication count and journal prestige
- Grants depend on publication record
- Status depends on citations
- “Publish or perish” is literal
The consequence:
- Researchers need to produce positive, novel, surprising findings
- Negative results (no effect found) don’t get published
- Replications don’t get published (they’re “not novel”)
- Incentive is to find effects, not to find truth
Problem #2: Publication Bias (The File Drawer Problem)
Journals want to publish exciting, positive results. They reject boring negative findings.
What this creates:
Imagine 20 researchers independently test whether power posing increases confidence.
- 19 find no effect
- 1 finds a positive effect (by chance)
What gets published? The one positive study.
What gets filed away? The 19 negative studies.
The published literature shows “power posing works!” The file drawers contain the truth: “It probably doesn’t.”
The result: The published literature is systematically biased toward false positives.
This is called publication bias or the file drawer problem.
Problem #3: P-Hacking (Data Torture)
In statistics, p < 0.05 is the conventional threshold for “statistically significant.” It means that if there were truly no effect, there would be less than a 5% chance of seeing data this extreme.
The problem: Researchers are incentivized to get that magical p < 0.05 by any means necessary.
How to p-hack (also called data dredging or fishing):
1. Trying multiple analyses until one “works”:
- Test 20 different relationships
- Only report the one that’s significant
- Don’t mention the 19 that weren’t
2. Optional stopping:
- Collect data continuously
- Check for significance repeatedly
- Stop collecting data once you hit p < 0.05
- (If you’d kept going, it might have disappeared)
3. Selective exclusion:
- Remove “outliers” that weaken your effect
- Justify why those data points “don’t count”
- Keep removing until p < 0.05
4. Exploring multiple dependent variables:
- Measure 10 different outcomes
- Report only the ones that show effects
- Ignore the ones that don’t
5. Adding covariates:
- Add control variables one at a time
- See which combination gives you significance
- Report that combination as “the model”
None of these are technically fraud. They’re all “researcher degrees of freedom.”
But together they can inflate the false positive rate from the nominal 5% to as high as 60%.
Statistician Andrew Gelman calls this “the garden of forking paths.”
There are so many decisions to make in data analysis (which participants to exclude, which variables to control for, how to transform data, when to stop collecting) that you can almost always find a path to significance.
It’s not intentional dishonesty. It’s unconscious bias combined with perverse incentives.
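To make that inflation concrete, here’s a minimal simulation sketch in Python. The numbers are invented for illustration (10 outcome measures, 20 participants per group, no true effect anywhere); it only demonstrates the mechanism of “measure many things, report whichever is significant.”

```python
# Sketch: how testing many outcomes inflates false positives.
# Assumption: two groups, NO true effect on any of 10 outcome measures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 5_000   # imaginary labs all running the same null study
n_per_group = 20        # typical small-sample psychology study
n_outcomes = 10         # "measure 10 things, report what works"

false_positive_any = 0
for _ in range(n_experiments):
    # every outcome is pure noise in both groups
    group_a = rng.normal(size=(n_per_group, n_outcomes))
    group_b = rng.normal(size=(n_per_group, n_outcomes))
    p_values = stats.ttest_ind(group_a, group_b, axis=0).pvalue
    if (p_values < 0.05).any():      # report "the" significant outcome
        false_positive_any += 1

print(f"Chance of at least one 'significant' result: "
      f"{false_positive_any / n_experiments:.0%}")
# Expect roughly 40% here, versus the nominal 5% for one pre-specified test.
```

Stack optional stopping and selective exclusion on top of this and the rate climbs toward the 60% figure mentioned above.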
Problem #4: HARKing (Hypothesizing After Results Known)
The scientific method:
- Form hypothesis
- Design experiment to test it
- Collect data
- Analyze results
What actually happens (HARKing):
- Collect data (maybe with vague hypothesis)
- Explore the data
- Find a pattern
- Pretend you predicted it all along
- Write paper as if you had a strong a priori hypothesis
Why this is a problem:
If you explore data looking for patterns, you’ll find them (even in random data). This is called overfitting.
The pattern might be real, or it might be noise. You need new data to test whether it holds up.
But HARKing presents exploratory findings as confirmatory. You’re pretending you tested a specific hypothesis when you actually went fishing for patterns.
This massively increases false positives.
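Here’s a hedged toy demonstration of why exploratory findings need fresh data: search pure noise for the “best” correlation, then test that same pair of variables on new noise. The variable counts and sample sizes are invented for illustration.

```python
# Sketch: HARKing in miniature. Search noise for a pattern, then see
# whether it survives contact with new data. All numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_people, n_vars = 30, 15          # small sample, many measured variables

exploratory = rng.normal(size=(n_people, n_vars))   # pure noise
# "Explore the data": find the pair of variables with the strongest correlation
best_p, best_pair = 1.0, None
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        r, p = stats.pearsonr(exploratory[:, i], exploratory[:, j])
        if p < best_p:
            best_p, best_pair = p, (i, j)

print(f"Best correlation found by fishing: vars {best_pair}, p = {best_p:.3f}")

# Confirmatory test: the SAME pair of variables, in a fresh sample of noise
confirmatory = rng.normal(size=(n_people, n_vars))
i, j = best_pair
r, p = stats.pearsonr(confirmatory[:, i], confirmatory[:, j])
print(f"Same pair in new data: p = {p:.3f}  (usually nothing there)")
```

Writing up the first result as if that pair of variables had been the hypothesis all along is HARKing; the second test is what genuine confirmation looks like.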
Problem #5: Small Sample Sizes
Many psychology studies have laughably small sample sizes:
- 20 participants per condition
- Sometimes as few as 10
Why this is a problem:
1. Low statistical power:
- Small samples can’t reliably detect effects
- Only large effects will be significant
- But real effects are often small
2. Winner’s curse:
- When a small study finds significance, the effect size is probably overestimated
- You got lucky with an extreme sample
- Replication will show smaller (or no) effect
3. Sampling error:
- Small samples are more subject to random variation
- What looks like an effect might just be a weird sample
Example:
You want to know if a coin is fair. You flip it 10 times and get 9 heads.
Conclusion: “The coin is biased toward heads! p < 0.05!”
Reality: Flip it 1,000 times and you get 510 heads. The coin is essentially fair. Your first sample of 10 just happened to land on an extreme run of random variation.
Many psychology studies are like flipping a coin 10 times and declaring profound discoveries.
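If you want to check the coin arithmetic yourself, here’s a short sketch using scipy’s exact binomial test; the counts are just the ones from the example above.

```python
# Sketch: the coin example, with exact binomial tests.
from scipy.stats import binomtest

# Small sample: 9 heads out of 10 flips of a (secretly fair) coin
small = binomtest(9, n=10, p=0.5, alternative="two-sided")
print(f"10 flips, 9 heads:     p = {small.pvalue:.3f}")   # ~0.02: "significant!"

# Large sample: 510 heads out of 1,000 flips
large = binomtest(510, n=1000, p=0.5, alternative="two-sided")
print(f"1000 flips, 510 heads: p = {large.pvalue:.3f}")   # far above 0.05: no evidence of bias
```

The lesson isn’t that small studies are worthless; it’s that a single extreme result from a small sample is weak evidence on its own.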
Problem #6: Lack of Pre-Registration
In medicine, clinical trials are pre-registered. Researchers must:
- State their hypothesis in advance
- Specify their analysis plan
- Register with a public database
This prevents p-hacking and HARKing.
Psychology historically didn’t do this.
Researchers could:
- Try multiple hypotheses
- Explore the data
- Report what “worked”
- Never mention what didn’t
It’s impossible to tell from reading a paper whether the findings are real or the result of data exploration.
This is finally changing (more on this later), but decades of research lack pre-registration.
Famous Examples: Studies That Didn’t Replicate
Let’s look at some high-profile findings that collapsed under replication attempts.
Example 1: Ego Depletion (The Willpower Battery)
The Original Claim:
Willpower is a limited resource. Exerting self-control in one domain (resisting cookies) depletes your ability to exert self-control in another domain (solving hard puzzles).
The Evidence:
Hundreds of studies showed this effect. Meta-analyses confirmed it. Roy Baumeister’s research became famous. Books were written. Strategies were designed around it.
The Replication:
In 2016, a massive pre-registered replication with 2,141 participants across 23 labs found… no evidence of ego depletion.
The effect vanished.
What happened?
Likely a combination of:
- Publication bias (negative results weren’t published)
- Small samples (early studies had few participants)
- P-hacking (flexibility in how “depletion” was measured)
The implications:
I’m not saying ego depletion is definitely false. But the evidence is far weaker than we thought. The effect (if it exists) is much smaller and more context-dependent than claimed.
This was taught as established fact. It was in textbooks. And it might not be real.
Example 2: Power Posing
The Original Claim:
Standing in a powerful pose (hands on hips, chest out) for 2 minutes increases confidence, testosterone, and risk-taking.
The Evidence:
Amy Cuddy’s 2010 study. Became a viral TED talk (60+ million views). Inspired a movement. People power posed before job interviews.
The Replication:
Multiple attempts to replicate found:
- No effect on testosterone
- Weak or no effect on behavior
- Maybe a small effect on self-reported feelings (but that could be placebo)
The aftermath:
One of the original co-authors, Dana Carney, publicly stated she no longer believes the effect is real.
Amy Cuddy stands by the finding (sort of—she’s walked back the hormonal claims).
The truth: Power posing might make you feel slightly more confident (placebo effect?), but the dramatic hormonal and behavioral effects claimed in the original study don’t hold up.
Example 3: Social Priming
The Original Claim:
Subtle environmental cues influence behavior unconsciously.
Famous examples:
Florida Effect (Bargh et al., 1996):
- Participants unscrambled sentences containing words associated with the elderly (Florida, wrinkle, gray, bingo)
- Afterward, they walked slower down the hallway
- Conclusion: Priming “elderly” activated elderly stereotypes, affecting behavior
Money Priming:
- Seeing images of money makes people more selfish and less helpful
Achievement Priming:
- Exposing people to briefcases and fountain pens improves performance on achievement tasks
The Replication:
Most of these effects failed to replicate.
The Florida effect, in particular, has been attempted dozens of times with mostly null results.
What likely happened:
- Small samples (original Florida study: 30 participants)
- Researcher expectations (experimenters might unconsciously influence participants)
- P-hacking and selective reporting
Some priming effects are real (semantic priming in reaction time studies is robust), but the dramatic behavioral effects from subtle cues? Highly questionable.
Example 4: Facial Feedback Hypothesis (Pen-in-Teeth)
The Original Claim:
Forcing your face into a smile (by holding a pen in your teeth) makes you feel happier. Facial expressions don’t just reflect emotions—they cause them.
The Evidence:
Classic study by Strack, Martin, & Stepper (1988). People rated cartoons as funnier when forced to smile.
The Replication:
A massive pre-registered replication (17 labs, 1,894 participants) found… no effect.
People didn’t rate cartoons as funnier when smiling.
The original authors defended the study, suggesting the replication changed important details.
But the core claim—that manipulating facial muscles changes emotional experience in this specific way—is now highly questionable.
Example 5: Growth Mindset
The Original Claim:
Teaching students that intelligence is malleable (growth mindset) rather than fixed improves academic performance, especially for struggling students.
The Evidence:
Carol Dweck’s research became hugely influential. Schools worldwide implemented growth mindset interventions.
The Replication:
Results are mixed:
- Some studies replicate
- Many don’t
- Effect sizes are much smaller than originally claimed
- May only work in specific contexts
- May fade over time
My take:
Growth mindset isn’t wrong—believing you can improve probably helps. But it’s not the silver bullet it was portrayed as.
The educational establishment ran with preliminary findings and scaled interventions before solid replication.
The Deeper Problem: Understanding What Science Actually Is
The replication crisis isn’t just “some studies were wrong.” It reveals fundamental misunderstandings about how science works.
Misunderstanding #1: Published = True
What people think: “It’s a peer-reviewed study in a top journal. It must be true.”
Reality: “It’s a preliminary finding that might be true, might be exaggerated, or might be false. Replication and meta-analysis will clarify.”
Publication is the beginning of scientific evaluation, not the end.
Misunderstanding #2: Statistically Significant = Important
What p < 0.05 actually means: “If there were truly no effect, there’s less than a 5% chance we’d see data this extreme by random chance.”
What people think it means: “There’s a 95% chance this finding is true and important.”
These are not the same.
Problems with p-values:
1. They don’t tell you the probability the hypothesis is true
- P-values assume the null hypothesis is true and ask “how surprising is this data?”
- They don’t tell you how likely the alternative hypothesis is
2. They don’t tell you how big or important the effect is
- p < 0.05 can mean a tiny, meaningless effect with a large sample
- Clinical significance ≠ statistical significance
3. They’re easy to hack
- As we’ve seen, p-values are easily manipulated
The American Statistical Association released a statement in 2016:
“Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”
But that’s exactly what we’ve been doing.
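A back-of-the-envelope calculation shows why “p < 0.05” is not “95% likely true.” The prior (10% of tested hypotheses are actually true) and the statistical power (50%) below are illustrative assumptions, not measured values; the point is only that the answer depends heavily on them.

```python
# Sketch: P(hypothesis is true | result is "significant"), via Bayes' rule.
# ASSUMED, illustrative numbers -- not measurements of any real field.
prior_true = 0.10   # fraction of tested hypotheses that are actually true
power      = 0.50   # chance a true effect reaches p < 0.05 (typical small study)
alpha      = 0.05   # chance a null effect reaches p < 0.05 anyway

significant_and_true  = prior_true * power
significant_and_false = (1 - prior_true) * alpha

posterior = significant_and_true / (significant_and_true + significant_and_false)
print(f"P(true | significant) = {posterior:.0%}")   # ~53%, nowhere near 95%
```

With publication bias and p-hacking pushing the effective alpha higher, the real-world number for surprising, one-off findings can be lower still.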
Misunderstanding #3: A Single Study Proves Anything
How science should work:
- Exploratory research generates hypotheses
- Confirmatory research tests hypotheses (pre-registered)
- Replication verifies findings across contexts
- Meta-analysis synthesizes evidence
- Confidence builds gradually
How it actually works:
- Single study published
- Media coverage: “Science says X!”
- Everyone believes X
- No one does replications (not novel enough)
- Original finding might be completely wrong
Science is supposed to be cumulative and self-correcting. But the incentives prevent this from happening.
Misunderstanding #4: Researchers Are Objective
The ideal: Scientists are neutral truth-seekers who follow data wherever it leads.
The reality: Scientists are humans with:
- Career incentives (publish or perish)
- Confirmation bias (we see what we expect)
- Motivated reasoning (we defend our theories)
- Ego investment (our research defines our identity)
This doesn’t make scientists bad people. It makes them human.
But it means we need systems that counteract these biases:
- Pre-registration (commit to analysis before seeing data)
- Open data (others can check your work)
- Replication (verify findings independently)
- Adversarial collaboration (work with critics)
We haven’t had these systems. That’s why we’re in crisis.
What’s Being Done? The Reform Movement
The good news: the scientific community is taking the replication crisis seriously and implementing reforms.
Reform #1: Pre-Registration
Researchers now register their hypotheses, methods, and analysis plans before collecting data.
What this prevents:
- P-hacking
- HARKing
- Selective reporting
Platforms:
- Open Science Framework (OSF)
- AsPredicted
- ClinicalTrials.gov (for clinical research)
This is becoming standard practice in top psychology journals.
Reform #2: Open Data and Open Materials
Researchers share:
- Raw data
- Analysis code
- Experimental materials
- Full methods
What this enables:
- Others can verify analyses
- Researchers can re-analyze with different methods
- Meta-analyses can use original data
- Transparency reveals errors
Requirements: Many journals now require or strongly encourage open data.
Reform #3: Registered Reports
A new publication format:
Stage 1:
- Submit hypothesis and methods before data collection
- Journal reviews and approves
- Acceptance is conditional (if you follow the plan, they’ll publish regardless of results)
Stage 2:
- Conduct research
- Submit results
- Journal publishes (even if results are null)
What this solves:
- Publication bias (negative results get published)
- P-hacking (analysis is locked in advance)
- HARKing (hypothesis is on record)
Adoption: 300+ journals now offer registered reports.
Reform #4: Larger Sample Sizes
Researchers are recognizing that studies with 20 participants per condition are underpowered.
New standards:
- Power analysis before data collection
- Samples of hundreds, not dozens
- Multi-lab collaborations for large-scale data
Example: The Many Labs project runs replications across dozens of labs worldwide with thousands of participants.
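Here’s a minimal sketch of what “power analysis before data collection” can look like: simulate the planned study at different sample sizes and see how often a real but modest effect (Cohen’s d = 0.3, an assumption chosen for illustration) would actually reach significance.

```python
# Sketch: simulation-based power analysis for a two-group comparison.
# Assumed true effect size d = 0.3 (illustrative); alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, alpha, n_sims = 0.3, 0.05, 2_000

for n_per_group in (20, 50, 100, 200, 350):
    hits = 0
    for _ in range(n_sims):
        control   = rng.normal(0.0,    1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        if stats.ttest_ind(treatment, control).pvalue < alpha:
            hits += 1
    print(f"n = {n_per_group:>3} per group -> power ~ {hits / n_sims:.0%}")
# With n = 20 per group, power is only about 15%; you need roughly 175-200
# per group to reach the conventional 80% for an effect this size.
```

Running this before collecting data tells you whether your planned study can realistically detect the effect you care about, or whether a “null result” would be uninterpretable.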
Reform #5: Replication Culture
Replications are finally being valued:
- Journals that explicitly welcome replication studies (e.g., Royal Society Open Science)
- Replication studies can be published in top journals
- Career credit for conducting replications
- Understanding that replication is essential, not an attack
Reform #6: Meta-Science
Researchers are studying the research process itself:
- How often do studies replicate?
- What factors predict replication success?
- How can we improve scientific practices?
This is science examining its own methods. It’s essential for progress.
Reform #7: Better Statistical Practices
Movement away from blind reliance on p < 0.05:
- Report effect sizes and confidence intervals (see the sketch after this list)
- Use Bayesian statistics (which estimate how probable a hypothesis is, given the data)
- Consider practical significance, not just statistical significance
- Avoid dichotomous thinking (significant vs. not significant)
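As a sketch of what “report effect sizes and confidence intervals” means in practice, here’s a toy two-group comparison on fabricated data, reporting Cohen’s d and a bootstrap confidence interval alongside the p-value.

```python
# Sketch: report effect size + uncertainty, not just "p < 0.05".
# The data here are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control   = rng.normal(0.0, 1.0, 150)
treatment = rng.normal(0.4, 1.0, 150)   # assumed true effect of d = 0.4

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Bootstrap a 95% CI for the effect size (resample each group with replacement)
boot = [cohens_d(rng.choice(treatment, treatment.size),
                 rng.choice(control, control.size))
        for _ in range(5_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])

p = stats.ttest_ind(treatment, control).pvalue
print(f"p = {p:.4f}, Cohen's d = {cohens_d(treatment, control):.2f}, "
      f"95% CI [{lo:.2f}, {hi:.2f}]")
```

A reader can see at a glance both how big the effect is and how uncertain the estimate is, which a bare p-value hides.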
What This Means For You: How to Read Psychology Research
You’re not a researcher, but you read about psychology studies in the news, self-help books, and articles (like this one).
How should you think about scientific claims?
Guideline #1: Be Skeptical of Single Studies
When you see: “New study shows that [surprising finding]!”
Think: “Interesting preliminary finding that needs replication.”
Don’t immediately change your behavior based on one study.
Guideline #2: Look for Replication and Meta-Analyses
Better evidence:
- Multiple independent replications
- Meta-analyses (statistical synthesis of many studies)
- Pre-registered studies
- Large samples
Example:
“Does meditation reduce stress?”
- Single study with 30 participants: Weak evidence
- Meta-analysis of 47 randomized controlled trials with 3,500 participants: Strong evidence
Strength of evidence scales with convergence across studies.
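For the curious, here’s a hedged sketch of the core arithmetic behind a simple fixed-effect meta-analysis: weight each study’s effect estimate by the inverse of its variance and pool. The five “studies” below are invented numbers, not real trials.

```python
# Sketch: fixed-effect meta-analysis via inverse-variance weighting.
# Effect sizes and standard errors below are made up for illustration.
import numpy as np

effects = np.array([0.45, 0.10, 0.30, 0.22, 0.15])   # per-study effect estimates
se      = np.array([0.30, 0.12, 0.20, 0.10, 0.08])   # per-study standard errors

weights = 1 / se**2                       # precise studies count for more
pooled  = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"Pooled effect: {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f} to {pooled + 1.96 * pooled_se:.2f})")
```

Real meta-analyses also model between-study heterogeneity (random-effects models) and probe for publication bias, but inverse-variance pooling is the core idea.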
Guideline #3: Check the Sample Size
Red flags:
- “Study of 20 college students shows…”
- “Experiment with 15 participants finds…”
This doesn’t mean the finding is wrong, but it means:
- Treat it as exploratory
- Wait for replication
- Don’t generalize broadly
Guideline #4: Beware Extraordinary Claims
As Carl Sagan said: “Extraordinary claims require extraordinary evidence.”
If a study claims:
- “This simple trick changes your life!”
- “Scientists discover the secret to [success/happiness/health]!”
- “Revolutionary finding upends everything we thought!”
Demand:
- Large samples
- Multiple replications
- Pre-registration
- Plausible mechanism
Most likely:
- Small sample got a lucky result
- P-hacking produced false positive
- Media exaggerated modest finding
Guideline #5: Understand the Difference Between “Significant” and “Large”
A study might report: “Mindfulness training significantly improved focus (p < 0.05)”
Questions to ask:
- How much did it improve? (effect size)
- Is that improvement meaningful in real life?
- How long does it last?
Statistical significance doesn’t equal practical importance.
Guideline #6: Consider the Source
Higher credibility:
- Pre-registered studies
- Registered reports
- Studies with open data
- Multi-lab collaborations
- Recent studies (post-crisis reforms)
Lower credibility:
- Studies from pre-2015 without replication
- Studies that refuse to share data
- Studies with conflicts of interest
- Studies contradicting robust meta-analyses
Guideline #7: Don’t Dismiss All Psychology Research
The replication crisis doesn’t mean: “All psychology is fake!”
It means: “Some published findings are false or exaggerated. We need better methods to distinguish real from false.”
Much psychology research is solid:
- Well-replicated cognitive phenomena
- Robust clinical interventions (CBT, exposure therapy)
- Psychometrics (Big Five personality, IQ)
- Developmental psychology basics
- Social psychology with large effects
The crisis is about improving standards, not abandoning the field.
My Personal Takeaway: Embracing Uncertainty
The replication crisis changed how I think about knowledge.
Before: “Science says X. I believe X. I’ll live accordingly.”
After: “Science suggests X with [low/medium/high] confidence based on [single study/multiple studies/meta-analyses]. I’ll update my beliefs proportionally and remain open to new evidence.”
This is uncomfortable.
We want certainty. We want clear answers. We want experts to tell us what’s true.
But science doesn’t provide certainty. It provides:
- Degrees of confidence
- Provisional conclusions
- Best current evidence
- Probabilistic claims
And that’s okay.
Living with uncertainty is more honest than false confidence.
What I Still Trust
High confidence:
- Core psychological principles with decades of replication (e.g., classical conditioning, cognitive biases that replicate)
- Interventions with large effect sizes in randomized controlled trials (e.g., CBT for anxiety)
- Meta-analyses of robust effects
Medium confidence:
- Recent findings with pre-registration and decent sample sizes
- Effects that replicate across multiple labs
- Findings consistent with related evidence
Low confidence:
- Single studies, especially with small samples
- Surprising findings without replication
- Studies from before the reform era
I update these as new evidence emerges.
What I’ve Changed
I no longer:
- Cite single studies as proof
- Build entire systems around preliminary findings
- Trust flashy claims without checking replication
- Assume published = true
I now:
- Look for meta-analyses and replications
- Check sample sizes
- Prefer pre-registered studies
- Hold claims loosely until confirmed
- Update beliefs with new evidence
This makes me a slower adopter of new ideas. But it makes me right more often.
The Bigger Picture: What Is Science For?
The replication crisis is humbling and important.
It teaches us:
1. Science is a process, not a collection of facts
Science doesn’t “prove” things. It gathers evidence, tests hypotheses, updates models, and gradually converges on truth (hopefully).
2. Science is done by humans with human flaws
Researchers aren’t objective machines. They have biases, incentives, and limitations. Systems must account for this.
3. Science requires skepticism—even of itself
The replication crisis was discovered by scientists questioning their own field. This is science working as it should (eventually).
4. Uncertainty is honest
Admitting “we don’t know yet” or “the evidence is mixed” is more scientific than false confidence.
5. Reforms work, but require collective action
Pre-registration, open data, and replication culture are fixing the crisis. But they require journals, funders, and institutions to change incentives.
Final Thoughts: Trust Science, But Understand Science
Should you trust psychology research?
Yes, but wisely.
- Trust the scientific method (hypothesis → test → replicate → revise)
- Trust the community’s ability to self-correct (eventually)
- Trust strong, replicated, converging evidence
Don’t trust:
- Single flashy studies
- Preliminary findings reported as facts
- Research done under perverse incentives without safeguards
The replication crisis isn’t a reason to reject science. It’s a reason to demand better science.
And as consumers of research—whether you’re reading self-help books, making health decisions, or just trying to understand yourself—you can be part of the solution by:
- Asking critical questions
- Demanding evidence quality
- Accepting uncertainty
- Updating beliefs with new evidence
Science is messy, slow, and uncertain.
But it’s still the best method we have for understanding reality.
How has the replication crisis changed how you think about research? What studies have you believed that turned out to be questionable? I’d love to hear your thoughts.