What “Evidence-Based” Actually Means: A Hard Look at the Research Influencing Youth Programs

We have long known about the replication crisis in social science research: the problem that many published findings fail to hold up in later studies. In April 2026, the largest replication project ever attempted found that only about half of existing research holds up under further testing. This finding has profound implications for youth-serving organizations (YSOs) that want to follow evidence-based practices.

The replication rate in the most recent study, fifty-five percent, falls within the range reported by a decade of similar projects. Only thirty-six percent of psychology studies replicated in the 2015 Open Science Collaboration project. Sixty-two percent replicated in a 2018 project that retested high-prestige social science papers. Half replicated in the Many Labs 2 project covering classic findings.

Every YSO administrator who has signed a purchase order for a “research-backed” curriculum, training, or screening tool needs to take that number seriously. About half the time, the research backing those programs would not hold up to a careful retest. And the half that fails is not random. The studies most likely to be cited, marketed, and built into vendor materials are often the ones that fail to replicate, because surprising and dramatic findings get published more quickly and more frequently than steady, modest, replicable findings.

I’m not arguing for cynicism. The science of childhood adversity is real. The principles behind good youth-serving practice are real. What is at stake is the ability to evaluate specific claims about specific programs, especially when those claims arrive with confident scientific packaging and a price tag.

A study replicates when an independent team, using the same methods on a fresh sample, gets substantially the same result. When studies do not replicate, the original finding was probably exaggerated, found by chance, or built on analytical choices that bent the data toward a particular answer.

The causes are systemic. Researchers face career pressure to produce novel, statistically significant findings. Journals reward the same. Out of dozens of small analytical choices in any study, the ones that produce significant results are the ones that get reported. None of this requires fraud. It only requires the ordinary human tendency to keep going when results look good and stop when they do not.
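The arithmetic behind selective reporting is simple enough to sketch. The snippet below is an illustration only, and it assumes the analytical choices behave like independent tests of an effect that does not really exist, each with the conventional 5 percent false-positive threshold. Real analytical choices overlap, so actual figures will differ, and the function name is made up for this example.

```python
def chance_of_false_positive(num_analyses: int, alpha: float = 0.05) -> float:
    """Chance that at least one of several independent analyses of a
    nonexistent effect comes out 'significant' at the given threshold."""
    return 1.0 - (1.0 - alpha) ** num_analyses


# Assumption: each analysis is an independent test at alpha = 0.05.
for k in (1, 5, 10, 20):
    share = chance_of_false_positive(k)
    print(f"{k:>2} analyses: {share:.0%} chance of at least one spurious result")
```

Under these assumptions, a team that tries twenty reasonable-looking analyses has roughly a 64 percent chance of finding something publishable in pure noise, without anyone acting dishonestly.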

A 2023 synthesis from a large international meta-research consortium describes both the problem and the reform movement responding to it. Preregistration, registered reports, open data, and large-team collaborations are improving the credibility of new research. But these reforms are still being adopted unevenly, and they do not retroactively fix the older studies. A 2023 paper that itself claimed an 86 percent replication rate when researchers used reform practices was retracted in 2024 over preregistration misrepresentations. Even reform-minded research is not exempt from scrutiny.

For a YSO administrator, the practical consequence is this. When a vendor cites studies from the 1990s, 2000s, or early 2010s as the foundation of their program, you are looking at exactly the era and the methods most affected by the replication problem. Treat those studies as the beginning of an evaluation, not the end.

The Adverse Childhood Experiences framework is the most influential research-backed concept in modern child welfare practice. It is also the clearest example of what happens when a sound population-level finding gets misapplied.

The original 1998 ACE study and a 2017 meta-analysis covering 37 studies establish a real, dose-response association between childhood adversity and adult disease, mental health problems, and risk behavior. That association replicates across countries and populations. It is not in dispute, and it should inform how every YSO thinks about the children it serves.

What is in dispute, and what every YSO leader should understand, is whether the ACE score works as a screening or risk-stratification tool for individual children. The most rigorous individual-level test, published in 2021, used two long-running birth cohorts. The researchers found that ACE scores forecast group averages reasonably well and individual outcomes poorly.

The original ACE study’s senior author, Robert Anda, wrote a 2020 commentary explicitly warning against using the ACE score as a clinical screening or diagnostic tool. He noted that the score treats radically different experiences as equivalent, that it was never standardized as a measurement instrument, and that population-level risks should not be applied to individuals. A 2020 practitioner review reached the same conclusion through different reasoning. A 2024 critical appraisal in Pediatrics reaffirmed the same position for the pediatric profession.

There is a further methodological problem. A 2019 meta-analysis compared adults’ retrospective recall of childhood maltreatment with contemporaneous records and found agreement of only 0.19 on a scale where 1.0 would mean perfect agreement. The retrospective ACE checklist and the actual record of what happened to a child are largely measuring different things.
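To see why 0.19 signals weak agreement, it can help to work one example by hand. The sketch below assumes the agreement statistic is a chance-corrected measure such as Cohen's kappa (the text above does not name the statistic). Every count in it is hypothetical, chosen only to land near 0.19, and none comes from the cited meta-analysis.

```python
# Hypothetical counts for 1,000 adults (illustration only, not study data).
both_yes = 47        # records show maltreatment AND the adult recalls it
records_only = 103   # records show it, the adult does not recall it
recall_only = 103    # the adult recalls it, records do not show it
both_no = 747        # neither source indicates maltreatment


def cohens_kappa(both_yes, records_only, recall_only, both_no):
    """Agreement between two yes/no sources, corrected for chance agreement."""
    total = both_yes + records_only + recall_only + both_no
    observed = (both_yes + both_no) / total
    records_yes = (both_yes + records_only) / total
    recall_yes = (both_yes + recall_only) / total
    # Agreement expected if the two sources were unrelated to each other.
    expected = records_yes * recall_yes + (1 - records_yes) * (1 - recall_yes)
    return (observed - expected) / (1 - expected)


raw_agreement = (both_yes + both_no) / 1000
kappa = cohens_kappa(both_yes, records_only, recall_only, both_no)
print(f"raw agreement: {raw_agreement:.0%}, kappa: {kappa:.2f}")
```

With these made-up counts the two sources agree on about 79 percent of people, yet kappa is about 0.19, because most of that agreement comes from the large group with no maltreatment in either source. Of the 150 people with a documented record, only 47 recall it.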

None of this means the ACE concept is wrong. It means the ACE score is the wrong instrument for the work most YSOs would use it for. If your organization has built intake protocols, programming decisions, or staff response frameworks around ACE scoring, that decision is worth a careful review.

The same period that produced the ACE critique also produced a complementary line of research that is more directly useful for YSOs. The Positive Childhood Experiences framework, sometimes called Benevolent Childhood Experiences, measures specific protective factors: feeling that family supported you, feeling a sense of belonging at school, and having two non-parent adults who genuinely cared about you, among others.

A 2019 study found that positive childhood experiences predicted adult mental and relational health in a dose-responsive way, and the effects held even when the same individuals reported significant adversity. The adverse experiences did not erase the positive ones. They operate independently.

This finding maps directly onto what YSOs actually do. Camps, schools, churches, sports leagues, mentoring programs, and youth nonprofits exist in part to be the place where a child encounters supportive non-parent adults, structured belonging, and predictable care. The research evidence for that work is more robust than the screening tools many YSOs have been encouraged to adopt.

When you evaluate a specific category of intervention against the post-2020 evidence base, the quality of support varies widely from one category to the next.

Universal social-emotional learning has the strongest evidence of any intervention category in this space. A 2023 meta-analysis covering 424 studies and more than 575,000 students found consistent positive effects across multiple outcomes. The effects are modest in magnitude and depend heavily on implementation fidelity. They are real, but they are not transformational, and they require the program to actually be delivered as designed.

One-on-one youth mentoring has consistent but small effects. A 2019 meta-analysis found an overall effect size of about 0.21, which is statistically significant but indicates relatively small absolute differences between mentored and non-mentored youth. Targeted, skills-focused mentoring outperforms general mentoring, and relationship quality matters more than program duration.
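A rough way to picture an effect size of 0.21 is to translate it into plain odds. The sketch below assumes the figure is a standardized mean difference (such as Cohen's d) and that outcomes in both groups are roughly bell-shaped with equal spread. Both are simplifying assumptions, and the function names are invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probability_of_superiority(d: float) -> float:
    """Chance that a randomly chosen program participant scores higher than
    a randomly chosen non-participant, given standardized difference d."""
    return STANDARD_NORMAL.cdf(d / sqrt(2))


def share_of_comparison_group_below_average_participant(d: float) -> float:
    """Share of non-participants scoring below the average participant."""
    return STANDARD_NORMAL.cdf(d)


d = 0.21  # effect size quoted in the text above
print(f"probability of superiority: {probability_of_superiority(d):.0%}")
print(f"non-participants below the average participant: "
      f"{share_of_comparison_group_below_average_participant(d):.0%}")
```

Under these assumptions, a mentored youth outscores a non-mentored youth about 56 percent of the time, against 50 percent for a coin flip, and the average mentored youth lands at roughly the 58th percentile of the comparison group. That is a real shift, and a small one.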

Trauma-informed care as a system-level framework has weaker evidence than its market presence suggests. A 2024 systematic review commissioned by the federal Agency for Healthcare Research and Quality examined the evidence and concluded that it was insufficient to make any clear determinations about effectiveness on patient or client health outcomes. A 2019 systematic review of trauma-informed approaches in schools, conducted under the rigorous Campbell Collaboration methodology, found zero studies meeting basic methodological inclusion criteria. A 2021 systematic review reported that evidence on psychological outcomes was inconsistent.

This does not mean trauma-informed practice is useless. It means the framework has been rolled out faster than the evidence supporting it has accumulated, and that you should be cautious about vendor claims of dramatic outcomes from trauma-informed training.

When someone pitches you a program, training, or assessment tool with the words “research-backed,” your first job is to ask what the research actually shows. The right questions include:

What is the effect size, not just the statistical significance? Effect sizes around 0.20 are typical for youth interventions and indicate small absolute differences, not transformational change. A program that quotes p-values without effect sizes is hiding something.

Have the findings been replicated by labs unaffiliated with the program developer? Developer-led trials systematically produce larger effects than independent ones. If every cited study comes from the program’s own research team, the evidence is much weaker than it appears.

Is the instrument being used at the level it was designed for? An ACE score used as a population research tool is one thing. The same score used to identify which children in your camp need extra services is a category error.

What happens at follow-up? Many programs show short-term effects that fade within twelve months. If you only hear about immediate post-intervention results, ask why.

What is the comparison condition? “More effective than no treatment” is much weaker evidence than “more effective than a credible alternative.”

You do not need to become a research methodologist. You do need to recognize that the phrase “research-backed” carries less weight than it sounds, and that asking these questions is part of due diligence.

About half of social-science research does not replicate. The evidence underlying many programs YSOs implement comes from exactly the era and methods most affected by that problem. The ACE framework, taken as a population-level finding, holds up. The ACE score, taken as an individual screening tool, does not. Trauma-informed care has captured significant attention without producing the rigorous evidence its popularity implies. Universal social-emotional learning and youth mentoring have real but modest effects that depend heavily on implementation quality.

Your job is not to throw out your programs. Your job is to know specifically what the research shows, how strong it is, and whether it has been replicated. The children you serve deserve programs built on what the evidence actually supports, not on what the marketing materials claim.

Take the time to look at the research base behind any program already in your organization. Ask for the citations. Read the abstracts. Look up whether the studies have been replicated. The answers will not always be reassuring, but they will be useful.
