Tylenol in Pregnancy and ADHD

Several weeks ago, a new paper was released in the European Journal of Epidemiology about Tylenol in pregnancy and ADHD in children. It was greeted with the usual breathless terror in the media and, for me, with the usual swath of emails about what, if anything, it meant.

I’ll get into the details below, but the top-line claim in the paper is that children of mothers who take acetaminophen (the active ingredient in Tylenol) during pregnancy are more likely to show symptoms of, or be diagnosed with, autism or ADHD.

This brought back some memories for me. In 2014 I wrote for 538 about this precise question. The article came out very shortly after I was turned down for tenure at my last job, and a day or two later a senior colleague came in to give me some career advice. He asked me: “Do you really want to be a person who writes about things like whether pregnant women can take Tylenol?”

I cannot remember what I said at the time, but I guess subsequent events suggest the answer was “Yes”.

Anyway: here we are again. I’ll do a short overview of the paper, and then get into some analysis.

Short overview of a meta-analysis on Tylenol

This paper is a “meta-analysis”, meaning the authors combine results from a number of different studies. In their case, it’s six European studies in which mothers were asked about behavior during pregnancy (including whether they took acetaminophen) and their children were evaluated for Autism Spectrum Conditions (ASC) or Attention-Deficit/Hyperactivity Disorder (ADHD).

The authors aggregate the datasets and argue that exposure to acetaminophen during pregnancy is associated with an increased risk of both disorders. The increase is around 20% in both cases. Note this is 20 percent and not 20 percentage points. If the baseline risk of ADHD is 4%, then these results indicate that taking acetaminophen during pregnancy would increase the risk from 4% to about 5% (4.8%, to be exact).
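To make the percent versus percentage-point distinction concrete, here is a minimal Python sketch using the illustrative 4% baseline from the paragraph above. The function name is hypothetical; the only inputs are that 4% baseline and the roughly 20% relative increase.

```python
def apply_relative_increase(baseline_risk: float, relative_increase: float) -> float:
    """Return the new absolute risk after a proportional (relative) increase."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.04           # illustrative 4% baseline risk of ADHD
relative_increase = 0.20  # "around 20%" higher risk with exposure

# Correct reading: 20 percent of 4%.
exposed_risk = apply_relative_increase(baseline, relative_increase)
absolute_change_pp = (exposed_risk - baseline) * 100  # in percentage points

# Incorrect reading: 20 percentage points added to 4%.
percentage_point_reading = baseline + 0.20

print(f"Exposed risk (20 percent): {exposed_risk:.1%}")
print(f"Absolute change: {absolute_change_pp:.1f} percentage points")
print(f"If it were 20 percentage points: {percentage_point_reading:.0%}")
```

The gap between the two readings (under 1 percentage point versus 20) is why the distinction matters.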

The authors argue that the results are more convincing for ADHD than for ASC, given the robustness checks they run, and that they are similar for boys and girls, although slightly stronger for boys.

The paper concludes by suggesting that pregnant people considering Tylenol should exercise additional caution. The question is: how much do we believe these results?

My analysis

When I analyze a paper like this, I tend to focus on two things. The first is how convincing the regression analysis is. The second, slightly more nebulous, is whether the aggregate picture makes sense.

Let’s start here with the first.

This paper is well done. It’s very clearly explained and the authors are transparent about what they are doing. There is a complementary analysis of maternal exposure to acetaminophen after pregnancy which doesn’t suggest a link with ADHD or ASC; this is a nice form of what we’d call a “placebo test”.
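The logic of that placebo test can be sketched with a toy calculation. Everything in it is a hypothetical assumption, not anything from the paper: a single binary confounder `U`, made-up risks and usage rates, and use during and after pregnancy treated as independent given `U`. The point is the pattern: if `U` drives both acetaminophen use and ADHD, use after pregnancy also "predicts" ADHD; if the only effect is a causal effect of use in pregnancy, the after-pregnancy association disappears.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    share_u: float          # share of mothers with a hypothetical confounder U
    risk: tuple             # baseline ADHD risk for (U=0, U=1)
    p_preg: tuple           # P(acetaminophen use in pregnancy | U=0, U=1)
    p_post: tuple           # P(acetaminophen use after pregnancy | U=0, U=1)
    prenatal_effect: float  # causal risk multiplier of prenatal use (1.0 = none)


def risk_ratio(s: Scenario, exposure: str) -> float:
    """Expected risk ratio, exposed vs. unexposed, for 'preg' or 'post' use.

    Assumes use in and after pregnancy are independent given U, and that
    only prenatal use can have a causal effect.
    """
    weights = (1 - s.share_u, s.share_u)
    cases = {1: 0.0, 0: 0.0}   # P(ADHD and exposure = e)
    people = {1: 0.0, 0: 0.0}  # P(exposure = e)
    for u in (0, 1):
        if exposure == "preg":
            p_e = s.p_preg[u]
            risk_if = {1: s.risk[u] * s.prenatal_effect, 0: s.risk[u]}
        else:
            p_e = s.p_post[u]
            # Postnatal use changes nothing; risk just averages over prenatal use.
            avg = s.risk[u] * (1 + (s.prenatal_effect - 1) * s.p_preg[u])
            risk_if = {1: avg, 0: avg}
        for e, share in ((1, p_e), (0, 1 - p_e)):
            people[e] += weights[u] * share
            cases[e] += weights[u] * share * risk_if[e]
    return (cases[1] / people[1]) / (cases[0] / people[0])


# Hypothetical numbers, chosen only to illustrate the logic.
confounded = Scenario(0.3, (0.05, 0.12), (0.3, 0.7), (0.05, 0.2), 1.0)
causal = Scenario(0.3, (0.07, 0.07), (0.4, 0.4), (0.1, 0.1), 1.2)

for name, s in (("confounding only", confounded), ("causal only", causal)):
    print(f"{name}: in pregnancy RR = {risk_ratio(s, 'preg'):.2f}, "
          f"after pregnancy RR = {risk_ratio(s, 'post'):.2f}")
```

In this toy setup, the confounded scenario shows a risk ratio above 1 in both windows, while the causal scenario shows one only for use in pregnancy.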

Having said this, the paper also has all the problems I spend my life complaining about. This isn’t a randomized trial and the characteristics of the mothers who take acetaminophen are very different from those who do not. The controls are imperfect. For example: they measure education as “Low”, “Medium” or “High”. But, of course, there is more variation than that in education levels and it seems very possible that this residual variation is driving some of the effect.
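A small deterministic sketch shows how controlling for coarse education bands can leave an association even when there is no effect at all. The assumptions are hypothetical and labeled in the code: a 0 to 99 education score, acetaminophen use and ADHD risk that both fall with education, three equal-width bands standing in for "Low", "Medium" and "High", and zero causal effect of acetaminophen. The direction of the education gradient is chosen for illustration only; the post does not say which way it runs.

```python
# Hypothetical toy population: one person-weight at each education score 0..99.
SCORES = range(100)


def p_exposed(score: int) -> float:
    """Hypothetical: acetaminophen use in pregnancy falls with education."""
    return 0.8 - 0.6 * score / 99


def p_adhd(score: int) -> float:
    """Hypothetical: ADHD risk falls with education, identical for exposed and unexposed."""
    return 0.10 - 0.06 * score / 99


def risk_ratio(scores) -> float:
    """Expected risk ratio (exposed vs. unexposed) among people with these scores."""
    exposed_cases = sum(p_exposed(s) * p_adhd(s) for s in scores)
    exposed_n = sum(p_exposed(s) for s in scores)
    unexposed_cases = sum((1 - p_exposed(s)) * p_adhd(s) for s in scores)
    unexposed_n = sum(1 - p_exposed(s) for s in scores)
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)


# Hypothetical cut of the score into three equal-width bands.
bands = {"Low": range(0, 33), "Medium": range(33, 66), "High": range(66, 100)}

pooled = risk_ratio(SCORES)                                    # no control at all
band_rrs = {name: risk_ratio(b) for name, b in bands.items()}  # coarse control
exact = [risk_ratio([s]) for s in SCORES]                      # control for exact score

print(f"No control: RR = {pooled:.3f}")
for name, rr in band_rrs.items():
    print(f"{name:>6} band: RR = {rr:.3f}")
print("Exact score: all RR == 1?", all(abs(rr - 1) < 1e-12 for rr in exact))
```

The risk ratio stays above 1 within every band, though it is much smaller than the unadjusted one; only conditioning on the exact score drives it to 1.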

And some of the tests that are broadly convincing are less so when we look more closely. For example, in the largest dataset they use, the Danish National Birth Cohort (DNBC), almost 60% of mothers take acetaminophen during pregnancy, but only about 10% take it after pregnancy. This means the mothers who take acetaminophen during pregnancy and those who take it afterward are likely very different groups, which makes the placebo test harder to interpret.

Finally, there is the obvious issue of whether we can separate the impact of Tylenol from the impact of whatever people took the Tylenol for (for example, fever). With data like these, that concern is basically impossible to address.

That’s the analysis of the regressions. Turning to the bigger picture: the argument in the paper is that exposure to acetaminophen during pregnancy increases ADHD risk. If this is true, it follows that groups with more overall exposure should have higher ADHD rates, at least through this channel.

The paper makes use of six groups of mothers with varying rates of acetaminophen use. In the RHEA cohort, for example, only 14% of mothers are exposed during pregnancy, versus 56% in the DNBC. This variation lets us ask, basically, whether the aggregate data line up with what we would predict from the estimates in the paper.

Think about it this way: on average across groups (unweighted), 37% of mothers took acetaminophen and 7.6% of children were diagnosed with ADHD. The coefficients suggest that exposure increases risk by 20%. If we combine these facts, we can predict the expected ADHD diagnosis rate in each cohort from its acetaminophen exposure, and then compare this to the actual diagnosis rate.

These predicted-versus-actual numbers are shown in the graph below. They do not line up very well. For example: in the DNBC, this predicts 7.9% of children will have ADHD diagnoses, but the actual share in the data is 2.1%. On the other side, in the RHEA data we predict 7.3% but the actual share is 12.2%.
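The prediction described in the last two paragraphs can be reproduced in a few lines of Python. The inputs (the unweighted 37% and 7.6% averages, the 20% increase, and the DNBC and RHEA figures) come from the text. The model, in which a cohort's ADHD rate is an unexposed baseline times one plus 20% of its exposure share, is an assumption about how the prediction was built; it does reproduce the quoted 7.9% and 7.3%.

```python
AVG_EXPOSURE = 0.37       # unweighted mean share exposed across cohorts
AVG_ADHD = 0.076          # unweighted mean ADHD diagnosis share
RELATIVE_INCREASE = 0.20  # estimated relative increase in risk with exposure


def baseline_rate() -> float:
    """Unexposed ADHD rate implied by the cross-cohort averages.

    Assumed model: rate = baseline * (1 + RELATIVE_INCREASE * exposure_share).
    Because it is linear in exposure, averaging over cohorts gives
    AVG_ADHD = baseline * (1 + RELATIVE_INCREASE * AVG_EXPOSURE).
    """
    return AVG_ADHD / (1 + RELATIVE_INCREASE * AVG_EXPOSURE)


def predicted_rate(exposure_share: float) -> float:
    """Predicted ADHD diagnosis share for a cohort with this exposure share."""
    return baseline_rate() * (1 + RELATIVE_INCREASE * exposure_share)


cohorts = {
    # name: (share exposed in pregnancy, actual ADHD diagnosis share)
    "DNBC": (0.56, 0.021),
    "RHEA": (0.14, 0.122),
}

for name, (exposure, actual) in cohorts.items():
    print(f"{name}: predicted {predicted_rate(exposure):.1%}, actual {actual:.1%}")
```

Only the two cohorts whose figures appear in the text are included; the other four would need their own exposure and diagnosis shares.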

Another way to make this point, perhaps more simply, is just to note that the ADHD rates do not seem to closely relate to exposure rates. You can see this in the table below. The groups with the highest exposure actually have lower ADHD rates.

What to make of this? Does this mean that the regression results are wrong? No. But it does mean that if they are right, there must be significant offsetting factors. These data predict that the DNBC cohort would have an ADHD rate of about 8%, when the actual rate is 2.1%. To reconcile this, we must believe there is some other large factor (or several) that lowers the ADHD rate in the DNBC, raises it in the RHEA, and so on.
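One way to gauge how big those offsetting factors must be is to express them as the multiplier that turns the predicted rate into the observed one. This sketch recomputes the predicted rates under an assumed linear model (a cohort's rate equals an unexposed baseline times one plus 20% of its exposure share, calibrated to the 37% and 7.6% averages) so that it stands alone; only the DNBC and RHEA figures from the text are used.

```python
def predicted_rate(exposure_share: float,
                   avg_exposure: float = 0.37,
                   avg_adhd: float = 0.076,
                   relative_increase: float = 0.20) -> float:
    """Predicted ADHD rate under the assumed linear-in-exposure model."""
    baseline = avg_adhd / (1 + relative_increase * avg_exposure)
    return baseline * (1 + relative_increase * exposure_share)


# (share exposed in pregnancy, actual ADHD diagnosis share), from the text
observed = {"DNBC": (0.56, 0.021), "RHEA": (0.14, 0.122)}

# Multiplier the "other factors" must supply: actual / predicted.
multipliers = {name: actual / predicted_rate(exposure)
               for name, (exposure, actual) in observed.items()}

# How much the exposure channel alone separates the two cohorts.
exposure_spread = predicted_rate(0.56) / predicted_rate(0.14)

for name, m in multipliers.items():
    print(f"{name}: other factors must multiply the predicted rate by {m:.2f}")
print(f"Exposure channel, DNBC vs. RHEA: x{exposure_spread:.3f}")
print(f"Required other-factor gap, RHEA vs. DNBC: "
      f"x{multipliers['RHEA'] / multipliers['DNBC']:.1f}")
```

Under these assumptions, the exposure channel moves the predicted rate by only about 8% between the two cohorts, while the other factors would have to differ by a factor of roughly six.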

Any model of the world which takes this 20% increase as correct must also include a story for why the aggregate facts do not align. If we find the size of the necessary other factors implausible, this is a reason to question the results.

(Side note: this idea relates closely to work by my husband, Jesse, and his coauthors, in a paper entitled “Bounds on a Slope from Size Restrictions on Economic Shocks.” Jesse also came up with the title “Cribsheet,” so he is a title writer with a very wide range. Effectively, I’m suggesting a much, much less math-y version of this argument from the paper: “Large fluctuations in ε_t may be plausible if the good in question is a particular brand of scarf, preferences for which may change radically from year to year due to advertising campaigns, changes in fashion, etc. Large fluctuations in ε_t may be less plausible if the good in question is a standard agricultural commodity, preferences for which are likely more stable.”)

Where does this leave us? When I write about questions like this, there is always a temptation to be either totally dismissive or totally convinced. This question falls, for me, in a grey area. I’m generally very skeptical of observational data, and I find the disconnect with the aggregate facts problematic. But the paper is not riddled with errors like the caffeine and pregnancy study I dismissed a few months ago. This area gets enough attention that I hope (perhaps naïvely) someone will eventually bring a better research design to it. Until then, we are limited.

Toward the end of my second pregnancy I got a serious hamstring injury and couldn’t really walk. I took Tylenol, and I would do so again, trading what I see as a small possibility of a small risk against the benefits at the time. But this is a space — an uncomfortable one in some ways — in which reasonable people will make different choices.