Random Acts of Data

Today I spoke with Dr. Anupam Jena, known as Bapu, who hosts the podcast Freakonomics, MD, about his new book, Random Acts of Medicine, written jointly with Dr. Christopher Worsham. The book is out this week, and after you listen here, I highly recommend you pick it up. (And if you prefer to read rather than listen, the transcript of the interview is below.)

Emily: Bapu, thank you so much for joining me.

Bapu: Thank you for having me.

Emily: So, we’re here today to talk about your new book, Random Acts of Medicine. And I was thinking about how to introduce you. And the thing that occurred to me is that I think you are in some ways the fellow academic who I am most similar to in terms of how I think. We work in these similar spaces, between health and economics. And I think we both basically agree that in the end, you can learn everything from data, and that if you just look at it right and you find the right data, there’s some kind of secret answer there.

But there are some ways in which we’re different. And the main one is that you are an actual medical doctor, in addition to being trained as an economist. And my sense is that we are both pretty critical of the data approaches that are used in the medical literature but that you are critical from the inside. And I’m critical from the outside. And in a sense, I think the book is really a presentation of how data could be used better in those questions, rather than just being a criticism of how people use data.

So what I’m hoping we’ll do today is actually talk about some of the examples from the book, but then also sort of step back and talk a little bit about these issues of methods, and how we might improve the methods that are used to estimate causal relationships in medical data. So before we do that, can you just please introduce yourself?

Bapu: Well, first of all, I thought you were just going to introduce me as the Sanjay Gupta of health policy, but you gave me a longer introduction than that.

Emily: I don’t even think of you as a Sanjay Gupta. Is that how you introduce yourself?

Bapu: Well, no, it’s not. But I’m seeing myself in my video here talking to you. I’m like, wait, if I just got my eyebrows done a little bit.

Emily: Your eyebrows are more significant.

Bapu: Yeah, more significant. Well, my name is Bapu Jena. I’m an economist and a physician at Harvard Medical School, and I see patients at Massachusetts General Hospital.

Emily: So, this is a parenting podcast. Do you want to tell us, do you have children?

Bapu: I have two. I have a 5-year-old and an 8-year-old. And yeah, they’re a joy. And I know you talk to a lot of parents. The one thing I think differently about now than ever before is I remember when I was growing up, and we didn’t have cellphones back then, anytime I’d go to a friend’s house, my parents would make me call them from that friend’s house when I got there. And if I was seeing a potential girlfriend, I’d say, oh, I’ve got to make this important phone call, and just whisper to my parents. Now I know what they’re going through — if my daughter comes back five minutes later than she’s supposed to, I’m getting nervous. So it’s come full circle.

Emily: So now you’re going to make your kids call you. You’re going to track them on their phone, put a chip in their shoe. Alright, so let’s start with you giving a very broad overview of describing your work. When you look for problems to work on, what are you looking for? Because the book is really an overview of your research, I would say.

Bapu: That’s right, it’s an overview of my work. In recent years, I’ve had the pleasure of working with someone named Chris Worsham, who wrote the book with me. He’s a critical-care doctor at MGH. But the book is about how chance occurrences affect our health. And it could be anything, like living in a city that’s hosting a marathon and you not being able to get to the hospital in time because the roads are closed. It’s a totally random thing. It affects your health, and we show how it does, but it also teaches us something about how health care works. For my research in general, I use big data. I use the tools of economics. But I’m really interested in questions that are sort of like Freakonomics meets medicine.

Emily: So, I want to focus on one chapter of the book, which is the one about kids and flu, partly because we do parenting and partly because actually I think that was the one chapter of the book where I didn’t know about this research before. And the chapter starts, like many of the chapters, with effectively an observation from your own experience. And so when you say it’s random acts of medicine, sometimes it feels like it’s random acts of things that Bapu noticed and then wanted to understand. Can you talk about what is the observation that motivated that chapter?

Bapu: By the way, we pitched that title first — it didn’t work.

Emily: Yeah, no, it’s a little long.

Bapu: Yeah. So, the basic thing that happened to me is our 5-year-old has an August birthday. And so, like many young kids, we take him to the doctor around his birthday. So we go to his checkup, and I’m walking out of the office. And as we’re walking out, the nurse says to me, come back in a few weeks to get the flu vaccine for your son. And the first thing that occurred to me is, wow, had he been born just two weeks later or three weeks later, we would have gotten the flu shot in the office that day, and we didn’t.

So that’s the first part of the story. The second part of the story is, I’m a motivated parent. I wanted to get him the flu shot. I probably spent about three hours trying to get that thing done. The pediatrician’s office booked up pretty quickly, like within hours, I think, of when I tried to call them, or when they open up the slots. I was calling CVS, Walgreens. Finally I was able to get him into the pediatrician’s office. But it made me think that something as random as when he was born could have a dramatic impact on the likelihood of him getting a vaccine. And could that same observation generalize to all kids in this country?

Emily: And so here, the randomness is when your kid is born. And the question is, to what extent does that influence whether they get a flu vaccine and then presumably their ultimate health, the chance they have a serious flu?

Bapu: That’s right.

Emily: Okay, so you have this idea and you spend all this time, and now you’re annoyed because it’s taking you three hours to get your kid this vaccine. And so you decide you’re going to write a paper about it. So then, where do you go from there? There’s a bunch of ways you could approach this. What is the next step for you in terms of turning that observation into something that would be of interest to an academic audience?

Bapu: I think the next step for me, and probably the next step for people like us, is to think, well, what is the experiment? The experiment is that a child is born in August versus born in September. They’re otherwise pretty similar and we can see that they’re similar. But one group of children with August birthdays is perhaps less likely to get the flu shot.

And so what we did is we looked at very large data from insurance companies. Anytime you or your kid goes to the doctor and the doctor bills the insurance company for that visit, there’s a record of that. And that’s because they get paid for that visit. So we know, for example, when kids were getting flu shots, and the nice thing here — or maybe not the nice thing — is that these records, these vaccinations, they show up in the insurance data because a 3-year-old could not go to CVS or Walgreens and pay out of pocket. So there’s an insurance record of that. And then it was very simple: we just looked at kids’ birth months and saw that kids who are born in the summer are about 15 percentage points less likely to get a flu shot than kids who are born in, let’s say, September, October, November, December.

And we did a couple of other things just to show the mechanism. And one of the things that we showed is that kids get their flu shot in the pediatrician’s office often around the time of that annual visit. But the main finding was that if you have a September, October, November birthday, you’re much more likely to get a flu shot as a kid.
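The birth-month comparison Bapu describes comes down to a difference in vaccination rates, measured in percentage points. Here is a minimal sketch of that arithmetic. The records are hypothetical toy data chosen to reproduce a 15-point gap, not the insurance claims from the study, and `vaccination_rate_by_month` is an illustrative name.

```python
from collections import defaultdict

def vaccination_rate_by_month(records):
    """records: iterable of (birth_month, vaccinated) pairs -> {month: rate}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [vaccinated, total]
    for month, vaccinated in records:
        counts[month][0] += int(vaccinated)
        counts[month][1] += 1
    return {month: v / n for month, (v, n) in counts.items()}

# Hypothetical toy data (not from the study):
# 100 July-born kids, 40 vaccinated; 100 October-born kids, 55 vaccinated.
records = (
    [(7, True)] * 40 + [(7, False)] * 60
    + [(10, True)] * 55 + [(10, False)] * 45
)

rates = vaccination_rate_by_month(records)
gap_pp = (rates[10] - rates[7]) * 100  # difference in percentage points
print(f"October minus July: {gap_pp:.0f} percentage points")
```

A gap in percentage points is an absolute difference (55% versus 40%), which is not the same as saying one group is 15% less likely in relative terms.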

Emily: And then are there better outcomes for those kids?

Bapu: Yeah, so that’s the next question, is: if the flu shot works, then you would think that the kids would be less likely to get the flu. And that is true. We showed that they are less likely to get the flu. The next question may not be obvious to people: are the parents or the family members of those kids differentially affected? And the answer is yes. So if you have a child with a summer birthday, they’re less likely to get the flu shot. They’re more likely to get the flu. And then their family members are also more likely to get the flu.

Emily: So, what I think is important… I mean, the findings are important, but I think it’s worth stepping back to the method that you’re using here, because to answer this last question, many people would be interested in the question of: to what extent does a kid getting the flu shot affect their parents getting the flu? Or affect older relatives getting the flu? That’s a question of a lot of policy interest. We could ask it about COVID. We could ask it about the flu. A simple way to do that analysis is to compare the flu-getting, the flu status, of older relatives of kids who do and do not get the flu shot. Compare kids who have the flu shot and kids who don’t have the flu shot, and then look at their parents.

The problem with that is that, on average, the kids whose parents get them the flu shot are different from the kids whose parents do not. The choice to get your kid the flu shot is connected to your income, your resources, parental education, and a lot of other things that also affect whether the parents are going to get the flu, independent of their kids.

What is very clever about your analysis is that it avoids that problem by using these birthdays. You’re exploiting this randomness in when kids are born to, as we’d say technically, identify the impacts of the flu shot on kids and then on their parents. And that element of randomness is what gets you to causality. That is the thing that lets you say that this is a causal relationship and not just a correlation.
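To make the contrast concrete, here is a small simulation sketch. It assumes a made-up world where family resources raise vaccination and separately lower parents' flu risk, and where a fall birthday raises vaccination by 15 points but does nothing else. The ratio at the end is a Wald (instrumental-variable style) estimate, one standard way to formalize this idea and not necessarily the estimator used in the study. Every probability below is an assumption for illustration.

```python
import random

def simulate(n=400_000, true_effect=-0.05, seed=0):
    """Simulate households; returns (fall_birthday, vaccinated, parent_flu) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        resources = rng.random() < 0.5   # confounder the analyst cannot see
        fall_bday = rng.random() < 0.5   # as-good-as-random birth timing
        p_vax = 0.30 + 0.20 * resources + 0.15 * fall_bday
        vax = rng.random() < p_vax
        # Resources lower flu risk on their own; the shot adds true_effect.
        p_flu = 0.20 - 0.08 * resources + true_effect * vax
        flu = rng.random() < p_flu
        rows.append((fall_bday, vax, flu))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = simulate()

# Naive comparison: vaccinated vs unvaccinated (mixes in the resource gap).
naive = mean(f for _, v, f in rows if v) - mean(f for _, v, f in rows if not v)

# Birthday comparison: flu gap by birth timing, scaled by the vaccination gap.
d_flu = mean(f for b, _, f in rows if b) - mean(f for b, _, f in rows if not b)
d_vax = mean(v for b, v, _ in rows if b) - mean(v for b, v, _ in rows if not b)
wald = d_flu / d_vax

print(f"true effect: -0.050  naive: {naive:.3f}  birthday-based: {wald:.3f}")
```

In this setup the naive gap overstates the benefit of the shot because vaccinated families are also better resourced, while the birthday-based ratio should land near the true effect, with more noise.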

Bapu: That’s exactly right. And let me add two more things and then I’ll ask you a question. The first is that it’s a good way to show the causal effect of getting the flu shot. And then there’s a second finding, which is, if you’re interested in how health-care systems should be structured, you might be interested in knowing that something as arbitrary as your birth date has a pretty large effect on the likelihood that children get flu shots. Children then get influenza and their family members get influenza. So it speaks to how we might think about redesigning the process by which we vaccinate kids, not necessarily just tying it to the visit that they happen to be able to get to because it’s in the fall, versus not happening to be able to get to because it’s in the summer.

We’ve got some other work which is not in the book, it’s not yet published, but it uses the same approach to answer a different question. About a year and a half ago, I’m on a Zoom call with the rest of the faculty in my department at Harvard, and the chair of my department asked me what I’ve been up to, and I said, my arm’s a little bit sore. I just got back from CVS and got the vaccine. And she says to me — this is maybe the end of August — Bapu, why did you get the vaccine so early? Aren’t you worried about the immunity waning before the end of December, January, and February? I didn’t even think about that.

I thought to myself, suppose I wanted to figure out the optimal timing of the flu shot. Should I get it in September, October, November, December? I want to give my body the time to develop immunity against the influenza virus, because that takes time, but also to make sure that the immunity doesn’t go away because too much time has passed between when I was vaccinated and when I start getting exposed to flu.

And the question that you just posed a moment ago is really germane here. If you look at people who get vaccinated earlier in the season and look at people who get vaccinated later, that’s not random. So you can’t infer anything about the optimal timing of getting the flu shot. But what you can do is say, let’s look at this same group of kids we’ve been talking about but just focus in on the kids who have been vaccinated. And we can show that among those kids who had been vaccinated, if you have an August or September birthday, your vaccine tends to be given to you earlier than if you have like a November or December birthday. And in fact, the November, December kids, they have this sort of bimodal, or two-peak, distribution. There’s a bunch of them who choose to get it early, but there’s a bunch of them who just happen to get it because they showed up in their pediatrician’s office in November and December. And so what we find is that the birth month with the lowest rate of influenza is October. And so that’s kind of interesting. So my chair was actually right that I might have gotten it too early. The optimal timing might be October to get the flu shot if you have a kid.

Emily: That’s fascinating. The other thing that occurs to me you could do with this identification strategy is do more on the efficacy of the flu vaccine. So, there’s a fair amount of variation in efficacy across years, right? Sometimes the flu vaccine is 10% effective, sometimes it’s 30%, 40%. Usually it’s higher. You could use that, right? Linking shots to flu. You could back out efficacy based on the effect sizes over time, yes?

Bapu: Absolutely. One way to do that would be, every flu season you do a little randomized or a large randomized trial at the very beginning of flu season to see if the thing works or not. Or you could just use existing data. And in the third week of September, run a quick analysis to look at whether or not August-born kids versus September-born kids have differential rates of the flu shot, which we think we would see. And importantly, whether or not they have correspondingly different rates of influenza. If they don’t, that means that the flu shot is not doing much that year.
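The season-by-season check Bapu outlines could be sketched as a simple ratio: the gap in flu rates between September-born and August-born kids, divided by the gap in their vaccination rates. The function name `implied_effect` and all the rates below are hypothetical, and a real analysis would also need confidence intervals.

```python
def implied_effect(vax_aug, vax_sep, flu_aug, flu_sep):
    """Change in flu risk per additional vaccinated child (a Wald-style ratio)."""
    d_vax = vax_sep - vax_aug
    if d_vax == 0:
        raise ValueError("no vaccination gap between birth months to exploit")
    return (flu_sep - flu_aug) / d_vax

# Hypothetical rates: (vax_aug, vax_sep, flu_aug, flu_sep) for two made-up seasons.
seasons = {
    "season A": (0.40, 0.55, 0.060, 0.051),  # flu gap tracks the vaccination gap
    "season B": (0.40, 0.55, 0.060, 0.060),  # same vaccination gap, no flu gap
}

effects = {name: implied_effect(*rates) for name, rates in seasons.items()}
for name, effect in effects.items():
    print(f"{name}: implied change in flu risk per vaccinated child = {effect:+.3f}")
```

In this toy setup, season B is the case Bapu flags: the vaccination gap is there but the flu gap is not, which would suggest the shot is not doing much that year.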

Emily: This chapter is pretty emblematic of the book. It’s something you observed in the world, in your own life, and then you’re thinking carefully about how to study it. I actually want to pivot and talk a little bit about these issues of causality and why the medical literature is often not good at this.

For example, the idea of breastfeeding and links with IQ. Just to be clear, this is not something that’s in your book; it’s on my mind because there was a study published in the U.K. a couple of weeks before this conversation, and it got a lot of attention. And it was one of these many studies where they compare kids who are breastfed and kids who are not, and they look at their IQ scores. And there’s a very deep problem with this analysis, which is that the kids who are breastfed are totally different on a bunch of other dimensions than the kids who are not. Their parents are more educated, their moms have higher cognitive scores, and the authors aren’t really able to adjust for all of those. And so we would sort of say, this is correlation and not causation. There is a lot of work like that in the medical literature. You are pretty explicitly trying not to be that. Would you say that’s correct?

Bapu: Yeah, I’m the anti-that.

Emily: I don’t want to spend time dumping on this study, which was just the same as all the other studies like this, but it was so emblematic and I just want to understand what is your sense of, why? Why is there not more Bapu-style research?

Bapu: By the way, just to be clear, are you prepared to state that peanuts do not cause dementia? Just to be clear.

Emily: I’m prepared to state that chia seeds do not prevent it and peanuts don’t cause it.

Bapu: I mean, it’s a great question. And probably I joke about peanuts and dementia, but the area we see this problem probably the most is in nutritional epidemiology. And I think it’s probably two things. One is that there may be a generalized lack of awareness of how serious these problems are and how people actually look at these sorts of evidence and try to make decisions on that. And it’s not just a matter of publishing research. People actually listen to this stuff. So you’ve got to be cautious.

The second thing is that there’s an incentive problem. What do I mean by that? Well, think about where this stuff is coming from. This is research, right? These are researchers who are publishing research that is being published in research journals. And my view is that if the journals celebrate this type of work, if they publish relationships between exercise and mortality, or coffee consumption and cancer, whatever it may be, if they reward authors by publishing it, why wouldn’t we expect to see this? I think that researchers are probably responding to the incentives that they have available to them. So, take the converse in economics. In economics, do we not see this sort of poorly-thought-out research because economists are better trained? Yeah, they probably are better trained, but it’s also the fact that you will never be able to get these kinds of studies into economics journals. It’s just not going to fly. So there’s no incentive to try to do that kind of work.

Emily: I mean, we have our own problems. But it’s interesting, because I have the same instinct. I posed this question to my class once, and we basically got there and then kept going down the rabbit hole. What’s the incentive for the journal? Why is the journal publishing this? Because it’s going to be covered in the media. The journal’s incentive is attention, right? And the media loves “peanuts cause dementia”; “peanuts cause dementia” is a great headline. People like to read that. And so that is the incentive for the media. Then that is the incentive for the journal. It’s actually not that interesting to publish a paper that says there’s no link between any known foods and dementia, which is more or less true. It’s not getting you on the BBC or in the New York Times.

Then my students were like, well, why are the newspapers publishing this? Ultimately it’s our fault for clicking. That was where we got to. The students were like, “Actually, I think it’s my fault because I read that article and I’m incentivizing them.”

Bapu: Yeah, but I would say it’s not our fault in the following sense: we click that because we want to know the answer to a question that matters to us, but we’re tricked because we don’t know that what we’re clicking doesn’t answer the question that matters to us. It just purports to do that. And I think that’s where the gatekeepers need to be involved. And to be clear, I don’t actually think that the New York Times is at fault here. I don’t expect a really well-trained journalist to be able to parse out these issues. This is something that the journals should be the gatekeeper for. But for whatever reason, they’re not.

Emily: I think that’s right. And I think there’s a sense in which maybe some of the problems are really in the training and in people’s understanding. And I said that I thought it was obvious that some of these relationships are just correlation and not causality. I do think there’s a perspective that you and I share that is perhaps not universally shared, that these problems associated with comparing two groups with some adjustments, I think we have a sense that it is more or less impossible to learn about the impacts of diet or many other things from that type of analysis, but not everyone would agree with that.

I’ve thought a lot about: how do we get at these questions of the impact of diet using some of the tools that we like, these natural experiments? And it’s hard. The limitation with the approach in all of the chapters, really, in the book is that the set of questions that you’re able to answer here is limited. And if you started with, you know, I want to understand the impact of X on Y or, you know, the impact of this kind of diet on outcomes, that’s a really hard question to get at with the kinds of experiments, the kinds of sort of naturally occurring experiments, that you have here. So in a lot of places you’re taking advantage of something where nature has randomized it for you. But there are many things that nature doesn’t randomize for us. And then there’s a temptation to say, well, the only way we can learn about this is something that smacks more of correlation.

Bapu: Yeah, that’s right. I was thinking the other day, I almost packed a peanut butter sandwich for one of our kids going to school, and I realized I wasn’t allowed to do that, but it made me think, wow. There was a point in time in which that would have been okay, but there might be some classes where there is a child with a peanut allergy just by chance, right? And my child, who does not have a peanut allergy, would not have selected into his classmate’s class based on whether or not his classmate had a peanut allergy. It’d be totally random. So might you use that sort of variation to figure out what is the causal effect of differential amounts of peanut exposure? But the reason that doesn’t work is that that’s just one iota of a place where my son would be exposed to peanuts. So yeah, it’s random, but it’s not going to matter.

Emily: So, I thought you were going to say something different about peanut exposure, which for me is the strongest example of where we can use correlational data to motivate us. It used to be that people were told: don’t give your kids peanuts before they’re 2 because they will develop an allergy, like they’ll be more likely to become allergic. There’s a guy named Gideon Lack in the U.K., and he did this very classic correlational experiment comparing kids in the U.K. and kids in Israel and showing kids in Israel are less likely to have peanut allergies. And he then concluded that that was because they were eating this peanut snack when they were like four months old. And our response was, are you kidding me? You compared kids in the U.K. and Israel and then you decided it was because of this peanut snack? It’s like, it just seemed like a ridiculous method. But then he did a randomized trial, and it turns out that exposing your kids to peanuts in their first four to six months actually lowers the risk of developing an allergy by 70%. So it turns out that he was totally right. But it’s an example, if you say like, what’s the observational data good for? It’s good for hypothesis generation.

And you see that, but it’s not enough. You wouldn’t want to change the way you were doing, you know, peanut exposure for kids based on his observation about the Israeli kids. But maybe you do want to change how you do your research. And once, of course, you have a randomized trial like that, then there’s a lot of reason to do it. So anyway, that’s where I thought you were going with peanuts.

Bapu: So, another example. I like to play soccer during lunchtime. That’s one of the benefits of this academic lifestyle. And there’s a phrase that always gets repeated to me anytime I happen to score or do something right. There’s a guy who just says, Bapu, a broken clock is right two times a day. And I was like, okay, great, thanks, dude. I appreciate that. You know, you can stumble upon it by accident.

Emily: How often would you say you score? Are you a good soccer player? No?

Bapu: No, I’m not very good. No, but I try hard. That’s all that matters.

Emily: We’re about the same age, maybe? And I feel like over time, if you can just keep playing, like eventually other people’s knees are going to go. So pretty much if you keep going, eventually you’ll be good. That’s how I’m working with running. I’m going to keep going. And people are just aging, I just got to keep going.

Bapu: But yeah, this is like a great question. Like, for example, if you want to think about the effects of exercise on health, what type of exercise is optimal? You can do a randomized trial where you encourage people to run long distances or you could do sort of high-intensity workouts, but are there examples of natural experiments that get people to do certain types of exercises versus others in a way that could tell us something about the effects of that type of exercise on health? And it’s really hard to find those sorts of examples.

Emily: I agree. I mean, exercise is an interesting one, where we do have some randomized trials but it’s mostly on things like: Do you exercise at all? Do you walk after lunch? Do you do any kind of aerobic exercise? But then when you get into these more nuanced things like, is it better to play soccer or is it better to run, is it better to swim, should you do CrossFit? Then it’s harder to imagine the experiment.

Bapu: Yeah. The other thing that I think we don’t really appreciate much, this is sort of a methodological point, but I think it’s important for trying to understand what causes what. You know, imagine that you randomized people to exercise versus not. And so you’d have that randomization. That would be great. But what happens if the people who exercise start doing other things because they are exercising that counteract the benefits of exercise? So, for example, when I exercise a lot, I will often go and eat out for lunch because I feel like I earned it. And if you see that sort of phenomenon happening even after randomization, you don’t actually know what the causal effect of the exercise is, because the behaviors are modified as a result of that treatment. So I think that’s a thorny problem to solve.

Emily: That comes up even when we look at something like statins, right? These cholesterol-lowering medications. If you’re taking the cholesterol-lowering medication and, as a result, thinking, well, I can just eat a bunch more cholesterol because I have this backstop, it changes the outcome. That’s an interesting methodological thing, because in the randomized experiments where they evaluate the effects of statins, you don’t know whether you’re taking it. So you don’t see as much of that. Then when it enters the real world, you may expect the effects to be smaller because people are compensating and because now they know that they’re taking the drug.

Bapu: Exactly.

Emily: All right, what is your favorite chapter in the book?

Bapu: My favorite chapter. These are all my babies. I love them all. I particularly like the last chapter, which deals with issues around COVID-19. I’ll pinpoint this particular chapter because it does relate to something else in our conversation, which is when COVID-19 started, there was this huge influx, as you know, of research, some high-quality, much of it very low-quality. And I made an intentional decision early on not to work on any topics related to COVID-19 unless I thought I could do something different. And so as a result, I think I maybe had one COVID-19 paper, which is in the book, but it speaks to the incentives. I mean, it’s true that probably people were interested in solving problems and expanding knowledge, but at the same time, there’s so many bad papers out there by people who are smart and presumably have some methodological training. And you wonder why they even thought to do that in the way they did. But the one paper I had with Chris Whaley and a couple other people is we were interested in this question of… Well, actually, let me tell you what the paper was, and then I’ll backtrack to what the motivation was, which came after I had the idea.

So basically I was like, all right, I’m looking at this insurance data one day and I’m, you know, recognizing that we have information on when people have birthdays. And it was around the time of our daughter’s birthday and we had to make a decision whether or not to host it via Zoom or in person. And we made a decision to do it via Zoom with this magician who was quite phenomenal. And so I was looking at the data one day and I was like, oh, wait, this actually relates to this decision that my wife and I had to make. I bet that people are more likely to get COVID-19 after their birthday, because they might celebrate with other people and be exposed to a larger group than they otherwise would be, even if they are being cautious. And might this be an interesting question to study? At that point, I didn’t sort of have a motivation on how to sort of sell or motivate the paper. So we looked at the data and you see very, very clearly that if you look within the same city, within the same, let’s say, week of the year, households in which someone has a birthday, those households are nearly identical to households in which no one has a birthday that week in that same city. But the first set of households are about 20% more likely to get a new COVID-19 diagnosis in the next two to three weeks. And that to me was like, oh, that’s an interesting finding. And then I was tackling the problem with my colleagues, well, how do we “sell it”? How do we motivate what the question is? And then the motivation we settled on was, well, look, early in the pandemic, there’s all this discussion about superspreader events being a common source for spreading the virus. And we didn’t really have an appreciation of whether or not small gatherings with people that you know and trust, versus going to a random place like a bar, whether those sorts of gatherings could have an impact.

And the first thing we found is, yes, they do have an impact. And then the second thing that was interesting to me, and this is maybe a broader reflection of some of the issues that you’ve been thinking about and writing about a lot in the pandemic, was we found that this birthday effect was basically the same in really Republican areas and really Democratic areas. And you might think that if Democrats were much more conservative about social distancing, that kind of stuff, that there would be no “birthday effect” there. But it was the exact same. And it made me reflect on this thing that we always think about in economics, which is that you’ve got to look at people’s behaviors. You can’t look at what they say they do. You’ve got to look at what they do. And here’s what they did: they get together. And by the way, the effect is larger when it’s a kid’s birthday than an adult birthday, for obvious reasons.
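The within-city, within-week comparison Bapu describes can be sketched as follows: inside each (city, week) cell, compare the diagnosis rate of households with a birthday to that of households without one, then pool the cells. The data are hypothetical toy numbers built to give a ratio of about 1.2, and `birthday_risk_ratio` and the weighting by birthday-household count are illustrative assumptions, not the paper's actual specification.

```python
from collections import defaultdict

def birthday_risk_ratio(households):
    """households: (city, week, has_birthday, diagnosed) tuples.

    Compares birthday vs non-birthday households inside each (city, week)
    cell, weighting each cell by its number of birthday households.
    """
    cells = defaultdict(lambda: {True: [0, 0], False: [0, 0]})  # [cases, total]
    for city, week, has_birthday, diagnosed in households:
        tally = cells[(city, week)][has_birthday]
        tally[0] += int(diagnosed)
        tally[1] += 1

    observed = expected = 0.0
    for groups in cells.values():
        (b_cases, b_n), (o_cases, o_n) = groups[True], groups[False]
        if b_n == 0 or o_n == 0:
            continue  # no within-cell comparison possible
        observed += b_n * (b_cases / b_n)  # birthday households' rate
        expected += b_n * (o_cases / o_n)  # rate if they looked like neighbors
    return observed / expected

def cell(city, week, has_birthday, n, cases):
    """Build n toy households in one cell, the first `cases` of them diagnosed."""
    return [(city, week, has_birthday, i < cases) for i in range(n)]

# Hypothetical toy data (not from the paper).
households = (
    cell("Boston", 10, True, 100, 12) + cell("Boston", 10, False, 1000, 100)
    + cell("Austin", 10, True, 50, 3) + cell("Austin", 10, False, 1000, 50)
)

ratio = birthday_risk_ratio(households)
print(f"birthday households: {ratio:.2f}x the diagnosis rate of comparable households")
```

Comparing only within a city and week is what keeps local outbreak timing from contaminating the estimate, since both groups face the same background prevalence.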

Emily: Because adults don’t have as good birthday parties. That’s the message.

Bapu: Exactly.

Emily: Bapu, thank you so much. It was a treat to talk. The book is fantastic, and everyone should pick it up and read it.

Bapu: I appreciate it. Thank you, it’s been fun.