Public Data and Public Health - Using Open Data to Promote Health and Safety

Every two years, George Mason University hosts the D.C. Health Communication Conference (DCHC), which brings together people across the academic, government, and private sectors to discuss issues and opportunities related to health communication. On April 16-18, researchers and practitioners converged on Fairfax, VA, for this year's DCHC.

The panel and poster presentations covered a wide array of topics, including intercultural communication competence, health advocacy, computer-mediated communication and health, and the application of big data to health communication research. Fors Marsh Group was among the presenters at DCHC this year, and we discussed and demonstrated how publicly available data from national surveys can be used to identify tobacco prevention messaging strategies—a topic which fits more generally with the use of open data in addressing public health issues.

The Utility of Publicly Available Data

Large-scale datasets that are available to the public can be found all over the Internet. These data cover a wide range of topics: international trade, air quality, smoking behavior, weather, motor vehicle registrations, crime victimization. If you can think of it, there's a good chance you can find it. When used effectively, these datasets can go a long way toward diagnosing and addressing public health and safety concerns. Some cities are already putting open data to work in this way.

Let's consider arrest data from the Bureau of Justice Statistics. As we can see in the graph, between 2008 and 2012, arrest rates for driving under the influence offenses (DUIs) appear to be highest among people between the ages of 21 and 25. This type of information suggests that DUI prevention messages may be best suited for people in their early 20s (perhaps on college campuses).
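To make the comparison across age groups concrete, here is a minimal sketch of how arrest counts could be turned into rates per 100,000 people and ranked. The age brackets, counts, and population sizes below are invented placeholders for illustration only; they are not Bureau of Justice Statistics figures, and the function name is hypothetical.

```python
# Hypothetical placeholder values for illustration only.
# These are NOT real Bureau of Justice Statistics figures.
arrests_by_age = {"18-20": 300, "21-25": 900, "26-30": 700, "31-35": 400}
population_by_age = {
    "18-20": 100_000,
    "21-25": 150_000,
    "26-30": 140_000,
    "31-35": 130_000,
}


def arrest_rates_per_100k(arrests, population):
    """Return arrests per 100,000 residents for each age group."""
    return {
        group: arrests[group] * 100_000 / population[group]
        for group in arrests
    }


rates = arrest_rates_per_100k(arrests_by_age, population_by_age)

# The age group with the highest rate is a candidate audience for messaging.
peak_group = max(rates, key=rates.get)
print(peak_group, rates[peak_group])
```

Working with rates rather than raw counts matters here because age groups differ in size; a larger group can produce more arrests without being at higher risk.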

Multiple public-use datasets can also work together. After deciding on how best to combine the data, two or more datasets can be used to answer complex questions. For instance, let's say I were curious about whether DUI arrests in my hometown are often the result of police calls for service, meaning someone phoned the police about a potential crime. I could go to the city's website that houses open datasets (many cities have such websites) and use the datasets on police calls for service and police cases to map out where calls for service and arrests occurred over the course of a year. If I found that very few DUI arrests were linked to police calls in recent years, that could point to a messaging opportunity for targeting bystanders and recruiting them to serve as sources of enforcement.
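As a rough sketch of what combining the two datasets might look like, the example below links arrest records to calls for service and computes the share of DUI arrests that began with a call. It assumes both datasets share an incident identifier; the field names (`incident_id`, `offense`) and the sample records are hypothetical, and a real city portal may require a different linking key, such as time and location.

```python
def share_of_arrests_from_calls(arrests, calls, offense="DUI"):
    """Return the fraction of arrests for `offense` that match a call for service.

    Assumes (hypothetically) that both datasets carry an `incident_id` field.
    Returns None when there are no arrests for the given offense.
    """
    call_ids = {call["incident_id"] for call in calls}
    offense_arrests = [a for a in arrests if a["offense"] == offense]
    if not offense_arrests:
        return None
    linked = sum(1 for a in offense_arrests if a["incident_id"] in call_ids)
    return linked / len(offense_arrests)


# Invented sample records for illustration only.
calls = [{"incident_id": 101}, {"incident_id": 102}, {"incident_id": 103}]
arrests = [
    {"incident_id": 101, "offense": "DUI"},
    {"incident_id": 201, "offense": "DUI"},
    {"incident_id": 202, "offense": "DUI"},
    {"incident_id": 203, "offense": "DUI"},
    {"incident_id": 102, "offense": "Theft"},
]

dui_share = share_of_arrests_from_calls(arrests, calls)
print(f"Share of DUI arrests linked to a call for service: {dui_share:.0%}")
```

A low share in real data would be the pattern described above: few DUI arrests starting with a bystander's call.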

Open Data and Tobacco Youth Prevention

At this year's DCHC, we gave a presentation that demonstrated how open data can be used to identify tobacco prevention messaging strategies aimed at youth aged 12 to 17. Drawing on the 2013 National Survey on Drug Use and Health (NSDUH) public dataset, we conducted an analysis that helped to detect which beliefs are related to smoking intentions.

We took a few simple steps in conducting our analysis. First, we reviewed publicly available datasets to assess which one was most relevant to our research questions. Second, we identified suitable proxy items for the behavioral intention of interest ("During the next 12 months, do you think you will smoke a cigarette?") and relevant beliefs ("How many students in your grade would you say smoke cigarettes?"). Third, we isolated certain subgroups that we hypothesized might display different results (for instance, self-identified risk-takers). Finally, we conducted the analyses. Ultimately, we found that the most promising beliefs to target among youth involved close friends and personal attitudes. For instance, close friends' disapproval of smoking cigarettes stood out as one of the more promising beliefs around which to structure a messaging strategy.
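The general shape of these steps can be sketched in a few lines. This is not the analysis we ran; it is a simplified stand-in that compares the share of respondents who intend to smoke across levels of one belief, overall and within a subgroup. The field names (`intends_to_smoke`, `friends_disapprove`, `risk_taker`) and the eight sample respondents are hypothetical and are not NSDUH variable names or data.

```python
def intention_rate(respondents, belief, value, subgroup=None):
    """Share of respondents intending to smoke among those holding a belief.

    `belief` and `subgroup` are hypothetical field names on each record.
    When `subgroup` is given, only respondents flagged True for it are used.
    Returns None if no respondents match.
    """
    rows = [
        r for r in respondents
        if r[belief] == value and (subgroup is None or r[subgroup])
    ]
    if not rows:
        return None
    return sum(r["intends_to_smoke"] for r in rows) / len(rows)


# Invented respondents for illustration only; not survey data.
respondents = [
    {"intends_to_smoke": False, "friends_disapprove": True, "risk_taker": False},
    {"intends_to_smoke": False, "friends_disapprove": True, "risk_taker": False},
    {"intends_to_smoke": False, "friends_disapprove": True, "risk_taker": True},
    {"intends_to_smoke": True, "friends_disapprove": True, "risk_taker": True},
    {"intends_to_smoke": True, "friends_disapprove": False, "risk_taker": True},
    {"intends_to_smoke": True, "friends_disapprove": False, "risk_taker": True},
    {"intends_to_smoke": True, "friends_disapprove": False, "risk_taker": False},
    {"intends_to_smoke": False, "friends_disapprove": False, "risk_taker": False},
]

# Compare intention rates by belief, overall and among risk-takers.
overall_disapprove = intention_rate(respondents, "friends_disapprove", True)
overall_no_disapprove = intention_rate(respondents, "friends_disapprove", False)
risk_disapprove = intention_rate(
    respondents, "friends_disapprove", True, subgroup="risk_taker"
)
print(overall_disapprove, overall_no_disapprove, risk_disapprove)
```

A large gap in intention between belief levels marks that belief as a candidate for messaging. An analysis of real survey data would also need to account for survey weights and sample sizes.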

Looking to the Future

The examples that we presented at DCHC and have discussed in this post are just a few of the many potential avenues for leveraging open data to promote public health and safety. This type of data is especially useful for researchers and practitioners who are unable to collect their own data or who are looking to address more complex problems that require combining more than one large-scale dataset. As more and more types of data become available to the public, the application of open data to local, national, and worldwide problems will become increasingly possible, too.

