TCORS: Media Data Acquisition and Content Analysis Core


Public and interpersonal communications have been transformed by the rapid diffusion of social media. In 2011, 78% of all American adults and 95% of young adults (18-29) reported using the Internet, and in 2012, 82% of adult Internet users reported using it daily, according to Pew Internet. Use of social media is burgeoning as well: in August 2011, 65% of online adults reported using social networking sites (SNS) such as Facebook, MySpace, or LinkedIn, more than double the percentage reporting SNS use in 2008. In May 2012, 15% of all adults and 26% of young adults (18-29) used Twitter, the microblogging platform. Although Twitter's diffusion is lower than that of other social media, its use by low-income and non-white populations is significantly greater. The rapid transformation of the media landscape has important implications for public health in general, and for tobacco control and regulation in particular, because the Internet is a major source of health information: as of September 2010, 80% of all Internet users, or 59% of American adults, said they searched online for health-related topics. The share is likely comparable for people seeking information about tobacco-related topics, and probably higher still among young adults.

Beyond the challenge of measuring and understanding the messages individuals encounter across these media, the diffusion of digital and social media has important implications for research. Each facet of this new communications ecosystem generates a digital footprint. It is possible to observe and measure the massive amount and content of tobacco-related information and misinformation, and the metadata associated with each message also provide valuable information about patterns of diffusion and the social impact of such messages.

The Health Media Collaboratory at IHRP serves as the Media Data Acquisition and Content Analysis Core of the Tobacco Center of Regulatory Science based at the University of Pennsylvania. The Collaboratory will leverage existing resources to develop and enhance tools to harness, manage, reduce, and understand these “big data,” along with traditional media data, in the context of the TCORS mission. This core serves as a fully integrated resource for each of the Penn TCORS projects, its Developmental Pilots Core, its Training Core, and its Tobacco FactCheck Core. The core aims to do the following:

  1. Acquire and archive traditional and social media data from public and syndicated sources.
  2. Develop tools and procedures to reduce, clean and manage these data.
  3. Develop and validate measures of precision, recall, reach, and impact for each media platform and for each research question or application across the TCORS.
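
As an illustration of the precision and recall measures named in the third aim, the sketch below scores a keyword filter against human-coded messages. It is a minimal example under stated assumptions, not the core's actual tooling: the `LabeledMessage` class, the `precision_recall` function, and the sample messages are all hypothetical. Precision is the share of captured messages that are truly tobacco-related; recall is the share of truly tobacco-related messages that the filter captured.

```python
from dataclasses import dataclass


@dataclass
class LabeledMessage:
    """One message in a hand-coded validation sample (hypothetical structure)."""
    text: str
    relevant: bool   # human coder judged the message tobacco-related
    retrieved: bool  # the keyword filter captured the message


def precision_recall(messages):
    """Return (precision, recall) for a filter scored against human labels."""
    true_pos = sum(1 for m in messages if m.retrieved and m.relevant)
    false_pos = sum(1 for m in messages if m.retrieved and not m.relevant)
    false_neg = sum(1 for m in messages if not m.retrieved and m.relevant)

    # Guard against empty denominators (no retrieved or no relevant messages).
    retrieved_total = true_pos + false_pos
    relevant_total = true_pos + false_neg
    precision = true_pos / retrieved_total if retrieved_total else 0.0
    recall = true_pos / relevant_total if relevant_total else 0.0
    return precision, recall


# Invented sample data, for illustration only.
sample = [
    LabeledMessage("trying to quit smoking this week", True, True),
    LabeledMessage("smoking hot deals on laptops", False, True),
    LabeledMessage("new e-cig flavors at the shop", True, False),
    LabeledMessage("vaping lounge opens downtown", True, True),
    LabeledMessage("great game last night", False, False),
]

precision, recall = precision_recall(sample)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented sample the filter captures one irrelevant message and misses one relevant message, so both measures fall below 1. In practice the human-coded sample would need to be drawn from both captured and uncaptured messages so that recall can be estimated.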

Funding Agency

National Institute on Drug Abuse (to University of Pennsylvania, Grant No. P50CA179546), part of the Tobacco Centers of Regulatory Science (TCORS), funded by the Food and Drug Administration and the National Institutes of Health

About this grant

This funding is subcontracted from the University of Pennsylvania to the University of Illinois at Chicago.

Parent Study
Tobacco Product Messaging in a Complex Communication Environment
PI of Parent Study
Robert C. Hornik and Caryn Lerman
University of Pennsylvania, Philadelphia