Unlocking the Potential of Secondary Data Analysis from Existing Databases: A Relevant Approach for Health Psychology Research


Jessica Emick, PhD
Pediatric Psychologist and Clinical Psychology Faculty
School of Psychology
Fielding Graduate University

Madeline Foster, MSPH
Clinical Psychology Ph.D. Student
School of Psychology
Fielding Graduate University

Christine Crowell, MA, NCC, LPC
Clinical Psychology Ph.D. Student
School of Psychology
Fielding Graduate University

Sonia Agarwal, M.Sc.
Clinical Psychology Ph.D. Student
School of Psychology
Fielding Graduate University

Nathan M. Griffith, Ph.D.
Clinical Psychology Faculty
School of Psychology
Fielding Graduate University

What if you could have all the data for your next research project in hand after spending minimal time and money? What if, on top of that, the data were representative of diverse populations, generalizable, and inclusive of low-prevalence health conditions for which health psychologists often find data collection challenging (Langkamp et al., 2021)? This is the reality of working with secondary data from existing databases. In our pediatric health psychology lab at Fielding Graduate University, we use archival data from the National Survey of Children’s Health, which provides a large, nationally representative sample for answering pediatric health research questions.

We have experienced the numerous benefits and unique challenges of using secondary data from an existing database, and we encourage other health psychologists to consider this approach. It can be difficult to know where to start when working with archival databases, and many health psychologists have little research experience with them (Wiley et al., 2013). In this article, we share some of the benefits we have experienced, as well as personal reflections from both faculty and graduate students working on secondary data analysis projects.

Benefits of Secondary Data Analysis

  • Decreased time and cost: Collecting primary data can be a resource-intensive task. Because the data in an existing database have already been collected, researchers can avoid most of the time and expense of recruitment and data collection.
  • Large sample sizes: Large existing databases provide researchers with sample sizes much larger than they could generate via primary data collection, which can increase statistical power and help identify smaller effects. Specifically for health psychologists, databases can also make it possible to have sufficient numbers of participants with less common health conditions.
  • Better representation for diverse participants and greater ecological validity: Since databases are commonly based on real-world populations, they are often reflective of the true diversity within a setting. This can increase the generalizability of findings, help us understand the diverse health needs of all populations to address health disparities, and help uphold ethical principles.
  • Access to longitudinal data: Some databases have health psychology data that spans years and even decades. Secondary analysis of such data allows investigation into long-term effects and an overview of trends.
  • Replicability and advancement of research: Secondary data analysis facilitates replicability and can inspire future research that elaborates on findings.

Reflections from Our Lab

Faculty Perspective

As a faculty member, I have found that secondary analysis of existing databases provides meaningful research opportunities that would not otherwise be possible, including unparalleled access to diverse populations and less common clinical conditions, but it is not without challenges. For example, the easy accessibility of data can sometimes distract from the importance of a thorough literature review and of developing research questions grounded in theory prior to analyzing data. It is also easy to underestimate the amount of time and effort necessary for initial projects, but after learning the ins and outs of a specific database, subsequent projects are much more efficient. In working with existing databases, I have found significant value in consulting with colleagues using the same database and connecting with the authors of the database.

Additionally, our research lab is supported by a research specialist who provides data analysis support. It is important that anyone providing support for data analysis has strong knowledge of the dataset. Health-related data in secondary databases also frequently includes dichotomous, categorical, and ordinal variables, and thus may necessitate analyses based on patterns of frequencies. This may require use of logistic regression and odds ratios, which students may not have learned or frequently used.
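For readers less familiar with odds ratios, the following is a minimal sketch in Python of how one can be computed from a simple two-by-two frequency table, along with an approximate 95% confidence interval on the log-odds scale. The function name and the cell counts are hypothetical and chosen only for illustration; they are not drawn from the National Survey of Children’s Health or any other database. The sketch also ignores survey weights and covariates, which a real analysis would usually need to address.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate (Wald) confidence interval for a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: standard normal critical value (1.96 gives roughly a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Odds of the outcome among the exposed divided by odds among the unexposed.
    odds_ratio = (a * d) / (b * c)

    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 exposed children and 20 of 100 unexposed
# children have the outcome of interest.
estimate, ci_low, ci_high = odds_ratio_with_ci(40, 60, 20, 80)
print(f"Odds ratio: {estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An odds ratio above 1 indicates higher odds of the outcome in the exposed group. In a logistic regression with a single dichotomous predictor and no weights, exponentiating the predictor's coefficient gives this same odds ratio, which is why the two ideas are usually taught together.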

Looking forward, as I train clinical psychology students on analyzing existing databases, I am hopeful that this skill set and access to existing databases will allow students to continue meaningful research throughout their clinical careers.

Graduate Student Perspective

In my experience as a graduate student, the apprenticeship model (through which new team members join projects led by those more experienced) provides opportunities to become familiar with the database through ongoing team research. While existing databases can seem large and complex, fluency with a specific database and its coding opens up a wide range of opportunities for asking varied research questions within large and diverse samples.

It is essential to take a grounded, top-down approach and to review existing literature and theory frequently when using existing databases, so that research questions are justified and conceptualized within their broader fields of study. This is one of the most difficult aspects of using existing databases. While the amount of accessible data is tremendous, new data cannot be collected to supplement a given research query, so researchers must instead find ways to ‘work with what they have’. A final tip is to keep clinical implications front of mind at all times: ask yourself why the research matters and how the specifics of your analysis influence the meaning of the results and their implications for the sample in question.

Health Psychology Databases

An important step in secondary analysis of existing databases is selecting the most appropriate database (Smith et al., 2011). Secondary data can come in a variety of forms, e.g., clinical registries, archival records, continuous monitoring systems, big data via technological applications, population-based survey data, or nationally-based survey data. Secondary data collected by credible research entities are more likely to uphold standards of reliability and validity in data collection, which in turn supports impactful findings.

For researchers with a well-researched knowledge base in an area of interest, publicly available databases can provide a wealth of information. One place to start the search is an online compendium of databases. For example, the Directory of Health and Human Services Data Resources from the US Department of Health and Human Services holds information and links to a variety of datasets from entities like the Centers for Medicare & Medicaid Services (CMS), Centers for Disease Control and Prevention (CDC), Agency for Healthcare Research and Quality (AHRQ), the National Institutes of Health (NIH), the Health Resources and Services Administration (HRSA), the National Center for Health Statistics, the Substance Abuse and Mental Health Services Administration (SAMHSA), and other departments under the umbrella of the US Department of Health and Human Services (U.S. Department of Health and Human Services, 2023). Data from trusted governmental research entities also have the added benefit of using established measures and indicators within the area of study.

The American Psychological Association maintains a list of data repositories, including epidemiology studies and clinical trials from the National Heart, Lung, and Blood Institute (NHLBI) and surveys, data, and tools from the National Center for Education Statistics (NCES) (American Psychological Association, 2021). Other data suggested by the American Psychological Association include:

  • US Census data
  • Consumer Behavior data from the University of Michigan
  • Inter-University Consortium for Political and Social Research (ICPSR)

The full list can be found at https://www.apa.org/research/responsible/data-links

Within health psychology specifically, a vast number of resources are available. There is room for combining data from databases across healthcare, public health, policy, psychology, education, and other fields to yield innovative and interesting findings (Langkamp et al., 2021). Imagine utilizing datasets from the National Center for Biotechnology Information (NCBI) database, which provides access to biomedical and genomic data, in combination with data that capture psychological constructs, such as a diagnosis-specific database (e.g., the National Database for Autism Research (NDAR) or the Maternal and Child Health database).

There are possibilities for research across cultures or geographic areas with databases like the European Social Survey (ESS) of social and health-related experiences, the World Health Organization Global Health Observatory, or the Integrated Public Use Microdata Series (IPUMS), which houses various census and survey data from around the world.

Final Thoughts

When the benefits and limitations of secondary data analysis are weighed, the approach shows clear potential for valuable and innovative insights. Research with existing databases can fill gaps in the literature that may exist due in part to common barriers faced in research, such as the resource-intensive nature of primary data collection. We encourage those who wish to explore this methodology further to consider secondary data analysis as a powerful research tool.


References

American Psychological Association. (2021). Data sharing and other research resources. https://www.apa.org/research/responsible/data-links

Langkamp, D. L., Barnes, A. J., & Zuckerman, K. E. (2021). Secondary analysis of existing data sets for developmental behavioral pediatrics. Journal of Developmental & Behavioral Pediatrics, 42(4), 322–330.

Smith, A. K., Ayanian, J. Z., Covinsky, K. E., Landon, B. E., McCarthy, E. P., Wee, C. C., & Steinman, M. A. (2011). Conducting high-value secondary dataset analysis: An introductory guide and resources. Journal of General Internal Medicine, 26(8), 920–929. https://doi.org/10.1007/s11606-010-1621-5

U.S. Department of Health and Human Services. (n.d.). Directory of health and human services data resources.

Wiley, S., Schonfeld, D. J., Fredstrom, B., & Huffman, L. (2013). Research training of developmental-behavioral pediatrics fellows: A survey of fellowship directors by developmental-behavioral pediatrics research network. Journal of Developmental & Behavioral Pediatrics, 34(6), 406–413.