
Writing About Race in Biological Science


Author:  Anna Wheless 
Reviewers:  Laura Raffield, PhD • Nikea Pittman, PhD • Gail Henderson, PhD • Micah Hysong • John Patrick “JP” Flores 
Date of Publication: June 11, 2023

This is the product of a Justice Equity Diversity and Inclusion (JEDI) Leadership Fellows Project (link to JEDI webpage) developed by the author and reviewers. The reviewers (and the author) have volunteered their time for their role in producing this document and do not comprise an official committee. The author accepts full responsibility for the content of the article and acknowledges an imperfect understanding of this complex and dynamic topic. Please direct any communications regarding this article to awheless@unc.edu 


TL;DR

Many publications in biomedical and biomedically-adjacent basic research still stratify results by race as if race were an accurate proxy for shared heritage, which it is not. Recent NIH-funded projects (NASEM Report, All of Us Research Program) officially recommend that we do not use race as a proxy for genetic ancestry because race is socially, not genetically, determined. Race and ethnicity can affect a person’s biology through social determinants of health (SDOH) and these effects should be investigated, but race and ethnicity themselves are not biological categories. Our papers and presentations cannot allow our audience to walk away with the misinterpretation that races are biologically distinct groupings. See recommended readings at the bottom.


Introduction

Scientists have known that there is not a genetic basis for racial divisions since before we even knew that genes were encoded by DNA. Biological race, as Ashley Montagu put it in the 1940s, is a myth, but one that persists.  

In modern-day biological and biomedical science, and even in fields farther removed from the health sciences, you’re very likely to have interacted with or contributed to research that describes how some biological factor or outcome varies between races. The purpose of this text is to remind the reader that language is powerful: how we write influences how we think, and how we think influences how we write. Therefore, when we write about race, we have the power to influence how people think about race. Let us use that power not to perpetuate a myth, but to reinforce the truth: that race is a social category with biological consequences.

“The ambiguities and uncritical use of our language give rise to ambiguities of their own and constitute the compost upon which myths proliferate and are sustained.”

Ashley Montagu, Man’s Most Dangerous Myth: The Fallacy of Race, 1942


Background: Race Is Not Biological

Our job as biological scientists is to uncover causality. We ask what causes a disease, what causes it to appear in one demographic more than others, and what causes treatments to succeed or to fail. 

To answer these questions, we might collect data including age, sex, gender, occupation, nationality, race, family history, personal habits, current address, and various physical measurements. We want this information because all these factors can be linked to biological consequences, but not all of them are biological in origin.

For example, it is clear to us that a person’s occupation is not biologically determined, though it may influence their health (link to NIOSH website) through exposures, stress, or physical exertion. Similarly, race (link to NIH definition) can influence health outcomes, but it is not a biological category. There is no genetic test that can tell someone which ‘race’ they are, because races in human beings are socially constructed categories. The physical traits that we might associate with different races are, in truth, continuously variable between human populations, so the divisions that separate what we call the different ‘races’ are arbitrary designations. In the cases where race appears to be an influential variable in a biological outcome, racism and discrimination are likely playing a role through their effects on healthcare access, environment, and life experience. These factors are collectively known as social determinants of health (SDOH).

Sometimes, the illusion of biological racial differences masks a completely unrelated phenomenon and contributes to misinformation. Consider the allele for sickle-cell hemoglobin (HbS), which many of us learn is more prevalent in Black and African American populations. Because of the way that people have written and communicated about sickle-cell anemia, a misconception has spread that it only occurs in Black and African American people (Google’s “People also ask” search result section reveals people are searching phrases like “Why do only Black people have sickle cell”). As it turns out, due to the selective advantage it can provide, the HbS allele is found to some degree in places where malaria is endemic, including regions of Europe and Asia. The allele has nothing to do with race. So while it is true that HbS is more common overall in the U.S. in those who identify as Black and African American, this is an example of an assigned racial category coincidentally partially overlapping with a population of people with heritage from a region where malaria was prevalent.   

Because there are occurrences of this overlap, there is a history in research of using race as a proxy for shared genetic ancestry. The problem is that when we use race as a proxy, it hides the true biological or social reasons for the effects we observe, which does not help us discover causality. Furthermore, when we use race as a proxy in our communications and do not explicitly acknowledge that race is not biological, we effectively conflate race and ancestry and give the impression that we believe biology differs along racial lines. In truth, there can be greater genetic differences between members of the same race than between members of different races.  

What is a “proxy”? Merriam-Webster gives the definition “authority or power to act for another.” In other words, a proxy is supposed to be a stand-in, or a representative for something else in that thing’s absence. Published guidelines suggest that we should not be using race as a proxy for shared heritage.

  1. A collection of short videos on the topic of race in medical and biological science: https://www.raceandmedicine.com/biology
  2. McChesney, K. Y. (2015). Teaching Diversity: The Science You Need to Know to Explain Why Race Is Not Biological. SAGE Open, 5(4). https://doi.org/10.1177/2158244015611712
  3. HbS allele and malaria sources:
  • Piel, F. B., Patil, A. P., Howes, R. E., Nyangiri, O. A., Gething, P. W., Williams, T. N., Weatherall, D. J., & Hay, S. I. (2010). Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nature Communications, 1, 104. https://doi.org/10.1038/ncomms1104
  • Roberts, Dorothy. Fatal Invention: How Science, Politics, and Big Business Recreate Race in the Twenty-First Century. 2011. See page 113.
  4. Tishkoff, S., Kidd, K. Implications of biogeography of human populations for ‘race’ and medicine. Nat Genet 36 (Suppl 11), S21–S27 (2004). https://doi.org/10.1038/ng1438

How We Write About Race

Author’s note: My goal in providing the following deidentified examples is not to “call out” other researchers, but instead to illustrate how easy it is to convey the wrong idea due to a lack of training in the research environment on this topic. 

It is common and frequently well-intentioned in clinical, biomedical, and public health research to stratify results by race. It is even mandated by the NIH that researchers report the races of study participants in the interest of including underrepresented groups in research that historically focused on white cisgender men. However, comparing outcomes between racial groups in the context of biomedical, epidemiological, or biological studies without discussing that race is social invites the assumption that race is biological. Consider the following fabricated example that borrows language from a real published article cited on the NIH website:  

Within the study population, the mutant allele of gene ABC associated with disease XYZ occurs in 3% of white, 2% of Native American, 4% of Hispanic or Latino, 3% of Asian, and 55% of black individuals. This genetic defect may be responsible for the higher incidence of disease XYZ in blacks, but the role of this and other genetic defects in blacks without disease XYZ is still unknown. 

There are multiple conceptual and linguistic issues with the above example.  

      1. If the purpose of the study was not to discuss the social causes of health disparities, the results probably did not need to be divided by race at all. Using race to stratify the results of a genetic study contributes negatively to our understanding of genetics because it is actively misleading. As an alternative, it may be possible instead to use genetic similarity clusters that are based on public reference panels (see NASEM report as a reference).
      2. The term “genetic defect” used here to refer to the allele found frequently in Black participants is negatively charged. Neutral language should be used to describe the genetics of any group of people. Possible options include “allele” or “sequence” or another phrase that suits the context.  

      3. The way the racial categories are discussed (e.g., “blacks” in the example) is not currently considered standard. “Black participants” would be one alternative. The only reason why “blacks” would be usable in the text is if the participants whose data are being used in the study originally reported their race by filling out a form that specifically said “blacks” on it, in which case the issue would lie with the original form’s outdated language. Even then, the authors need to state that they are using the term only to be consistent with the records. If race is going to be discussed, the paper needs to describe how racial categories were decided for the participants, i.e., whether they were self-reported or not* (link to recommendations for researchers), and which categories were available to choose from. Lastly, to reiterate, the use of racial categories needs to be justified and grounded in the goal of advancing equity. They are not approximations for genetically distinct categories of human beings.

      *Editor note: Today, it is standard practice for race and ethnicity demographic data to be self-reported if it is going to be used at all. However, this is not always what happened in older studies, and actual methods (and their limitations) need to be described accurately.

      

    Overall, the language in the example makes it very easy for the reader to come away with the impression that race is biological. In this particular case, the language (“defect”) even contributes to an impression that one race is biologically inferior to others in some way, which is obviously unacceptable. Although the researchers may not be asserting that the allele found more commonly in Black study participants is associated with some racist stereotype, they are still leaving room for the reader to infer that biology differs along racial lines. The authors may know better and simply believe they are stating facts, but if the readers are allowed to infer that the outcomes differ between races due to genetic differences, that creates in their minds a scientific basis for racist beliefs. We cannot allow findings to be misinterpreted in this way. 

    Below are more examples that show how race is commonly used as a proxy for ancestry in research and how casually it shows up in our communications. Consider how the absence of a disclaimer* and lack of a more rigorous analysis of ancestry and the genomic data both contribute to the impression that races are biologically different from each other.  

    *For example: A statement such as “Race and ethnicity are not biological, and the differences observed between races are an incomplete picture of the participants’ risk factors and the population’s trends” or perhaps “The data presented here should not be used to conclude there are biologically distinct races; they merely reflect one type of diversity in our study pool” could at least add some helpful context.

    Image description: A screenshot (above) of the entire text of the “Polymorphism” section of the UniProt entry for a protein called Apolipoprotein A5, taken on May 25, 2023. Even though this example is not making a specific claim about differences, the fact that race is the only point of comparison in a discussion about allele frequency could contribute to an impression that race is biological. Also note that the term “Caucasian” (link to more information on the term) is not an advisable substitute for “white” as it has roots in white supremacist language.

     

    Image description: A table (above) reproduced from a pre-print article on medRxiv (published in 2023) with the specific data points removed for privacy. Note that “DM” here stands for “diabetes mellitus” status. In this case, the authors have conflated ancestry and race and did not discuss this choice or its implications in the text.


    It is critical to notice the difference between the language of the above examples and the overtly malign “racist science” many of us have been taught to avoid, i.e., misusing biology to draw connections between races and intelligence, strength, or personality. Science can uphold racism without making boldly racist claims. The fact is that biological races are not real, so if the way we convey our results makes biological races seem real, then we need to do more analysis to uncover the true reason behind the appearance of that correlation, and we need to adjust our language accordingly.

    Going forward, we have to remove any ambiguity from our papers, presentations, and grants that allows the audience to interpret race as a biological category. In the study design and data collection phase, we need to make every effort to discover the true root of any biological differences that appear to vary by race. In the writing process, if results linked to race are being presented, the data need to be accompanied by text that acknowledges that race is not biological.  

    This topic and the research available to describe human genetic diversity will continue to evolve. Please see the following references for more information and do continue to read about this topic over time.  


    Recommended Reading

     

    NASEM Report

    National Academies of Sciences, Engineering, and Medicine. 2023. Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field. Washington, DC: The National Academies Press. https://doi.org/10.17226/26902. Link to download PDF: http://nap.nationalacademies.org/26902  

    Excerpt:
    Recommendation 1. Researchers should not use race as a proxy for human genetic variation. In particular, researchers should not assign genetic ancestry group labels to individuals or sets of individuals based on their race, whether self-identified or not.  

    Recommendation 2. When grouping people in studies of human genetic variation, researchers should avoid typological thinking, including the assumption and implication of hierarchy, homogeneity, distinct categories, or stability over time of the groups.  

    Recommendation 3. Researchers, as well as those who draw on their findings, should be attentive to the connotations and impacts of the terminology they use to label groups. 

     

    All of Us Training course

    The All of Us Research Program has three tiers of data and two tiers of training to access the Registered and Controlled data tiers. The Controlled Tier training contains a lot of good information on this topic, and I recommend that anyone with a couple of extra hours enroll and complete this training. This will also give you access to the public health data available through the All of Us program, including genomic data at the Controlled Tier of information. The training slides are not publicly available as of May 2023, so you have to enroll to view them. https://researchallofus.org/

     

    Fatal Invention by Dorothy Roberts

    Roberts, Dorothy. Fatal Invention: How Science, Politics, and Big Business Recreate Race in the Twenty-First Century. 2011 

    One way to read chapter 5 is through the newly established BBSP summer reading package, though you will need to log in with your onyen: https://uncch.instructure.com/courses/347/pages/summer-reading 

    Excerpt from Chapter 5 “The Allure of Race in Biomedical Research”: The NIH Revitalization Act of 1993 mandates that federally funded clinical studies enroll women and minorities as subjects to “elicit information about individuals” in these groups and, in the case of trials evaluating interventions, “examine differential effects on such groups.” Many researchers interpret this policy as a requirement to break down research findings into racial categories. The act also specifies that researchers must use the racial categories provided in OMB Directive No. 15 for all federal reporting.  

    There were dissenters to the race-conscious approach to inclusion in biomedical research. Some conservatives argued that the NIH rules imposed unlawful gender and racial quotas on researchers, likening them to affirmative action policies. Some minority doctors dedicated to racial equality in medicine were also worried. Otis Brawley, an African American oncologist then at the NIH National Cancer Institute and now chief medical officer of the American Cancer Society, warned that the NIH Revitalization Act “may eventually do more harm than good for the minority populations that it hopes to benefit. The legislation’s emphasis on potential racial differences fosters the racism that its creators want to abrogate by establishing government sponsored research on the basis of the belief that there are significant biological differences among the races.”  

    Race consciousness in federal funding guidelines creates a perplexing paradox. While designed to correct historic neglect of people of color in biomedical research, requiring that biomedical researchers use race as a variable risks reinforcing the very biological definitions of race that have historically supported racial discrimination. Paying attention to racial disparities in health is crucial to eliminating them, but attention to race in biomedical research can also make these disparities seem to be grounded in biological difference rather than social inequality. 

    Additional Sources

    Additional Reading: Getting genetic ancestry right for science and society

    Article: Anna C. F. Lewis et al., Getting genetic ancestry right for science and society. Science 376, 250-252 (2022). DOI: 10.1126/science.abm7530 

    Excerpt: Systems of racial classification have historically regarded continents as meaningful group boundaries; thus, it is not surprising that racial categories and continental ancestry categories are often confounded. Whenever continental ancestry categories are used, the risk is high that a misconception of race as a biological attribute will reenter through the back door. Insufficiently nuanced thinking about continental categories, genetic ancestry, and racial groups can lead to the conflation of the three.  

    Our genetic ancestry is defined by the stretches of the genome that we inherit from our ancestors. Geneticists have a concept for this known as the ancestral recombination graph (ARG). Put simply, an individual’s genetic ancestry is the subset of paths through the human family tree by which they have inherited DNA from specific ancestors. Most often, geneticists study the ARG of multiple individuals at the same time. 

    Crucially, this definition makes clear that there are two things that are not necessary to the definition of genetic ancestry. The first is any categorization by populations or groups. And the second is any contextualization of the individuals apart from their genealogical connections—for example, by labeling these individuals with geographical or cultural information. Yet current practices around ancestry estimation and reporting almost always impose categories and, when they do so, very often default to just one way to contextualize individuals: by continent of origin. Both practices limit the accuracy and reliability of claims being made by researchers about human genetic difference. 

    Additional Reading: Best Practices for Using Race in Public Health Research

    Best Practices Guide: Grant, A., Sullivan, G., Hebert-Beirne, J., Acosta, M. 2021. “Best Practices for Using Race in Public Health Research.” Collaboratory for Health Justice, University of Illinois Chicago School of Public Health. Link to download guide: https://publichealth.uic.edu/community-engagement/collaboratory-for-health-justice/best-practices-race-public-health-research/

    Recommended statement to include in published work: “We are presenting these data by ‘race.’ We are using ‘race’ here as a proxy for racism. While ‘race’ is socially constructed and has no genetic basis, racism has real biological, physiological, political, and economic consequences. These consequences are rooted in state-sanctioned historical and contemporary racial oppression. We acknowledge that racism is complex. We challenge you to critically engage with this material and question the use of ‘race’ without the context of racism. Public health and other health professionals have a responsibility to contextualize measures of inequity along the lines of ‘race’ with a discussion of the negative impacts of racism on health.”

    This statement was developed by the 2016 Curricular Praxis Workgroup of Radical Public Health, a group of students, faculty, practitioners, and alumni connected to the University of Illinois at Chicago School of Public Health. It is intended for members of our school community to use when presenting data by race. Similar to the way that a map is expected to include a legend, compass rose, and scale for it to be usable, the authors contend that charts, graphs, maps, and text that describes inequities along “racial” categories cannot be accurately understood without the context of racism. 

    Additional Reading: Race and genetics versus ‘race’ in genetics: A systematic review of the use of African ancestry in genetic studies

    Research Article: Duello TM, Rivedal S, Wickland C, Weller A. Race and genetics versus ‘race’ in genetics: A systematic review of the use of African ancestry in genetic studies. Evol Med Public Health. 2021 Jun 15;9(1):232-245. doi: 10.1093/emph/eoab018. PMID: 34815885; PMCID: PMC8604262. 

    Abstract: Social scientists have long understood race to be a social category invented to justify slavery and evolutionary biologists know the socially constructed racial categories do not align with our biological understanding of genetic variation. The completion of the Human Genome Project in 2003 confirmed humans are 99.9% identical at the DNA level and there is no genetic basis for race. A systematic review of the PubMed medical literature published since 2003 was conducted to assess the use of African ancestry to denote study populations in genetic studies categorized as clinical trials, to examine the stated rationale for its use and to assess the use of evolutionary principles to explain human genetic diversity. We searched for papers that included the terms ‘African’, ‘African American’ or ‘Black’ in studies of behavior (20 papers), physiological responses, the pharmacokinetics of drugs and/or disease associations (62 papers), and as a genetic category in studies, including the examination of genotypes associated with life stress, pain, stuttering and drug clearance (126 papers). Of these, we identified 74 studies in which self-reported race alone or in combination with admixture mapping was used to define the study population. However, none of these studies provided a genetic explanation for the use of the self-identified race as a genetic category and only seven proffered evolutionary explanations of their data. The concept of continuous genetic variation was not clearly articulated in any of these papers, presumably due to the paucity of evolutionary science in the college and medical school curricula.