Life Science

Genomics for the future of California

A collaborative effort to map California's genomic diversity

By Rachel Weinberg

Designs by Liya Oster

November 30, 2021

Neil Tsutsui, professor of environmental science, policy, and management at UC Berkeley, was walking the path behind his in-laws’ house in the foothills of the Sierra Nevada mountains that he had trekked “dozens, if not hundreds of times” when he first saw it: a colony of parasitic kidnapper ants raiding a neighboring colony of field ants. Carrying larvae in their sickle-shaped jaws, the bright red kidnapper ants ran to and from their victim’s nest to replenish the supply of field ant workers that they require to keep their colony alive. After running back to the house to get his collecting supplies, Tsutsui returned to pluck a kidnapper ant from the ground before placing the tiny raider in a tube of 100 percent ethanol, a solution that rapidly preserves DNA.


Tsutsui drove the preserved ant back to his lab at UC Berkeley. Immaculate preservation of the kidnapper ant’s DNA is essential for the task at hand: this ant is one of the thousands of plant and animal specimens whose genetic material will form the basis for the California Conservation Genomics Project (CCGP). The CCGP aims to sequence the genomes of up to 150 individuals from each of more than 200 species native to California. A genome is the molecular code, composed of the four DNA bases (A, T, G, and C), that provides an “instruction manual” for how a living organism develops. The goal is to create a database of genomic diversity that will inform conservation efforts in the state for years to come.

With $10 million in funding from the state of California, the CCGP has designed and funded a protocol for genomic sequencing and analysis that will provide researchers across the state with new levels of access to genomic data. DNA from samples submitted to the project will be extracted and sequenced at core laboratories that have been set up to process specimens from researchers across the UC system. Teams of CCGP technicians specializing in the collection and analysis of genomic data will use these samples to compile a database of genomic sequences from California’s most ecologically important plants and animals.

After Tsutsui sends the kidnapper ant to the CCGP’s facilities, its DNA will be sequenced and passed through a computational analysis pipeline that will convert the raw data—which includes many gigabytes of genetic code broken up into millions of short snippets—into a format that can be more readily analyzed. The first step in this analysis is generating a reference genome: a sequence of one individual’s genetic code mapped out in the order that it appears before it has been broken up into shorter pieces for sequencing. Just as its name implies, the reference genome provides a point of reference that genomes of other kidnapper ants can be compared against. Using the reference genome as a guide, up to 150 other samples collected from across the state will be sequenced so researchers can see how the ants’ genetic code varies. By taking on the most expensive and technically challenging aspects of genomic sequencing, the CCGP is freeing researchers to focus their attention on finding ways to use the data to answer important questions about their study organisms.
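To make the idea of comparing samples against a reference genome concrete, here is a minimal Python sketch. It is not the CCGP's pipeline, which the article does not describe in detail. It assumes each sample has already been assembled and aligned so that it has the same length as the reference, and the function name, sample names, and short sequences are all invented for illustration. Real genomes run to hundreds of millions of bases and are compared with dedicated alignment and variant-calling software.

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumption: the sample is already aligned to the reference, so
    position i in one string corresponds to position i in the other.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy data: one reference sequence and three hypothetical ant samples.
reference = "ATGCCGTAAGCT"
samples = {
    "ant_01": "ATGCCGTAAGCT",
    "ant_02": "ATGACGTAAGCT",
    "ant_03": "ATGACGTATGCT",
}

# Collect the sites where each sample differs from the reference.
variants = {name: find_variants(reference, seq) for name, seq in samples.items()}

for name, sites in variants.items():
    print(name, sites)
```

In this toy data set, the first sample matches the reference exactly, while the other two share one difference and the third carries an additional one. Patterns like these, scaled up across whole genomes and many individuals, are what let researchers see how a species' genetic code varies.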

At UC Berkeley, professors are designing studies that will use the CCGP data to build on knowledge gained from years-long lab and field research programs. The goals of these projects include studying how plant and animal populations have changed over time, identifying hidden species within groups of similar-looking organisms, and learning how specific physical traits are encoded in DNA. According to Ian Wang, professor of environmental science, policy, and management at UC Berkeley and a member of the CCGP’s scientific executive committee, “We can understand all kinds of things from genomic data that we can’t just see by observing these species in the field, even if we do it for years at a time.” The research spurred by the CCGP is paving the way for a new, scientifically informed approach to conservation in California.

Clarifying California’s diversity

While California is already known for its remarkable diversity of plant and animal species, genomic sequencing is adding another, previously invisible dimension to assessing diversity. Because natural selection can only act on genes that have different variants in a population, genetic diversity is a prerequisite for natural populations to adapt to their environments. One hope for the CCGP is that it will enable the discovery of genetic diversity “hotspots.” Hotspots are places that harbor populations with unusually high levels of genetic diversity compared to those in other parts of the state. If properly protected, hotspots could serve as reservoirs of genetic variation that could help species adapt to changing environmental conditions or emerging pathogens.
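One common way to put a number on genetic diversity is nucleotide diversity: the average fraction of sites that differ between two individuals drawn from the same population. The article does not say which measure the CCGP will use to identify hotspots, so the Python sketch below is only an illustration of the general idea. The region names and sequences are made up, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average proportion of sites that differ across all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched sites for every pair of individuals.
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical populations of the same species sampled in two regions.
regions = {
    "region_a": ["ATGCCGTA", "ATGCCGTA", "ATGCCGTA"],
    "region_b": ["ATGCCGTA", "ATGACGTA", "TTGCCGAA"],
}

diversity = {name: nucleotide_diversity(seqs) for name, seqs in regions.items()}

# The region with the highest diversity is the candidate "hotspot".
hotspot = max(diversity, key=diversity.get)
print(diversity, hotspot)
```

Here the individuals in the first region are genetically identical, while those in the second differ from one another at several sites, so the second region would be flagged as the more diverse of the two. A real analysis would compare many species and far longer sequences before calling any place a hotspot.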

Aggregating genomic data from multiple species occupying the same region is a relatively new approach to conservation. Wang explains, “I think we have this opportunity to look broadly across the whole state, across the tree of life, to see what kind of common factors there are across all these species of conservation concern… Is it really idiosyncratic across species [such that] every species needs its own action plan? Or are there some generalities that we could apply across groups, or even more broadly?”

To design effective plans for managing California’s biodiversity, scientists first need to answer a question that many people might assume had been resolved long ago: just how many species are there? Many of the known species in California can be identified by eye, or by trained experts using a microscope. However, scientists are now learning that these described groups may represent only a fraction of the true number of evolutionarily distinct species. Researchers can now use genetic sequencing to identify cryptic species, which appear externally identical to another species but are in fact their own lineage, evolving separately from their visually indistinguishable sister species. Without genomic sequencing, cryptic species are at risk of going extinct before they are even discovered.

Determining how many species a taxonomic group contains can be especially challenging when the organisms are small and have not historically received much scientific attention, like the aquatic mosquito ferns studied by UC Berkeley Professor of Integrative Biology Carl Rothfels. Before extracting DNA to send to the CCGP, members of Rothfels’ lab photograph the tiny ferns and record detailed morphological measurements of each specimen. This information will help them develop methods to visually identify these species more accurately in the future.


Currently, there are only two formally described species of mosquito ferns in the state. However, Rothfels believes that two species is an underestimate. Most specimens of mosquito ferns that he sees identified by amateur naturalists are referred to as Azolla filiculoides, but Rothfels is investigating the possibility that this species is quite rare in the state and the most common mosquito fern is a different species entirely. “There are multiple levels of uncertainty in the taxonomic side of things,” explains Rothfels. Complicating things further is the fact that humans have accidentally introduced some species of mosquito ferns to ponds and lakes throughout the United States. Without using genetic sequencing, it can be impossible to tell which mosquito ferns are California natives in need of protection, and which are non-native species that might be capable of inflicting severe ecological damage. Out of three samples his lab sequenced before joining the CCGP, at least one appears to be a new species. Rothfels says, “I would not at all be surprised if there were other unidentified, unrecognized species in California.”


The new conservation science

Data from the CCGP will help wildlife managers leverage California’s genetic diversity to maximize the survival prospects for as many species as possible. One way this might be done is through a process called managed translocation, in which wildlife managers transport individuals with beneficial genes from one population to another, often more vulnerable, population. Michael Nachman, director of the Museum of Vertebrate Zoology (MVZ) and one of the principal investigators in the CCGP, explains that for species where previous managed translocations have had low success, strategies informed by population genomics show great promise. Access to at least 100 full genomic sequences from each species in the CCGP will provide wildlife managers with a clearer view of which populations in California harbor beneficial genes, and which might be at risk due to inbreeding and low genetic diversity. “We’re really hoping to see how fitness effects segregate in these different populations,” says Nachman, noting that “for different species, we want to learn how many of the mutations they have are neutral, weakly deleterious, or strongly deleterious.” Information about the kinds of mutations found in healthy populations might also show that some mutations are not as harmful as once thought. “There’s a lot of variation between species in the types of mutations they can have,” he explains, “and getting that data, for me as a population geneticist, is very exciting.”

Scientists can also use genomic data to assess a population’s long-term survival prospects by comparing species that occupy narrow ranges with their more broadly distributed relatives. Wang is comparing the genomes of widespread and spatially restricted toad species, a study that he hopes can shed light on the different kinds of risks faced by each species. “You want to know what is driving species declines,” he states, “and I think you can answer that question by comparing species that are abundant locally to those that seem to be much more restricted or experiencing declines.” By finding out whether spatially restricted species have genomic adaptations that tie them closely to a particular set of conditions, scientists can more accurately assess the potential impacts of habitat loss. As the varied impacts of climate change will inevitably alter habitat structures, it will be important to know which species are most susceptible to drought, warmer temperatures, and other environmental shifts.

Concerns about climate change permeate every discussion about conservation, and dialogue within the CCGP is no exception. California’s habitats span a range of extremes, from the dry Mojave Desert to the snow-capped Sierra Nevada mountains. Species already living at the edge of these ecological extremes face some of the greatest risks from climate-related habitat loss. One of Tsutsui’s ant species, the winter ant, is unusual in that it thrives in temperatures as low as zero degrees Celsius. This cold-loving species might face particular peril in a warming climate, but right now it is one of the most widely distributed native ants in North America. Tsutsui hopes that sequencing the genomes of winter ants thriving in a wide range of habitats could yield new insights into how species that can only thrive in a narrow range of temperatures might tolerate climate change. His lab has already done several experiments showing that winter ants from the highest elevations of the Sierra Nevada mountain range, where snow and freezing temperatures are an annual occurrence, can also tolerate higher temperatures than winter ants from warmer, more moderate climates. Comparing their genomes can pinpoint which variations allow some of these ants to thrive while others perish under the same conditions.


Using conservation data to understand evolution, and vice versa

Some professors at UC Berkeley will use the genomes collected by the CCGP to inch closer to one of the longstanding goals in evolutionary biology: linking specific variations in an organism’s genetic code to its physical appearance or behavior. To make the most of the genetic diversity California already has, researchers need to understand how variations within the genome (genotypes) are linked to the physical or behavioral traits that directly impact an organism’s survival (phenotypes). Linking genotypes to phenotypes usually requires generations of breeding animals in controlled environments in the lab. For this reason, genes that code for specific traits have only been definitively identified in a few species. Making genotype-to-phenotype connections in animals that cannot be easily raised in the lab requires enormous data sets of hundreds of individual genomes, which is just the kind of data the CCGP will be generating.

Because the projects within the CCGP are led by researchers with deep knowledge of the species being sequenced, leaders from each project can choose what phenotypes are important for their studies. For Tsutsui, one of the most important traits to measure is how his ants produce and respond to different types of chemicals. “Insects live in a world of smell,” Tsutsui states, “so it is important to know about the chemical environment.” To that end, researchers in his lab are running several chemical ecology experiments in parallel with their genomic studies for the CCGP. They are collecting data on the chemicals produced by female worker ants and the ways that male reproductive ants, which only live for a few weeks to find and mate with a queen ant, respond to smells produced by different queen ants. Tsutsui explains that he sees the CCGP as an “enormous hypothesis-building engine,” which can pave the way for more targeted experiments to study the genetic basis of traits.

Nachman is planning to use data from the CCGP to learn about the genetic basis of fur color variation in California pocket gophers. To connect differences in the gophers’ fur color with variations in their genomes, his team is categorizing the fur colors of different specimens from the MVZ before sending them to the CCGP. “We are, I think, the only CCGP project using all museum specimens,” states Nachman, “which means we have a whole physical specimen voucher to link to every genome we will be sequencing.” Previous work by MVZ Curator John Patton showed that the color of a pocket gopher’s fur is usually a close match to the color of its habitat. The similarity between fur color and habitat probably arises from natural selection by predators like hawks that prey on poorly camouflaged gophers when they leave their burrows. By matching up each gopher’s genome and fur color, Nachman hopes to figure out how many genes contribute to variation in fur color and how large an effect each one has.

Putting conservation in the hands of Californians

The collaborative nature of the CCGP means that the data is likely to find purposes far beyond the initial goal of forming a database to inform land management practices. Already, the project is growing and strengthening a collaborative network of professors and citizen scientists across the state who share passions for nature and conservation. Cross-campus collaborations are common among CCGP projects. Professors across the UC system who study similar species have teamed up to share samples and expertise. Part of Wang’s role on the scientific executive committee involves getting feedback from as many researchers as possible across the UC system about the goals and design of the CCGP. Doing so ensures that the CCGP is a database that can help scientists, lawmakers, and other stakeholders make informed decisions about land management. He is also helping to standardize the analysis pipeline so that the data will be comparable across different species and regions, leaving the door open for further collaborations when it comes time to analyze the aggregate data.

Because the CCGP is a state-funded project, involving the public is essential to its mission. Although fulfilling this goal eventually means sharing the data with different stakeholders in California’s natural resources, some labs are already helping ordinary citizens get involved with the CCGP. One significant avenue for enabling participation from citizen scientists is the natural history observation app, iNaturalist. Originally publicized as an outreach project by the California Academy of Sciences, iNaturalist has also provided a means for researchers to find new locations where they can search for their study species. Users upload photos and locations of wild organisms that they observe, and scientists or experienced naturalists provide taxonomic identifications. Rothfels has used iNaturalist to target his searches for mosquito ferns. The fern “has this weird ephemeral nature, so you never know where you’re going to find it,” says Rothfels, noting that some of his collections have come from unexpected locations “like a pond in a golf course.” Deputizing citizen scientists to collect samples allows Rothfels to significantly broaden the geographic range of his study. After putting out a call on Twitter asking people to send him their mosquito ferns, Rothfels was pleased to see that many people were excited to have the opportunity to contribute to science. He explains, “The local naturalists were excited about the possibility of having a mini project to keep an eye out for, and we’re spreading as wide a net as we can.”


Tapping into the expertise of online communities has proved to be an effective strategy for getting samples of many different species. Fans of ants have a particularly active online community on sites like Facebook, Reddit, and Discord. Through these platforms, Tsutsui was able to connect with some of these groups to arrange a local meet-up where people of all ages gathered to bring him winter ant queens that they had caught throughout the mating season. Both Tsutsui and Rothfels expressed hope that they can continue to foster cooperation from online communities they connected with for the CCGP. “It isn’t even really fair to call them hobbyists,” said Tsutsui of the ant enthusiasts who brought him winter ant queens. “They spend all this time just watching the ants and have built all this knowledge that we can really benefit from.”

Wang also hopes that the CCGP will show Californians how important genomic data has become as a tool for conservation. “A lot of the public is now familiar with genomes or the idea of gene sequencing but don’t necessarily see the connections between genomic data and conservation efforts,” he says. “I hope we can explain to them the value in these data, what it can do for us, and how we can collect these data efficiently at relatively low cost to provide a wealth of information on species we want to conserve.”

Data collection for the CCGP is still in the early stages, and it remains to be seen how the effects of this massive effort will reverberate throughout the state. However, the efforts of scientists across the UC system working to translate the CCGP into evolutionary insights are likely to stretch the outcomes of the CCGP far beyond its stated goal of establishing a baseline of genomic diversity in California.

Rachel Weinberg is a graduate student in environmental science, policy, and management.

This article is part of the Fall 2021 issue.