October 25, 2021

Nutra Health

The Ideal Health

Changes in the distribution of fitness effects and adaptive mutational spectra following a single first step towards adaptation

Strains and strain handling

All strains used are S288C derivatives, which were evolved and characterized previously4,20,44 (Table 1). Yeast strains and pools were saved as glycerol stocks at −80 °C. Yeast transformations were performed by a lithium acetate/PEG method73. Strains are available upon request.

Yeast growth media and growth cycle

Evolutionary and fitness remeasurement conditions matched those used earlier4,44. Briefly, M3 medium, consisting of 5× Delft medium74 with 4% ammonium sulfate and 1.5% dextrose, was used. Serial batch cultures were conducted by growing cells in 100 mL M3 medium in 500 mL Delong flasks (Bellco) at 30 °C and 223 RPM. Yeast was grown for 48 h between transfers and for each new cycle 400 μL of the grown culture (~8 × 107 cells) were used as inoculum for the new culture, resulting in a 1:250 bottleneck.

Construction and characterization of the founder strains of the evolutions

The prelanding pad strain (SHA118, Table 1) that is receptive to barcoding44 was transformed with a galactose-inducible HO-containing plasmid. The strain diploidized upon exposure to galactose and a diploid clone was sporulated and dissected; a MATa derivative was isolated (GSY5375, Table 1) and saved for subsequent crosses with the evolved clones. Loss of the HO-containing plasmid was verified by absence of growth on appropriate selective medium. Strains GSY5481, GSY5128, and GSY5153, derived from evolution under glucose limitation and previously characterized4,20,44 (Table 1), were backcrossed twice to GSY5375. Competitive fitness of segregants and evolved parents was assayed in triplicate compared to a fluorescent derivative of the ancestor, as in ref. 4, with the following modifications: to increase throughput the assays were performed in 5 mL cultures in tubes incubated in a roller drum, instead of 100 mL cultures in flasks. As a result, the fitness estimates deviate from those previously reported4 (Supplementary Fig. 1, Supplementary Table 1). Additionally, since the derived strains do not contain a barcode (which reconstitutes a URA3 gene), they require uracil, so the fitness assays were performed in M3 medium supplemented with uracil. All segregants were genotyped for the variant of interest by amplification of the respective locus and Sanger sequencing. The oligos used for genotyping are shown in Supplementary Table 3.


Strains GSY6701, GSY6702, and GSY6703 (Table 1) were transformed with a low and high complexity barcode, consecutively. These strains have the YBR209w locus replaced with the prelanding pad (corresponding to strain SHA11844). The low complexity barcode was derived by PCR amplification of part of the L001 plasmid library, containing the lox66 site, the DNA barcode, the artificial intron, the 3′ half of URA3, and HygMX75. The fragment was amplified with primers BC_F-DY and BC_R1-DY (Supplementary Table 3), from 12 ng of L001 library in a 50 µL reaction with PrimeSTAR (TAKARA, Mountain View, CA) using the following conditions: hot start, initial denaturation at 98 °C for 2′, 30 cycles of 98 °C for 30′′, 55 °C for 15′′ and 72 °C for 3′, and final extension at 72 °C for 10′. The product was purified with the QIAquick PCR purification kit (QIAGEN, Germantown MD), transformed into each of GSY6701, GSY6702, and GSY6703 and successful transformants were selected on YPD + Hygromycin. Single transformants were further transformed with the high complexity library (pBAR3)44 with the following modification: after transformation the cells were grown on liquid YP + 2% galactose for ~16 h for Cre recombinase induction prior to selection on SC-ura plates with 2% glucose. Cell growth was estimated by cell counting immediately after transformation and before plating. The number of unique transformants was estimated by plating a dilution on selective medium and correcting for growth. After 1 day of growth on selective medium the transformants were pooled and saved as glycerol stocks at −80 °C (high complexity subpools with a common low complexity barcode). The final founding pools for the evolutions were constructed by pooling high complexity subpools to an estimated total of ~700,000 unique transformants per initial clone.

Evolution experiments

Evolution experiments were conducted under identical conditions to those that gave rise to our adapted ancestors44. Briefly, 108 cells of each of the founding populations were used to inoculate 100 mL of SC-ura, 2% dextrose, supplemented with hygromycin in 500 mL Delong flasks (Bellco). The cells were grown for 24 h at 30 °C and 223 RPM, the end of which was considered generation 0 of the evolution experiment. 400 μL of the initial culture were used to inoculate M3 medium in duplicate, as described in the ‘Yeast Growth Media and Growth Cycle’ section. The evolution experiments were conducted for a total of 20 transfers, corresponding to approximately 160 generations. Prior to each transfer the medium was prewarmed at 30 °C for 1 h. For each timepoint, 2 × 1 mL aliquots were saved as glycerol stocks at −80 °C and the rest of the culture was spun down, resuspended in 5 mL sorbitol buffer (0.9 M sorbitol, 100 mM Tris pH 7.5, 100 mM EDTA), aliquoted in Eppendorf tubes (~1 mL each), and saved at −20 °C to be used for genomic DNA and barcode library preparations.

Clone isolation

Individual clones were sorted at the Stanford Shared FACS facility either from all timepoints (one or two 96-well plates each) for ploidy determination, or from selected timepoints (10 plates each of the following timepoints: cyr1 evolution, replicate 1, timepoint 20 (generation 160), gpb2 evolution, replicates 1 and 2, timepoint 13 (generation 104) and tor1 evolution, replicate 1, timepoint 12 (generation 96)) for fitness remeasurements, ploidy determination and whole-genome sequencing.

Ploidy assay

Ploidy was determined with a high-throughput benomyl-based assay4. Clones archived in 96-well format were grown in deep-well plates to saturation at 30 °C without shaking. The saturated cultures were mixed with pipetting and subsequently pinned onto YPD agar rectangular plates containing 20 mg/ml benomyl (prepared as a DMSO solution) or DMSO (control). The plates were grown at 25 °C for 2 days and then imaged. Clones with inhibited growth on the benomyl medium were identified as diploids. Clones with inhibited growth on the control medium were excluded from analysis. Clones that grew on both media were identified as haploids.

Genomic DNA and library preparation for barcode lineage tracking

Genomic DNA was prepared as follows. An aliquot of cells stored at −20 °C was thawed at room temperature. The cells were spun down, washed once in water, resuspended in 400 µL lysis buffer (0.9 M sorbitol, 50 mM Na phosphate pH 7.5, 240 µg/mL zymolase, 14 mM β-mercaptoethanol) and incubated at 37 °C for 30 min. After the incubation, 40 µL 0.5 M EDTA, 40 µL 10% SDS and 56 µL 20 mg/mL proteinase K (Life Technologies 25530-015) were added consecutively, with brief vortexing after each addition, and the samples were incubated at 65 °C for 30 min. Samples were then cooled on ice for 5′, 200 µL of 5 M potassium acetate were added, and the samples were mixed by shaking, then incubated on ice for an additional 30 min. Following incubation, the samples were spun down at full speed in a microcentrifuge for 10 min, and the supernatant was transferred to a new tube with 750 µL isopropanol and was let to rest on ice for 5 min. The precipitated nucleic acid was spun down full speed in a microcentrifuge for 10 min and washed twice with 70% ethanol. After the second wash the nucleic acid was let to dry completely and then it was resuspended in 50 µL 10 mM Tris pH 7.5. Overnight incubation at room temperature or short incubation at 65 °C sometimes was necessary for complete resuspension. RNA was digested with the addition of 0.5 µL 20 mg/mL RNase A (Fisher Scientific, Waltham MA) and incubation at 65 °C for 30 min.

A two-step PCR protocol was used to amplify the barcoded locus (see Supplementary Table 3 for primers, same as used previously4). The first amplification was conducted using OneTaq 2X Master Mix (NEB, Ipswich MA), a total of 6 µg genomic DNA and a limited amount of primers in 6 × 50 µL reactions with the following composition: 1X OneTaq Mix, 50 nM each forward and reverse primer, 2 mM MgCl2, 20 ng/µL gDNA, in the following conditions: hot start, initial denaturation at 94 °C for 10′, 3 cycles of 94 °C for 3′, 55 °C for 1′ and 68 °C for 1′, and final extension at 68 °C for 1′. The 6 reactions were combined, purified using the QIAquick PCR purification kit (QIAGEN, Germantown MD) and eluted into 30 µL EB buffer. All the eluate was used as template in a single 50 µL 2nd reaction, with the following composition: 0.5 µL Herculase II fusion DNA polymerase (Agilent, Santa Clara CA) per 50 µL reaction, 1×Herculase buffer, 1 mM dNTPs, and 500 nM each of PE1 and PE24, and was amplified in the following conditions: hot start, initial denaturation at 98 °C for 2′, 20 cycles of 98 °C for 10′′, 69 °C for 20′′ and 72 °C for 30′′, and final extension at 72 °C for 1′. Barcode libraries were pooled isostoichiometrically and sequenced on an Illumina NextSeq 550.

DFE/mutational fitness spectrum μ(s) inference

Lineage tracking from barcode sequencing was performed as described44, using code at https://github.com/Sherlock-Lab/Barcode_seq/blob/master/bartender_BC1_BC2.py with minor modifications. Briefly, after extraction of the unique molecular identifiers (UMIs, randomers used for identifying duplicates introduced during PCR amplification), and both low and high complexity barcodes from the sequencing read, low complexity barcodes were clustered against their expected sequences, whereas the high complexity barcodes were pooled across all libraries and clustered with bartender (v1.1)76. The updated reads and the UMIs were used to derive raw barcode counts, which were assembled into the raw count lineage trajectories. Low coverage timepoints and barcodes that appeared in only a single timepoint (considering replicate evolutions) or had no reads at timepoint 0 were excluded from subsequent analysis. The included timepoints and the number of reads and barcodes per timepoint are shown in Supplementary Data 1. Filtered raw count lineage trajectories are provided for each replicate evolution (Source data for Supplementary Fig. 2, “Lineage trajectory counts”).

Using the lineage frequency changes over time, lineages’ fitness per generation (s) and establishment time (tau) were estimated as in ref. 44 . Lineages with reads between 20–30 at each timepoint were treated as neutral and were used to estimate population mean fitness. Lineage tracking data from generation 0 to generation 136 were used for fitness inference in all evolutions, except for gpb2 evolution replicate 1, for which we only had adequate data up to generation 120. Lineage tracking data for the ancestor up to generation 112 and 96, for replicates 1 and 2, respectively, were used for fitness inference as in ref. 44. The generations chosen are the times at which adapted lineages have reached a sufficient frequency in the population, while the majority of such lineages theoretically carry a single beneficial mutation.

Mutations can occur during the barcoding process and prior to the onset of the experiment, some of which can be beneficial in the evolutionary condition44. To characterize the mutational rate during the evolution, lineages with such pre-existing mutations were removed from fitness inference. The following two criteria were used to define lineages with pre-existing mutations: (1) being adaptive in both evolutionary replicates and (2) having an establishment time < −2/s in at least one replicate.

Mutation rates (μ(s)*ds, defined as the mutation rate per generation per cell for mutations with fitness within a range [s, s + ds]) in different fitness intervals were calculated using equation 101 in ref. 44:

$$mu left(sright)rmcdot dsrmcdot left[1+srmcdot rmlnleft(N_f{{{{{rmcdot }}}}}mu left(sright){{{{{rmcdot }}}}}dsright)right]=f(dleft(sright),t){{{{{rmcdot }}}}}fracse^sbullet t$$


where (ds=0.002) is the fitness interval considered, (mu left(sright)) the mutation rate within a specific fitness interval ([s,s+ds]), defined as the fitness-dependent probability density function of the mutation rate, (N_f=10^12), the approximated largest size the population has reached during the barcoding process, and (f(dleft(sright),t)) the summed frequency of lineages whose fitness are within the interval ([s,s+ds]) at generation (t). The error of the estimated (mu left(sright)) is:

$$sqrtfracmu (s)bullet dsN$$


where (N=6bullet 10^8) is the approximated effective population size during evolution.

Note that the barcode sequencing coverage of the ancestor evolutions was ~10–20× higher than those of cyr1, gpb2, and tor1 evolutions (Supplementary Data 1, also Table 2 in Supplementary information in ref. 44). To avoid biases introduced by sequencing coverage differences, we down-sampled the ancestor sequencing data to a depth comparable to those of the cyr1, gpb2, and tor1 evolutions: 2 × 107 at time 0 and 3 × 106 at the rest of the timepoints. Fitness was inferred before and after down-sampling. Lineages with 5–10 reads were treated as neutrals to infer the population mean fitness (vs 20–30 used in the full datasets).

The theoretical expectations on our detection threshold were set as in ref. 44. The fitness detection threshold was determined by our definition of neutrality. We treated low abundance lineages as neutrals and used the decay of those lineages over time to infer population mean fitness. The resolution of our study cannot detect adaptive lineages with a fitness < 2% due to clonal inference and the lineage size limit (see Supplementary Fig. 36 in ref. 44). Certain mutations with fitness < 2% can establish in the population without being detected. However, such mutations never reach large sizes and will not dominate or have much of an impact on population dynamics. In particular, during the 1st step evolutions, population mean fitness reached 0.02 at generation 90. Due to clonal interference, mutations with s = 0.02 have to arise before generation 40 (90–1/s) in order to establish. To be detected, such mutations have to arise 60 generations before the start of the evolution experiment (from equation 77 in ref. 44; (90-1/sbullet {{{{rmln}}}}left(n_ebullet sright)), where (n_e=1000) cells/lineage represents the lineage size). Thus, mutations of such effect that occur immediately at generation 0, are nearly impossible to be detected unless this group of mutations has a very large mutational target size or a very high mutation rate. This is illustrated in detail in the Supplementary material in ref. 44, section 11.1, page 54.

Barcode determination of isolated clones

To identify the barcodes of the isolated clones in 96-well plates, we employed a 2- (column, row) or 3- (column, row, plate) dimensional pooling strategy, inspired by ref. 77. Briefly, we arranged 20 plates per batch into a 4 columns × 5 rows plate matrix and constructed 48 column pools from clones out of 40 wells each and 40 row pools from clones out of 48 wells each. For the second batch we included semi-redundant half-plate pools (40 pools from clones out of 48 wells each) to increase the successful barcode recovery rate. We pooled our samples after cell growth and prepared barcode libraries for Illumina sequencing. Barcodes were recovered for each well at a rate of ~90%, which was somewhat dependent on the barcode diversity of the sampled timepoint (identical barcodes in multiple wells makes it more challenging algorithmically to match barcodes to wells).

High-throughput fitness measurements and analysis

Pooling of clones

Clones isolated from the evolutions were pooled together for high-throughput fitness assays. We used a multi-pronged pinner to take clones from frozen stock and pin them into a set of 96 deep-well plates with 700 µL YPD medium in each well. Cells were grown at 30 °C for 2 days to reach saturation without shaking. 500 µL of 50% glycerol were added into each well using a multichannel pipette. 1 mL of the mixture from each well was pooled, and the final pool was mixed and aliquoted into 2 mL Eppendorf tubes, which were stored at −80 °C for future fitness measurements.


Each replicate fitness experiment was initiated with a 1 mL frozen aliquot of the pooled cell culture, thawed at room temperature, and inoculated into 15 mL M3 in a 500 mL Delong flask. The culture was grown at 30 °C and 223 RPM overnight for cell propagation. 400 µL of the overnight culture were inoculated into 100 mL of fresh M3 medium and precultured at the standard condition for 2 days.

A derivative of the ancestor carrying a restriction site in the barcode region was used to compete with the pool of evolved clones for fitness measurements4. The ancestor clone was resurrected from frozen stock onto M3 agar plates and grown for 2 days until colonies were visible. A single colony was inoculated into 3 mL of M3 medium and grown for 48 h (30 °C in a roller drum). 400 µL of that culture were used to inoculate precultures (100 mL M3 medium in 500 mL Delong flasks, 223 RPM 30 °C).


Fitness assays were conducted by mixing the pooled preculture with the ancestor preculture in a 1:9 ratio (time 0) and growing the resulting population for four successive growth cycles (timepoints 1, 2, 3, and 4), under the evolutionary condition. At the end of each cycle, 400 µL cell culture were inoculated into 100 mL fresh media to start the next cycle. Cells were collected at time 0, and at the end of each of the four growth cycles. The cell pellet from each sample was resuspended in 5 mL sorbitol solution (0.9 M sorbitol, 0.1 M Tris-HCL pH 7.5, 0.1 M EDTA pH 8.0), aliquoted into 2 mL Eppendorf tubes and stored at −20 °C. Three technical replicates were performed per fitness assay. Genome extraction, barcode amplification, and Illumina sequencing were conducted for each sample (timepoint and replicate).

Genomic DNA sample preparation

Genomic DNA was isolated and treated as described in the “Genomic DNA and library preparation for barcode lineage tracking” section.

Fitness estimation

DNA barcodes were sequenced on an Illumina NextSeq 500/550 platform and their abundances were used to estimate lineages’ frequencies in the population, as previously described4. Fitness estimates were conducted for all clones against the neutrals from the wild type evolution and for clones derived from each ancestral genotype separately against the neutrals of the specific genotype. The source code for computing these fitness estimates can be found at https://github.com/barcoding-bfa/fitness-assay-python. We ran two iterations of the script. First, we used all barcode counts as input and recovered fitness estimates and barcodes that were likely to be neutral. Barcodes identified by the first iteration were associated with their physical position on the 96-well plates in frozen stock, and the ploidy of the clones they represent. For the second iteration, apart from the barcode counts, a list of specifically haploid neutral clones was also provided (this is an optional argument of the fitness estimation algorithm). Fitness estimates from the 2nd run were used for further analysis. Final fitness estimates were calculated by inverse variance weighting of estimates from all three replicates.

Calculation of diploidization rates

Diploidization rates were estimated for each genotype and replicate evolution using information from both the fitness remeasurements and lineage tracking data. First, we identified pure diploids based on fitness remeasurements and ploidy assays (see “Fitness gains of isolated adaptive clones from the 2nd-step evolutions tend to be smaller than in clones from 1st-step evolutions” section in Results for details). Second, we identified which of these diploids were “pre-existing diploidy lineages” (see “DFE/Mutational fitness spectrum μ(s) inference” section for details), which are beneficial in both evolution replicates, suggesting that their diploidization events likely happened before evolution. Thus, the “pre-existing diploidy lineages” is a group of diploids whose ploidy has been experimentally verified but does not contribute to the estimate of diploidization rate during evolution. In addition, “pre-existing diploidy lineages” whose fitness was estimated to be larger than or equal to 0.1 per generation from the lineage tracking data, were also excluded, in order to avoid diploids that had acquired additional mutation(s).

Diploids arising during evolution were identified via their fitness values because not all evolving lineages were assayed for their ploidy. Lineages that do not carry pre-existing mutations and have a similar fitness to those of the “pre-existing diploidy lineages” were characterized as diploids that emerged during evolution. Specifically, using the lineage tracking data, we estimated the mean fitness of “pre-existing diploidy lineages” and its corresponding 95% CI for each evolution replicate. Lineages whose fitness fell into this 95% CI were classified as diploids and were used to estimate the diploidization rate for each replicate evolution. Note that the mean fitness of diploids and its 95% fitness CI were estimated using a small group of curated “pre-existing diploidy lineages”. Thus, lineages whose fitness fall into the CI are likely only a subgroup of diploids that arose during evolution and our calculation results in a conservative and therefore likely underestimate for the diploidization rate per genotype and replicate evolution.

Genome-wide sequencing library preparation

Clones selected for sequencing were grown in 500 µL YPD in 96 deep-well plates for two days at 30 °C without shaking. 400 µL of saturated cell culture were used for DNA extraction with the Invitrogen PureLink Pro 96 Genomic DNA Kit (Catalog no. K1821-04A) in a 96-well format. Libraries were prepared and multiplexed with Nextera technology, and a high throughput protocol78. Samples were sequenced on an Illumina HiSeq 4000 with 2 × 150 bp paired end reads.

Variant calling

SNP, small indel, and structural variants were called for 105 clones using Sentieon Genomic Tools Version 201711.0279, as follows. FASTQ files were trimmed using cutadapt version 1.1680 and trimmed reads were mapped to the S. cerevisiae S288C reference genome R64-1-1 (https://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/) using bwa81. Mapped and sorted reads were then used for the variant calling. Variants were further annotated using snpEff and SNPSift82. The source code for variant calling and annotation can be found at https://github.com/liyuping927/DNAscope-variants-calling.

Variant filtering

To eliminate false positive variants, we applied the following filters. First, variants from lineages with an average genome-wide coverage <10, and all mitochondrial variants were filtered out. Second, variants in FLO1 and FLO9 genes were filtered out due to poor alignment in both genomic regions. Third, variants present in more than five clones and at least two genetic backgrounds out of cyr1, gpb2 and tor1 mutants, they were likely present in the common ancestor and were filtered out. Fourth, variants with a quality score <150 and only occurring in one clone were filtered out. Locus alignment against the reference genome was visually inspected to assess variants present in more than one clone, but with a quality score <150 in at least one of them. Provided that the implicated loci were well-covered and not in highly repetitive regions, the variants were considered bona fide regardless of their quality score. Otherwise, they were discarded in all clones where they occurred. Lastly, we manually verified variants that passed the above filtering by inspecting the corresponding loci alignments against the reference genome and further filtering out false positives, typically occurring in highly repetitive or poorly sequenced regions.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.