NOTE: This post has been updated.
The paper on brewing yeast domestication authored by the Verstrepen lab and White Labs last year is a fantastic piece of work (i.e. Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts, Gallone et al. 2016, Cell). I’m sure most of you have seen it by now, so I won’t be going into any details about it. While the article and the work behind it are amazing, there is one big negative: the strains used in the study have been coded, and there is no way of knowing which strain in the paper corresponds to which strain from White Labs or the Verstrepen lab. This means that readers can’t really benefit from the huge amount of phenotypic data that the study generated. Wouldn’t it be nice, for example, to know which White Labs strains are POF+, which ones can’t use maltotriose, which ones produce high concentrations of isoamyl acetate, or which ones sporulate easily?
There is one way of finding out what the coded strains are, but unfortunately it isn’t completely straightforward, and at the moment it can only be done with a handful of strains. The authors have released the genome assemblies for each of the strains used in the study (available here: https://www.ncbi.nlm.nih.gov/bioproject/323691). The genomes of some White Labs strains have been sequenced in other studies, and performing a search for ‘WLP*’ in the NCBI database yields some hits (https://www.ncbi.nlm.nih.gov/biosample/?term=WLP*). Most of these Illumina reads are from a recent paper focusing on wine yeasts by Borneman et al. 2016. Luckily the strains weren’t coded in this study, and we know which White Labs strains these sequence reads are derived from.
What I did was download all of the raw sequence reads for any White Labs strains I could find. I then aligned them to a brewing yeast reference genome (VTT-A81062). After alignment, I called SNPs with FreeBayes, and used the resulting VCF files to create consensus sequences for each White Labs strain using BCFtools. I then used ParSNP to perform core genome alignment on these consensus sequences and all 157 of the assemblies from the Gallone et al. paper that I had obtained previously. ParSNP outputs a core genome phylogeny (generated with FastTree2) and a SNP matrix. I additionally produced a maximum likelihood phylogenetic tree using the SNP matrix and ExaML. To identify the White Labs strains among the coded strains in the Gallone et al. paper, I then looked for the closest hits in the phylogenetic tree (this was really obvious for most strains). In cases where there wasn’t one obvious hit, I also looked at Supplementary Table S1 in Gallone et al. 2016 for the reported origin and source of each strain.
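To illustrate the matching step, here is a minimal Python sketch that ranks coded strains by how many SNP sites differ from a query strain. Note that this is a simplified stand-in: I picked matches from the phylogenetic tree, whereas this just counts pairwise differences in an aligned SNP matrix. The function names, the strain labels and the toy sequences are all made up for the example.

```python
from typing import Dict, List, Tuple


def snp_distance(a: str, b: str) -> int:
    """Count sites where two aligned SNP strings differ.

    Sites where either sequence has an ambiguous base (N) or a gap (-)
    are skipped rather than counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    skip = {"N", "-"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)


def closest_coded(
    queries: Dict[str, str], coded: Dict[str, str], top: int = 2
) -> Dict[str, List[Tuple[str, int]]]:
    """For each query strain, return the `top` coded strains with the fewest differences."""
    result = {}
    for q_name, q_seq in queries.items():
        ranked = sorted(
            ((c_name, snp_distance(q_seq, c_seq)) for c_name, c_seq in coded.items()),
            key=lambda pair: pair[1],
        )
        result[q_name] = ranked[:top]
    return result


# Toy data (hypothetical labels and sequences, not real strains)
queries = {"WLP_toy": "ACGTACGT"}
coded = {
    "Beer_A": "ACGTACGA",
    "Beer_B": "ACGTTCGA",
    "Wine_A": "TTGTACGA",
}

hits = closest_coded(queries, coded)
print(hits)
```

Returning the two best hits rather than one is deliberate: when the top two distances are nearly the same, the match is ambiguous and needs extra evidence, such as the reported origin and source of the strain.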
Using this approach, I was able to identify eight White Labs strains from the set of coded strains. I know it is not much, but it is a good start, and it already gives some interesting and valuable information. Here are the results:
The question mark (?) after the ‘Code in paper’ means that there were two close hits, and I chose the match based on the reported origin and source as well. First of all, my suspicions regarding WLP099 were correct (see this old post), with it being Beer033, a yeast grouping in the wine clade and unable to use maltotriose. Interestingly, WLP570 (Beer085) also seems to be unable to use maltotriose. This should be the ‘Duvel’ strain, and based on this information, it should only be able to produce bone-dry beers in worts supplemented with sugars (as Duvel is, if I’ve understood correctly). Another interesting observation we can make is that WLP705 appears to be Sake002. This is the only one of the seven Sake strains that was grouped in the wine clade rather than the Asian clade. Is this really a Sake yeast, or is it a mislabeled wine yeast? Anyways, there is a lot of interesting data to be extracted from these strains alone (esters, ethanol tolerance, etc.)!
While writing this blog post, I also noticed that the authors have released the Illumina reads for all the strains as well (in June 2017 apparently). This should allow me to confirm these results by looking at things such as chromosome copy numbers (through coverage) and SNP heterozygosity (the assemblies are haploid sequences). Unfortunately, I probably won’t have time to do that anytime soon, but hopefully I can have a look at those later during the winter.
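As a rough idea of what the coverage check could look like, here is a small Python sketch that estimates relative chromosome copy numbers from per-position read depth. It assumes tab-separated lines of chromosome, position and depth (the layout that `samtools depth` prints), and it assumes that the typical chromosome is present at a baseline ploidy that you supply yourself. The function name, the baseline of 2 and the toy numbers are only for illustration; this is not an analysis I have run.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable


def relative_copy_number(depth_lines: Iterable[str], base_ploidy: int = 2) -> Dict[str, float]:
    """Estimate per-chromosome copy number from 'chrom<TAB>pos<TAB>depth' lines.

    Each chromosome's median depth is divided by the median of all
    chromosome medians, then scaled by the assumed baseline ploidy.
    """
    per_chrom = defaultdict(list)
    for line in depth_lines:
        chrom, _pos, depth = line.rstrip("\n").split("\t")
        per_chrom[chrom].append(int(depth))

    chrom_medians = {chrom: median(depths) for chrom, depths in per_chrom.items()}
    baseline = median(chrom_medians.values())
    if baseline == 0:
        raise ValueError("baseline coverage is zero; cannot normalise")

    return {
        chrom: round(base_ploidy * med / baseline, 2)
        for chrom, med in chrom_medians.items()
    }


# Toy depth values (made up): chrII has about 1.5x the coverage of the others
toy_depth = [
    "chrI\t1\t30", "chrI\t2\t32", "chrI\t3\t28",
    "chrII\t1\t45", "chrII\t2\t44", "chrII\t3\t46",
    "chrIII\t1\t30", "chrIII\t2\t29", "chrIII\t3\t31",
]

copy_numbers = relative_copy_number(toy_depth)
print(copy_numbers)
```

Medians are used instead of means so that repetitive regions with extreme coverage don’t skew the estimate. Comparing such a profile between a White Labs strain and its candidate coded strain would give one more line of evidence beyond the SNP-based match.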
I will also be keeping an eye on the NCBI databases in case more White Labs strains are sequenced in the future!