Phylogenetic tree of 1011+157 yeast genomes

A couple of weeks ago the main results of the 1002 yeast genomes project (which actually ended up as 1011 yeast genomes) were published in Nature. This amazing piece of work from the J Schacherer & G Liti labs offers insights into the evolutionary history of S. cerevisiae, and is also a fantastic source of data for any yeast nerd (most of the data is freely available to download here). While browsing through the paper and the supplements, I noticed there wasn’t any phylogenetic tree available where the individual strain names were visible (yes I know, such a tree would be quite messy with this number of strains). The relatedness of different brewing yeast strains has been discussed in some of my previous posts and has gathered much interest from readers, so I decided to put together a phylogenetic tree myself from the genome assemblies the authors have made available. As I’m a brewing yeast guy, I decided to also expand the tree with the 157 yeast genomes from the Gallone et al. 2016 study. I’ll get into the details below, and bring up some general observations. So, here it is, a phylogenetic tree of 1168 yeast genomes (click the image below to download the PDF):


First of all, sorry about the colors. It was difficult to find a good dark color palette (with 24+ colors) to differentiate the different strain origins and clades. I hope the tree is still readable. If not, I will post a version with all the strains and branches in black.

The strains were originally labeled with their code names (three-letter codes in the 1011 yeast genomes data, and XX### in Gallone et al. 2016). I then replaced the code names with the strain names as listed in Supplementary Table S1 of the 1011 yeast genomes paper, and with our decoded White Labs strains (only the medium to high confidence identifications). Here is a copy of the phylogenetic tree using only the original code names.
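
For anyone wanting to do a similar relabeling on the Newick files, here is a minimal Python sketch of the idea. It is not the script used for the tree above: the function name `relabel_newick`, the toy tree and its branch lengths are made up for illustration, and only the three code-to-name pairs are taken from this post. It swaps leaf labels using a dictionary and leaves internal node labels (e.g. bootstrap support values) and unknown codes untouched.

```python
import re


def relabel_newick(newick, code_to_name):
    """Replace leaf labels in a Newick string using a code -> name mapping."""

    def quote(name):
        # Newick labels with spaces or punctuation need single quotes,
        # with any embedded single quote doubled.
        if re.search(r"[\s(),:;\[\]']", name):
            return "'" + name.replace("'", "''") + "'"
        return name

    def swap(match):
        code = match.group(0)
        # Codes missing from the mapping are kept as they are.
        return quote(code_to_name.get(code, code))

    # Leaf labels start right after "(" or ","; labels after ")" are
    # internal node labels (e.g. support values) and are not matched.
    return re.sub(r"(?<=[(,])[^(),:;\s']+", swap, newick)


# Toy tree (hypothetical topology and branch lengths).
tree = "((AEQ:0.1,XYZ:0.2)95:0.05,(CFC:0.3,BE052:0.1)100:0.2);"
codes = {"AEQ": "CBS1782", "CFC": "WLP530", "BE052": "WLP009 Australian Ale"}
print(relabel_newick(tree, codes))
```

Names containing spaces are wrapped in single quotes here, which is how the Newick format expects such labels to be written.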

Many of the brewing strains sequenced in the 1011 yeast genomes paper are quite different from the Gallone et al. strains, but there is some overlap (e.g. Beer002, Beer003, WLP099 = Beer071, WLP570 = Beer085).

I think DBVPG6694 (Artois) and DBVPG6695 (Orval) might be mixed up in the paper, since Beer041 is reported as ‘Belgian Lager’ while Beer077 is reported as ‘Belgian Trappist’.

If CFG is Fermentis S-04 (and not S-40 as stated in Table S1), then interestingly it doesn’t seem to cluster with the other Whitbread yeasts, but rather seems to be close to WLP006 Bedford and WLP013 London.

Fermentis S-33 and Lallemand Windsor are quite closely related.

The WLP530 isolate (CFC) sequenced in the 1011 yeast genomes paper is not at all where I was expecting it. ‘qq’ and I had assumed that Beer078 from the Gallone et al. paper would be WLP530 (which clusters together with several other Trappist beer strains), but instead WLP530 clustered together with Beer095-097 of unknown origin and WLP009 Australian Ale (Beer052). I’m not really sure what is going on here.

There are a couple of S. cerevisiae var. diastaticus strains (e.g. AEQ/CBS1782/NCYC361, YAG/YJM271, and AAQ/CLIB272_2) that cluster in the Beer 2 / Mosaic beer group (the genomes of which might be a source of good info for new identification methods).

There are probably a lot of observations I’m missing, so please feel free to comment 🙂

Quick summary of the methods:

Genome assemblies were downloaded and aligned to S288c using NUCmer through the NASP pipeline. SNPs were then called from each alignment. The resulting VCF was annotated with SnpEff, and filtered to only retain sites present in all 1167 strains, inside the coding region of a gene, and with a minor allele frequency greater than 0.25% (i.e. minor allele present in at least 2 strains). A maximum likelihood tree was then generated based on 462,842 filtered sites with IQ-TREE, using the GTR+F+R4 model and 1000 ultrafast bootstrap replicates.
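
The site filter described above can be illustrated roughly as follows. This is a toy Python sketch, not the actual pipeline (which worked on a SnpEff-annotated VCF): the function `keep_site`, its input format and the example calls are hypothetical. It assumes one base call per strain, and takes the minor allele frequency as the count of the second most common allele divided by the number of strains.

```python
from collections import Counter


def keep_site(calls, in_coding_region, min_maf=0.0025):
    """Return True if a site passes the three filters described in the text.

    calls: one base call per strain (e.g. "A"), or None where a strain
           has no call at this site (hypothetical input format).
    in_coding_region: True if the site lies inside the coding region of a gene.
    min_maf: the minor allele frequency must be strictly greater than this.
    """
    if not in_coding_region:
        return False
    # The site must be present in every strain.
    if any(call is None for call in calls):
        return False
    counts = Counter(calls).most_common()
    # A site with a single allele carries no information for the tree.
    if len(counts) < 2:
        return False
    minor_count = counts[1][1]  # count of the second most common allele
    return minor_count / len(calls) > min_maf


# Toy example: 10 strains and a deliberately high threshold.
calls = ["A"] * 8 + ["G"] * 2
print(keep_site(calls, True, min_maf=0.15))
```

With real data the calls would come from the VCF rather than from Python lists, and the allele-counting convention may differ from the assumption made here.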

Here is an archive containing the newick trees, FigTree NEXUS files, and the various strain maps (e.g. color map, code-to-strain name translation).


Gallone et al. 2016. Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts. Cell 166:1397 – 1410.e16
Peter et al. 2018. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556:339–344

20 thoughts on “Phylogenetic tree of 1011+157 yeast genomes”

  1. wh

    Nice work and posting. Is it okay to print off the pdf as a poster?

    The flavour profiles of wlp013 and wlp006 seem pretty different including how whitelabs says wlp013 doesn’t flocculate well until 4°C. I think I read that wyeast 1028 (wlp013) can get sulphury also. This yeast collector says wlp013 is a burton strain:

    I wonder if the origin story of s-04 is mixed up or if whitbread dry and B are different? I suppose they collected other brewery’s yeasts when buying them out or does dry just refer to how it is packaged? I also wonder if whitelabs naming wlp007 “dry english ale yeast” might be a hint at the origin?

    I suppose maybe someone could see if s-04 forms snowflake yeast? That’d hint at ncyc 1026 / whitbread b.

    I’ve been digging through some archives because I seem to remember that the Whitbread origin story of s-04 came from “a guy asked a technician when he toured DCL/fermentis”. It looks like it was the late Graham Wheeler, but he says the real whitbread b isn’t bottom fermenting, which the ncyc strain most likely is.

    The linked page by Chris is archived here but I think the “safale 04 whitbread” bit was added on since I’ve seen that whole description by DCL yeasts without the whitbread part*:

    Here’s some old posts:

    Homebrewdigest did a side by side with wyeast, thought it was closest to 1028; I wonder if wyeast 1028 (wlp013) is the same as ncyc 1028, ncyc does look like it’s flocculent and a chain former. However the yeast collector up there said wyeast 1028 was powdery (ncyc has two different descriptions, non-flocculent and chain former). It was deposited to the ncyc in January while 1026 was deposited in June 1958.

    !msg/rec.crafts.brewing/TC4ejkVR_gg/fF0ThYIU-dYJ;context-place=msg/rec.crafts.brewing/Xatd1hEp8e0/epyyqeV_kfQJ

    More from the yeast collector.


    A well-known, commercial English ale yeast, selected for its fast fermentation character and its ability to form a very compact sediment at the end of the fermentation, helping to improve beer clarity. This yeast is recommended for the production of a large range of ale beers and is especially well adapted to cask-conditioned ales and fermentation in cylindro-conical tanks.

  2. wh

    New flavour snippets, probably more information – I’m still looking but they say wlp051 is pastorianus!

    So for example wlp051 says:
    “WLP051 California V Ale Yeast. This strain has more similarities to an English strain than WLP001 California Ale Yeast®. It is a big ester producer, showcasing notes of cherry and apple which compliment pale ales, blonde and brown ales. Even in pale ales, this strain’s characteristic lower attenuation results in a full-bodied malt forward beer. Typically leaves some residual, lager-like sulfur compounds in finished beer. Recent sequencing studies show that WLP051 belongs to Saccharomyces pastorianus species, the same hybrid species as most lager strains. However, this strain has been used to make ales for decades and was previously categorized as belonging to Saccharomyces cerevisiae.”

  3. suregork Post author

    Wow, very interesting! Not surprising though that many ale strains are turning out to be lager strains (as the opposite was already shown). It could be Saaz, need to see if I can get my hands on WLP051 to test along with the other lager strains.

  4. suregork Post author

    Lots of great info, thanks a lot! I’m definitely no expert on English yeasts, so hopefully someone else can help.

    Feel free to print the tree as a poster! I tried printing it myself as an A3, but it was way too small. A2 might be readable, but A1 might be preferable 🙂

  5. wh

    I’m short on time but a cursory skim looks like wlp029 wlp030 and wlp041 are in the wrong places?

  6. suregork Post author

    Thanks! I’ll look through it tomorrow in detail. A quick glance suggests that most of the guesses were correct, but some of the more difficult strains (e.g. the ones you mentioned) are in the wrong place.

  7. qq

    It gets a bit complicated because they’ve obviously used a slightly different method to compile the family tree which results in the strains being ordered slightly differently, but usually it’s pretty obvious what’s going on even if it does require a small leap of faith. For instance around 4 o’clock, there’s a group of four including WLP540 “Rochefort” and a pair including WLP039 East Midlands that have been moved as blocks. There’s also been quite a lot of rearrangement in the Chicos (6 o’clock) and in the middle of Group 2 (the movement of the near-identical block of three Dupont strains is really obvious). WLP041 is in the same place, but the WLP002/7 group has been slightly rearranged.

    It adds a slight uncertainty, but I’m pretty confident that I’ve adjusted for it. So it looks like the two big ones we got wrong ended up next to each other, between Fuller’s and Redhook :
    WLP029 Kolsch – goes to BE094 from BE008 (which always felt a bit uncomfortably low for a code, but it put it next to WLP003 in the kolschy bit of the tree)
    WLP030 Thames Valley goes to BE090 from BE067 among the Chicos.

    WLP041 is correct but that leaves just one slot empty in the Fuller’s/Whitbread B group – other evidence suggests that Conan is in that group, so BE099 could well be WLP095 (or possibly WLP4000?).

    Also WLP585 Belgian Saison III has disappeared from Group 2 as it has been withdrawn from sale.

    SP010 is confirmed as WLP065 American Whiskey – previously I’d guessed it was either WLP065 or WLP070

    WLP005 and WLP006 suffer from careless cropping so can’t be confirmed – worth keeping an eye out for other versions of the tree from White Labs, but given how well the rest have worked out, I’m reasonably confident.

    Otherwise there’s a couple that were never mentioned in the lists in last year’s catalogue. Suregork has confirmed WLP099 through other sequences, but WLP013 London, WLP019 California IV, WLP076 Sonoma and WLP515 Antwerp were not on last year’s list assigning yeast to groups, and are not on the tree in this year’s catalogue. So we’re still in the dark on those. And of course WLP051 is now a pastorianus.

    But overall I don’t think we did too badly – well done all. I’m sure White Labs wouldn’t have released this data without the efforts on this blog. I guess it’s an example of Cunningham’s Law – the way to get the right answer on the internet is not to ask a question but to post the wrong answer.

    Now, if only we could get them to unblind the rest of it….

  8. Weehee

    I guess we have to wait to see if lallemand update or not, if they are equivalent.

    Had a bit of a google for wlp051, found this pdf thesis about Kveik.

    In it they say: “Vidgren et al. (2005, 2010) found that the AGT1 sequences from the ale strains encoded full-length (616 amino acid) polypeptides, while the lager strains encoded truncated (394 amino acid) polypeptides. The authors concluded that this particular AGT1 gene mutation producing a premature stop codon, is a characteristic of lager strains (Vidgren et al., 2010). This matches the result for the control strain NCYC456, that is a S. pastorianus strain, as the extra T was established. The majority of the other samples that carried the presumably defective gene, had only been classified to the genus level Saccharomyces, and it is therefore not possible to draw the same conclusion from these. Interestingly, two of the samples, NFAY 8_P1 and WLP028, also carried the particular gene mutation and were classified as S. cerevisiae.”

    Wlp051 wasn’t classed as cerevisiae. Gallone’s paper is used also and they made their own trees as far as I can see. I remember an old wyeast pdf with attenuations from a test; 1272 (wlp051) and 1728 (wlp028) are pretty much the same, if that changes anything?

  9. qq

    I wouldn’t get too carried away with WLP028 – for one thing BE060 (WLP028) is a mosaic strain. Also on p87 Lægreid goes on to say that 33 out of 136 Gallone sequences have the mutant AGT1 transporter. No indication Impressive little project though – I’m glad I didn’t have to write my masters’ project in Norwegian!

    Looking at the tree on p59, BE052,59,60,62,63,73,77,80,83,84,85,86,87,91,92,93,95,96,97 and SP08/9/10 among others all have the same or nearly the same AGT1 as BE060. In general the BE060 version is associated with higher maltotriose scores and it’s generally found in Group 2’s or the “funnies” – mosaics or the ones outside the main groups.

    I meant to say that previous post of mine was based on going through checking each strain individually.

    Going back to S-04 (why can’t we reply to individual posts now? Hmm – seems to be a Blogspot general change?) I’m in two minds. My first thought is to ask if this is another cockup by the French, especially given that they couldn’t get the name right – is this something that’s come via a third party?

    Certainly isomerization’s interdelta for S-04 is quite distinctive, and doesn’t look particularly like Fuller’s (no liquid Whitbread B were tested). But everyone seems convinced it is Whitbread.

    Much good though Graham Wheeler did, he was no great expert on yeast – in the above link he tries to make out that penicillin is a killer factor. It is a factor that kills, but it’s not a killer factor in the way that yeast people use the phrase. I also wouldn’t put too much weight on the top/bottom-cropping thing, it seems clear that commercial populations of yeast have a range of flocculation mutants and it’s fairly easy to select for one or other. In particular I was surprised to hear him talk about Whitbread B as a top-cropper – it may have been once upon a time but the whole point of Whitbread B was that it was one of the few yeast that suited tower fermentations, where you want a bottom cropper and from there its bottom-cropping characteristic suited the move to conicals.

    Whitbread dry refers to the colony morphology – and I’ve never seen any suggestion that it is anything other than Whitbread B.

    As for 1026 – it seems to be another of those strains where there’s a lot of conflicting stories, and it’s not certain it’s the same as WLP013 either. I’ve seen it suggested it’s the bottling yeast for White Shield, and then there’s the London Ale name. Personally I’m not going to sweat it too much, let’s just see what the PCRs say.

  10. suregork Post author

    Sorry, I changed the way comments are displayed (probably why you can’t reply to a single comment): from ‘nested’ to ‘linear’. This was because I couldn’t read the ‘nested’ comments on my phone as they were pushed so far to the right. I’ll see if I can somehow reactivate the replying.

  11. qq

    No don’t worry about it, it’s a lot easier to see new comments this way and nested comments got confusing because it was more like a nested/linear blend. At least this way we know where we are, it was just the change that confused me.

    If you were to change anything about the site, I’d shuffle round the modules on the right sidebar – tags and Untappd at the bottom, archives and links in the middle, latest comments and search up towards the top.

  12. Todd H.

    Just responding to the S-04 vs whitbread dry (007)… I did a side by side split batch with them and found that while flavours were similar, S-04 stripped bitterness out of the beer. Maybe that’s a characteristic gained since Fermentis started drying it, but I’ve taken it to mean they are two separate strains.
    Who knows.
    Great tree! Thanks!

  13. qq

    It’s funny what you notice when you look at things a second time. I’ve only just clicked that figure 2 in the Preiss et al kveik article from last year has a tree based on interdelta 12/21 and 2/12 PCRs:

    Obviously it’s not as sophisticated as a full genome tree, but it’s useful because they used a mix of WL and Wyeast strains as benchmarks. At 4 o’clock there’s an outlying group with 1318 London III, 2565 Kolsch and 1007 German Ale. It’s notable that 2565 has a reputation for being a bit hazy with good mouthfeel, one wonders if it is as good for NEIPAs as 1318.

    1007 is supposedly the same as WLP036 and has also been linked to K-97. A link to WLP036 would imply we’re looking at the mixed group, which also fits Preiss’ tree structure as a bit of an outlier. It also makes a bit of sense in the context of isomerization’s 2/12 interdeltas, 1318 and K-97 both have a big band at ~550bp which is pretty distinctive. So that points to a tentative working hypothesis of 1318 being a mixed strain close to WLP036.

    Preiss et al also have distinct groups for WLP570 (so Group 2?) and WLP001/090 (Group 1 US). Where it gets interesting is the British Group 1’s – in fact they seem to just have the Fullers/Whitbread B subgroup. 1272 American Ale II falls between WLP002 and WLP007/029, suggesting it’s somewhere in the region of WLP041 “Redhook” and WLP030. That’s interesting, because traditionally 1272 has been linked to WLP051 and BRY-97 but maybe it’s not related at all, to the WLP051 lager strain at least.

    It’s annoying we didn’t pick up on the 029 & 007 link sooner though, it would have led us straight to Beer094. ;-/

    PS White Labs have renamed their catalogue file to

  14. Pingback: An updated brewing yeast family tree | Suregork Loves Beer

  15. suregork Post author

    Well, that tree from the Preiss et al. preprint is based on the interdelta profiles as you say (which is only a rough view of the structure of the genome), while the Fay et al. study included the whole genome sequence of the strain. In this case the whole genome sequence will be much more accurate.

  16. Pingback: Brewing yeast family tree (Oct 2019 update) | Suregork Loves Beer
