Phylogenetic Assignment of Named Global Outbreak Lineages

Phylogenetic Assignment of Named Global Outbreak Lineages (pangolin) is a software tool developed by members of the Rambaut Lab, with an associated web application developed by the Centre for Genomic Pathogen Surveillance.[1] Its purpose is to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature.[2] It assigns the most likely Pango lineage to a query sequence from a sample of SARS-CoV-2 (the virus that causes COVID-19) by comparing the sample's genome sequence with other genome sequences.[3] Rambaut et al. (2020)[2] describe a Pango lineage as a cluster of sequences associated with an epidemiological event, for instance an introduction of the virus into a distinct geographic area with evidence of onward spread. Lineages are designed to capture the emerging edge of the pandemic, at a fine-grained resolution suitable for genomic epidemiological surveillance and outbreak investigation.

Description

Lineage designation

Distinct from the pangolin tool, Pango lineages are manually curated on a regular basis, based on the current globally circulating diversity. A large phylogenetic tree is constructed from an alignment of publicly available SARS-CoV-2 genomes, and sub-clusters of sequences in this tree are manually examined and cross-referenced against epidemiological information to designate new lineages. New lineages can be designated by data producers, and lineage suggestions can be submitted to the database.[4]

Model training

These manually curated lineage designations, together with the associated genome sequences, are the input to the machine-learning model training. The model, covering both training and assignment, has been termed 'pangoLEARN'. The current version of pangoLEARN uses a classification tree, based on the scikit-learn implementation[5] of a decision tree classifier.
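As an illustration of the general technique, the following minimal sketch fits scikit-learn's DecisionTreeClassifier to a handful of toy aligned sequences. It is not the pangoLEARN training code: the one-hot encoding, the toy sequences, the lineage labels attached to them, and the helper names one_hot and assign are all assumptions made for the example.

```python
from sklearn.tree import DecisionTreeClassifier

ALPHABET = "ACGT"  # any other character (N, gap) encodes as all zeros


def one_hot(seq):
    """Flatten an aligned sequence into a 0/1 vector, four entries per site."""
    return [int(base == letter) for base in seq.upper() for letter in ALPHABET]


# Toy aligned sequences with made-up lineage labels (assumption: not real data).
training = [
    ("ACGTACGT", "A"),
    ("ACGTACGA", "A"),
    ("ACTTACGT", "B"),
    ("ACTTACGA", "B"),
    ("ACTTCCGT", "B.1"),
    ("ACTTCCGA", "B.1"),
]

X = [one_hot(seq) for seq, _ in training]
y = [label for _, label in training]

# Fit a classification tree on the curated (sequence, lineage) pairs.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)


def assign(seq):
    """Return the lineage label the fitted tree predicts for one sequence."""
    return str(model.predict([one_hot(seq)])[0])


print(assign("ACTTCCGT"))
```

Every query must be aligned to the same coordinates as the training sequences so that each feature column refers to the same genome position.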

Lineage assignment

Originally, pangolin used a maximum-likelihood-based assignment algorithm to assign the most likely lineage to each query SARS-CoV-2 sequence. Since July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. This approach is fast and can assign 10,000 SARS-CoV-2 genomes in 10 minutes.
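The quoted throughput can be restated per genome with simple arithmetic. The sketch below does so and, as an assumption not made in the source, extrapolates linearly to other batch sizes; the function name estimated_minutes is hypothetical.

```python
GENOMES = 10_000  # batch size quoted above
MINUTES = 10      # runtime quoted above

per_minute = GENOMES / MINUTES                # genomes assigned per minute
seconds_per_genome = MINUTES * 60 / GENOMES   # average time per genome


def estimated_minutes(n_genomes):
    """Estimate runtime, assuming it scales linearly with batch size."""
    return n_genomes / per_minute


print(per_minute, seconds_per_genome, estimated_minutes(50_000))
```

Real runtimes depend on hardware and on fixed start-up costs, so the linear estimate is only a rough guide.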

Availability

pangolin is available as a command-line tool, downloadable from Conda and from a GitHub repository,[6] and as a web application[7] with a drag-and-drop graphical user interface. The pangolin web application had assigned more than 512,000 unique SARS-CoV-2 sequences as of January 2021.
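A minimal sketch of driving the command-line tool from Python is given below. It assumes that pangolin is installed and on the PATH, that it accepts a query FASTA file with the --outdir and --outfile options, and that its CSV report has taxon and lineage columns; these details should be checked against the help text of the installed version. The function names run_pangolin and read_lineages and the sample report content are hypothetical.

```python
import csv
import io
import subprocess


def run_pangolin(fasta_path, outdir=".", outfile="lineage_report.csv"):
    """Run the pangolin CLI on a FASTA file (assumes pangolin is on PATH)."""
    cmd = ["pangolin", fasta_path, "--outdir", outdir, "--outfile", outfile]
    subprocess.run(cmd, check=True)


def read_lineages(report_text):
    """Map each sequence name to its assigned lineage from report CSV text."""
    reader = csv.DictReader(io.StringIO(report_text))
    return {row["taxon"]: row["lineage"] for row in reader}


# Illustrative report content with made-up sequence names (not real output).
example_report = "taxon,lineage\nsample_1,B.1.1.7\nsample_2,B.1\n"
lineages = read_lineages(example_report)
print(lineages)
```

In practice, run_pangolin would be called first and the resulting report file read from disk before being passed to read_lineages.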

Developers

pangolin was created by Áine O'Toole and the Rambaut Lab on 5 April 2020. Its main developers are Áine O'Toole and Emily Scher; many others have contributed to various aspects of the tool, including Ben Jackson, J.T. McCrone, Verity Hill, and Rachel Colquhoun of the Rambaut Lab.

The pangolin web application was developed by the Centre for Genomic Pathogen Surveillance, namely Anthony Underwood, Ben Taylor, Corin Yeats, Khali Abu-Dahab, and David Aanensen.

pangolin has been used extensively during the COVID-19 pandemic.[2][8][9]

References

  1. "Real-Time Epidemiology for COVID-19". www.pathogensurveillance.net. Retrieved 22 January 2021.
  2. Rambaut, A.; Holmes, E.C.; O’Toole, Á.; et al. (2020). "A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology". Nature Microbiology. 5 (11): 1403–1407. doi:10.1038/s41564-020-0770-5. PMID 32669681. S2CID 220544096.
  3. "Pangolin web application release". virological.org. Retrieved 22 January 2021.
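  4. "cov-lineages/pangoLEARN". GitHub.
  5. "sklearn.tree.DecisionTreeClassifier". scikit-learn documentation. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
  6. "cov-lineages/pangolin". GitHub. https://github.com/cov-lineages/pangolin
  7. "Pangolin web application". https://pangolin.cog-uk.io/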
  8. Pipes, Lenore; Wang, Hongru; Huelsenbeck, John P; Nielsen, Rasmus (9 December 2020). Malik, Harmit (ed.). "Assessing Uncertainty in the Rooting of the SARS-CoV-2 Phylogeny". Molecular Biology and Evolution. Oxford University Press (OUP). doi:10.1093/molbev/msaa316. ISSN 0737-4038.
  9. Jacob, Jobin John; Vasudevan, Karthick; Pragasam, Agila Kumari; Gunasekaran, Karthik; Kang, Gagandeep; Veeraraghavan, Balaji; Mutreja, Ankur (22 December 2020). "Evolutionary tracking of SARS-CoV-2 genetic variants highlights intricate balance of stabilizing and destabilizing mutations". bioRxiv 10.1101/2020.12.22.423920. Quote: "Phylogenetic Assignment of Named Global Outbreak LINeages tool (pangolin) has been the most widely used tool for lineage assignment to newly emerging variants."
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.