Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. Nature 579, 270–273 (2020). This new approach classifies the newly sequenced genome against all the diverse lineages present instead of a representative selection of sequences. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Means and 95% HPD intervals are 0.080 [0.058–0.101] and 0.530 [0.304–0.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.109–0.180] and 0.154 [0.093–0.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). Conservatively, we combined the three BFRs >2 kb identified above into non-recombining region 1 (NRR1). Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2. … 3) clusters with viruses from provinces in the centre, east and northeast of China. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. Region B showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. Intraspecies diversity of SARS-like coronaviruses in Rhinolophus sinicus and its implications for the origin of SARS coronaviruses in humans. Extended Data Fig. 5: Comparisons of GC content across taxa. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses.
Nucleotide positions for phylogenetic inference are 147–695, 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree) and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). Posterior means with 95% HPDs are shown in Supplementary Information Table 2. A reduced sequence set of 25 sequences chosen to capture the breadth of diversity in the sarbecoviruses (obvious recombinants not involving the SARS-CoV-2 lineage were also excluded) was used because GARD is computationally intensive. These authors contributed equally: Maciej F. Boni, Philippe Lemey. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. Since the release of Version 2.0 in July 2020, however, Pangolin has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. Its origin and direct ancestral viruses have not been … Bryant, D. & Moulton, V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage. Pangolins may have incubated the novel coronavirus, gene study shows. The web application was developed by the Centre for Genomic Pathogen Surveillance. … wrote the first draft of the manuscript, and all authors contributed to manuscript editing.
Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n = 27 sequences spread over 4 years (MERS-CoV) and n = 27 sequences spread over 49 years (HCoV-OC43). Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. Extended Data Fig. 1: Phylogenetic relationships in the C-terminal domain (CTD). Evolutionary rate estimation can be profoundly affected by the presence of recombination (ref. 50). We extracted a total of 2189 full-length SARS-CoV-2 viral genomes from various states of India from the EpiCov repository of the GISAID initiative on 12 June 2020. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses (refs. 43, 44, 52), slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent for the current coronavirus disease (COVID-19) pandemic that has affected more than 35 million people and caused … We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. …). Conducting analogous analyses of codon usage bias as Ji et al. …
Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63 sequences. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 1879–1999), 1969 (95% HPD: 1930–2000) and 1982 (95% HPD: 1948–2009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. … 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. There is a 90% genome sequence match between SARS-CoV-2 and a coronavirus in pangolins. SARS-CoV-2 is an appropriate name for the new coronavirus. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. Volume 5, pages 1408–1417 (2020). PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628. SARS-like WIV1-CoV poised for human emergence. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding … Boxplots show interquartile ranges, white lines are medians and box whiskers show the full range of posterior distribution. The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. DRAGEN COVID Lineage App: this app aligns reads to a SARS-CoV-2 reference genome and reports coverage of targeted regions.
… 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods). Cross-species transmission of the newly identified coronavirus 2019-nCoV. … is funded by the National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. …). When the genomic data included both coding and non-coding regions we used a single GTR+Γ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+Γ model for each partition with a separate gamma model to accommodate inter-site rate variation. Identifying the origins of an emerging pathogen can be critical during the early stages of an outbreak, because it may allow for containment measures to be precisely targeted at a stage when the number of daily new infections is still low. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. The Bat, the Pangolin and the City: A Tale of COVID-19. To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13. To evaluate the performance of the procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test (ref. 64), (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net (ref. 65) and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. Schierup, M. H. & Hein, J. Recombination and the molecular clock. Coronavirus: Pangolins found to carry related strains.
Specifically, using a formal Bayesian approach (ref. 42; see Methods), we estimate a fast evolutionary rate (0.00169 substitutions per site per year, 95% highest posterior density (HPD) interval (0.00131, 0.00205)) for SARS viruses sampled over a limited timescale (1 year), a slower rate (0.00078 (0.00063, 0.00092) substitutions per site per year) for MERS-CoV on a timescale of about 4 years and the slowest rate (0.00024 (0.00019, 0.00029) substitutions per site per year) for HCoV-OC43 over almost five decades. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. Nature Microbiology (Nat Microbiol). Unfortunately, a response that would achieve containment was not possible. Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. Lam, H. M., Ratmann, O. & Boni, M. F. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. …) … Coronavirus: Pangolins found to carry related strains - BBC News. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13.
Pangolin allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. The research leading to these results received funding (to A.R. … In the absence of any reasonable prior knowledge on the TMRCA of the sarbecovirus datasets (which is required for grid specification in a skygrid model), we specified a simpler constant size population prior. Collectively our analyses point to bats being the primary reservoir for the SARS-CoV-2 lineage. Coronavirus: Pangolins may have spread the disease to humans. Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real time (4). The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. However, formal testing using marginal likelihood estimation (ref. 41) does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. The Artic Network receives funding from the Wellcome Trust through project no. … Posterior distributions were approximated through Markov chain Monte Carlo sampling, with chains run sufficiently long to ensure effective sample sizes >100. An initial genomic sequence analysis found that the reemergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 from the (now ancestral) lineage B.1.1.1 of the Pango nomenclature (17). … acknowledges support by the Research Foundation – Flanders (Fonds voor Wetenschappelijk Onderzoek – Vlaanderen (nos. …)).
A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses (refs. 5, 7). This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. We thank A. Chan and A. Irving for helpful comments on the manuscript. These rate priors are subsequently used in the Bayesian inference of posterior rates for NRR1, NRR2, and NRA3 as indicated by the solid arrows. Correspondence to Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. Specifically, progenitors of the RaTG13/SARS-CoV-2 lineage appear to have recombined with the Hong Kong clade (with inferred breakpoints at 11.9 and 20.8 kb) to form the CoVZXC21/CoVZC45 lineage. Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Sibling lineages to RaTG13/SARS-CoV-2 include a pangolin sequence sampled in Guangdong Province in March 2019 and a clade of pangolin sequences from Guangxi Province sampled in 2017. Researchers in the UK had just set the scientific world … Overview of the SARS-CoV-2 genotypes circulating in Latin America. … performed recombination analysis for non-recombining regions 1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces.
… from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 725422-ReservoirDOCS). Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Figure 1 (top) shows the distribution of all identified breakpoints (using 3SEQ's exhaustive triplet search) by the number of candidate recombinant sequences supporting them. … performed recombination analysis for non-recombining alignment 3, calibration of rate of evolution and phylogenetic reconstruction and dating. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus … The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. Influenza viruses reassort (ref. 17) but they do not undergo homologous recombination within RNA segments (refs. 18, 19), meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments. Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. Extended Data Fig. 4: TMRCAs for SARS-CoV and SARS-CoV-2. The 2009 influenza pandemic and subsequent outbreaks of MERS-CoV (2012), H7N9 avian influenza (2013), Ebola virus (2014) and Zika virus (2015) were met with rapid sequencing and genomic characterization. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. … (2020) with additional (and higher quality) snake coding sequence data and several miscellaneous eukaryotes with low genomic GC content failed to find any meaningful clustering of SARS-CoV-2 with snake genomes (a).
Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Calibration of priors can be performed using other coronaviruses (SARS-CoV, MERS-CoV and HCoV-OC43), but estimated rates vary with the timescale of sample collection. Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign query SARS-CoV-2 sequences the most likely lineage. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. The lineage B.1 has been the major basal and widespread lineage from the initial SARS-CoV-2 spread and it became the more prevalent lineage in Colombia (13), while the B.1.111 lineage, first detected in the USA from a sample collected on March 7, 2020 and subsequently in Colombia on March 13, 2020, is currently circulating and mainly represented … We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003).