The Complex Origins of the SARS-CoV-2 Virus

Deepa Agashe, NCBS

Right from the beginning of the COVID-19 pandemic, everyone wanted to know where this new havoc-causing virus came from. Early genome sequencing made it clear that there was a strong connection to viruses that infect bats, because their sequences were very similar. But viruses accumulate mutations faster than most organisms, so even small differences in sequence matter. In addition, coronaviruses recombine at a very high rate, which makes it very challenging to identify their precise origin.

What is recombination and why is it so problematic? Imagine we want to trace your ancestry. You know that each of your parents contributed about half of your DNA sequence. But things are not organized neatly: each continuous piece of DNA (a chromosome) that a parent passed on to you was not inherited intact from one of your grandparents. Instead, when your parents’ egg and sperm cells were formed, their paired chromosomes recombined; i.e. some bits of the chromosomes were swapped. As a result, each chromosome you inherited is a mosaic of genetic material from two of your grandparents. In coronaviruses, such recombination happens very frequently (though not in the same way, since they do not reproduce sexually). If a host is infected by different viruses at the same time, there is a high probability that the new virus particles produced in the host’s body will mistakenly incorporate genetic material from both strains. After hundreds of cycles of infection, recombination and reproduction, you get a virus whose genome is so mixed up that it is difficult to trace its ancestry.
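
To make the idea of a mosaic genome concrete, here is a minimal sketch in Python. It is a toy illustration only, not a model of how coronaviruses actually copy their genomes: the function name `recombine`, the breakpoint positions and the placeholder "genomes" (strings of the letters A and B, which are labels for the two parents and not nucleotides) are all assumptions made for this example.

```python
def recombine(parent_a, parent_b, breakpoints):
    """Build a mosaic genome: copy from parent A, then switch to the
    other parent at every breakpoint (hypothetical toy model)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parent genomes must have the same length")
    parents = (parent_a, parent_b)
    pieces = []
    current = 0   # index of the parent currently being copied
    previous = 0  # position where the current piece started
    for point in sorted(breakpoints) + [len(parent_a)]:
        pieces.append(parents[current][previous:point])
        current = 1 - current  # switch to the other parent
        previous = point
    return "".join(pieces)


# Placeholder genomes: every letter simply records which parent it came from.
strain_a = "AAAAAAAAAA"
strain_b = "BBBBBBBBBB"

child = recombine(strain_a, strain_b, breakpoints=[3, 7])
print(child)  # prints AAABBBBAAA
```

Repeating this step over many generations, with different partners and different breakpoints each time, quickly produces a genome in which neighbouring stretches have different histories.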

In the case of the novel coronavirus, scientists had earlier found somewhat puzzling and conflicting results. On the one hand, much of its genome was similar to that of a virus isolated from a horseshoe bat in 2013. On the other hand, parts of the spike protein – which the virus uses to bind to human cells – resembled the protein from viruses found in pangolins. These results are a hallmark of recombination, where different parts of the genome are more closely related to different ancestors. But we still want to find out the origin of the “backbone” of the virus’ genome, to which bits and pieces from other viruses were added over time. And this is not easy.
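
The hallmark described above, where different stretches of a genome are closest to different relatives, can be illustrated with a small Python sketch. This is a deliberately simplified toy and not the method used in any of the studies cited here: the sequences are made up and already aligned, and the helper names `identity` and `closest_relative_by_window` are assumptions for this example.

```python
def identity(a, b):
    """Fraction of positions at which two aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_relative_by_window(query, relatives, window):
    """For each non-overlapping window, report which relative is most
    similar to the query in that stretch."""
    result = []
    for start in range(0, len(query) - window + 1, window):
        end = start + window
        scores = {name: identity(query[start:end], seq[start:end])
                  for name, seq in relatives.items()}
        best = max(scores, key=scores.get)
        result.append((start, end, best))
    return result


# Made-up toy sequences (not real viral data).
relative_a = "AAAAAAAAAAAAAAAAAAAA"
relative_b = "CCCCCCCCCCCCCCCCCCCC"
# The query matches relative A everywhere except one block copied from relative B.
query = "AAAAAAAAAACCCCCAAAAA"

windows = closest_relative_by_window(
    query, {"A": relative_a, "B": relative_b}, window=5)
for start, end, best in windows:
    print(f"positions {start}-{end - 1}: closest to relative {best}")
```

In this toy case three of the four windows point to relative A and one points to relative B, which is the kind of conflicting signal that suggests a recombined genome.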

In a new study, Boni and colleagues used three different methods to carefully identify pieces of the genome of the novel coronavirus that arose through recombination. They then assembled the remaining parts of the genome – the “backbone” – and found that it best matched the genomes of viruses from horseshoe bats. But what about the spike protein? Zooming in on different parts of this protein, they found that part of it was indeed more closely related to the pangolin coronaviruses. However, the explanation was that the spike protein of the bat coronavirus, rather than that of the novel coronavirus, had acquired a different piece through recombination.

So how long ago did these viruses share a common ancestor; i.e. when did the different lineages split from each other? This is difficult to estimate for viruses, not only because of their high mutation rate, but also because the rate at which mutations spread in the virus population fluctuates as the virus spreads and infects new hosts. The scientists measured these fluctuations using data from older coronavirus epidemics (MERS and SARS), which allowed them to refine their methods to date the lineages. They estimated that the novel coronavirus lineage most likely split from the closest bat-infecting viruses sometime between the 1940s and the 1980s. On the other hand, the pangolin-infecting virus lineage separated earlier, around the 1850s to 1870s.
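
The basic arithmetic behind such dating can be sketched in a few lines of Python. The sketch assumes a constant mutation rate (a "strict clock"), which is exactly the simplification that does not hold well for viruses, so it is not the procedure used in the study. The function name `years_since_split` and all the numbers are hypothetical values chosen for illustration, not estimates from any paper.

```python
def years_since_split(differences_per_site, rate_per_site_per_year):
    """Strict-clock estimate of the time since two lineages split.

    Both lineages accumulate changes independently after the split,
    so the distance between them grows at twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return differences_per_site / (2 * rate_per_site_per_year)


# Hypothetical inputs, for illustration only.
distance = 0.03          # differences per site between two genomes
sampling_year = 2020     # year in which the genomes were sampled

# The same distance gives very different dates under different assumed rates.
for rate in (3e-4, 6e-4):  # substitutions per site per year (made-up values)
    age = years_since_split(distance, rate)
    print(f"rate {rate}: split about {age:.0f} years ago, "
          f"around {sampling_year - age:.0f}")
```

Because the estimated date depends so strongly on the assumed rate, uncertainty in the rate translates directly into a wide range of possible split dates.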
 
Putting these pieces of the puzzle together, it appears that the novel coronavirus belongs to a group of viruses that may have already acquired the ability to infect humans over a century ago, and have circulated among wild animals since then. Different lineages of these viruses went on to infect pangolins, various bat species, and eventually humans when the opportunity arose. Thus, the novel coronavirus did not suddenly and quickly evolve to infect humans; it was ready and lurking among animal hosts for a long time. The good news is that continuous surveillance of animal pathogen reservoirs can provide early warning about potential zoonoses, before they make a successful jump and spread across the world.
 
References
Boni et al 2020, Nature Microbiology https://www.nature.com/articles/s41564-020-0771-4
Zhou et al 2020, Nature https://www.nature.com/articles/s41586-020-2012-7
Xiao et al 2020, Nature https://www.nature.com/articles/s41586-020-2313-x
Lam et al 2020, Nature https://www.nature.com/articles/s41586-020-2169-0

[Last updated 17 August 2020]