Chinese scientists have used artificial intelligence (AI) algorithms to identify more than 160,000 RNA viruses, improving the ability to quickly pinpoint potential pathogens during outbreaks.
On October 9, Sun Yat-sen University announced that a research team led by Professor Shi Mang of its medical school, working with Li Zhaorong's team at Alibaba Cloud, had published its findings in the journal Cell. The paper reports the discovery of 180 viral supergroups and more than 160,000 RNA viruses, significantly expanding the known diversity of RNA viruses worldwide.
Traditional virus discovery methods, including virus isolation and bioinformatic analysis of genomic data, rely heavily on existing knowledge, which makes them inefficient at identifying highly diverse and rapidly mutating RNA viruses. The research team developed the LucaProt AI algorithm, which applies deep learning to viral and non-viral genomic sequences so that it can autonomously identify viral sequences within datasets.
Using this algorithm, the team analyzed 10,487 RNA sequencing samples from diverse environments around the world and discovered more than 510,000 viral genomes, representing more than 160,000 potential virus species and 180 RNA viral supergroups. Notably, 23 of these supergroups could not be identified through traditional homology-based methods and have been described as the "dark matter" of the viral realm.
Further analysis revealed the longest RNA viral genome documented to date, measuring 47,250 nucleotides. The team also identified previously unknown genomic structures, highlighting the evolutionary flexibility of RNA virus genomes, and detected multiple viral functional proteins, particularly ones associated with bacteria, indicating that many types of RNA phages remain to be explored. The study also found that RNA viruses remain abundant and diverse even in extreme environments such as Antarctic sediments, deep-sea hydrothermal vents, activated sludge, and saline-alkaline flats.
"The AI algorithm model has the ability to uncover viruses that we previously overlooked or were entirely unaware of, which is crucial for disease control and rapid identification of new pathogens," Shi Mang explained. "Our research shows that the diversity of viruses far exceeds human imagination; what we currently observe is just the tip of the iceberg, and we may see significant adjustments in the future viral classification system."