Global Health Press

The missing piece in finding a vaccine: bioinformatics

Genetics research from Macquarie University could help pinpoint the ideal strain of the SARS-CoV-2 virus for vaccine development

As scientists race to develop a vaccine for the new virus known as SARS-CoV-2, new research from Macquarie University Associate Professor Denis Bauer and her colleagues explores how the virus is mutating – and how severe the different versions of the virus could be.

The data matter because researchers must choose which coronavirus strain to use when validating animal models, and those models will in turn be used to test vaccines that eventually go into humans.

“A successful vaccine should protect against the virus strains that are most representative of the ones most likely to be in high circulation in the future,” says Dr Bauer, an Honorary Associate Professor at Macquarie University’s Faculty of Medicine and Health Science.

Dr Bauer is a bioinformatician leading a group at the CSIRO, where her team has spent years developing computer models that crack genetic codes to help understand certain cancers and infectious diseases and, now, to decipher the genome of the novel coronavirus.

Like other coronaviruses, at the core of the SARS-CoV-2 virus is ribonucleic acid (RNA), which is less stable and will typically mutate more rapidly than DNA.

“RNA viruses usually mutate very fast, which is why we need a new version of flu vaccine each season, but fortunately coronaviruses have a sophisticated proof-reading mechanism enabling them to replicate quite faithfully – meaning less mutations slip through,” says Dr Bauer.

The mutations that do occur can establish distinct strains of the virus, which seem to already cluster in different parts of the world, but more research is needed to understand how these different strains might influence the symptoms and progression of the COVID-19 disease.

Adapting the algorithms

Dr Bauer’s team has adapted the algorithms they originally developed to analyse the human genome in motor neurone disease research.

They are using these new methodologies to identify the differences between over 180 separate genetic sequences of the SARS-CoV-2 virus.
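As a rough illustration of what identifying differences between sequences involves, the sketch below compares a few toy sequences against a reference and lists the positions where they differ. The sequences, the sample names and the function `find_differences` are invented for this example. It is not the team's algorithm, and it assumes the sequences have already been aligned to the same length, which real genomes of about 30,000 bases would first require.

```python
def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based, as is conventional in genomics.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy data, not real SARS-CoV-2 sequence.
reference = "AUGGCUAGCUUA"
samples = {
    "sample_A": "AUGGCUAGCUUA",
    "sample_B": "AUGACUAGCUUA",
    "sample_C": "AUGACUAGCCUA",
}

differences = {
    name: find_differences(reference, seq) for name, seq in samples.items()
}
for name, diffs in differences.items():
    print(name, diffs)
```

Samples that share the same set of differences, as sample_B and sample_C partly do here, are the kind of pattern that would suggest related strains.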

Most of these genetic sequences come from a not-for-profit influenza virus database, called the Global Initiative on Sharing All Influenza Data (GISAID), which stores genetic information supplied by researchers around the world.

“Finding a vaccine quickly relies on a worldwide co-operative effort,” Dr Bauer says – and she is hopeful that her field can make a valuable contribution.

“Using technology like artificial intelligence and cloud computing, we can perform billions of calculations in a short time, completing work in a few hours that previously would have taken days,” she says.

Bioinformatics analyses virus behaviour to find the best vaccine

Bioinformatics allows researchers to include additional, very important information about different strains of the virus along with the genetic code when they run calculations. By comparing multiple genomes from different patients, places, and times, researchers can work out what some of the extra information stored in the virus RNA might mean to patients.

Most of the genetic data from the SARS-CoV-2 virus doesn’t include additional details about the patients from whom the virus RNA was taken. But Dr Bauer says that her team can do far more useful research when the data includes not just the genomic sequences of the virus, but also details of clinical symptoms and patient characteristics (with identifying information removed).

“That lets us monitor the changes in the virus and helps us better understand the role that various genetic differences play in the disease’s progression,” she explains.

Virus adapts to human hosts

Bioinformatics can also let scientists work out which virus strains are most common, based on the genetic changes they carry. Testing vaccines against those strains is potentially more future-proof than the alternative of testing them against strains that are close to “patient zero,” or the original strain.
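One simple way to picture "most common based on genetic changes" is to group samples by the set of mutations they carry and count each group. The sketch below does this with invented data: the sample names, the mutation labels such as `G4A` and the function `rank_variants` are all hypothetical, and a real analysis would also have to weigh where and when each sample was collected.

```python
from collections import Counter

# Hypothetical data: each sample is described by the mutations it
# carries relative to a reference sequence (empty tuple = no changes).
sample_mutations = {
    "s1": ("G4A",),
    "s2": ("G4A", "U10C"),
    "s3": ("G4A",),
    "s4": (),
    "s5": ("G4A", "U10C"),
    "s6": ("G4A",),
}


def rank_variants(sample_mutations):
    """Rank mutation signatures by how many samples carry them.

    Returns a list of (signature, count, share_of_samples),
    most common first.
    """
    counts = Counter(sample_mutations.values())
    total = len(sample_mutations)
    return [
        (signature, n, n / total) for signature, n in counts.most_common()
    ]


ranking = rank_variants(sample_mutations)
for signature, n, share in ranking:
    print(signature or "(reference)", n, f"{share:.0%}")
```

In this toy set, the signature closest to the reference is the rarest, while a single-mutation variant accounts for half the samples, which is the kind of gap between "original" and "most common" the article describes.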

The original virus moved over from bats, probably via another animal vector, and early versions were probably still better adapted to the bat or intermediate host, Dr Bauer says.

“Newly evolving mutations make the virus better adapted to its new human host and they are more likely to become widespread as the virus moves around the human population,” she explains.

Fortunately, mutations that are less deadly to the host are also better for the long-term survival of the virus, she adds. “My guess is that it will become less harmful to humans over time because that helps it spread more effectively; but we need data to prove that.”

Call for information

Her team is calling on the international community to share more information about the genomic sequences they gather.

By analysing these sequences, researchers can identify which strain is likely to be closest to the original virus, which one is the most common and widespread, and which strain is the most harmful.

“SARS-CoV-2 has a relatively small genome compared to other organisms, but there are still over 30,000 bases to analyse,” says Dr Bauer.

“It’s a different challenge to analysing the three billion base pairs in the human genome; this time, we may only have 30,000 bases, but we probably soon will have tens of thousands of different samples to compare, so the scale is still huge, it’s just flipped around.”
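The "flipped around" scale can be checked with some back-of-the-envelope arithmetic. The genome sizes below are the round figures quoted above; the sample count of 20,000 is an assumption chosen only to stand in for "tens of thousands".

```python
HUMAN_GENOME_BASES = 3_000_000_000  # round figure quoted in the article
VIRUS_GENOME_BASES = 30_000         # round figure quoted in the article
ASSUMED_SAMPLES = 20_000            # hypothetical "tens of thousands"

# Total bases across all viral samples, and how that compares
# with a single human genome.
total_viral_bases = VIRUS_GENOME_BASES * ASSUMED_SAMPLES
ratio_to_human = total_viral_bases / HUMAN_GENOME_BASES

# Number of distinct sample pairs if every sample were compared
# with every other one.
pairwise_comparisons = ASSUMED_SAMPLES * (ASSUMED_SAMPLES - 1) // 2

print(f"{total_viral_bases:,} bases in total")
print(f"{ratio_to_human:.0%} of one human genome")
print(f"{pairwise_comparisons:,} sample pairs")
```

Under these assumptions the raw volume is a fraction of one human genome, and it would reach the same three billion bases at 100,000 samples. The harder part is the number of comparisons between samples, which grows roughly with the square of the sample count.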

Using the sophisticated statistics and computer science techniques of bioinformatics lets scientists sift through this unprecedented information efficiently to extract insights about the virus, Dr Bauer says.

“Learning about the properties of this virus fast is important – because time is a luxury we don’t have.”

Source: Macquarie University
