Introduction to Bioinformatics: Viral Proteins

The SARS-CoV-2 virus causes COVID-19. In clinics and labs around the world, people are trying to understand how it works. Students will follow a case study and use a computer database to study the genetic sequence of a novel virus.


Time: 100 minutes (two 50-minute sessions recommended)

Grade: 11th & 12th


  • Students will locate RNA sequences associated with a virus (stored as cDNA in the national database).
  • Students will use the NCBI website and database to examine information on a viral genome.
  • Students will communicate their results.


Students will learn about protein translation as it relates to an enveloped, positive-sense, single-stranded RNA virus (SARS-CoV-2). Students will follow a mock patient case and use online resources to learn more about a mysterious viral infection.


Teacher preparation

Make sure students will be able to access the National Center for Biotechnology Information (NCBI) website. Print out the Investigating Viral Proteins handout and the Mock Patient Chart.
Using the provided slideshow, Introduction to Bioinformatics: Viral Proteins:

Slide 2
Ask students: What is “disease”?

Answer: When something is out of balance in the body or homeostasis is not being maintained.

Ask students: What is the image on the left of the slide?

If pneumonia is not mentioned, call attention to the white lines in the lung spaces. Those opacities suggest a pneumonia infection.

Slide 3-6
Ask students to fill out the patient information on their mock patient charts.

After students have filled in the information on slide 4, ask students:

  • If this were one patient, how might a doctor react?
  • If this is the second patient a doctor has seen with this infection, how might they react?
  • If this is the 425th patient that the hospital has seen over three weeks with this infection, how might a doctor react?
  • What do you make of the observations?
  • What might be your next step if anti-microbial medicines are not working?

Slide 7
Ask students to answer questions in the “Is a virus alive?” section on the Investigating Viral Proteins handout.


  • Viruses consist of nucleic acid (DNA or RNA) enclosed in protective layers. In this case, there is an envelope (think plasma membrane) around a protein capsid. Inside these layers is a single-stranded RNA molecule that the virus uses to hijack a cell's machinery and replicate itself.
  • The virus has 29,881 RNA nucleotides (the genome is single-stranded, so these are not base pairs). That makes the genome small enough to fit inside the virus, but not large enough to hold all the genes a living organism needs to survive. The smallest “free-living” organism made so far has 473 genes. This virus can enter a cell and reproduce with only 28 proteins encoded in its genome.

Slide 8-9
Tell students:

  • DNA and RNA are “read” in a specific direction, set by the sugars in their backbones: ribose in RNA and deoxyribose in DNA. Just as English is read left to right, genetic code is read 5’ to 3’ (pronounced “five prime to three prime”).
  • Note: Consider a refresher on the orientation of the ribose ring to help students understand where the 3’ and 5’ labels come from (they refer to numbered carbons of the sugar).
  • Chemical structure determines the alignment of the nucleotides in a particular direction.
  • Positive-sense RNA from a virus can be read directly by a ribosome and translated into proteins (including enzymes that help make copies of the RNA).

Slide 10
Tell students:

  • Remember transcription and translation in eukaryotes: DNA → mRNA → protein.

Ask students:

  • In a positive-sense RNA virus, there is no DNA. How does it do things like transcription or translation?
  • How does the genome replicate itself, so that more viruses can be made?

Slide 11-13
Tell students:

  • SARS-CoV-2 is a positive-sense RNA virus. This allows ribosomes within an infected cell to translate the RNA directly into proteins. The proteins made by this virus are enzymes and other components that help it copy itself, including proteins involved in forming vesicles.
  • The RNA from the virus is also used as a template for transcription. It is transcribed into negative-sense (aka complementary) RNA and then the negative-sense RNA is again transcribed into positive-sense RNA. This new positive-sense RNA can be used as additional templates for either transcription or translation.
  • The translation of this virus’s proteins happens in two waves. First, the virus needs the infected cell’s ribosomes to translate its RNA into initial proteins so the virus can start transcription. Then, it transcribes sections of its genome to serve as templates for translating the remaining viral proteins.
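To make the positive-sense/negative-sense relationship concrete, a teacher could show students a short Python sketch like the one below. It is a simplified model under stated assumptions: the RNA fragment is made up for illustration (it is not taken from the SARS-CoV-2 genome), and RNA-to-RNA copying is reduced to the base-pairing rules (A with U, G with C) plus reversing the result so the new strand is also written 5’ to 3’.

```python
# RNA base-pairing rules: A pairs with U, G pairs with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna: str) -> str:
    """Return the complementary RNA strand, written 5' to 3'.

    Pairing base by base gives the new strand in 3'-to-5' order,
    so the bases are read in reverse to keep the 5'-to-3' convention.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(rna.upper()))


positive = "AUGGCUAGCUAA"                      # made-up positive-sense fragment
negative = complementary_strand(positive)      # negative-sense copy
new_positive = complementary_strand(negative)  # copy of the copy

print("positive-sense:", positive)
print("negative-sense:", negative)
print("copied back:   ", new_positive)
```

Copying the negative-sense strand gives back the original positive-sense sequence, which is why the new positive-sense RNA can serve as another template for transcription or translation.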

For the teacher’s information:

  • Because the RNA in a virus must follow the host cell’s rules of translation, an mRNA transcript cannot be translated into multiple proteins without a way to cut it apart. In a eukaryotic cell, the ribosome usually reads an mRNA from start to stop without any gaps (the introns have already been removed). The virus works within this pattern by keeping a leader sequence in front of each open reading frame (ORF) in its genome. These leader sequences attract the RNA-dependent RNA polymerase (RdRp), which transcribes an mRNA starting at each leader sequence and running to the end of the genome. Ribosomes translate the first protein on each of these variable-length transcripts and then fall off at the first stop codon. Because there is a leader sequence before each ORF, the set of transcripts is enough to make all the proteins necessary for viral function.
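The leader-sequence strategy can be modeled with a toy genome. This Python sketch is a simplification built on hypothetical pieces: the leader sequence CUAAAC and the three short ORFs are invented, each transcript is assumed to run from a leader sequence to the end of the genome, and the ribosome is assumed to start at the first AUG and stop at the first in-frame stop codon. Real coronavirus transcription is more complex than this.

```python
# Hypothetical toy genome: invented leader sequence and ORFs (not real viral sequence).
LEADER = "CUAAAC"
ORFS = ["AUGGCUUAA", "AUGUUUGGCUAG", "AUGAAAUGA"]
GENOME = "".join(LEADER + orf for orf in ORFS)

STOP_CODONS = {"UAA", "UAG", "UGA"}


def leader_positions(genome: str, leader: str) -> list[int]:
    """Return every index where the leader sequence starts."""
    positions = []
    start = genome.find(leader)
    while start != -1:
        positions.append(start)
        start = genome.find(leader, start + 1)
    return positions


def first_orf(mrna: str) -> str:
    """Return the first ORF: first AUG through the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""  # no in-frame stop codon, so no complete ORF


# Each transcript runs from one leader sequence to the end of the genome.
transcripts = [GENOME[pos:] for pos in leader_positions(GENOME, LEADER)]
for transcript in transcripts:
    print(f"{len(transcript):2d} nt -> ribosome translates {first_orf(transcript)}")
```

Each transcript is shorter than the one before it, but because every ORF has a leader in front of it, each ORF is the first one on some transcript, so all three get translated.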

Examples of transcription:

  • DNA to RNA: The typical way students learn about transcription. Performed by RNA polymerase.
  • RNA to DNA: Reverse transcription is used by HIV and some other viruses. It is also important in the lab for storing and cataloging RNA sequences; the DNA created is called complementary DNA (cDNA). Because cDNA is more stable and easier to store, you will not find U (uracil) if you look up a sequence in a national database of stored sequences. The enzyme that carries out this process is called reverse transcriptase.
  • RNA to RNA: In this virus, the enzyme that copies RNA into RNA is called RNA-dependent RNA polymerase (RdRp).

Slide 14-16
Tell students:

  • Translation (RNA → protein) is going to look familiar. Much of the same host-cell machinery is used. Because the virus cannot make proteins on its own, it relies on the cell’s machinery.
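For students who want to see translation as a lookup process, here is a minimal Python sketch of a ribosome reading mRNA 5’ to 3’, one codon at a time. It uses the standard genetic code, stored compactly as a 64-letter string in UCAG order (* marks stop codons). The function name translate and the example mRNA are illustrative only; they are not taken from NCBI or the lesson materials.

```python
# Standard genetic code, ordered by first, second, then third base (U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA 5' to 3' from the first AUG to the first stop codon.

    Returns one-letter amino acid codes; the stop codon itself is not included.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the ribosome releases the chain
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUAGCAAAUAAGG"))  # prints MASK
```

The leading GG is skipped because translation begins at the first AUG, and the ribosome releases the chain at the UAA stop codon.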

Discuss: What are some proteins?

Slide 17
This slide shows three examples of proteins. Ideally, most students will have heard of at least one.

Slide 18
Sequence analysis is now the standard approach for identifying a pathogen like this one. Before sequencing was widely available, biochemical analysis would have been used, but sequencing is now cheap and reliable.

Tell students:

  • Add a new test and observation to their patient charts:
  • Tests: Genomic sequence of isolated virus
  • Results: Expected
  • Observations: Sequence similar to the virus SARS-CoV-2
  • The RNA in the virus has been sequenced. The next step is to research this virus.

Ask students what they expect to see when they get a sequence.


  • T will replace U in the sequence because the database stores RNA sequences as cDNA. RNA is not stable for long because RNases (enzymes that break down RNA) degrade it.
  • When scientists sequence RNA, they will reverse transcribe or copy the RNA code into the complementary DNA code (cDNA), which is more stable.
  • When looking up viral RNA in a database, you are looking at cDNA, so you will not see any uracil in the code.
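Students can check the T-for-U point themselves. The Python sketch below assumes the sequence was copied from a FASTA-formatted record and is written in the same sense as the viral RNA (the usual convention for GenBank entries of positive-sense RNA viruses), so converting it to the RNA alphabet only requires swapping T for U. The header and sequence are hypothetical.

```python
def sequence_from_fasta(text: str) -> str:
    """Join the sequence lines of one FASTA record, skipping the '>' header."""
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def database_to_rna(sequence: str) -> str:
    """Rewrite a DNA-alphabet database sequence in the RNA alphabet.

    Assumes the stored sequence has the same sense as the viral RNA,
    so thymine (T) is simply replaced by uracil (U).
    """
    return sequence.upper().replace("T", "U")


# Hypothetical FASTA text, as it might look when copied from a database page.
fasta_text = """>example_record hypothetical viral fragment
ATGGCTAGCT
AATTGA
"""

dna_alphabet = sequence_from_fasta(fasta_text)
rna = database_to_rna(dna_alphabet)
print(dna_alphabet)
print(rna)
```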

Ask students: Where would you look for additional information about a virus?


  • Google
  • Ask an expert
  • Consult a government website

Where experts get their information:

  • Science journals
  • Scientific research
  • Bench research
  • Case studies
  • Computational research (biologists use the term “in silico” to describe research done on a computer)

Slide 19
Information about the National Center for Biotechnology Information (NCBI) database:

  • Scientists from around the world store genetic sequences in this database.
  • This database can be accessed by anyone.
  • It houses genetic sequences from many distinct species along with thousands of viruses.
  • Watch our short tutorial video to learn how to find and analyze DNA and protein sequences.

Slide 20
Students should watch and follow along with this tutorial to learn some basic information about the virus identified as SARS-CoV-2.

Slide 21
Tell students to gather information about one of the five coronavirus proteins identified in their Investigating Viral Proteins handout.

Let them search for information about the function of their assigned protein. After 5-10 minutes, remind students that their handout includes links to suggested abstracts about the functions of the assigned proteins:

  • E (Envelope Protein)
  • S (Spike Protein)
  • 3a
  • NSP1 (Non-structural protein 1)
  • NSP9


  • Use screenshots of the beginning of the sequences instead of complete sequences since many of them are exceptionally long.
  • When students have completed their research tasks, ask them to share what they know about the proteins they researched. If you decide not to do formal presentations with the information they gathered, make space to aggregate and display it.

Before you move on, check for understanding:

  • Were there details about the function of each of the five proteins assigned?

Answers for the questions on page 2 of the Investigating Viral Proteins handout:

How many open reading frames (ORF) were in this virus?

  • 18 ORF

How many proteins?

  • 29 proteins

Do you know what they all do?

  • Not all functions are well resolved

Why not?

  • Did anyone look for a protein that was not on the list of five?
  • Reiterate: Some of the proteins have little or no information about what they do. Scientists are working to understand how they work.

Extension opportunity:
The printable, Analysis of Coronavirus Evolution, was designed to replicate a standardized test question using SARS-CoV-2. Its content is relevant for a student completing an AP Biology class. It also introduces the types of analysis scientists do with the data available at NCBI, although the data set is heavily truncated, with only four viral entries. Researchers who analyze genomes may use hundreds or thousands of genomes in a single study, which takes a long time and substantial computing resources to analyze.

If you decide to use this extension with students who are not completing an advanced biology class, it’s recommended to answer the questions as a large group and remind students that sometimes answers are unclear and scientists must deal with that too.

Slide 22
Ask students to revisit their mock patient charts. Have students fill out the final diagnosis as COVID-19. Ask students to determine a treatment plan based on current best practices for COVID-19.

Slide 23
Sanford Research is involved in trying to understand many aspects of SARS-CoV-2. Many clinical trials are going on at Sanford Health to help determine the best approach to COVID-19 infections. In the research lab, scientists are trying to understand how the virus works. Watch an interview with a scientist currently studying SARS-CoV-2 proteins.

Sanford Connection

Sanford Health, along with Sanford Research, is studying the coronavirus in many ways, including how it affects the body, how best to treat it and how the virus works inside the body. We study viruses even when we’re not in a pandemic, although they are an especially important part of our research right now. We study how other viruses, like HPV, make people sick, and we use viruses to develop treatments such as gene therapies for rare diseases. We also use viruses to manipulate the genomes of human, animal or stem cells that we grow in tissue culture.



Access to Internet Service

Performance Expectations

HS-LS1-1 Construct an explanation based on evidence for how the structure of DNA determines the structure of proteins, which carry out the essential functions of life through systems of specialized cells.

Science & Engineering Practices

  • Constructing explanations and designing solutions
  • Engaging in argument from evidence

Core Ideas

LS1.A: Structure and Function (LS1: From Molecules to Organisms: Structures and Processes)

Crosscutting Concepts

  • Cause and effect
  • Structure and function