ORF predictions

To capture the full SARS-CoV-2 coding capacity, we applied a suite of ribosome profiling approaches to Vero E6 cells infected with SARS-CoV-2 for 5 and 24 hours, and to Calu3 cells infected for 7 hours. For each time point we prepared three different ribosome profiling (Ribo-seq) libraries, each in two biological replicates. Two of the libraries facilitate mapping of translation initiation sites: cells were treated with lactimidomycin (LTM) or harringtonine (Harr), two drugs that act through distinct mechanisms to prevent 80S ribosomes at translation initiation sites from elongating. The third library was prepared from cells treated with the translation elongation inhibitor cycloheximide (CHX) and gives a snapshot of actively translating ribosomes across the body of each translated ORF. In parallel, RNA sequencing was applied to map viral transcripts.

ORFs were predicted using two computational tools, PRICE and ORF-RATER, which rely on different features of ribosome profiling data, together with manual inspection of the data. The predictions are based on Ribo-seq libraries from two time points in two different cell lines (Vero E6 cells at 5 hours post infection (hpi) and Calu3 cells at 7 hpi), infected with separate virus isolates.

The Ribo-seq data from the 24-hour samples did not show the expected distribution of reads across viral genes and were therefore not used for ORF prediction.

For more details, see: Finkel, Y., Mizrahi, O., Nachshon, A. et al. The coding capacity of SARS-CoV-2. Nature (2020).

The raw data files are available from GEO under accession GSE149973.