We combined fluorescence-activated cell sorting with high-throughput DNA sequencing (FACS-seq) to determine the efficiency of start codon recognition for all possible translation initiation sites (TIS) that use an AUG start codon. Using FACS-seq, we measured translation from a genetic reporter library representing all 65,536 possible TIS sequences spanning the -6 to +5 positions (eight variable positions flanking the fixed AUG, 4^8 = 65,536). We found that the motif RYMRMVAUGGC enhanced start codon recognition and translation efficiency. However, dinucleotide interactions, which cannot be conveyed by a single motif, were also important for modeling TIS efficiency. Our dataset, combined with modeling, allowed us to predict genome-wide translation initiation efficiency for all mRNA transcripts. Additionally, we screened somatic TIS mutations associated with tumorigenesis to identify candidate driver mutations consistent with known tumor expression patterns. Finally, we implemented a quantitative leaky scanning model to predict alternative initiation sites that produce truncated protein isoforms, and we compared its predictions with ribosome footprint profiling data. This comprehensive analysis of the TIS sequence space enables quantitative predictions of translation initiation based on genome sequence.
View details for DOI 10.15252/msb.20145136
View details for PubMedID 25170020
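The motif RYMRMVAUGGC in the abstract above is written in IUPAC degenerate nucleotide codes (R = A/G, Y = C/U, M = A/C, V = A/C/G). The following minimal sketch shows how the notation expands and how the library size follows from the eight variable positions. It is an illustration only, not the authors' efficiency model: the function names `matches_motif`, `library_size`, and `count_motif_matches` are hypothetical, and a motif match says nothing about the measured efficiency of a given site, which the abstract notes also depends on dinucleotide interactions.

```python
from itertools import product

# IUPAC degenerate nucleotide codes needed for the motif (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",   # purine
    "Y": "CU",   # pyrimidine
    "M": "AC",   # amino
    "V": "ACG",  # any base except U
}

# Motif from the abstract: positions -6..-1, the AUG start codon, then +4 and +5.
MOTIF = "RYMRMVAUGGC"


def matches_motif(tis: str, motif: str = MOTIF) -> bool:
    """Return True if an 11-nt TIS fits the degenerate motif (hypothetical helper)."""
    tis = tis.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(tis) != len(motif):
        return False
    return all(base in IUPAC[code] for base, code in zip(tis, motif))


def library_size(n_variable: int = 8) -> int:
    """Number of sequences when n_variable positions each take one of 4 bases."""
    return 4 ** n_variable


def count_motif_matches() -> int:
    """Enumerate every -6..+5 TIS with a fixed AUG and count motif matches."""
    count = 0
    for upstream in product("ACGU", repeat=6):        # positions -6..-1
        for downstream in product("ACGU", repeat=2):  # positions +4, +5
            tis = "".join(upstream) + "AUG" + "".join(downstream)
            if matches_motif(tis):
                count += 1
    return count


if __name__ == "__main__":
    print(library_size())                # size of the full TIS library
    print(matches_motif("GCCACCAUGGC"))  # a Kozak-like context
    print(count_motif_matches())         # library members that fit the motif
```

Only a small fraction of the library fits the motif exactly, which is one way to see why a single consensus sequence cannot stand in for measurements across the full sequence space.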