An in-silico framework for comparing and validating transcripts predicted from single and paired-end reads

Anna Paola Carrieri, Stefano Beretta, Gianluca Della Vedova, Ernesto Picardi, Yuri Pirola, Raffaella Rizzi, Graziano Pesole, and Paola Bonizzoni, NGS 2012 (poster and abstract).

With the advent of high-throughput transcriptome sequencing (RNA-Seq), several computational methods have been proposed that use RNA-Seq data to assemble full-length mRNA isoforms, although none of them solves the problem completely. We have analyzed some of the most widely used tools, evaluating their performance and accuracy.

Our experimental analysis reveals that using GSNAP instead of TopHat yields more specific predictions, together with a minor (but statistically inconclusive) improvement in sensitivity.
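
As an illustration of these two measures, the following Python sketch computes transcript-level sensitivity and specificity under the convention common in gene-prediction evaluation, where specificity is the fraction of predictions that are correct, TP / (TP + FP). The function name `transcript_accuracy`, the exact-match criterion, and the toy coordinates are assumptions made for this example; they are not taken from EVAL or from the study.

```python
def transcript_accuracy(predicted, reference):
    """Return (sensitivity, specificity) at the transcript level.

    Each transcript is a tuple of (start, end) exon coordinates. A predicted
    transcript counts as correct only if it matches a reference transcript
    exactly (an assumption of this sketch).
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Sensitivity: fraction of reference transcripts that were recovered.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Specificity (gene-prediction convention): fraction of predictions that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
reference = [
    ((100, 200), (300, 400)),
    ((100, 200), (500, 600)),
    ((100, 250), (300, 400)),
]
predicted = [
    ((100, 200), (300, 400)),  # exact match with a reference transcript
    ((100, 200), (450, 600)),  # wrong exon boundary, counted as a false positive
]
sn, sp = transcript_accuracy(predicted, reference)
```

On this toy input one of three reference transcripts is recovered and one of two predictions is correct, so a more specific method is one that raises the second ratio without necessarily changing the first.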

We plan to extend our study (i) by introducing alternatives to EVAL for comparing predictions, (ii) by considering different kinds of simulated data (more coverage levels and/or errors), as well as real data, and (iii) by analyzing the structure of the predicted transcripts in more detail, since a preliminary study in this direction reveals that current methods have various shortcomings in assembling transcripts.