A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information

A fast and practical approach to genotype phasing and imputation on a pedigree with erroneous and incomplete information, Yuri Pirola, ICCABS 2012 (slides).

This work proposes the Min-Recombinant Haplotype Configuration with Bounded Errors problem (MRHCE), which extends the original Min-Recombinant Haplotype Configuration (MRHC) formulation by incorporating two common characteristics of real data: genotyping errors and missing genotypes (including untyped individuals). We describe a practical algorithm for MRHCE that is based on a reduction to the Boolean Satisfiability problem (SAT) and exploits recent advances in the constraint programming literature. An experimental analysis demonstrates the soundness of our model and the effectiveness of the algorithm under several scenarios. The analysis of real data and the comparison with state-of-the-art programs reveal that our approach combines better scalability on large and complex pedigrees with the explicit inclusion of genotyping errors in the model. The software, released under the GNU General Public License, can be freely downloaded from this page.
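The quantity that MRHCE minimizes, the number of recombinations, can be illustrated on a single parent-child transmission. The sketch below is not the SAT-based algorithm described above; it is a small dynamic program, assuming biallelic SNPs coded 0/1 and a known phase for the parent, that counts the fewest switches between the two parental haplotypes needed to produce the child's inherited haplotype. The function name min_recombinations is hypothetical. A locus whose allele matches neither parental haplotype is reported as a Mendelian inconsistency, the kind of event MRHCE would explain as a genotyping error.

```python
from math import inf
from typing import Optional, Sequence


def min_recombinations(p0: Sequence[int], p1: Sequence[int],
                       child: Sequence[int]) -> Optional[int]:
    """Fewest recombinations for `child` to be inherited from a parent
    with haplotypes p0 and p1 (hypothetical helper, alleles coded 0/1).

    Returns None if some child allele matches neither parental haplotype
    (a Mendelian inconsistency, i.e. a candidate genotyping error)."""
    if not (len(p0) == len(p1) == len(child)):
        raise ValueError("haplotypes must have the same length")
    # cost[s]: fewest switches so far, given the current locus was copied from p_s
    cost = [0, 0]
    for i, allele in enumerate(child):
        ok = (p0[i] == allele, p1[i] == allele)
        if not any(ok):
            return None
        new = [inf, inf]
        for s in (0, 1):
            if ok[s]:
                # either stay on the same parental haplotype or switch (+1 recombination)
                new[s] = min(cost[s], cost[1 - s] + 1)
        cost = new
    return int(min(cost))


print(min_recombinations([0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]))
```

Homozygous parental loci (where p0 and p1 agree) leave the source undetermined, which is why the program keeps both states open instead of committing greedily; MRHCE then sums such counts over every transmission in the pedigree.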

PIntron: a fast method for gene structure prediction via maximal pairings of a pattern and a text

PIntron: a fast method for gene structure prediction via maximal pairings of a pattern and a text, Yuri Pirola, ICCABS 2011 (slides).

In this work, we propose a novel pipeline for computational gene-structure prediction based on spliced alignment of expressed sequences (ESTs and mRNAs). This pipeline, called PIntron, is composed of four steps. First, alternative alignments of expressed sequences to a reference genomic sequence are implicitly computed and represented in a graph (called the embedding graph) by a novel, fast spliced-alignment procedure. Second, biologically meaningful alignments are extracted. Third, a consensus gene structure induced by the previously computed alignments is determined according to a parsimony principle. Finally, the resulting introns are reconciled and classified according to general biological criteria. The software, released under the GNU Affero General Public License, can be freely downloaded from this page.
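As an illustration of the last step, one widely used biological criterion for classifying introns is the pair of dinucleotides at the donor and acceptor splice sites. The sketch below assumes that criterion; the exact rules applied by PIntron are not described here, and the function name classify_intron and its 0-based, half-open coordinates are hypothetical.

```python
def classify_intron(genome: str, start: int, end: int) -> str:
    """Classify the intron genome[start:end] (0-based, half-open) by its
    donor (first two bases) and acceptor (last two bases) dinucleotides.
    Hypothetical helper; PIntron's own criteria may differ."""
    intron = genome[start:end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to have both splice sites")
    donor, acceptor = intron[:2], intron[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG"   # canonical splice sites
    if (donor, acceptor) == ("GC", "AG"):
        return "GC-AG"   # common non-canonical variant
    if (donor, acceptor) == ("AT", "AC"):
        return "AT-AC"   # typical of U12-type (minor spliceosome) introns
    return "other"


print(classify_intron("AAAGTCCCCAGTTT", 3, 11))
```

In a full pipeline the coordinates would come from the consensus gene structure, so that each intron is classified once after reconciliation rather than once per aligned expressed sequence.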


Haplotype Inference on Pedigrees with Recombinations and Mutations

Haplotype Inference on Pedigrees with Recombinations and Mutations, Yuri Pirola, WABI 2010 (slides).

This work proposes a new combinatorial formulation for haplotype inference (HI) on general pedigrees that generalizes the existing combinatorial formulations to more realistic settings. We prove an approximation-preserving reduction from this problem, called Minimum Change Haplotype Configuration (MCHC), to a well-known coding theory problem, the Nearest Codeword Problem. This reduction immediately implies that MCHC is approximable within a factor of O(n/log n) and, more importantly, it leads to a new, very efficient heuristic algorithm that has been experimentally compared with other state-of-the-art HI methods. The software implementing the heuristic is freely available at this page under the GNU General Public License.
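For readers unfamiliar with the target problem: given a binary generator matrix G (k rows, n columns) and a target vector y in {0,1}^n, the Nearest Codeword Problem asks for a message x in {0,1}^k whose codeword xG, computed over GF(2), is at minimum Hamming distance from y. The sketch below solves it by exhaustive search over all 2^k messages, so it only suits tiny instances. It does not reproduce how G and y are built from a pedigree, nor the heuristic mentioned above; the function name nearest_codeword is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def nearest_codeword(G: Sequence[Sequence[int]],
                     y: Sequence[int]) -> Tuple[List[int], int]:
    """Exhaustive Nearest Codeword search over GF(2) (hypothetical helper).

    Returns (x, d): a message x minimizing d = Hamming distance(xG, y);
    ties are broken by the first message in lexicographic order."""
    k, n = len(G), len(y)
    best_x: List[int] = []
    best_d = n + 1
    for x in product((0, 1), repeat=k):
        # codeword c = x * G, with arithmetic modulo 2
        c = [sum(x[i] & G[i][j] for i in range(k)) % 2 for j in range(n)]
        d = sum(cj != yj for cj, yj in zip(c, y))
        if d < best_d:
            best_x, best_d = list(x), d
    return best_x, best_d


print(nearest_codeword([[1, 1, 0], [0, 1, 1]], [1, 0, 1]))
```

Because the reduction is approximation-preserving, any approximation or heuristic for this coding problem carries over to MCHC, which is what makes the reduction useful beyond this brute-force view.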


Pure Parsimony Xor Haplotyping

Pure Parsimony Xor Haplotyping, Yuri Pirola, ISBRA 2009 (slides).

In this work we address the problem of haplotype inference from xor genotypes under the pure parsimony assumption. We propose exact algorithms for restricted instances, a fixed-parameter algorithm, an approximation algorithm, and an effective heuristic. A prototype implementation of the heuristic is freely available at this page under the GNU General Public License.
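A xor genotype records only whether an individual is heterozygous at each SNP, so it equals the bitwise XOR of the individual's two haplotypes. Under pure parsimony, the goal is the smallest set H of haplotypes such that every xor genotype g can be written as h XOR h' with h and h' in H. The sketch below, assuming genotypes encoded as integers (bit j set when SNP j is heterozygous), checks this condition and finds a minimum H by exhaustive search over subsets; it is exponential and meant only for toy inputs, not a rendering of the algorithms listed above. The names explains and min_xor_haplotypes are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set


def explains(H: Set[int], genotypes: Iterable[int]) -> bool:
    """True if each xor genotype g equals h ^ h2 for some h, h2 in H.
    (h ^ g lies in H exactly when g = h ^ (h ^ g) with both haplotypes in H.)"""
    return all(any((h ^ g) in H for h in H) for g in genotypes)


def min_xor_haplotypes(genotypes: Iterable[int], m: int) -> Set[int]:
    """Smallest haplotype set over m SNPs explaining all xor genotypes.
    Hypothetical brute force: tries every subset of {0,1}^m by size."""
    genotypes = list(genotypes)
    for size in range(1, (1 << m) + 1):
        for H in combinations(range(1 << m), size):
            if explains(set(H), genotypes):
                return set(H)
    # unreachable: {0} together with every genotype always works
    raise AssertionError("no solution found")


print(min_xor_haplotypes([0b011, 0b110, 0b101], 3))
```

Since XOR-ing every haplotype in a solution with the same vector leaves all pairwise XORs unchanged, solutions come in equivalent families; exact and parameterized algorithms can exploit this symmetry, for instance by assuming the all-zero haplotype is present.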

Minimum Factorization Agreement of Spliced ESTs

Minimum Factorization Agreement of Spliced ESTs, Yuri Pirola, WABI 2009 (slides).

This work presents a new parsimony-based formulation of the problem of choosing the best alignment of an expressed sequence against a genomic sequence, exploiting the redundancy of current expressed-sequence databases. A preliminary experimental evaluation of the formulation and of the related algorithm shows their applicability to real instances. The implementation of this approach is integrated into our software PIntron.
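The parsimony idea can be sketched in a simplified form, assuming that each expressed sequence comes with a list of candidate factorizations and that each factorization is a set of genomic factors (represented here, hypothetically, as (start, end) intervals). Choosing one factorization per sequence so that the total number of distinct factors is as small as possible rewards alignments that agree with each other. The formulation in the paper is more detailed than this, and the exhaustive search below (function name min_factorization_agreement, also hypothetical) is only practical for a handful of sequences.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Sequence, Tuple

Factor = Tuple[int, int]  # hypothetical: a genomic interval (start, end)


def min_factorization_agreement(
        candidates: Sequence[Sequence[FrozenSet[Factor]]]
) -> Tuple[Optional[List[FrozenSet[Factor]]], Optional[int]]:
    """Pick one candidate factorization per expressed sequence so that the
    union of all chosen factors is as small as possible (brute force)."""
    best_choice: Optional[List[FrozenSet[Factor]]] = None
    best_size: Optional[int] = None
    for choice in product(*candidates):
        used = frozenset().union(*choice)  # distinct factors over all sequences
        if best_size is None or len(used) < best_size:
            best_choice, best_size = list(choice), len(used)
    return best_choice, best_size


ests = [
    [frozenset({(1, 10), (20, 30)}), frozenset({(1, 10), (25, 30)})],
    [frozenset({(20, 30), (40, 50)})],
    [frozenset({(5, 10), (20, 30)}), frozenset({(1, 10), (20, 30)})],
]
print(min_factorization_agreement(ests)[1])
```

Here the first and third sequences both settle on the factors (1, 10) and (20, 30), which they share with each other and partly with the second sequence, so three distinct factors suffice instead of four.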