The investigation of genetic differences among humans has provided evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common type of variation involves only a single nucleotide of the DNA sequence and is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in human populations is one of the primary goals of recent studies in human genomics.
Our research in this field is mainly focused on the design and experimentation of algorithms for solving combinatorial problems related to haplotype inference and genetic variation analysis.
Specific computational problems of interest are: (1) genotype imputation and haplotype reconstruction in pedigrees on real data (human and farm animals); and (2) haplotype phasing and genotype analysis assuming the coalescent model of the perfect phylogeny describing the evolutionary history of SNP data in the presence of recurrent mutations.
Haplotype assembly is the problem of reconstructing the two distinct copies of each chromosome, called haplotypes, from a collection of sequencing reads aligned to a reference genome.
Since current state-of-the-art approaches fail to fully exploit the novel characteristics of future-generation sequencing technologies, our goal is to model new combinatorial formulations of the problem and to design algorithms that overcome these limits, making it possible to phase larger datasets in order to increase accuracy, without restrictive assumptions.
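To make the haplotype assembly problem concrete, the following minimal sketch uses the minimum error correction (MEC) formulation, a common combinatorial model in the literature: split the reads into two groups so that the fewest alleles have to be flipped for every read to agree with its group's consensus haplotype. It is only an illustration under stated assumptions, not the algorithm implemented in the tools listed below. The read encoding (a dictionary from SNP position to allele 0/1), the toy data, and the function names `consensus_and_cost` and `brute_force_mec` are all hypothetical, and the exhaustive search is exponential in the number of reads, so it is usable only on tiny inputs.

```python
from itertools import product


def consensus_and_cost(reads, positions):
    """Majority-vote haplotype for one group of reads and its correction cost."""
    haplotype, cost = {}, 0
    for p in positions:
        alleles = [r[p] for r in reads if p in r]
        ones = sum(alleles)
        zeros = len(alleles) - ones
        haplotype[p] = 1 if ones > zeros else 0
        cost += min(ones, zeros)  # minority alleles would have to be flipped
    return haplotype, cost


def brute_force_mec(reads):
    """Try every bipartition of the reads and keep the one with minimum cost."""
    positions = sorted({p for r in reads for p in r})
    best = None
    # The first read is fixed in group 0 to avoid counting mirrored partitions.
    for rest in product((0, 1), repeat=len(reads) - 1):
        labels = (0,) + rest
        groups = ([], [])
        for read, g in zip(reads, labels):
            groups[g].append(read)
        h0, c0 = consensus_and_cost(groups[0], positions)
        h1, c1 = consensus_and_cost(groups[1], positions)
        if best is None or c0 + c1 < best[0]:
            best = (c0 + c1, labels, h0, h1)
    return best


# Toy input (hypothetical): each read maps SNP position -> observed allele.
reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {1: 1, 2: 0, 3: 0},
    {0: 0, 1: 0, 2: 0},  # disagrees with the first read at position 2
]

cost, labels, h0, h1 = brute_force_mec(reads)
hap0 = "".join(str(h0[p]) for p in sorted(h0))
hap1 = "".join(str(h1[p]) for p in sorted(h1))
print(cost, labels, hap0, hap1)
```

On this toy input the best partition places the first, second, and last reads in one group and the remaining two in the other, at the price of a single correction. Practical methods replace the exhaustive search with techniques such as dynamic programming over SNP positions, which is what makes high-coverage long-read data tractable.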
Here are some links explaining our work and the associated projects.
Pirola, Y., Zaccaria, S., Dondi, R., Klau, G. W., Pisanti, N., & Bonizzoni, P. (2015). HapCol: accurate and memory-efficient haplotype assembly from long reads. Bioinformatics, 32(11), 1610-1617.
Beretta, S., Patterson, M., Marschall, T., Martin, M., Zaccaria, S., Della Vedova, G., & Bonizzoni, P. (2017). HapCHAT: Adaptive haplotype assembly for efficiently leveraging high coverage in long reads. bioRxiv, 170225.