Paola Bonizzoni

Paola Bonizzoni is full professor in Computer Science at the Università di Milano-Bicocca.

Main research areas:  computational complexity, algorithms on graphs, algorithms for computational biology  and bioinformatics.
Her contributions include the solution of combinatorial problems in bioinformatics related to sequence analysis, haplotypying, haplotype assembly, phylogenetic reconstruction and comparison and alternative splicing prediction. Her most recent interests are on algorithms and data structures for computational pangenomics.

She also contributed to the investigation of mathematical models for the recombination of linear and circular DNA and on the computational power of splicing systems for regular  languages.

Associations

Service

Managing editor of Computability, the Journal of the Association CiE (IOS Press)

Editor of  Theory and Application of Computability – CIE book series

Associate Editor Theoretical Computer Science-C

President of  the Association Computability in Europe from July 2016 to July 2020

Paola Bonizzoni – Other services

Past services

Chair of

  • CIE 2013, Computability in Europe 2013

Projects

MUR PRIN 2022 PANGENOME INFORMATICS: from Theory to Applications

  H2020- MSCA-RISE 2019  Pan-genome Graph Algorithms and Data Integration 

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539

  H2020- ITN 2020 Algorithms for Pangenome Computational Analysis

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956229

CARIPLO 2013-0955 “Modulation of anti-cancer immune response by regulatory non-coding RNAs”

  • PIntron, intron prediction software
  • ASPIC-DB, alternative splicing prediction database
  • MIUR PRIN 2010-2011 grant “Automi e Linguaggi Formali: Aspetti Matematici e Applicativi” H41J12000190001
  • PPH, software for haplotype inference by perfect phylogeny
  • FIRB ’08 Bioinformatics: genomics and proteomics

Past Students:

Simone Zaccaria GROUP LEADER at UCL, London UK

Anna Paola Carrieri Senior Research Scientist and a Manager at IBM Research Europe – Daresbury (UK)

Giulia Bernardini

Riccardo Dondi

Marco Previtali Senior Software Engineer at Bloomberg

Stefano Beretta Senior Bioinformatician

Recent Award:

BEST POSTER AWARD – recomb2018

Luca Denti, Raffaella Rizzi, Stefano Beretta, Gianluca Della Vedova, Marco Previtali and Paola Bonizzoni
ASGAL: Aligning RNA-Seq Data to a Splicing Graph to Detect Novel Alternative Splicing Events

Recent Invited talks:

How Formal Languages can help Pangenomics? Joint talk at DLT22  and DTMB22 (Tampa, Florida, DLT22)

Algorithms and data structures in graph pangenomes: what is next? (Padua, Italy, Recomb-seq 2020)

Automata and formal languages for next generation sequencing data. (15th AFL 2017, September Debrecen, Hungary)

Reconstructing and comparing the evolution of genomic mutations from discrete data. (American Mathematical Society – Session on Mathematics of Biomolecules: Discrete, Algebraic, and Topological – Orlando, Florida 2017)

What is a gene? How and why computer science helps in answering this Question? (Joined keynote ACM-Conference on Bioinformatics, Computational Biology BCB  and Workshop on Algorithms in Bioinformatics  (WABI) 2015 )

The Holy Grail: Finding the Genetic Bases of Phenotypic Characters (Unconventional Computation and Natural Computation  (UCNC – 2012)

A novel perspective in algorithmic combinatorial methods for phasing populations in a coalescent model (International Conference on Computational Advances in Bio and Medical Sciences ICCABS 2011)

The binary perfect phylogeny model with persistent characters Discrete and Topological Models in Mathematical Biology, satellite workshop of the annual conference of the American Mathematical Society, University of Florida, Tampa, USA, March 2012.

RECENT WORK:

Nature Methods: Improved structural variant discovery in hard-to-call regions using sample-specific string detection from accurate long reads  (Biorxiv)

NEW: Numeric Lyndon-based feature embedding of sequencing reads for machine learning approaches. Inf. Sci. 607: 458-476 (2022)

MALVIRUS: an integrated application for viral variant analysis BMC BioinformaticsShark: fishing relevant reads in an RNA-Seq sample Bioinformatics, to appear

Inferring Cancer Progression from Single-Cell Sequencing while Allowing Mutation Losses Bioinformatics  

Triplet-based similarity score for fully multi-labeled trees with poly-occurring labels  Bioinformatics  

On Two Measures of Distance Between Fully-Labelled Trees. CPM 2020: 6:1-6:16

Lyndon Words versus Inverse Lyndon Words: Queries on Suffixes and Bordered Words. LATA 2020: 385-396

new  MALVA: genotyping by Mapping-free ALlele detection of known VAriants. iScience volume 18: 20-27 (2019) Recomb-seq special issue

iScienceWeb

ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events. BMC Bioinformatics 19(1): 444:1-444:21 (2018)

Schermata 2019-09-19 alle 16.08.20

12859_2018_2436_Fig6_HTML-1                 12859_2018_2436_Fig1_HTML

Multithread Multistring Burrows-Wheeler Transform and Longest Common Prefix Array. Journal of Computational Biology 26(9): 948-961 (2019)

Effective Clustering for Single Cell Sequencing Cancer Data. BCB 2019: 437-446

A Rearrangement Distance for Fully-Labelled Trees. CPM 2019: 28:1-28:15

Schermata 2019-09-19 alle 16.21.22

HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads. BMC Bioinformatics 19(1): 252:1-252:19 (2018)

gpps: an ILP-based approach for inferring cancer progression with mutation losses from single cell data. ICCABS 2018: 1

Beyond Perfect Phylogeny: Multisample Phylogeny Reconstruction via ILP (ACM-BCB 2017)

Mapping RNA-seq Data to a Transcript Graph via Approximate Pattern Matching to a Hypertext. AlCoB 2017: 49-61

An External-Memory Algorithm for String Graph Construction. Algorithmica 78(2): 394-424 (2017)
HapCol: accurate and memory-efficient haplotype assembly from long reads.
Pirola Y., Zaccaria S., Dondi R., Klau GW., Pisanti N., Bonizzoni P. Bioinformatics. 2015 Aug
Abstract
MOTIVATION:
Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly highly benefits from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performances degrade as read length and sequencing coverage increase or because they are based on restrictive assumptions.
RESULTS:
By exploiting a feature of future-generation technologies-the uniform distribution of sequencing errors-we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods both on real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly less computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, allowing to phase datasets with higher coverage and without the traditional all-heterozygous assumption.
AVAILABILITY AND IMPLEMENTATION:
Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/.
CONTACT:
bonizzoni@disco.unimib.itSupplemen
On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction Problem.

Paola Bonizzoni, Riccardo Dondi, Gunnar W. Klau, Yuri Pirola, Nadia Pisanti, Simone Zaccaria
CPM 2015, 100-113, Springer-Verlag

Transcriptome Assembly and Alternative Splicing Analysis
Paola Bonizzoni, Gianluca Della Vedova, Graziano Pesole, Ernesto Picardi, Yuri Pirola, and Raffaella Rizzi, in RNA Bioinformatics, Methods in Molecular Biology, vol. 1269, Springer-Verlag 2015

Modeling Alternative Splicing Variants from RNA-Seq Data with Isoform Graphs  Journal of Computational Biology Journal of Computational Biology 21(1): 16-40 (2014)  S. Beretta, P. Bonizzoni, G.D. Vedova, Y. Pirola, and R. Rizzi

When and how the Perfect Phylogeny model explains Evolution (review) In  Discrete and Topological Models in Molecular Biology in preparation, Springer-Verlag  2013 editors N. Jonoska, M. Saito.

Further Steps in TANGO: Improved Taxonomic Assignment in Metagenomics.  Bioinformatics 30(1): 17-23 (2014)

Existence of Constants in Regular Splicing Languages Information and Computation (2015), pp. 340-353 DOI information: 10.1016/j.ic.2015.04.001

The binary perfect phylogeny with persistent characters. Theor. Comput. Sci. 454: 51-63 (2012)

Address

Dipartimento di Informatica Sistemistica e Comunicazione
Università Degli Studi di Milano-Bicocca
Viale Sarca 336 Milano, 20126 (ITALY)
email: bonizzoni at disco dot unimib dot it, tel. (+39) 02-64487814, fax: (+39) 02-64487839

Publications

see a  list at DBLP

see Paola Bonizzoni – Other services

Computer science and women:

PROGETTO NERD? (NON E’ ROBA PER DONNE ?)