First Workshop On Pangenomics Algorithms for Omics Data

June 21st, 2019

Univ. Milano-Bicocca. U6 Building – Room Martini (floor -1).

Theme and scope

Computational Pangenomics wants to produce the computational tools for transitioning from a sequence-based genome to graph-based reference genomes, where variants of huge cohorts are represented in a unified way.

Transcriptomics was the first field where the need of imposing a structure over strings has become clear, leading to the notion of a splicing graph to represent all possible transcripts of a gene, as well as the exon-intron structure.

Genome assemblers are increasingly aware that a diploid organism has two different, but very similar, copies of each genome: hence the correct outcome of an assembly cannot be a single string. Finally, metagenomics studies different species that can share portions of the genome: another example where the correct representation of an assembly cannot be linear. Moreover, the 1000 Genomes project has generated new interest in the analysis of a population genome.

The workshop had a previous edition in 2016 focused on Graph assembly algorithm. This time the main aim of the workshop is to bring together researchers that can collaborate on some open problems on graph-based algorithmic approaches to bioinformatics common topics in Bioinformatics.  This year the plan is to apply for the COST program. Since the aim is also stimulate new collaborations, this edition will be mainly focused on the discussion of open problems. To this purpose, part of the day will be focused on the creation of small working groups with each working on an open problem.

Please fill in the registration form if you are interested in participating.


  • 10.00 Broňa Brejová (Comenius University): Differential nanopore coverage analysis through k-mers
  • 10.25 Travis Gagie (Univ. Diego Portales): Report from Dagstuhl 19241
  • 10.50 Fabio Vandin (Univ. Padova): Fast Approximation of Frequent k-mers and Applications to Metagenomics
  • 11.15 Coffee break
  • 11.30 Iman Hajirasouliha (Weill Cornell Medical College): Novel algorithms and applications of Linked-Read genomics and metagenomics (slides)
  • 12.05 Erik Garrison (UCSC): Untangling the pangenome (slides)
  • 12.40 Lunch
  • 14.15 The COST proposal
  • 14.30 Open problems
  • 15.00 Group work on open problems
  • 17.00 Summary of the group work
  • 17.30 Closing

Nonexclusive list of relevant topics

  • graph assembly
  • variation graphs
  • de Bruijn graphs
  • string graphs
  • alternative splicing
  • metagenomics
  • the population genome
  • pan-genome graph
  • graph fragment assembly

Venue and accommodation

The workshop will take place in Milano, at the U6 building (Edificio U6). The building is close to the Bicocca underground station (MM5).

Milan is easily reachable by all major European and main international airports.

The suggested hotels are Starhotel Tourist and Hotel Arcimboldi that are both within a 5-minute walk from the workshop site.

Scientific Committee:

Paola Bonizzoni (
Gianluca Della Vedova (
Marco Previtali (
Raffaella Rizzi (

Advanced Techniques for Combinatorial Algorithms (2018)

This is the first PhD-level course on the design and the analysis of efficient algorithms, with a strong emphasis on theoretical aspects. The prerequisites are a very basic knowledge of graph theory and computational complexity, as well as a good understanding of undergraduate-level courses of algorithms, discrete mathematics, and operation research.
It is not necessary to have already taken a graduate-level course on algorithms.
The goal of the course is to give a broad coverage of the main techniques in algorithm design and analysis, so that the course can be useful also for a researcher in a different field. To this purpose, the computational problems tackled are among the most basic problems on strings and graphs, such as pattern matching, vertex cover and max cut.
In fact, anyone that deals with one of the following questions will find the course interesting.

  • How can I cope with problems that are provably hard to solve exactly (i.e. are NP-hard)?
  • How can I exploit the massive availability of CPUs?
  • How can I query efficiently massive texts?


The exam will consists of some exercises that will be given during the course.
Collaboration is encouraged, but each student must write their own version of the solution.


  1. Jan. 15, 10:30-12:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) Parallel algorithms. (PRAM model, prefix sum algorithm, Map-Reduce)
  2. Jan. 16, 10:30-12:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) Randomized algorithms. (Karp-Rabin algorithm for pattern matching)
  3. Jan. 18, 10:30-12:30 – meeting room III floor, U14 (lecturer Raffaella Rizzi)  Text indexing (Suffix trees, suffix arrays, pattern matching, generalized suffix trees, Longest Common Substring between two strings)
  4. Jan. 22, 10:30-12:30 – meeting room I floor, U14 (lecturer Gianluca Della Vedova) Fixed-parameter algorithms. (Vertex cover)
  5. Jan. 29, 10:30-12:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) Approximation algorithms. (Algorithms for Vertex Cover, Max Cut.)
  6. Jan. 30, 13:30-15:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) Approximation algorithms. (Satisfiability, Approximation Complexity)
  7. Feb. 1, 10:30-12:30 – meeting room III floor, U14 (lecturer Raffaella Rizzi) Text indexing (Computing a suffix array from the suffix tree, Longest Common Prefix LCP array)
  8. Feb. 5, 10:30-12:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) External-memory algorithms. (Parallel Disk Model, Sorting, B-trees)
  9. Feb. 6, 10:30-12:30 – meeting room III floor, U14 (lecturer Gianluca Della Vedova) Streaming algorithms. (Heavy Hitters, Count-Min sketch, Reservoir sampling)
  10. Feb. 8, 10:30-12:30 – meeting room III floor, U14 (lecturer Raffaella Rizzi) Text indexing (Burrows-Wheeler Transform and FM-index)

Lecture notes


Workshop on Combinatorial Algorithms in Bioinformatics

January 26th, 2018
Univ. Milano-Bicocca. Building U14 – Room Aula Seminari

Theme and scope

The ever increasing amount of biological data available requires more efficient tools for managing and analyzing those data. While a widely used line of attack is to throw huge computational resources at the problem and using known methods as black boxes (just think at the recent widespread use of machine learning techniques such as deep learning), this
approach can only be used by a few well-funded organizations, or via cloud computing — the latter possibility brings a plethora of privacy issues. In both cases, it will be hard to extend the results to the clinicians’ desks, where a standard workstation is the most advanced hardware that can be assumed to be available.

For this reason, highly efficient (from a computational point of view) methods are a precondition to translational medicine. In particular we need to compress and query a large set of biological strings (i.e. a pan-genome) and to reconstruct tumor evolutionary histories. The introduction of Burrows-Wheeler Transform (BWT) methods to index and query single strings has resulted in a tremendous progress in short-read mapping against a single (linear) genome: today all widely used programs to align High-Throughput Sequencing data are based on the BWT or on some related notion. This is especially remarkable as it shows how widespread the applications of theoretical results can be.

The aim of this workshop is to gather researchers that are exploring both theoretical and experimental sides of combinatorial algorithms in bioinformatics, to explore the possibility of a joint research proposal.

List of relevant topics

  • sequence alignment
  • sequence mapping
  • text indexing
  • phylogeny reconstruction
  • Burrows-Wheeler Transform and FM-index
  • haplotyping

Tentative schedule

10.00-10.30 Gianluca Della Vedova (Univ. Milano-Bicocca) Computational pan-genomics

10.30-11.00. Nadia Pisanti (Univ. Pisa) Exact and Approximate Read Mapping on Reference Pan-Genome

11.00-11.30. Cinzia Pizzi (Univ. Padova) Alignment-free sequence analysis: recent results and future perspectives

11.30-12.00. Marinella Sciortino (Univ. Palermo) Combinatorial method for sequence analysis

12.00-12.30 Tiziana Calamoneri (Univ. Roma – La Sapienza) Some Problems in (Co-)Phylogeny

14-17 Discussion on research directions and funding proposals

Venue and accommodation

The workshop will take place in Milano, at the Department of Computer Science (DISco), in the Seminar Room (Aula Seminari) located in the first floor of the U14 building (Edificio U14). (Google maps)

Milan is easily reachable by all major European and main international airports. How to reach us

The suggested hotels are Starhotel Tourist and Hotel Arcimboldi that are both within a 5-minute walk from the workshop site.

Scientific Committee

  • Paola Bonizzoni
  • Gianluca Della Vedova
  • Raffaella Rizzi

AlgoDay 2017


26 July 2017


Sala Seminari, U14, DISCo, Univ. Milano-Bicocca


We are witnessing an explosion in the availability of data, ranging from social media interactions, to biological sequences, to sensor data, which share the huge availability and low cost. Since those data are, by far, impractical or impossible to keep in memory, we need a paradigm shift in algorithm design, towards succinct data structures and parallel and/or external memory algorithms.


We are organizing a 1-day brainstorming event devoted to Algorithms and Data Structures for Massive Data, focused on innovative algorithmic solutions and on discussing open problems. Graph-based formulations will be especially investigated. AlgoDay is open to graduate students and researchers that are interested in this topic and are willing to do research in this field.

The event is open to all researchers that are interested in algorithms for massive datasets and want to investigate research directions that can lead to collaboration.

Some topics of interest are:

  • indexing massive data
  •  representation and queries in massive graphs
Detailed program
  • h. 10.30. Marco Previtali (AlgoLab – DISCo). Bloom filter tree and succinct bidirectional de Brujin graphs: Two tools for indexing massive data and graphs
  • h. 11.00. Simone Ciccolella (AlgoLab – DISCo). Beyond Perfect Phylogeny: Multisample Phylogeny Reconstruction via ILP
  • h. 11.30. Paolo Boldi and Marco Genuzio (Univ. Milano). Graph compression and static functions representation
  • h. 12.30. Andrea Maurino (DISCo) TBA
  • h. 13.00. Mauricio Soto (DISCo) Embeddings of semantic graphs
  • h. 13.30. Lunch and discussion on open problems and research directions

Stefano Beretta – Gianluca Della Vedova – Raffaella Rizzi



Special session on Algorithmics for biology

Special session at Computability in Europe 2017.

Organized by Paola Bonizzoni (Milano, Italy) and Veli Mäkinen (Helsinki, Finland).


  • Tobias Marschall (Max-Planck-Institut für Informatik, Germany)
    A Guided Tour to Computational Haplotyping (slides)
  • Fabio Vandin (University of Padova, Italy)
    Algorithms for Cancer Mutation Networks
  • Gregory Kucherov (University Paris-Est Marne-la-Vallée, France)
    Modern algorithmic techniques for biosequence search
  • Gianluca Della Vedova (University of Milano-Bicocca, Italy)
    Character-based Phylogeny Construction and its Application to Tumor Evolution (slides)