Evolution and Comparative Genomics

Persistent Phylogeny

Character-based parsimony models have been among the most studied notions in computational evolution, with a special focus on the perfect phylogeny model, where each character can be acquired at most once and cannot be lost. Soon this model has been touted as unrealistic.

On the other hand this model has been popular in Bioinformatics, as several efficient algorithms have been proposed. This project studies a generalization of the perfect phylogeny model, called persistent phylogeny, where each character can be acquired at most once and lost at most once in the entire tree. The main computational problem is, given a binary matrix M, to compute a tree (if it exists) explaining M.

We have proposed a fixed-parameter algorithm for this problem and for a more general version where some species are known to never have acquired a certain character in this evolution.

pp-9pp-12

Cancer genomics

Recent developments in targeted therapies for cancer treatment rely on the accurate inference of the clonal evolution and progression of the disease, in fact understanding the order of accumulation and the prevalence of somatic mutations during cancer progression can help better devise these treatment strategies.

Most of the tools available for the inference of cancer progression rely on the Infinite Site Assumption stating that no two mutations can occur at the same locus and therefore assuming a perfect phylogeny. However recent studies are proving that the Infinite Site Assumption is too strict to correctly model all type of cancer, therefore more generic models are needed.

We pioneered the integration of different phylogeny models to the field of cancer mutational reconstruction by proposing two tools that employ the Dollo-k and Camin-Sokal-k parsimony model. The Dollo-k model allows each character to be acquired exactly once in the evolutionary history and to be lost at most k times; on the other hand the Camin-Sokal-k model allows multiple acquisitions of character but no losses.

We developed gppf (General Parsimony Phylogeny models from Frequencies) tool, an Mixed Integer Linear Programming (MILP) model used to reconstruct cancer progressions from Variant Allele Frequency (VAF) data that introduce the possibility of recontructing cancer progression based on
Perfect, Persistent, Dollo-k and Camin-Sokal-k phylogeny models.

Several advantages in sequencing technology led to a larger use of Single Cell Sequencing (SCS) for the study of cancer heterogeneity, thus new methods for the inference of cancer progression are needed to elaborate this new type of data.
We proposed SASC (Simulated Annealing Single Cell) inference tool, a maximum likelihood tree search inference tool, the first method that employs a Dollo-k model to the field of SCS tumour analysis.

Software

Polynomial-time implementation is available on github, where you can also find support.

Exponential-time implementation is available on github, where you can also find support.

The cancer progression inference tool from VAF data, gppf, is available on github, where you can also find support.

The cancer progression inference tool from SCS data, SASC, is available on github, where you can also find support.

 

References

Bonizzoni, P., Della Vedova, G., & Trucco, G. (2016). Solving the Persistent Phylogeny Problem in polynomial time. Arxiv

Bonizzoni, P., Carrieri, A.P., Della Vedova, G., Rizzi R. & Trucco, G. (2016). A colored graph approach to perfect phylogeny with persistent characters. Theoretical Computer Science.

Bonizzoni, P., Carrieri, A.P., Della Vedova, G., & Trucco, G. (2014). Explaining evolution via constrained persistent perfect phylogeny. BMC GENOMICS, 15(Suppl 6), S10.

Bonizzoni, P., Carrieri, A.P., Della Vedova, G., Dondi, R., & Przytycka, T. (2014). When and How the Perfect Phylogeny Model Explains Evolution. In N. Jonoska, & M. Saito (editors, Discrete and Topological Models in Molecular Biology (pp. 67-83). Springer.

Bonizzoni, P., Braghin, C., Dondi, R., & Trucco, G. (2012). The binary perfect phylogeny with persistent characters. THEORETICAL COMPUTER SCIENCE, 454, 51-63. Arxiv

Bonizzoni, P., Ciccolella, S., Della Vedova, G., Soto, M. (2017). Beyond Perfect Phylogeny: Multisample Phylogeny Reconstruction via ILP. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics (ACM-BCB ’17). ACM, New York, NY, USA, 1-10.

Ciccolella, S., Soto Gomez, M., Patterson, M., Della Vedova, G., Hajirasouliha, I., Bonizzoni, P. (2018). Inferring Cancer Progression from Single-cell Sequencing while Allowing Mutation Losses. Preprint available on BioRxiv.