Papers | Parallel Computing

2019

Ivan Merelli, Federico Fornari, Fabio Tordini, Daniele D'Agostino, Marco Aldinucci, Daniele Cesini

Exploiting Docker containers over Grid computing for a comprehensive study of chromatin conformation in different cell types Journal Article

In: Journal of Parallel and Distributed Computing, vol. 134, pp. 116–127, 2019, ISSN: 0743-7315.

Tags: bioinformatics

2017

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Massimo Torquati, Marco Aldinucci

NuChart-II: the road to a fast and scalable tool for Hi-C data analysis Journal Article

In: International Journal of High Performance Computing Applications, vol. 31, no. 3, pp. 196–211, 2017.

Tags: bioinformatics, fastflow, repara, rephrase

@article{16:ijhpca:nuchart,

title = {NuChart-II: the road to a fast and scalable tool for Hi-C data analysis},

author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Liò and Ivan Merelli and Massimo Torquati and Marco Aldinucci},

url = {https://iris.unito.it/retrieve/handle/2318/1607126/238747/main.pdf},

doi = {10.1177/1094342016668567},

year  = {2017},

date = {2017-01-01},

journal = {International Journal of High Performance Computing Applications},

volume = {31},

number = {3},

pages = {196--211},

abstract = {Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, allowing the identification of physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but this important information is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: the computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and makes it possible to orchestrate the analysis and visualisation of genomic data.},

keywords = {bioinformatics, fastflow, repara, rephrase},

pubstate = {published},

tppubtype = {article}

}

2016

Fabio Tordini

The road towards a Cloud-based High-Performance solution for genomic data analysis PhD Thesis

Computer Science Department, University of Torino, Italy, 2016.

Tags: bioinformatics, fastflow

@phdthesis{tordiniThesis16,

title = {The road towards a Cloud-based High-Performance solution for genomic data analysis},

author = {Fabio Tordini},

url = {http://calvados.di.unipi.it/storage/paper_files/2016_tordini_phdthesis.pdf},

year  = {2016},

date = {2016-04-01},

school = {Computer Science Department, University of Torino, Italy},

abstract = {Nowadays, molecular biology laboratories are delivering more and more data about DNA organisation, at increasing resolution and in a large number of samples. So much so that genomic research is now facing many of the scale-out issues that high-performance computing has been addressing for years: they require powerful infrastructures with fast computing and storage capabilities, with substantial challenges in terms of data processing, statistical analysis and data representation. With this thesis we propose a high-performance pipeline for the analysis and interpretation of heterogeneous genomic information: besides performance, usability and availability are two essential requirements that novel Bioinformatics tools should satisfy. In this perspective, we propose and discuss our efforts towards a solid infrastructure for data processing and storage, where software that operates over data is exposed as a service, and is accessible by users through the Internet. We begin by presenting NuChart-II, a tool for the analysis and interpretation of spatial genomic information. With NuChart-II we propose a graph-based representation of genomic data, which can provide insights on the disposition of genomic elements in the DNA. We also discuss our approach for the normalisation of biases that affect raw sequenced data. We believe that many currently available tools for genomic data analysis are perceived as tricky and troublesome applications that require highly specialised skills to obtain the desired outcomes. Concerning usability, we want to raise the level of abstraction perceived by the user, but maintain high performance and correctness while providing an exhaustive solution for data visualisation. We also intend to foster the availability of novel tools: in this work we also discuss a cloud solution that delivers computation and storage as dynamically allocated virtual resources via the Internet, while needed software is provided as a service. 
In this way, the computational demand of genomic research can be satisfied more economically by using lab-scale and enterprise-oriented technologies. Here we discuss our idea of a task farm for the integration of heterogeneous data resulting from different sequencing experiments: we believe that the integration of multi-omic features on a nuclear map can be a valuable means for studying the interactions among genetic elements. This can reveal insights on biological mechanisms, such as gene regulation, translocations and epigenetic patterns.},

keywords = {bioinformatics, fastflow},

pubstate = {published},

tppubtype = {phdthesis}

}
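The task farm the abstract proposes for integrating data from different sequencing experiments can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the thesis software: each experiment is modelled as a name plus a list of (position, label) features on a single chromosome, genes are half-open intervals, and `map_features` and `task_farm` are hypothetical names. The emitter submits one task per experiment, the workers map features onto genes independently, and the collector merges their results per gene.

```python
from concurrent.futures import ThreadPoolExecutor


def map_features(experiment, genes):
    """Worker (hypothetical): collect the features of one experiment that fall inside each gene."""
    name, features = experiment
    hits = {}
    for gene, (start, end) in genes.items():
        # Half-open interval [start, end) on a single chromosome (simplifying assumption).
        hits[gene] = [label for pos, label in features if start <= pos < end]
    return name, hits


def task_farm(experiments, genes, workers=4):
    """Emitter -> workers -> collector: one task per experiment, results merged per gene."""
    merged = {gene: {} for gene in genes}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Emitter: hand each experiment to the pool as an independent task.
        futures = [pool.submit(map_features, exp, genes) for exp in experiments]
        # Collector: merge each worker's annotations into the per-gene map.
        for fut in futures:
            name, hits = fut.result()
            for gene, labels in hits.items():
                merged[gene][name] = labels
    return merged
```

`ThreadPoolExecutor` keeps the sketch short; for CPU-bound workers in CPython, `ProcessPoolExecutor` is the usual drop-in alternative, since threads share the global interpreter lock.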


Fabio Tordini, Ivan Merelli, Pietro Liò, Luciano Milanesi, Marco Aldinucci

NuchaRt: embedding high-level parallel computing in R for augmented Hi-C data analysis Book Section

In: Computational Intelligence Methods for Bioinformatics and Biostatistics, vol. 9874, pp. 259–272, Springer International Publishing, Cham (ZG), 2016, ISBN: 978-3-319-44331-7.

Tags: bioinformatics, fastflow, repara

Fabio Tordini

A cloud solution for multi-omics data integration Proceedings Article

In: Proceedings of the 16th IEEE International Conference on Scalable Computing and Communication, pp. 559–566, IEEE Computer Society, 2016, (Best paper award).

Tags: bioinformatics, fastflow, rephrase

2015

Fabio Tordini, Maurizio Drocco, Ivan Merelli, Luciano Milanesi, Pietro Liò, Marco Aldinucci

NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data Proceedings Article

In: Di Serio, Clelia, Liò, Pietro, Nonis, Alessandro, Tagliaferri, Roberto (Ed.): Proc. of 11th Intl. Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB), pp. 298–311, Springer, Cambridge, UK, 2015, ISBN: 978-3-319-24461-7.

Tags: bioinformatics, fastflow, paraphrase, repara

@inproceedings{14:ff:nuchart:cibb,

title = {NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data},

author = {Fabio Tordini and Maurizio Drocco and Ivan Merelli and Luciano Milanesi and Pietro Liò and Marco Aldinucci},

editor = {Clelia Di Serio and Pietro Liò and Alessandro Nonis and Roberto Tagliaferri},

url = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},

doi = {10.1007/978-3-319-24462-4_25},

isbn = {978-3-319-24461-7},

year  = {2015},

date = {2015-06-01},

booktitle = {Proc. of 11th Intl. Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB)},

volume = {8623},

pages = {298--311},

publisher = {Springer},

address = {Cambridge, UK},

series = {LNCS},

abstract = {Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, software that allows the user to annotate and visualize a list of input genes with information derived from Hi-C data, integrating knowledge about genomic features that are involved in the spatial organization of chromosomes. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed unavoidable scalability problems when working genome-wide on large datasets: particular attention has been paid to optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.},

keywords = {bioinformatics, fastflow, paraphrase, repara},

pubstate = {published},

tppubtype = {inproceedings}

}
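The gene-centric neighbourhood graph described in the NuChart-II abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NuChart-II implementation: genes are assumed to be already mapped to restriction fragments (`gene_fragments`), Hi-C contacts are assumed to be given as pairs of fragment identifiers (`contacts`), no normalisation or edge weighting is applied, and `build_gene_graph` and `neighbourhood` are hypothetical names.

```python
from collections import defaultdict, deque


def build_gene_graph(gene_fragments, contacts):
    """Link two genes whenever any of their fragments share a Hi-C contact (hypothetical helper)."""
    # Invert the gene -> fragments mapping so each contact can be resolved to genes.
    frag_to_genes = defaultdict(set)
    for gene, frags in gene_fragments.items():
        for f in frags:
            frag_to_genes[f].add(gene)
    graph = defaultdict(set)
    for a, b in contacts:
        for g1 in frag_to_genes.get(a, ()):
            for g2 in frag_to_genes.get(b, ()):
                if g1 != g2:
                    # Undirected edge between the two genes.
                    graph[g1].add(g2)
                    graph[g2].add(g1)
    return graph


def neighbourhood(graph, seed, max_depth):
    """Breadth-first search from `seed`, returning {gene: graph distance} up to max_depth."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        g = queue.popleft()
        if depth[g] == max_depth:
            continue  # do not expand beyond the requested depth
        for n in sorted(graph.get(g, ())):
            if n not in depth:
                depth[n] = depth[g] + 1
                queue.append(n)
    return depth
```

The breadth-first search bounds the neighbourhood by graph distance from the seed gene; a fuller version would presumably weight edges by a normalised proximity likelihood, as the abstract suggests.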

Maurizio Drocco, Claudia Misale, Guilherme Peretti Pezzi, Fabio Tordini, Marco Aldinucci

Memory-Optimised Parallel Processing of Hi-C Data Proceedings Article

In: Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), pp. 1–8, IEEE, 2015.

Tags: bioinformatics, fastflow, impact, paraphrase, repara

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Marco Aldinucci

Parallel Exploration of the Nuclear Chromosome Conformation with NuChart-II Proceedings Article

In: Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), IEEE, 2015.

Tags: bioinformatics, fastflow, impact, paraphrase, repara

Ivan Merelli, Fabio Tordini, Maurizio Drocco, Marco Aldinucci, Pietro Liò, Luciano Milanesi

Integrating Multi-omic features exploiting Chromosome Conformation Capture data Journal Article

In: Frontiers in Genetics, vol. 6, no. 40, 2015, ISSN: 1664-8021.

Tags: bioinformatics, fastflow

@article{nuchart:frontiers:15,

title = {Integrating Multi-omic features exploiting Chromosome Conformation Capture data},

author = {Ivan Merelli and Fabio Tordini and Maurizio Drocco and Marco Aldinucci and Pietro Liò and Luciano Milanesi},

url = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},

doi = {10.3389/fgene.2015.00040},

issn = {1664-8021},

year  = {2015},

date = {2015-01-01},

journal = {Frontiers in Genetics},

volume = {6},

number = {40},

abstract = {The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is produced daily in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains are difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of chromosome organization in the cell's natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.},

keywords = {bioinformatics, fastflow},

pubstate = {published},

tppubtype = {article}

}

Marco Aldinucci, Andrea Bracciali, Tobias Marschall, Murray Patterson, Nadia Pisanti, Massimo Torquati

High-Performance Haplotype Assembly Proceedings Article

In: Di Serio, Clelia, Liò, Pietro, Nonis, Alessandro, Tagliaferri, Roberto (Ed.): Computational Intelligence Methods for Bioinformatics and Biostatistics - 11th International Meeting, CIBB 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers, pp. 245–258, Springer, Cambridge, UK, 2015.

Tags: bioinformatics, fastflow

2014

Marco Aldinucci, Massimo Torquati, Concetto Spampinato, Maurizio Drocco, Claudia Misale, Cristina Calcagno, Mario Coppo

Parallel stochastic systems biology in the cloud Journal Article

In: Briefings in Bioinformatics, vol. 15, no. 5, pp. 798–813, 2014, ISSN: 1467-5463.

Tags: bioinformatics, fastflow, impact, paraphrase

Marco Aldinucci, Cristina Calcagno, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Eva Sciacca, Salvatore Spinella, Massimo Torquati, Angelo Troina

On designing multicore-aware simulators for systems biology endowed with on-line statistics Journal Article

In: BioMed Research International, 2014.

Tags: bioinformatics, fastflow, paraphrase

Marco Aldinucci, Maurizio Drocco, Guilherme Peretti Pezzi, Claudia Misale, Fabio Tordini, Massimo Torquati

Exercising high-level parallel programming on streams: a systems biology use case Proceedings Article

In: Proc. of 34th IEEE Intl. Conference on Distributed Computing Systems Workshops (ICDCSW), IEEE, Madrid, Spain, 2014.

Tags: bioinformatics, fastflow, impact, paraphrase

@inproceedings{cwc:gpu:dcperf:14,

title = {Exercising high-level parallel programming on streams: a systems biology use case},

author = {Marco Aldinucci and Maurizio Drocco and Guilherme Peretti Pezzi and Claudia Misale and Fabio Tordini and Massimo Torquati},

url = {https://iris.unito.it/retrieve/handle/2318/154516/26657/2014_dcperf_cwc_gpu.pdf},

doi = {10.1109/ICDCSW.2014.38},

year  = {2014},

date = {2014-01-01},

booktitle = {Proc. of 34th IEEE Intl. Conference on Distributed Computing Systems Workshops (ICDCSW)},

publisher = {IEEE},

address = {Madrid, Spain},

abstract = {The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result in a computationally expensive task, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology, equipped with on-line mining of results and designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern-based parallel programming, makes it possible to develop portable parallel code without relinquishing either run-time efficiency or performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter alia cache-coherent multi-cores, clusters of multi-cores (Ethernet and Infiniband) and the Amazon Elastic Compute Cloud.},

keywords = {bioinformatics, fastflow, impact, paraphrase},

pubstate = {published},

tppubtype = {inproceedings}

}
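The structure this abstract describes, many independent stochastic simulations whose results are mined on-line as they stream out, can be sketched in Python. This is neither the CWC simulator nor the FastFlow API: the model is a hypothetical birth-death process simulated with Gillespie's direct method, a thread pool stands in for the farm of simulation workers, and the collector folds each result into a running mean and variance using Welford's algorithm (a swapped-in choice of on-line statistic), so no result has to be stored. All function and class names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def birth_death_ssa(seed, x0=10, birth=1.0, death=0.1, t_end=10.0):
    """One Gillespie (direct method) trajectory of a birth-death process; returns the population at t_end."""
    rng = random.Random(seed)  # per-run seed keeps runs independent and reproducible
    t, x = 0.0, x0
    while True:
        total = birth + death * x          # sum of propensities (always > 0 since birth > 0)
        t += rng.expovariate(total)        # waiting time to the next reaction
        if t > t_end:
            return x                       # state held constant until t_end
        if rng.random() * total < birth:   # pick a reaction proportionally to its propensity
            x += 1
        else:
            x -= 1


class OnlineStats:
    """Welford's on-line mean and sample variance: each result is folded in as it arrives."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def push(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def simulate_farm(n_runs, workers=4):
    """Farm of independent simulations; the collector updates statistics in completion order."""
    stats = OnlineStats()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(birth_death_ssa, seed) for seed in range(n_runs)]
        for fut in as_completed(futures):
            stats.push(fut.result())
    return stats
```

Because the statistics are updated in completion order, the collector never waits for slow runs before incorporating fast ones, which is the property that makes on-line mining attractive for long simulation campaigns.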

Claudia Misale, Giulio Ferrero, Massimo Torquati, Marco Aldinucci

Sequence alignment tools: one parallel pattern to rule them all? Journal Article

In: BioMed Research International, 2014.

Tags: bioinformatics, fastflow, paraphrase, repara

2013

Marco Aldinucci, Fabio Tordini, Maurizio Drocco, Massimo Torquati, Mario Coppo

Parallel stochastic simulators in system biology: the evolution of the species Proceedings Article

In: Proc. of 21st Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), IEEE, Belfast, Northern Ireland, U.K., 2013.

Tags: bioinformatics, fastflow

2012

Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Eva Sciacca, Salvatore Spinella, Massimo Torquati, Angelo Troina

On Parallelizing On-Line Statistics for Stochastic Biological Simulations Proceedings Article

In: Alexander, Michael, D'Ambra, Pasqua, Belloum, Adam, Bosilca, George, Cannataro, Mario, Danelutto, Marco, Di Martino, Beniamino, Gerndt, Michael, Jeannot, Emmanuel, Namyst, Raymond, Roman, Jean, Scott, Stephen L., Träff, Jesper Larsson, Vallée, Geoffroy, Weidendorfer, Josef (Ed.): Proc. of Euro-Par Workshops: 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB), pp. 3–12, Springer, Bordeaux, France, 2012.

Tags: bioinformatics, fastflow

2011

Marco Aldinucci, Andrea Bracciali, Pietro Liò, Anil Sorathiya, Massimo Torquati

StochKit-FF: Efficient Systems Biology on Multicore Architectures Proceedings Article

In: Guarracino, M. R., Vivien, F., Träff, J. L., Cannataro, M., Danelutto, M., Hast, A., Perla, F., Knüpfer, A., Di Martino, B., Alexander, M. (Ed.): Euro-Par 2010 Workshops, Proc. of the 1st Workshop on High Performance Bioinformatics and Biomedicine (HiBB), pp. 167–175, Springer, Ischia, Italy, 2011.

Tags: bioinformatics