Fabio Tordini

Fabio Tordini, post-doc researcher at the Computer Science Department
University of Torino
Parallel Computing group
via Pessinetto, 12, 10149 Torino – Italy
E-mail: tordini AT di.unito.it
Skype: fabio.tordini


Short Bio

Fabio Tordini is currently a post-doc researcher at the University of Torino, Italy, and at the Fondazione Edo ed Elvo Tempia, Biella, Italy. He received his Ph.D. in Computer Science from the University of Torino in April 2016, with the thesis “The road towards a Cloud-based High-Performance solution for genomic data analysis”. He has been a research associate at the University of Torino since 2012.
Before that, he was a visiting scholar at the State University of New York at Stony Brook in 2011, and received his M.Sc. in Computer Science from the University of Camerino, Italy, in 2010.

He has co-authored papers in international journals and conference proceedings. He has participated in the European FP7 STREP project ParaPhrase and in the H2020 project RePhrase, working on techniques and methodologies for developing data-intensive C++ applications targeting heterogeneous systems that combine CPUs and GPUs into a coherent parallel platform. He also takes part in the cHiPSet ICT COST Action on High-Performance Modelling and Simulation for Big Data Applications.

His research activity began with high-level parallel programming patterns and memory optimization for multi-core and many-core systems; it now focuses on high-performance computing and Bioinformatics. In particular, his work at the University of Torino concerns HPC solutions for scientific workflows, cloud deployment, job scheduling and resource provisioning, while his work at the Fondazione Tempia concerns tools and methods for the optimization of Bioinformatics workflows and NGS data analysis.

See also my LinkedIn profile and Google Scholar page.


Publications

2016

  • F. Tordini, M. Aldinucci, L. Milanesi, P. Liò, and I. Merelli, “The Genome Conformation as an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer,” Frontiers in Genetics, vol. 7, iss. 194, pp. 1-17, 2016. doi:10.3389/fgene.2016.00194
    [BibTeX] [Abstract] [Download PDF]

    Publicly available multi-omic databases, in particular if associated with medical annotations, are rich resources with the potential to lead a rapid transition from high-throughput molecular biology experiments to better clinical outcomes for patients. In this work, we propose a model for multi-omic data integration (i.e. genetic variations, gene expression, genome conformation and epigenetic patterns), which exploits a multi-layer network approach to analyse, visualize and obtain insights from such biological information, in order to use achieved results at a macroscopic level. Using this representation, we can describe how driver and passenger mutations accumulate during the development of diseases providing, for example, a tool able to characterise the evolution of cancer. Indeed, our test case concerns the MCF-7 breast cancer cell line, before and after the stimulation with estrogen, since many datasets are available for this case study. In particular, the integration of data about cancer mutations, gene functional annotations, genome conformation, epigenetic patterns, gene expression and metabolic pathways in our multi-layer representation will allow a better interpretation of the mechanisms behind a complex disease such as cancer. Thanks to this multi-layer approach, we focus on the interplay of chromatin conformation and cancer mutations in different pathways, such as metabolic processes, that are very important for tumour development. Working on this model, a variance analysis can be implemented to identify normal variations within each omics and to characterize, by contrast, variations that can be accounted to pathological samples compared to normal ones. This integrative model can be used to identify novel biomarkers and to provide innovative omic-based guidelines for treating many diseases, improving the efficacy of decision trees currently used in clinic.

    @article{2016_omics_fgenetics,
      abstract = {Publicly available multi-omic databases, in particular if associated with medical annotations, are rich resources with the potential to lead a rapid transition from high-throughput molecular biology experiments to better clinical outcomes for patients. In this work, we propose a model for multi-omic data integration (i.e. genetic variations, gene expression, genome conformation and epigenetic patterns), which exploits a multi-layer network approach to analyse, visualize and obtain insights from such biological information, in order to use achieved results at a macroscopic level.
     Using this representation, we can describe how driver and passenger mutations accumulate during the development of diseases providing, for example, a tool able to characterise the evolution of cancer. Indeed, our test case concerns the MCF-7 breast cancer cell line, before and after the stimulation with estrogen, since many datasets are available for this case study. In particular, the integration of data about cancer mutations, gene functional annotations, genome conformation, epigenetic patterns, gene expression and metabolic pathways in our multi-layer representation will allow a better interpretation of the mechanisms behind a complex disease such as cancer.
     Thanks to this multi-layer approach, we focus on the interplay of chromatin conformation and cancer mutations in different pathways, such as metabolic processes, that are very important for tumour development. Working on this model, a variance analysis can be implemented to identify normal variations within each omics and to characterize, by contrast, variations that can be accounted to pathological samples compared to normal ones. This integrative model can be used to identify novel biomarkers and to provide innovative omic-based guidelines for treating many diseases, improving the efficacy of decision trees currently used in clinic.},
      author = {Tordini, Fabio and Aldinucci, Marco and Milanesi, Luciano and Li{\`o}, Pietro and Merelli, Ivan},
      date-modified = {2016-12-22 14:19:14 +0000},
      doi = {10.3389/fgene.2016.00194},
      journal = {Frontiers in Genetics},
      number = {194},
      pages = {1--17},
      title = {The Genome Conformation as an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer},
      url = {http://journal.frontiersin.org/article/10.3389/fgene.2016.00194},
      volume = {7},
      year = {2016},
      bdsk-url-1 = {http://journal.frontiersin.org/article/10.3389/fgene.2016.00194},
      bdsk-url-2 = {http://dx.doi.org/10.3389/fgene.2016.00194}
    }
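
    A rough intuition of the multi-layer network model used in this paper: each omic layer is a separate set of weighted edges over a shared set of genes, and integrative measures are computed across layers. The sketch below is a toy illustration in standard C++ only; the layer names and data are invented and it is not code from the paper.

      #include <cstddef>
      #include <iostream>
      #include <map>
      #include <string>
      #include <utility>
      #include <vector>

      // Each layer (e.g. Hi-C contacts, co-expression, methylation similarity)
      // stores its own weighted edges; nodes are identified by gene symbol.
      struct MultiLayerNetwork {
          std::map<std::string,
                   std::map<std::string, std::vector<std::pair<std::string, double>>>> layers;

          void add_edge(const std::string& layer, const std::string& a,
                        const std::string& b, double w) {
              layers[layer][a].push_back({b, w});
              layers[layer][b].push_back({a, w});
          }

          // Aggregate degree across layers: a crude proxy for how "central" a gene
          // is in the integrated multi-omic view.
          std::size_t aggregate_degree(const std::string& gene) const {
              std::size_t deg = 0;
              for (const auto& layer : layers) {
                  auto it = layer.second.find(gene);
                  if (it != layer.second.end()) deg += it->second.size();
              }
              return deg;
          }
      };

      int main() {
          MultiLayerNetwork net;                       // toy data, invented for the example
          net.add_edge("hic",         "BRCA1", "TP53", 0.8);
          net.add_edge("expression",  "BRCA1", "ESR1", 0.6);
          net.add_edge("methylation", "ESR1",  "TP53", 0.4);
          std::cout << "BRCA1 aggregate degree: " << net.aggregate_degree("BRCA1") << "\n";
      }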

  • F. Tordini, “The road towards a Cloud-based High-Performance solution for genomic data analysis,” PhD Thesis, 2016.
    [BibTeX] [Abstract] [Download PDF]

    Nowadays, molecular biology laboratories are delivering more and more data about DNA organisation, at increasing resolution and in a large number of samples. So much that genomic research is now facing many of the scale-out issues that high-performance computing has been addressing for years: they require powerful infrastructures with fast computing and storage capabilities, with substantial challenges in terms of data processing, statistical analysis and data representation. With this thesis we propose a high-performance pipeline for the analysis and interpretation of heterogeneous genomic information: beside performance, usability and availability are two essential requirements that novel Bioinformatics tools should satisfy. In this perspective, we propose and discuss our efforts towards a solid infrastructure for data processing and storage, where software that operates over data is exposed as a service, and is accessible by users through the Internet. We begin by presenting NuChart-II, a tool for the analysis and interpretation of spatial genomic information. With NuChart-II we propose a graph-based representation of genomic data, which can provide insights on the disposition of genomic elements in the DNA. We also discuss our approach for the normalisation of biases that affect raw sequenced data. We believe that many currently available tools for genomic data analysis are perceived as tricky and troublesome applications, that require highly specialised skills to obtain the desired outcomes. Concerning usability, we want to rise the level of abstraction perceived by the user, but maintain high performance and correctness while providing an exhaustive solution for data visualisation. We also intend to foster the availability of novel tools: in this work we also discuss a cloud solution that delivers computation and storage as dynamically allocated virtual resources via the Internet, while needed software is provided as a service. In this way, the computational demand of genomic research can be satisfied more economically by using lab-scale and enterprise-oriented technologies. Here we discuss our idea of a task farm for the integration of heterogeneous data resulting from different sequencing experiments: we believe that the integration of multi-omic features on a nuclear map can be a valuable mean for studying the interactions among genetic elements. This can reveal insights on biological mechanisms, such as genes regulation, translocations and epigenetic patterns.

    @phdthesis{tordiniThesis16,
      abstract = {Nowadays, molecular biology laboratories are delivering more and more data about DNA organisation, at increasing resolution and in a large number of samples. So much that genomic research is now facing many of the scale-out issues that high-performance computing has been addressing for years: they require powerful infrastructures with fast computing and storage capabilities, with substantial challenges in terms of data processing, statistical analysis and data representation.
      With this thesis we propose a high-performance pipeline for the analysis and interpretation of heterogeneous genomic information: beside performance, usability and availability are two essential requirements that novel Bioinformatics tools should satisfy. In this perspective, we propose and discuss our efforts towards a solid infrastructure for data processing and storage, where software that operates over data is exposed as a service, and is accessible by users through the Internet.
      We begin by presenting NuChart-II, a tool for the analysis and interpretation of spatial genomic information. With NuChart-II we propose a graph-based representation of genomic data, which can provide insights on the disposition of genomic elements in the DNA. We also discuss our approach for the normalisation of biases that affect raw sequenced data.
      We believe that many currently available tools for genomic data analysis are perceived as tricky and troublesome applications, that require highly specialised skills to obtain the desired outcomes. Concerning usability, we want to rise the level of abstraction perceived by the user, but maintain high performance and correctness while providing an exhaustive solution for data visualisation.
      We also intend to foster the availability of novel tools: in this work we also discuss a cloud solution that delivers computation and storage as dynamically allocated virtual resources via the Internet, while needed software is provided as a service. In this way, the computational demand of genomic research can be satisfied more economically by using lab-scale and enterprise-oriented technologies. Here we discuss our idea of a task farm for the integration of heterogeneous data resulting from different sequencing experiments: we believe that the integration of multi-omic features on a nuclear map can be a valuable mean for studying the interactions among genetic elements. This can reveal insights on biological mechanisms, such as genes regulation, translocations and epigenetic patterns.},
      author = {Fabio Tordini},
      keywords = {fastflow, bioinformatics},
      month = {4},
      school = {Computer Science Department, University of Torino, Italy},
      title = {{The road towards a Cloud-based High-Performance solution for genomic data analysis}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2016_tordini_phdthesis.pdf},
      year = {2016},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2016_tordini_phdthesis.pdf}
    }

  • F. Tordini, “A cloud solution for multi-omics data integration,” in Proceedings of the 16th IEEE International Conference on Scalable Computing and Communication, 2016, pp. 559-566. doi:10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131
    [BibTeX] [Abstract] [Download PDF]

    Recent advances in molecular biology and Bioinformatics techniques have brought to an explosion of the information about the spatial organisation of the DNA inside the nucleus. In particular, 3C-based techniques are revealing the genome folding for many different cell types, and permit to create a more effective representation of the disposition of genes in the three-dimensional space. This information can be used to re-interpret heterogeneous genomic data (multi-omic) relying on 3D maps of the chromosome. The storage and computational requirements needed to accomplish such operations on raw sequenced data have to be fulfilled using HPC solutions, and the Cloud paradigm is a valuable and convenient mean for delivering HPC to Bioinformatics. In this work we describe a data analysis work-flow that allows the integration and the interpretation of multi-omic data on a sort of “topographical” nuclear map, capable of representing the effective disposition of genes in a graph-based representation. We propose a cloud-based task farm pattern to orchestrate the services needed to accomplish genomic data analysis, where each service represents a special-purpose tool, playing a part in well known data analysis pipelines.

    @inproceedings{16:scalcom:cloud,
      abstract = {Recent advances in molecular biology and Bioinformatics techniques have brought to an explosion of the information about the spatial organisation of the DNA inside the nucleus. In particular, 3C-based techniques are revealing the genome folding for many different cell types, and permit to create a more effective representation of the disposition of genes in the three-dimensional space. This information can be used to re-interpret heterogeneous genomic data (multi-omic) relying on 3D maps of the chromosome. The storage and computational requirements needed to accomplish such operations on raw sequenced data have to be fulfilled using HPC solutions, and the the Cloud paradigm is a valuable and convenient mean for delivering HPC to Bioinformatics. In this work we describe a data analysis work-flow that allows the integration and the interpretation of multi-omic data on a sort of ``topographical'' nuclear map, capable of representing the effective disposition of genes in a graph-based representation. We propose a cloud-based task farm pattern to orchestrate the services needed to accomplish genomic data analysis, where each service represents a special-purpose tool, playing a part in well known data analysis pipelines.},
      author = {Fabio Tordini},
      booktitle = {Proceedings of the 16th IEEE International Conference on Scalable Computing and Communication},
      date-modified = {2016-08-30 10:26:12 +0000},
      doi = {10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131},
      keywords = {fastflow, bioinformatics, rephrase},
      note = {Best paper award},
      pages = {559--566},
      publisher = {IEEE Computer Society},
      title = {{A cloud solution for multi-omics data integration}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2016_cloudpipeline_scalcom.pdf},
      year = {2016},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2016_cloudpipeline_scalcom.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131}
    }
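
    The task-farm idea behind the work-flow above can be pictured as a pool of workers pulling jobs from a shared queue, where each job wraps one analysis service. The sketch below is a generic C++ illustration with invented stage names, not the cloud implementation described in the paper.

      #include <algorithm>
      #include <condition_variable>
      #include <functional>
      #include <iostream>
      #include <mutex>
      #include <queue>
      #include <thread>
      #include <vector>

      int main() {
          std::queue<std::function<void()>> jobs;   // work queue shared by the farm
          std::mutex m;
          std::condition_variable cv;
          bool done = false;

          // Worker: repeatedly pull a job from the queue and run it.
          auto worker = [&] {
              for (;;) {
                  std::function<void()> job;
                  {
                      std::unique_lock<std::mutex> lk(m);
                      cv.wait(lk, [&] { return done || !jobs.empty(); });
                      if (jobs.empty()) return;     // emitter finished, queue drained
                      job = std::move(jobs.front());
                      jobs.pop();
                  }
                  job();                            // run the stage outside the lock
              }
          };

          unsigned n = std::max(2u, std::thread::hardware_concurrency());
          std::vector<std::thread> pool;
          for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);

          // Emitter: placeholder names for stages of a generic NGS analysis pipeline.
          for (const char* stage : {"align-reads", "normalise", "build-graph", "annotate"}) {
              std::lock_guard<std::mutex> lk(m);
              jobs.push([stage] { std::cout << "running stage: " << stage << "\n"; });
              cv.notify_one();
          }
          { std::lock_guard<std::mutex> lk(m); done = true; }
          cv.notify_all();
          for (auto& t : pool) t.join();
      }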

  • F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, M. Torquati, and M. Aldinucci, “NuChart-II: the road to a fast and scalable tool for Hi-C data analysis,” International Journal of High Performance Computing Applications (IJHPCA), pp. 1-16, 2016. doi:10.1177/1094342016668567
    [BibTeX] [Abstract]

    Recent advances in molecular biology and bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is a R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation rises the level of abstraction and permits to orchestrate analysis and visualisation of genomic data.

    @article{16:ijhpca:nuchart,
      abstract = {Recent advances in molecular biology and bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is a R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation rises the level of abstraction and permits to orchestrate analysis and visualisation of genomic data.},
      author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\`o} and Ivan Merelli and Massimo Torquati and Marco Aldinucci},
      date-modified = {2016-10-09 21:55:39 +0000},
      doi = {10.1177/1094342016668567},
      journal = {International Journal of High Performance Computing Applications (IJHPCA)},
      keywords = {fastflow, bioinformatics, repara, rephrase, interomics, mimomics},
      pages = {1--16},
      title = {{NuChart-II}: the road to a fast and scalable tool for {Hi-C} data analysis},
      year = {2016},
      bdsk-url-1 = {http://hdl.handle.net/2318/1607126},
      bdsk-url-2 = {http://dx.doi.org/10.1177/1094342016668567}
    }

  • F. Tordini, I. Merelli, P. Liò, L. Milanesi, and M. Aldinucci, “NuchaRt: embedding high-level parallel computing in R for augmented Hi-C data analysis,” in Computational Intelligence Methods for Bioinformatics and Biostatistics, Cham (ZG): Springer International Publishing, 2016, vol. 9874, pp. 259-272. doi:10.1007/978-3-319-44332-4
    [BibTeX] [Abstract] [Download PDF]

    Recent advances in molecular biology and Bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus. High-throughput chromosome conformation capture techniques provide a genome-wide capture of chromatin contacts at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout the human genome. These important studies are hampered by the lack of biologists-friendly software. In this work we present NuchaRt, an R package that wraps NuChart-II, an efficient and highly optimized C++ tool for the exploration of Hi-C data. By rising the level of abstraction, NuchaRt proposes a high-performance pipeline that allows users to orchestrate analysis and visualisation of multi-omics data, making optimal use of the computing capabilities offered by modern multi-core architectures, combined with the versatile and well known R environment for statistical analysis and data visualisation.

    @incollection{15:lnbi:nuchaRt,
      abstract = {Recent advances in molecular biology and Bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus. High-throughput chromosome conformation capture techniques provide a genome-wide capture of chromatin contacts at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout the human genome. These important studies are hampered by the lack of biologists-friendly software. In this work we present NuchaRt, an R package that wraps NuChart-II, an efficient and highly optimized C++ tool for the exploration of Hi-C data. By rising the level of abstraction, NuchaRt proposes a high-performance pipeline that allows users to orchestrate analysis and visualisation of multi-omics data, making optimal use of the computing capabilities offered by modern multi-core architectures, combined with the versatile and well known R environment for statistical analysis and data visualisation.},
      address = {Cham (ZG)},
      author = {Fabio Tordini and Ivan Merelli and Pietro Li{\`o} and Luciano Milanesi and Marco Aldinucci},
      booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics},
      doi = {10.1007/978-3-319-44332-4},
      editor = {Springer International Publishing},
      isbn = {978-3-319-44331-7},
      keywords = {fastflow, bioinformatics, repara, interomics, mimomics},
      pages = {259--272},
      publisher = {Springer International Publishing},
      series = {{Lecture Notes in Computer Science}},
      title = {{NuchaRt}: embedding high-level parallel computing in {R} for augmented {Hi-C} data analysis},
      url = {http://link.springer.com/book/10.1007%2F978-3-319-44332-4},
      volume = {9874},
      year = {2016},
      bdsk-url-1 = {http://link.springer.com/book/10.1007%2F978-3-319-44332-4},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-44332-4}
    }
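
    NuchaRt wraps a C++ engine behind R functions. Purely to illustrate that style of embedding, the sketch below assumes the Rcpp package and its attribute syntax; the function, its arguments and the computation are invented for the example and are not part of the actual package.

      // Hypothetical Rcpp-style binding: an R-callable routine implemented in C++.
      // From R: Rcpp::sourceCpp("count_contacts.cpp")
      #include <Rcpp.h>
      #include <map>
      #include <string>

      // Count, for each gene, how many contacts exceed a probability threshold.
      // [[Rcpp::export]]
      Rcpp::IntegerVector count_contacts(Rcpp::CharacterVector gene_a,
                                         Rcpp::CharacterVector gene_b,
                                         Rcpp::NumericVector score,
                                         double threshold) {
          std::map<std::string, int> counts;
          for (R_xlen_t i = 0; i < score.size(); ++i) {
              if (score[i] < threshold) continue;
              counts[Rcpp::as<std::string>(gene_a[i])]++;
              counts[Rcpp::as<std::string>(gene_b[i])]++;
          }
          Rcpp::IntegerVector out(counts.size());
          Rcpp::CharacterVector names(counts.size());
          R_xlen_t k = 0;
          for (const auto& kv : counts) { names[k] = kv.first; out[k] = kv.second; ++k; }
          out.attr("names") = names;       // named integer vector, idiomatic in R
          return out;
      }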

2015

  • F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, and M. Aldinucci, “Parallel Exploration of the Nuclear Chromosome Conformation with NuChart-II,” in Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing, 2015. doi:10.1109/PDP.2015.104
    [BibTeX] [Abstract] [Download PDF]

    High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques allow to investigate the spatial organisation of chromosomes in the cell’s natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbourhood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph-based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.

    @inproceedings{nuchar:tool:15,
      abstract = {High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques allow to investigate the spatial organisation of chromosomes in the cell's natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbour- hood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph- based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.
    },
      author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\`o} and Ivan Merelli and Marco Aldinucci},
      booktitle = {Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing},
      date-added = {2014-12-03 13:51:17 +0000},
      date-modified = {2015-09-24 11:16:43 +0000},
      doi = {10.1109/PDP.2015.104},
      keywords = {fastflow, bioinformatics, paraphrase, repara, impact},
      month = mar,
      publisher = {IEEE},
      title = {Parallel Exploration of the Nuclear Chromosome Conformation with {NuChart-II}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_nuchartff.pdf},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_nuchartff.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.104}
    }
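
    The gene-centric, graph-based view introduced here boils down to a breadth-first expansion around a root gene over Hi-C contact edges. The sketch below is a toy in standard C++ (C++17); the contacts are invented and it is not NuChart-II code.

      #include <iostream>
      #include <map>
      #include <queue>
      #include <set>
      #include <string>
      #include <utility>
      #include <vector>

      using ContactMap = std::map<std::string, std::vector<std::string>>;

      // Collect all genes reachable from 'root' within 'depth' Hi-C contact hops.
      std::set<std::string> neighbourhood(const ContactMap& contacts,
                                          const std::string& root, int depth) {
          std::set<std::string> seen{root};
          std::queue<std::pair<std::string, int>> frontier;
          frontier.push({root, 0});
          while (!frontier.empty()) {
              auto [gene, d] = frontier.front();
              frontier.pop();
              if (d == depth) continue;                 // do not expand past the limit
              auto it = contacts.find(gene);
              if (it == contacts.end()) continue;
              for (const auto& next : it->second)
                  if (seen.insert(next).second) frontier.push({next, d + 1});
          }
          return seen;
      }

      int main() {
          ContactMap hic = {   // invented contacts
              {"TP53", {"BRCA1", "MDM2"}}, {"BRCA1", {"ESR1"}}, {"MDM2", {}}};
          for (const auto& g : neighbourhood(hic, "TP53", 2))
              std::cout << g << "\n";
      }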

  • M. Drocco, C. Misale, G. P. Pezzi, F. Tordini, and M. Aldinucci, “Memory-Optimised Parallel Processing of Hi-C Data,” in Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing, 2015, pp. 1-8. doi:10.1109/PDP.2015.63
    [BibTeX] [Abstract] [Download PDF]

    This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.

    @inproceedings{nuchart:speedup:15,
      abstract = {This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neigh- borhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issue in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.},
      author = {Maurizio Drocco and Claudia Misale and Guilherme {Peretti Pezzi} and Fabio Tordini and Marco Aldinucci},
      booktitle = {Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing},
      date-added = {2014-12-03 13:54:08 +0000},
      date-modified = {2015-09-24 11:17:47 +0000},
      doi = {10.1109/PDP.2015.63},
      keywords = {fastflow,bioinformatics, paraphrase, repara, impact},
      month = mar,
      pages = {1-8},
      publisher = {IEEE},
      title = {Memory-Optimised Parallel Processing of {Hi-C} Data},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_memopt.pdf},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_memopt.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.63}
    }
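
    One family of memory optimisations relevant to memory-bound graph construction (shown here only as a generic illustration, not the actual NuChart-II data structures) is switching an edge list from an array-of-structs to a struct-of-arrays layout, so that a scan over a single field streams through contiguous memory.

      #include <cstdint>
      #include <iostream>
      #include <numeric>
      #include <vector>

      struct EdgeAoS { std::uint32_t src, dst; double score; };   // interleaved fields

      struct EdgesSoA {                                           // one array per field
          std::vector<std::uint32_t> src, dst;
          std::vector<double> score;
      };

      double total_score_aos(const std::vector<EdgeAoS>& e) {
          double s = 0.0;
          for (const auto& x : e) s += x.score;    // drags src/dst into cache as well
          return s;
      }

      double total_score_soa(const EdgesSoA& e) {
          // contiguous doubles only: better bandwidth utilisation, easily vectorised
          return std::accumulate(e.score.begin(), e.score.end(), 0.0);
      }

      int main() {
          std::vector<EdgeAoS> aos = {{0, 1, 0.5}, {1, 2, 0.7}};
          EdgesSoA soa;
          soa.src = {0, 1}; soa.dst = {1, 2}; soa.score = {0.5, 0.7};
          std::cout << total_score_aos(aos) << " " << total_score_soa(soa) << "\n";
      }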

  • F. Tordini, M. Drocco, I. Merelli, L. Milanesi, P. Liò, and M. Aldinucci, “NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data,” in Computational Intelligence Methods for Bioinformatics and Biostatistics – 11th International Meeting, CIBB 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers, Cambridge, UK, 2015, pp. 298-311. doi:10.1007/978-3-319-24462-4_25
    [BibTeX] [Abstract] [Download PDF]

    Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expressions, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but these important researches are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid in optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.

    @inproceedings{14:ff:nuchart:cibb,
      abstract = {Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expressions, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but these important researches are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid in optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.},
      address = {Cambridge, UK},
      author = {Fabio Tordini and Maurizio Drocco and Ivan Merelli and Luciano Milanesi and Pietro Li{\`o} and Marco Aldinucci},
      booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics - 11th International Meeting, {CIBB} 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers},
      date-modified = {2015-09-24 11:22:30 +0000},
      doi = {10.1007/978-3-319-24462-4_25},
      editor = {Clelia Di Serio and Pietro Li{\`{o}} and Alessandro Nonis and Roberto Tagliaferri},
      isbn = {978-3-319-24461-7},
      keywords = {fastflow, bioinformatics, paraphrase, repara, interomics, mimomics, hirma},
      pages = {298-311},
      publisher = {Springer},
      series = {{LNCS}},
      title = {{NuChart-II}: a graph-based approach for the analysis and interpretation of {Hi-C} data},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},
      volume = {8623},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-24462-4_25}
    }

  • I. Merelli, F. Tordini, M. Drocco, M. Aldinucci, P. Liò, and L. Milanesi, “Integrating Multi-omic features exploiting Chromosome Conformation Capture data,” Frontiers in Genetics, vol. 6, iss. 40, 2015. doi:10.3389/fgene.2015.00040
    [BibTeX] [Abstract] [Download PDF]

    The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is daily produced in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progresses in high throughput molecular biology techniques and bioinformatics have provided insights into chromatin interactions on a larger scale and offer a formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of the chromosome organization in the cell’s natural state. While performed genome wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping on the achieved graphs genomic features such as methylation patterns and histone modifications, along with expression profiles. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.

    @article{nuchart:frontiers:15,
      abstract = {The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is daily produced in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progresses in high throughput molecular biology techniques and bioinformatics have provided insights into chromatin interactions on a larger scale and offer a formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of the chromosome organization in the cell's natural state. While performed genome wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping on the achieved graphs genomic features such as methylation patterns and histone modifications, along with expression profiles. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.},
      author = {Merelli, Ivan and Tordini, Fabio and Drocco, Maurizio and Aldinucci, Marco and Li{\`o}, Pietro and Milanesi, Luciano},
      date-added = {2015-02-01 16:38:47 +0000},
      date-modified = {2015-09-24 11:23:10 +0000},
      doi = {10.3389/fgene.2015.00040},
      issn = {1664-8021},
      journal = {Frontiers in Genetics},
      keywords = {bioinformatics, fastflow, interomics, hirma, mimomics},
      number = {40},
      title = {Integrating Multi-omic features exploiting {Chromosome Conformation Capture} data},
      url = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
      volume = {6},
      year = {2015},
      bdsk-url-1 = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
      bdsk-url-2 = {http://dx.doi.org/10.3389/fgene.2015.00040}
    }
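
    Mapping omic features onto a gene neighbourhood graph, as described above, can be pictured as keeping per-gene annotations keyed by the same node identifiers used in the graph. The sketch below is a toy example with invented values, not NuChart code.

      #include <iostream>
      #include <map>
      #include <string>
      #include <vector>

      struct OmicFeatures { double expression; double methylation; };

      int main() {
          // neighbourhood graph: gene -> contacted genes (from Hi-C)
          std::map<std::string, std::vector<std::string>> graph = {
              {"TP53", {"BRCA1", "MDM2"}}, {"BRCA1", {"TP53"}}, {"MDM2", {"TP53"}}};

          // per-gene annotations mapped onto the same node identifiers
          std::map<std::string, OmicFeatures> features = {
              {"TP53", {8.1, 0.12}}, {"BRCA1", {5.4, 0.35}}, {"MDM2", {2.2, 0.80}}};

          // example query: average expression of the genes contacting TP53
          double sum = 0.0;
          const auto& neigh = graph.at("TP53");
          for (const auto& g : neigh) sum += features.at(g).expression;
          std::cout << "mean expression around TP53: " << sum / neigh.size() << "\n";
      }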

2014

  • M. Aldinucci, G. P. Pezzi, M. Drocco, F. Tordini, P. Kilpatrick, and M. Torquati, “Parallel video denoising on heterogeneous platforms,” in Proc. of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), 2014.
    [BibTeX] [Abstract] [Download PDF]

    In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.

    @inproceedings{ff:video:hlpgpu:14,
      abstract = {In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.},
      author = {Marco Aldinucci and Guilherme {Peretti Pezzi} and Maurizio Drocco and Fabio Tordini and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Proc. of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU)},
      date-added = {2013-12-07 18:28:32 +0000},
      date-modified = {2015-09-27 12:42:02 +0000},
      keywords = {fastflow, paraphrase, impact},
      title = {Parallel video denoising on heterogeneous platforms},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_video_denoiser_hlpgpu.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_video_denoiser_hlpgpu.pdf}
    }
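
    As a very rough analogue of the skeletal (pipeline) design of the video filter, the sketch below pushes frames through a detect/restore pipeline with one thread per stage, connected by a tiny blocking channel. It is a generic C++17 illustration with invented stage logic, not the FastFlow implementation.

      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <optional>
      #include <queue>
      #include <thread>
      #include <utility>
      #include <vector>

      struct Frame { int id; std::vector<unsigned char> pixels; };

      template <typename T>
      class Channel {                         // minimal blocking queue between stages
          std::queue<T> q; std::mutex m; std::condition_variable cv; bool closed = false;
      public:
          void push(T v) { { std::lock_guard<std::mutex> l(m); q.push(std::move(v)); } cv.notify_one(); }
          void close()   { { std::lock_guard<std::mutex> l(m); closed = true; } cv.notify_all(); }
          std::optional<T> pop() {
              std::unique_lock<std::mutex> l(m);
              cv.wait(l, [&] { return closed || !q.empty(); });
              if (q.empty()) return std::nullopt;
              T v = std::move(q.front()); q.pop(); return v;
          }
      };

      int main() {
          Channel<Frame> detected;
          std::thread restorer([&] {          // stage 2: "restore" the flagged pixels
              while (auto f = detected.pop())
                  std::cout << "restored frame " << f->id << "\n";
          });
          for (int i = 0; i < 3; ++i) {       // stage 1 (main thread): "detect" noise
              Frame f{i, std::vector<unsigned char>(64, 128)};
              detected.push(std::move(f));
          }
          detected.close();
          restorer.join();
      }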

  • M. Aldinucci, M. Drocco, G. P. Pezzi, C. Misale, F. Tordini, and M. Torquati, “Exercising high-level parallel programming on streams: a systems biology use case,” in Proc. of the 2014 IEEE 34th Intl. Conference on Distributed Computing Systems Workshops (ICDCS), Madrid, Spain, 2014. doi:10.1109/ICDCSW.2014.38
    [BibTeX] [Abstract] [Download PDF]

    The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result into a computationally expensive task reducing the interactivity required in the model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology equipped with on-line mining of results, which is designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern-based parallel programming makes it possible to develop portable parallel code without relinquish neither run-time efficiency nor performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter-alia cache-coherent multi-cores, cluster of multi-core (Ethernet and Infiniband) and the Amazon Elastic Compute Cloud.

    @inproceedings{cwc:gpu:dcperf:14,
      abstract = {The stochastic modelling of biological systems, cou- pled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result into a computationally expensive task reducing the interactivity required in the model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology equipped with on- line mining of results, which is designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern- based parallel programming makes it possible to develop portable parallel code without relinquish neither run-time efficiency nor performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter-alia cache-coherent multi-cores, cluster of multi-core (Ethernet and Infiniband) and the Amazon Elastic Compute Cloud.},
      address = {Madrid, Spain},
      author = {Marco Aldinucci and Maurizio Drocco and Guilherme {Peretti Pezzi} and Claudia Misale and Fabio Tordini and Massimo Torquati},
      booktitle = {Proc. of the 2014 IEEE 34th Intl. Conference on Distributed Computing Systems Workshops (ICDCS)},
      date-added = {2014-04-19 12:44:39 +0000},
      date-modified = {2015-09-27 12:43:13 +0000},
      doi = {10.1109/ICDCSW.2014.38},
      keywords = {fastflow, gpu, bioinformatics, paraphrase, impact, nvidia},
      publisher = {IEEE},
      title = {Exercising high-level parallel programming on streams: a systems biology use case},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_dcperf_cwc_gpu.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_dcperf_cwc_gpu.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/ICDCSW.2014.38}
    }
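
    The simulate-then-analyse work-flow above boils down to running many independent stochastic replicas in parallel and reducing their results into statistics. Below is a bare-bones sketch using a toy birth-death model and std::async; it is unrelated to the actual CWC simulator.

      #include <algorithm>
      #include <future>
      #include <iostream>
      #include <random>
      #include <vector>

      // One stochastic replica of a toy birth-death process.
      double simulate_replica(unsigned seed) {
          std::mt19937 gen(seed);
          std::poisson_distribution<int> births(2), deaths(1);
          int population = 100;
          for (int step = 0; step < 1000; ++step)
              population = std::max(0, population + births(gen) - deaths(gen));
          return population;
      }

      int main() {
          const int replicas = 32;
          std::vector<std::future<double>> futs;
          for (int r = 0; r < replicas; ++r)
              futs.push_back(std::async(std::launch::async, simulate_replica, 1234u + r));

          double mean = 0.0;                          // reduction of the replica results
          for (auto& f : futs) mean += f.get() / replicas;
          std::cout << "mean final population: " << mean << "\n";
      }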

2013

  • M. Aldinucci, F. Tordini, M. Drocco, M. Torquati, and M. Coppo, “Parallel stochastic simulators in system biology: the evolution of the species,” in Proc. of Intl. Euromicro PDP 2013: Parallel Distributed and network-based Processing, Belfast, Northern Ireland, U.K., 2013. doi:10.1109/PDP.2013.66
    [BibTeX] [Abstract] [Download PDF]

    The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems which dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may however result in being computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Components (CWC) simulator, which is designed according to the FastFlow’s pattern-based approach, is presented and discussed in this work. FastFlow has been extended to support also clusters of multi-cores with minimal coding effort, assessing the portability of the approach.

    @inproceedings{ff_cwc_distr:pdp:13,
      abstract = {The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems which dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may however result in being computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Components (CWC) simulator, which is designed according to the FastFlow's pattern-based approach, is presented and discussed in this work. FastFlow has been extended to support also clusters of multi-cores with minimal coding effort, assessing the portability of the approach.},
      address = {Belfast, Northern Ireland, U.K.},
      author = {Marco Aldinucci and Fabio Tordini and Maurizio Drocco and Massimo Torquati and Mario Coppo},
      booktitle = {Proc. of Intl. Euromicro PDP 2013: Parallel Distributed and network-based Processing},
      date-added = {2012-01-20 19:22:15 +0100},
      date-modified = {2013-11-24 00:30:43 +0000},
      doi = {10.1109/PDP.2013.66},
      keywords = {fastflow, bioinformatics},
      month = feb,
      publisher = {IEEE},
      title = {Parallel stochastic simulators in system biology: the evolution of the species},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_cwc_d_PDP.pdf},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_cwc_d_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2013.66}
    }

  • M. Aldinucci, S. Campa, F. Tordini, M. Torquati, and P. Kilpatrick, “An abstract annotation model for skeletons,” in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures, B. Beckert, F. Damiani, F. S. de Boer, and M. M. Bonsangue, Eds., Springer, 2013, vol. 7542, pp. 257-276. doi:10.1007/978-3-642-35887-6_14
    [BibTeX] [Abstract] [Download PDF]

    Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In this work a first step toward the automated optimization of high level skeleton-based parallel code is discussed. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mapping of parallel activities on a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code.

    @incollection{toolchain:fmco:11,
      abstract = {Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In this work a first step toward the automated optimization of high level skeleton-based parallel code is discussed. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mapping of parallel activities on a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code.},
      author = {Marco Aldinucci and Sonia Campa and Fabio Tordini and Massimo Torquati and Peter Kilpatrick},
      booktitle = {Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures},
      date-added = {2012-06-04 19:23:25 +0200},
      date-modified = {2013-11-24 00:33:41 +0000},
      doi = {10.1007/978-3-642-35887-6_14},
      editor = {Bernhard Beckert and Ferruccio Damiani and Frank S. de Boer and Marcello M. Bonsangue},
      isbn = {978-3-642-35886-9},
      keywords = {fastflow, paraphrase},
      pages = {257-276},
      publisher = {Springer},
      series = {LNCS},
      title = {An abstract annotation model for skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_annotation.pdf},
      volume = {7542},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_annotation.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-35887-6_14}
    }

2012

  • F. Tordini, M. Aldinucci, and M. Torquati, “High-level lock-less programming for multicore,” in Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) — Poster Abstracts, Fiuggi, Italy, 2012.
    [BibTeX] [Abstract] [Download PDF]

    Modern computers are built upon multi-core architectures. Achieving peak performance on these architectures is hard and may require a substantial programming effort. The synchronisation of many processes racing to access a common resource (the shared memory) has been a fundamental problem on parallel computing for years, and many solutions have been proposed to address this issue. Non-blocking synchronisation and transactional primitives have been envisioned as a way to reduce memory wall problem. Despite sometimes effective (and exhibiting a great momentum in the research community), they are only one facet of the problem, as their exploitation still requires non-trivial programming skills. With non-blocking philosophy in mind, we propose high-level programming patterns that will relieve the programmer from worrying about low-level details such as synchronisation of racing processes as well as those fine tunings needed to improve the overall performance, like proper (distributed) dynamic memory allocation and effective exploitation of the memory hierarchy.

    @inproceedings{ff:acaces:12,
      abstract = {Modern computers are built upon multi-core architectures. Achieving peak performance on these architectures is hard and may require a substantial programming effort. The synchronisation of many processes racing to access a common resource (the shared memory) has been a fundamental problem on parallel computing for years, and many solutions have been proposed to address this issue. Non-blocking synchronisation and transactional primitives have been envisioned as a way to reduce memory wall problem. Despite sometimes effective (and exhibiting a great momentum in the research community), they are only one facet of the problem, as their exploitation still requires non-trivial programming skills.
    With non-blocking philosophy in mind, we propose high-level programming patterns that will relieve the programmer from worrying about low-level details such as synchronisation of racing processes as well as those fine tunings needed to improve the overall performance, like proper (distributed) dynamic memory allocation and effective exploitation of the memory hierarchy.},
      address = {Fiuggi, Italy},
      author = {Fabio Tordini and Marco Aldinucci and Massimo Torquati},
      booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
      date-added = {2012-07-17 17:58:06 +0200},
      date-modified = {2013-11-24 00:36:10 +0000},
      isbn = {9789038219875},
      keywords = {fastflow},
      publisher = {HiPEAC},
      title = {High-level lock-less programming for multicore},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_ACACES_ex-abstract.pdf},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_ACACES_ex-abstract.pdf}
    }
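
    To give a flavour of the lock-less building blocks this abstract refers to, here is a didactic bounded single-producer/single-consumer queue in portable C++11 atomics. It is intentionally minimal; the real FastFlow queues are considerably more refined.

      #include <array>
      #include <atomic>
      #include <cstddef>
      #include <iostream>
      #include <thread>

      // Bounded SPSC ring buffer: the producer owns 'tail', the consumer owns 'head',
      // so no locks are needed, only acquire/release ordering on the two indices.
      template <typename T, std::size_t N>
      class SpscQueue {
          std::array<T, N + 1> buf{};                   // one slot is kept empty
          std::atomic<std::size_t> head{0}, tail{0};
      public:
          bool push(const T& v) {                       // producer thread only
              std::size_t t = tail.load(std::memory_order_relaxed);
              std::size_t next = (t + 1) % (N + 1);
              if (next == head.load(std::memory_order_acquire)) return false;  // full
              buf[t] = v;
              tail.store(next, std::memory_order_release);
              return true;
          }
          bool pop(T& out) {                            // consumer thread only
              std::size_t h = head.load(std::memory_order_relaxed);
              if (h == tail.load(std::memory_order_acquire)) return false;     // empty
              out = buf[h];
              head.store((h + 1) % (N + 1), std::memory_order_release);
              return true;
          }
      };

      int main() {
          SpscQueue<int, 1024> q;
          std::thread producer([&] { for (int i = 0; i < 10000; ++i) while (!q.push(i)) {} });
          long long sum = 0;
          int v;
          for (int received = 0; received < 10000; )
              if (q.pop(v)) { sum += v; ++received; }
          producer.join();
          std::cout << "sum = " << sum << "\n";         // 0 + 1 + ... + 9999
      }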