Robert René Maria Birke



Tenured assistant professor at the Computer Science Department, University of Turin
Parallel Computing group
Via Pessinetto 12, 10149 Torino – Italy 
Email: robert.birke@unito.it

Short Bio

Robert Birke is a tenured assistant professor in the Parallel Computing research group at the University of Turin. He received his Ph.D. in Electronics and Communications Engineering from the Politecnico di Torino, Italy, in 2009.
He has been a visiting researcher at IBM Research Zurich, Switzerland, and a Principal Scientist at ABB Corporate Research, Switzerland. His research interests are in the broad area of virtual resource management, including network design, workload characterization, and the optimization of AI and big-data applications.
He has published more than 90 papers in venues related to communication, system performance and machine learning, e.g., SIGCOMM, SIGMETRICS, FAST, INFOCOM, ACML and JSAC.
He is a senior member of IEEE.

Publications

To appear

  • A. Ghiassi, R. Birke, and L. Chen, “Robust learning via golden symmetric loss of (un)trusted labels,” in SDM ’23: SIAM international conference on data mining, To appear.

    Learning robust deep models against noisy labels becomes ever more critical as today's data is commonly collected from open platforms and subject to adversarial corruption. Information on the label corruption process, i.e., the corruption matrix, can greatly enhance the robustness of deep models but still falls behind in combating hard classes. In this paper, we propose to construct a golden symmetric loss (GSL) based on the estimated corruption matrix so as to avoid overfitting to noisy labels and learn effectively from hard classes. GSL is the weighted sum of the corrected regular cross entropy and the reverse cross entropy. By leveraging a small fraction of trusted clean data, we estimate the corruption matrix and use it to correct the loss as well as to determine the weights of GSL. We theoretically prove the robustness of the proposed loss function in the presence of dirty labels. We provide a heuristic to adaptively tune the loss weights of GSL according to the noise rate and diversity measured from the dataset. We evaluate the proposed golden symmetric loss on both vision and natural language deep models subject to different types of label noise patterns. Empirical results show that GSL can significantly outperform existing robust training methods on different noise patterns, with accuracy improvements of up to 18% on CIFAR-100 and 1% on the real-world noisy dataset Clothing1M. (A minimal illustrative sketch of the GSL construction follows the BibTeX entry below.)

    @inproceedings{sdm-ghiassi23,
    abstract = {Learning robust deep models against noisy labels becomes ever critical when today's data is commonly collected from open platforms and subject to adversarial corruption. The information on the label corruption process, i.e., corruption matrix, can greatly enhance the robustness of deep models but still fall behind in combating hard classes. In this paper, we propose to construct a golden symmetric loss (GSL) based on the estimated corruption matrix as to avoid overfitting to noisy labels and learn effectively from hard classes. GSL is the weighted sum of the corrected regular cross entropy and reverse cross entropy. By leveraging a small fraction of trusted clean data, we estimate the corruption matrix and use it to correct the loss as well as to determine the weights of GSL. We theoretically prove the robustness of the proposed loss function in the presence of dirty labels. We provide a heuristics to adaptively tune the loss weights of GSL according to the noise rate and diversity measured from the dataset. We evaluate our proposed golden symmetric loss on both vision and natural language deep models subject to different types of label noise patterns. Empirical results show that GSL can significantly outperform the existing robust training methods on different noise patterns, showing accuracy improvement up to 18% on CIFAR-100 and 1% on real world noisy dataset of Clothing1M.},
    author = {Amirmasoud Ghiassi and Robert Birke and Lydia Chen},
    booktitle = {{SDM} '23: {SIAM} International Conference on Data Mining},
    title = {Robust Learning via Golden Symmetric Loss of (un)Trusted Labels},
    keywords = {textrossa},
    year = {To appear},
    url = {https://datacloud.di.unito.it/index.php/s/b6z3moNLxnNiCxz}
    }
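
    A minimal NumPy sketch of the GSL construction described above, included for illustration only: it assumes a forward correction of the regular cross entropy through the estimated corruption matrix T and the usual reverse cross entropy with log(0) clipped to a constant A; the function name and the fixed weights alpha and beta are assumptions (the paper tunes the weights adaptively from the measured noise rate and diversity).

    import numpy as np

    # Hedged sketch: weighted sum of the forward-corrected cross entropy and the
    # reverse cross entropy. T[i, j] is the estimated probability of observing noisy
    # label j given clean label i; alpha, beta and A are illustrative choices.
    def golden_symmetric_loss(probs, labels, T, alpha=1.0, beta=1.0, A=-4.0):
        """probs: (N, C) softmax outputs; labels: (N,) observed (noisy) labels."""
        n = probs.shape[0]
        corrected = probs @ T                                  # predicted noisy-label distribution
        ce = -np.log(corrected[np.arange(n), labels] + 1e-12)  # corrected regular cross entropy
        rce = -A * (1.0 - probs[np.arange(n), labels])         # reverse cross entropy with log(0) := A
        return float((alpha * ce + beta * rce).mean())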

2022

  • B. Cox, R. Birke, and L. Y. Chen, “Memory-aware and context-aware multi-DNN inference on the edge,” Pervasive and mobile computing, vol. 83, p. 101594, 2022. doi:10.1016/j.pmcj.2022.101594

    Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., identifying objects, faces, and genders from images. It is of paramount importance to guarantee low response times of such multi-DNN executions, as they affect not only users' quality of experience but also safety. The challenge, largely unaddressed by the state of the art, is how to overcome the memory limitation of edge devices without altering the DNN models. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution and scheduling framework, which requires no modification of DNN models. The aim of Masa is to consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. The enabling features of Masa are (i) modeling inter- and intra-network dependency, (ii) leveraging complementary memory usage of each layer, and (iii) exploring the context dependency of DNNs. We verify the correctness and scheduling optimality via mixed integer programming. We extensively evaluate two versions of Masa, context-oblivious and context-aware, on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to state-of-the-art multi-DNN scheduling solutions.

    @article{COX2022101594,
    abstract = {Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real time image-based analysis. Increasingly, multi-faced knowledge is extracted by executing multiple DNNs inference models, e.g., identifying objects, faces, and genders from images. It is of paramount importance to guarantee low response times of such multi-DNN executions as it affects not only users quality of experience but also safety. The challenge, largely unaddressed by the state of the art, is how to overcome the memory limitation of edge devices without altering the DNN models. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution and scheduling framework, which requires no modification of DNN models. The aim of Masa is to consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. The enabling features of Masa are (i) modeling inter- and intra-network dependency, (ii) leveraging complimentary memory usage of each layer, and (iii) exploring the context dependency of DNNs. We verify the correctness and scheduling optimality via mixed integer programming. We extensively evaluate two versions of Masa, context-oblivious and context-aware, on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to the state of the art multi-DNN scheduling solutions.},
    author = {Bart Cox and Robert Birke and Lydia Y. Chen},
    doi = {10.1016/j.pmcj.2022.101594},
    issn = {1574-1192},
    journal = {Pervasive and Mobile Computing},
    pages = {101594},
    title = {Memory-aware and context-aware multi-DNN inference on the edge},
    url = {https://www.sciencedirect.com/science/article/pii/S1574119222000372},
    volume = {83},
    year = {2022},
    bdsk-url-1 = {https://www.sciencedirect.com/science/article/pii/S1574119222000372},
    bdsk-url-2 = {https://doi.org/10.1016/j.pmcj.2022.101594}
    }

  • C. Stewart, N. Morris, L. Y. Chen, and R. Birke, “Performance modeling for short-term cache allocation,” in Proceedings of the 51st international conference on parallel processing (ICPP), 2022, p. 31:1–31:11. doi:10.1145/3545008.3545094

    Short-term cache allocation grants and then revokes access to processor cache lines dynamically. For online services, short-term allocation can speed up targeted query executions and free up cache lines reserved, but normally not needed, for performance. However, in collocated settings, short-term allocation can increase cache contention, slowing down collocated query executions. To offset slowdowns, collocated services may request short-term allocation more often, making the problem worse. Short-term allocation policies manage which queries receive cache allocations and when. In collocated settings, these policies should balance targeted query speedups against slowdowns caused by recurring cache contention. We present a model-driven approach that (1) predicts response time under a given policy, (2) explores competing policies and (3) chooses policies that yield low response time for all collocated services. Our approach profiles cache usage offline, characterizes the effects of cache allocation policies using deep learning techniques and devises novel performance models for short-term allocation with online services. We tested our approach using data processing, cloud, and high-performance computing benchmarks collocated on Intel processors equipped with Cache Allocation Technology. Our models predicted median response time with 11% absolute percent error. Short-term allocation policies found using our approach outperformed state-of-the-art shared cache allocation policies by 1.2-2.3X.

    @inproceedings{icpp-stewart22,
    abstract = {Short-term cache allocation grants and then revokes access to processor cache lines dynamically. For online services, short-term allocation can speed up targeted query executions and free up cache lines reserved, but normally not needed, for performance. However, in collocated settings, short-term allocation can increase cache contention, slowing down collocated query executions. To offset slowdowns, collocated services may request short-term allocation more often, making the problem worse. Short-term allocation policies manage which queries receive cache allocations and when. In collocated settings, these policies should balance targeted query speedups against slowdowns caused by recurring cache contention. We present a model-driven approach that (1) predicts response time under a given policy, (2) explores competing policies and (3) chooses policies that yield low response time for all collocated services. Our approach profiles cache usage offline, characterizes the effects of cache allocation policies using deep learning techniques and devises novel performance models for short-term allocation with online services. We tested our approach using data processing, cloud, and high-performance computing benchmarks collocated on Intel processors equipped with Cache Allocation Technology. Our models predicted median response time with 11\% absolute percent error. Short-term allocation policies found using our approach out performed state-of-the-art shared cache allocation policies by 1.2-2.3X.},
    author = {Christopher Stewart and
    Nathaniel Morris and
    Lydia Y. Chen and
    Robert Birke},
    booktitle = {Proceedings of the 51st International Conference on Parallel Processing ({ICPP})},
    doi = {10.1145/3545008.3545094},
    month = {29 Aug -- 1 Sep},
    pages = {31:1--31:11},
    publisher = {{ACM}},
    title = {Performance Modeling for Short-Term Cache Allocation},
    url = {https://doi.org/10.1145/3545008.3545094},
    year = {2022}
    }

  • Y. Zhu, Z. Zhao, R. Birke, and L. Y. Chen, “Permutation-invariant tabular data synthesis,” in IEEE international conference on big data (big data), 2022, p. 5855–5864. doi:10.1109/BigData55660.2022.10020639

    Tabular data synthesis is an emerging approach to circumvent strict regulations on data privacy while discovering knowledge through big data. Although state-of-the-art AI-based tabular data synthesizers, e.g., table-GAN, CTGAN, TVAE, and CTAB-GAN, are effective at generating synthetic tabular data, their training is sensitive to column permutations of input data. In this paper, we first conduct an extensive empirical study to disclose such a property of permutation invariance and an in-depth analysis of the existing synthesizers. We show that changing the input column order worsens the statistical difference between real and synthetic data by up to 38.67% due to the encoding of tabular data and the network architectures. To fully unleash the potential of big synthetic tabular data, we propose two solutions: (i) AE-GAN, a synthesizer that uses an autoencoder network to represent the tabular data and GAN networks to synthesize the latent representation, and (ii) a feature sorting algorithm to find the suitable column order of input data for CNN-based synthesizers. We evaluate the proposed solutions on five datasets in terms of the sensitivity to the column permutation, the quality of synthetic data, and the utility in downstream analyses. Our results show that we enhance the property of permutation-invariance when training synthesizers and further improve the quality and utility of synthetic data, up to 22%, compared to the existing synthesizers. (An illustrative sketch of a correlation-driven column ordering follows the BibTeX entry below.)

    @inproceedings{bigdata-zhu22,
    abstract = {Tabular data synthesis is an emerging approach to circumvent strict regulations on data privacy while discovering knowledge through big data. Although state-of-the-art AI-based tabular data synthesizers, e.g., table-GAN, CTGAN, TVAE, and CTAB-GAN, are effective at generating synthetic tabular data, their training is sensitive to column permutations of input data. In this paper, we first conduct an extensive empirical study to disclose such a property of permutation invariance and an in-depth analysis of the existing synthesizers. We show that changing the input column order worsens the statistical difference between real and synthetic data by up to 38.67\% due to the encoding of tabular data and the network architectures. To fully unleash the potential of big synthetic tabular data, we propose two solutions: (i) AE-GAN, a synthesizer that uses an autoencoder network to represent the tabular data and GAN networks to synthesize the latent representation, and (ii) a feature sorting algorithm to find the suitable column order of input data for CNN-based synthesizers. We evaluate the proposed solutions on five datasets in terms of the sensitivity to the column permutation, the quality of synthetic data, and the utility in downstream analyses. Our results show that we enhance the property of permutation-invariance when training synthesizers and further improve the quality and utility of synthetic data, up to 22\%, compared to the existing synthesizers.},
    author = {Yujin Zhu and
    Zilong Zhao and
    Robert Birke and
    Lydia Y. Chen},
    editor = {Shusaku Tsumoto and
    Yukio Ohsawa and
    Lei Chen and
    Dirk Van den Poel and
    Xiaohua Hu and
    Yoichi Motomura and
    Takuya Takagi and
    Lingfei Wu and
    Ying Xie and
    Akihiro Abe and
    Vijay Raghavan},
    title = {Permutation-Invariant Tabular Data Synthesis},
    booktitle = {{IEEE} International Conference on Big Data (Big Data)},
    doi = {10.1109/BigData55660.2022.10020639},
    month = {17--20 Dec},
    pages = {5855--5864},
    publisher = {{IEEE}},
    year = {2022},
    url = {https://datacloud.di.unito.it/index.php/s/b6z3moNLxnNiCxz}
    }
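
    The following is a hypothetical sketch of a feature-sorting pass for CNN-based synthesizers, as referenced in the abstract above: it greedily chains columns so that strongly correlated features end up adjacent. The criterion (absolute Pearson correlation), the greedy strategy and all names are assumptions made for illustration; the algorithm proposed in the paper may differ.

    import numpy as np

    # Illustrative only: order columns so that each column is followed by the remaining
    # column it is most correlated with, starting from the most "connected" column.
    def sort_columns_by_correlation(data):
        """data: (N, D) numeric array; returns a list of column indices (a permutation)."""
        corr = np.abs(np.corrcoef(data, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        order = [int(np.argmax(corr.sum(axis=1)))]
        remaining = set(range(data.shape[1])) - set(order)
        while remaining:
            last = order[-1]
            nxt = max(remaining, key=lambda j: corr[last, j])   # most correlated with the last placed column
            order.append(int(nxt))
            remaining.remove(nxt)
        return order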

  • A. Ghiassi, R. Birke, and L. Y. Chen, “LABNET: A collaborative method for DNN training and label aggregation,” in 14th international conference on agents and artificial intelligence (ICAART), 2022, p. 56–66. doi:10.5220/0010770400003116

    Today, to label the massive datasets needed to train Deep Neural Networks (DNNs), cheap and error-prone methods such as crowdsourcing are used. Label aggregation methods aim to infer the true labels from noisy labels annotated by crowdsourcing workers via label statistics. Aggregated labels are the main data source to train deep neural networks, and their accuracy directly affects the deep neural network performance. In this paper, we argue that training the DNN and aggregating labels are not two separate tasks. Coupling DNN training with label aggregation connects data features, noisy labels, and aggregated labels. Since each image contains valuable knowledge about its label, the data features help aggregation methods enhance their performance. We propose LABNET, an iterative two-step method. Step one: the label aggregation algorithm provides labels to train the DNN. Step two: the DNN shares a representation of the data features with the label aggregation algorithm. These steps are repeated until the label aggregation error rate converges. To evaluate LABNET we conduct an extensive empirical comparison on CIFAR-10 and CIFAR-100 under different noise and worker statistics. Our evaluation results show that LABNET achieves the highest mean accuracy with an increase of at least 8% to 0.6% and the lowest error rate with a reduction of 7.5% to 0.25% against existing aggregation and training methods in most cases.

    @inproceedings{ghiassi/iccart22,
    abstract = {Today, to label the massive datasets needed to train Deep Neural Networks (DNNs), cheap and error-prone methods such as crowdsourcing are used. Label aggregation methods aim to infer the true labels from noisy labels annotated by crowdsourcing workers via labels statistics features. Aggregated labels are the main data source to train deep neural networks, and their accuracy directly affects the deep neural network performance. In this paper, we argue that training DNN and aggregating labels are not two separate tasks. Incorporation between DNN training and label aggregation connects data features, noisy labels, and aggregated labels. Since each image contains valuable knowledge about its label, the data features help aggregation methods enhance their performance. We propose LABNET an iterative two-step method. Step one: the label aggregation algorithm provides labels to train the DNN. Step two: the DNN shares a representation of the data features with the label aggregation algorithm. These steps are repeated until the converging label aggregation error rate. To evaluate LABNET we conduct an extensive empirical comparison on CIFAR-10 and CIFAR-100 under different noise and worker statistics. Our evaluation results show that LABNET achieves the highest mean accuracy with an increase of at least 8% to 0.6% and lowest error rate with a reduction of 7.5% to 0.25% against existing aggregation and training methods in most cases.},
    author = {Amirmasoud Ghiassi and
    Robert Birke and
    Lydia Y. Chen},
    editor = {Ana Paula Rocha and
    Luc Steels and
    H. Jaap van den Herik},
    title = {{LABNET:} {A} Collaborative Method for {DNN} Training and Label Aggregation},
    booktitle = {14th International Conference on Agents and Artificial
    Intelligence ({ICAART})},
    pages = {56--66},
    publisher = {{SCITEPRESS}},
    year = {2022},
    url = {https://www.scitepress.org/Link.aspx?doi=10.5220/0010770400003116},
    doi = {10.5220/0010770400003116}
    }

2021

  • Z. Zhao, R. Birke, R. Han, B. Robu, S. Bouchenak, S. B. Mokhtar, and L. Y. Chen, “Enhancing robustness of on-line learning models on highly noisy data,” IEEE trans. dependable secur. comput., vol. 18, iss. 5, p. 2177–2192, 2021. doi:10.1109/TDSC.2021.3063947

    Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this article, we extend a two-layer on-line data selection framework, Robust Anomaly Detector (RAD), with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We learn on-line from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information on true labels for difficult data points. We specifically focus on three use cases: (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising the faces of 100 celebrities. Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, reaching up to 98.95 percent for IoT device attacks (i.e., +7%) and up to 85.03 percent for cloud task failures (i.e., +14%) under 40 percent label noise, and, for its extension, up to 77.51 percent for face recognition (i.e., +39%) under 30 percent label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.

    @article{ZhaoBHRBMC21,
    abstract = {Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this article, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. We specifically focus on three use cases, (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising 100 celebrities faces. Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, to reach up to 98.95 percent for IoT device attacks (i.e., +7%), up to 85.03 percent for cloud task failures (i.e., +14%) under 40 percent label noise, and for its extension, it can reach up to 77.51 percent for face recognition (i.e., +39%) under 30 percent label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.},
    author = {Zilong Zhao and Robert Birke and Rui Han and Bogdan Robu and Sara Bouchenak and Sonia Ben Mokhtar and Lydia Y. Chen},
    doi = {10.1109/TDSC.2021.3063947},
    journal = {{IEEE} Trans. Dependable Secur. Comput.},
    number = {5},
    pages = {2177--2192},
    title = {Enhancing Robustness of On-Line Learning Models on Highly Noisy Data},
    url = {https://doi.org/10.1109/TDSC.2021.3063947},
    volume = {18},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/TDSC.2021.3063947}
    }

  • R. Birke, J. F. Pérez, Z. Qiu, M. Björkqvist, and L. Y. Chen, “sPARE: partial replication for multi-tier applications in the cloud,” IEEE trans. serv. comput., vol. 14, iss. 2, p. 574–588, 2021. doi:10.1109/TSC.2017.2780845

    Offering consistent low latency remains a key challenge for distributed applications, especially when deployed on the cloud where virtual machines (VMs) suffer from capacity variability caused by co-located tenants. Replicating redundant requests was shown to be an effective mechanism to defend application performance from high capacity variability. While the prior art centers on single-tier systems, it still remains an open question how to design replication strategies for distributed multi-tier systems. In this paper, we design a first of its kind PArtial REplication system, sPARE, that replicates and dispatches read-only workloads for distributed multi-tier web applications. The two key components of sPARE are (i) the variability-aware replicator that coordinates the replication levels on all tiers via an iterative searching algorithm, and (ii) the replication-aware arbiter that uses a novel token-based arbitration algorithm (TAD) to dispatch requests in each tier. We evaluate sPARE on web serving and searching applications, i.e., MediaWiki and Solr, the former deployed on our private cloud and the latter on Amazon EC2. Our results based on various interference patterns and traffic loads show that sPARE is able to improve the tail latency of MediaWiki and Solr by a factor of almost 2.7x and 2.9x, respectively.

    @article{BirkePQBC21,
    abstract = {Offering consistent low latency remains a key challenge for distributed applications, especially when deployed on the cloud where virtual machines (VMs) suffer from capacity variability caused by co-located tenants. Replicating redundant requests was shown to be an effective mechanism to defend application performance from high capacity variability. While the prior art centers on single-tier systems, it still remains an open question how to design replication strategies for distributed multi-tier systems. In this paper, we design a first of its kind PArtial REplication system, sPARE, that replicates and dispatches read-only workloads for distributed multi-tier web applications. The two key components of sPARE are (i) the variability-aware replicator that coordinates the replication levels on all tiers via an iterative searching algorithm, and (ii) the replication-aware arbiter that uses a novel token-based arbitration algorithm (TAD) to dispatch requests in each tier. We evaluate sPARE on web serving and searching applications, i.e., MediaWiki and Solr, the former deployed on our private cloud and the latter on Amazon EC2. Our results based on various interference patterns and traffic loads show that sPARE is able to improve the tail latency of MediaWiki and Solr by a factor of almost 2.7x and 2.9x, respectively.},
    author = {Robert Birke and Juan F. P{\'{e}}rez and Zhan Qiu and Mathias Bj{\"{o}}rkqvist and Lydia Y. Chen},
    doi = {10.1109/TSC.2017.2780845},
    journal = {{IEEE} Trans. Serv. Comput.},
    number = {2},
    pages = {574--588},
    title = {sPARE: Partial Replication for Multi-Tier Applications in the Cloud},
    url = {https://doi.org/10.1109/TSC.2017.2780845},
    volume = {14},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/TSC.2017.2780845}
    }

  • Z. Zhao, A. Kunar, R. Birke, and L. Y. Chen, “CTAB-GAN: effective table data synthesizing,” in Proceedings of the 13th asian conference on machine learning, 2021, p. 97–112.

    While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., the European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an alternative to enable data sharing while fulfilling regulatory and privacy constraints. The state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GAN) and address the two main data types in industry, i.e., continuous and categorical. In this paper, we develop CTAB-GAN, a novel conditional table GAN architecture that can effectively model diverse data types, including a mix of continuous and categorical variables. Moreover, we address data imbalance and long-tail issues, i.e., certain variables have drastic frequency differences across large values. To achieve those aims, we first introduce the information loss, classification loss and generator loss to the conditional GAN. Secondly, we design a novel conditional vector, which efficiently encodes the mixed data types and skewed distributions of data variables. We extensively evaluate CTAB-GAN against state-of-the-art GANs that generate synthetic tables, in terms of data similarity and analysis utility. The results on five datasets show that the synthetic data of CTAB-GAN remarkably resembles the real data for all three types of variables and results in higher accuracy for five machine learning algorithms, by up to 17%.

    @inproceedings{pmlr-v157-zhao21a,
    abstract = {While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an alternative to enable data sharing while fulfilling regulatory and privacy constraints. The state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GAN) and address two main data types in industry, i.e., continuous and categorical. In this paper, we develop CTAB-GAN, a novel conditional table GAN architecture that can effectively model diverse data types, including a mix of continuous and categorical variables. Moreover, we address data imbalance and long tail issues, i.e., certain variables have drastic frequency differences across large values. To achieve those aims, we first introduce the information loss, classification loss and generator loss to the conditional GAN. Secondly, we design a novel conditional vector, which efficiently encodes the mixed data type and skewed distribution of data variable. We extensively evaluate CTAB-GAN with the state of the art GANs that generate synthetic tables, in terms of data similarity and analysis utility. The results on five datasets show that the synthetic data of CTAB-GAN remarkably resembles the real data for all three types of variables and results into higher accuracy for five machine learning algorithms, by up to 17%.},
    author = {Zhao, Zilong and Kunar, Aditya and Birke, Robert and Chen, Lydia Y.},
    booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
    editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
    month = {17--19 Nov},
    pages = {97--112},
    pdf = {https://proceedings.mlr.press/v157/zhao21a/zhao21a.pdf},
    publisher = {PMLR},
    series = {Proceedings of Machine Learning Research},
    title = {CTAB-GAN: Effective Table Data Synthesizing},
    url = {https://proceedings.mlr.press/v157/zhao21a.html},
    volume = {157},
    year = {2021},
    bdsk-url-1 = {https://proceedings.mlr.press/v157/zhao21a.html}
    }

  • T. Younesian, Z. Zhao, A. Ghiassi, R. Birke, and L. Y. Chen, “QActor: active learning on noisy labels,” in Proceedings of the 13th asian conference on machine learning, 2021, p. 548–563.

    Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media by non-experts. Actively querying experts is conventionally adopted to provide labels for the informative samples, in place of their possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure, CENT, which considers both cross-entropy and entropy to select informative and noisy labels for expert cleansing. QActor iteratively cleans samples via quality models and actively queries an expert on those noisy yet informative samples. To adapt to the learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss of each learning iteration. We extensively evaluate different image datasets with noisy label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle. (A minimal sketch of a CENT-style selection score follows the BibTeX entry below.)

    @inproceedings{pmlr-v157-younesian21a,
    abstract = {Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don't have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.},
    author = {Younesian, Taraneh and Zhao, Zilong and Ghiassi, Amirmasoud and Birke, Robert and Chen, Lydia Y},
    booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
    editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
    month = {17--19 Nov},
    pages = {548--563},
    pdf = {https://proceedings.mlr.press/v157/younesian21a/younesian21a.pdf},
    publisher = {PMLR},
    series = {Proceedings of Machine Learning Research},
    title = {{QActor}: Active Learning on Noisy Labels},
    url = {https://proceedings.mlr.press/v157/younesian21a.html},
    volume = {157},
    year = {2021},
    bdsk-url-1 = {https://proceedings.mlr.press/v157/younesian21a.html}
    }
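
    A minimal sketch of a CENT-style selection score, as referenced in the abstract above. Here CENT is taken as the sum of the cross entropy with respect to the given (possibly noisy) label and the predictive entropy, so that samples the model both disagrees with and is uncertain about rank highest; the exact combination and the dynamic query limit used by QActor may differ, and all names are illustrative.

    import numpy as np

    def cent_scores(probs, noisy_labels, eps=1e-12):
        """probs: (N, C) model softmax outputs; noisy_labels: (N,) given labels."""
        n = probs.shape[0]
        cross_entropy = -np.log(probs[np.arange(n), noisy_labels] + eps)  # disagreement with the given label
        entropy = -np.sum(probs * np.log(probs + eps), axis=1)            # model uncertainty
        return cross_entropy + entropy

    def select_for_expert(probs, noisy_labels, budget):
        """Indices of the `budget` samples deemed most informative/noisy for oracle querying."""
        return np.argsort(-cent_scores(probs, noisy_labels))[:budget]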

  • A. Ghiassi, R. Birke, and L. Y. Chen, “TrustNet: learning from trusted data against (a)symmetric label noise,” in 8th IEEE/ACM international conference on big data computing, applications and technologies (BDCAT), 2021, p. 52–62. doi:10.1145/3492324.3494166

    Big Data systems allow collecting massive datasets to feed data-hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging, and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. The related works on resilient deep networks tend to focus on a limited set of synthetic noise patterns, and with disparate views on their impacts, e.g., robustness against symmetric vs. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy to any given noise pattern. Based on the insights, we design TrustNet, which first learns the pattern of noise corruption, be it symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the labels inferred from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10 and CIFAR-100 and on big real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods, demonstrating the strong robustness of TrustNet under a diverse set of noise patterns. (An illustrative sketch of the trusted-data-weighted loss follows the BibTeX entry below.)

    @inproceedings{bdcat-ghiassi21,
    abstract = {Big Data systems allow collecting massive datasets to feed the data hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. The related works on resilient deep networks tend to focus on a limited set of synthetic noise patterns, and with disparate views on their impacts, e.g., robustness against symmetric v.s. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy for any given noise patterns. Based on the insights, we design TrustNet that first learns the pattern of noise corruption, being it both symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the inferred labels from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10, CIFAR-100 and big real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.},
    author = {Amirmasoud Ghiassi and Robert Birke and Lydia Y. Chen},
    booktitle = {8th {IEEE/ACM} International Conference on Big Data Computing, Applications and Technologies ({BDCAT})},
    doi = {10.1145/3492324.3494166},
    month = {6--9 Dec},
    pages = {52--62},
    publisher = {{ACM}},
    title = {{TrustNet}: Learning from Trusted Data Against (A)symmetric Label Noise},
    url = {https://doi.org/10.1145/3492324.3494166},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1145/3492324.3494166}
    }
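
    An illustrative sketch of the trusted-data-weighted loss mentioned in the abstract above: the given noisy label is mixed with a soft label inferred from the noise pattern estimated on trusted data, and the mixing weight stands in for the uncertainty-driven weight that TrustNet adjusts across epochs. The inference rule (uniform class prior) and all names are assumptions, not the paper's implementation.

    import numpy as np

    def trust_weighted_loss(probs, noisy_labels, T, trust, eps=1e-12):
        """probs: (N, C) softmax outputs; noisy_labels: (N,); T: (C, C) estimated noise
        matrix with T[i, j] = P(noisy=j | clean=i); trust in [0, 1]."""
        n, c = probs.shape
        one_hot = np.eye(c)[noisy_labels]
        # Soft clean label inferred from the noise pattern: P(clean=i | noisy=y) is taken
        # proportional to T[i, y] under a uniform class prior.
        inferred = T[:, noisy_labels].T
        inferred = inferred / (inferred.sum(axis=1, keepdims=True) + eps)
        target = trust * one_hot + (1.0 - trust) * inferred
        return float(-(target * np.log(probs + eps)).sum(axis=1).mean())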

  • G. Albanese, R. Birke, G. Giannopoulou, S. Schönborn, and T. Sivanthi, “Evaluation of networking options for containerized deployment of real-time applications,” in 26th IEEE international conference on emerging technologies and factory automation (ETFA), 2021, p. 1–8. doi:10.1109/ETFA45728.2021.9613320

    Enterprises in the field of industrial automation experience an increasing demand for providing virtualized software solutions. Inspired by the recent trends in serverless and cloud computing, software virtualization is considered even for safety-critical applications with hard real-time requirements, as a means of avoiding hardware vendor lock-in and reducing volume and maintenance cost of devices. In this work, we evaluate the applicability of OS-level virtualization to an industrial automation use case. Our application runs in Docker containers on top of Linux patched with PREEMPT_RT. We investigate the ability of Docker coupled with diverse networking technologies to fulfill the latency requirements of the application under normal or heavy system load. We empirically compare four networking technologies with respect to communication latency and frequency of missing packets. The results indicate that Docker with certain technologies, such as the Single Root I/O Virtualization interface, performs robustly even under heavy load, enabling sufficient performance isolation and low overhead that does not jeopardise the real-time performance of our application.

    @inproceedings{etfa-albanese21,
    abstract = {Enterprises in the field of industrial automation experience an increasing demand for providing virtualized software solutions. Inspired by the recent trends in serverless and cloud computing, software virtualization is considered even for safety-critical applications with hard real-time requirements, as a means of avoiding hardware vendor lock-in and reducing volume and maintenance cost of devices. In this work, we evaluate the applicability of OS-level virtualization to an industrial automation use case. Our application runs in Docker containers on top of Linux patched with PREEMPT_RT. We investigate the ability of Docker coupled with diverse networking technologies to fulfill the latency requirements of the application under normal or heavy system load. We empirically compare four networking technologies with respect to communication latency and frequency of missing packets. The results indicate that Docker with certain technologies, such as the Single Root I/O Virtualization interface, performs robustly even under heavy load, enabling sufficient performance isolation and low overhead that does not jeopardise the real-time performance of our application.},
    author = {Giuliano Albanese and Robert Birke and Georgia Giannopoulou and Sandro Sch{\"{o}}nborn and Thanikesavan Sivanthi},
    booktitle = {26th {IEEE} International Conference on Emerging Technologies and Factory Automation ({ETFA})},
    doi = {10.1109/ETFA45728.2021.9613320},
    month = {7--10 Sep},
    pages = {1--8},
    publisher = {{IEEE}},
    title = {Evaluation of Networking Options for Containerized Deployment of Real-Time Applications},
    url = {https://doi.org/10.1109/ETFA45728.2021.9613320},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/ETFA45728.2021.9613320}
    }

  • A. Ghiassi, R. Birke, R. Han, and L. Y. Chen, “LABELNET: recovering noisy labels,” in International joint conference on neural networks (IJCNN), 2021, p. 1–8. doi:10.1109/IJCNN52387.2021.9533562

    Today's available datasets in the wild, e.g., from social media and open platforms, present tremendous opportunities and challenges for deep learning, as there is a significant portion of tagged images, but often with noisy, i.e. erroneous, labels. Recent studies improve the robustness of deep models against noisy labels without the knowledge of true labels. In this paper, we advocate deriving a stronger classifier which proactively makes use of the noisy labels in addition to the original images, turning noisy labels into learning features. To such an end, we propose a novel framework, LABELNET, composed of Amateur and Expert, which iteratively learn from each other. Amateur is a regular image classifier trained by the feedback of Expert, which imitates how human experts would correct the predicted labels from Amateur using the noise pattern learnt from the knowledge of both the noisy and ground truth labels. The trained Amateur and Expert proactively leverage the images and their noisy labels to infer image classes. Our empirical evaluations on noisy versions of MNIST, CIFAR-10, CIFAR-100 and real-world data of Clothing1M show that the proposed model can achieve robust classification against a wide range of noise ratios and with as little as 20-50% of the training data, compared to state-of-the-art deep models that solely focus on distilling the impact of noisy labels.

    @inproceedings{ijcnn-ghiassi21,
    abstract = {Today's available datasets in the wild, e.g., from social media and open platforms, present tremendous opportunities and challenges for deep learning, as there is a significant portion of tagged images, but often with noisy, i.e. erroneous, labels. Recent studies improve the robustness of deep models against noisy labels without the knowledge of true labels. In this paper, we advocate to derive a stronger classifier which proactively makes use of the noisy labels in addition to the original images - turning noisy labels into learning features. To such an end, we propose a novel framework, LABELNET, composed of Amateur and Expert, which iteratively learn from each other. Amateur is a regular image classifier trained by the feedback of Expert, which imitates how human experts would correct the predicted labels from Amateur using the noise pattern learnt from the knowledge of both the noisy and ground truth labels. The trained Amateur and Expert proactively leverage the images and their noisy labels to infer image classes. Our empirical evaluations on noisy versions of MNIST, CIFAR-10, CIFAR-100 and real-world data of Clothing1M show that the proposed model can achieve robust classification against a wide range of noise ratios and with as little as 20-50% training data, compared to state-of-the-art deep models that solely focus on distilling the impact of noisy labels.},
    author = {Amirmasoud Ghiassi and Robert Birke and Rui Han and Lydia Y. Chen},
    booktitle = {International Joint Conference on Neural Networks ({IJCNN})},
    doi = {10.1109/IJCNN52387.2021.9533562},
    month = {18--22 Jul},
    pages = {1--8},
    publisher = {{IEEE}},
    title = {{LABELNET:} Recovering Noisy Labels},
    url = {https://doi.org/10.1109/IJCNN52387.2021.9533562},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/IJCNN52387.2021.9533562}
    }

  • B. Cox, J. Galjaard, A. Ghiassi, R. Birke, and L. Y. Chen, “Masa: responsive multi-DNN inference on the edge,” in 19th IEEE international conference on pervasive computing and communications (PerCom), 2021, p. 1–10. doi:10.1109/PERCOM50583.2021.9439111

    Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., identifying objects, faces, and genders from images. The response times of multi-DNN executions highly affect users' quality of experience as well as safety. Different DNNs exhibit diversified resource requirements and execution patterns across layers and networks, which may easily exceed the available device memory and riskily degrade responsiveness. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution framework, an on-device middleware featuring modeling of inter- and intra-network dependency and leveraging of the complementary memory usage of each layer. Masa can consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. We extensively evaluate Masa on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to state-of-the-art multi-DNN scheduling solutions.

    @inproceedings{percom-cox21a,
    abstract = {Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real time image-based analysis. Increasingly, multi-faced knowledge is extracted via executing multiple DNNs inference models, e.g., identifying objects, faces, and genders from images. The response times of multi-DNN highly affect users' quality of experience and safety as well. Different DNNs exhibit diversified resource requirements and execution patterns across layers and networks, which may easily exceed the available device memory and riskily degrade the responsiveness. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution framework, an on-device middleware featuring on modeling inter- and intra-network dependency and leveraging complimentary memory usage of each layer. Masa can consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. We extensively evaluate Masa on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to the state of the art multi-DNN scheduling solutions.},
    author = {Bart Cox and Jeroen Galjaard and Amirmasoud Ghiassi and Robert Birke and Lydia Y. Chen},
    booktitle = {19th {IEEE} International Conference on Pervasive Computing and Communications ({PerCom})},
    doi = {10.1109/PERCOM50583.2021.9439111},
    month = {22--26 Mar},
    pages = {1--10},
    publisher = {{IEEE}},
    title = {Masa: Responsive Multi-DNN Inference on the Edge},
    url = {https://doi.org/10.1109/PERCOM50583.2021.9439111},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/PERCOM50583.2021.9439111}
    }

  • J. Galjaard, B. Cox, A. Ghiassi, L. Y. Chen, and R. Birke, “MemA: fast inference of multiple deep models,” in 19th IEEE international conference on pervasive computing and communications workshops and other affiliated events, 2021, p. 281–286. doi:10.1109/PerComWorkshops51409.2021.9430952

    The execution of deep neural network (DNN) inference jobs on edge devices has become increasingly popular. Multiple such inference models can concurrently analyse the on-device data, e.g. images, to extract valuable insights. Prior art focuses on low-power accelerators, compressed neural network architectures, and specialized frameworks to reduce the execution time of single inference jobs on resource-constrained edge devices. However, little is known about how different scheduling policies can further improve the runtime performance of multi-inference jobs without additional edge resources. To enable the exploration of scheduling policies, we first develop an execution framework, EdgeCaffe, which splits DNN inference jobs into the loading and execution of each network layer. We empirically characterize the impact of loading and scheduling policies on the execution time of multi-inference jobs and point out their dependency on the available memory space. We propose a novel memory-aware scheduling policy, MemA, which opportunistically interleaves the executions of different types of DNN layers based on their estimated run-time memory demands. Our evaluation on exhaustive combinations of five networks, data inputs, and memory configurations shows that MemA can alleviate the degradation of execution times of multi-inference jobs (up to 5×) under severely constrained memory, compared to standard scheduling policies, without affecting accuracy. (A toy sketch of a memory-aware interleaving policy follows the BibTeX entry below.)

    @inproceedings{percom-galjaard21,
    abstract = {The execution of deep neural network (DNN) inference jobs on edge devices has become increasingly popular. Multiple of such inference models can concurrently analyse the on-device data, e.g. images, to extract valuable insights. Prior art focuses on low-power accelerators, compressed neural network architectures, and specialized frameworks to reduce execution time of single inference jobs on edge devices which are resource constrained. However, it is little known how different scheduling policies can further improve the runtime performance of multi-inference jobs without additional edge resources. To enable the exploration of scheduling policies, we first develop an execution framework, EdgeCaffe, which splits the DNN inference jobs by loading and execution of each network layer. We empirically characterize the impact of loading and scheduling policies on the execution time of multi-inference jobs and point out their dependency on the available memory space. We propose a novel memory-aware scheduling policy, MemA, which opportunistically interleaves the executions of different types of DNN layers based on their estimated run-time memory demands. Our evaluation on exhaustive combinations of five networks, data inputs, and memory configurations show that MemA can alleviate the degradation of execution times of multi-inference (up to 5×) under severely constrained memory compared to standard scheduling policies without affecting accuracy.},
    author = {Jeroen Galjaard and Bart Cox and Amirmasoud Ghiassi and Lydia Y. Chen and Robert Birke},
    booktitle = {19th {IEEE} International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events},
    doi = {10.1109/PerComWorkshops51409.2021.9430952},
    month = {22--26 Mar},
    pages = {281--286},
    publisher = {{IEEE}},
    title = {{MemA}: Fast Inference of Multiple Deep Models},
    url = {https://doi.org/10.1109/PerComWorkshops51409.2021.9430952},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1109/PerComWorkshops51409.2021.9430952}
    }
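
    A toy sketch of a memory-aware interleaving policy in the spirit of MemA, as referenced in the abstract above: each DNN is a queue of layers with estimated memory demands, and at every step the scheduler runs the head layer with the largest demand that still fits the budget, letting smaller layers of other networks fill the gaps. The greedy policy, the memory model and all names are illustrative assumptions, not the paper's implementation.

    from collections import deque

    def memory_aware_schedule(jobs, mem_budget):
        """jobs: list of [(layer_name, est_mem), ...] per DNN; returns an execution order."""
        queues = [deque(layers) for layers in jobs]
        order = []
        while any(queues):
            runnable = [(q[0][1], i) for i, q in enumerate(queues) if q and q[0][1] <= mem_budget]
            if not runnable:
                raise RuntimeError("no head layer fits the memory budget")
            _, i = max(runnable)                  # largest memory demand that still fits
            name, mem = queues[i].popleft()
            order.append((i, name, mem))          # load, execute, then release this layer's memory
        return order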

  • C. Hong, A. Ghiassi, Y. Zhou, R. Birke, and L. Y. Chen, “Online label aggregation: A variational bayesian approach,” in WWW ’21: the web conference 2021, 2021, p. 1904–1915. doi:10.1145/3442381.3449933

    Noisy labeled data is more a norm than a rarity for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure time relevance and overcome the slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution via subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively.

    @inproceedings{www-hong21,
    abstract = {Noisy labeled data is more a norm than a rarity for crowd sourced contents. It is effective to distill noise and infer correct labels through aggregating results from crowd workers. To ensure the time relevance and overcome slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer true label distribution via subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA , which employs variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible to accommodate any generating distribution of labels by the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 to 1.5 percent points for synthetic and real-world datasets, respectively.},
    author = {Chi Hong and Amirmasoud Ghiassi and Yichi Zhou and Robert Birke and Lydia Y. Chen},
    booktitle = {{WWW} '21: The Web Conference 2021},
    doi = {10.1145/3442381.3449933},
    editor = {Jure Leskovec and Marko Grobelnik and Marc Najork and Jie Tang and Leila Zia},
    month = {19--23 Apr},
    pages = {1904--1915},
    publisher = {{ACM} / {IW3C2}},
    title = {Online Label Aggregation: {A} Variational Bayesian Approach},
    url = {https://doi.org/10.1145/3442381.3449933},
    year = {2021},
    bdsk-url-1 = {https://doi.org/10.1145/3442381.3449933}
    }