Research programmes

DC-GNN: Deep Clustering with Graph Neural Networks for real-world data

Thanks to technological progress, huge amounts of images, texts, and other types of data are now available. Labelling them, however, is often expensive, tedious, or requires expert knowledge. Hence, there is a real need to develop unsupervised learning (or clustering) techniques. Deep clustering [1] can contribute to better understanding, retrieval, visualization, and organization of big data, in addition to being an important component of complex decision-making systems. For instance, the medical community has shown increasing interest in using clustering techniques to diagnose disorders (breast cancer [2], Parkinson's and Alzheimer's diseases, heart disease and diabetes, epilepsy [3]).

In this project, we aim to develop new deep clustering techniques based on Graph Neural Networks (GNN) [4,5,6] to improve patient care and therapy. We will focus on applying our algorithms to clinical data of two kinds. The first is epilepsy datasets already provided by the neuropediatric department of the CHU of Angers and used in our current and previous research (theses of Mohamad Jomaa, Gaëlle Milon-Hranois and Tala Abdallah on the automatic detection of biomarkers of childhood epilepsy). The second is oncological datasets provided by the ICO of Angers and explored in our current research (thesis of Elena Spirina on predicting clinical outcomes of ovarian cancer).

From a theoretical point of view, we seek to develop new mechanisms for direct clustering with Graph Neural Networks. The first goal is to develop GNN blocks that process data and graphs together. The second is to design a suitable loss function so that clustering is performed in one step. Estimating the cluster index matrix directly makes the solution harder to develop, because this matrix may be subject to multiple constraints. The third goal is to estimate the graph jointly with the GNN architecture. A further extension of these objectives is to handle data with multiple descriptors, in other words, to address multi-view clustering. In this case, both the architecture and the losses must be redesigned to obtain a deep learning model that takes into account the complementarity of the different views. To this end, we can consider fusion at the feature level (early fusion) or at the data representation level (late fusion). This work will be done in collaboration with Université du Pays Basque.
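To make the first two goals concrete, here is a minimal NumPy sketch of one-step clustering with a GNN. It is an illustration under stated assumptions, not the method the project will develop: the two-layer graph convolution follows the propagation rule of [5], and the loss is a swapped-in, well-known relaxation (a normalized-cut term plus an orthogonality penalty). The function names and the weight matrices `W1` and `W2` are hypothetical. The softmax enforces one constraint on the cluster index matrix (each row sums to one), while the penalty discourages the degenerate solution in which every node is spread evenly over all clusters.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    """Row-wise softmax, shifted for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def soft_assignments(A, X, W1, W2):
    """Two graph-convolution layers mapping features X (n x d) and graph A (n x n)
    to a soft cluster index matrix S (n x k) whose rows sum to one."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)  # ReLU hidden representation
    return softmax(A_norm @ H @ W2)

def clustering_loss(A, S):
    """Assumed loss: relaxed normalized cut plus an orthogonality penalty on S."""
    D = np.diag(A.sum(axis=1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    k = S.shape[1]
    StS = S.T @ S
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(k) / np.sqrt(k))
    return cut + ortho
```

In a trainable version, `W1` and `W2` would be optimized by gradient descent on `clustering_loss`, so that the cluster assignments are obtained directly from the network output rather than from a separate k-means step.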

As a first application, we will focus on epilepsy, which is the second most frequent major neurological disorder in humans after stroke. According to the World Health Organization, the disorder affects 50 million individuals globally. Unfortunately, 30 percent of people with epilepsy continue to have unpredictable recurrent seizures. Clinicians typically diagnose epilepsy and localize seizures by visually annotating electroencephalogram (EEG) signals, which is very time-consuming. For this reason, the development of unsupervised techniques that can detect seizure occurrences is crucial. EEGs can be viewed as structural time series, because they are multivariate signals recorded over electrodes placed on the scalp. These signals, recorded all over the scalp, provide prior information about the structure of interactions among brain regions (spatial information). Commonly used deep learning models for time series offer no way to leverage this structural information, although a suitable model for structural time series should exploit it. To address this challenge, we will develop new GNN clustering techniques that represent the spatiotemporal dependencies in EEGs by capturing the electrode geometry or the dynamic brain connectivity, and we will propose a quantitative interpretability method able to localize seizures within EEGs.
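The two graph constructions mentioned above (electrode geometry and dynamic connectivity) can be sketched as follows. This is only an illustration: the Gaussian kernel, the threshold, and the use of absolute Pearson correlation as a connectivity measure are assumptions, and the function names are hypothetical. The input `eeg` is assumed to have shape (channels, samples), with no constant channel inside a window (otherwise the correlation is undefined).

```python
import numpy as np

def geometry_graph(coords, sigma=1.0, threshold=0.1):
    """Static adjacency from electrode positions (n x 3 or n x 2):
    Gaussian kernel on pairwise distances, small weights pruned."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    A = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    A[A < threshold] = 0.0
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def connectivity_graphs(eeg, window, step):
    """Dynamic adjacency: one graph per sliding window, weighted by the
    absolute Pearson correlation between channels in that window."""
    graphs = []
    for start in range(0, eeg.shape[1] - window + 1, step):
        C = np.abs(np.corrcoef(eeg[:, start:start + window]))
        np.fill_diagonal(C, 0.0)  # no self-loops
        graphs.append(C)
    return np.stack(graphs)  # shape: (n_windows, n_channels, n_channels)
```

The geometry graph is fixed for a given electrode montage, whereas the connectivity graphs change over time and could therefore reflect changes in brain dynamics around a seizure.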

Similarly, Pan-Gyn cancers account for 1 in 5 cancer cases worldwide, breast cancer being the most commonly diagnosed and responsible for most cancer deaths in women. The high incidence and mortality of these malignancies, together with the limitations of taxanes (first-line treatments), make the identification of prognostic cancer signatures urgent. We will use deep clustering to analyze biopsy image patches. Learning over patch-wise features with non-graph deep learning models cannot capture global contextual information or comprehensively model tissue composition. Yet the phenotypical and topological distribution of the constituent histological entities plays a critical role in tissue diagnosis. Graph data representations and GNNs can therefore significantly improve the encoding of tissue representations and the capture of intra- and inter-entity interactions.
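As a rough illustration of how patch-wise features can gain spatial context, the sketch below links each histological entity (or patch) to its nearest neighbours in the image and averages the neighbours' features. The k-nearest-neighbour rule, the mean aggregation, and the function names are assumptions chosen for simplicity; `centroids` holds hypothetical entity positions and `features` the corresponding patch-wise descriptors.

```python
import numpy as np

def knn_graph(centroids, k):
    """Symmetric k-nearest-neighbour adjacency over entity centroids (n x 2)."""
    n = centroids.shape[0]
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour
    neighbors = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selected it

def aggregate_context(A, features):
    """One message-passing step: mean of the neighbours' features per node.
    Every node has at least one neighbour when k >= 1."""
    degree = A.sum(axis=1, keepdims=True)
    return (A @ features) / degree
```

Stacking several such aggregation steps lets information travel beyond immediate neighbours, which is how a GNN can encode tissue-level context that isolated patches lack.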

Another possible application of GNNs is the design of an intelligent transport system that accurately predicts traffic speed, volume, or density in road networks. We will model the traffic network as a spatio-temporal graph in which the sensors installed on the roads are the nodes and the edges are weighted by the distance between pairs of sensors. Each node carries a dynamic input feature, namely the average traffic speed over a time frame. We can use widely available open data (see [7] for various examples) to validate our algorithms. Our real hope is to apply our algorithms to the Angers municipality dataset, so as to contribute to improving the organization of our city and the daily life of its citizens. This application will link the OSPL (graph theory, optimization) and OAV (data analysis, machine learning) members of the MAI team.
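The traffic graph described above can be sketched in a few lines. The Gaussian kernel scaled by the standard deviation of the distances, the pruning threshold, and the function names are assumptions made for illustration; `dist` is a hypothetical matrix of pairwise sensor distances and `speeds` a hypothetical array of raw speed readings with shape (sensors, readings).

```python
import numpy as np

def sensor_adjacency(dist, threshold=0.1):
    """Edge weights from pairwise sensor distances: closer sensors get larger
    weights, and weights below the threshold are pruned."""
    sigma = dist.std()  # assumed kernel scale
    W = np.exp(-((dist / sigma) ** 2))
    W[W < threshold] = 0.0
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def frame_features(speeds, frame_len):
    """Average speed per sensor and per time frame.
    Returns an array of shape (n_frames, n_sensors); readings that do not
    fill a complete final frame are dropped."""
    n_sensors, n_readings = speeds.shape
    n_frames = n_readings // frame_len
    trimmed = speeds[:, :n_frames * frame_len]
    return trimmed.reshape(n_sensors, n_frames, frame_len).mean(axis=2).T
```

Each row of the returned feature array, paired with the fixed adjacency matrix, gives one snapshot of the spatio-temporal graph that a forecasting or clustering model would consume.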

Finally, a new line of research emerged recently during the workshop of the Faculty of Sciences at UCO Bretagne Nord. The idea is to select the best receptors of seaweed molecules (which can be represented as graphs) that have beneficial effects on skin cells. This project may lead to new national collaborations with the E-COS team of UCO Bretagne Nord.


[1] X. Zhan, J. Xie, Z. Liu, Y. Ong and C. Loy. Online Deep Clustering for Unsupervised Representation Learning. IEEE CVPR, 2020.

[2] C.-H. Chen. A hybrid intelligent model of analyzing clinical breast cancer data using clustering techniques with feature selection. Applied Soft Computing, volume 20, pp. 4-14, 2014. doi: 10.1016/j.asoc.2013.10.024

[3] T. Wen and Z. Zhang. Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals. IEEE Access, volume 6, pp. 25399-25410, 2018.

[4] X. Zhang, H. Liu, X. Wu, X. Zhang and X. Liu. Spectral Embedding Network for Attributed Graph Clustering. Neural Networks, 2021.

[5] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations, 2017.

[6] M. T. Kejani, F. Dornaika and H. Talebi. Graph Convolution Networks with Manifold Regularization for Semi-Supervised Learning. Neural Networks, volume 127, pp. 160-167, 2020.

[7] L. Waikhom and R. Patgiri. Graph Neural Networks: Methods, Applications, and Opportunities. arXiv preprint arXiv:2108.10733, 2021.


Programme duration
01/10/2022 - 30/09/2025