PhD Theses – M2 Apprentissage et Algorithmes (M2A)

May 31, 2025

Université du Luxembourg

https://recruitment.uni.lu/fr/details.html?id=QMUFK026203F3VBQB7V7VV4S8&nPostingID=99918&nPostingTargetID=144562&mask=karriereseiten&lg=FR

May 9, 2025

The Chair of Risk, Safety and Uncertainty Quantification at ETH Zurich (Switzerland) is opening two PhD positions, fully funded for 4 years, as part of the ORACLES project (“Optimization, Reliability And CaLibration using Emulators of Stochastic computational models”) recently funded by the Swiss National Science Foundation. The two PhD positions are:

Reliability-Based Design Optimization using Stochastic Emulators (details here)
Hierarchical Bayesian Inference using Stochastic Emulators (details here)

If you’re motivated by challenging problems at the intersection of computational science, statistics, and engineering, and if you want to work in a vibrant, international research environment, we’d love to hear from you! More information: Prof. Bruno Sudret (sudret@ethz.ch). Applications ONLY through the above links.

May 6, 2025

Université Gustave Eiffel

As part of a project funded by the Institut des Mathématiques pour la Planète Terre (IMPT, https://impt.math.cnrs.fr/), we are seeking a candidate for a PhD thesis in mathematics applied to evolutionary ecology, on the topic “Matrices Aléatoires, Diversité et Coévolution” (Random Matrices, Diversity and Coevolution).
The full thesis description, including application instructions, is available here:
https://drive.google.com/file/d/1hanzHz6VHvQhBignEZeXOhktYUnLb60e/view?usp=sharing
We will begin reviewing applications on May 31, 2025.

May 6, 2025

Sorbonne Université

Two PhD thesis offers.

May 2, 2025

Université de Caen Normandie

Explainable AI for Graph Data Augmentation in Machine Learning
PhD position – starting fourth quarter 2025

Location :
GREYC laboratory, CNRS UMR 6072, Université de Caen Normandie, 14000 Caen,
France

Scientific context

Pandora
This thesis is financed within the Pandora project, funded by the French ANR
(National Research Agency) and underway since February 2025. Pandora is
situated in the context of explainable artificial intelligence (XAI) as
applied to graph neural networks (GNNs). The project focuses on the internal
functioning of GNNs, with the following objectives:
— characterize, understand and clearly explain the internal workings of
GNNs using pattern extraction techniques ;
— uncover statistically significant patterns of neural activation, called
“activation rules,” to determine how networks encode concepts [7, 8] ;
— translate these activation rules into graph patterns interpretable by a
user ;
— use this knowledge to improve GNNs by identifying learning biases,
generating additional data, and building explanatory systems.
The thesis will be concerned with the last of these objectives. The work
carried out in this project (and by extension in the thesis) will be
partially based on molecular data resulting from biochemical experiments,
obtained through our collaboration with the CERMN laboratory (Centre d’Études
et de Recherche sur le Médicament de Normandie), University of Caen Normandy.

Problem setting
In machine learning, we do not always have training data sets that are
sufficiently representative of the real world (for example,
chemical/biological experiments often focus only on certain well-explored
molecules or certain therapeutic targets). How can one detect that a
training data set is insufficient? Two possible indicators, among others:
— possible parts of the data space are not represented (e.g. some node/edge
combinations cannot be found).
— the learned model is unreliable in some subspaces of the data (the
reliability of a supervised model can be studied, for example, by looking
at the importance of instances in the construction of decision boundaries).
The literature contains methods to characterize data in a model-independent
manner [5] and methods to characterize the behavior of a model based on the
components of the individual graphs considered [9, 2, 6, 3, 4, 1]. However,
there is no approach that establishes the link between data and the
performance of a specific model. Furthermore, there exist no approaches for
augmenting the data as a means for improving model performance and
reliability. The thesis is intended to address these gaps.

Objectives
This thesis has three objectives. First, we want to characterize graph
datasets at a global level, in a way similar to what is already done for
vector datasets. Second, we want to design one (or more) approaches that
use explanations of the behavior of GNNs to identify relevant instances
of the training set. Finally, we want to leverage the results of the first
two points to generate additional data instances that improve the data set
and therefore render GNNs more accurate and more robust.

Topic and overview of the work plan of the thesis
In short, the thesis deals with the use of patterns learned from GNNs to
improve GNNs by identifying learning biases, generating additional data,
and building explanatory systems. More precisely, we wish to develop new
methods to improve the learning of graph models by relying on the analysis
of the internal functioning of these models via, for example, activation
rules expressed in the latent space. This will involve analyzing decision
boundaries and characterizing the errors of the model studied, in the data
space or in their latent representations, in order to propose corrective
solutions. This approach can be broken down into two sub-problems:

Data characterization and bias identification. The characterization of
training data can help identify instances on which the model commits errors,
and also detect whether the data themselves are a source of bias in learning.
One work direction is to study the complexity of activation rules and
compare them to domain knowledge.

Targeted generation of additional data. Once the model’s limitations have
been identified, we want to automatically define “corrective patches” to
improve the model’s robustness. A preferred area of work will be the
generation of targeted additional data to allow the model to better separate
the data according to the class studied in the constructed representation.

The first problem, data characterization, will start from the knowledge
developed in meta-learning for vector data, combined with existing work
on explaining GNN predictions and on activation rules.
The second problem poses relatively complex research questions since
realistic graph data with desired properties is rather hard to generate.
While a number of graph data generators exist in the literature, the
generated data have often been found to lack properties observed in
real-world data.

Preliminary work plan

1. Conduct a literature review of methods for explaining the behavior of
GNN models [9, 2, 8, 7, 6, 3, 4, 1]. The aim of this study is to establish
in what sense the different methods identify certain aspects of the data
used to train the model.
2. Design and implement approaches to identify the instances (graphs)
associated with the explanatory descriptors/rules. It is not certain that
such approaches will be found for all of them, which will then lead to a
selection of descriptors. Highlighting the instances and subgraphs linked
to the explanatory descriptors/rules will also make it possible to determine
how the descriptors characterize different subsets of data.
3. Develop a formalism to extend concepts defined for vector data (density,
decision boundaries, value distribution) to graph data. This formalism, in
combination with the results of step 2, will make it possible to determine
where learning instances are missing in a training dataset and thus where it
is useful to generate synthetic data.
4. Exploit the information derived from the first three points, as well as
other sources (for instance, graph patterns extracted using pattern mining
methods), to define constraints on symbolic data generators in order to
arrive at data with precise properties that fill the gaps in the data sets.
5. Evaluate the generated data in the context of project use cases,
particularly activity prediction on molecular data.

Keywords : Statistical learning, graph neural networks, explainable AI,
data mining.

Thesis period : Starting in autumn 2025

Remuneration : Approximately €2,200 gross per month.

Supervising team :
— Bruno Crémilleux (GREYC – Université de Caen Normandie).
— Marc Plantevit (LRE – EPITA)
— Albrecht Zimmermann (GREYC – Université de Caen Normandie).

Candidate profile
The candidate must be enrolled in the final year of a Master’s degree or an
engineering degree, or hold such a degree, in a field related to computer
science or applied mathematics, and have solid programming skills.
Experience in data science, deep learning, etc. would be a plus. The
candidate must be able to write scientific reports and communicate research
results at conferences in English.

To apply
Application period : from now until the position is filled.
Send the following documents (exclusively in PDF format) to
bruno.cremilleux@unicaen.fr, marc.plantevit@epita.fr and
albrecht.zimmermann@unicaen.fr :
— cover letter explaining your qualifications, experiences and motivation
for this subject ;
— curriculum vitae ;
— transcript of grades (if possible with ranking) of 3rd year of Bachelor’s
degree, 1st and 2nd year of Master’s degree or equivalent for engineering
schools ;
— if possible, names of people (teachers or other person) who can provide
information on your skills and your work ;
— a link to personal project repositories (e.g. GitHub) ;
— any other information you consider useful.

References
[1] C. Abrate, G. Preti, and F. Bonchi. Counterfactual explanations for graph classification through the lenses of density. In World Conference on Explainable Artificial Intelligence, pages 324–348. Springer, 2023.
[2] A. Duval and F. D. Malliaros. GraphSVX: Shapley value explanations for graph neural networks. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part II 21, pages 302–318. Springer, 2021.
[3] Q. Huang, M. Yamada, Y. Tian, D. Singh, and Y. Chang. GraphLIME: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7):6968–6972, 2022.
[4] A. Mastropietro, G. Pasculli, C. Feldmann, R. Rodríguez-Pérez, and J. Bajorath. EdgeSHAPer: Bond-centric Shapley value-based explanation method for graph neural networks. iScience, 25(10), 2022.
[5] M. A. Munoz, L. Villanova, D. Baatar, and K. Smith-Miles. Instance spaces for machine learning classification. Machine Learning, 107(1):109–147, 2018.
[6] A. Perotti, P. Bajardi, F. Bonchi, and A. Panisson. GraphSHAP: Explaining identity-aware graph classifiers through the language of motifs. arXiv preprint arXiv:2202.08815, 2022.
[7] L. Veyrin-Forrer, A. Kamal, S. Duffner, M. Plantevit, and C. Robardet. In pursuit of the hidden features of GNN’s internal representations. Data & Knowledge Engineering, 142:102097, 2022.
[8] L. Veyrin-Forrer, A. Kamal, S. Duffner, M. Plantevit, and C. Robardet. On GNN explainability with activation rules. Data Mining and Knowledge Discovery, pages 1–35, 2022.
[9] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji. On explainability of graph neural networks via subgraph explorations. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 12241–12252. PMLR, 18–24 Jul 2021.

April 24, 2025

INRIA Nancy

April 2, 2025

University of Liverpool

We are seeking a highly motivated individual for a fully funded PhD
studentship at the University of Liverpool, working on a cutting-edge
project titled “*Adaptive Robotic Chemists for Resilient Pharmaceuticals*”
as part of our Centre for Doctoral Training in Digital and Automated
Materials Chemistry.

This exciting interdisciplinary project sits at the intersection of *Robotics,
Artificial Intelligence, and Chemistry/Materials Science*. The goal is to
develop the next generation of autonomous robotic chemists for resilient
and efficient pharmaceutical manufacturing. Specifically, we will explore
how intelligent robotic systems can be designed and deployed for
dissolution testing using a human-robot collaborative approach, in
collaboration with our industrial partner Bristol Myers Squibb (a
multi-national pharma company).

*The Candidate:* We are looking for candidates with a strong background in
a relevant discipline such as:

– Chemistry or Materials Science (with strong computational
interest/skills)
– Computer Science or AI (with interest in scientific applications)
– Engineering (Chemical, Electrical, Mechatronic, or related fields)

Essential skills/interests include a passion for research, strong
problem-solving abilities, and enthusiasm for working across disciplines.
Programming skills (e.g., Python/C++) and an interest in robotics are
essential.

*The Environment:* You will join a vibrant, collaborative research
community at the University of Liverpool and the Materials Innovation
Factory <https://www.liverpool.ac.uk/materials-innovation-factory/>,
working alongside leading researchers in digital chemistry, robotics, and
AI.

*Funding:* This is a fully funded studentship open to UK Home, EU, and
international candidates.

*How to Apply:* For full project details, eligibility criteria, and the
online application process, please visit the formal advert:
https://www.liverpool.ac.uk/study/postgraduate-research/studentships/adaptive-robotic-chemists-for-resilient-pharmaceuticals/

*Application Deadline: 25th May 2025* – Early application is strongly
encouraged as we will close the post once a suitable candidate is found.

*Informal Enquiries:* For further information about the project, please
contact me at gabriella.pizzuto@liverpool.ac.uk.

We believe this project offers a unique opportunity to contribute to a
rapidly evolving field with significant real-world impact. We look forward
to receiving your applications.

Best regards,

Gabriella
*Dr Gabriella Pizzuto*
*Royal Academy of Engineering Research Fellow*
*Lecturer in Robotics and Chemistry Automation*
Department of Computer Science
Department of Chemistry
University of Liverpool
gabriella.pizzuto@liverpool.ac.uk

April 2, 2025

Italian Institute of Technology

PHD PROGRAM in TRANSLATIONAL NEUROSCIENCES AND NEUROTECHNOLOGIES

The Center for Translational Neurophysiology of Speech and Communication (CTNSC) at the Italian Institute of Technology (IIT), jointly with the University of Ferrara, is opening a number of PhD positions starting on November 1st, 2025.

Research areas:
– Improving performance and biocompatibility of electrode arrays for brain-computer interfaces
– Organic neuroelectronics for multimodal recordings and stimulation of the brain in vivo
– Hardware and software development for innovative exploration of brain signals
– Machine learning applications to multimodal brain and speech signals
– Investigation of sensorimotor functions in animal models
– Cortical recordings in human patients during awake neurosurgery
– Human non-invasive neurophysiology of speech and sensorimotor communication by means of TMS, EEG, EMG and MoCap

Who: physicists, computer scientists, biomedical/electrical engineers, biologists, biotechnologists, medical doctors and experimental psychologists eager to work in an international and multidisciplinary team.

Where: The CTNSC (https://www.iit.it/it/ctnsc-unife) is hosted by the University of Ferrara (UNIFE) in a prestigious historical building in the city center. Ferrara is a well-connected Renaissance city (30 min to Bologna, 40 min to Padua, 60 min to Venice; 2 nearby international airports), bustling with students (https://whc.unesco.org/en/list/733).

***Applications will open in June***

For early inquiries, please contact: alessandro.dausilio@iit.it