About Me

I am an AI/ML Researcher at Microsoft Gaming (King AI Labs), where I currently focus on the theoretical foundations of Transformer-based architectures and Large Language Models (LLMs), with an emphasis on understanding their inner mechanisms. In parallel, continuing the line of research from my PhD, I investigate the robustness and safety of deep learning systems, aiming to establish theoretical insights that support Responsible AI practices. I am also interested in Temporal Graph Neural Networks and their application to dynamic, real-world systems such as recommendation engines.
I am currently finalizing my PhD in the School of Electrical Engineering and Computer Science at the KTH Royal Institute of Technology in Sweden. My research is conducted under the supervision of Professor Michalis Vazirgiannis and Professor Henrik Boström, with funding from the Wallenberg AI, Autonomous Systems and Software Program (WASP).
My doctoral work explores the robustness of Graph Neural Networks (GNNs), with a particular emphasis on adversarial attacks: analyzing and developing attack strategies, and designing efficient, theoretically grounded defenses.
I also had the pleasure of spending a summer at the Flatiron Institute (Simons Foundation) as part of the Foundation Models for Science initiative, collaborating with Leopoldo Sarra and Siavash Golkar on extending Joint Embedding approaches for time series.
Previously, I earned an MSc in “Applied Mathematics – Data Sciences” from École Polytechnique (Paris, France) and an Engineering Master's degree from EMINES, School of Industrial Management at Mohammed VI Polytechnic University (UM6P) in Morocco.
News
- Sep 2025 — Happy to share that we have two papers accepted at NeurIPS 2025!
- May 2025 — Gave a talk on my past and current work on GNN robustness at the Metis Spring School in Rabat, Morocco.
- Feb 2025 — Our survey paper "Expressivity of Representation Learning on Continuous-Time Dynamic Graphs: An Information-Flow Centric Review" was accepted to TMLR with a Survey Certification!
- Sep 2024 — Our paper "Joint Embedding go Temporal" is accepted to the "Time Series in the Age of Large Models" Workshop at Neurips 2024!
- Sep 2024 — Our paper "If You Want to Be Robust, Be Wary of Initialization" is accepted to Neurips 2024!
- Apr 2024 — Presented my work on GNN robustness at the Deep Learning: Classics and Trends reading group (Collective ML) [Slides].
- Mar 2024 — Presented my paper on GNN robustness at the MoroccoAI webinar [Slides | Recording].
Selected Publications
-
Transformer models have become the dominant backbone for sequence modeling, leveraging self-attention to produce contextualized token representations. These are typically aggregated into fixed-size vectors via pooling operations for downstream tasks. While much of the literature has focused on attention mechanisms, the role of pooling remains underexplored despite its critical impact on model behavior. In this paper, we introduce a theoretical framework that rigorously characterizes the expressivity of Transformer-based models equipped with widely used pooling methods by deriving closed-form bounds on their representational capacity and their ability to distinguish similar inputs. Our analysis extends to several variants of the attention formulation, demonstrating that these bounds hold across diverse architectural designs. We empirically evaluate pooling strategies across tasks requiring both global and local contextual understanding, spanning three major modalities: computer vision, natural language processing, and time-series analysis. Results reveal consistent trends in how pooling choices affect accuracy, sensitivity, and optimization behavior. Our findings unify theoretical and empirical perspectives, providing practical guidance for selecting or designing pooling mechanisms suited to specific tasks. This work positions pooling as a key architectural component in Transformer models and lays the foundation for more principled model design beyond attention alone.
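For concreteness, here is a minimal PyTorch sketch of the standard flat pooling operators that turn contextualized token embeddings into a fixed-size vector; which operators the paper covers beyond these, and any masking or weighting details, are not reproduced here.

```python
import torch

def pool_tokens(h: torch.Tensor, mode: str = "mean") -> torch.Tensor:
    """Pool contextualized token embeddings h of shape (batch, seq, dim)
    into fixed-size (batch, dim) vectors."""
    if mode == "mean":
        return h.mean(dim=1)
    if mode == "sum":
        return h.sum(dim=1)
    if mode == "max":
        return h.max(dim=1).values
    if mode == "cls":
        return h[:, 0]  # first ([CLS]-style) token as the sequence summary
    raise ValueError(f"unknown pooling mode: {mode}")
```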
-
Graph Neural Networks (GNNs) have achieved strong performance across a range of graph representation learning tasks, yet their adversarial robustness in graph classification remains underexplored compared to node classification. While most existing defenses focus on the message-passing component, this work investigates the overlooked role of pooling operations in shaping robustness. We present a theoretical analysis of standard flat pooling methods (sum, average, and max), deriving upper bounds on their adversarial risk and identifying their vulnerabilities under different attack scenarios and graph structures. Motivated by these insights, we propose Robust Singular Pooling (RS-Pool), a novel pooling strategy that leverages the dominant singular vector of the node embedding matrix to construct a robust graph-level representation. We theoretically investigate the robustness of RS-Pool and interpret the resulting bound, leading to an improved understanding of the proposed pooling operator. While our analysis centers on Graph Convolutional Networks (GCNs), RS-Pool is model-agnostic and can be implemented efficiently via power iteration. Empirical results on real-world benchmarks show that RS-Pool provides better robustness than the pooling methods considered when subject to state-of-the-art adversarial attacks, while maintaining competitive clean accuracy.
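A minimal power-iteration sketch of the RS-Pool idea, assuming the graph-level representation is the dominant right singular vector of the node embedding matrix scaled by its singular value; the paper's exact readout may differ.

```python
import torch

def rs_pool(H: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Sketch: pool node embeddings H of shape (n_nodes, dim) into a
    graph-level vector via the dominant singular vector of H, computed
    by power iteration on H^T H (no full SVD needed)."""
    v = torch.randn(H.shape[1])
    v = v / v.norm()
    for _ in range(n_iters):
        v = H.T @ (H @ v)   # one power-iteration step on H^T H
        v = v / v.norm()
    sigma = (H @ v).norm()  # dominant singular value of H
    return sigma * v        # assumed form of the graph representation
```

Intuitively, the dominant singular direction changes little when a few node embeddings are perturbed, which is what makes this readout less sensitive than sum or max pooling.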
- Expressivity of Representation Learning on Continuous-Time Dynamic Graphs: An Information-Flow Centric Review (TMLR, Survey Certification)
Graphs are ubiquitous in real-world applications, ranging from social networks to biological systems, and have inspired the development of Graph Neural Networks (GNNs) for learning expressive representations. While most research has centered on static graphs, many real-world scenarios involve dynamic, temporally evolving graphs, motivating the need for Continuous-Time Dynamic Graph (CTDG) models. This paper provides a comprehensive review of Graph Representation Learning (GRL) on CTDGs with a focus on Self-Supervised Representation Learning (SSRL). We introduce a novel theoretical framework that analyzes the expressivity of CTDG models through an Information-Flow (IF) lens, quantifying their ability to propagate and encode temporal and structural information. Leveraging this framework, we categorize existing CTDG methods based on their suitability for different graph types and application scenarios. Within the same scope, we examine the design of SSRL methods tailored to CTDGs, such as predictive and contrastive approaches, highlighting their potential to mitigate the reliance on labeled data. Empirical evaluations on synthetic and real-world datasets validate our theoretical insights, demonstrating the strengths and limitations of various methods across long-range, bipartite and community-based graphs. This work offers both a theoretical foundation and practical guidance for selecting and developing CTDG models, advancing the understanding of GRL in dynamic settings.
- If You Want to Be Robust, Be Wary of Initialization (NeurIPS 2024)
Graph Neural Networks (GNNs) have demonstrated remarkable performance across a spectrum of graph-related tasks; however, concerns persist regarding their vulnerability to adversarial perturbations. While prevailing defense strategies focus primarily on pre-processing techniques and adaptive message-passing schemes, this study delves into an under-explored dimension: the impact of weight initialization and associated hyper-parameters, such as the number of training epochs, on a model's robustness. We introduce a theoretical framework establishing a connection between initialization strategies and a network's resilience to adversarial perturbations. Our analysis reveals a direct relationship between the initial weights, the number of training epochs, and the model's vulnerability, offering new insights into adversarial robustness beyond conventional defense mechanisms. While our primary focus is on GNNs, we extend our theoretical framework, providing a general upper bound applicable to Deep Neural Networks. Extensive experiments, spanning diverse models and real-world datasets subjected to various adversarial attacks, validate our findings. We illustrate that selecting an appropriate initialization not only ensures performance on clean datasets but also enhances model robustness against adversarial perturbations, with observed gaps of up to 50% compared to alternative initialization approaches.
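As a loose illustration of the takeaway (not the paper's procedure), one could shrink the scale of a standard initialization before training; the `gain` factor below is a hypothetical knob, and the paper's bound is what actually relates initial weight norms and training epochs to vulnerability.

```python
import torch.nn as nn

def scaled_init(model: nn.Module, gain: float = 0.5) -> None:
    """Re-initialize all linear layers with a down-scaled Xavier scheme.
    Smaller initial weight norms are the property the analysis ties to
    lower adversarial vulnerability."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight, gain=gain)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```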
- Joint Embeddings Go Temporal (Time Series in the Age of Large Models Workshop, NeurIPS 2024)
Self-supervised learning has recently seen great success in unsupervised representation learning, enabling breakthroughs in natural language and image processing. However, these methods often rely on autoregressive and masked modeling, which aim to reproduce masked information in the input and can therefore be vulnerable to noise or confounding variables. To address this problem, Joint-Embedding Predictive Architectures (JEPA) have been introduced with the aim of performing self-supervised learning in the latent space. To leverage these advancements in the domain of time series, we introduce Time Series JEPA (TS-JEPA), an architecture specifically adapted for time series representation learning. We validate TS-JEPA on both classification and forecasting, showing that it can match or surpass current state-of-the-art baselines on several standard datasets. Notably, our approach demonstrates a strong performance balance across diverse tasks, indicating its potential as a robust foundation for learning general representations. This work thus lays the groundwork for developing future time series foundation models based on Joint Embedding.
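A minimal sketch of a JEPA-style latent-prediction objective for time series: a context encoder summarizes visible patches, and a predictor regresses the latent embedding of held-out target patches. The GRU encoders, shapes, and use of the last hidden state are assumptions for illustration, not TS-JEPA's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JEPASketch(nn.Module):
    def __init__(self, patch_len: int = 16, dim: int = 64):
        super().__init__()
        self.context_enc = nn.GRU(patch_len, dim, batch_first=True)
        self.target_enc = nn.GRU(patch_len, dim, batch_first=True)  # kept as an EMA copy in practice
        self.predictor = nn.Linear(dim, dim)

    def forward(self, ctx_patches: torch.Tensor, tgt_patches: torch.Tensor) -> torch.Tensor:
        # ctx_patches, tgt_patches: (batch, n_patches, patch_len)
        z_ctx, _ = self.context_enc(ctx_patches)
        with torch.no_grad():                    # no gradients through the target branch
            z_tgt, _ = self.target_enc(tgt_patches)
        pred = self.predictor(z_ctx[:, -1])      # predict the target latent from the context summary
        return F.mse_loss(pred, z_tgt[:, -1])    # loss lives in latent space, not input space
```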
-
Graph Neural Networks (GNNs) have demonstrated state-of-the-art performance in various graph representation learning tasks. Recently, studies have revealed their vulnerability to adversarial attacks. In this work, we theoretically define the concept of expected robustness in the context of attributed graphs and relate it to the classical definition of adversarial robustness in the graph representation learning literature. Our definition allows us to derive an upper bound on the expected robustness of Graph Convolutional Networks (GCNs) and Graph Isomorphism Networks subject to node feature attacks. Building on these findings, we connect the expected robustness of GNNs to the orthonormality of their weight matrices and consequently propose an attack-independent, more robust variant of the GCN, called the Graph Convolutional Orthonormal Robust Network (GCORN). We further introduce a probabilistic method to estimate the expected robustness, which allows us to evaluate the effectiveness of GCORN on several real-world datasets. Experimental results showed that GCORN outperforms available defense methods.
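A minimal sketch of the orthonormality idea behind GCORN, using PyTorch's built-in orthogonal parametrization as a stand-in for the paper's own orthonormalization scheme.

```python
import torch
import torch.nn as nn

class OrthonormalGCNLayer(nn.Module):
    """GCN-style layer whose weight matrix is kept (semi-)orthonormal, so
    the linear map has operator norm 1 and cannot amplify feature
    perturbations. Illustrative only, not the paper's exact method."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        nn.utils.parametrizations.orthogonal(self.lin, "weight")

    def forward(self, adj_norm: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(adj_norm @ self.lin(x))  # A_hat X W propagation
```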
-
Graph Neural Networks (GNNs) have emerged as the dominant approach for machine learning on graph-structured data. However, concerns have arisen regarding the vulnerability of GNNs to small adversarial perturbations. Existing defense methods against such perturbations suffer from high time complexity and can negatively impact the model's performance on clean graphs. To address these challenges, this paper introduces NoisyGNN, a novel defense method that injects noise into the underlying model's architecture. We establish a theoretical connection between noise injection and the enhancement of GNN robustness, highlighting the effectiveness of our approach. We further conduct extensive empirical evaluations on the node classification task to validate our theoretical findings, focusing on two popular GNNs: the GCN and GIN. The results demonstrate that NoisyGNN achieves superior or comparable defense performance relative to existing methods while minimizing added time complexity. The NoisyGNN approach is model-agnostic, allowing it to be integrated with different GNN architectures, and combining it with existing defense techniques yields further improvements in adversarial defense.
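A minimal sketch of the noise-injection idea in a GCN-style layer; where the noise enters, its Gaussian form, and the scale `sigma` are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class NoisyGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, sigma: float = 0.1):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.sigma = sigma  # noise scale (assumed hyper-parameter)

    def forward(self, adj_norm: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        h = adj_norm @ self.lin(x)  # standard GCN propagation
        if self.training:           # inject noise only during training
            h = h + self.sigma * torch.randn_like(h)
        return torch.relu(h)
```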
* denotes equal contribution.
Experience
AI/ML Researcher @ Microsoft (ABK – King AI Labs)
- Theoretical investigations and application of Transformer‑based models.
- Self‑Supervised representation learning on Continuous‑Time Dynamic Graphs (CTDG).
Research Intern @ Flatiron Institute (Simons Foundation)
- Polymathic AI initiative.
- Extended Joint‑Embedding Predictive Architectures (JEPA) for time‑series pre‑training.
Research Intern @ BNP Paribas (RISK AIR)
- Interpretability of ML/DL models at Risk AIR.
- Counterfactual explanations in black‑box settings.
Research Scholar @ University of Louisville
- ML‑based CT lung‑cancer detection.
- Built a CAD pipeline achieving ≈94% (±0.6) accuracy on the LUNA Challenge.
Academic Service
Talks
- From Bounds to Defenses: A Comprehensive Look at GNN Robustness — Metis Spring School.
- On the Effect of Initialization on Adversarial Robustness — LOG Conference Meetup, Sweden.
- Theoretically Upper-Bounding the Expected Adversarial Robustness of GNNs — Collective ML reading group. [Slides]
- Adversarial Robustness of GNNs — MoroccoAI webinar. [Slides | Recording]
Academic Reviewing
- NeurIPS (2025, 2024), ICLR (2025), KDD (2025), Learning on Graphs (2024), TMLR.