Université Grenoble Alpes

Title: Analysis of the health content of a corpus of tweets using the signature method

Subject offered in: M2 MOSIG, Project; M2 MSIAM, Project

Supervisor(s):

Keywords: Automatic theme extraction, Latent Dirichlet Allocation, Health Tweets
Project duration: 5 months
Maximum number of students: 1
Available places: 1
Query performed on: 20 January 2020, at 11:01


In partnership with Laboratoire d’Informatique de Grenoble, we have collected tweets over three years. Our goal is to understand the factors involved in certain ailments, as well as the links between these ailments. In preliminary work [3], we developed two probabilistic models, TM-ATAM and T-ATAM, which extend Latent Dirichlet Allocation to summarize the health content of a corpus of tweets while taking time into account.

The output of the method is a vector-valued time series, which we analyzed using statistical tools. Notably, we detected change points in the health content of our corpus, which provides a relevant way to detect transitions in the environmental context (e.g. seasons). We aim to combine this model with recent tools from rough paths theory [1,2] to gain new insights into the two models TM-ATAM and T-ATAM.
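To make the idea of a change point in a vector-valued time series concrete, here is a minimal sketch of one generic approach: choosing the single split that minimizes the within-segment squared error. This is only an illustration. The proposal does not state which statistical tools were actually used, and the function name `best_mean_shift` and the toy series are hypothetical.

```python
def best_mean_shift(series):
    """Locate a single shift in the mean of a vector-valued series.

    series: list of equal-length numeric vectors, one per time step.
    Returns (k, cost) where the series is split into series[:k] and
    series[k:], and cost is the total within-segment squared error.
    Generic least-squares illustration, NOT the method used in [3].
    """
    def sse(segment):
        # Sum of squared deviations from the segment's mean vector.
        n = len(segment)
        d = len(segment[0])
        means = [sum(v[i] for v in segment) / n for i in range(d)]
        return sum((v[i] - means[i]) ** 2 for v in segment for i in range(d))

    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical toy data: two-topic proportions that switch regime halfway.
toy_series = [(0.8, 0.2)] * 4 + [(0.3, 0.7)] * 4
split_index, split_cost = best_mean_shift(toy_series)
print(split_index, split_cost)
```

A real analysis would need to handle several change points and assess their statistical significance, which this sketch does not attempt.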

In particular, we aim to identify causality relations between ailments and to use the skew-symmetric nature of the order-2 signature to cluster the data. The internship will be divided into two parts: first understanding TM-ATAM/T-ATAM and the signature method, then applying them to our real data.
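As a rough illustration of the skew-symmetric part of the order-2 signature, the sketch below computes the first two signature levels of a piecewise-linear path and extracts the antisymmetric part, known as the Lévy area. For a d-dimensional path X, the order-2 term is the d x d matrix of iterated integrals of dX^i followed by dX^j. Its symmetric part is fixed by the total increment, so the order information lives in the antisymmetric part. The sketch assumes the time series is linearly interpolated between observations; the function names and the toy paths are hypothetical and are not taken from [1], [2] or [3].

```python
def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: list of d-dimensional points (sequences of floats).
    Returns (s1, s2): s1 is the total increment (length d) and
    s2[i][j] is the iterated integral of dX^i followed by dX^j.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity: append one linear segment to the path so far.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def levy_area(s2):
    """Antisymmetric part of the level-2 signature matrix."""
    d = len(s2)
    return [[0.5 * (s2[i][j] - s2[j][i]) for j in range(d)] for i in range(d)]


# Two toy paths with the same endpoints: coordinate 0 moves first, or second.
first_then_second = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
second_then_first = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for toy_path in (first_then_second, second_then_first):
    _, level2 = signature_level2(toy_path)
    print(levy_area(level2)[0][1])
```

The two toy paths share the same total increment, yet their Lévy areas have opposite signs. This sensitivity to which coordinate moves first is what makes the antisymmetric part a candidate feature for studying lead-lag relations between components and for clustering.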

Contacts: Massih-Reza Amini (Massih-Reza.Amini@imag.fr), Antoine Lejay (antoine.lejay@inria.f), Marianne Clausel (marianne.clausel@univ-lorraine.fr).


[1] S.E. Avev, Using Asymmetry in the Spectral Clustering of Trajectories, PhD Thesis (2011).

[2] I. Chevyrev and A. Kormilitzin, A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788 (2016).

[3] S. Sidana, S. Amer-Yahia, M. Clausel, M. Rebai, M.R. Amini, S.T. Mai, Health Monitoring on Social Media over Time. IEEE Transactions on Knowledge and Data Engineering, vol. 30(8), pp. 1467-1480 (2018).