Machine Learning Approaches for the Prediction of Credit Risk
G. Ausset

Predicting the possible occurrence of a future event, which may eventually never happen, is a fundamental problem that arises naturally in most scientific and industrial fields. This problem, commonly referred to as survival analysis after its canonical application in epidemiology, has long been one of the classical problems of statistics, and its study has enabled considerable advances in the natural sciences. More recently, thanks to progress in machine learning, those same scientific fields and industrial applications have achieved significant leaps forward by exploiting large amounts of high-dimensional data with highly flexible estimators. In this thesis we reconcile the two approaches and show how to best exploit highly flexible machine learning methods in the survival analysis setting in a principled and motivated way. We show in this work how the classical empirical risk minimization (ERM) framework can be adapted to the survival analysis setting by introducing a reweighted objective, called the Kaplan-Meier ERM, and derive non-asymptotic error bounds without parametric assumptions on the true generating process, effectively bringing the guarantees one has come to expect in machine learning to survival analysis. We also show how to construct highly flexible estimators of the survival function, one of the key building blocks of our Kaplan-Meier ERM framework: we formulate survival density estimation as a normalizing flow problem and introduce a novel conditional normalizing flow estimator of the survival density, yielding a tractable, easy-to-sample, yet highly expressive estimator. To reduce the complexity of the two previous approaches, we introduce an estimator of the gradient of a black-box function and show how to use it for variable selection, a simple yet highly effective method for dimensionality reduction. Finally, we apply the methods developed here to a particular instance of the survival problem: predicting the defaults of companies. We show how to use estimators of the probability of default to build optimal portfolios, as well as how to make efficient use of small data through hierarchical methods.
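
As a concrete illustration of the reweighting idea, the following Python sketch computes inverse-probability-of-censoring weights from an (unconditional) Kaplan-Meier estimate of the censoring survival function and plugs them into a squared-error empirical risk. This is a minimal, simplified sketch: the thesis works with a conditional estimator of the censoring survival, and all names here are illustrative.

import numpy as np

def censoring_km_weights(t, delta, eps=1e-8):
    # Inverse-probability-of-censoring weights delta_i / S_C(t_i-), where
    # S_C is the (unconditional) Kaplan-Meier estimate of the survival
    # function of the censoring variable C; delta_i == 0 marks an event
    # for C.  Ties are ignored for brevity.
    n = len(t)
    order = np.argsort(t)
    s = np.empty(n)
    surv = 1.0
    for rank, i in enumerate(order):
        s[i] = surv                         # left limit S_C(t_i-)
        if delta[i] == 0:                   # censoring event: update product
            surv *= 1.0 - 1.0 / (n - rank)
    return delta / np.maximum(s, eps)

def kaplan_meier_risk(pred, t, weights):
    # Kaplan-Meier reweighted empirical squared risk.
    return np.mean(weights * (pred - t) ** 2)

Minimizing kaplan_meier_risk over a class of predictors is then an ordinary (weighted) ERM problem, which is what makes standard machine learning tooling applicable.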

@online{aussetMethodesApprentissageStatistique2021,
  title = {Méthodes d'apprentissage Statistique Pour l'analyse Prédictive Du Risque de Crédit},
  author = {Ausset, Guillaume},
  date = {2021},
  url = {http://www.theses.fr/s197391},
  abstract = {The aim of this thesis is to develop statistical learning techniques for predictive credit risk models that exploit new sources of data, in order to define a Point-In-Time approach for a more accurate estimation of the associated risks. These original data sources include, for example, Bloomberg or Twitter messages, whose potentially predictive character will be explored throughout the project. Statistics on the volume of Bloomberg news citing a company already exist, but offer no qualitative treatment of the message (positive or negative). The methods developed will therefore be adapted to the particular structure of the data, i.e. its sequential and multi-task character and the high dimension of the feature space. Processing new sources of data with appropriate methods will make it possible to create new Point-In-Time signals for building hedging or investment strategies in the credit market, as a complement to ratings.},
  organization = {{http://www.theses.fr}}
}
		
Individual Survival Curves with Conditional Normalizing Flows
G. Ausset, S. Clémençon, F. Portier, Timothée Papin, Tom Ciffreo

Survival analysis, or time-to-event modelling, is a classical statistical problem that has attracted considerable interest for its practical use in epidemiology, demography or the actuarial sciences. Recent advances on the subject from the machine learning point of view, driven by the rise of individualized medicine, have been concerned with precise per-individual predictions rather than population studies. We introduce here an estimator of the time-to-event density based on conditional normalizing flows, as a way to model highly flexible and individualized conditional survival distributions. We use a novel hierarchical formulation of normalizing flows to enable efficient fitting of flexible conditional distributions without overfitting, and show how the normalizing flow formulation can be efficiently adapted to the censored setting. We experimentally validate the proposed approach on a synthetic dataset as well as four open medical datasets and an example of a common financial problem.
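
A minimal PyTorch sketch of the underlying idea, assuming a single conditional affine transform of log-time (the paper uses deeper hierarchical flows; all names below are illustrative). For such a one-layer flow, the survival function needed for censored observations is available in closed form through the Gaussian CDF.

import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    # One-layer conditional flow: log T = mu(x) + exp(s(x)) * Z, Z ~ N(0, 1).
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x, t):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        z = (torch.log(t) - mu) * torch.exp(-log_sigma)
        base = torch.distributions.Normal(0.0, 1.0)
        # Change of variables: log p(t | x) = log p_Z(z) - log |dt/dz|.
        log_density = base.log_prob(z) - (log_sigma + torch.log(t))
        # S(t | x) = P(T > t | x) = P(Z > z).
        log_survival = torch.log(1.0 - base.cdf(z) + 1e-8)
        return log_density, log_survival

def censored_nll(log_density, log_survival, delta):
    # Observed durations contribute the density, censored ones the survival.
    return -(delta * log_density + (1.0 - delta) * log_survival).mean()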

@inproceedings{aussetIndividualSurvivalCurves2021,
  title = {Individual {{Survival Curves}} with {{Conditional Normalizing Flows}}},
  booktitle = {{{DSAA}}'21},
  author = {Ausset, Guillaume and Ciffreo, Tom and Clémençon, Stéphan and Portier, François and Papin, Timothée},
  date = {2021},
  url = {https://arxiv.org/abs/2107.12825},
  archiveprefix = {arXiv},
  eprint = {2107.12825},
  eprinttype = {arxiv},
  eventtitle = {{{IEEE International Conference}} on {{Data Science}} and {{Advanced Analytics}}}
}
		
Nearest neighbour based estimates of gradients: Sharp nonasymptotic bounds and applications
G. Ausset, S. Clémençon, F. Portier

Motivated by a wide variety of applications, ranging from stochastic optimization to dimension reduction through variable selection, the problem of estimating gradients accurately is of crucial importance in statistics and learning theory. We consider here the classical regression setup, where a real-valued, square-integrable random variable Y is to be predicted upon observing a (possibly high-dimensional) random vector X by means of a predictive function f(X), as accurately as possible in the mean-squared sense, and study a nearest-neighbour-based pointwise estimate of the gradient of the optimal predictive function, the regression function m(x) = E[Y | X = x]. Under classical smoothness conditions combined with the assumption that the tails of Y − m(X) are sub-Gaussian, we prove nonasymptotic bounds improving upon those obtained for alternative estimation methods. Beyond the novel theoretical results established, numerical experiments provide strong empirical evidence that the estimation method proposed here performs very well for various statistical problems involving gradient estimation, namely dimensionality reduction, stochastic gradient descent optimization and disentanglement quantification.
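
A minimal sketch of a nearest-neighbour gradient estimate, assuming the gradient is read off a local least-squares fit over the k nearest neighbours (the function name and the choice of k are illustrative):

import numpy as np

def knn_gradient(x0, X, y, k=20):
    # Estimate the gradient of m(x) = E[Y | X = x] at x0: fit
    # y ~ a + b.(X - x0) on the k nearest neighbours of x0 by ordinary
    # least squares; the slope b is the gradient estimate.
    # Assumes k > X.shape[1] so the local fit is well posed.
    idx = np.argsort(np.sum((X - x0) ** 2, axis=1))[:k]
    Z = np.hstack([np.ones((k, 1)), X[idx] - x0])
    coef, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
    return coef[1:]                        # drop the intercept

Variable selection then amounts to ranking coordinates by the magnitude of such gradient estimates averaged over a set of query points.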

@inproceedings{aussetNearestNeighbourBased2021,
  title = {Nearest Neighbour Based Estimates of Gradients: {{Sharp}} Nonasymptotic Bounds and Applications},
  booktitle = {Proceedings of the 24th International Conference on Artificial Intelligence and Statistics},
  author = {Ausset, Guillaume and Clémençon, Stéphan and Portier, François},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  date = {2021-04-13/2021-04-15},
  volume = {130},
  pages = {532--540},
  publisher = {{PMLR}},
  url = {http://proceedings.mlr.press/v130/ausset21a.html},
  series = {Proceedings of Machine Learning Research}
}

Empirical Risk Minimization under Random Censorship: Theory and Practice
G. Ausset, S. Clémençon, F. Portier

We consider the classic supervised learning problem, where a continuous non-negative random label Y (i.e. a random duration) is to be predicted based upon observing a random vector X valued in ℝ^d with d ≥ 1, by means of a regression rule with minimum least-squares error. In various applications, ranging from industrial quality control to public health through credit risk analysis, training observations can be right-censored, meaning that, rather than on independent copies of (X, Y), statistical learning relies on a collection of n ≥ 1 independent realizations of the triplet (X, min{Y, C}, δ), where C is a non-negative random variable with unknown distribution modeling censorship, and δ = I{Y ≤ C} indicates whether the duration is right-censored or not. As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk, based on a Kaplan-Meier estimator of the conditional survival function of the censorship C given X and referred to as the Kaplan-Meier risk, in order to perform empirical risk minimization. It is established, under mild conditions, that the learning rate of minimizers of this biased/weighted empirical risk functional is of order O(√(log(n)/n)) when ignoring model bias issues inherent to plug-in estimation, as can be attained in the absence of censorship. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed.
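
In the notation of the abstract, the Kaplan-Meier risk for the least-squares case can be written as below; this is a plausible rendering of the weighting described, and the paper's exact definition may differ in details such as left limits:

\widehat{\mathcal{R}}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i}{\widehat{S}_C(Z_i \mid X_i)} \big( f(X_i) - Z_i \big)^2,
\qquad Z_i = \min\{Y_i, C_i\}, \quad \delta_i = \mathbb{I}\{Y_i \le C_i\}.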

@online{aussetEmpiricalRiskMinimization2019,
  title = {Empirical {{Risk Minimization}} under {{Random Censorship}}: {{Theory}} and {{Practice}}},
  shorttitle = {Empirical {{Risk Minimization}} under {{Random Censorship}}},
  author = {Ausset, Guillaume and Clémençon, Stéphan and Portier, François},
  date = {2019-06-05},
  url = {http://arxiv.org/abs/1906.01908},
  archiveprefix = {arXiv},
  eprint = {1906.01908},
  eprinttype = {arxiv},
  primaryclass = {cs, math, stat}
}