Motivated by a wide variety of applications, ranging from stochastic optimization to dimension reduction through variable selection, the problem of estimating gradients accurately is of crucial importance in statistics and learning theory. We consider here the classical regression setup, where a real-valued, square-integrable random variable Y is to be predicted upon observing a (possibly high-dimensional) random vector X by means of a predictive function f(X) as accurately as possible in the mean squared sense, and study a nearest-neighbour based pointwise estimate of the gradient of the optimal predictive function, the regression function m(x) = E[Y | X = x]. Under classical smoothness conditions combined with the assumption that the tails of Y − m(X) are sub-Gaussian, we prove nonasymptotic bounds improving upon those obtained for alternative estimation methods. Beyond the novel theoretical results established, several illustrative numerical experiments have been carried out. The latter provide strong empirical evidence that the estimation method proposed here performs very well for various statistical problems involving gradient estimation, namely dimensionality reduction, stochastic gradient descent optimization and disentanglement quantification.

@online{aussetEmpiricalRiskMinimization2019, title = {Empirical {{Risk Minimization}} under {{Random Censorship}}: {{Theory}} and {{Practice}}}, shorttitle = {Empirical {{Risk Minimization}} under {{Random Censorship}}}, author = {Ausset, Guillaume and Cl\'emen\c{c}on, St\'ephan and Portier, Fran\c{c}ois}, date = {2019-06-05}, url = {http://arxiv.org/abs/1906.01908}, archiveprefix = {arXiv}, eprint = {1906.01908}, eprinttype = {arxiv}, primaryclass = {cs, math, stat} }

We consider the classic supervised learning problem, where a continuous non-negative random label Y (i.e. a random duration) is to be predicted based upon observing a random vector X valued in R^d with d ≥ 1 by means of a regression rule with minimum least squares error. In various applications, ranging from industrial quality control to public health through credit risk analysis for instance, training observations can be right censored, meaning that, rather than on independent copies of (X, Y), statistical learning relies on a collection of n ≥ 1 independent realizations of the triplet (X, min{Y, C}, δ), where C is a nonnegative r.v. with unknown distribution, modeling censorship, and δ = I{Y ≤ C} indicates whether the duration is right censored or not. As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censorship C given X, referred to as Kaplan-Meier risk, in order to perform empirical risk minimization. It is established, under mild conditions, that the learning rate of minimizers of this biased/weighted empirical risk functional is of order O(√(log(n)/n)) when ignoring model bias issues inherent to plug-in estimation, as can be attained in the absence of censorship. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed.

@inproceedings{aussetNearestNeighbourBased2021, title = {Nearest Neighbour Based Estimates of Gradients: {{Sharp}} Nonasymptotic Bounds and Applications}, booktitle = {Proceedings of the 24th International Conference on Artificial Intelligence and Statistics}, author = {Ausset, Guillaume and Cl\'emen\c{c}on, St\'ephan and Portier, Fran\c{c}ois}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, date = {2021-04-13/2021-04-15}, volume = {130}, pages = {532--540}, publisher = {{PMLR}}, url = {http://proceedings.mlr.press/v130/ausset21a.html}, series = {Proceedings of Machine Learning Research} }