The Sample Complexity of Multi-Reference Alignment - Philippe Rigollet (MIT Mathematics, USA)
How should one estimate a signal, given only access to noisy versions of the signal corrupted by unknown cyclic shifts? This simple problem has surprisingly broad applications, in fields ranging from aircraft radar imaging to structural biology; our ultimate goal is to understand the sample complexity of cryo-EM. We describe how this model can be viewed as a multivariate Gaussian mixture model whose centers belong to an orbit of a group of orthogonal transformations. This enables us to derive matching lower and upper bounds on the optimal rate of statistical estimation of the underlying signal. These bounds show a striking dependence on the signal-to-noise ratio of the problem. We also show how a tensor-based method of moments can solve the problem efficiently. Based on joint work with Afonso Bandeira (NYU), Amelia Perry (MIT), Amit Singer (Princeton) and Jonathan Weed (MIT).
Speakers:
Philippe Rigollet (MIT Mathematics, USA)
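To make the observation model concrete, here is a minimal Python/NumPy sketch, assuming a real signal of length L, shifts drawn uniformly at random and i.i.d. Gaussian noise of known level σ. It is not the tensor method of moments from the talk; it only estimates the two simplest shift-invariant moments, the mean and the power spectrum |DFT(x)|², from the shifted noisy copies. The function names (`simulate_mra`, `power_spectrum`, `estimate_invariants`) and the example signal are hypothetical.

```python
import numpy as np


def simulate_mra(x, n_obs, sigma, rng):
    """Return n_obs noisy copies of x, each cyclically shifted by a uniform random amount."""
    L = x.size
    shifts = rng.integers(0, L, size=n_obs)
    # Row i satisfies obs[i, j] = x[(j - s_i) % L], i.e. np.roll(x, s_i).
    idx = (np.arange(L)[None, :] - shifts[:, None]) % L
    return x[idx] + sigma * rng.standard_normal((n_obs, L))


def power_spectrum(x):
    """|DFT|^2 along the last axis; unchanged by any cyclic shift of x."""
    return np.abs(np.fft.fft(x, axis=-1)) ** 2


def estimate_invariants(obs, sigma):
    """Estimate the mean and the power spectrum of x from shifted, noisy samples."""
    L = obs.shape[1]
    mean_est = obs.mean()  # cyclic shifts do not change the average entry
    # With numpy's unnormalized FFT, E|FFT(noise)_k|^2 = L * sigma^2 at every
    # frequency, so that bias is subtracted from the averaged sample spectra.
    ps_est = power_spectrum(obs).mean(axis=0) - L * sigma**2
    return mean_est, ps_est


rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5])
obs = simulate_mra(x, n_obs=50_000, sigma=0.5, rng=rng)
mean_est, ps_est = estimate_invariants(obs, sigma=0.5)
```

The power spectrum discards the Fourier phases, so it does not in general determine x; recovering the signal itself requires higher-order shift-invariant moments such as the bispectrum.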
Patterns and surprises in rich but noisy network data - Mark Newman (University of Michigan, USA)
Abstract to be confirmed.
Speakers:
Mark Newman (University of Michigan, USA)
Algorithms and Algorithmic Obstacles in High-Dimensional Regression - David Gamarnik (MIT Sloan School of Management, USA)
Many optimization problems arising in the study of random structures exhibit an apparent gap between the optimal values, which can be estimated by non-constructive means, and the best values achievable by fast (polynomial-time) algorithms. Through the combined efforts of mathematicians, computer scientists and statistical physicists, it has become apparent that a potential, and in some cases provable, obstruction to designing algorithms that bridge this gap is a phase transition in the geometry of nearly optimal solutions, in particular the presence of a certain Overlap Gap Property (OGP).

In this talk we discuss this property in the context of the sparse high-dimensional linear regression problem with Gaussian design. We show that, on the one hand, in the sampling regime where the known fast methods for this problem are effective, the space of solutions exhibits monotonicity with respect to proximity to the ground-truth regression vector, and no local optima exist apart from the ground truth. On the other hand, once the number of samples is asymptotically in the regime where the known methods fail, we show that the monotonicity is lost and the model exhibits an OGP. In the context of the regression problem, this means that every solution with a small mean squared error is either fairly close to the ground truth or very far from it, with no middle ground.
Speakers:
David Gamarnik (MIT Sloan School of Management, USA)
(Why) do functional sites induce long-range evolutionary constraints in enzymes? - Julian Echave (Universidad Nacional de San Martín, Buenos Aires, Argentina)
Protein evolution can be viewed as a repeated mutation-fixation process. At each step, one amino acid is randomly mutated. The mutant will eventually be either discarded or fixed, replacing the parent. The fixation probability varies from site to site, so different sites evolve at different rates. This variation of rates among sites is due to thermodynamic and/or functional constraints.

Almost all biophysical models of protein evolution developed so far consider only selection for stability, disregarding functional constraints, which were thought to affect only a handful of residues: the active site and its immediate neighbours. However, a recent study showed that site-specific evolutionary rates increase rather smoothly with increasing distance to the protein’s active residues. Such long-range rate-distance dependence cannot be explained by current stability-based models and is at odds with the localized, fast exponential decrease of coupling strength one would expect on physical grounds.

To understand whether and why functional constraints have long-range effects, we need new models of protein evolution that go beyond selection for stability and consider protein activity explicitly. Here, I will describe a model of protein evolution that explicitly considers both stability and activity constraints, and I will discuss its predictions. The stability-activity model predicts that fitness cost is localized (short-range) yet rate variation is delocalized (long-range); that is, short-range fitness effects are consistent with long-range rate effects. However, such long-range coupling is not universal: its range varies among proteins. This variation does not depend on intrinsic protein properties but on external functional selection pressure (e.g. the role of an enzyme in the metabolic network).
Speakers:
Julian Echave (Universidad Nacional de San Martín, Buenos Aires, Argentina)
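The repeated mutation-fixation process described at the start of this abstract can be simulated in a few lines. This Python sketch assumes fixed per-site fixation probabilities supplied by hand (the values and the name `simulate_substitutions` are hypothetical); it does not implement the stability-activity model, which would derive those probabilities from biophysical considerations. It only shows how site-to-site variation in fixation probability turns into site-to-site variation in substitution rate.

```python
import random


def simulate_substitutions(fixation_probs, n_steps, rng):
    """Repeated mutation-fixation process over sites with given fixation probabilities.

    At each step one site is chosen uniformly at random and mutated; the mutant is
    fixed (replacing the parent) with that site's probability, otherwise discarded.
    Returns the number of fixed substitutions per site.
    """
    counts = [0] * len(fixation_probs)
    for _ in range(n_steps):
        site = rng.randrange(len(fixation_probs))
        if rng.random() < fixation_probs[site]:
            counts[site] += 1  # mutant fixed: one substitution at this site
    return counts


# Hypothetical fixation probabilities for three sites, from most to least tolerant.
probs = [0.9, 0.5, 0.1]
n_steps = 30_000
counts = simulate_substitutions(probs, n_steps, random.Random(1))
rates = [c / n_steps for c in counts]  # expected rate at site i: probs[i] / len(probs)
```

Because every site is mutated equally often, the long-run substitution rate at each site is proportional to its fixation probability; any model that predicts fixation probabilities therefore predicts the rate profile across sites.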
Adaptive FAB confidence intervals with constant coverage
Confidence intervals for the means of multiple normal populations are often based on a hierarchical normal model. While commonly used interval procedures based on such a model have the nominal coverage rate on average across a population of groups, their actual coverage rate for a given group will be above or below the nominal rate, depending on the value of the group mean.

In this talk I present confidence interval procedures that have constant frequentist coverage rates and that make use of information about across-group heterogeneity, resulting in constant-coverage intervals that are narrower than standard t-intervals on average across groups. These intervals are obtained by inverting Bayes-optimal frequentist tests, and so are "frequentist, assisted by Bayes" (FAB). I present some asymptotic optimality results and some extensions to other scenarios, such as linear regression and tensor analysis.
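The coverage behaviour described in the first paragraph of this abstract can be checked directly. This Python sketch assumes the simplest version of the hierarchical normal model, y ~ N(θ, σ²) with θ ~ N(μ, τ²), with σ, μ and τ known (in practice they would be estimated from across-group data), and uses a z-interval rather than a t-interval. It does not implement the FAB procedure; it only computes, for a fixed group mean θ, the frequentist coverage of the usual 1 − α posterior interval. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior_interval_coverage(theta, mu, tau, sigma, alpha=0.05):
    """Frequentist coverage, for a fixed group mean theta, of the 1 - alpha posterior
    interval under y ~ N(theta, sigma^2), theta ~ N(mu, tau^2)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    b = tau**2 / (tau**2 + sigma**2)  # weight the posterior mean puts on y
    half_width = z * sigma * b**0.5   # posterior sd is sigma * sqrt(b)
    # Posterior mean minus theta = b*(y - theta) + (1 - b)*(mu - theta),
    # which is N((1 - b)*(mu - theta), (b*sigma)^2) when theta is held fixed.
    shift = (1 - b) * (mu - theta)
    sd = b * sigma
    return STD_NORMAL.cdf((half_width - shift) / sd) - STD_NORMAL.cdf((-half_width - shift) / sd)


def average_coverage(mu, tau, sigma, alpha=0.05, step=0.01, span=8.0):
    """Average the coverage over theta ~ N(mu, tau^2) with a midpoint Riemann sum."""
    prior = NormalDist(mu, tau)
    n = round(2 * span * tau / step)
    thetas = (mu - span * tau + (i + 0.5) * step for i in range(n))
    return sum(posterior_interval_coverage(t, mu, tau, sigma, alpha) * prior.pdf(t) * step
               for t in thetas)


near = posterior_interval_coverage(theta=0.0, mu=0.0, tau=1.0, sigma=1.0)
far = posterior_interval_coverage(theta=3.0, mu=0.0, tau=1.0, sigma=1.0)
avg = average_coverage(mu=0.0, tau=1.0, sigma=1.0)
```

For σ = τ = 1 the coverage is about 0.99 at θ = μ and about 0.41 at θ = μ + 3τ, while its average over the prior is 1 − α. By contrast, the interval y ± zσ covers with probability exactly 1 − α for every θ but is wider than the posterior interval; constant coverage with a narrower average width is the combination FAB intervals aim for.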
A BPHZ theorem for stochastic PDEs - Professor Martin Hairer (University of Warwick)
A classical result obtained in the 1950s and 1960s by Bogoliubov, Parasiuk, Hepp and Zimmermann provides a prescription for renormalising the amplitudes of Feynman diagrams arising in perturbative quantum field theory in a consistent way. We will discuss an analogue of this theorem which has both an analytic and a probabilistic interpretation. In particular, we will see that it implies that the solutions to a large class of nonlinear stochastic PDEs depend on their driving noise in a surprisingly rigid way. This rigidity is a mathematical manifestation of the "universality" taken for granted when building our intuition on the large-scale behaviour of probabilistic models.
Speakers:
Professor Martin Hairer (University of Warwick)
Machine learning for patient stratification from genomic data - Professor Jean-Philippe Vert (Mines ParisTech)
As the cost and throughput of genomic technologies reach a point where DNA sequencing is close to becoming a routine clinical exam, there is much hope that the treatment of diseases like cancer can be dramatically improved by a digital revolution in medicine, where smart algorithms analyze “big medical data” to help doctors make the best decisions for each patient. The application of machine learning techniques to genomic data, however, raises numerous computational and mathematical challenges, which I will illustrate with a few examples of cancer patient stratification from gene expression or somatic mutation profiles.
Speakers:
Professor Jean-Philippe Vert (Mines ParisTech)
Random cracks in space - Professor Wendelin Werner (ETH Zurich)
We will describe in non-technical terms some old and new ideas about which basic natural random objects and fields one can define in a given space with some geometric structure, and what one can do with them. This will probably include various recent and ongoing joint work with Jason Miller, Scott Sheffield, Qian Wei and Titus Lupu.
Speakers:
Professor Wendelin Werner (ETH Zurich)
Network-based Statistical Models and Methods for Identification of Cellular Mechanisms of Action - Professor Eric Kolaczyk (Boston University)
Identifying biological mechanisms of action (e.g. genes, functional elements, or biological pathways) that control disease states, drug response, and altered cellular function is a multifaceted problem involving a dynamic system of biological variables that culminate in an altered cellular state. The challenge lies in deciphering the factors that play key roles in determining the cell's fate. In this talk I will present an overview of various efforts by our group to develop statistical models and methods for identifying cellular mechanisms of action. Common to all of our approaches is the use of certain perturbed Gaussian graphical models, which allow us to formulate the identification problem as a network-based statistical inverse problem. Illustrations will be given in the context of yeast experiments and human cancer.
Speakers:
Professor Eric Kolaczyk (Boston University)
"Come join the multiple testing party!" - Professor Matthew Stephens (University of Chicago)
Multiple testing is often described as a "burden". My goal is to convince you that multiple testing is better viewed as an opportunity, and that instead of laboring under this burden you should be looking for ways to exploit this opportunity. I invite you to a multiple testing party.
Speakers:
Professor Matthew Stephens (University of Chicago)
How to estimate the mean of a random variable? - Professor Gabor Lugosi (Department of Economics and Business, Universitat Pompeu Fabra, Barcelona)
Given n independent, identically distributed copies of a random variable, one is interested in estimating the expected value. Perhaps surprisingly, there are still open questions concerning this very basic problem in statistics.

In this talk we are primarily interested in non-asymptotic sub-Gaussian estimates for potentially heavy-tailed random variables. We discuss various estimates and extensions to high dimensions, empirical risk minimization, and multivariate problems. This talk is based on joint work with Emilien Joly, Luc Devroye, Matthieu Lerasle, and Roberto Imbuzeiro Oliveira.
Speakers:
Professor Gabor Lugosi (Department of Economics and Business, Universitat Pompeu Fabra, Barcelona)
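One classical example of a sub-Gaussian estimator for heavy-tailed data is the median-of-means estimator: split the sample into k blocks, average each block, and report the median of the block averages. The Python sketch below is only that textbook construction, not necessarily one of the estimators discussed in the talk; the round-robin block assignment and the name `median_of_means` are illustrative choices. Under a finite-variance assumption alone, taking k proportional to log(1/δ) yields sub-Gaussian-type deviation bounds that hold with probability at least 1 − δ.

```python
from statistics import mean, median


def median_of_means(xs, k):
    """Median-of-means estimate of E[X] from samples xs using k blocks.

    Samples are dealt round-robin into k blocks (block i holds xs[i], xs[i + k], ...),
    each block is averaged, and the median of the k block means is returned.
    """
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= len(xs)")
    block_means = [mean(xs[i::k]) for i in range(k)]
    return median(block_means)


# Nine well-behaved samples and one huge outlier.
xs = [0.0] * 9 + [1000.0]
print(mean(xs))                  # 100.0: the sample mean is dragged by the outlier
print(median_of_means(xs, k=5))  # 0.0: the outlier corrupts only one of five blocks
```

A single extreme value can move the sample mean arbitrarily far, but it can spoil at most one block mean, and the median ignores any minority of corrupted blocks.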
The Statistical Crisis in Science - Professor Andrew Gelman (Department of Statistics and Department of Political Science, Columbia University, New York)
Top journals in psychology routinely publish ridiculous, scientifically implausible claims, justified on the basis of “p < 0.05.” And this in turn calls into question all sorts of more plausible, but not necessarily true, claims that are supported by this same sort of evidence. To put it another way: we can all laugh at studies of ESP, or ovulation and voting, but what about MRI studies of political attitudes, or embodied cognition, or stereotype threat, or, for that matter, the latest potential cancer cure? If we can’t trust p-values, does experimental science involving human variation just have to start over? And what do we do in fields such as political science and economics, where preregistered replication can be difficult or impossible? Can Bayesian inference supply a solution? Maybe. These are not easy problems, but they’re important problems.
Speakers:
Professor Andrew Gelman (Department of Statistics and Department of Political Science, Columbia University, New York)
