Monday, May 3, 2021

[DMANET] IGAFIT Algorithmic Colloquium - 2pm CEST, May 6: When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? by Adam Smith

Dear all,

we would like to invite you to the IGAFIT Algorithmic Colloquium this
Thursday:

May 6, 2021, 14:00 CEST
Adam Smith, Boston University
Title: When is Memorization of Irrelevant Training Data Necessary for
High-Accuracy Learning?
Abstract: Modern machine learning models are complex, and frequently
encode surprising amounts of information about individual inputs. In
extreme cases, complex models appear to memorize entire input examples,
including seemingly irrelevant information (social security numbers from
text, for example). In this paper, we aim to understand whether this sort
of memorization is necessary for accurate learning.
We describe natural prediction problems in which every sufficiently
accurate training algorithm must encode, in the prediction model,
essentially all the information about a large subset of its training
examples. This remains true even when the examples are high-dimensional and
have entropy much higher than the sample size, and even when most of that
information is ultimately irrelevant to the task at hand. Further, our
results do not depend on the training algorithm or the class of models used
for learning.

Our problems are simple and fairly natural variants of the next-symbol
prediction and the cluster labeling tasks. These tasks can be seen as
abstractions of image- and text-related prediction problems. To establish
our results, we reduce from a family of one-way communication problems for
which we prove new information complexity lower bounds.

Time permitting, I'll lay out some open problems and directions for future
work.

Based on joint work with Mark Bun, Gavin Brown, Vitaly Feldman, and Kunal
Talwar, to appear at STOC 2021 ( https://arxiv.org/abs/2012.06421 ).



We would like to invite you to take part in the event including the
networking session. This new event aims to integrate the European
algorithmic community and keep it connected during the times of the
pandemic. The meeting will be held using the Airmeet platform and to attend
please go to https://www.airmeet.com/e/3ec6ef20-a8af-11eb-bd30-339c4f25cb79.
You can find more information about the talk on IGAFIT web page:
http://igafit.mimuw.edu.pl/?page_id=483786.

In order to access the Airmeet platform, you will need to register, so that
others know who you are. Then before and after the talk, you will be
located inside a Social Lounge, where discussion tables are visible. Most
of the tables can have up to 8 chairs, and you can join a table by clicking
the "Take a Seat" button on a table. This way you will be able to talk to
people seated currently at the table. When the session starts you will be
automatically switched to see the stage. You can find more instructions on
how to use Airmeet here:
https://www.airmeet.com/hub/step-by-step-guide-use-airmeet-for-attendees/.

We invite you to take part in the event. Please advertise broadly and bring
your students/postdocs as well.

Organization Committee:
Nikhil Bansal
Artur Czumaj
Andreas Feldmann
Danupon Nanongkai
Adi Rosen
Eva Rotenberg
Piotr Sankowski
Christian Sohler

**********************************************************
*
* Contributions to be spread via DMANET are submitted to
*
* DMANET@zpr.uni-koeln.de
*
* Replies to a message carried on DMANET should NOT be
* addressed to DMANET but to the original sender. The
* original sender, however, is invited to prepare an
* update of the replies received and to communicate it
* via DMANET.
*
* DISCRETE MATHEMATICS AND ALGORITHMS NETWORK (DMANET)
* http://www.zaik.uni-koeln.de/AFS/publications/dmanet/
*
**********************************************************