Friday, June 29, 2012

[DMANET] Post-doc / Engineering position in algorithmic optimization

When determining the structure of a macromolecular complex, one can
combine data from different sources to aid reconstruction [1]. Continuous
data, such as those collected in small-angle X-ray scattering or
electron microscopy experiments, are difficult to treat rigorously
because the observed values are correlated. Recent work has
shown that such data can be treated in a statistically rigorous way
using Gaussian process interpolation [2].

We have implemented such an interpolation procedure in the open-source
Integrative Modelling Platform. It is written in C++, uses the
high-level linear algebra library Eigen, and has a Python interface.
Benchmarks have shown that the current implementation suffers from
several algorithmic limitations that make it computationally expensive
for datasets larger than a thousand points.

The applicant must have excellent skills in object-oriented C++ programming
and design patterns. He or she must be aware of the common pitfalls of
floating-point matrix computations and of their numerical stability.
Basic knowledge of linear algebra is also recommended; familiarity with
linear and nonlinear regression is a bonus. Possible modifications of the
code include:

• Templating of all classes to avoid numerous virtual function calls

• Efficient use of sparse matrices

• Implementation of various approximation algorithms for large datasets,
such as Subset of Regressors, Subset of Datapoints or Projected Process [3]

• Parallelization of the code across multiple CPUs or GPUs

• Refactoring to make the code reusable for mathematically similar
projects in the lab, such as the Self-Organizing Map or its Bayesian
counterpart, the Gaussian Process Latent Variable Model

Requests for information and applications should be addressed to Yannick
Spill (yannick@pasteur.fr) and Michael Nilges (nilges@pasteur.fr). The
project will last at least one year and will be funded by an ERC grant.



[1] Wolfgang Rieping, Michael Habeck, and Michael Nilges. Inferential
structure determination. /Science/, 309:303–306, 2005.

[2] Yannick Spill, Seung Joong Kim, Dina Schneidman-Duhovny, Andrej
Sali, and Michael Nilges. Bayesian treatment of continuous data for
structure determination. In preparation, 2012.

[3] Carl Edward Rasmussen and Christopher K. I. Williams. /Gaussian
Processes for Machine Learning/. The MIT Press, 2006.

**********************************************************
*
* Contributions to be spread via DMANET are submitted to
*
* DMANET@zpr.uni-koeln.de
*
* Replies to a message carried on DMANET should NOT be
* addressed to DMANET but to the original sender. The
* original sender, however, is invited to prepare an
* update of the replies received and to communicate it
* via DMANET.
*
* DISCRETE MATHEMATICS AND ALGORITHMS NETWORK (DMANET)
* http://www.zaik.uni-koeln.de/AFS/publications/dmanet/
*
**********************************************************