Monday, April 1, 2024

[DMANET] Call for Participation SISAP Indexing Challenge 2024

SISAP Indexing Challenge 2024
=============================
https://sisap-challenges.github.io/
https://www.sisap.org/2024/
=============================

The SISAP Indexing Challenge 2024 invites researchers and practitioners
to participate in three challenging tasks that advance the state of the
art in similarity search and indexing. The challenge provides a platform
to showcase innovative solutions and to push the boundaries of
efficiency and effectiveness in large-scale similarity search indexes.


=== Task 1: Unrestricted Indexing ===
In this task, solutions will have access to all resources of our testing
computer to build their indexes. The goal is to achieve the highest
search performance within the given constraints.

* Wall clock time for index construction: 12 hours.

* Ranking: Highest throughput/fastest search time with an average recall
of at least 0.8.

* Search performance will be evaluated using a single built index (one
configuration) queried with up to 30 different search hyperparameter
settings.

* Saving the index is not required for running the search.
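
Recall here is the fraction of the true k nearest neighbors that a
solution returns, averaged over all queries. A minimal sketch of this
metric (illustrative only; names and shapes are assumptions, not the
official evaluation code):

```python
import numpy as np

def average_recall(retrieved, gold, k=30):
    """Mean fraction of the true k nearest neighbors found per query.

    retrieved, gold: arrays of shape (num_queries, k) holding neighbor ids;
    gold is the exact k-nearest-neighbor gold standard.
    """
    recalls = [
        len(set(r[:k]) & set(g[:k])) / k
        for r, g in zip(retrieved, gold)
    ]
    return float(np.mean(recalls))

# Toy example: two queries, k=3.
gold = np.array([[1, 2, 3], [4, 5, 6]])
retrieved = np.array([[1, 2, 9], [4, 5, 6]])
print(average_recall(retrieved, gold, k=3))  # (2/3 + 3/3) / 2 = 0.8333...
```

A submission is ranked only if this average meets the task's recall
threshold (0.8 for Tasks 1 and 2, 0.4 for Task 3).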


=== Task 2: Memory-Constrained Indexing with Reranking ===
This task challenges participants to develop memory-efficient indexing
solutions with reranking capabilities. Each solution will be run in a
Linux container with limited memory and storage resources.

* Container specifications: 8 virtual CPUs, RAM = 32 GB, the dataset
will be mounted read-only into the container.

* Wall clock time for index construction: 24 hours.

* Minimum recall to be considered in the final ranking: 0.8.

* Search performance will be evaluated using 30 different search
hyperparameters.


=== Task 3: Memory-Constrained Indexing without Reranking ===
In this task, participants are asked to develop memory-efficient
indexing solutions that will be used without reranking the search
results. The container provided will have higher memory capacity
compared to Task 2. Participants have to build an index in the first
phase. In the search phase, the original vectors cannot be used.

* Container specifications: RAM = 64 GB.

* Wall clock time for index construction: 12 hours.

* Minimum recall to be considered in the final ranking: 0.4.

* Search performance will be evaluated using 30 different search
hyperparameters.


== Test Data and Queries ==
* Approximately 100 million CLIP descriptors extracted from the LAION
database.

* Similarity between two objects is measured by their dot product.

* The goal is to retrieve the 30 nearest neighbors for a large set of
query objects, as follows:

* Public queries for the development stage: The private queries from the
previous year's challenge will be used as the public query set (10k
queries).

* Private queries for the final evaluation stage: A set of new queries
will be used for the final evaluation.
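

The task above (dot-product similarity, k=30) can be sketched as a
brute-force baseline over a toy sample (dimensions and sizes here are
illustrative assumptions, not the real 100M-vector dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 16)).astype(np.float32)       # toy "dataset"
queries = rng.standard_normal((5, 16)).astype(np.float32)     # toy "queries"

k = 30  # the challenge asks for the 30 nearest neighbors per query
scores = queries @ db.T                     # dot-product similarity matrix
# argpartition finds the top-k per row; then sort those k by descending score
top = np.argpartition(-scores, k, axis=1)[:, :k]
order = np.take_along_axis(-scores, top, axis=1).argsort(axis=1)
knn = np.take_along_axis(top, order, axis=1)
print(knn.shape)  # (5, 30): 30 neighbor ids per query, best first
```

An actual submission replaces this exhaustive scan with an index that
reaches the required recall much faster.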


The evaluation will be carried out on a machine with the following
specifications:

* 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 28 cores in total

* 512 GB RAM (DDR4, 2400 MT/s)

* 1 TB SSD

== Registration and Participation ==

1. Register for the challenge by opening a "Pre-registration request"
issue in the GitHub repository
https://github.com/sisap-challenges/challenge2024/. Fill in the
required information; it will be used to stay in contact with you while
the challenge remains open.


2. During the development phase, participants will have access to 300K,
10M, and 100M datasets, the public query set, and their public gold
standards.


3. Teams are required to provide public GitHub repositories with working
GitHub Actions and clear instructions on how to run their solutions with
the correct hyperparameters for each task. Submissions must run in
Docker containers. Please visit the challenge site for examples.


4. The full 100M dataset and a private query set will be used for the
testing phase. Participants' repositories will be cloned and tested on
our best available machine at the time of the challenge. Results will be
shared with the authors for verification and potential fixes before the
final rankings are published.

== Paper Submissions ==
All participants are invited to submit a paper. We aim to accommodate
all accepted papers within the conference program. Papers should be
short and focus on the submitted solution; accepted papers will be
presented together with a poster.

We look forward to your participation and innovative solutions in the
SISAP Indexing Challenge 2024! Let's push the frontiers of similarity
search and indexing together.

== Final comments ==
Any transformation of the dataset to load, index, and solve k=30 nearest
neighbor queries is allowed. Transformations include, but are not
limited to, packing into different data types, dimensionality reduction,
locality-sensitive hashing, product quantization, and transformation
into binary sketches.
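
As one illustration of such a transformation (a sketch under assumed
parameters, not a prescribed method): random-hyperplane binary sketches
compared by Hamming distance, with the closest sketches kept as a
candidate shortlist:

```python
import numpy as np

rng = np.random.default_rng(42)
dim, bits = 16, 64  # toy dimensionality and sketch width (assumptions)
planes = rng.standard_normal((dim, bits)).astype(np.float32)

def sketch(vectors):
    """Sign-of-projection binary sketches, packed 8 bits per byte."""
    return np.packbits(vectors @ planes > 0, axis=1)

db = rng.standard_normal((1000, dim)).astype(np.float32)
q = rng.standard_normal((1, dim)).astype(np.float32)

db_sk, q_sk = sketch(db), sketch(q)
# Hamming distance = popcount of XOR; small distance ~ similar direction
hamming = np.unpackbits(db_sk ^ q_sk, axis=1).sum(axis=1)
candidates = hamming.argsort()[:100]  # shortlist for (optional) reranking
print(candidates.shape)  # (100,)
```

Note that in Task 3 such a shortlist is the final answer, since the
original vectors cannot be used for reranking in the search phase.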

Reproducibility and open science are primary goals of the challenge, so
we accept only public GitHub repositories with working GitHub Actions as
submissions. Indexing algorithms may be previously published work or
original contributions.

You can find more detailed information, data access, and registration at
the SISAP Indexing Challenge website
https://sisap-challenges.github.io/2024/


== Important Dates ==

* Mar. 18th, 2024: Call for participation published, expression of
interest opened.

* Apr. 20th, 2024: Expression of interest closes.

* Aug. 2nd, 2024: Submission of proposed implementations deadline (AoE).

* Aug. 9th, 2024: Short paper deadline (AoE).

* Aug. 16th, 2024: Paper notification.

* Aug. 28th, 2024: Publication of final rankings.

* Sept. 1st, 2024: Camera-ready.

* Nov. 6-8, 2024: SISAP 2024, Providence (Rhode Island), USA, with a
special session for the challenge.

== SISAP Indexing Challenge Chairs ==

- Edgar L. Chavez, CICESE, México

- Eric S. Téllez, INFOTEC-CONAHCyT, México

- Martin Aumüller, ITU Copenhagen, Denmark

- Vladimír Míč, Aarhus University, Denmark

**********************************************************
*
* Contributions to be spread via DMANET are submitted to
*
* DMANET@zpr.uni-koeln.de
*
* Replies to a message carried on DMANET should NOT be
* addressed to DMANET but to the original sender. The
* original sender, however, is invited to prepare an
* update of the replies received and to communicate it
* via DMANET.
*
* DISCRETE MATHEMATICS AND ALGORITHMS NETWORK (DMANET)
* http://www.zaik.uni-koeln.de/AFS/publications/dmanet/
*
**********************************************************