Tuesday, February 7, 2023

[DMANET] [DEADLINE EXTENSION] ScaDL2023

Dear colleague,


We apologize for multiple postings.


ScaDL 2023: Scalable Deep Learning over Parallel And Distributed

Infrastructure - An IPDPS 2023 Workshop


https://2023.scadl.org/


Scope of the Workshop:

Recently, Deep Learning (DL) has received tremendous attention in the research

community because of the impressive results obtained for a large number of

machine learning problems. The success of state-of-the-art deep learning

systems relies on training deep neural networks over a massive amount of

training data, which typically requires a large-scale distributed computing

infrastructure to run. In order to run these jobs in a scalable and efficient

manner, on cloud infrastructure or dedicated HPC systems, several interesting

research topics have emerged which are specific to DL. The sheer size and

complexity of deep learning models when trained over a large amount of data

make them harder to converge in a reasonable amount of time. This calls for

advances along multiple research directions, such as model/data

parallelism, model/data compression, distributed optimization algorithms for

DL convergence, synchronization strategies, efficient communication and

specific hardware acceleration.


ScaDL seeks to advance the following research directions:

- Asynchronous and Communication-Efficient SGD: Stochastic gradient descent is

at the core of large-scale machine learning. Parallelizing SGD gradient

computation across multiple nodes increases the data processed per iteration,

but exposes the SGD to communication and synchronization delays and

unpredictable node failures in the system. Thus, there is a critical need to

design robust and scalable distributed SGD methods that achieve fast error

convergence despite such system variability.

- High-Performance Computing Aspects: Deep learning is highly compute intensive.

Algorithms for kernel computations on commonly used accelerators (e.g., GPUs) and

efficient techniques for communicating gradients and loading data from storage

are critical for training performance.


- Model and Gradient Compression Techniques: Techniques such as reducing the

number of weights and the size of weight tensors help reduce compute

complexity. Quantization to lower-bit representations and sparsification

allow for more efficient use of memory and communication bandwidth.


- Distributed Trustworthy AI: New techniques are needed to meet the goal of

global trustworthiness (e.g., fairness and adversarial robustness) efficiently

in a distributed DL setting.


- Emerging AI Hardware Accelerators: With the proliferation of new hardware

accelerators for AI, such as in-memory computing (analog AI) and neuromorphic

computing, novel methods and algorithms need to be introduced to adapt to the

underlying properties of the new hardware (e.g., the non-idealities of

phase-change memory (PCM) and cycle-to-cycle statistical variations).


- The intersection of Distributed DL and Neural Architecture Search (NAS): NAS

is increasingly being used to automate the synthesis of neural networks.

However, given the huge computational demands of NAS, distributed DL is

critical to make NAS computationally tractable (e.g., differentiable

distributed NAS).


This intersection of distributed/parallel computing and deep learning is

becoming critical and demands specific attention to address the above topics

which some of the broader forums may not be able to provide. The aim of this

workshop is to foster collaboration among researchers from distributed/

parallel computing and deep learning communities to share the relevant topics

as well as results of the current approaches lying at the intersection of

these areas.


Areas of Interest

In this workshop, we solicit research papers focused on distributed deep

learning aiming to achieve efficiency and scalability for deep learning jobs

over distributed and parallel systems. Papers focusing on both algorithms and

systems are welcome. We invite authors to submit papers on topics

including but not limited to:


- Deep learning on cloud platforms, HPC systems, and edge devices

- Model-parallel and data-parallel techniques

- Asynchronous SGD for Training DNNs

- Communication-Efficient Training of DNNs

- Scalable and distributed graph neural networks, sampling techniques for

graph neural networks

- Federated deep learning, both horizontal and vertical, and its challenges

- Model/data/gradient compression

- Learning in resource-constrained environments

- Coding Techniques for Straggler Mitigation

- Elasticity for deep learning jobs/spot market enablement

- Hyper-parameter tuning for deep learning jobs

- Hardware Acceleration for Deep Learning including digital and analog

accelerators

- Scalability of deep learning jobs on large clusters

- Deep learning on heterogeneous infrastructure

- Efficient and Scalable Inference

- Data storage/access in shared networks for deep learning

- Communication-efficient distributed fair and adversarially robust learning

- Distributed learning techniques applied to speed up neural architecture

search



Workshop Format:

Due to the continuing impact of COVID-19, ScaDL 2023 will also adopt relevant

IPDPS 2023 policies on virtual participation and presentation. Consequently,

the organizers are currently planning a hybrid (in-person and virtual) event.


Submission Link:

Submissions will be managed through Linklings. The submission link is available at:

https://2023.scadl.org/call-for-papers


Key Dates

Paper Submission: February 14th, 2023 (EXTENDED)

Acceptance Notification: February 26th, 2023

Camera-ready papers due: March 3rd, 2023

Workshop Date: May 19th, 2023


Author Instructions

ScaDL 2023 accepts submissions in two categories:

- Regular papers: 8-10 pages

- Short papers/Work in progress: 4 pages

The aforementioned lengths include all technical content, references and

appendices.

We encourage submissions that are original research work, work in progress,

case studies, vision papers, and industrial experience papers.

Papers should be formatted using IEEE conference style, including figures,

tables, and references. The IEEE conference style templates for MS Word and

LaTeX provided by IEEE eXpress Conference Publishing are available for

download. See the latest versions at

https://www.ieee.org/conferences/publishing/templates.html


General Chairs

Kaoutar El Maghraoui, IBM Research AI, USA

Daniele Lezzi, Barcelona Supercomputing Center, Spain


Program Committee Chairs

Misbah Mubarak, NVIDIA, USA

Alex Gittens, Rensselaer Polytechnic Institute (RPI), USA


Publicity Chairs

Federica Filippini, Politecnico di Milano, Italy

Hadjer Benmeziane, Université Polytechnique des Hauts-de-France, France


Web Chair

Praveen Venkateswaran, IBM Research AI, USA


Steering Committee

Parijat Dube, IBM Research AI, USA

Vinod Muthusamy, IBM Research AI, USA

Ashish Verma, IBM Research AI, USA

Jayaram K. R., IBM Research AI, USA

Yogish Sabharwal, IBM Research AI, India

Danilo Ardagna, Politecnico di Milano, Italy
