AutoDL 2019


Following the success of previous AutoML challenges, we designed a new challenge: AutoDL, accepted to NeurIPS 2019. We target applications such as speech, image, video, and text, for which deep learning (DL) methods have had great success recently, to drive the community to work on automating the design of DL models. Raw data are provided, formatted in a uniform tensor manner, to encourage participants to submit generic algorithms. All problems are multi-label classification problems. We impose restrictions on training time and resources to push the state-of-the-art further. We provide a large number of pre-formatted public datasets and offer the possibility of formatting your own datasets in the same way.


We are running a series of milestone challenges of increasing difficulty, culminating in the AutoDL challenge at NeurIPS.

  1. AutoCV: Image classification (computer vision, CV) - CURRENTLY RUNNING [ENTER NOW]
  2. AutoCV2: Image and video classification
  3. AutoNLP: Text classification
  4. AutoSeries: Text and time series classification
  5. AutoDL: Everything!

By everything, we mean image+video+text+time series+tabular data.

All challenges require code submission and offer prizes as well as opportunities to publish and/or present at conferences.

The first challenge, AutoCV, has already started on Codalab! Its results will be presented at the IJCNN conference.

Entering is very easy. We provide a starting kit containing everything you need to create your own code submission (just by modifying a sample file) and to test it on your local computer, under conditions identical to those of the Codalab platform. The kit includes a Jupyter notebook, tutorial.ipynb, with step-by-step instructions. The interface is simple and generic: you must supply a Python class with:

  • a constructor
  • a train method
  • a test method

To make a submission, zip your code, then use the "Upload a Submission" button. That's it!
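Concretely, such a class might look like the following sketch. The method names follow the description above, but the exact signatures, the contents of `metadata`, and the `done_training` attribute are illustrative assumptions, not the official starting-kit API:

```python
class Model:
    """Minimal sketch of a submission class (signatures are illustrative)."""

    def __init__(self, metadata):
        # metadata: hypothetical dataset description (number of labels, etc.)
        self.metadata = metadata
        self.done_training = False  # flag used here to stop the train/test loop

    def train(self, dataset, remaining_time_budget=None):
        # Train incrementally and return before the remaining budget runs out;
        # this method is called repeatedly until time is up.
        self.done_training = True  # placeholder: no actual training here

    def test(self, dataset, remaining_time_budget=None):
        # Return one prediction (e.g. one score per label) per test sample.
        return [[0.0] * self.metadata["num_labels"] for _ in dataset]
```

The platform instantiates the class once per dataset, then alternates `train` and `test` calls, so state you want to keep between training rounds belongs on the instance.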

The starting kit contains sample data, but you may want to develop your code with larger practice datasets, which we provide:

cd autodl_starting_kit_stable

Raw data are preserved, but formatted in a generic data format based on TFRecords, the format used by TensorFlow. However, this does not require participants to use deep learning algorithms, nor even TensorFlow. If you want to practice designing algorithms with your own datasets, follow these steps.

Competition protocol

In each challenge, your code will be blind-tested on five datasets. The score obtained with your last submission will be used for the final ranking. There is no separate hidden leaderboard.

Code submitted is trained and tested automatically, without any human intervention. Code submitted is run on all five datasets in parallel on separate compute workers, each one with its own time budget.

The identities of the datasets used for blind testing on the platform are concealed. The data are provided in a raw form (no feature extraction) to encourage researchers to use Deep Learning methods performing automatic feature learning, although this is NOT a requirement. All problems are multi-label classification problems. The tasks are constrained by a time budget.


Participants can train in batches of pre-defined duration to incrementally improve their performance, until the time limit is reached. In this way, we can plot learning curves: performance as a function of time. Each time the "train" method returns, the "test" method is called and the results are saved with their time stamp, so that the scoring program can use them.
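The alternation between training and testing can be sketched as a simple loop. This is a simplification we wrote for illustration; the real ingestion program in the starting kit manages separate processes, logging, and error handling:

```python
import time

def run_task(model, train_set, test_set, time_budget=1200):
    """Alternate train/test calls until the budget is spent (illustrative)."""
    results = []  # (timestamp, predictions) pairs for the scoring program
    start = time.time()
    while time.time() - start < time_budget:
        remaining = time_budget - (time.time() - start)
        model.train(train_set, remaining_time_budget=remaining)
        predictions = model.test(test_set)
        # Keep predictions together with their time stamp: the scoring
        # program needs both to build the learning curve.
        results.append((time.time() - start, predictions))
        if getattr(model, "done_training", False):
            break
    return results
```

Returning early from "train" therefore trades raw accuracy for earlier points on the learning curve, which is exactly the trade-off the metric below rewards.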

We treat both multi-class and multi-label problems alike. Each label/class is considered a separate binary classification problem. We measure performance with the average over all labels of

balanced_accuracy = (1/2) (TPR + TNR).

For each dataset, we compute the area under the learning curve (using the trapezoidal rule), i.e., the area under the curve of mean balanced accuracy as a function of log(1+time), where "time" is the cumulative time in seconds spent on training and testing. The overall ranking is obtained by averaging the ranks obtained on the 5 datasets.
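Under these definitions, the scoring can be reproduced in a few lines of NumPy. The function names are ours, and the official scoring program may differ in details such as normalization of the time axis:

```python
import numpy as np

def mean_balanced_accuracy(y_true, y_pred):
    """Average over labels of (TPR + TNR) / 2 for binary label matrices."""
    y_true = np.asarray(y_true, dtype=bool)  # shape (n_samples, n_labels)
    y_pred = np.asarray(y_pred, dtype=bool)
    tpr = (y_true & y_pred).sum(axis=0) / y_true.sum(axis=0)
    tnr = (~y_true & ~y_pred).sum(axis=0) / (~y_true).sum(axis=0)
    return float(np.mean((tpr + tnr) / 2))

def area_under_learning_curve(timestamps, scores):
    """Trapezoidal area of score vs. log(1 + time), as described above."""
    x = np.log1p(np.asarray(timestamps, dtype=float))
    s = np.asarray(scores, dtype=float)
    return float(np.sum(np.diff(x) * (s[1:] + s[:-1]) / 2))
```

Note that the log(1+time) axis rewards models that reach good performance early: a point gained in the first minutes contributes more area than the same point gained near the end of the budget.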

[Figure: examples of learning curves]


We anticipate the following timeline (subject to change):

Top-ranking participants will be invited to submit papers to a special issue on Automated Machine Learning of the IEEE journal Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and will be entered in a contest for the best paper (deadline: November 30, 2019). There will be two best-paper awards ("best paper" and "best student paper").

About AutoDL

Machine learning, and in particular deep learning, has achieved considerable success in recent years, and an ever-growing number of disciplines rely on it. However, this success crucially depends on human intervention at many steps (data pre-processing, feature engineering, model selection, hyper-parameter optimization, etc.). As the complexity of these tasks is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf or reusable methods that can be used easily and without expert knowledge. The objective of AutoML (Automated Machine Learning) challenges is to provide "universal learning machines" (deep learning or other), which can learn and make predictions without human intervention (blind testing).

The overall problem covers a wide range of difficulties, which cannot all be addressed at once in a single challenge. To name only a few: data "ingestion" and formatting; pre-processing and feature/representation learning; detection and handling of skewed/biased data; inhomogeneous, drifting, multimodal, or multi-view data (hinging on transfer learning); matching algorithms to problems (which may include supervised, unsupervised, or reinforcement learning, or other settings); acquisition of new data (active learning, query learning, reinforcement learning, causal experimentation); management of large volumes of data, including the creation of appropriately sized and stratified training, validation, and test sets; selection of algorithms that satisfy arbitrary resource constraints at training and run time; the ability to generate and reuse workflows; and the generation of explanatory reports. Therefore, scoping the focus of a particular challenge is of great importance to ensure that the field progresses swiftly and intermediate milestones of immediate practical interest are reached.

To that end, we have been organizing a series of AutoML challenges since 2015, along with numerous satellite events. A detailed analysis of the first challenges will appear in a book on AutoML in the "Springer Series on Challenges in Machine Learning". One of the most successful challenges in that series is the KDD Cup 2019 on temporal relational data.

The objective of the new ChaLearn AutoDL challenge series, organized with Google and 4Paradigm, is to address some of the limitations of the previous challenges and to provide an ambitious benchmark: solving multi-label classification problems without any human intervention, in limited time, on any large-scale dataset composed of samples in tabular format, 2-d matrices, time series, or spatio-temporal series. Data are formatted in a uniform way as 4-d tensors (t, x, y, channel). This lends itself in particular to the use of convolutional neural networks (CNNs). Although the use of TensorFlow is facilitated by providing participants with a starting kit, including sample code demonstrating how to solve the problems at hand with TensorFlow, participants are free to provide solutions that do not use TensorFlow.
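As an illustration, here is how a few modalities could be laid out in the (t, x, y, channel) convention. The shapes below are made up for the example, and the exact layouts for text and tabular data are our assumptions, not the official format specification:

```python
import numpy as np

# Each sample is a 4-d tensor (t, x, y, channel); unused axes degenerate to 1.
image   = np.zeros((1, 32, 32, 3))   # one 32x32 RGB frame: t = 1
video   = np.zeros((10, 32, 32, 3))  # 10 frames of 32x32 RGB
text    = np.zeros((50, 1, 1, 1))    # 50 token indices along the time axis
tabular = np.zeros((1, 1, 20, 1))    # 20 features in a single "row"

# A generic model can branch on the shape: degenerate axes (size 1) reveal
# the modality, while CNNs apply directly to the image and video tensors.
```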

For more information, we wrote a white paper about the AutoCV challenge design.


We are offering 28000 USD in prizes.

Each of the 5 competitions will have a 4000 USD prize pool.

1st place: 2000 USD

2nd place: 1500 USD

3rd place: 500 USD

We will also offer travel awards to the top-ranking participants of the final AutoDL challenge, according to merit and need (up to a total of 6000 USD), as well as two best-paper awards (1000 USD each).

About us

This challenge would not have been possible without the help of many people.

Main organizers:

  • Olivier Bousquet (Google, Switzerland)
  • André Elisseeff (Google, Switzerland)
  • Isabelle Guyon (U. Paris-Saclay; UPSud/INRIA, France and ChaLearn, USA)
  • Zhengying Liu (U. Paris-Saclay; UPSud, France)

Other contributors to the organization, starting kit, and datasets, include:

  • Stephane Ayache (AMU, France)
  • Hubert Jacob Banville (INRIA, France)
  • Mahsa Behzadi (Google, Switzerland)
  • Kristin Bennett (RPI, New York, USA)
  • Hugo Jair Escalante (INAOE, Mexico and ChaLearn, USA)
  • Sergio Escalera (U. Barcelona, Spain and ChaLearn, USA)
  • Gavin Cawley (U. East Anglia, UK)
  • Baiyu Chen (UC Berkeley, USA)
  • Albert Clapes i Sintes (U. Barcelona, Spain)
  • Alexandre Gramfort (U. Paris-Saclay; INRIA, France)
  • Yi-Qi Hu (4paradigm, China)
  • Julio Jacques Jr. (U. Barcelona, Spain)
  • Meysam Madani (U. Barcelona, Spain)
  • Tatiana Merkulova (Google, Switzerland)
  • Adrien Pavao (U. Paris-Saclay; INRIA, France and ChaLearn, USA)
  • Shangeth Rajaa (BITS Pilani, India)
  • Herilalaina Rakotoarison (U. Paris-Saclay, INRIA, France)
  • Mehreen Saeed (FAST Nat. U. Lahore, Pakistan)
  • Marc Schoenauer (U. Paris-Saclay, INRIA, France)
  • Michele Sebag (U. Paris-Saclay; CNRS, France)
  • Danny Silver (Acadia University, Canada)
  • Lisheng Sun (U. Paris-Saclay; UPSud, France)
  • Sebastien Treger (La Paillasse, France)
  • Wei-Wei Tu (4paradigm, China)
  • Fengfu Li (4paradigm, China)
  • Lichuan Xiang (4paradigm, China)
  • Jun Wan (Chinese Academy of Sciences, China)
  • Mengshuo Wang (4paradigm, China)
  • Jingsong Wang (4paradigm, China)
  • Ju Xu (4paradigm, China)
  • Zhen Xu (Ecole Polytechnique and U. Paris-Saclay; INRIA, France)

The challenge is running on the Codalab platform, administered by Université Paris-Saclay and maintained by CKCollab LLC, with primary developers:

  • Eric Carmichael (CKCollab, USA)
  • Tyler Thomas (CKCollab, USA)

ChaLearn is the challenge organization coordinator. Google is the primary sponsor of the challenge and helped define the tasks, protocol, and data formats. 4Paradigm donated prizes and datasets, and contributed to the protocol, baseline methods, and beta-testing. The co-organizers' other institutions provided in-kind contributions, including datasets, data formatting, baseline methods, and beta-testing.

Contact the organizers.