Brought to you by ChaLearn, Google and 4Paradigm

The AutoDL challenges are a series of machine learning competitions focusing on Automated Machine Learning (AutoML) applied to a wide range of modalities/domains (e.g. image, video, text, speech, tabular) in which Deep Learning has had great success.

From 2019 to 2020, we organized several AutoDL challenges, each with a different focus:

All challenges feature code submission and opportunities to publish and/or present at conferences. Each challenge awards total prize money of $4,000 (USD).

Competition Protocol

In each challenge, your code is tested directly on the platform on five datasets. Submitted code is trained and tested automatically, without any human intervention, and runs on all five datasets in parallel on separate compute workers.

We provide a starting kit containing everything you need to create your own code submission (just by modifying one file) and to test it on your local computer, in conditions identical to those of the CodaLab platform. This includes a Jupyter notebook (tutorial.ipynb) with step-by-step instructions. The interface is simple and generic: you must supply a Python class with:

  • a constructor

  • a train method

  • a test method

To make a submission, zip your code, then use the "Upload a Submission" button. That's it!
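As a concrete illustration, here is a minimal sketch of such a class. The method names follow the starting kit's conventions, but the exact signatures and the metadata format shown here are assumptions; the starting kit defines the authoritative interface.

```python
import numpy as np

class Model:
    """Toy constant-prediction model illustrating the submission interface.

    The metadata dict and method signatures are illustrative assumptions;
    check the starting kit for the exact interface.
    """

    def __init__(self, metadata):
        # metadata describes the task, e.g. the number of classes/labels
        self.num_classes = metadata["output_dim"]
        self.prior = np.full(self.num_classes, 1.0 / self.num_classes)

    def train(self, dataset, remaining_time_budget=None):
        # dataset: iterable of (example, label) pairs;
        # here we simply estimate the label frequencies
        labels = np.array([y for _, y in dataset])
        if len(labels) > 0:
            self.prior = labels.mean(axis=0)

    def test(self, dataset, remaining_time_budget=None):
        # return one probability vector per test example
        n = sum(1 for _ in dataset)
        return np.tile(self.prior, (n, 1))
```

A real submission would of course train an actual model inside `train` and may be called repeatedly as the time budget is consumed (see the any-time learning metric below).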

The identities of the datasets used for testing on the platform are concealed. The data are provided in a raw form (no feature extraction) to encourage researchers to use Deep Learning methods performing automatic feature learning, although this is NOT a requirement. All problems are multi-label classification problems. The tasks are constrained by a time budget (e.g. 20 min).

Any-time Learning Metric

All AutoDL challenges (except AutoWSL) evaluate submissions using an any-time learning metric. The participants are required to make multiple predictions within the time budget. In this way we can plot learning curves: "performance" as a function of time. Each time the "train" method terminates, the "test" method is called and the predictions are saved, so the scoring program can use them, together with their timestamp.

We treat both multi-class and multi-label problems alike. Each label/class is considered a separate binary classification problem, and we compute the normalized AUC (NAUC or Gini coefficient)

NAUC = 2 * AUC - 1

as the score for each prediction, where AUC is the usual area under the ROC curve (ROC AUC).
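As a sketch, NAUC for one binary label can be computed directly from the rank definition of ROC AUC (for multi-label problems, the per-label NAUCs are then averaged):

```python
import numpy as np

def nauc(y_true, y_score):
    """Normalized AUC (2 * AUC - 1) for one binary label.

    Uses the rank definition: AUC is the probability that a random
    positive example is scored above a random negative one
    (ties count for half).
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))
    return 2 * auc - 1
```

With this normalization, a perfect ranking scores 1, random guessing scores 0, and a fully inverted ranking scores -1.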

For each dataset, we compute the Area under Learning Curve (ALC). The learning curve is drawn as follows:

  • at each timestamp t, we compute s(t), the normalized AUC (see above) of the most recent prediction. In this way, s(t) is a step function w.r.t time t;

  • in order to normalize time to the [0, 1] interval, we perform the time transformation

        t'(t) = log(1 + t / t0) / log(1 + T / t0),

    where T is the time budget (of default value 1200 seconds = 20 minutes) and t0 is a reference time amount (of default value 60 seconds);

  • then compute the area under the learning curve using the formula

        ALC = ∫_0^1 s(t) dt'(t) = ∫_0^T s(t) / ((t + t0) log(1 + T / t0)) dt;

    we see that s(t) is weighted by 1/(t + t0), giving a stronger importance to predictions made at the beginning of the learning curve.

After we compute the ALC for all 5 datasets, the average rank is used as the final score for evaluation and shown in the leaderboard. It is computed by averaging the ranks (among all participants) of the ALC obtained on the 5 datasets.
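The per-dataset scoring above can be sketched in a few lines (a simplified reimplementation, not the official scoring program; timestamps are measured from the start of the run):

```python
import numpy as np

def alc(timestamps, scores, T=1200.0, t0=60.0):
    """Area under the learning curve for one dataset.

    timestamps: times (in seconds) at which predictions were saved
    scores: NAUC of each prediction
    s(t) is a step function: each score holds from its own (transformed)
    timestamp until the next one; before the first prediction s = 0.
    """
    # transformed time t' = log(1 + t/t0) / log(1 + T/t0) of each
    # prediction, plus the end of the budget (t' = 1)
    tt = np.log1p(np.asarray(timestamps, dtype=float) / t0) / np.log1p(T / t0)
    tt = np.append(tt, 1.0)
    # width of each step in transformed time
    widths = np.diff(tt)
    return float(np.sum(np.asarray(scores, dtype=float) * widths))
```

A perfect prediction made immediately yields ALC = 1, while the same prediction made only at the end of the budget yields ALC = 0, which is exactly the incentive for any-time learning.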

Examples of learning curves:

Performances of Winning Approaches

DeepWisdom ranked 1st in the final AutoDL challenge. In addition, their solution succeeded (trained and predicted) on 65 out of 66 AutoDL datasets in our post-challenge analysis, which indicates its robustness and is a promising sign towards fully automated AutoML solutions. Details on the performance of DeepWisdom on these 66 datasets can be found on the Benchmark page.

In general, top participants made significant improvements compared to the strongest baseline, baseline3, which is a combination of winning solutions from previous challenges (AutoCV, AutoSpeech, AutoNLP).

Winner Code

According to our challenge rules, the top-ranking participants are required to publicly release their code to be eligible for prizes. We congratulate again the team DeepWisdom for winning the final AutoDL challenge at NeurIPS 2019. Their code is open-sourced on their GitHub repo.


All the winners and their code from past AutoDL challenges are given below:

AutoDL (NeurIPS 2019) winners:

AutoWSL (ACML 2019) winners:

AutoSpeech (ACML 2019) winners:

AutoNLP (WAIC 2019) winners:

AutoCV2 (ECML PKDD 2019) winners:

AutoCV (IJCNN 2019) winners [slides]:


Providing a long-lasting AutoDL benchmark is one of our important objectives. We release a growing repository of datasets, both for use in the AutoDL challenges and to enable research on meta-learning.

In the following, we provide a list of public datasets used in the final AutoDL challenge. You will have access to the data (training set and test set) AND the true labels for these datasets. Notice that the video datasets do not include a sound track.

These data were re-formatted from the original public datasets. If you use them, please make sure to acknowledge the original data donors (see "Source" in the data table) and check the terms of use.

Note that you can use the AutoDL starting kit to download all public datasets at once:


Format Your Own Datasets

We provide a toolkit to format your own datasets to the same format of AutoDL challenges. A tutorial is set up to easily generate an AutoDL dataset with your own data. Then you can easily apply the winning solutions (or any approach you wish to deploy) on the newly generated dataset and solve your own industrial or research problems.

AutoDL Self-service

We provide an "AutoDL Self-service" that allows you to run the winning solution (DeepWisdom) on your own data by simply clicking on the "Submit" button and uploading your dataset.

Note that the uploaded dataset must be in AutoDL format; you can use this toolkit to easily convert your own data.

Cite Us

The paper on the design and analysis of the challenges can be found on HAL.

If you wish to use our challenges as a benchmark or to use the datasets, please cite our paper:


@article{liu2020winning,
  title = {Winning solutions and post-challenge analyses of the {ChaLearn} {AutoDL} challenge 2019},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  author = {Liu, Zhengying and Pavao, Adrien and Xu, Zhen and Escalera, Sergio and Ferreira, Fabio and Guyon, Isabelle and Hong, Sirui and Hutter, Frank and Ji, Rongrong and Nierhoff, Thomas and Niu, Kangning and Pan, Chunguang and Stoll, Danny and Treguer, Sebastien and Wang, Jin and Wang, Peng and Wu, Chenglin and Xiong, Youcheng},
  year = {2020},
  pages = {17},
  annote = {Under review}
}


Sign Up

We are constantly organizing new challenges and events in AutoML, AutoDL, meta-learning, neural architecture search, and more! If you wish to stay tuned, sign up using the following form! We send one message every one or two months.

About AutoDL

Machine Learning and in particular Deep Learning has achieved considerable successes in recent years and an ever-growing number of disciplines rely on it. However, this success crucially relies on human intervention in many steps (data pre-processing, feature engineering, model selection, hyper-parameter optimization, etc.). As the complexity of these tasks is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf or reusable methods, which can be used easily and without expert knowledge. The objective of AutoML (Automated Machine Learning) challenges is to provide "universal learning machines" (deep learning or others), which can learn and make predictions without human intervention (blind testing).

The overall problem covers a wide range of difficulties, which cannot all be addressed at once in a single challenge. To name only a few: data "ingestion" and formatting, pre-processing and feature/representation learning, detection and handling of skewed/biased data, inhomogeneous, drifting, multimodal, or multi-view data (hinging on transfer learning), matching algorithms to problems (which may include supervised, unsupervised, or reinforcement learning, or other settings), acquisition of new data (active learning, query learning, reinforcement learning, causal experimentation), management of large volumes of data including the creation of appropriately sized and stratified training, validation, and test sets, selection of algorithms that satisfy arbitrary resource constraints at training and run time, the ability to generate and reuse workflows, and generating explicative reports. Therefore, scoping the focus of a particular challenge is of great importance to ensure that the field progresses swiftly and intermediate milestones of immediate practical interest are reached.

To that end, we have been organizing a series of AutoML challenges since 2015, along with numerous satellite events. A detailed analysis of the first challenges appears in a book on AutoML published in the "Springer Series on Challenges in Machine Learning". One of the most successful challenges in that series is the KDD Cup 2019 on temporal relational data.

The objective of the new ChaLearn AutoDL challenge series, organized with Google and 4Paradigm, is to address some of the limitations of the previous challenges and provide an ambitious benchmark: multi-class classification problems solved without any human intervention, in limited time, on any large-scale dataset composed of samples in tabular format, 2d matrices, time series, or spatio-temporal series. Data are formatted in a uniform way as 4d tensors (t, x, y, channel), which lends itself in particular to the use of convolutional neural networks (CNNs). Although the use of TensorFlow is facilitated by providing participants with a starting kit including sample code demonstrating how to solve the problems at hand with TensorFlow, participants are free to provide solutions that do not use TensorFlow.
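To make the uniform format concrete, here is how examples from different modalities might look as 4d tensors. The shapes below are illustrative assumptions, not taken from actual challenge datasets:

```python
import numpy as np

# Every example is a 4d tensor (time, row, col, channel);
# axes that a modality does not use have size 1.
image   = np.zeros((1, 224, 224, 3))    # one RGB frame: no time axis
video   = np.zeros((50, 128, 128, 3))   # 50 RGB frames along the time axis
speech  = np.zeros((16000, 1, 1, 1))    # raw waveform samples along time
text    = np.zeros((300, 1, 1, 1))      # token indices along time
tabular = np.zeros((1, 1, 1, 24))       # 24 features, no temporal/spatial axes

for x in (image, video, speech, text, tabular):
    assert x.ndim == 4  # same rank for every modality
```

A single model interface can thus consume any modality, and a solution may still inspect the shape to dispatch to modality-specific sub-models.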

About us

This challenge would not have been possible without the help of many people.

Main organizers:

  • Olivier Bousquet (Google, Switzerland)

  • André Elisseeff (Google, Switzerland)

  • Isabelle Guyon (U. Paris-Saclay; UPSud/INRIA, France and ChaLearn, USA)

  • Zhengying Liu (U. Paris-Saclay; UPSud, France)

  • Wei-Wei Tu (4paradigm, China)

Other contributors to the organization, starting kit, and datasets, include:

  • Stephane Ayache (AMU, France)

  • Hubert Jacob Banville (INRIA, France)

  • Mahsa Behzadi (Google, Switzerland)

  • Kristin Bennett (RPI, New York, USA)

  • Hugo Jair Escalante (INAOE, Mexico and ChaLearn, USA)

  • Sergio Escalera (U. Barcelona, Spain and ChaLearn, USA)

  • Gavin Cawley (U. East Anglia, UK)

  • Baiyu Chen (UC Berkeley, USA)

  • Albert Clapes i Sintes (U. Barcelona, Spain)

  • Bram van Ginneken (Radboud U. Nijmegen, The Netherlands)

  • Alexandre Gramfort (U. Paris-Saclay; INRIA, France)

  • Yi-Qi Hu (4paradigm, China)

  • Julio Jacques Jr. (U. Barcelona, Spain)

  • Meysam Madani (U. Barcelona, Spain)

  • Tatiana Merkulova (Google, Switzerland)

  • Adrien Pavao (U. Paris-Saclay; INRIA, France and ChaLearn, USA)

  • Shangeth Rajaa (BITS Pilani, India)

  • Herilalaina Rakotoarison (U. Paris-Saclay, INRIA, France)

  • Mehreen Saeed (FAST Nat. U. Lahore, Pakistan)

  • Marc Schoenauer (U. Paris-Saclay, INRIA, France)

  • Michele Sebag (U. Paris-Saclay; CNRS, France)

  • Danny Silver (Acadia University, Canada)

  • Lisheng Sun (U. Paris-Saclay; UPSud, France)

  • Sebastien Treger (La Paillasse, France)

  • Fengfu Li (4paradigm, China)

  • Lichuan Xiang (4paradigm, China)

  • Jun Wan (Chinese Academy of Sciences, China)

  • Mengshuo Wang (4paradigm, China)

  • Jingsong Wang (4paradigm, China)

  • Ju Xu (4paradigm, China)

  • Zhen Xu (Ecole Polytechnique and U. Paris-Saclay; INRIA, France)

The challenge is running on the Codalab platform, administered by Université Paris-Saclay and maintained by CKCollab LLC, with primary developers:

  • Eric Carmichael (CKCollab, USA)

  • Tyler Thomas (CKCollab, USA)

ChaLearn is the challenge organization coordinator. Google is the primary sponsor of the challenge and helped define the tasks, protocol, and data formats. 4Paradigm donated prizes and datasets, and contributed to the protocol, baseline methods, and beta-testing. Other institutions of the co-organizers provided in-kind contributions, including datasets, data formatting, baseline methods, and beta-testing.

Contact the organizers.