AutoDL challenges

Following the success of AutoDL 2019-2020 (which was part of the competition selection of NeurIPS 2019, see our workshop page), we are continuing to organize a series of challenges.

Coming soon: KDD 2020 will be held in San Diego, CA, USA, from August 23 to 27, 2020. The Automatic Graph Representation Learning challenge (AutoGraph), the first-ever AutoML challenge applied to graph-structured data, is the AutoML track challenge of KDD Cup 2020, provided by 4Paradigm, ChaLearn, Stanford and Google.

You can use the solutions of the winners of past challenges by formatting your own data with our code.

Congratulations to the NeurIPS 2019 AutoDL winners:

Congratulations to the ACML 2019 AutoWSL winners:

Congratulations to the AutoSpeech winners:

Congratulations to the WAIC 2019 AutoNLP winners:

Congratulations to the ECML PKDD conf. AutoCV2 winners:

They will be invited to present at the discovery challenge workshops of ECML PKDD.

The IJCNN conf. AutoCV winners [slides] were:

Following the success of previous AutoML challenges, we designed a new challenge series: AutoCV (IJCNN 2019), AutoCV2 (ECML PKDD 2019), AutoNLP (WAIC 2019), AutoSpeech and AutoWeakly (ACML 2019), and AutoDL (NeurIPS 2019)! Read about the AutoCV challenge design. We target applications such as speech, image, video, and text, for which deep learning (DL) methods have had great success recently, to drive the community to work on automating the design of DL models. Raw data are provided, formatted uniformly as tensors, to encourage participants to submit generic algorithms. All problems are multi-label classification problems. We impose restrictions on training time and resources to push the state of the art further. We provide a large number of pre-formatted public datasets and offer the possibility of formatting your own datasets in the same way.

The paper on the design and analysis of the challenges can be found on HAL.

If you wish to use our challenges as a benchmark or to use the datasets, please cite our paper:


title = {Winning solutions and post-challenge analyses of the {ChaLearn} {AutoDL} challenge 2019},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
author = {Liu, Zhengying and Pavao, Adrien and Xu, Zhen and Escalera, Sergio and Ferreira, Fabio and Guyon, Isabelle and Hong, Sirui and Hutter, Frank and Ji, Rongrong and Nierhoff, Thomas and Niu, Kangning and Pan, Chunguang and Stoll, Danny and Treguer, Sebastien and Wang, Jin and Wang, Peng and Wu, Chenglin and Xiong, Youcheng},
year = {2020},
pages = {17},
annote = {Under review}



We are running a series of milestone challenges of increasing difficulty, culminating with the NeurIPS challenge.

  1. AutoCV: Image classification (computer vision, CV) -- ENDED -- EASY

  2. AutoCV2: Image and video classification -- CURRENTLY RUNNING -- HARDER (full blind testing in final phase)

  3. AutoNLP: Text classification -- EASY

  4. AutoSpeech: Speech time series classification -- HARDER

  5. AutoWeakly aka AutoWSL: Weakly supervised learning -- HARDER

  6. AutoDL: Everything! -- HARDEST (all types of data, full blind testing in final phase)

By everything, we mean image+video+text+speech+tabular data. Each task will still come from one single domain (e.g. image).

All challenges require code submission and offer prizes and opportunities to publish and/or present at conferences.

The first challenge AutoCV has already ended! Its results will be presented at the IJCNN 2019 conference. The second challenge AutoCV2 is associated with the ECML PKDD 2019 conference.

Entering is very easy. We provide a starting kit containing everything you need to create your own code submission (just by modifying the file) and to test it on your local computer, under conditions identical to those of the Codalab platform. This includes a Jupyter notebook, tutorial.ipynb, with step-by-step instructions. The interface is simple and generic: you must supply a Python class with:

  • a constructor

  • a train method

  • a test method
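The three-part interface above can be sketched as follows. This is only a minimal illustration, not the starting kit's reference code: the class name `Model`, the `metadata` dictionary, the `remaining_time_budget` argument, and the representation of a dataset as a list of (features, labels) pairs are all assumptions made for this example. The model is a trivial baseline that predicts the training frequency of each label.

```python
class Model:
    """Hypothetical baseline exposing a constructor, a train method, and a test method."""

    def __init__(self, metadata):
        # `metadata` is assumed to be a dict giving the number of labels.
        self.num_labels = metadata["num_labels"]
        self.label_frequencies = [0.0] * self.num_labels

    def train(self, dataset, remaining_time_budget=None):
        # `dataset` is assumed to be an iterable of (features, labels) pairs,
        # where `labels` is a list of 0/1 indicators, one per label.
        counts = [0] * self.num_labels
        num_examples = 0
        for _features, labels in dataset:
            num_examples += 1
            for k, y in enumerate(labels):
                counts[k] += y
        if num_examples:
            self.label_frequencies = [c / num_examples for c in counts]

    def test(self, dataset, remaining_time_budget=None):
        # Return one score per label for every test example.
        return [list(self.label_frequencies) for _ in dataset]


model = Model({"num_labels": 2})
model.train([([0.1], [1, 0]), ([0.2], [1, 1])])
predictions = model.test([[0.3], [0.4], [0.5]])
```

A real submission would replace the frequency baseline with an actual learning algorithm while keeping the same three entry points.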

To make a submission, zip your code, then use the "Upload a Submission" button. That's it!

The starting kit contains sample data, but you may want to develop your code with larger practice datasets, which we provide:

cd autodl_starting_kit_stable


Raw data are preserved, but formatted in a generic data format based on TFRecords, used by TensorFlow. However, this does not require participants to use deep learning algorithms, or even TensorFlow. For PyTorch users, we also provide a data converter in the starting kit (see above).

If you want to practice designing algorithms with your own datasets, follow these steps.

Competition protocol

In each challenge, your code will be tested directly on the platform on five datasets. The score you obtain (with your last submission) will be used for the final ranking in AutoCV and AutoNLP (the easiest challenges, with no other hidden leaderboard). In AutoCV2, AutoSpeech, AutoWSL, and AutoDL, we plan to have a final ranking on a fresh set of datasets with a single run of the last code you submitted (fully blind testing).

Submitted code is trained and tested automatically, without any human intervention. It is run on all five datasets in parallel on separate compute workers, each with its own time budget.

The identities of the datasets used for testing on the platform are concealed. The data are provided in a raw form (no feature extraction) to encourage researchers to use Deep Learning methods performing automatic feature learning, although this is NOT a requirement. All problems are multi-label classification problems. The tasks are constrained by a time budget.


The participants can train in batches of pre-defined duration to incrementally improve their performance, until the time limit is attained. In this way we can plot learning curves: "performance" as a function of time. Each time the "train" method terminates, the "test" method is called and the results are saved, so the scoring program can use them, together with their timestamp.
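The alternation of "train" and "test" calls described above can be sketched as a simple loop. This is an illustrative sketch and not the platform's actual ingestion program: the function `run_with_time_budget`, the toy `CountingModel`, and the `done_training` flag used to stop early are all hypothetical names introduced for this example.

```python
import time


class CountingModel:
    """Toy model: each call to train() stands for one batch of training."""

    def __init__(self, max_rounds=3):
        self.rounds = 0
        self.max_rounds = max_rounds
        self.done_training = False  # hypothetical flag to stop before the budget ends

    def train(self, train_data, remaining_time):
        self.rounds += 1
        if self.rounds >= self.max_rounds:
            self.done_training = True

    def test(self, test_data, remaining_time):
        # The "prediction" is just the number of training rounds so far.
        return [self.rounds for _ in test_data]


def run_with_time_budget(model, train_data, test_data, time_budget):
    """Alternate train and test, saving each prediction with its timestamp."""
    start = time.monotonic()
    saved = []
    while not model.done_training:
        remaining = time_budget - (time.monotonic() - start)
        if remaining <= 0:
            break
        model.train(train_data, remaining)
        remaining = time_budget - (time.monotonic() - start)
        predictions = model.test(test_data, remaining)
        saved.append((time.monotonic() - start, predictions))
    return saved


saved = run_with_time_budget(CountingModel(), [1, 2, 3], [4, 5], time_budget=5.0)
```

Each saved (timestamp, predictions) pair becomes one point of the learning curve once the predictions are scored.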

We treat both multi-class and multi-label problems alike. Each label/class is considered a separate binary classification problem, and we compute the normalized AUC (or Gini coefficient)

2 * AUC - 1

as the score for each prediction, where AUC is the usual area under the ROC curve (ROC AUC).
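As an illustration of this metric, the sketch below computes 2 * AUC - 1 for each label and averages the results over labels. Averaging over labels is an assumption of this example, as are the function names; the AUC is computed by direct pairwise comparison of positive and negative examples, which is simple but quadratic in the number of examples.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a positive example outscores a negative one."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count for half
    return wins / (len(positives) * len(negatives))


def normalized_auc(labels, scores):
    """Average of 2 * AUC - 1 over labels, each treated as a binary problem."""
    num_labels = len(labels[0])
    per_label = []
    for k in range(num_labels):
        y_true = [row[k] for row in labels]
        y_score = [row[k] for row in scores]
        per_label.append(2 * roc_auc(y_true, y_score) - 1)
    return sum(per_label) / num_labels


labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
perfect = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
constant = [[0.5, 0.5]] * 4
```

With this normalization, a perfect ranking scores 1 and a constant (uninformative) prediction scores 0.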

For each dataset, we compute the Area under Learning Curve (ALC). The learning curve is drawn as follows:

  • at each timestamp t, we compute s(t), the normalized AUC (see above) of the most recent prediction. In this way, s(t) is a step function with respect to time t;

  • in order to normalize time to the [0, 1] interval, we apply the time transformation

\tilde{t}(t) = \frac{\log(1 + t / t_0)}{\log(1 + T / t_0)}

where T is the time budget (default value 1200 seconds = 20 minutes) and t_0 is a reference time (default value 60 seconds);

  • we then compute the area under the learning curve using the formula

ALC = \int_0^1 s(t) \, d\tilde{t}(t) = \frac{1}{\log(1 + T / t_0)} \int_0^T \frac{s(t)}{t + t_0} \, dt

We see that s(t) is weighted by 1/(t + t_0), giving greater importance to predictions made at the beginning of the learning curve.
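The computation can be sketched in a few lines for a step-function learning curve. The sketch assumes the logarithmic time transformation log(1 + t / t0) / log(1 + T / t0), which maps [0, T] to [0, 1] with a derivative proportional to 1/(t + t0), and it assumes that s(t) = 0 before the first prediction. The function names are hypothetical.

```python
import math


def transform_time(t, time_budget=1200.0, t0=60.0):
    """Map t in [0, time_budget] to [0, 1] on a logarithmic scale (assumed form)."""
    return math.log(1 + t / t0) / math.log(1 + time_budget / t0)


def area_under_learning_curve(timestamps, scores, time_budget=1200.0, t0=60.0):
    """ALC of the step function that holds each score until the next prediction."""
    points = sorted(
        (t, s) for t, s in zip(timestamps, scores) if 0 <= t <= time_budget
    )
    alc = 0.0
    for i, (t, s) in enumerate(points):
        # The score s stays valid until the next prediction, or until the budget ends.
        t_next = points[i + 1][0] if i + 1 < len(points) else time_budget
        width = transform_time(t_next, time_budget, t0) - transform_time(t, time_budget, t0)
        alc += s * width
    return alc


# Time at which the transformed time equals 0.5 for the default T and t0.
t_half = 60.0 * (math.sqrt(1 + 1200.0 / 60.0) - 1)
alc_early = area_under_learning_curve([0.0], [1.0])
alc_late = area_under_learning_curve([t_half], [1.0])
```

With the default values, half of the transformed time axis is covered after roughly 215 seconds, so a perfect prediction delivered at that point earns only half the ALC of one delivered immediately.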

After we compute the ALC for all 5 datasets, the overall ranking is used as the final score for evaluation and shown on the leaderboard. It is computed by averaging the ranks (among all participants) of the ALC obtained on the 5 datasets.
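A minimal sketch of this rank averaging is given below. The participant names and ALC values are made up for illustration, and the tie-handling rule (tied participants share the best rank) is an assumption of this example.

```python
def average_ranks(alc_by_participant):
    """Average, over datasets, of each participant's rank (1 = highest ALC)."""
    participants = list(alc_by_participant)
    num_datasets = len(next(iter(alc_by_participant.values())))
    totals = {p: 0 for p in participants}
    for d in range(num_datasets):
        for p in participants:
            own = alc_by_participant[p][d]
            # Rank is 1 plus the number of participants with a strictly higher ALC.
            better = sum(1 for q in participants if alc_by_participant[q][d] > own)
            totals[p] += 1 + better
    return {p: totals[p] / num_datasets for p in participants}


# Illustrative ALC values on 5 datasets (not real results).
alc = {
    "team_a": [0.9, 0.8, 0.7, 0.6, 0.5],
    "team_b": [0.8, 0.9, 0.6, 0.5, 0.4],
    "team_c": [0.1, 0.1, 0.1, 0.1, 0.1],
}
ranks = average_ranks(alc)
```

A lower average rank is better: here team_a is first on four datasets and second on one, giving an average rank of 1.2.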

Examples of learning curves:


We anticipate the following timeline (subject to change):

About AutoDL

Machine learning, and in particular deep learning, has achieved considerable success in recent years, and an ever-growing number of disciplines rely on it. However, this success crucially relies on human intervention in many steps (data pre-processing, feature engineering, model selection, hyper-parameter optimization, etc.). As the complexity of these tasks is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf or reusable methods, which can be used easily and without expert knowledge. The objective of AutoML (Automated Machine Learning) challenges is to provide "universal learning machines" (deep learning or others), which can learn and make predictions without human intervention (blind testing).

The overall problem covers a wide range of difficulties, which cannot be addressed all at once in a single challenge. To name only a few: data "ingestion" and formatting, pre-processing and feature/representation learning, detection and handling of skewed/biased data, inhomogeneous, drifting, multimodal, or multi-view data (hinging on transfer learning), matching algorithms to problems (which may include supervised, unsupervised, or reinforcement learning, or other settings), acquisition of new data (active learning, query learning, reinforcement learning, causal experimentation), management of large volumes of data including the creation of appropriately sized and stratified training, validation, and test sets, selection of algorithms that satisfy arbitrary resource constraints at training and run time, the ability to generate and reuse workflows, and generating explicative reports. Therefore, scoping the focus of a particular challenge is of great importance to ensure that the field progresses swiftly and that intermediate milestones of immediate practical interest are reached.

To that end, we have been organizing a series of AutoML challenges and numerous satellite events since 2015. A detailed analysis of the first challenges will appear in a book on AutoML in the “Springer series on challenges in machine learning”. One of the most successful challenges in that series is the KDD Cup 2019 on temporal relational data.

The objective of the new ChaLearn AutoDL challenge series, organized with Google and 4Paradigm, is to address some of the limitations of the previous challenges and provide an ambitious benchmark of multi-class classification problems to be solved without any human intervention, in limited time, on any large-scale dataset composed of samples in tabular format, 2D matrices, time series, or spatio-temporal series. Data are formatted in a uniform way as 4D tensors (t, x, y, channel). This lends itself in particular to the use of convolutional neural networks (CNNs). Although the use of TensorFlow is facilitated by providing participants with a starting kit including sample code demonstrating how to solve the problems at hand with TensorFlow, the participants are free to provide solutions not using TensorFlow.
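To make the uniform 4D format concrete, the sketch below shows one way each kind of sample could be given a (t, x, y, channel) shape by setting unused axes to size 1. The exact axis assignment for each data type is an assumption of this example (in particular, which axis holds tabular features), and the function name and `kind` strings are hypothetical.

```python
def to_4d_shape(kind, shape):
    """Return an assumed (t, x, y, channel) shape for one sample of the given kind."""
    if kind == "tabular":
        (num_features,) = shape
        return (1, 1, num_features, 1)  # features laid out along one spatial axis
    if kind == "matrix":
        rows, cols = shape
        return (1, rows, cols, 1)  # a single 2D matrix, e.g. a grayscale image
    if kind == "time_series":
        (length,) = shape
        return (length, 1, 1, 1)  # one value per time step
    if kind == "spatio_temporal":
        frames, rows, cols, channels = shape
        return (frames, rows, cols, channels)  # e.g. a color video
    raise ValueError(f"unknown kind of sample: {kind}")


tabular_shape = to_4d_shape("tabular", (24,))
video_shape = to_4d_shape("spatio_temporal", (30, 64, 64, 3))
```

Because every sample has the same rank, a single generic model can inspect the four sizes to decide, for example, whether to convolve over time, over space, or both.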

For more information, we wrote a white paper about the AutoCV challenge design.


We are offering 28000 USD in prizes.

Each of the 5 competitions will have a 4000 USD prize pool.

1st place: 2000 USD

2nd place: 1500 USD

3rd place: 500 USD

We will also offer travel awards to the top-ranking participants of the final AutoDL challenge, according to merit and needs (up to a total of 6000 USD), and two best paper awards (1000 USD each).

About us

This challenge would not have been possible without the help of many people.

Main organizers:

  • Olivier Bousquet (Google, Switzerland)

  • André Elisseeff (Google, Switzerland)

  • Isabelle Guyon (U. Paris-Saclay; UPSud/INRIA, France and ChaLearn, USA)

  • Zhengying Liu (U. Paris-Saclay; UPSud, France)

  • Wei-Wei Tu (4paradigm, China)

Other contributors to the organization, starting kit, and datasets, include:

  • Stephane Ayache (AMU, France)

  • Hubert Jacob Banville (INRIA, France)

  • Mahsa Behzadi (Google, Switzerland)

  • Kristin Bennett (RPI, New York, USA)

  • Hugo Jair Escalante (INAOE, Mexico and ChaLearn, USA)

  • Sergio Escalera (U. Barcelona, Spain and ChaLearn, USA)

  • Gavin Cawley (U. East Anglia, UK)

  • Baiyu Chen (UC Berkeley, USA)

  • Albert Clapes i Sintes (U. Barcelona, Spain)

  • Bram van Ginneken (Radboud U. Nijmegen, The Netherlands)

  • Alexandre Gramfort (U. Paris-Saclay; INRIA, France)

  • Yi-Qi Hu (4paradigm, China)

  • Julio Jacques Jr. (U. Barcelona, Spain)

  • Meysam Madani (U. Barcelona, Spain)

  • Tatiana Merkulova (Google, Switzerland)

  • Adrien Pavao (U. Paris-Saclay; INRIA, France and ChaLearn, USA)

  • Shangeth Rajaa (BITS Pilani, India)

  • Herilalaina Rakotoarison (U. Paris-Saclay, INRIA, France)

  • Mehreen Saeed (FAST Nat. U. Lahore, Pakistan)

  • Marc Schoenauer (U. Paris-Saclay, INRIA, France)

  • Michele Sebag (U. Paris-Saclay; CNRS, France)

  • Danny Silver (Acadia University, Canada)

  • Lisheng Sun (U. Paris-Saclay; UPSud, France)

  • Sebastien Treguer (La Paillasse, France)

  • Fengfu Li (4paradigm, China)

  • Lichuan Xiang (4paradigm, China)

  • Jun Wan (Chinese Academy of Sciences, China)

  • Mengshuo Wang (4paradigm, China)

  • Jingsong Wang (4paradigm, China)

  • Ju Xu (4paradigm, China)

  • Zhen Xu (Ecole Polytechnique and U. Paris-Saclay; INRIA, France)

The challenge is running on the Codalab platform, administered by Université Paris-Saclay and maintained by CKCollab LLC, with primary developers:

  • Eric Carmichael (CKCollab, USA)

  • Tyler Thomas (CKCollab, USA)

ChaLearn is the challenge organization coordinator. Google is the primary sponsor of the challenge and helped define the tasks, protocol, and data formats. 4Paradigm donated prizes and datasets, and contributed to the protocol, baseline methods, and beta-testing. Other institutions of the co-organizers provided in-kind contributions, including datasets, data formatting, baseline methods, and beta-testing.

Contact the organizers.