NeurIPS AutoDL challenges
Following the success of previous AutoML challenges, we designed a new challenge: AutoDL, accepted to NeurIPS 2019. We target applications such as speech, image, video, and text, for which deep learning (DL) methods have had great success recently, to drive the community to work on automating the design of DL models. Raw data are provided, formatted in a uniform tensor manner, to encourage participants to submit generic algorithms. All problems are multi-label classification problems. We impose restrictions on training time and resources to push the state-of-the-art further. We provide a large number of pre-formatted public datasets and offer the possibility of formatting your own datasets in the same way.
To ramp up difficulty gradually, we are running a series of milestone challenges, culminating with the NeurIPS challenge.
- AutoCV: Image Classification computer vision (CV) - CURRENTLY RUNNING [ENTER NOW] -- EASY
- AutoCV2: Image and video Classification -- HARDER (full blind testing in final phase)
- AutoNLP: Text classification -- EASY
- AutoSeries: Text and time series classification -- HARDER (full blind testing in final phase)
- AutoDL: Everything! -- HARDEST (all types of data, full blind testing in final phase)
By everything, we mean image+video+text+time series+tabular data.
All challenges are with code submission and have prizes and opportunities to publish and/or present at conferences.
Entering is very easy. We provide a starting kit containing everything you need to create your own code submission (just by modifying the file model.py) and to test it on your local computer, in conditions identical to those of the Codalab platform. This includes a Jupyter notebook tutorial.ipynb with step-by-step instructions. The interface is simple and generic: you must supply a Python file model.py defining a class with:
- a constructor
- a train method
- a test method
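As a rough illustration of the three required pieces, here is a minimal sketch of what a model.py could look like. The class name Model, the metadata and remaining_time_budget arguments, and the assumption that a dataset is an iterable of (example, labels) pairs are all hypothetical; the starting kit defines the actual signatures and data types.

```python
import numpy as np


class Model:
    """Hypothetical baseline: predicts the label frequencies seen in training."""

    def __init__(self, metadata=None):
        # `metadata` is an assumed argument (e.g. tensor shape, number of classes).
        self.metadata = metadata
        self.label_mean = None

    def train(self, dataset, remaining_time_budget=None):
        # `dataset` is assumed to be a list of (example, labels) pairs,
        # where `labels` is a 0/1 vector with one entry per class.
        labels = np.array([y for _, y in dataset], dtype=float)
        self.label_mean = labels.mean(axis=0)

    def test(self, dataset, remaining_time_budget=None):
        # Return one score per (example, class); higher means more likely.
        n_examples = sum(1 for _ in dataset)
        return np.tile(self.label_mean, (n_examples, 1))
```

This baseline ignores the inputs entirely, so it only shows the shape of the interface, not a competitive method.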
To make submissions, zip model.py, then use the "Upload a Submission" button. That's it!
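The zipping step can also be scripted. The sketch below uses only the Python standard library; the archive name submission.zip and the choice to place model.py at the top level of the archive are assumptions, not requirements stated here.

```python
import zipfile
from pathlib import Path


def make_submission(model_path="model.py", zip_path="submission.zip"):
    """Zip model.py at the archive root and return the archive path."""
    model_path = Path(model_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops parent directories so model.py sits at the top level.
        archive.write(model_path, arcname=model_path.name)
    return Path(zip_path)
```

Calling make_submission() from the directory that contains model.py produces an archive ready for the "Upload a Submission" button.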
The starting kit contains sample data, but you may want to develop your code with larger practice datasets, which we provide:
Raw data are preserved, but formatted in a generic data format based on TFRecords, used by TensorFlow. However, this does not require participants to use deep learning algorithms, or even TensorFlow. If you want to practice designing algorithms with your own datasets, follow these steps.
In each challenge your code will be tested directly on the platform on five datasets. The score you obtain (with your last submission) will be used for the final ranking in AutoCV and AutoNLP (easiest challenges, no other hidden leaderboard). In the phases AutoCV2, AutoSeries, and AutoDL, we plan to have a final ranking on a fresh set of datasets with a single run of the last code you submitted (fully blind testing).
Code submitted is trained and tested automatically, without any human intervention. Code submitted is run on all five datasets in parallel on separate compute workers, each one with its own time budget.
The identities of the datasets used for testing on the platform are concealed. The data are provided in a raw form (no feature extraction) to encourage researchers to use Deep Learning methods performing automatic feature learning, although this is NOT a requirement. All problems are multi-label classification problems. The tasks are constrained by a time budget.
The participants can train in batches of pre-defined duration to incrementally improve their performance, until the time limit is attained. In this way we can plot learning curves: "performance" as a function of time. Each time the "train" method terminates, the "test" method is called and the results are saved, so the scoring program can use them, together with their timestamp.
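The alternation between "train" and "test" described above can be pictured with the following sketch. It is not the platform's actual ingestion program: the function name run_with_budget, the max_rounds safeguard, and the remaining_time_budget keyword are hypothetical, and the model is assumed to expose train and test methods.

```python
import time


def run_with_budget(model, train_data, test_data, time_budget=1200.0, max_rounds=100):
    """Alternate train/test calls and record (timestamp, predictions) pairs."""
    start = time.monotonic()
    results = []
    for _ in range(max_rounds):
        remaining = time_budget - (time.monotonic() - start)
        if remaining <= 0:
            break
        model.train(train_data, remaining_time_budget=remaining)
        remaining = time_budget - (time.monotonic() - start)
        if remaining <= 0:
            break  # budget spent during training: no prediction for this round
        predictions = model.test(test_data, remaining_time_budget=remaining)
        elapsed = time.monotonic() - start
        if elapsed > time_budget:
            break  # prediction arrived too late to count
        # The timestamp is what lets a scorer place this prediction on the curve.
        results.append((elapsed, predictions))
    return results
```

Each recorded pair corresponds to one point of the learning curve: a model that returns quickly from "train" gets an early first point, and later rounds can only add further points.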
We treat both multi-class and multi-label problems alike. Each label/class is considered a separate binary classification problem, and we compute the normalized AUC (or Gini coefficient)
2 * AUC - 1
as the score for each prediction, where AUC is the usual area under the ROC curve (ROC AUC).
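A minimal sketch of this score, using NumPy only, is given below. The ROC AUC of each class is computed with the rank-sum (Mann-Whitney) identity. Averaging the per-class AUCs before applying 2 * AUC - 1, and skipping classes that lack positive or negative examples, are assumptions of this sketch rather than details stated above.

```python
import numpy as np


def _average_ranks(scores):
    """1-based ranks of `scores`, with tied values sharing their mean rank."""
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def roc_auc(y_true, y_score):
    """ROC AUC for one binary problem via the rank-sum identity."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined without both classes
    ranks = _average_ranks(y_score)
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def normalized_auc(y_true, y_score):
    """Mean over classes of 2 * AUC - 1 (classes with undefined AUC are skipped)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    aucs = [roc_auc(y_true[:, k], y_score[:, k]) for k in range(y_true.shape[1])]
    return float(2.0 * np.nanmean(aucs) - 1.0)
```

With this normalization, random scores give a value near 0 and a perfect ranking gives 1.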
For each dataset, we compute the Area under Learning Curve (ALC). The learning curve is drawn as follows:
- at each timestamp t, we compute s(t), the normalized AUC (see above) of the most recent prediction. In this way, s(t) is a step function w.r.t time t;
- in order to normalize time to the [0, 1] interval, we perform a time transformation by
$$\tilde{t}(t) = \frac{\log(1 + t/t_0)}{\log(1 + T/t_0)}$$
where T is the time budget (default value 1200 seconds = 20 minutes) and t0 is a reference time amount (default value 60 seconds).
- then compute the area under the learning curve using the formula
$$ALC = \int_0^1 s(t)\, d\tilde{t}(t) = \int_0^T s(t)\, \tilde{t}'(t)\, dt = \frac{1}{\log(1 + T/t_0)} \int_0^T \frac{s(t)}{t + t_0}\, dt$$
We see that s(t) is weighted by 1/(t + t0), giving more importance to predictions made at the beginning of the learning curve.
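The ALC computation can be sketched as follows. The sketch assumes the logarithmic time transformation log(1 + t/t0) / log(1 + T/t0), which is consistent with the 1/(t + t0) weighting mentioned above, and it assumes s(t) = 0 before the first prediction. The function names are hypothetical.

```python
import math


def transform_time(t, time_budget=1200.0, t0=60.0):
    """Map t in [0, T] to [0, 1] on a logarithmic scale."""
    return math.log(1.0 + t / t0) / math.log(1.0 + time_budget / t0)


def area_under_learning_curve(timestamps, scores, time_budget=1200.0, t0=60.0):
    """Integrate the step function s(t) over transformed time.

    s(t) is taken to be 0 before the first prediction (an assumption), and
    each score holds until the next prediction or the end of the budget.
    """
    points = sorted(
        (t, s) for t, s in zip(timestamps, scores) if 0 <= t <= time_budget
    )
    alc = 0.0
    for i, (t, s) in enumerate(points):
        t_next = points[i + 1][0] if i + 1 < len(points) else time_budget
        width = transform_time(t_next, time_budget, t0) - transform_time(t, time_budget, t0)
        alc += s * width
    return alc
```

Under these assumptions, a single perfect prediction (s = 1) delivered after 60 seconds yields an ALC of about 0.77, while the same prediction delivered after 600 seconds yields a noticeably lower value, which illustrates the premium placed on early predictions.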
After we compute the ALC for all 5 datasets, the overall ranking is used as the final score for evaluation and will be used in the leaderboard. It is computed by averaging the ranks (among all participants) of the ALC obtained on the 5 datasets.
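The rank averaging can be illustrated with a small pure-Python sketch. The function name and the input layout (a dictionary mapping each participant to a list of ALC values, one per dataset) are hypothetical, and giving tied participants the mean of the ranks they occupy is an assumption.

```python
def average_ranks(alc_by_participant):
    """Return each participant's mean rank over datasets (rank 1 = best ALC)."""
    names = list(alc_by_participant)
    n_datasets = len(next(iter(alc_by_participant.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        for name in names:
            mine = alc_by_participant[name][d]
            better = sum(alc_by_participant[o][d] > mine for o in names)
            tied = sum(alc_by_participant[o][d] == mine for o in names)
            # Tied participants share the mean of the ranks they occupy.
            totals[name] += better + (tied + 1) / 2.0
    return {name: totals[name] / n_datasets for name in names}
```

A lower average rank is better. Because only ranks are averaged, a very large ALC margin on one dataset counts no more than a narrow win.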
Examples of learning curves:
We anticipate the following timeline (subject to change):
- Top-ranking participants will be invited to submit papers to a special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) on Automated Machine Learning and will be entered in a contest for the best paper. Deadline: November 30, 2019. There will be 2 best paper awards ("best paper" and "best student paper").
Machine learning, and in particular deep learning, has achieved considerable success in recent years, and an ever-growing number of disciplines rely on it. However, this success crucially relies on human intervention in many steps (data pre-processing, feature engineering, model selection, hyper-parameter optimization, etc.). As the complexity of these tasks is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf or reusable methods, which can be used easily and without expert knowledge. The objective of AutoML (Automated Machine Learning) challenges is to provide "universal learning machines" (deep learning or others), which can learn and make predictions without human intervention (blind testing).
The overall problem covers a wide range of difficulties, which cannot be addressed all at once in a single challenge. To name only a few: data "ingestion" and formatting, pre-processing and feature/representation learning, detection and handling of skewed/biased data, inhomogeneous, drifting, multimodal, or multi-view data (hinging on transfer learning), matching algorithms to problems (which may include supervised, unsupervised, or reinforcement learning, or other settings), acquisition of new data (active learning, query learning, reinforcement learning, causal experimentation), management of large volumes of data including the creation of appropriately sized and stratified training, validation, and test sets, selection of algorithms that satisfy arbitrary resource constraints at training and run time, the ability to generate and reuse workflows, and generating explicative reports. Therefore, scoping the focus of a particular challenge is of great importance to ensure that the field progresses swiftly and that intermediate milestones of immediate practical interest are reached.
To that end, we have been organizing a series of AutoML challenges and numerous satellite events since 2015. A detailed analysis of the first challenges will appear in a book on AutoML to be published in the “Springer series on challenges in machine learning”. One of the most successful challenges in that series is the KDD Cup 2019 on temporal relational data.
The objective of the new ChaLearn AutoDL challenge series, organized with Google and 4Paradigm, is to address some of the limitations of the previous challenges and provide an ambitious benchmark of multi-class classification problems to be solved without any human intervention, in limited time, on any large-scale dataset composed of samples either in tabular format, 2D matrices, time series, or spatio-temporal series. Data are formatted in a uniform way as 4D tensors (t, x, y, channel). This lends itself in particular to the use of convolutional neural networks (CNNs). Although the use of TensorFlow is facilitated by providing participants with a starting kit including sample code demonstrating how to solve the problems at hand with TensorFlow, the participants are free to provide solutions not using TensorFlow.
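To make the uniform (t, x, y, channel) layout concrete, the sketch below reshapes one sample of each kind into a 4D NumPy array. The function name, the kind labels, and in particular the choice of which axis carries tabular features are assumptions made for illustration; the challenge's data format documentation is the reference for the actual convention.

```python
import numpy as np


def to_4d(sample, kind):
    """Reshape one sample to (t, x, y, channel); `kind` names are hypothetical."""
    sample = np.asarray(sample)
    if kind == "tabular":      # (features,) -> one time step, features along x
        return sample.reshape(1, -1, 1, 1)
    if kind == "time_series":  # (time, channels) -> no spatial extent
        return sample.reshape(sample.shape[0], 1, 1, sample.shape[1])
    if kind == "image":        # (x, y, channels) -> single frame
        return sample[np.newaxis, ...]
    if kind == "video":        # (time, x, y, channels) already matches
        return sample
    raise ValueError(f"unknown kind: {kind}")
```

For instance, a 32 x 32 RGB image becomes an array of shape (1, 32, 32, 3), and a vector of 10 tabular features becomes (1, 10, 1, 1), so a single model can accept every modality through the same input rank.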
For more information, we wrote a white paper about the AutoCV challenge design.
We are offering 28000 USD in prizes.
Each of the 5 competitions will have a 4000 USD prize pool.
1st place: 2000 USD
2nd place: 1500 USD
3rd place: 500 USD
We will also offer the top-ranking participants of the final AutoDL challenge travel awards according to merit and needs (up to a total of 6000 USD) and two best paper awards (1000 USD each).
This challenge would not have been possible without the help of many people.
- Olivier Bousquet (Google, Switzerland)
- André Elisseeff (Google, Switzerland)
- Isabelle Guyon (U. Paris-Saclay; UPSud/INRIA, France and ChaLearn, USA)
- Zhengying Liu (U. Paris-Saclay; UPSud, France)
Other contributors to the organization, starting kit, and datasets, include:
- Stephane Ayache (AMU, France)
- Hubert Jacob Banville (INRIA, France)
- Mahsa Behzadi (Google, Switzerland)
- Kristin Bennett (RPI, New York, USA)
- Hugo Jair Escalante (INAOE, Mexico and ChaLearn, USA)
- Sergio Escalera (U. Barcelona, Spain and ChaLearn, USA)
- Gavin Cawley (U. East Anglia, UK)
- Baiyu Chen (UC Berkeley, USA)
- Albert Clapes i Sintes (U. Barcelona, Spain)
- Alexandre Gramfort (U. Paris-Saclay; INRIA, France)
- Yi-Qi Hu (4paradigm, China)
- Julio Jacques Jr. (U. Barcelona, Spain)
- Meysam Madani (U. Barcelona, Spain)
- Tatiana Merkulova (Google, Switzerland)
- Adrien Pavao (U. Paris-Saclay; INRIA, France and ChaLearn, USA)
- Shangeth Rajaa (BITS Pilani, India)
- Herilalaina Rakotoarison (U. Paris-Saclay, INRIA, France)
- Mehreen Saeed (FAST Nat. U. Lahore, Pakistan)
- Marc Schoenauer (U. Paris-Saclay, INRIA, France)
- Michele Sebag (U. Paris-Saclay; CNRS, France)
- Danny Silver (Acadia University, Canada)
- Lisheng Sun (U. Paris-Saclay; UPSud, France)
- Sebastien Treger (La Pallaisse, France)
- Wei-Wei Tu (4paradigm, China)
- Fengfu Li (4paradigm, China)
- Lichuan Xiang (4paradigm, China)
- Jun Wan (Chinese Academy of Sciences, China)
- Mengshuo Wang (4paradigm, China)
- Jingsong Wang (4paradigm, China)
- Ju Xu (4paradigm, China)
- Zhen Xu (Ecole Polytechnique and U. Paris-Saclay; INRIA, France)
- Eric Carmichael (CKCollab, USA)
- Tyler Thomas (CKCollab, USA)
ChaLearn is the challenge organization coordinator. Google is the primary sponsor of the challenge and helped define the tasks, protocol, and data formats. 4Paradigm donated prizes and datasets, and contributed to the protocol, baseline methods, and beta-testing. Other institutions of the co-organizers provided in-kind contributions, including datasets, data formatting, baseline methods, and beta-testing.