Task
The goal for ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories:
- Melanoma
- Melanocytic nevus
- Basal cell carcinoma
- Actinic keratosis
- Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
- Dermatofibroma
- Vascular lesion
- Squamous cell carcinoma
- None of the others
25,331 images are available for training across 8 different categories. Additionally, the test dataset (planned release August 2nd) will contain an additional outlier class not represented in the training data, which developed systems must be able to identify.
Two tasks will be available for participation: 1) classify dermoscopic images without meta-data, and 2) classify images with additional available meta-data. Task 1's deadline will be August 16th; Task 2's deadline will be August 23rd, after the release of test meta-data on August 16th. Participants in Task 2 must also submit to Task 1, though participants may submit to Task 1 alone.
In addition to submitting predictions, each competitor is required to submit a link to a manuscript describing the methods used to generate predictions.
Submission
Submissions are made to the ISIC Challenge submission system, which provides automated format validation, pre-scoring, and metadata editing capabilities.
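As an illustration of the kind of format checks the submission system performs, the sketch below runs a few local sanity checks on a prediction CSV before upload. It assumes one row per image, an image column, and one confidence value in [0, 1] per diagnostic category; the abbreviated column names are an assumption here, so defer to the official submission instructions.

```python
import csv
import sys

# Assumed layout: one row per image, an "image" column, and one confidence
# value in [0, 1] per diagnostic category. The abbreviated column names are
# an assumption; defer to the official submission instructions.
EXPECTED_COLUMNS = ["image", "MEL", "NV", "BCC", "AK", "BKL", "DF", "VASC", "SCC", "UNK"]

def check_submission(path):
    """Collect obvious format problems in a prediction CSV before uploading."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected header: {reader.fieldnames}"]
        seen = set()
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            image = row["image"]
            if image in seen:
                problems.append(f"line {line_no}: duplicate image id {image!r}")
            seen.add(image)
            for column in EXPECTED_COLUMNS[1:]:
                try:
                    value = float(row[column])
                except (TypeError, ValueError):
                    problems.append(f"line {line_no}: non-numeric value in {column}")
                    continue
                if not 0.0 <= value <= 1.0:
                    problems.append(f"line {line_no}: {column}={value} outside [0, 1]")
    return problems

if __name__ == "__main__":
    for problem in check_submission(sys.argv[1]):
        print(problem)
```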
Evaluation
Goal Metric
Submissions are scored using a normalized multi-class accuracy metric (balanced across categories). Tied positions will be broken using the area under the receiver operating characteristic curve (AUC) metric.
Definition
Normalized (or balanced) multi-class accuracy is defined as the average of the per-category accuracies, with each category contributing equally regardless of its prevalence. Specifically, it is the arithmetic mean of (<category>_true_positives / <category>_positives) across all of the diagnostic categories. This metric is semantically equivalent to the average (macro-averaged) recall score.
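For illustration, here is a minimal sketch of the metric as defined above, assuming integer-encoded ground-truth and predicted labels per image; the official scores are computed by the isic-challenge-scoring package noted under the Transparency Statement below.

```python
import numpy as np

def balanced_multiclass_accuracy(y_true, y_pred, categories):
    """Arithmetic mean of per-category recall (true positives / positives)."""
    recalls = []
    for category in categories:
        positives = y_true == category
        if positives.sum() == 0:
            continue  # category absent from this ground-truth subset
        true_positives = np.sum((y_pred == category) & positives)
        recalls.append(true_positives / positives.sum())
    return float(np.mean(recalls))

# Toy example with three categories; the result matches
# sklearn.metrics.balanced_accuracy_score(y_true, y_pred).
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 0])
print(balanced_multiclass_accuracy(y_true, y_pred, categories=[0, 1, 2]))  # ~0.556
```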
Rationale
Clinical application of skin lesion classification ultimately has two goals: giving specific information and treatment options for a lesion, and detecting skin cancer with reasonable sensitivity and specificity. The first goal requires a correct specific diagnosis out of multiple classes, whereas the second demands a binary decision of "biopsy" versus "don't biopsy". Previous ISIC Challenges focused on the second goal; this year we therefore rank by the more ambitious metric of normalized multi-class accuracy, as it is also closer to the real-world evaluation of a dermatologist. This is also important for the accompanying reader study, in which the winning algorithm(s) will be compared to physicians' performance in the classification of digital images.
Other Metrics
Participants will be ranked and awards granted based only on the balanced multi-class accuracy metric. However, for scientific completeness, predicted responses will also have the following metrics computed (comparing predictions vs. ground truth):
Category Metrics
- sensitivity
- specificity
- accuracy
- area under the receiver operating characteristic curve (AUC)
- mean average precision
- F1 score
- AUC integrated between 80% and 100% sensitivity (AUC80)
- positive predictive value (PPV)
- negative predictive value (NPV)
Aggregate Metrics
- average AUC across all diagnoses
- malignant vs. benign diagnoses category AUC
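The following sketch shows how a few of the per-category and aggregate metrics above could be reproduced locally with scikit-learn, assuming one-hot ground truth and per-image confidence scores; the 0.5 decision threshold and the choice of which categories count as malignant are assumptions here, and the official values come from the isic-challenge-scoring package.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_category_metrics(y_true, y_score, categories, threshold=0.5):
    """Per-category sensitivity, specificity, and AUC.

    y_true: one-hot ground truth, shape (n_images, n_categories).
    y_score: predicted confidences in [0, 1], same shape.
    """
    y_pred = y_score >= threshold  # assumed 0.5 decision threshold
    report = {}
    for j, name in enumerate(categories):
        truth = y_true[:, j].astype(bool)
        pred = y_pred[:, j]
        tp = np.sum(truth & pred)
        tn = np.sum(~truth & ~pred)
        report[name] = {
            "sensitivity": tp / truth.sum(),
            "specificity": tn / (~truth).sum(),
            "auc": roc_auc_score(truth, y_score[:, j]),
        }
    return report

def aggregate_metrics(y_true, y_score, categories, malignant):
    """Average AUC over all diagnoses plus a malignant-vs-benign AUC.

    `malignant` lists the category names pooled into the malignant group;
    which categories belong there is an assumption, not challenge policy.
    """
    aucs = [roc_auc_score(y_true[:, j], y_score[:, j])
            for j in range(len(categories))]
    idx = [categories.index(name) for name in malignant]
    malignant_truth = y_true[:, idx].sum(axis=1) > 0
    malignant_score = y_score[:, idx].sum(axis=1)
    return {
        "average_auc": float(np.mean(aucs)),
        "malignant_vs_benign_auc": roc_auc_score(malignant_truth, malignant_score),
    }
```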
Validation Scoring
All submissions to the ISIC Challenge are immediately issued a validation score. This validation score is not intended to be used for algorithm ranking or evaluation, but is provided for a sanity check of submission data (e.g. to guard against instances where prediction labels are mismatched).
The validation score is computed with the goal metric (balanced multi-class accuracy), taken against a small (~100), non-representative, pre-determined subset of images.
For reference, a random submission generates a validation score of about 0.3.
Final Score Release
Final scores and a public leaderboard are released shortly after the conclusion of the ISIC Challenge submission period.
Transparency Statement
The code of the isic-challenge-scoring package is used for actual score computation. This code is open source, permissively licensed, and published to facilitate external auditing.
Awards
Cash prizes of $4,000, $2,000, and $1,000 (USD) will be awarded to the first-, second-, and third-place participants of each of the image-only and meta-data tasks. The monetary prizes for the winners of the challenge will be awarded after the MICCAI Workshop in Shenzhen, China, in October 2019. The prizes are being provided by Canfield Scientific, Inc., a US company, and are subject to any restrictions incumbent on the sponsor. Winners will be asked to identify a recipient individual or entity who will be required to provide tax documentation (U.S. citizens: IRS Form W-9; non-U.S. citizens: Form W-8 BEN).
Sponsors
- Canfield Scientific
- IBM
- MetaOptima
Clinical Chairs
- Josep Malvehy, M.D.; University Hospital Clinic of Barcelona, Barcelona, Spain
- Allan Halpern, M.D.; Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Computer Vision Chairs
- Noel C. F. Codella, Ph.D.; IBM Research, Yorktown Heights, NY, USA
Challenge Co-Chairs
- M. Emre Celebi, Ph.D.; University of Central Arkansas, Conway, AR, USA
- Marc Combalia, M.S.; Fundació Clínic per a la Recerca Biomèdica, Barcelona, Spain
- David Gutman, M.D., Ph.D.; Emory University, Atlanta, GA, USA
- Brian Helba; Kitware, Inc., Clifton Park, NY, USA
- Harald Kittler, M.D.; Medical University of Vienna, Vienna, Austria
- Veronica Rotemberg, M.D., Ph.D.; Memorial Sloan Kettering Cancer Center, New York City, NY, USA
- Philipp Tschandl, M.D., Ph.D.; Medical University of Vienna, Vienna, Austria