Summary
The overarching goal of the benchmark is to develop image analysis tools that classify skin lesions by diagnosis, using the following information for each case:
- Clinical close-up image
- Dermatoscopic image
- Metadata (more detail below)
This live challenge features the MILK10k training dataset (5,240 lesions) and a blind held-out test dataset (479 lesions) called the MILK10k Benchmark.
Task
The task is multi-category lesion diagnosis classification.
Submissions are required to provide eleven probability estimates for each lesion identifier (lesion_id), one for each of the following diagnostic categories:
- Actinic keratosis / intraepidermal carcinoma (AKIEC)
- Basal cell carcinoma (BCC)
- Other benign proliferations, including collision tumors (BEN_OTH)
- Benign keratinocytic lesion (BKL)
- Dermatofibroma (DF)
- Inflammatory and infectious conditions (INF)
- Other malignant proliferations, including collision tumors (MAL_OTH)
- Melanoma (MEL)
- Melanocytic nevus (NV)
- Squamous cell carcinoma / keratoacanthoma (SCCKA)
- Vascular lesions and hemorrhage (VASC)
Input Data
The input data are image pairs with additional metadata. Each lesion is represented by one clinical close-up image and one dermatoscopic image.
[Figure: example image pair for one lesion, clinical close-up image and dermatoscopy image]
All images are annotated using the MONET framework (Kim et al., 2024), with concept term probability scores included for the following groups:
- Ulceration, crust
- Hair
- Vasculature, vessels
- Erythema
- Pigmentation
- Gel, water drop, dermoscopy liquid
- Skin markings, pen ink, purple pen
Kim C, Gadgil SU, DeGrave AJ, et al. Transparent medical image AI via an image-text foundation model grounded in medical literature. Nat Med. 2024;30(4):1154-1165. doi:10.1038/s41591-024-02887-x
Additional metadata includes:
- Age (grouped in 5-year intervals)
- Sex
- Skin tone, categorized from 0 (very dark) to 5 (very light); this scale is designed to avoid confusion with Fitzpatrick skin types
- Anatomical site
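As one illustration of how these fields might be fed to a model alongside the images, the following minimal sketch turns a metadata record into a flat numeric feature vector. Everything here is an assumption: the LesionMetadata class, its field names, storing the age group as the lower bound of its 5-year interval, the sex values, and the anatomical-site vocabulary are hypothetical and do not describe the official MILK10k file format.

```python
from dataclasses import dataclass

# Hypothetical record layout; the real MILK10k metadata files may use
# different column names and value encodings.
@dataclass
class LesionMetadata:
    lesion_id: str
    age_group: int        # assumed: lower bound of the 5-year interval, e.g. 45 for 45-49
    sex: str              # assumed values: "male", "female", "unknown"
    skin_tone: int        # 0 (very dark) to 5 (very light)
    anatomical_site: str  # assumed free-text site label

SEX_VALUES = ["male", "female", "unknown"]  # hypothetical vocabulary

def encode(meta: LesionMetadata, sites: list[str]) -> list[float]:
    """Turn one metadata record into a flat numeric feature vector."""
    if not 0 <= meta.skin_tone <= 5:
        raise ValueError(f"skin_tone out of range: {meta.skin_tone}")
    # Simple scaling of the ordinal fields to roughly [0, 1].
    features = [meta.age_group / 100.0, meta.skin_tone / 5.0]
    # One-hot encode the categorical fields; unknown sites map to all zeros.
    features += [1.0 if meta.sex == s else 0.0 for s in SEX_VALUES]
    features += [1.0 if meta.anatomical_site == s else 0.0 for s in sites]
    return features
```

Keeping skin tone as a single scaled value rather than one-hot preserves its ordering from very dark to very light; a one-hot encoding would be an equally valid choice if that ordering is not expected to matter.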
Response Data
Response data are binary classification probabilities for each of the 11 diagnostic categories over all 479 lesions in the MILK10k Benchmark dataset. Responses must be encoded in a CSV (comma-separated values) file and submitted through the ISIC Challenge submission system, which provides automated format validation and scoring. File columns must be:
- lesion_id
- AKIEC
- BCC
- BEN_OTH
- BKL
- DF
- INF
- MAL_OTH
- MEL
- NV
- SCCKA
- VASC
Responses are expressed as floating-point values in the closed interval [0.0, 1.0], where 0.5 is used as the binary classification threshold (see Evaluation).
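A minimal sketch of writing a submission file with Python's standard csv module, assuming predictions are held in a dict that maps each lesion_id to its eleven probabilities in the column order listed above. The function name write_submission and the six-decimal formatting are illustrative choices, not part of the challenge specification; the ISIC Challenge submission system remains the authority on format validation.

```python
import csv

# Column names as listed in the Response Data section.
CATEGORIES = ["AKIEC", "BCC", "BEN_OTH", "BKL", "DF", "INF",
              "MAL_OTH", "MEL", "NV", "SCCKA", "VASC"]
COLUMNS = ["lesion_id"] + CATEGORIES

def write_submission(path: str, predictions: dict[str, list[float]]) -> None:
    """Write a header row, then one row per lesion with 11 values in [0.0, 1.0]."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for lesion_id, probs in predictions.items():
            if len(probs) != len(CATEGORIES):
                raise ValueError(f"{lesion_id}: expected 11 values, got {len(probs)}")
            for p in probs:
                # Also rejects NaN, since every comparison with NaN is False.
                if not 0.0 <= p <= 1.0:
                    raise ValueError(f"{lesion_id}: value {p} outside [0.0, 1.0]")
            writer.writerow([lesion_id] + [f"{p:.6f}" for p in probs])
```

Checking the range locally before uploading catches the most common formatting mistake (unnormalized scores or logits) earlier than the automated validator would.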
Evaluation
The primary evaluation metric is the macro F1 score (equivalent to the Dice coefficient) over the diagnostic categories. The macro F1 score is a multi-class classification metric that averages the individual F1 scores of all classes, treating every class equally. It is computed by first calculating the F1 score for each class from its precision and recall, and then taking the unweighted mean of these per-class scores. Because this average ignores class support (prevalence), a rare category contributes as much to the overall score as a common one.
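Written out for a single category c, with TP_c, FP_c, and FN_c counting the true-positive, false-positive, and false-negative lesions for that category after binarization, the per-category score and its unweighted average over the eleven categories are as follows. The final equality on the second line is why the F1 score is also known as the Dice coefficient.

```latex
\begin{aligned}
P_c &= \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c}, \qquad
R_c = \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FN}_c} \\
\mathrm{F1}_c &= \frac{2\,P_c R_c}{P_c + R_c}
  = \frac{2\,\mathrm{TP}_c}{2\,\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c} \\
\text{Macro-F1} &= \frac{1}{11} \sum_{c=1}^{11} \mathrm{F1}_c
\end{aligned}
```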
Individual responses will be binarized using ≥ 0.5 as the decision threshold for positive prediction. Therefore, a lesion may be counted as predicted "positive" (response ≥ 0.5) for any number (0, 1, 2, …, 11) of the eleven diagnostic categories.
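The thresholding rule and the metric can be sketched together in plain Python. This is an illustrative reimplementation, not the official ISIC scoring code; binarize and macro_f1 are hypothetical helper names, and scoring a category that has no true and no predicted positives as 0.0 is an assumption that the official scorer may handle differently.

```python
# Decision threshold from the Evaluation section: a response >= 0.5 is positive.
THRESHOLD = 0.5

def binarize(probs: list[list[float]]) -> list[list[int]]:
    """Threshold every category independently, so a row may have 0 to 11 positives."""
    return [[1 if p >= THRESHOLD else 0 for p in row] for row in probs]

def macro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Unweighted mean of per-category F1 (Dice) scores over binary label rows."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a category with no true or predicted positives scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

For example, binarize([[0.9, 0.2, 0.5], [0.1, 0.7, 0.4]]) yields [[1, 0, 1], [0, 1, 0]], so the first lesion counts as positive for two categories. One consequence of thresholding each category independently: if a model's eleven outputs are softmax probabilities summing to 1, at most one category can reach 0.5 (two only in the tie case of exactly 0.5 each), and a lesion whose highest probability is below 0.5 is predicted negative for every category.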
Datasets
This live challenge features the MILK10k Benchmark dataset, comprising 958 images: close-up and dermatoscopic image pairs for 479 skin lesions. The benchmark dataset originates from the same sources as the original MILK10k and covers the same eleven diagnostic categories.
Although the diagnostic category structure aligns with that of MILK10k, the granular ISIC-DX diagnoses differ. The MILK10k Benchmark includes new ISIC-DX diagnoses not present in the original dataset, while some original diagnoses are omitted, particularly within the "other benign" and "other malignant" categories.
For each image in MILK10k, we also provide the most specific ISIC-DX diagnosis in the supplemental data file. While specific diagnoses of the MILK10k Benchmark are not disclosed directly, all possible granular ISIC-DX diagnoses are listed in the table below.
Diagnostic Category | Abbreviation | ISIC-DX |
---|---|---|
Actinic keratosis/intraepidermal carcinoma | AKIEC | |
Basal cell carcinoma | BCC | |
Other benign proliferations including collisions | BEN_OTH | |
Benign keratinocytic lesion | BKL | |
Dermatofibroma | DF | |
Inflammatory and infectious | INF | |
Other malignant proliferations including collisions | MAL_OTH | |
Melanoma | MEL | |
Melanocytic Nevus, any type | NV | |
Squamous cell carcinoma/keratoacanthoma | SCCKA | |
Vascular lesions and hemorrhage | VASC | |
Organizers
- Harald Kittler, MD; Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Philipp Tschandl, MD, PhD; Department of Dermatology, Medical University of Vienna, Vienna, Austria