MILK10k 2025 Challenge

Summary

The overarching goal of the benchmark is to develop image analysis tools that classify the diagnosis of skin lesions using the following set of information for each case:

Clinical close-up image
Dermatoscopic image
Metadata (more detail below)

This live challenge features the MILK10k training dataset (5,240 lesions) and a blind held-out test dataset (479 lesions) called the MILK10k Benchmark.

Task

The task is multi-category lesion diagnosis classification. Submissions are required to provide eleven probability estimates for each lesion identifier (lesion), one for each of the following diagnostic categories:

Actinic keratosis / intraepidermal carcinoma (AKIEC)
Basal cell carcinoma (BCC)
Other benign proliferations, including collision tumors (BEN_OTH)
Benign keratinocytic lesion (BKL)
Dermatofibroma (DF)
Inflammatory and infectious conditions (INF)
Other malignant proliferations, including collision tumors (MAL_OTH)
Melanoma (MEL)
Melanocytic nevus (NV)
Squamous cell carcinoma / keratoacanthoma (SCCKA)
Vascular lesions and hemorrhage (VASC)

Input Data

The input data are image pairs with additional metadata. Each lesion is represented by one clinical close-up image and one dermatoscopic image.

[Figure: example clinical close-up image and dermatoscopic image. Images provided by the MILK study team.]

All images are annotated using the MONET framework (Kim et al., 2024), with concept term probability scores included for the following groups:

Ulceration, crust
Hair
Vasculature, vessels
Erythema
Pigmentation
Gel, water drop, dermoscopy liquid
Skin markings, pen ink, purple pen

Kim C, Gadgil SU, DeGrave AJ, et al. Transparent medical image AI via an image-text foundation model grounded in medical literature. Nat Med. 2024;30(4):1154-1165. doi:10.1038/s41591-024-02887-x

Additional metadata includes:

Age (grouped in 5-year intervals)
Sex
Skin tone, categorized from 0 (very dark) to 5 (very light) — designed to avoid confusion with Fitzpatrick skin types
Anatomical site

Response Data

Response data are binary classification probabilities for each of the 11 diagnostic categories over all 479 lesions in the MILK10k Benchmark dataset. Responses must be encoded in a CSV (comma-separated value) file and submitted through the ISIC Challenge submission system, which provides automated format validation and scoring. File columns must be:

lesion
AKIEC
BCC
BEN_OTH
BKL
DF
INF
MAL_OTH
MEL
NV
SCCKA
VASC

Responses are expressed as floating-point values in the closed interval [0.0, 1.0], where 0.5 is used as the binary classification threshold (see Evaluation).
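As an illustration of the file layout described above, the following minimal sketch writes a submission CSV with Python's standard `csv` module. It assumes a header row carrying the listed column names; the function name `write_submission`, the input structure, and the lesion identifier `LESION_0001` are hypothetical and are not part of the challenge materials.

```python
import csv
import io

# Column order as listed in the challenge description.
COLUMNS = ["lesion", "AKIEC", "BCC", "BEN_OTH", "BKL", "DF", "INF",
           "MAL_OTH", "MEL", "NV", "SCCKA", "VASC"]
CATEGORIES = COLUMNS[1:]


def write_submission(predictions, out):
    """Write {lesion_id: {category: probability}} to a file-like object.

    Hypothetical helper: the input structure is an assumption made for
    this sketch, not a format defined by the challenge.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for lesion_id, probs in predictions.items():
        missing = [c for c in CATEGORIES if c not in probs]
        if missing:
            raise ValueError(f"{lesion_id}: missing categories {missing}")
        row = {"lesion": lesion_id}
        for category in CATEGORIES:
            p = float(probs[category])
            # Reject values outside the closed interval [0.0, 1.0] (and NaN).
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{lesion_id}/{category}: {p} outside [0, 1]")
            row[category] = p
        writer.writerow(row)


# Usage with a made-up lesion identifier.
example = {"LESION_0001": {c: 0.0 for c in CATEGORIES}}
example["LESION_0001"]["NV"] = 0.9
buffer = io.StringIO()
write_submission(example, buffer)
csv_text = buffer.getvalue()
```

Validating the value range locally catches out-of-range or missing values before upload; the submission system's own format validation remains authoritative.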

Evaluation

The primary evaluation metric is the macro F1 score (Dice coefficient) over the diagnostic categories. The F1 score is first computed for each class from its precision and recall, and the per-class scores are then averaged. Because this average ignores class support and prevalence, every class contributes equally to the overall score.
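Written out, the metric described above corresponds to the standard definition below, where $C = 11$ is the number of diagnostic categories and $TP_c$, $FP_c$, $FN_c$ are the true-positive, false-positive, and false-negative counts for class $c$. The symbols are introduced here for illustration; how the scoring system handles a class with a zero denominator is not stated in this description.

```latex
\[
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2 P_c R_c}{P_c + R_c} = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}
\]
\[
\text{Macro-}F1 = \frac{1}{C} \sum_{c=1}^{C} F1_c
\]
```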

Individual responses will be binarized using ≥0.5 as the decision threshold for a positive prediction. A lesion may therefore be counted as predicted "positive" (response ≥ 0.5) for any number (0, 1, 2, …, 11) of the eleven diagnostic categories.
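The following sketch shows one way to estimate this metric locally. It is not the official scoring code. It assumes each lesion has exactly one ground-truth category, and it scores a class with no true and no predicted positives as 0; both are assumptions of this example. The function names and the toy data are hypothetical.

```python
CATEGORIES = ["AKIEC", "BCC", "BEN_OTH", "BKL", "DF", "INF",
              "MAL_OTH", "MEL", "NV", "SCCKA", "VASC"]


def macro_f1(y_true, y_prob, threshold=0.5):
    """Macro F1 over CATEGORIES after binarizing each response independently.

    y_true: list of ground-truth category names (one per lesion, assumed).
    y_prob: list of dicts mapping category -> probability.
    """
    scores = []
    for c in CATEGORIES:
        tp = fp = fn = 0
        for truth, probs in zip(y_true, y_prob):
            predicted = probs[c] >= threshold
            actual = truth == c
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
        denom = 2 * tp + fp + fn
        # Assumption: an empty class (denominator 0) contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def make_probs(default=0.0, **overrides):
    """Build a full probability row for the toy example."""
    row = {c: default for c in CATEGORIES}
    row.update(overrides)
    return row


# Toy data: the three lesions are positive for 2, 1, and 0 categories.
y_true = ["MEL", "NV", "BCC"]
y_prob = [
    make_probs(MEL=0.7, NV=0.6),  # two categories at or above 0.5
    make_probs(NV=0.8),           # exactly one
    make_probs(default=0.2),      # none
]
positives_per_lesion = [sum(p >= 0.5 for p in row.values()) for row in y_prob]
score = macro_f1(y_true, y_prob)
```

In this toy case the third lesion receives no positive prediction, so it counts as a false negative for BCC. Softmax-style outputs, where no class reaches 0.5, can produce this situation, which is worth considering when calibrating responses.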

Datasets

This live challenge features the MILK10k Benchmark dataset, comprising 958 images — specifically, close-up and dermatoscopic image pairs for 479 skin lesions. The benchmark dataset originates from the same sources as the original MILK10k and covers the same eleven diagnostic categories.

Although the diagnostic category structure aligns with that of MILK10k, the granular ISIC-DX diagnoses differ. The MILK10k Benchmark includes new ISIC-DX diagnoses not present in the original dataset, while some original diagnoses are omitted—particularly within the "other benign" and "other malignant" categories.

For each image in MILK10k, we also provide the most specific ISIC-DX diagnosis in the supplemental data file. While specific diagnoses of the MILK10k Benchmark are not disclosed directly, all possible granular ISIC-DX diagnoses are listed in the table below.

Diagnostic Category	Abbreviation	ISIC-DX
Actinic keratosis/intraepidermal carcinoma	`AKIEC`	Solar or actinic keratosis; Squamous cell carcinoma in situ; Bowen's disease
Basal cell carcinoma	`BCC`	Basal cell carcinoma
Other benign proliferations including collisions	`BEN_OTH`	Benign - Other; Benign soft tissue proliferations - Fibro-histiocytic; Benign soft tissue proliferations - Vascular; Collision - Only benign proliferations; Cylindroma; Exogenous; Fibroepithelial polyp; Fibroma; Infundibular or epidermal cyst; Juvenile xanthogranuloma; Mastocytosis; Mucosal melanotic macule; Scar; Sebaceous hyperplasia; Spiradenoma; Supernumerary nipple; Trichilemmal or isthmic-catagen or pilar cyst; Trichoblastoma
Benign keratinocytic lesion	`BKL`	Clear cell acanthoma; Ink-spot lentigo; Lichen planus like keratosis; Seborrheic keratosis; Solar lentigo
Dermatofibroma	`DF`	Dermatofibroma
Inflammatory and infectious	`INF`	Inflammatory or infectious diseases; Molluscum; Porokeratosis; Verruca
Other malignant proliferations including collisions	`MAL_OTH`	Atypical fibroxanthoma; Collision - At least one malignant proliferation; Kaposi sarcoma; Lymphocytic proliferations - T-Cell/NK; Malignant peripheral nerve sheath tumor; Merkel cell carcinoma
Melanoma	`MEL`	Melanoma Invasive; Melanoma in situ; Melanoma metastasis
Melanocytic Nevus, any type	`NV`	Blue nevus; Nevus; Nevus, Acral; Nevus, BAP-1 deficient; Nevus, Balloon cell; Nevus, Combined; Nevus, Congenital; Nevus, Deep penetrating; Nevus, NOS, Compound; Nevus, NOS, Dermal; Nevus, NOS, Junctional; Nevus, Recurrent or persistent; Nevus, Reed; Nevus, Spilus; Nevus, Spitz
Squamous cell carcinoma/keratoacanthoma	`SCCKA`	Keratoacanthoma; Squamous cell carcinoma, Invasive
Vascular lesions and hemorrhage	`VASC`	Angiokeratoma; Arterio-venous malformation; Hemangioma; Hemangioma, Hobnail; Lymphangioma; Pyogenic granuloma
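When working with the granular diagnoses in the MILK10k supplemental data file, a lookup from ISIC-DX diagnosis to diagnostic category can be handy. The sketch below covers only an excerpt of the table above. The exact strings used in the supplemental file may differ from those shown here, and the names `ISIC_DX_TO_CATEGORY` and `category_for` are hypothetical.

```python
# Excerpt of the table above: granular ISIC-DX diagnosis -> category.
# Assumption: the strings match the supplemental data file verbatim.
ISIC_DX_TO_CATEGORY = {
    "Basal cell carcinoma": "BCC",
    "Dermatofibroma": "DF",
    "Seborrheic keratosis": "BKL",
    "Solar lentigo": "BKL",
    "Melanoma in situ": "MEL",
    "Melanoma metastasis": "MEL",
    "Blue nevus": "NV",
    "Nevus, Spitz": "NV",
    "Keratoacanthoma": "SCCKA",
    "Pyogenic granuloma": "VASC",
    "Kaposi sarcoma": "MAL_OTH",
    "Verruca": "INF",
}


def category_for(isic_dx):
    """Return the diagnostic category for a granular ISIC-DX diagnosis."""
    key = isic_dx.strip()
    try:
        return ISIC_DX_TO_CATEGORY[key]
    except KeyError:
        # Fail loudly instead of guessing a category for unknown diagnoses.
        raise KeyError(f"No category mapped for ISIC-DX diagnosis {key!r}") from None


mapped = category_for("  Nevus, Spitz ")
```

Raising on unknown diagnoses is a deliberate choice in this sketch: the benchmark includes ISIC-DX diagnoses that are absent from the training data, so a silent default would hide gaps in the mapping.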

Organizers

Harald Kittler, MD
Department of Dermatology, Medical University of Vienna, Vienna, Austria
Philipp Tschandl, MD, PhD
Department of Dermatology, Medical University of Vienna, Vienna, Austria
