Paper Analytical Device (PAD) Dataset Registry

Welcome to the PAD Dataset Registry, a collection of datasets for training, validating, and testing machine learning models that detect falsified pharmaceuticals using Paper Analytical Devices.

About Paper Analytical Devices

Paper Analytical Devices (PADs) are test cards that can quickly determine whether a drug tablet contains the correct active ingredients. They are inexpensive and easy to use, and require no power, chemicals, solvents, or expensive instruments.

PADs perform twelve chemical tests on a drug sample and produce a distinctive color barcode that reflects the drug's chemical composition. If a falsified medicine lacks the active ingredient or contains substitute fillers, its barcode differs in color from that of the genuine product, and a trained human evaluator or a machine learning model can detect the difference.
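To make the idea of a color barcode concrete, here is a toy sketch. It is not the method used by the PAD project or by any model trained on these datasets. It assumes, hypothetically, that each of the twelve tests is summarized as one average RGB color, and it compares a sample against a reference barcode from a genuine product using plain Euclidean color distance. The names `Lane`, `differing_lanes`, and the `threshold` value are illustrative choices.

```python
import math

# Hypothetical representation: one average (R, G, B) color per test.
Lane = tuple[int, int, int]

NUM_LANES = 12  # one entry per chemical test on the card


def lane_distance(a: Lane, b: Lane) -> float:
    """Euclidean distance between two RGB colors (0.0 means identical)."""
    return math.dist(a, b)


def differing_lanes(sample: list[Lane], reference: list[Lane],
                    threshold: float = 40.0) -> list[int]:
    """Return 0-based indices of tests whose color departs from the
    reference by more than `threshold` (an arbitrary illustrative cutoff)."""
    if len(sample) != NUM_LANES or len(reference) != NUM_LANES:
        raise ValueError(f"expected {NUM_LANES} colors per barcode")
    return [
        i
        for i, (s, r) in enumerate(zip(sample, reference))
        if lane_distance(s, r) > threshold
    ]
```

For example, with a uniform grey reference of `(200, 200, 200)` in every position and a sample whose fourth entry is `(120, 60, 200)`, `differing_lanes(sample, reference)` returns `[3]`. Real models learn from full card images rather than a hand-set threshold.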

Available Datasets

Our datasets are described with metadata that follows the MLCommons Croissant specification, so they can be read by any machine learning tool that supports Croissant.
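A Croissant file is JSON-LD whose top level typically lists downloadable files under `distribution` and record structures under `recordSet`, each with a list of `field` entries. The sketch below summarizes those parts from an already-parsed document using only the standard library. The `EXAMPLE` document is hand-written for illustration and is not taken from any registry dataset; the helper name `summarize_croissant` is hypothetical. The official `mlcroissant` Python package is an alternative for full loading.

```python
import json


def summarize_croissant(metadata: dict) -> dict:
    """Return the dataset name, file URLs, and field names per record set.

    Only FileObject-style entries carry `contentUrl`; FileSet entries
    (which use glob patterns instead) are skipped here.
    """
    files = [
        d["contentUrl"]
        for d in metadata.get("distribution", [])
        if "contentUrl" in d
    ]
    record_sets = {
        rs.get("name", "?"): [f.get("name", "?") for f in rs.get("field", [])]
        for rs in metadata.get("recordSet", [])
    }
    return {"name": metadata.get("name"), "files": files, "record_sets": record_sets}


# Tiny hand-written Croissant-style document, for illustration only.
EXAMPLE = json.loads("""
{
  "@type": "sc:Dataset",
  "name": "example_pad_dataset",
  "distribution": [
    {"@type": "cr:FileObject", "name": "metadata.csv", "contentUrl": "metadata.csv"}
  ],
  "recordSet": [
    {"@type": "cr:RecordSet", "name": "samples",
     "field": [{"name": "image"}, {"name": "drug_name"}]}
  ]
}
""")
```

Calling `summarize_croissant(EXAMPLE)` gives the name `example_pad_dataset`, the file list `["metadata.csv"]`, and one record set, `samples`, with fields `image` and `drug_name`.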

FHI2020_Stratified_Sampling

Enhanced approach to selecting training/test sets for the FHI2020 dataset

Records: 8001

FHI2021

Dataset: FHI2021 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 0

FHI2022

Dataset: FHI2022 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 0

FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0

Dataset: FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 5924

FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1

Dataset: FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 9027

FHI360_FHI2020_MidTrainingSet_Good_v1.0

Dataset: FHI360_FHI2020_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 8792

FHI360_FHI360-FHI2020_BalancedData_v1.0

Dataset: FHI360_FHI360-FHI2020_BalancedData_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 10483

Leiberman-Lab_ChemoPADNNtraining2024_Partial-Drug-Set_v1.0

The ChemoPADNNtraining2024 Dataset is a curated collection of Paper Analytical Device (PAD) images used for chemotherapy drug identification and analysis.

Records: 3609

TFDA_MSH-Tanzania_v2.0

New version of the dataset based on the MSH Tanzania project.

Records: 2949

Veripad_ChemoPAD-idPAD2.4_v1.0

Dataset: Veripad_ChemoPAD-idPAD2.4_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry

Records: 1424

How to Use

These datasets can be used to train machine learning models to detect falsified pharmaceuticals. Each dataset contains:

  1. Processed PAD card images
  2. Metadata about the samples (drug name, concentration, etc.)
  3. Labels for training and evaluation
  4. Data splits for training, validation, and testing
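Before training, it helps to check how samples and labels are distributed across the splits. The record schema differs between datasets, so the field names used here (`split` and `label`) are hypothetical placeholders; check each dataset's Croissant `recordSet` for the actual names. This minimal sketch assumes the records have already been loaded as Python dictionaries.

```python
from collections import Counter, defaultdict


def group_by_split(records: list[dict], split_key: str = "split") -> dict[str, list[dict]]:
    """Group records by split; records without a split go under "unassigned"."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[record.get(split_key, "unassigned")].append(record)
    return dict(groups)


def label_counts_per_split(records: list[dict], split_key: str = "split",
                           label_key: str = "label") -> dict[str, Counter]:
    """Count how often each label occurs within each split."""
    return {
        split: Counter(r.get(label_key, "unlabeled") for r in members)
        for split, members in group_by_split(records, split_key).items()
    }
```

Comparing the per-split counts is a quick way to spot a label that appears in training but is missing from the test split.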

API Access

All datasets are available through a Croissant-compliant API at:

https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/datasets/{dataset-name}.json

A catalog of all available datasets can be accessed at:

https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/catalog.json
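The two endpoints can be read with the Python standard library alone. This minimal sketch assumes that `{dataset-name}` is the dataset name exactly as listed above and that both endpoints return JSON; the helper names `dataset_url`, `catalog_url`, and `fetch_json` are hypothetical. The structure of `catalog.json` is not described on this page, so inspect it before relying on particular keys.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://paperanalyticaldevicend.github.io/pad_dataset_registry/api"


def dataset_url(name: str) -> str:
    """Build the Croissant metadata URL for one dataset (name is URL-escaped)."""
    return f"{BASE_URL}/datasets/{urllib.parse.quote(name, safe='')}.json"


def catalog_url() -> str:
    """URL of the catalog listing all datasets."""
    return f"{BASE_URL}/catalog.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and parse the response body as JSON (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With network access, `fetch_json(catalog_url())` should return the catalog, and `fetch_json(dataset_url("FHI2020_Stratified_Sampling"))` should return that dataset's Croissant metadata.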