Paper Analytical Device (PAD) Dataset Registry
Welcome to the PAD Dataset Registry, a collection of datasets used for training, validating, and testing machine learning models for the detection of falsified pharmaceuticals using Paper Analytical Devices.
About Paper Analytical Devices
Paper Analytical Devices (PADs) are test cards that can quickly determine whether a drug tablet contains the correct medicines. They are cheap and easy to use, requiring no power, chemicals, solvents, or expensive instruments.
PADs work by performing twelve chemical tests on a drug sample, producing a distinctive color barcode that is analyzed to identify the chemical composition of the drug. If a falsified version of the medicine lacks the active ingredient or includes substitute fillers, the barcode changes color, and a trained human evaluator or a machine learning model can detect the difference.
Available Datasets
Our datasets are formatted according to the MLCommons Croissant specification, making them easily accessible for machine learning applications.
FHI2020_Stratified_Sampling
Enhanced approach to selecting training/test sets for the FHI2020 dataset
Records: 8001
FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0
Dataset: FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry
Records: 5924
FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1
Dataset: FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 from PaperAnalyticalDeviceND/pad_dataset_registry
Records: 9027
FHI360_FHI2020_MidTrainingSet_Good_v1.0
Dataset: FHI360_FHI2020_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry
Records: 8792
FHI360_FHI360-FHI2020_BalancedData_v1.0
Dataset: FHI360_FHI360-FHI2020_BalancedData_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry
Records: 10483
Leiberman-Lab_ChemoPADNNtraining2024_Partial-Drug-Set_v1.0
The ChemoPADNNtraining2024 Dataset is a curated collection of Paper Analytical Device (PAD) images used for chemotherapy drug identification and analysis.
Records: 3609
TFDA_MSH-Tanzania_v2.0
New version of the dataset based on the MSH Tanzania project.
Records: 2949
Veripad_ChemoPAD-idPAD2.4_v1.0
Dataset: Veripad_ChemoPAD-idPAD2.4_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry
Records: 1424
How to Use
These datasets can be used to train machine learning models to detect falsified pharmaceuticals. Each dataset contains:
- Processed PAD card images
- Metadata about the samples (drug name, concentration, etc.)
- Labels for training and evaluation
- Data splits for training, validation, and testing
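Because the datasets follow the Croissant specification, their metadata is a JSON-LD document whose record sets are listed under the top-level "recordSet" key, each with a list of fields under "field". The sketch below shows one way to summarize those record sets once a metadata file has been parsed into a Python dictionary. The helper names and the sample document are illustrative assumptions; the record set and field names in the sample are invented and do not describe any dataset in this registry.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """JSON-LD allows a single object where a list is expected; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_record_sets(metadata: dict) -> dict[str, list[str]]:
    """Map each Croissant record set name to the names of its fields."""
    summary: dict[str, list[str]] = {}
    for record_set in _as_list(metadata.get("recordSet")):
        rs_name = record_set.get("name") or record_set.get("@id", "<unnamed>")
        summary[rs_name] = [
            field.get("name") or field.get("@id", "<unnamed>")
            for field in _as_list(record_set.get("field"))
        ]
    return summary


# Illustrative stand-in for a parsed Croissant document (NOT real registry data).
sample = {
    "@type": "sc:Dataset",
    "name": "example_pad_dataset",
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "name": "train",
            "field": [
                {"@type": "cr:Field", "name": "image"},
                {"@type": "cr:Field", "name": "label"},
            ],
        }
    ],
}

print(summarize_record_sets(sample))
```

Running the same function on a real metadata file from the registry would show which record sets and fields that dataset actually exposes; the names above should not be relied on.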
API Access
All datasets are available through a Croissant-compliant API at:
https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/datasets/{dataset-name}.json
A catalog of all available datasets can be accessed at:
https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/catalog.json
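As a minimal sketch of how these endpoints might be used, the following Python snippet builds the two URLs above and downloads the JSON with the standard library only. The function names are hypothetical, and the code makes no assumption about the structure of the returned JSON, which is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the URLs listed above.
BASE_URL = "https://paperanalyticaldevicend.github.io/pad_dataset_registry/api"


def catalog_url() -> str:
    """URL of the catalog that lists all available datasets."""
    return f"{BASE_URL}/catalog.json"


def dataset_url(dataset_name: str) -> str:
    """URL of the Croissant metadata for one dataset, e.g. 'TFDA_MSH-Tanzania_v2.0'."""
    # Percent-encode the name so unexpected characters cannot break the path.
    return f"{BASE_URL}/datasets/{quote(dataset_name, safe='')}.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(dataset_url("TFDA_MSH-Tanzania_v2.0"))` would return the parsed metadata for that dataset, and `fetch_json(catalog_url())` the parsed catalog. Inspecting the top-level keys of the result is a reasonable first step before relying on any particular field.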