Dataset Catalog

Browse all available Paper Analytical Device (PAD) datasets. Each dataset is available in a Croissant-compliant format with metadata and split definitions.

| Dataset Name | Description | Records | Files | Version | Published |
|---|---|---|---|---|---|
| FHI2020_Stratified_Sampling | Enhanced approach to selecting training/test sets for the FHI2020 dataset | 8001 | 2 | 1.0 | 2025-03-24 |
| FHI2021 | Dataset: FHI2021 from PaperAnalyticalDeviceND/pad_dataset_registry | 0 | 0 | b26340e | 2024-11-15 |
| FHI2022 | Dataset: FHI2022 from PaperAnalyticalDeviceND/pad_dataset_registry | 0 | 0 | 70d4ad1 | 2024-11-15 |
| FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 | Dataset: FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 5924 | 3 | fc7ff27 | 2025-03-24 |
| FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 | Dataset: FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 from PaperAnalyticalDeviceND/pad_dataset_registry | 9027 | 3 | fc7ff27 | 2025-03-24 |
| FHI360_FHI2020_MidTrainingSet_Good_v1.0 | Dataset: FHI360_FHI2020_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 8792 | 2 | fc7ff27 | 2025-03-24 |
| FHI360_FHI360-FHI2020_BalancedData_v1.0 | Dataset: FHI360_FHI360-FHI2020_BalancedData_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 10483 | 4 | ec795cc | 2025-03-24 |
| Leiberman-Lab_ChemoPADNNtraining2024_Partial-Drug-Set_v1.0 | The ChemoPADNNtraining2024 Dataset is a curated collection of Paper Analytical Device (PAD) images used for chemotherapy drug identification and analysis. | 3609 | 2 | v1.0 | 2025-03-31 |
| TFDA_MSH-Tanzania_v2.0 | New version of the dataset based on the MSH Tanzania project. | 2949 | 2 | 2.0 | 2025-03-24 |
| Veripad_ChemoPAD-idPAD2.4_v1.0 | Dataset: Veripad_ChemoPAD-idPAD2.4_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 1424 | 3 | ec795cc | 2025-03-24 |

How to Access Datasets

All datasets are available through our GitHub repository and through our Croissant-compliant API.

GitHub Repository

You can access the dataset files directly in our GitHub repository, PaperAnalyticalDeviceND/pad_dataset_registry.

API Access

The Croissant-compliant description of each dataset is available at the following URL, where {dataset-name} is the dataset name as listed in the catalog:

https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/datasets/{dataset-name}.json

A catalog of all available datasets can be accessed at:

https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/catalog.json
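
The two URLs above can be used from a script. The following is a minimal Python sketch using only the standard library. The helper names dataset_url and fetch_json are illustrative and not part of the registry, and the structure of the returned JSON is not described here, so the sketch only downloads and decodes it.

```python
import json
import urllib.parse
import urllib.request

# Base of the API URLs quoted above.
BASE_URL = "https://paperanalyticaldevicend.github.io/pad_dataset_registry/api"
CATALOG_URL = f"{BASE_URL}/catalog.json"


def dataset_url(dataset_name: str) -> str:
    """Fill the {dataset-name} placeholder of the per-dataset URL (hypothetical helper)."""
    # Percent-encode the name so unexpected characters cannot break the path.
    return f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_name, safe='')}.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and decode the body as JSON (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_json(CATALOG_URL) would retrieve the catalog, and fetch_json(dataset_url("FHI2020_Stratified_Sampling")) would retrieve the description of one dataset from the table above.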

Using Datasets in Machine Learning

The datasets in this registry are designed to be used with machine learning frameworks such as TensorFlow, PyTorch, and scikit-learn.

Each dataset includes:

  1. Processed PAD card images
  2. Metadata about the samples (drug name, concentration, etc.)
  3. Labels for training and evaluation
  4. Data splits for training, validation, and testing

To use these datasets, you can either:

  1. Download the raw CSV files and process them yourself
  2. Use our Croissant-compliant API to access the data programmatically
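
Both options can be sketched in a few lines of Python. The sketch below assumes that a dataset description follows the general Croissant layout, with a top-level "distribution" list whose entries carry "contentUrl" and "encodingFormat" fields. Whether this registry's files match that layout exactly should be checked against a real response. The sample metadata, file URL, and CSV columns are made up for illustration and are not taken from the registry.

```python
import csv
import io


def csv_file_urls(metadata: dict) -> list[str]:
    """Collect the contentUrl of every CSV entry in a Croissant-style distribution list."""
    urls = []
    for entry in metadata.get("distribution", []):
        if entry.get("encodingFormat") == "text/csv" and "contentUrl" in entry:
            urls.append(entry["contentUrl"])
    return urls


def read_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Made-up sample data (assumption): real field values will differ.
sample_metadata = {
    "name": "example_dataset",
    "distribution": [
        {"name": "train", "encodingFormat": "text/csv",
         "contentUrl": "https://example.org/train.csv"},
        {"name": "readme", "encodingFormat": "text/markdown",
         "contentUrl": "https://example.org/README.md"},
    ],
}
sample_csv = "sample_id,label\n1,drug_a\n2,drug_b\n"

print(csv_file_urls(sample_metadata))
print(read_rows(sample_csv))
```

The rows returned by read_rows are plain dictionaries, so they can be passed to whichever framework is in use (TensorFlow, PyTorch, or scikit-learn) once the actual column names are known.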