Dataset Catalog
Browse all available Paper Analytical Device (PAD) datasets. Each dataset is published in a Croissant-compliant format with metadata and split definitions.
Dataset Name | Description | Records | Files | Version | Published |
---|---|---|---|---|---|
FHI2020_Stratified_Sampling | Enhanced approach to selecting training/test sets for the FHI2020 dataset | 8001 | 2 | 1.0 | 2025-03-24 |
FHI2021 | Dataset: FHI2021 from PaperAnalyticalDeviceND/pad_dataset_registry | 0 | 0 | b26340e | 2024-11-15 |
FHI2022 | Dataset: FHI2022 from PaperAnalyticalDeviceND/pad_dataset_registry | 0 | 0 | 70d4ad1 | 2024-11-15 |
FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 | Dataset: FHI360_FHI2020-FHI2022_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 5924 | 3 | fc7ff27 | 2025-03-24 |
FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 | Dataset: FHI360_FHI2020_MidTrainingSet-Zero_Good_v1.1 from PaperAnalyticalDeviceND/pad_dataset_registry | 9027 | 3 | fc7ff27 | 2025-03-24 |
FHI360_FHI2020_MidTrainingSet_Good_v1.0 | Dataset: FHI360_FHI2020_MidTrainingSet_Good_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 8792 | 2 | fc7ff27 | 2025-03-24 |
FHI360_FHI360-FHI2020_BalancedData_v1.0 | Dataset: FHI360_FHI360-FHI2020_BalancedData_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 10483 | 4 | ec795cc | 2025-03-24 |
Leiberman-Lab_ChemoPADNNtraining2024_Partial-Drug-Set_v1.0 | The ChemoPADNNtraining2024 Dataset is a curated collection of Paper Analytical Device (PAD) images used for chemotherapy drug identification and analysis. | 3609 | 2 | v1.0 | 2025-03-31 |
TFDA_MSH-Tanzania_v2.0 | New version of the dataset based on the MSH Tanzania project. | 2949 | 2 | 2.0 | 2025-03-24 |
Veripad_ChemoPAD-idPAD2.4_v1.0 | Dataset: Veripad_ChemoPAD-idPAD2.4_v1.0 from PaperAnalyticalDeviceND/pad_dataset_registry | 1424 | 3 | ec795cc | 2025-03-24 |
How to Access Datasets
All datasets are available through our GitHub repository and through our Croissant-compliant API.
GitHub Repository
The dataset files can be accessed directly in the PaperAnalyticalDeviceND/pad_dataset_registry repository on GitHub.
API Access
Each dataset's Croissant metadata is served as JSON at the following URL, where {dataset-name} is a dataset name from the catalog table above:
https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/datasets/{dataset-name}.json
A catalog of all available datasets can be accessed at:
https://paperanalyticaldevicend.github.io/pad_dataset_registry/api/catalog.json
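The two endpoints above can be queried with nothing more than the Python standard library. The following is a minimal sketch, not an official client: the helper names (`catalog_url`, `dataset_url`, `fetch_json`) are made up for illustration, and the shape of the returned JSON is not assumed beyond it being valid JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the two URLs listed above.
BASE_URL = "https://paperanalyticaldevicend.github.io/pad_dataset_registry/api"


def catalog_url() -> str:
    """Return the URL of the catalog listing all datasets."""
    return f"{BASE_URL}/catalog.json"


def dataset_url(dataset_name: str) -> str:
    """Return the metadata URL for one dataset name from the catalog table."""
    # Percent-encode the name so unusual characters cannot break the path.
    return f"{BASE_URL}/datasets/{quote(dataset_name, safe='')}.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and parse the response body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(dataset_url("TFDA_MSH-Tanzania_v2.0"))` would request the metadata for that dataset, and `fetch_json(catalog_url())` would request the full catalog. Both calls need network access.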
Using Datasets in Machine Learning
The datasets in this registry are designed to be used with machine learning frameworks such as TensorFlow, PyTorch, and scikit-learn.
Each dataset includes:
- Processed PAD card images
- Metadata about the samples (drug name, concentration, etc.)
- Labels for training and evaluation
- Data splits for training, validation, and testing
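One rough way to check which of these components a given dataset declares is to inspect its Croissant metadata once it has been loaded as a Python dictionary. The sketch below relies on the `distribution`, `recordSet`, `name`, and `contentUrl` keys defined by the Croissant format. Whether every dataset in this registry populates them is an assumption, so missing keys are tolerated. The function name `summarize_croissant` is hypothetical.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Normalize a JSON-LD value that may be absent, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_croissant(metadata: dict) -> dict:
    """Collect file and record-set names from a Croissant metadata dictionary.

    Assumes the standard Croissant keys; anything missing yields None or [].
    """
    files = [
        {"name": item.get("name"), "url": item.get("contentUrl")}
        for item in _as_list(metadata.get("distribution"))
    ]
    record_sets = [rs.get("name") for rs in _as_list(metadata.get("recordSet"))]
    return {
        "name": metadata.get("name"),
        "files": files,
        "record_sets": record_sets,
    }
```

The summary makes it easy to see at a glance which files a dataset points to and which record sets it defines before downloading anything large.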
To use these datasets, you can either:
- Download the raw CSV files and process them yourself
- Use our Croissant-compliant API to access the data programmatically
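For the first option, a downloaded CSV file can be inspected with the standard library before it is handed to a machine learning framework. This is only a sketch: the column name used in the usage note (`label`) is a placeholder, not the registry's documented schema, so check the header row of the actual file first.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text: str) -> list:
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_values(rows: list, column: str) -> Counter:
    """Count how often each non-empty value appears in one column."""
    return Counter(row[column] for row in rows if row.get(column))
```

With a file read into a string, `count_values(read_rows(text), "label")` would give the number of rows per label, which is a quick check for class imbalance before training. The column name here is hypothetical.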