Datasets
Register and track datasets used by your AI systems, with PII detection and bulk upload.
Overview
The Datasets page is where you register and track every dataset your organization uses for AI systems. Each dataset record captures what the data contains, where it came from, whether it includes personal information, and what biases have been identified.
Datasets can be linked to models and projects, so you have a clear picture of which data feeds into which system.
Viewing datasets
The main page shows all datasets in a table with columns for name, description, status, type, classification, owner, and source. Above the table, summary cards show how many datasets are in each status (Draft, Active, Deprecated, Archived).
You can search by name or description, filter by status, type, or classification, and group datasets by any of those fields to organize them into collapsible sections.
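As a rough illustration of what search, filter, and group do to the table, here is a minimal Python sketch. The sample records and the helper names (`search`, `filter_by`, `group_by`, `status_summary`) are hypothetical and are not the product's API; only the column names and status values come from this page.

```python
from collections import Counter, defaultdict

# Hypothetical records; the keys mirror the table columns described above.
DATASETS = [
    {"name": "Customer churn 2023", "description": "Monthly churn labels",
     "status": "Active", "type": "Training", "classification": "Confidential"},
    {"name": "Churn holdout", "description": "Held-out evaluation split",
     "status": "Active", "type": "Testing", "classification": "Confidential"},
    {"name": "Public benchmarks", "description": "Open reference corpora",
     "status": "Draft", "type": "Reference", "classification": "Public"},
]

def search(records, query):
    """Case-insensitive substring match on name or description."""
    q = query.lower()
    return [r for r in records
            if q in r["name"].lower() or q in r["description"].lower()]

def filter_by(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]

def group_by(records, field):
    """Map each value of `field` to the names of the datasets that have it."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r["name"])
    return dict(groups)

def status_summary(records):
    """Count datasets per status, like the summary cards above the table."""
    return dict(Counter(r["status"] for r in records))
```

For example, `filter_by(DATASETS, status="Active", type="Testing")` keeps only the "Churn holdout" record, and `group_by(DATASETS, "status")` yields one group per status, which corresponds to one collapsible section each.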
Adding a dataset
- Click Add dataset in the top right.
- Fill in the basic fields: name, description, version, and owner.
- Set the type (Training, Validation, Testing, Production, or Reference) and classification (Public, Internal, Confidential, or Restricted).
- If the dataset contains personal data, check Contains PII and list the PII types present.
- Document any known biases and mitigation steps.
- Link the dataset to relevant models and projects.
- Click Save.
Dataset fields
| Field | What it captures |
|---|---|
| Name | A short, recognizable name for the dataset |
| Description | What the dataset contains and what it is used for |
| Version | Version identifier (e.g., 1.0.0) |
| Owner | Person or team responsible for the dataset |
| Type | Training, Validation, Testing, Production, or Reference |
| Classification | Public, Internal, Confidential, or Restricted |
| Source | Where the data came from |
| Format | File format (CSV, JSON, Parquet, etc.) |
| License | Data license or usage terms |
| Contains PII | Whether the dataset includes personally identifiable information |
| PII types | Specific PII categories present (email, SSN, phone, etc.) |
| Known biases | Documented bias issues in the data |
| Bias mitigation | Steps taken to address identified biases |
| Collection method | How the data was gathered |
| Preprocessing | Cleaning or transformation steps applied |
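
If you keep dataset metadata outside the application (for example, to prepare records before entering them), the fields above map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class name `DatasetRecord`, the attribute names, and the consistency checks in `validate` are hypothetical, and the page does not state which of these rules the product itself enforces. The allowed type and classification values are taken from the table.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values as listed in the fields table.
DATASET_TYPES = {"Training", "Validation", "Testing", "Production", "Reference"}
CLASSIFICATIONS = {"Public", "Internal", "Confidential", "Restricted"}

@dataclass
class DatasetRecord:
    # Hypothetical attribute names for a subset of the documented fields.
    name: str
    description: str
    version: str
    owner: str
    type: str
    classification: str
    contains_pii: bool = False
    pii_types: List[str] = field(default_factory=list)
    known_biases: str = ""
    bias_mitigation: str = ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record looks consistent."""
        problems = []
        if self.type not in DATASET_TYPES:
            problems.append(f"Unknown type: {self.type}")
        if self.classification not in CLASSIFICATIONS:
            problems.append(f"Unknown classification: {self.classification}")
        if self.contains_pii and not self.pii_types:
            problems.append("Contains PII is checked but no PII types are listed")
        if self.pii_types and not self.contains_pii:
            problems.append("PII types are listed but Contains PII is not checked")
        return problems
```

Returning a list of problems rather than raising on the first one lets you show every issue with a record at once.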
Bulk upload
If you have multiple dataset files to register at once, use the bulk upload feature instead of adding them one by one.
- Click Bulk upload in the top right.
- Drag and drop files (CSV, XLS, or XLSX, up to 30 MB each) or click to browse.
- The system scans column headers for potential PII (email, phone, SSN, address, etc.) and flags any matches.
- Review the auto-detected metadata for each file. Edit names, types, or classifications before uploading.
- Click Upload to register all files as dataset records.
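To give a feel for what a header-based PII scan involves, here is a minimal Python sketch. It is not the product's detection logic: the keyword lists, the function names (`normalize`, `scan_headers`, `is_accepted`), and the reading of 30 MB as 30 × 1024 × 1024 bytes are all assumptions made for illustration. Only the accepted file extensions, the size limit, and the example PII categories come from the steps above.

```python
import re
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".xls", ".xlsx"}
MAX_BYTES = 30 * 1024 * 1024  # assumption: 1 MB is counted as 1024 * 1024 bytes

# Hypothetical keyword lists; the real scanner's rules are not documented here.
PII_KEYWORDS = {
    "email": ("email", "e mail"),
    "phone": ("phone", "mobile"),
    "ssn": ("ssn", "social security"),
    "address": ("address", "zip", "postal code"),
}

def is_accepted(filename, size_bytes):
    """Check the extension and size limits listed for bulk upload."""
    return (Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
            and size_bytes <= MAX_BYTES)

def normalize(header):
    """Lowercase a header and split camelCase, underscores, and punctuation into words."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", header)
    return re.sub(r"[^a-z0-9]+", " ", spaced.lower()).strip()

def scan_headers(headers):
    """Map each flagged header to the PII categories whose keywords it contains."""
    flagged = {}
    for header in headers:
        padded = f" {normalize(header)} "
        for pii_type, keywords in PII_KEYWORDS.items():
            # Match whole words only, so "classname" is not flagged for "ssn".
            if any(f" {keyword} " in padded for keyword in keywords):
                flagged.setdefault(header, []).append(pii_type)
    return flagged
```

A scan like this sees only column names, so a column called `notes` that happens to hold phone numbers would pass unflagged. That is one reason to review the detected metadata for each file before uploading.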
Linking datasets to models and projects
Each dataset can be linked to one or more models from the model inventory and to one or more projects. These links create a traceable chain from data to model to use case, which is the kind of evidence auditors look for when verifying data governance.
Set these links when creating or editing a dataset. You can also view all datasets linked to a specific model from the model inventory page.
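Conceptually, the links behave like a many-to-many mapping that can be queried from either side. The following Python sketch uses a hypothetical in-memory structure (`LINKS`) with made-up dataset, model, and project names; it does not reflect how the product stores links.

```python
# Hypothetical link data: each dataset lists its linked models and projects.
LINKS = {
    "Customer churn 2023": {"models": ["churn-classifier"],
                            "projects": ["Retention dashboard"]},
    "Churn holdout": {"models": ["churn-classifier"],
                      "projects": ["Retention dashboard"]},
    "Support tickets": {"models": ["ticket-router", "churn-classifier"],
                        "projects": ["Helpdesk automation"]},
    "Scratch export": {"models": [], "projects": []},
}

def datasets_for_model(links, model):
    """List datasets linked to a model, like the view on the model inventory page."""
    return sorted(name for name, link in links.items() if model in link["models"])

def unlinked(links):
    """List datasets with no model or project links, which break the traceable chain."""
    return sorted(name for name, link in links.items()
                  if not link["models"] and not link["projects"])
```

A check like `unlinked(LINKS)` is one way to spot records that still need links before an audit.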
Dataset statuses
| Status | When to use |
|---|---|
| Draft | Dataset is being documented but not yet in use |
| Active | Dataset is currently used by one or more models or systems |
| Deprecated | Dataset is being phased out and should not be used for new work |
| Archived | Dataset is no longer in use but kept for audit records |
Change history
Every edit to a dataset is tracked in a change history log. This gives auditors a record of what changed, when, and by whom.
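A change history of this kind can be thought of as a list of field-level differences, each stamped with a user and a time. The Python sketch below illustrates the idea only; the entry layout (`field`, `old`, `new`, `changed_by`, `changed_at`) and the function name `record_changes` are assumptions, not the product's log format.

```python
from datetime import datetime, timezone

def record_changes(history, before, after, user, when=None):
    """Append one entry per field whose value differs between two versions of a record."""
    when = when or datetime.now(timezone.utc)
    for field_name in sorted(set(before) | set(after)):
        old, new = before.get(field_name), after.get(field_name)
        if old != new:
            history.append({
                "field": field_name,               # what changed
                "old": old,
                "new": new,
                "changed_by": user,                # by whom
                "changed_at": when.isoformat(),    # when
            })
    return history
```

Calling `record_changes([], {"status": "Draft"}, {"status": "Active"}, "alice")` produces a single entry for the `status` field; unchanged fields produce no entries.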
Who can do what
| Action | Required role |
|---|---|
| View datasets | Any authenticated user |
| Add or edit datasets | Admin or Editor |
| Bulk upload | Admin or Editor |
| Delete datasets | Admin or Editor |
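
The table above can be read as a simple lookup from action to allowed roles. Here is a minimal Python sketch of that reading; the action keys, the function name `is_allowed`, and the example role "Viewer" are hypothetical, and the product's actual authorization logic is not described on this page.

```python
EDIT_ROLES = {"Admin", "Editor"}

# Action keys are hypothetical labels for the rows of the table above.
REQUIRED_ROLES = {
    "view": None,               # None means any authenticated user
    "add_or_edit": EDIT_ROLES,
    "bulk_upload": EDIT_ROLES,
    "delete": EDIT_ROLES,
}

def is_allowed(role, action):
    """Return True if a user with `role` may perform `action`.

    `role` is None for an unauthenticated caller.
    """
    if action not in REQUIRED_ROLES:
        raise ValueError(f"Unknown action: {action}")
    if role is None:
        return False
    required = REQUIRED_ROLES[action]
    return required is None or role in required
```

Under this reading, a signed-in user with some other role (called "Viewer" here purely as an example) can view datasets but cannot add, edit, bulk upload, or delete them.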