
Datasets

Overview

The Datasets page is where you register and track every dataset your organization uses for AI systems. Each dataset record captures what the data contains, where it came from, whether it includes personal information and what biases have been identified.

Datasets can be linked to models and projects, so you have a clear picture of which data feeds into which system.

Viewing datasets

The main page shows all datasets in a table with columns for name, description, status, type, classification, owner and source. Above the table, summary cards show how many datasets are in each status (Draft, Active, Deprecated, Archived).

You can search by name or description, filter by status, type or classification, and group datasets by any of those fields to organize them into collapsible sections.
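The search, filter, grouping and summary-card behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation. The `DatasetRow` shape, the case-insensitive substring search and the exact-match filters are assumptions, and the sample rows are made up.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

STATUSES = ("Draft", "Active", "Deprecated", "Archived")

@dataclass
class DatasetRow:
    # Hypothetical shape of one row in the datasets table.
    name: str
    description: str
    status: str          # Draft, Active, Deprecated or Archived
    type: str            # e.g. Training, Testing, Reference
    classification: str  # e.g. Public, Confidential

def status_counts(rows):
    """Counts for the summary cards; statuses with no datasets show 0."""
    counts = Counter(row.status for row in rows)
    return {status: counts[status] for status in STATUSES}

def search_and_filter(rows, query="", **filters):
    """Case-insensitive search on name or description, then exact-match filters."""
    q = query.lower()
    matches = []
    for row in rows:
        if q and q not in row.name.lower() and q not in row.description.lower():
            continue
        if all(getattr(row, key) == value for key, value in filters.items()):
            matches.append(row)
    return matches

def group_by(rows, field_name):
    """Collapsible sections keyed by status, type or classification."""
    groups = defaultdict(list)
    for row in rows:
        groups[getattr(row, field_name)].append(row)
    return dict(groups)

rows = [
    DatasetRow("Loan applications", "Historical credit decisions", "Active", "Training", "Confidential"),
    DatasetRow("Holdout set", "Held-out loan applications", "Draft", "Testing", "Confidential"),
    DatasetRow("Public census", "Aggregate census tables", "Active", "Reference", "Public"),
]
active_loans = search_and_filter(rows, "loan", status="Active")
sections = group_by(rows, "classification")
```

Search narrows the rows first and filters then apply on top, so a query and a status filter combine as an AND.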

Adding a dataset

1. Click Add dataset in the top right.
2. Fill in the basic fields: name, description, version and owner.
3. Set the type (Training, Validation, Testing, Production, or Reference) and classification (Public, Internal, Confidential, or Restricted).
4. If the dataset contains personal data, check Contains PII and list the PII types present.
5. Document any known biases and mitigation steps.
6. Link the dataset to relevant models and projects.
7. Click Save.

Dataset fields

Field	What it captures
Name	A short, recognizable name for the dataset
Description	What the dataset contains and what it is used for
Version	Version identifier (e.g., 1.0.0)
Owner	Person or team responsible for the dataset
Type	Training, Validation, Testing, Production, or Reference
Classification	Public, Internal, Confidential, or Restricted
Source	Where the data came from
Format	File format (CSV, JSON, Parquet, etc.)
License	Data license or usage terms
Contains PII	Whether the dataset includes personally identifiable information
PII types	Specific PII categories present (email, SSN, phone, etc.)
Known biases	Documented bias issues in the data
Bias mitigation	Steps taken to address identified biases
Collection method	How the data was gathered
Preprocessing	Cleaning or transformation steps applied
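As a sketch of how the fields above fit together, the record below models Type and Classification as enums restricted to the documented values. The class and method names are hypothetical, and the checks in `validation_errors` (basic fields non-empty, PII types listed exactly when Contains PII is checked) are assumptions drawn from the Adding a dataset steps, not documented product rules.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class DatasetType(Enum):
    TRAINING = "Training"
    VALIDATION = "Validation"
    TESTING = "Testing"
    PRODUCTION = "Production"
    REFERENCE = "Reference"

class Classification(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    RESTRICTED = "Restricted"

@dataclass
class DatasetRecord:
    # Hypothetical in-memory model of one dataset record.
    name: str
    description: str
    version: str              # e.g. 1.0.0
    owner: str
    type: DatasetType
    classification: Classification
    source: str = ""
    format: str = ""          # e.g. CSV, JSON, Parquet
    license: str = ""
    contains_pii: bool = False
    pii_types: List[str] = field(default_factory=list)  # e.g. email, ssn, phone
    known_biases: str = ""
    bias_mitigation: str = ""
    collection_method: str = ""
    preprocessing: str = ""

    def validation_errors(self) -> List[str]:
        """Assumed consistency checks; the product's real rules may differ."""
        errors = []
        for attr in ("name", "description", "version", "owner"):
            if not getattr(self, attr).strip():
                errors.append(f"{attr} is required")
        if self.contains_pii and not self.pii_types:
            errors.append("list the PII types when Contains PII is checked")
        if not self.contains_pii and self.pii_types:
            errors.append("PII types are listed but Contains PII is unchecked")
        return errors

record = DatasetRecord(
    name="Loan applications",
    description="Historical credit decisions used to train the scoring model",
    version="1.0.0",
    owner="Credit risk team",
    type=DatasetType.TRAINING,
    classification=Classification.CONFIDENTIAL,
    contains_pii=True,
)
```

Using enums for Type and Classification means a value outside the documented set fails at construction time instead of being stored silently.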

Bulk upload

If you have multiple dataset files to register at once, use the bulk upload feature instead of adding them one by one.

1. Click Bulk upload in the top right.
2. Drag and drop files (CSV, XLS, or XLSX, up to 30 MB each) or click to browse.
3. The system scans column headers for potential PII (email, phone, SSN, address, etc.) and flags any matches.
4. Review the auto-detected metadata for each file. Edit names, types, or classifications before uploading.
5. Click Upload to register all files as dataset records.
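The file limits from the upload step can be illustrated with a small pre-check that splits a batch into files that can be queued and files that would be rejected. The function names are hypothetical, and reading "30 MB" as 30 * 1024 * 1024 bytes is an assumption; the product may define the limit differently or report errors in another way.

```python
import os

ALLOWED_EXTENSIONS = {".csv", ".xls", ".xlsx"}
MAX_BYTES = 30 * 1024 * 1024  # assumption: "30 MB" read as 30 MiB

def check_upload(filename, size_bytes):
    """Return None if the file can be queued, otherwise a rejection reason."""
    ext = os.path.splitext(filename)[1].lower()  # case-insensitive extension
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {ext or 'none'}"
    if size_bytes > MAX_BYTES:
        return "file is larger than 30 MB"
    return None

def split_batch(files):
    """Partition (filename, size_in_bytes) pairs into accepted names and rejections."""
    accepted, rejected = [], {}
    for filename, size in files:
        reason = check_upload(filename, size)
        if reason is None:
            accepted.append(filename)
        else:
            rejected[filename] = reason
    return accepted, rejected

batch = [("customers.csv", 2_000_000), ("claims.XLSX", 31 * 1024 * 1024), ("notes.pdf", 10_000)]
accepted, rejected = split_batch(batch)
```

Checking each file independently lets one oversized or unsupported file be rejected without blocking the rest of the batch.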

PII auto-detection

During bulk upload, the system checks column headers against 40 known PII keywords (email, ssn, phone, salary, credit_card, etc.). If any match, the dataset is automatically flagged as containing PII. You can override this in the review step.
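A header check of this kind can be sketched as keyword matching on normalized column names. Only a few of the 40 keywords are named on this page, so `PII_KEYWORDS` below is a partial, illustrative list. The normalization step and the substring matching rule are also assumptions rather than the documented algorithm.

```python
import re

# Partial list: only keywords named on this page (the real list has about 40).
PII_KEYWORDS = ("email", "ssn", "phone", "salary", "credit_card", "address")

def normalize(header: str) -> str:
    """Lowercase and turn runs of spaces, hyphens and other punctuation into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", header.lower()).strip("_")

def detect_pii(headers):
    """Map each matching header to the keywords found in it (substring match)."""
    hits = {}
    for header in headers:
        norm = normalize(header)
        found = [kw for kw in PII_KEYWORDS if kw in norm]
        if found:
            hits[header] = found
    return hits

headers = ["Customer ID", "Email Address", "Phone-Number", "Credit Card", "loan_amount"]
flags = detect_pii(headers)
contains_pii = bool(flags)  # any match flags the whole dataset
```

Substring matching is deliberately loose: a header such as `classname` contains `ssn` and would be flagged. False positives like this are why it helps to check the flags in the review step before uploading.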

Linking datasets to models and projects

Each dataset can be linked to one or more models from the model inventory and one or more projects. These links create a traceable chain from data to model to use case, which is what auditors look for when verifying data governance.

Set these links when creating or editing a dataset. You can also view all datasets linked to a specific model from the model inventory page.
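The traceability described above amounts to storing dataset-to-model and dataset-to-project links and reading them in both directions. The sketch below assumes links are stored per dataset. `DatasetLinks`, `datasets_by_model` and `trace_gaps` are hypothetical names, and treating a dataset with no model or no project link as a gap is an illustrative rule, not a product check.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class DatasetLinks:
    # Hypothetical per-dataset link storage.
    models: Set[str] = field(default_factory=set)
    projects: Set[str] = field(default_factory=set)

def datasets_by_model(links: Dict[str, DatasetLinks]) -> Dict[str, List[str]]:
    """Reverse index: for each model, the datasets that feed it (sorted by name)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for dataset, link in sorted(links.items()):
        for model in link.models:
            index[model].append(dataset)
    return dict(index)

def trace_gaps(links: Dict[str, DatasetLinks]) -> List[str]:
    """Datasets missing a model or a project link, i.e. an incomplete data-to-use-case chain."""
    return sorted(name for name, link in links.items() if not link.models or not link.projects)

links = {
    "Loan applications": DatasetLinks({"credit-scorer-v2"}, {"Consumer lending"}),
    "Holdout set": DatasetLinks({"credit-scorer-v2"}),
    "Public census": DatasetLinks(),
}
```

The reverse index corresponds to the per-model view on the model inventory page; the gap list is the kind of question an auditor would ask of the same data.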

Dataset statuses

Status	When to use
Draft	Dataset is being documented but not yet in use
Active	Dataset is currently used by one or more models or systems
Deprecated	Dataset is being phased out and should not be used for new work
Archived	Dataset is no longer in use but kept for audit records
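The "When to use" column implies a few consistency checks between a dataset's status and its links. The sketch below is hypothetical: the product may not enforce or surface these warnings, `status_warnings` is an illustrative name, and `linked_models` stands in for however many models or systems use the dataset.

```python
from enum import Enum

class Status(Enum):
    DRAFT = "Draft"
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    ARCHIVED = "Archived"

def status_warnings(status: Status, linked_models: int, new_link: bool = False):
    """Hypothetical checks derived from the status descriptions above."""
    warnings = []
    # Active means "currently used by one or more models or systems".
    if status is Status.ACTIVE and linked_models == 0:
        warnings.append("Active but not linked to any model")
    # Archived means "no longer in use".
    if status is Status.ARCHIVED and linked_models > 0:
        warnings.append("Archived but still linked to models")
    # Deprecated (and Archived) data should not be picked up for new work.
    if new_link and status in (Status.DEPRECATED, Status.ARCHIVED):
        warnings.append(f"{status.value} datasets should not be used for new work")
    return warnings
```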

Change history

Every edit to a dataset is tracked in a change history log. This gives auditors a record of what changed, when and by whom.
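A change history entry needs at least the three things this page names: what changed, when and by whom. One minimal way to derive entries is to compare the record before and after an edit, field by field. `ChangeEntry` and `diff_record` are hypothetical names, and per-field granularity is an assumption about how the log is structured.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

@dataclass(frozen=True)
class ChangeEntry:
    field: str            # what changed
    old: Any
    new: Any
    changed_by: str       # by whom
    changed_at: datetime  # when

def diff_record(before: Dict[str, Any], after: Dict[str, Any], user: str,
                when: Optional[datetime] = None) -> List[ChangeEntry]:
    """One entry per field whose value differs; missing fields compare as None."""
    when = when or datetime.now(timezone.utc)
    entries = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            entries.append(ChangeEntry(key, old, new, user, when))
    return entries

before = {"status": "Draft", "version": "1.0.0", "owner": "Data team"}
after = {"status": "Active", "version": "1.1.0", "owner": "Data team"}
log = diff_record(before, after, user="jane@example.com")
```

Freezing `ChangeEntry` keeps entries immutable once written, which suits an audit log.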

Who can do what

Action	Required role
View datasets	Any authenticated user
Add or edit datasets	Admin or Editor
Bulk upload	Admin or Editor
Delete datasets	Admin or Editor
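The permission table can be expressed as a lookup from action to allowed roles. This sketch assumes roles are plain strings and treats any role other than Admin or Editor as view-only, since the page names no other roles. The action keys and `is_allowed` are hypothetical names.

```python
from typing import Dict, FrozenSet, Optional

WRITE_ROLES = frozenset({"Admin", "Editor"})

# None means any authenticated user may perform the action.
PERMISSIONS: Dict[str, Optional[FrozenSet[str]]] = {
    "view": None,
    "add_or_edit": WRITE_ROLES,
    "bulk_upload": WRITE_ROLES,
    "delete": WRITE_ROLES,
}

def is_allowed(action: str, role: str, authenticated: bool = True) -> bool:
    """Check an action against the table; unknown actions raise KeyError."""
    if not authenticated:
        return False
    required = PERMISSIONS[action]
    return required is None or role in required
```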

Managing model inventory

View and manage the models that use your datasets.

Use cases

Link datasets to the projects and use cases they support.
