Datasets and Benchmarks
Governance-focused datasets, not model training datasets.
16 resources
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age
A face image dataset with balanced representation across race, gender, and age groups, designed to enable bias evaluation in face analysis systems.
BIG-bench: Beyond the Imitation Game Benchmark
A collaborative benchmark for evaluating large language models across diverse tasks. Includes tasks designed to probe reasoning, knowledge, safety, and alignment properties.
HELM: Holistic Evaluation of Language Models
Stanford's comprehensive framework for evaluating language models across multiple dimensions including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
AI Incident Database Dataset
Structured dataset of AI incidents and harms for research and analysis. Enables systematic study of AI failures, harm patterns, and risk factors across different AI applications.
Groundbreaking Fairness Evaluation Dataset From Sony AI
A comprehensive fairness evaluation dataset containing 10,318 consensually sourced images of 1,981 unique subjects with extensive annotations, specifically designed to evaluate bias and fairness in AI systems.
FHIBE: Fairness Evaluation Dataset for Human-Centric Computer Vision
FHIBE is the first publicly available, consensually-collected, and globally diverse fairness evaluation dataset designed for human-centric computer vision tasks. The dataset serves as a global benchmark for ethical data collection and responsible AI development, enabling researchers and developers to evaluate fairness across diverse populations.
Fair Human-Centric Image Dataset for Ethical AI Benchmarking
The Fair Human-Centric Image Benchmark (FHIBE) is an image dataset designed to evaluate AI systems for fairness and bias in computer vision applications. It implements best practices for responsible data curation and provides standardized benchmarks for testing algorithmic fairness across diverse human populations.
Bias Detection in Computer Vision: Ensuring Fairness with AI Models
This resource explores methods for detecting bias in computer vision systems, including CNN feature descriptors and SVM classifiers for identifying bias in visual datasets. It examines how explainable AI techniques can improve transparency and trustworthiness of deep learning models used in computer vision applications.
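The feature-based idea behind this kind of bias detection can be illustrated with a minimal sketch: if a classifier trained on image features can predict a sensitive attribute well above chance, those features (and likely the dataset behind them) encode the attribute. The sketch below is not the resource's actual method; it uses a linear SVM on synthetic vectors standing in for CNN feature descriptors, and all names and numbers are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for CNN feature vectors: 200 images x 64-dim features.
# Features for group 1 are shifted, simulating a dataset where a
# sensitive attribute is encoded in the visual features.
group = rng.integers(0, 2, size=200)
features = rng.normal(size=(200, 64)) + group[:, None] * 0.8

# If an SVM can predict the sensitive attribute from the features
# well above chance (0.5), the features leak that attribute -- a
# signal of potential dataset bias.
scores = cross_val_score(SVC(kernel="linear"), features, group, cv=5)
leakage = scores.mean()
print(f"attribute-prediction accuracy: {leakage:.2f}")
```

A leakage score near 0.5 would suggest the features carry little information about the attribute; scores close to 1.0 indicate strong encoding worth investigating.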
Bias Detection Model
An English sequence classification model specifically trained on the MBAD Dataset to automatically detect bias and assess fairness in textual content, particularly news articles. This tool enables automated analysis of potential biases in written content through machine learning-based classification.
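To make the sequence-classification approach concrete, here is a toy sketch of the same idea built with TF-IDF features and logistic regression rather than the MBAD-trained model itself; the texts and labels are invented for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; labels are illustrative, not from the MBAD Dataset.
texts = [
    "The lazy workers from that region ruined the project",
    "Those people are naturally bad at mathematics",
    "Everyone knows that group cannot be trusted",
    "Typical of them to cause trouble again",
    "The committee reviewed the quarterly budget report",
    "Rainfall totals were above average this spring",
    "The study measured response times across participants",
    "The train departs from platform four at noon",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = biased, 0 = neutral

# Sequence classification reduced to its simplest form: vectorize
# the text, then fit a linear classifier on the bias labels.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["That group is always unreliable and lazy"])[0]
print(f"predicted label: {pred}")
```

A production model like the one described above would replace the TF-IDF pipeline with a fine-tuned transformer, but the interface — text in, bias label out — is the same.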
Unsupervised Bias Detection Tool
A technical implementation of the HBAC algorithm that detects bias in algorithmic decision-making systems without requiring labeled data. The tool forms clusters that maximize differences in a chosen bias metric between them, and applies statistical testing to avoid false conclusions about discriminatory patterns.
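The clustering idea behind unsupervised bias detection can be sketched as follows, using plain k-means rather than the tool's actual HBAC implementation: cluster the data without labels, then compare a bias metric (here, a model's error rate) between clusters and test whether the gap is statistically significant. All data below is synthetic.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic setting: feature vectors plus a per-sample bias variable,
# here whether a model misclassified the sample (1 = error).
features = np.concatenate([
    rng.normal(0.0, 1.0, size=(150, 5)),
    rng.normal(3.0, 1.0, size=(150, 5)),  # a distinct subpopulation
])
errors = np.concatenate([
    rng.binomial(1, 0.05, size=150),  # low error rate in one region
    rng.binomial(1, 0.40, size=150),  # high error rate in the other
])

# Cluster without any labels, then compare error rates per cluster.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
rate0 = errors[clusters == 0].mean()
rate1 = errors[clusters == 1].mean()

# Statistical test to avoid flagging noise as a discriminatory pattern.
t_stat, p_value = stats.ttest_ind(errors[clusters == 0], errors[clusters == 1])
print(f"error rates: {rate0:.2f} vs {rate1:.2f}, p={p_value:.4f}")
```

A large, significant gap in error rates between clusters flags a subpopulation the model serves poorly, without ever naming a protected attribute up front.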
2025 AI Safety Index
An AI safety index that evaluates model performance using Stanford's AIR-Bench 2024 (AI Risk Benchmark). The benchmark is designed to align with emerging government regulations and company policies for AI safety assessment.
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces version 0.5 of the AI Safety Benchmark developed by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models, providing a standardized evaluation framework for AI safety.
AI Risk & Reliability
MLCommons' AI Risk & Reliability working group develops tests and benchmarks for evaluating AI safety across specific use cases. The framework aims to summarize safety assessment results through standardized benchmarks so that non-experts can use them to inform decisions.
Responsible AI Measures Dataset for Ethics Evaluation of AI Systems
A comprehensive dataset consolidating 12,067 data points across 791 evaluation measures covering 11 ethical principles for AI systems. The dataset is extracted from 257 computing literature sources and provides standardized metrics for evaluating the ethical dimensions of AI systems.
Algorithmic Bias
Wikipedia article covering algorithmic bias, including well-documented examples such as the COMPAS criminal risk assessment software, which has been criticized for exhibiting racial bias. The article discusses how biased datasets can perpetuate and amplify discrimination in algorithmic decision-making systems.
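One of the bias patterns documented in the COMPAS debate is a disparity in false positive rates between groups, and that kind of disparity is straightforward to compute. The sketch below uses invented toy numbers, not actual COMPAS data:

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of actual negatives flagged positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)

# Toy outcomes (1 = reoffended) and risk-tool predictions (1 = high
# risk), split by a protected attribute; values are illustrative only.
group_a = {"y_true": [0, 0, 0, 0, 1, 1], "y_pred": [1, 1, 0, 0, 1, 0]}
group_b = {"y_true": [0, 0, 0, 0, 1, 1], "y_pred": [1, 0, 0, 0, 1, 1]}

fpr_a = false_positive_rate(**group_a)  # 2/4 = 0.50
fpr_b = false_positive_rate(**group_b)  # 1/4 = 0.25
disparity = fpr_a - fpr_b
print(f"FPR disparity: {disparity:.2f}")
```

A nonzero disparity means one group's non-reoffenders are flagged as high risk more often than the other's, which is the core of the criticism the article describes.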
Algorithmic Bias: Examples and Tools for Tackling Model Fairness In Production
A resource from Arize AI providing examples of algorithmic bias and practical tools for addressing model fairness issues in production environments. The resource highlights various bias mitigation tools including Google's PAIR AI tools for addressing fairness and bias in image datasets using TensorFlow.