Datasets and Benchmarks
Governance-focused datasets, not model training datasets.
16 resources
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age
A face image dataset with balanced representation across race, gender, and age groups, designed to enable bias evaluation in face analysis systems.
BIG-bench: Beyond the Imitation Game Benchmark
A collaborative benchmark for evaluating large language models across diverse tasks. Includes tasks designed to probe reasoning, knowledge, safety, and alignment properties.
HELM: Holistic Evaluation of Language Models
Stanford's comprehensive framework for evaluating language models across multiple dimensions including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
AI Incident Database Dataset
Structured dataset of AI incidents and harms for research and analysis. Enables systematic study of AI failures, harm patterns, and risk factors across different AI applications.
Groundbreaking Fairness Evaluation Dataset From Sony AI
A comprehensive fairness evaluation dataset containing 10,318 consensually sourced images of 1,981 unique subjects with extensive annotations, specifically designed to evaluate bias and fairness in AI systems.
FHIBE: Fairness Evaluation Dataset for Human-Centric Computer Vision
FHIBE is the first publicly available, consensually-collected, and globally diverse fairness evaluation dataset designed for human-centric computer vision tasks. The dataset serves as a global benchmark for ethical data collection and responsible AI development, enabling researchers and developers to evaluate fairness across diverse populations.
Fair Human-Centric Image Dataset for Ethical AI Benchmarking
The Fair Human-Centric Image Benchmark (FHIBE) is an image dataset designed to evaluate AI systems for fairness and bias in computer vision applications. It implements best practices for responsible data curation and provides standardized benchmarks for testing algorithmic fairness across diverse human populations.
Bias Detection in Computer Vision: Ensuring Fairness with AI Models
This resource explores methods for detecting bias in computer vision systems, including CNN feature descriptors and SVM classifiers for identifying bias in visual datasets. It examines how explainable AI techniques can improve transparency and trustworthiness of deep learning models used in computer vision applications.
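The feature-based idea behind this kind of bias detection can be illustrated with a minimal sketch: if a classifier trained on image features can predict a sensitive attribute well above chance, those features (and likely the dataset behind them) encode the attribute. The sketch below is not the resource's actual method; it uses a linear SVM on synthetic vectors standing in for CNN feature descriptors, and all names and numbers are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for CNN feature vectors: 200 images x 64-dim features.
# Features for group 1 are shifted, simulating a dataset where a
# sensitive attribute is encoded in the visual features.
group = rng.integers(0, 2, size=200)
features = rng.normal(size=(200, 64)) + group[:, None] * 0.8

# If an SVM can predict the sensitive attribute from the features
# well above chance (0.5), the features leak that attribute -- a
# signal of potential dataset bias.
scores = cross_val_score(SVC(kernel="linear"), features, group, cv=5)
leakage = scores.mean()
print(f"attribute-prediction accuracy: {leakage:.2f}")
```

A leakage score near 0.5 would suggest the features carry little information about the attribute; scores close to 1.0 indicate strong encoding worth investigating.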
Bias Detection Model
An English sequence classification model specifically trained on the MBAD Dataset to automatically detect bias and assess fairness in textual content, particularly news articles. This tool enables automated analysis of potential biases in written content through machine learning-based classification.
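To make the sequence-classification approach concrete, here is a toy sketch of the same idea built with TF-IDF features and logistic regression rather than the MBAD-trained model itself; the texts and labels are invented for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; labels are illustrative, not from the MBAD Dataset.
texts = [
    "The lazy workers from that region ruined the project",
    "Those people are naturally bad at mathematics",
    "Everyone knows that group cannot be trusted",
    "Typical of them to cause trouble again",
    "The committee reviewed the quarterly budget report",
    "Rainfall totals were above average this spring",
    "The study measured response times across participants",
    "The train departs from platform four at noon",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = biased, 0 = neutral

# Sequence classification reduced to its simplest form: vectorize
# the text, then fit a linear classifier on the bias labels.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["That group is always unreliable and lazy"])[0]
print(f"predicted label: {pred}")
```

A production model like the one described above would replace the TF-IDF pipeline with a fine-tuned transformer, but the interface — text in, bias label out — is the same.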
Unsupervised Bias Detection Tool
A technical implementation of the HBAC algorithm that detects bias in algorithmic decision-making systems without requiring labeled data. The tool forms clusters that maximize differences in a chosen bias metric between them, and applies statistical testing to avoid false conclusions about discriminatory patterns.
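The clustering idea behind unsupervised bias detection can be sketched as follows, using plain k-means rather than the tool's actual HBAC implementation: cluster the data without labels, then compare a bias metric (here, a model's error rate) between clusters and test whether the gap is statistically significant. All data below is synthetic.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic setting: feature vectors plus a per-sample bias variable,
# here whether a model misclassified the sample (1 = error).
features = np.concatenate([
    rng.normal(0.0, 1.0, size=(150, 5)),
    rng.normal(3.0, 1.0, size=(150, 5)),  # a distinct subpopulation
])
errors = np.concatenate([
    rng.binomial(1, 0.05, size=150),  # low error rate in one region
    rng.binomial(1, 0.40, size=150),  # high error rate in the other
])

# Cluster without any labels, then compare error rates per cluster.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
rate0 = errors[clusters == 0].mean()
rate1 = errors[clusters == 1].mean()

# Statistical test to avoid flagging noise as a discriminatory pattern.
t_stat, p_value = stats.ttest_ind(errors[clusters == 0], errors[clusters == 1])
print(f"error rates: {rate0:.2f} vs {rate1:.2f}, p={p_value:.4f}")
```

A large, significant gap in error rates between clusters flags a subpopulation the model serves poorly, without ever naming a protected attribute up front.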
2025 AI Safety Index
An AI safety index that evaluates model performance using Stanford's AIR-Bench 2024 (AI Risk Benchmark). The benchmark is designed to align with emerging government regulations and company policies for AI safety assessment.
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces version 0.5 of the AI Safety Benchmark developed by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models, providing a standardized evaluation framework for AI safety.
AI Risk & Reliability
MLCommons' AI Risk & Reliability working group develops tests and benchmarks for evaluating AI safety across specific use cases. The framework aims to summarize safety assessment results through standardized benchmarks so that non-experts can use them to inform decisions.
Responsible AI Measures Dataset for Ethics Evaluation of AI Systems
A comprehensive dataset consolidating 12,067 data points across 791 evaluation measures covering 11 ethical principles for AI systems. The dataset is extracted from 257 computing literature sources and provides standardized metrics for evaluating the ethical dimensions of AI systems.
Algorithmic Bias
Wikipedia article covering algorithmic bias, including well-documented examples such as the COMPAS criminal risk assessment software, which has been criticized for exhibiting racial bias. The article discusses how biased datasets can perpetuate and amplify discrimination in algorithmic decision-making systems.
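One of the bias patterns documented in the COMPAS debate is a disparity in false positive rates between groups, and that kind of disparity is straightforward to compute. The sketch below uses invented toy numbers, not actual COMPAS data:

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of actual negatives flagged positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)

# Toy outcomes (1 = reoffended) and risk-tool predictions (1 = high
# risk), split by a protected attribute; values are illustrative only.
group_a = {"y_true": [0, 0, 0, 0, 1, 1], "y_pred": [1, 1, 0, 0, 1, 0]}
group_b = {"y_true": [0, 0, 0, 0, 1, 1], "y_pred": [1, 0, 0, 0, 1, 1]}

fpr_a = false_positive_rate(**group_a)  # 2/4 = 0.50
fpr_b = false_positive_rate(**group_b)  # 1/4 = 0.25
disparity = fpr_a - fpr_b
print(f"FPR disparity: {disparity:.2f}")
```

A nonzero disparity means one group's non-reoffenders are flagged as high risk more often than the other's, which is the core of the criticism the article describes.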
Algorithmic Bias: Examples and Tools for Tackling Model Fairness In Production
A resource from Arize AI providing examples of algorithmic bias and practical tools for addressing model fairness issues in production environments. The resource highlights various bias mitigation tools including Google's PAIR AI tools for addressing fairness and bias in image datasets using TensorFlow.