Representative Data

For an algorithm to be effective, its training data must be representative of the communities that it may impact. The way that you collect and organize data will benefit certain groups while excluding or harming others.




Have you considered...?

  • Exploring how your data might be incomplete or skewed, or encode historical biases
  • Including diverse voices in the data definition and collection process
  • Partnering with social service agencies for outreach to vulnerable groups
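One way to start exploring whether a dataset is incomplete or skewed is to compare each group's share of the records with its share of the population the system is meant to serve. The following is a minimal sketch, not a complete audit: the function name `representation_gaps`, the group labels, and all numbers are hypothetical, and it assumes you have a trustworthy reference breakdown (for example, from census data) to compare against.

```python
from collections import Counter


def representation_gaps(groups, reference_shares):
    """Return (dataset share - reference share) for each reference group.

    groups: one group label per record in the dataset.
    reference_shares: mapping of group label to its expected share (0 to 1).
    A negative gap means the group is under-represented in the dataset.
    """
    counts = Counter(groups)
    total = len(groups)
    gaps = {}
    for group, ref_share in reference_shares.items():
        data_share = counts.get(group, 0) / total if total else 0.0
        gaps[group] = data_share - ref_share
    return gaps


# Hypothetical toy values for illustration only; not real data.
sample_groups = ["A"] * 8 + ["B"] * 2
reference = {"A": 0.5, "B": 0.3, "C": 0.2}

gaps = representation_gaps(sample_groups, reference)
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.2f}")
```

In this toy example, group C is absent from the dataset entirely, which a count of the records alone would not reveal. A check like this only flags gaps in headcount; it says nothing about historical bias encoded in the records themselves, which still calls for input from the communities involved.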

Case study

Researchers in Germany, the USA, and France developed an algorithm that detects skin cancer more accurately than dermatologists. The system finds 95% of melanomas, versus 89% for the dermatologists. But it is effective only on light skin tones, because a demographically diverse dataset has never been collected.
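A single headline detection rate can hide this kind of gap. A simple check is to compute the detection rate (sensitivity) separately for each group in a labeled evaluation set. The sketch below assumes such a set exists with a group label on every record; the function name `sensitivity_by_group`, the group names, and the records are hypothetical and are not taken from the study described above.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute the share of true cases that were detected, per group.

    records: iterable of (group, has_condition, predicted_positive) tuples.
    Groups with no true cases are omitted, since sensitivity is undefined.
    """
    true_positives = defaultdict(int)
    actual_positives = defaultdict(int)
    for group, has_condition, predicted_positive in records:
        if has_condition:
            actual_positives[group] += 1
            if predicted_positive:
                true_positives[group] += 1
    return {g: true_positives[g] / actual_positives[g] for g in actual_positives}


# Hypothetical toy records for illustration only; not real results.
records = (
    [("group_1", True, True)] * 3
    + [("group_1", True, False)]
    + [("group_2", True, True)]
    + [("group_2", True, False)] * 3
    + [("group_2", False, False)]
)

rates = sensitivity_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

If a group has very few true cases in the evaluation set, its rate will be unstable, and that scarcity is itself a sign that the data may not be representative.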

Have you engaged with...?

  • Affected communities
  • Subject matter experts
  • Civil rights organizations

AI Blindspot Cards

PURPOSE

AI systems should make the world a better place. Defining a shared goal guides decisions across the lifecycle of an algorithmic decision-making system, promoting trust amongst individuals and the public.

ABUSABILITY

The designers of an AI system need to anticipate vulnerabilities and dual-use scenarios by modeling how bad actors might hijack and weaponize the system for malicious activity.

PRIVACY

AI systems often gather personal information in ways that can invade our privacy. Systems storing confidential data can also be vulnerable to cyberattacks that result in devastating breaches of personal information.

DISCRIMINATION BY PROXY

An algorithm can have an adverse effect on vulnerable populations even without explicitly including protected characteristics. This often occurs when a model includes features that are correlated with these characteristics.

EXPLAINABILITY

The technical logic of algorithms is complex, which makes their recommendations unclear. People involved in designing and deploying algorithmic systems have a responsibility to explain high-stakes decisions that affect individuals' well-being.

OPTIMIZATION CRITERIA

There are trade-offs and potential externalities when determining an AI system's metrics for success. It is important to balance performance metrics against the risk of negatively impacting vulnerable populations.

GENERALIZATION ERROR

Between building and deploying an AI system, conditions in the world may change, or may differ from the context for which the system was designed, so that the training data are no longer representative.

RIGHT TO CONTEST

Like any human process, AI systems carry biases that make them subjective and imperfect. The right to contest an algorithmic decision can surface inaccuracies and grant agency to people affected.

OVERSIGHT

Ethical principles, standards, and policies are futile unless monitored and enforced. A diverse oversight body vested with formal authority can help to establish and maintain transparency, accountability, and sanctions.

CONSULTATION

The first, last, and every step in between should include public participation. AI practitioners must enable meaningful input, explanations, and disclosures to ensure that AI systems promote human flourishing and mitigate harms.

BLANK TEMPLATE

Create a new card.

ABOUT

The AI Blindspot cards were developed by Ania Calderon, Dan Taber, Hong Qu, and Jeff Wen during the Berkman Klein Center Assembly program.

COPYRIGHT

This work is licensed under a Creative Commons Attribution 4.0 International License.