Understanding Unconscious Bias in Machine Learning
Chapter 1: The Dual Systems of Thinking
In recent weeks, I have been engrossed in Daniel Kahneman's acclaimed book, "Thinking, Fast and Slow," which delves into the two systems that shape our thought processes. Kahneman, a Nobel laureate in economics, personifies these systems as characters with unique traits that interact with one another. He introduces us to System 1, which operates quickly, intuitively, and emotionally, and System 2, which is more deliberate and logical, albeit slower. Through this narrative, Kahneman illustrates how their interplay can help explain various phenomena, such as why we tend to believe statements presented in bold or why judges might deny parole more frequently before lunch. Additionally, he emphasizes how our cognitive processes influence our understanding of sampling and probability.
Given these insights, it’s crucial to consider their implications for Data Science, where accurate sampling and minimizing bias are essential for obtaining reliable results. To begin, let’s summarize how Kahneman describes the functions of System 1 and System 2.
A dynamic interplay of two cognitive systems.
Section 1.1: The Interplay of System 1 and System 2
Kahneman portrays System 1 and System 2 as both oppositional and interconnected. System 1 is responsible for our thought processes 95% of the time, primarily because we often encounter familiar situations. In such cases, our brain opts not to engage System 2, as the situation at hand resembles prior experiences.
Consider the following scenarios governed by System 1:
- Calculating 2+2
- Naming your mother
- Driving on a highway
- Comprehending simple phrases
- Reading this text
Thus, System 1 operates swiftly, automatically, and unconsciously, taking charge when tasks are familiar. It generally does not activate System 2 unless necessary.
In addition to these characteristics, System 1 is also guided by emotions and stereotypes, often reacting based on feelings or previous experiences rather than contextual information.
Now, imagine instead of adding 2+2, you were asked to solve 23 × 34. Take a moment to think about that…
The thought process would involve recalling the multiplication method learned in school, which requires more mental effort. As Kahneman notes, this task would activate System 2, which is characterized by deliberate and strenuous thinking.
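The deliberate effort involved becomes easy to see if we spell out the schoolbook method step by step. A minimal sketch in Python, using nothing beyond the example in the text:

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook multiplication: multiply a by each digit of b,
    shifting each partial product by that digit's place value."""
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place  # each line is a deliberate "System 2" step
        total += partial
    return total

print(long_multiply(23, 34))  # 782, same as 23 * 34
```

Each partial product is a separate, attention-demanding step, which is exactly why the task recruits System 2 while 2+2 does not.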
System 2 comes into play in tasks like:
- Maintaining attention on a single task while other demands compete for it
- Searching for someone in a crowd with specific traits
- Parking in a tight spot
- Verifying a complex argument
- Counting occurrences of a letter on a page
These tasks demand our attention, necessitating the logical and rational capabilities of System 2.
While we like to view ourselves as rational decision-makers, it’s important to recognize that we operate under System 1's influence most of the time. However, this doesn't imply we lack rationality; rather, System 1's reasoning is based on known experiences. When faced with challenges, System 1 calls upon System 2 for more nuanced processing.
The relationship between these two systems is essential to understanding how our automatic and emotional responses can lead to various biases.
Chapter 2: Implications of Unconscious Bias in Data Science
The first video titled "Tutorial: Implicit Bias I" explores the concept of implicit bias, providing insights that are particularly relevant to our discussion on unconscious biases in data science.
As we explore the effects of these cognitive systems on our work in Data Science, we’ll focus on two specific biases: priming and anchoring.
Section 2.1: Understanding Priming
Priming refers to the influence that exposure to one stimulus can have on our response to a subsequent stimulus. For example, if you’ve recently heard the word "EAT," you’re more likely to complete "SO_P" as "SOUP" rather than "SOAP." This effect extends beyond words to actions and emotions.
The challenge with priming is that it often operates outside our awareness. For instance, if we have already seen the mean of one dataset, we may unconsciously let that value color our judgments about subsequent samples.
The most concerning form of priming bias comes from external influences. For example, if a marketing team runs an online campaign that shapes customer expectations, these pre-formed ideas can skew how we analyze and interpret data.
Section 2.2: The Anchoring Effect
Anchoring is a cognitive bias where initial information disproportionately influences our subsequent decisions. This concept closely relates to priming, as exposure to one stimulus can anchor our perceptions.
For instance, when evaluating property quality through image recognition, we might anchor our judgments based on previously labeled images, which could distort our understanding of market averages.
Moreover, if our target variable—like house prices—is influenced by individuals who are themselves subject to anchoring biases, we must tread carefully in our predictions.
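A hypothetical simulation can make this risk tangible: suppose each label is a blend of a property's true price and an anchor the labeler happened to see first. A model fit to those labels then inherits the anchor's pull on the market average. The price range, the anchor value, and the blending weight below are all assumptions for illustration:

```python
import random

random.seed(0)

# True prices, uniformly spread between 200k and 400k
true_prices = [random.uniform(200_000, 400_000) for _ in range(500)]

anchor = 150_000     # a low listing the labelers saw first (assumed)
anchor_weight = 0.3  # how strongly each label is pulled toward the anchor (assumed)

# Each label drifts toward the anchor, mimicking the labeler's bias.
labels = [(1 - anchor_weight) * p + anchor_weight * anchor for p in true_prices]

true_mean = sum(true_prices) / len(true_prices)
label_mean = sum(labels) / len(labels)

print(round(true_mean))   # the real market average
print(round(label_mean))  # systematically lower: the anchor has shifted the whole dataset
```

Even a model that fits these labels perfectly would reproduce `label_mean`, not `true_mean`; no amount of tuning downstream can recover an average that the labeling process itself has displaced.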
In conclusion, both priming and anchoring pose significant risks throughout the data science process, from data collection to analysis and prediction. As data scientists, it’s imperative to approach our work with a critical mindset, to stay aware of our preconceived beliefs, and to rigorously analyze our data.
To delve deeper into these topics, I recommend watching Kahneman's engaging talk at Google for further insights into his work.
The second video titled "Bias in AI: How to Measure It and How to Fix It" offers valuable strategies for identifying and addressing biases within AI systems.
Feel free to explore more of my writings on Medium for additional insights and discussions on related topics. Thank you for your time, and I look forward to our next conversation!