Bias in ML: An Introduction
An Exploration of Fairness and Dynamics of Learning Systems
Discussion led by Imani Finkley & Rahma Tasnim, Intelligent Systems Subteam
Let’s face it: whether it’s a recommended song on Spotify or that ad for a trip to the Bahamas that pops up just when you are ready for vacation, algorithms are making decisions about our lives every day. And now, they have even entered the hiring process.
In response to the pandemic, companies like Uber and Postmates started using Checkr, an AI-powered background check, in their interview process. Checkr scans resumes, analyzes facial expressions during video interviews, compares criminal records, and judges social media behavior. Algorithms like Checkr are used in an effort to streamline the hiring process and eliminate any prejudice a human being might have when hiring. While Checkr’s goal is to ensure that “all candidates deserve a fair chance to work,” some questions arise: do you look fit for the job? What is the look that Checkr is searching for? These questions point toward a potential risk of Checkr perpetuating bias based on aspects of a job candidate’s identity. Checkr is already facing lawsuits over misidentifications and misjudgments made by its AI, but that does not mean that algorithms like Checkr won’t become mainstream in the near future. That leaves the question: how do we mitigate bias in algorithms so AIs like Checkr can ensure that all candidates get a fair chance?
How can bias be harmful?
But first, let’s build a deeper understanding of how biases are harmful and why we should care. Bias is not in the algorithm itself. Rather, the flaw is in the information that we feed the algorithm. The machine is simply replicating social biases: prejudices in favor of or against one thing, person, or group compared with another. Algorithmic biases point to correlations that by themselves are not harmful. The issue occurs when a model and its data are taken at face value without considering any patterns or preferences the algorithm may have adopted.
However, these correlations can be useful indicators for decision-making systems. Netflix is able to recommend movies based on what you have previously watched. If you liked this Christmas Hallmark movie last year, then you must also like this year’s new Christmas Hallmark movie. Spotify is able to create different mixes based on the different genres of music you listen to. If you tend to listen to lo-fi music at 3 am, then Spotify will create a lo-fi mix for you to listen to at 3 am. Yet this trend of “if you like X, then we can recommend you something like X” is not always what we want. As consumers, we may want to try watching horror films instead. We may want to listen to rock at 3 am instead. In the context of job applications, employers may be looking to diversify their group of employees.
One aspect of AI interview systems is that they compare your video to those of previously accepted hires and conclude that, if yours is similar, you should be hired. If the system sees that you speak differently or that your hair does not look “acceptable,” that can negatively impact your chances of being hired. According to the algorithm, individuals who get hired will look like current employees, and each such hire makes the next similar hire more likely, creating a positive feedback loop. This allows neither diversification nor the opportunity for new and different individuals to enter the workforce.
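To make the feedback loop concrete, here is a minimal toy sketch in Python. It is not how Checkr or any real hiring system works. The function name, the two groups "A" and "B", and the scoring rule (an applicant's score is simply the share of current staff in the same group) are all assumptions made for illustration. The applicant pool is perfectly balanced in every round, so any drift in the workforce comes only from the similarity-based scoring.

```python
def simulate_similarity_hiring(initial_a, initial_b, rounds, hires_per_round=10):
    """Toy model (hypothetical): return group A's workforce share after each round.

    Every round, an equal number of applicants from groups "A" and "B" apply.
    Each applicant is scored only by how common their group already is among
    current staff, standing in for "similarity to previous hires".
    """
    staff = {"A": initial_a, "B": initial_b}
    shares = []
    for _ in range(rounds):
        total = staff["A"] + staff["B"]
        # Balanced pool: hires_per_round applicants from each group.
        pool = ["A", "B"] * hires_per_round
        # Rank applicants by similarity score, highest first.
        # sorted() is stable, so tied applicants keep their alternating order.
        ranked = sorted(pool, key=lambda group: staff[group] / total, reverse=True)
        # Hire the top-ranked applicants and add them to the staff.
        for group in ranked[:hires_per_round]:
            staff[group] += 1
        shares.append(staff["A"] / (staff["A"] + staff["B"]))
    return shares


if __name__ == "__main__":
    # A workforce that starts 60% group A, hiring from a 50/50 applicant pool.
    for round_number, share in enumerate(simulate_similarity_hiring(60, 40, rounds=5), 1):
        print(f"round {round_number}: group A share = {share:.3f}")
```

In this toy setup, a workforce that starts with even a modest majority from one group hires only from that group, and the majority grows every round even though the applicants are evenly split. A workforce that starts perfectly balanced stays balanced, which shows that the skew comes from the historical data rather than from the applicants themselves.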
The task of handling bias within hiring is daunting but critical. As a stepping stone, we intend to identify common biases in specific machine learning models and to explore existing work in this field. Using two machine learning models, CLIP+ and DALL-E, the goal is to explore and understand the biases within these models. CLIP+ and DALL-E are both trained in language comprehension and image generation. If given a caption, CLIP+ and DALL-E will generate an image based on that description.
Disclaimer: some content may be considered offensive to some viewers.
In order to demonstrate how prevalent bias is in machine learning, we are looking at these models’ generated images that exemplify identity biases (prejudices towards a certain race, income, sexual orientation, religion, gender, etc.). The examples below are images produced by CLIP+ and DALL-E when given the prompts, “terrorist”, “criminal”, and “doctor.”
Immediately, one can identify characteristics of these images that perpetuate certain stereotypes. Now suppose a security AI was tasked with identifying criminals from surveillance footage. Based on what CLIP+ has learned, those “criminals” will likely be males with brown skin and Afro-textured hair, whereas if the same task were based on what DALL-E has learned, that AI would be likely to flag males with lighter skin tones as criminals. As a result, a system designed to build a safer community instead isolates a specific demographic and unfairly criminalizes it.
This is just one instance where unmitigated algorithmic biases can have serious ramifications on our lives. Rather than finding your new jam on Spotify, what if an AI decides that your hair makes you unfit for a job or that your gender makes you less likely to pay back a loan? If CLIP+ and DALL-E were used to generate the ideal job candidate, what image would they produce? Moreover, how do we account for the possible biases within their results? These are the questions we are aiming to answer. Specifically, we are currently attempting to develop bias mitigation measures to ensure that CLIP+ and DALL-E generate representative and non-skewed images to reduce the possibility of such unfair decisions.
This is the first part in our series on Bias in Machine Learning. In our next blog post, we will explore hidden biases, understanding representative data, and labeling protected classes. If you are interested in learning more about our work, follow our blog series, Bias in ML, or contact email@example.com.