About this series

I'm Ethan Weinberger. After a brief stint in the hedge fund world, I'm now a Ph.D. student in machine learning at the University of Washington.

The world of AI has had some real breakthroughs mixed with massive amounts of cash and wildly speculative claims - a perfect recipe for BS. Kernels of Truth takes a deep dive into recent work in the field to determine whether reality matches up with the hype.

twitter: @efweinberger, email: ethan [at] honestyisbest [dot] com.



July 14, 2020

RoboCop and You: Why Facial Recognition Discriminates

Machine learning research moves fast - fast enough that we're still very far from having any kind of strict code of ethics or set of regulations as in medicine. As a result, pseudoscience-esque research has a tendency to crop up every now and then, as it did two weeks ago when Harrisburg University issued a now-deleted press release praising one of its research groups for developing an algorithm to predict a person's "criminality" (i.e., likelihood of committing a crime in the future) using only a single picture of their face. Supposedly this study had even made it through peer review and was scheduled to be published by Springer Nature as part of an upcoming book series. However, since the initial press release the work has been near-universally condemned by the academic world, with an open letter calling on Springer to rescind publication of the study receiving 1700+ signatures. Springer responded quickly to clear its name: fortunately, it turns out the study had in fact been rejected during peer review and was never scheduled to be published.

This wasn't the first time such a study has cropped up and caused controversy in the AI community; a similar 2016 work from Shanghai Jiaotong University also attempted to predict criminality from facial features and was likewise denounced by the wider community for its potential to be misused. It's easy to understand why such work continues to surface. Law enforcement agencies are very interested in AI systems that could make their jobs easier, like so-called predictive policing technologies that flag potential criminals, as well as tools to track down suspects after a crime has been committed. While the academic community has been consistent about reminding the world that phrenology with robots is still phrenology, the large profits to be made in this space have led some tech companies to be more flexible than others with regard to potential ethical concerns.

Back in 2011, Eric Schmidt - then executive chairman of Google - claimed that the company wouldn't develop any kind of facial recognition database, stating that it "cross[ed] the creepy line". However, Schmidt cautioned that other companies wouldn't be afraid to cross that line, and it wasn't long before he was proven right. Amazon released its Rekognition platform for facial recognition in 2016, and has explicitly marketed the technology to law enforcement agencies. These efforts have been successful, with Rekognition customers including national agencies like ICE as well as local police departments. Microsoft, too, began pitching its services to law enforcement as early as 2017.

The unchecked proliferation of such technologies within government agencies is extremely concerning. In addition to the Orwellian-style surveillance that they enable, these technologies still have many unsolved failure cases that can lead to disastrous consequences when used by law enforcement. As we briefly touched on last week, academic research labs have demonstrated that these technologies have major performance issues when tested on persons of color, with error rates approaching 35% for dark-skinned women compared to less than 1% for white men. Similarly, an ACLU study found that Amazon's Rekognition erroneously matched 28 members of Congress with publicly available mugshots. Once again, persons of color were far more likely to be misclassified. These results aren't due to researchers using pared-down models or inadequate computing infrastructure; they were achieved across multiple "production-ready" systems available to the general public for purchase. Indeed, we're already seeing real-life consequences of these shortcomings. In January, Robert Julian-Borchak Williams, a black man, was arrested by the Detroit Police Department for a shoplifting crime he didn't commit. The reason? A facial recognition system mistakenly identified him as the perpetrator. As a result of this algorithmic failure, he spent 30 hours in police custody, missing his first day of work in four years, and was forced to waste another day two weeks later appearing in court before the prosecutor dismissed the charges against him.

Why do these technologies fail in discriminatory ways? The short answer is that they're prone to encoding their creators' biases, whether conscious or otherwise. To better understand this phenomenon, let's consider a technology from before the deep learning revolution: the Microsoft Kinect. The Kinect is an accessory originally designed for the Xbox 360 gaming console that added facial recognition and motion control to the console's capabilities. The Kinect relied on so-called classical (i.e., pre-deep learning) computer vision algorithms handcrafted by domain experts. In particular, to detect a user's face, the accessory's algorithm relied on the contrast between the face and the background. As a result, some darker-skinned users found that their faces weren't recognized in poorly lit environments. Notably, this problem only manifested with the accessory's facial recognition capabilities and not with its gesture detection algorithms, which utilized infrared sensors that aren't affected by lighting conditions. Because the Kinect's algorithms were entirely designed by humans, it's easy to understand why the Kinect failed in the way that it did. However, with deep learning in the mix, the situation gets more complicated.

Deep learning vision systems don't rely on premade, hardcoded features in the way that classical computer vision algorithms do. Rather, when given a labelled dataset (e.g., images of cats and dogs), these systems adjust their parameters to learn features (e.g., sharper ears vs. floppy ears) that distinguish classes of images from each other, without additional input from a human. How does this work? We can represent an $n \times m$ image as a matrix, $M \in \mathbb{R}^{n \times m}$. Convolutional neural networks, the workhorses of deep learning vision systems, learn a set of filter matrices. A filter is a smaller matrix (e.g., the $i$-th filter $F_{i} \in \mathbb{R}^{k \times k}$, with $k$ smaller than both $n$ and $m$) meant to detect the presence of a specific feature (e.g., a type of edge or squiggle) in a given portion of an image. To do this, the filter is multiplied elementwise with that portion of the image and the products are summed, producing a single number that measures how strongly the feature is present there. The filter then "slides" across the whole image, repeating this computation at every spatial location, thereby finding all instances of the particular feature it detects. Visually, if we have an image $M \in \mathbb{R}^{7 \times 7}$ and a filter $F_{i} \in \mathbb{R}^{3 \times 3}$, this procedure looks like the following, where we represent the filter's current position using red text

$$ \def\red#1#2{\color{red}{x_{#1#2}}} \begin{bmatrix} \red{1}{1} & \red{1}{2} & \red{1}{3} & x_{14} & x_{15}& x_{16} & x_{17} \\ \red{2}{1} & \red{2}{2} & \red{2}{3} & x_{24} & x_{25}& x_{26} & x_{27} \\ \red{3}{1} & \red{3}{2} & \red{3}{3} & x_{34} & x_{35}& x_{36} & x_{37} \\ x_{41} & x_{42} & x_{43} & x_{44} & x_{45}& x_{46} & x_{47} \\ x_{51} & x_{52} & x_{53} & x_{54} & x_{55}& x_{56} & x_{57} \\ x_{61} & x_{62} & x_{63} & x_{64} & x_{65}& x_{66} & x_{67} \\ x_{71} & x_{72} & x_{73} & x_{74} & x_{75}& x_{76} & x_{77} \\ \end{bmatrix} \implies $$

$$ \begin{bmatrix} x_{11} & \red{1}{2} & \red{1}{3} & \red{1}{4} & x_{15}& x_{16} & x_{17} \\ x_{21} & \red{2}{2} & \red{2}{3} & \red{2}{4} & x_{25}& x_{26} & x_{27} \\ x_{31} & \red{3}{2} & \red{3}{3} & \red{3}{4} & x_{35}& x_{36} & x_{37} \\ x_{41} & x_{42} & x_{43} & x_{44} & x_{45}& x_{46} & x_{47} \\ x_{51} & x_{52} & x_{53} & x_{54} & x_{55}& x_{56} & x_{57} \\ x_{61} & x_{62} & x_{63} & x_{64} & x_{65}& x_{66} & x_{67} \\ x_{71} & x_{72} & x_{73} & x_{74} & x_{75}& x_{76} & x_{77} \\ \end{bmatrix} \implies $$

$$\vdots$$

$$ \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} & x_{15}& x_{16} & x_{17} \\ x_{21} & x_{22} & x_{23} & x_{24} & x_{25}& x_{26} & x_{27} \\ x_{31} & x_{32} & x_{33} & x_{34} & x_{35}& x_{36} & x_{37} \\ x_{41} & x_{42} & x_{43} & x_{44} & x_{45}& x_{46} & x_{47} \\ x_{51} & x_{52} & x_{53} & x_{54} & \red{5}{5} & \red{5}{6} & \red{5}{7} \\ x_{61} & x_{62} & x_{63} & x_{64} & \red{6}{5} & \red{6}{6} & \red{6}{7} \\ x_{71} & x_{72} & x_{73} & x_{74} & \red{7}{5} & \red{7}{6} & \red{7}{7} \\ \end{bmatrix} $$
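In code, this sliding computation can be sketched in a few lines of NumPy. The hand-written vertical-edge filter below is purely illustrative - in an actual network, the filter values are learned rather than chosen by hand, as discussed next.

```python
import numpy as np

def convolve2d(image, filt):
    """Slide a k x k filter over an n x m image (no padding, stride 1).

    At each position, the filter is multiplied elementwise with the image
    patch beneath it and the products are summed, giving one entry of the
    output feature map.
    """
    n, m = image.shape
    k = filt.shape[0]
    out = np.zeros((n - k + 1, m - k + 1))
    for i in range(n - k + 1):
        for j in range(m - k + 1):
            patch = image[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * filt)
    return out

# A 7x7 "image", matching the figures above, and a 3x3 filter that
# responds strongly to vertical edges.
image = np.random.rand(7, 7)
vertical_edge_filter = np.array([[1., 0., -1.],
                                 [1., 0., -1.],
                                 [1., 0., -1.]])

feature_map = convolve2d(image, vertical_edge_filter)
print(feature_map.shape)  # (5, 5): one response per filter position
```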

Importantly, the features that the filters detect aren't pre-specified by a human; during training, the model adjusts its filters to detect discriminative features on its own. For example, if we want a model to distinguish cats from dogs in images, two filters might learn to detect straight lines and curvy ones, respectively. The curve filter being activated could indicate the presence of a dog (with its generally floppier ears), while the straight-line filter could indicate the presence of a cat (with its sharper ears). However, the fact that these algorithms learn which features are discriminative without additional human input does not eliminate bias. Instead, biases manifest in subtler ways that can be harder to debug. A common source of bias in deep learning systems is bias in the training data itself, since the filters can only learn to fit the data they're given. As a toy example, suppose we wanted to train a model to detect the presence of a dog in a given image. If the training data contained only golden retrievers and no other dog breeds, our model would learn filters that make a great golden retriever detector, but it would likely fall apart when fed images of other breeds.
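To make this concrete, here's a minimal sketch of a small convolutional classifier and a single training step, assuming PyTorch is available: the filters are just parameters adjusted by gradient descent, and the only signal they ever receive comes from the training data. The architecture and tensor shapes are arbitrary choices for illustration, not the design of any system discussed above.

```python
import torch
import torch.nn as nn

# A tiny convolutional classifier for a two-class (cat vs. dog) problem.
# Each Conv2d layer holds a stack of small filters; their values start
# out random and are adjusted by gradient descent, not written by hand.
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 2),  # one score per class
)

# One training step on a stand-in batch. The filters only ever "see" the
# (image, label) pairs we feed them, so whatever regularities (or biases)
# exist in that data are what the filters learn to encode.
images = torch.randn(8, 3, 64, 64)   # stand-in for a batch of photos
labels = torch.randint(0, 2, (8,))   # stand-in labels: 0 = cat, 1 = dog
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()  # gradients that will nudge the filter values
```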

Similar phenomena are potentially behind the consistent problems that facial recognition systems have with darker-skinned faces; most large openly available face datasets are dominated by Caucasian faces. Combine this with the fact that some historically marginalized racial groups are overrepresented in official crime records, and we have the ingredients for a system with the potential to massively amplify existing inequalities in policing. Dataset bias is just one of many ways that modern AI systems learn to encode human biases, and AI fairness is a whole field of research unto itself. For a deeper introduction to the field, I'd highly recommend this video series led by Timnit Gebru and Emily Denton, two research scientists on Google's ethical AI team.
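The first line of defense against this kind of failure is simply measuring it. Below is a minimal NumPy sketch of two checks implied by the discussion above: how balanced a training set is across demographic groups, and whether a trained model's error rate differs by group. The group labels and numbers here are made up for illustration.

```python
import numpy as np

def group_proportions(groups):
    """Fraction of examples belonging to each demographic group."""
    values, counts = np.unique(groups, return_counts=True)
    return {str(v): float(c) / len(groups) for v, c in zip(values, counts)}

def error_rate_by_group(y_true, y_pred, groups):
    """Classifier error rate computed separately for each group."""
    return {
        str(g): float(np.mean(y_true[groups == g] != y_pred[groups == g]))
        for g in np.unique(groups)
    }

# Hypothetical dataset audit: is the training set dominated by one group?
train_groups = np.array(["lighter"] * 90 + ["darker"] * 10)
print(group_proportions(train_groups))   # {'darker': 0.1, 'lighter': 0.9}

# Hypothetical evaluation audit: do errors fall evenly across groups?
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 0, 1])
groups = np.array(["lighter"] * 4 + ["darker"] * 4)
print(error_rate_by_group(y_true, y_pred, groups))
# -> {'darker': 0.5, 'lighter': 0.25}: a gap like this is the first sign
#    of the disparate performance described above.
```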

In an abrupt turn of events, against the backdrop of massive demonstrations against police brutality and racial injustice across the US, many large American tech companies that had been developing these systems, including Amazon, Microsoft, and IBM, have paused sales of these technologies to law enforcement agencies. However, the details vary from company to company. On the one hand, Microsoft has committed to stopping the sale of such technologies to law enforcement until Congress regulates their use, and IBM has abandoned its efforts to develop the technology altogether. On the other hand, Amazon has committed only to a one-year moratorium on its sales. In other words, a year from now, when the national spotlight on issues of racial justice may have faded, Amazon may be back to selling its platform to law enforcement as if nothing happened. Moreover, even though the larger companies attract the most press attention, their gestures are largely symbolic. Part of the reason Amazon et al. have at least temporarily stopped selling their platforms is that they're not the main players in the space, and their sales of facial recognition technology don't represent a meaningful portion of their business. The business of facial recognition software for policing is instead dominated by a handful of lesser-known companies, such as Clearview AI, iOmniscient, and NEC. As such, despite the press around the tech megacorps' recent decisions, it's clear that more substantive action must be taken to stop the unchecked spread of these technologies.

Given the outstanding flaws in these systems and the severe consequences of their misuse, it's clear that meaningful regulation is needed to protect the public from potential abuses by both government and the private sector. In an encouraging sign, local governments in tech hubs, including San Francisco and Boston, have banned the use of facial recognition technologies by local government agencies entirely. Perhaps more significantly, Washington state recently became the first state to enact a comprehensive facial recognition bill into law. Rather than ban the technology outright, the bill requires that any real-time government facial recognition system come with an API allowing independent third-party evaluation of the system, that the system only be deployed after obtaining a warrant, and that any results from the system be verified by a human before action is taken on them. The bill isn't perfect - the ACLU has noted that it contains a broadly worded provision allowing the government to bypass the warrant requirement if "exigent circumstances exist" - but it's far preferable to the lack of regulation in most areas of the country. At the federal level, Congressional Democrats are attempting to rein in the use of facial recognition tech nationwide as part of the George Floyd Justice in Policing Act, though Senate Republicans and the Trump Administration have already declared their opposition to the bill.

Facial recognition technology is already shaping up to be a civil liberties disaster. The tech is known to have major flaws, with particularly bad impacts for historically disadvantaged communities. Moreover, despite these well-known flaws, the potential for profit is just too great for industry to self-regulate on usage of the technology. Without meaningful action from the federal government it’s near certain that, outside the few areas of the country with regulation already in place, the tech will spread unchecked and only exacerbate the issues that have led to the largest protests in recent US history.

Ethan Weinberger

Click here to read more from Kernels of Truth.


What is Honesty Is Best?

We find ourselves living in interesting times. This is a moment of great pain, incredible uncertainty, and collapsing realities — fertile soil for new ideas, new paths, and new institutions. Honesty Is Best brings people together to think about how we got here and to explore what we should do next in order to build a fundamentally better world on the uneven foundations upon which we are perched.

We will play host to a number of regular series about technology, policy, and culture spanning writing, podcasts, and video. Each of these series will be written or anchored by one or two people working actively in the specific area the series is about. The distinct style of each series will reflect that of its creators, with the common threads being a focus on concrete ideas and a commitment to telling the unvarnished truth as they see it.

We invite you to explore and subscribe to our three current offerings:

Today in Indian History, a four-times weekly series about the context and consequences of events in India’s past written by Sahaj Sankaran, winner of Yale’s South Asian Studies Prize and Diane Kaplan Memorial Prize for his work in Indian history

Segfault, a twice-monthly podcast about Computer Science research hosted by Soham Sankaran, the founder of Pashi and a PhD student in Computer Science at Cornell

Kernels of Truth, a weekly series taking a deeper dive into recent hyped-up developments in artificial intelligence by Ethan Weinberger, a PhD student in machine learning at the University of Washington.

Take a look at some recent work from Honesty Is Best, or subscribe via email for updates from all our series below: