- All right, good afternoon. My name is Arun Vemury with the DHS Science and Technology Directorate's Biometric and Identity Technology Center. I'd like to thank you for joining us today for this webinar on Evaluating the Equitability of Commercial Facial Recognition Technologies in DHS Scenarios. This is part of our webinar series where we provide information about some of the work we're doing, specifically in the area of assessing the accuracy, fairness, and equitability of biometric technologies and matching algorithms. With that, I'd also like to let you know that we're joined by two of my colleagues, Dr. Yevgeniy Sirotin and Dr. John Howard, who will be briefing you today on some of the work that we're doing. Let's go ahead to the next slide, please.

All right, we'll give you a quick overview of the things we're going to talk about today. We'll provide an introduction to the Biometric and Identity Technology Center; talk about the Biometric Technology Rallies and how that data collection activity produces data that helps inform our research, test, and evaluation activities; and cover the work we're doing on standards to help inform broad international engagement on measuring the effectiveness of these technologies, not only in terms of raw biometric performance but also fairness and equitability. We'll also talk about some of the work we've been doing in 2021 with our most recent Biometric Technology Rally, to look at how industry has progressed over the last year. Next slide.

All right, so within S&T's Technology Centers Division, we have the Biometric and Identity Technology Center. What we do is core foundational research into topics related to biometrics and digital identity. Our goal is to help drive innovation throughout S&T, the DHS components, and the headquarters agencies through research, development, test, and evaluation.
Our intention is to help facilitate a better understanding of lessons learned, help people understand how these technologies are continuing to evolve, and provide greater transparency for DHS components who are trying to understand how these technologies may be useful for their specific operations. Our goal is also to help drive efficiencies: if we find that technologies are working well for some components, or that there are best practices to be shared, it's our intention to make sure that knowledge is shared across different components and missions. We provide objective subject matter expertise across the enterprise, not just for one component or mission, and make sure it's broadly available. And we work actively with industry and academia, not only to provide a better understanding of where we have technology needs and gaps, but also to spur innovation and to provide mechanisms to evaluate their offerings and give feedback so that they can build better technologies over time. With that, let's go on to the next slide. I'll kick it over to Dr. Sirotin, who will provide some background on the Biometric Technology Rallies and talk a little more about how they feed some of our supporting research. Thank you.

- Thanks, Arun. DHS S&T created the Biometric Technology Rallies to motivate industry to provide innovative biometric technology solutions focused on DHS use cases. Specifically, the rallies were designed to address the key technology risks outlined on the left. We believe these risks are relevant across a variety of biometric technology use cases, many of which will be discussed in this webinar.
These risks include effectiveness risks, or high failure rates; efficiency risks, or technologies that are too slow or require excessive staff; risks related to the satisfaction of the users of the technology, leading to potentially low adoption or simply unhappy users; and, of course, risks to privacy, such as whether PII gathered by these systems is stored securely. The subject of this webinar is the equitability risk, which focuses on ensuring the technology works for everyone. Each Biometric Technology Rally is carefully designed to focus on a specific biometric technology use case and provides an independent quantitative assessment of current industry offerings. The rallies also help DHS collaborate with industry through cooperative research and development agreements, entered into between DHS and technology providers, so that they can share information and make their technologies work better within the DHS scenario.

Going on to the next slide, here is a short primer on scenario testing versus technology testing. The Biometric Technology Rallies are specifically scenario tests, and as such they fill a specific and somewhat unique need in testing biometric technologies. Scenario tests are laboratory evaluations that fall in between operational testing, like pilot deployments, on one side and technology tests done in computer labs on the other. What I'd like to do is highlight the difference between technology tests, for example NIST's FRVT tests that folks are familiar with, and scenario testing like the Biometric Technology Rallies.
Technology testing focuses on a specific biometric technology component, for example a matching algorithm in isolation, whereas scenario tests are centered around a specific technology use case, for instance a high-throughput airport checkpoint, and they include the full multi-component biometric system: everything from the user interaction and camera location to, of course, the biometric matching algorithms. Technology tests generally reuse biometric datasets and images collected in the past and benefit from those larger sample sizes, whereas scenario testing, by contrast, gathers all new biometric data each time in a way that simulates the operational environment, but consequently works with smaller sample sizes.

What's important here is that technology testing answers different questions than scenario testing. Technology testing answers questions about how technologies advance or perform relative to each other, especially at the limits of performance. Think of racing cars on the Bonneville Salt Flats: the kind of cars people race there to beat the world land speed record are very different from the kind of cars you drive around town. Scenario testing answers questions about how well the technology performs within an intended use case: driving your car around town, or along a dirt trail, or in some other specific scenario where you need to get from point A to point B. The scenario test is tailored to answering questions about that particular setting, not about performance in principle. So technology testing answers questions like: what is the minimum false match rate achievable by face recognition technology? Whereas scenario testing answers questions like: how will face recognition perform in, say, a high-throughput unattended scenario at an airport?
The work we perform at the Maryland Test Facility involves testing different types of biometric technologies, which of course include face recognition. The main test we perform, the rally, is focused on assessing a multitude of commercial face recognition and multimodal systems in DHS use cases. We've been running the rally since 2018, and the most recent assessment was carried out just a few months ago. To date, we've tested more than 200 combinations of commercial face acquisition systems and matching algorithms in the high-throughput unattended use case we've been simulating over these years. These rallies have provided comprehensive metrics about the tested technologies, including how quickly they work (efficiency and transaction times), their effectiveness (the ability to reliably acquire images and match them), satisfaction (the user feedback people leave about these technologies), and, more recently and as the focus of the 2021 rally, equitability: making sure the technology works well for different demographic groups. A lot of this work can be found at mdtf.org.

In addition to these summative metrics of technology performance, DHS S&T has used the data gathered as part of the rallies to help answer important questions about the way commercial biometric technologies work, including questions about whether the technology is equitable, fair, or biased, through advanced data analyses and publications in scientific journals. I have a few of them on the right side of the slide here.
Our publications to date have addressed a number of research topics, including the role of image acquisition in shaping demographic differences in face recognition system performance; the influence of race, gender, and age on the false match rate estimates of face recognition systems, something we'll touch on today; the quantification and comparison of race and gender differences in commercial face and iris recognition systems; and the cognitive biases introduced by face recognition algorithm outcomes in human workflows. While some systems test well with diverse demographic groups, there are demographic performance differentials that persist in both the acquisition and the matching components of biometric systems, and these require careful evaluation. I'll give you some examples today, and so will John later in this webinar.

Specifically, I'll start with data from last year's 2020 Biometric Technology Rally, which was the first rally completed during the ongoing COVID-19 national emergency. As the emergency unfolded from February into the fall of 2020, masks became a part of life in the travel environment, and removing masks for face recognition became a potential new source of risk to unvaccinated travelers and to staff at the airport. So for the 2020 rally, we challenged industry to provide face recognition technologies that work in the presence of face masks. This rally was the first large-scale scenario test of such technologies, and we compared how well they worked for individuals without masks and for the same people wearing their face masks of choice while using the technology, all within a simulated high-throughput unattended scenario environment.

So what do I mean by an unattended high-throughput scenario? I've alluded to it a few times already as something like an airport checkpoint, but here are the main properties of this scenario.
First, the face recognition system has limited time to operate: just eight seconds on average per person. Second, the system gets to acquire roughly one image per individual; you can't acquire ten. And third, the identification gallery you're working with is small: typically you want to identify the people boarding a particular aircraft, around 500 people. Most people being matched are in the identification gallery as well, so there are very few people who would be out of gallery in this case, that is, people who are not on the plane. Consequently, the impact of errors on those being matched is dominated by one kind of error, called a false negative or false non-match, and the consequence of a false non-match is a delay or denial of access to the aircraft. So that's what I'm going to focus on here. In the later part of the talk, Dr. Howard will talk about the other type of biometric error.

In the 2020 rally, a total of 582 diverse volunteers participated, and I'm showing a demographic breakdown here by age, race, and gender. It's a complex graphic, but what it conveys is that the people who participated in this rally came from all sorts of demographic backgrounds: all ages from 18 to 65, males and females, and folks from different race groups. All of this demographic data is self-identified by the volunteers. Some volunteers self-identified as Black or African American, some as white, some as Asian, and there were a number of other groups for whom we had a limited sample. Throughout the testing, volunteers used their own personal face masks. In this rally, six commercial image acquisition systems and ten commercial matching systems participated, for a total of 60 system combinations tested. In the rally, we test different acquisition systems with different matching systems, and we're able to see a whole variety of performance.
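As a purely illustrative sketch (not any rally vendor's actual implementation), the 1:N identification decision in this kind of small-gallery scenario could look roughly like the following. The cosine-similarity scoring, the 0.45 threshold, the 256-dimensional templates, and the `identify` helper are all assumptions made for illustration.

```python
import numpy as np

RNG = np.random.default_rng(0)
EMB_DIM = 256        # hypothetical template size
THRESHOLD = 0.45     # hypothetical decision threshold

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical gallery: ~500 enrolled travelers for one flight.
gallery = normalize(RNG.normal(size=(500, EMB_DIM)))

def identify(probe_template):
    """Return (gallery_index, score) for the top match, or (None, score) below threshold."""
    scores = gallery @ normalize(probe_template)   # cosine similarity against every template
    best = int(np.argmax(scores))
    if scores[best] < THRESHOLD:
        # For an enrolled traveler this branch is a false non-match: in the scenario above,
        # that means a delay or denial of boarding and a fallback to manual checks.
        return None, float(scores[best])
    return best, float(scores[best])

# Probe from an enrolled traveler: a noisy copy of their gallery template.
probe = gallery[42] + 0.1 * RNG.normal(size=EMB_DIM)
print(identify(probe))   # expected: (42, high score)
```

With a small gallery and almost everyone in-gallery, the below-threshold branch, the false non-match, is the error that dominates outcomes here.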
All of these systems had to acquire a face image from each volunteer, and that face image was then used to identify the volunteer against a small gallery.

So what did we see? The first part of the rally tested these systems without face masks: everybody was wearing their face mask, but right before biometric acquisition we asked them to take it off so they could go through the biometric system. The infographic on the left shows the overall performance of the median system without face masks. The median system actually did quite well, because face recognition has come a long way in the last decade. The graphic shows that, overall, the median system was able to identify 93% of these 582 volunteers. There were few errors due to matching: just 1% of volunteers were missed because the matching system failed to identify them from a collected photo. More numerous were issues with image acquisition, where the camera failed to take a photo; that happened for 6% of the individuals in our sample. This is well in line with what we typically see in these scenario tests: algorithms have become very accurate and no longer dominate the errors of the biometric system. A lot of the errors are now made by the cameras.

On the right, I'm showing something we call the disaggregated performance of the system across demographic groups. On the X-axis I have the different demographic groups: Black, white, Asian, and other. Each point on this graph represents the true identification rate for a given system combination, and there are 60 in total, across our sample for each one of the demographic groups. And you can see that this true identification rate is very high for each one of the systems tested.
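As a worked example of how those numbers fit together, the sketch below decomposes a true identification rate into acquisition and matching errors using counts back-calculated from the reported median-system percentages (about 6% failures to acquire, 1% matching failures, and 93% identified out of 582 volunteers). Treating "Matching TIR" as identifications divided by successful acquisitions is an assumption about how that metric is defined.

```python
def rally_rates(n_volunteers, n_failed_to_acquire, n_failed_to_match):
    """Decompose scenario-test outcomes into overall and matching-only identification rates."""
    n_acquired = n_volunteers - n_failed_to_acquire
    n_identified = n_acquired - n_failed_to_match
    return {
        "failure_to_acquire": n_failed_to_acquire / n_volunteers,
        "overall_TIR": n_identified / n_volunteers,   # counts camera and algorithm errors
        "matching_TIR": n_identified / n_acquired,    # assumed definition: algorithm errors only
    }

# Counts back-calculated from the median-system figures: 582 volunteers,
# ~6% failures to acquire, ~1% failures to match, ~93% identified overall.
print(rally_rates(n_volunteers=582, n_failed_to_acquire=35, n_failed_to_match=6))
```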
For any demographic group tested, I could find a number of systems (that's the number above the colored plots) that performed within the acceptable range, which is the gray band on the graph. You can see that 22 systems performed well within the acceptable range for volunteers who identified as Black or African American, 32 systems for volunteers who self-identified as white, 25 systems for those who self-identified as Asian, and 12 for other. This TIR is inclusive of all sources of error, so failures to acquire and failures to match. On the very right side, I have another type of TIR, which we call Matching TIR, and that one focuses just on algorithm errors. You can see that when you take away errors of acquisition, most of the systems were able to meet this high performance bar across demographic groups. Of course, there are some systems that just didn't perform very well for technical reasons, and those are the dots you see at the very bottom. But overall, the results look very similar across demographic groups.

So what happened when we had folks keep wearing their face masks? Many of these biometric systems were redesigned very recently just to be able to handle images with face masks. So we asked: what impact does that have on the performance of face recognition? To give you an idea, the images here show the diversity of the kind of images gathered by our systems and the diversity of the face masks in this evaluation. You can see all sorts of masks: blue surgical masks, a lot of personal masks, masks with patterns, masks that are dark or light, different colors and different patterns. This was a good, representative variety of the kind of face masks that people wear in the travel environment.
So what did things look like? Again, on the left I'm showing the performance of the median system, and indeed performance with face masks was lower, with the median system now identifying only 77% of the 582 volunteers; remember, on the previous slide it was 93%. But impressively, the best system combination was still able to identify 96% of all volunteers. The errors were higher across the board: the algorithm failed to identify 8% for the median system, and the camera failed to take a photo for 14% for the median system. So again, there were a lot of problems with even acquiring an image, and they were exacerbated by face masks.

What I'm showing on the right is, again, the disaggregated performance, the true identification rate, and I have an arrow marking an observation we found worth highlighting: when we disaggregated system performance in the presence of masks, the results looked very different. Overall performance went down, of course, but performance for some demographic groups went down more than for others. For instance, performance for individuals who self-identified as Black or African American was particularly lower, such that no system combination achieved acceptable performance, the gray band, for that demographic group. Whereas for white, you see that five systems met that criterion; for Asian, 15; and for other, eight. And you can see that this persists even if you discount any failures to acquire: on the very right is the same plot, but now for the Matching TIR, and fewer systems met the matching-only criterion, 7 for Black or African American versus more than 26 for all other groups. So what this shows is that face masks not only decreased face recognition performance overall, but they also unmasked some demographic differentials that we didn't see when faces were not masked, when people were taking their masks off.
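A small sketch of the disaggregation behind plots like these: given one row per system combination, mask condition, and demographic group with its measured TIR, count how many combinations clear an acceptance band for each group. The table values and the 0.95 threshold below are hypothetical stand-ins (0.95 is only consistent with the 95% figure cited for the best system on the next slide), not the rally's actual per-system numbers.

```python
import pandas as pd

ACCEPTANCE_TIR = 0.95   # hypothetical bar, consistent with the 95% figure cited later

# Hypothetical disaggregated results: one row per system combination, mask condition, and group.
results = pd.DataFrame({
    "system": ["A+1", "A+1", "A+1", "A+1", "B+2", "B+2", "B+2", "B+2"],
    "masked": [False, False, True, True, False, False, True, True],
    "group":  ["Black or African American", "White"] * 4,
    "tir":    [0.97, 0.98, 0.90, 0.96, 0.96, 0.97, 0.88, 0.95],
})

# Count how many system combinations clear the bar for each group, with and without masks.
meets_bar = (
    results.assign(meets=results["tir"] >= ACCEPTANCE_TIR)
           .groupby(["masked", "group"])["meets"]
           .sum()
)
print(meets_bar)
```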
So let's look at the performance of the best-performing acquisition and matching system combination, which, as I said earlier, performed relatively well. You can see that without masks, this system actually matched every single person; it made no matching errors. So the best system combination really worked well for everyone, and you can see that without masks on the left. But with masks, on the right, you can see that even this best system combination failed to reach a 95% true identification rate for volunteers identifying as Black or African American. So this sets the upper bound of what was possible, as demonstrated by this rally.

So what have I told you so far? Face recognition technology can work well across demographic groups, especially without face masks, and these findings are similar to the findings from past rally scenario tests. But acquisition and matching errors don't always increase equally when the system is perturbed, in this case by the addition of face masks. If a system were evaluated for demographic differentials without face masks, you'd say that most of these systems did pretty well. However, as conditions change, like when people started wearing face masks, these perturbations can unmask demographic differentials that exist in the systems but just aren't visible without them. We found that performance can decline for some demographic groups more than for others, and both acquisition and matching performance were affected. It's not just the matching algorithm that's important: we believe there are substantial effects of the acquisition cameras that contribute to this, and future research is really needed to investigate the differential performance of the technology that underlies these differential outcomes. So the takeaway here is that a system that is fair under one set of operational conditions may become somewhat demographically unfair when conditions change.
From this, we recommend ongoing testing to track system performance as conditions change, and to include fairness as part of that testing.

At this point, I'm going to hand things off to my colleague, Dr. John Howard, who is going to talk about the other kind of biometric error. Everything I've told you so far has been about false negatives; what John is going to talk about now is a different kind of error, false positives, which has a different kind of demographic effect. John, I'm handing it off to you now.

- Okay. Just a head nod from someone to confirm you can hear me. This is good. Excellent. Okay, so biometrics and what we call demographic equitability: this is a topic we've found ourselves very heavily involved in over the last couple of years. It means, roughly, how well do biometric systems work across different groups of people? And those groups can be a lot of different things. It could be white/Black, it could be short/tall, it could be light skin/dark skin, it could be male/female. One of our goals is to evaluate, and to encourage industry to build, biometric systems that work equally well for all these different groups of people, and that's the topic we're going to look at today. You can go to the next slide.

This may be a topic some of you are familiar with. It's been in the news a lot over the last couple of years, and in some fairly prominent places. We had articles in places like "Nature," which you see there on the upper left, a leading scientific publication, asking the question: is facial recognition too biased to be let loose? We had some material in "MIT Technology Review," another very prominent tech reporting outlet, saying that a US government study had, in their words, confirmed that face recognition systems were racist. And the other thing I'll point out on this slide is that it's not just a US issue.
There's been a lot of activity in Europe, especially with regulatory bodies asking whether these technologies need to be banned. And the quote I'll point out in the lower right there is that the reason these technologies are sometimes seen as discriminatory is this clustering effect: they cluster individuals by demographic groups, whether that's race, ethnicity, gender, et cetera. So we wanted to ask the question: what does it mean for commercial face recognition to cluster people by those demographic categories? If you go to the next slide, we can see visually what that looks like.

A little context here: when people come to the Maryland Test Facility, we take a picture of their iris patterns and a picture of their face, and we ask ourselves, have we ever seen this person before? That's because we want really good ground truth information about who the people involved in our testing are, since the biometric error rates we evaluate these systems on are based on that ground truth. And what we found is that sometimes the computer systems we use to do this incorrectly think a person has been there before, and when that happens, they send us back a picture of who they think it is. What you see on the left here are people who experienced this false match behavior, which is what that's called, with their iris pattern. What you see on the right are people who had that false match error occur with their face pattern. And you should notice something rather profound: the iris recognition false matches are not related demographically. They're not the same gender, the same race, or the same age. But the same can't be said for the face recognition false matches. Everyone you see on the right-hand side there, more or less, is the same gender, mostly the same race, and more or less the same age as well.
That's a characteristic fairly unique to facial recognition. It doesn't happen with iris recognition and it doesn't happen with fingerprint recognition. And it's something we observed while watching these systems operate in real time with face recognition specifically. Next slide.

For most people watching this, it's probably not surprising that face recognition does that. If you were a computer scientist or someone evaluating a face recognition algorithm and I showed you the last slide, you'd probably think that's a working face recognition algorithm; that's what it's supposed to do. I'll challenge you on that: I think the reason most people think that is because, unlike iris recognition, humans do face recognition as well. We have brains that have evolved to do face recognition. It's important for a number of reasons; evolutionarily speaking, we recognize mates, friends, and foes. We study this in neuroscience and human cognition so much that we actually know the part of the brain that does human face recognition; it's the part you see highlighted in red here. And so to us it's intuitive that a computer algorithm would also think people who share gender and race look more similar. But we think that gives us an unconscious bias when it comes to humans evaluating how well face recognition algorithms work. We think they should work like that, so when they do, it's not surprising to us. Our claim here is that we need to overcome that human intuition so that we can really evaluate these technologies objectively. Next slide.

Okay, so this is what that clustering looks like mathematically.
I showed you what it looks like with images; in the chart you see here in the middle, you can picture every row and every column as a different person, and the value in each cell is how similar a face recognition algorithm thought those two people were. A couple of things you should notice looking at this matrix. First, the diagonal is all very dark, and that's because face recognition algorithms think two copies of the exact same image look very similar; again, that's the algorithm working. The other thing you should notice is the block diagonal pattern that runs along the middle of the chart. That's because face recognition algorithms tend to think Black females look more similar to other Black females than white females do to Black females, which is what those two blocks along the bottom row show. The same effect exists for other demographic groups as well: face recognition algorithms tend to think white females look more similar to white females, Black males more similar to Black males, white males, et cetera. It's not limited to just one of those groups. Next slide.

That block diagonal pattern can be problematic; it has the effect I just outlined. So we wanted to ask ourselves: do we think it's possible to make a face recognition algorithm that doesn't do this, that behaves more like a fingerprint or an iris recognition algorithm, where if you took my fingerprints or my iris patterns and searched me against a whole gallery of people, the person that comes back most similar to me is not going to be, in all likelihood, another 30-year-old Caucasian male? We asked ourselves whether we think it's possible to train a face recognition algorithm to do something similar, where it's just as likely to confuse me for, perhaps, a (indistinct) Asian woman as for another 30-year-old Caucasian male. And it turns out we think the answer to that is yes.
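A minimal sketch of how a chart like the one just described can be produced: score every pair of images against each other and order the rows and columns by demographic group, so that any tendency to score same-group pairs higher shows up as dark blocks along the diagonal. The embeddings below are synthetic, with a per-group offset added purely to make the effect visible; the real analysis uses similarity scores from commercial matchers.

```python
import numpy as np

RNG = np.random.default_rng(1)
EMB_DIM = 128
GROUPS = ["Black female", "White female", "Black male", "White male"]
N_PER_GROUP = 25

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Synthetic embeddings: independent noise plus a shared per-group direction, mimicking
# an algorithm that scores same-group pairs as slightly more similar.
group_offsets = {g: RNG.normal(size=EMB_DIM) for g in GROUPS}
labels = [g for g in GROUPS for _ in range(N_PER_GROUP)]
embeddings = normalize(np.stack(
    [RNG.normal(size=EMB_DIM) + 1.5 * group_offsets[g] for g in labels]
))

# All-pairs similarity matrix with rows/columns already ordered by group, so same-group
# blocks (including the dark self-match diagonal) sit along the diagonal.
similarity = embeddings @ embeddings.T
for g in GROUPS:
    idx = [i for i, lab in enumerate(labels) if lab == g]
    within = similarity[np.ix_(idx, idx)].mean()
    cross = np.delete(similarity[idx], idx, axis=1).mean()
    print(f"{g:12s}  within-group mean {within:.2f}   cross-group mean {cross:.2f}")
```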
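The approach actually used is laid out in the DHS technical paper discussed below. Purely as an illustrative sketch of one well-known technique from the broader literature, adversarial training with a gradient-reversal layer, and not necessarily the paper's method, here is roughly how a face embedding network could be trained so that a demographic classifier cannot recover race or gender from its templates. The backbone, heads, loss weighting, and labels are all hypothetical.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, weight):
        ctx.weight = weight
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.weight * grad_output, None

class DebiasedFaceModel(nn.Module):
    def __init__(self, backbone, emb_dim, n_identities, n_groups, adv_weight=1.0):
        super().__init__()
        self.backbone = backbone                         # any network producing emb_dim features
        self.identity_head = nn.Linear(emb_dim, n_identities)
        self.group_head = nn.Linear(emb_dim, n_groups)   # adversary predicting race/gender group
        self.adv_weight = adv_weight

    def forward(self, images):
        emb = self.backbone(images)
        id_logits = self.identity_head(emb)
        # The adversary sees the embedding through a gradient-reversal layer, so training it
        # to predict demographic group pushes the backbone to remove that information.
        grp_logits = self.group_head(GradientReversal.apply(emb, self.adv_weight))
        return emb, id_logits, grp_logits

def train_step(model, optimizer, images, identity_labels, group_labels):
    """One hypothetical training step balancing identity accuracy against demographic leakage."""
    ce = nn.CrossEntropyLoss()
    _, id_logits, grp_logits = model(images)
    loss = ce(id_logits, identity_labels) + ce(grp_logits, group_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```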
So that would be moving from the matrix pattern you see in the middle to the pattern you see on the upper right, where there really is no discernible block structure along the middle. I'm not going to go into all of the math behind how we did this; it's a lot for a short presentation. I will say we published a paper, part of a DHS Technical Paper Series, and it's on the Biometric and Identity Technology Center website right now, where all of that is laid out. I encourage anyone who's interested to download it and go through it, and I'm also happy to take questions on it in the Q&A. But it essentially comes to the conclusions laid out on the slide here: we can prevent face recognition algorithms from taking race and gender information into account when they're making identity determinations, and the result will still be a functioning face recognition algorithm. It will still distinguish people from themselves and from other people. There's also a conclusion in the paper that this technique could lead to slightly less accurate face recognition algorithms overall, but algorithms that lead to more fair outcomes, which we point out is a trade space worth exploring and a conversation worth having. So I encourage everyone to go to the Tech Center website and look at that if you're interested. Next slide.

As part of this demographic work that I just outlined, we're also heavily involved in the international standards community. As I mentioned, this isn't a problem unique to the US; a lot of other nations are grappling with it as well. Go to the next slide.

That's because a lot of people are starting to use face recognition systems. There's been an explosion in use cases over the last couple of years, and with that explosion has come, we think, increased public awareness and also some concerns. This has also trickled into the policymaker space.
I've got two US Senate bills listed here on the slide, 3284 and 4084, both of which are essentially regulations or restrictions on the use of face recognition specifically. There are similar actions pending in Australia and in the EU, and probably elsewhere that I don't even have listed on this chart. I think some of that stems from the fact that researchers don't always talk about this the same way, and so part of this international standards effort, in which DHS has really taken a leadership role, is about coming to that common standard. Next slide.

It started a few years ago with ISO Technical Report 22116, for which DHS had actually assumed the editorship. It was the first effort to think about how we would do studies of demographic equitability on the international stage. It defined some terms and looked at different areas in the system where performance variation can exist. That was published almost a full year ago now. And it led into some activities we're currently undertaking, if you go to the next slide, which is the actual international standard. This is ISO document 19795-10, and it covers how to quantify biometric performance across demographic groups. It was approved shortly after that TR was released; they looked at it and said this is a worthwhile topic and we've got the start of what could be an international standard here. We actually just put out the first draft; Yevgeniy and I are the co-editors. The first draft came out this summer, and the final (indistinct) should be approved for publication sometime in the 2023 to 2024 time frame. That means it's open right now for comments, for new material, and things like that, and DHS, as well as some other US government agencies, are really taking that opportunity to add material into the standard, which we think is a really good thing. We're moving quickly through some of this material; I think we only have a couple of slides left, and then we'll get into the Q&A session.
This is what's within the scope of that ISO standard, essentially what the title says: how to estimate and report on performance variation across demographic groups. The document attempts to give some guidance on establishing demographic group membership, which is sometimes challenging, particularly when we talk about things like race categories across different countries. It gives guidance on using what are called phenotypic measures, which are observable rather than reported characteristics of individuals; I think that's a good thing. It also does what the technical report did, which is continue to define terms and definitions, so that when we say things like "demographic differential," we're all talking about the same thing. And then it gives some requirements on, again at the math level, how you do these tests: what kind of statistical techniques you use, what kind of formulas, and so on. All of that is currently being iterated on. Next slide.

This is the last part of the scope, and I'm bringing it up because it may be interesting to some of the people on the call; if it is, I highly encourage you to reach out to Arun and get involved with this standard. We're always looking for additional partners to help craft it and to take input. Outside of the scope and the definitions I went through on the last slide, and the phenotypical measures I mentioned, the last two items are ones we think are really important. First, how and when do you do demographic testing? I think Yevgeniy laid out a really compelling case study for why it's important to do these fairly often, because as things change on the ground, the results of your demographic equitability study could change as well. And second, once you've decided to do an equitability study, what do you actually need to report out? What do you need to tell people to give them confidence that these systems are working equally well for all groups of people?
00:37:14.356 --> 00:37:16.206 That'll also be part of the standard. 00:37:17.320 --> 00:37:18.153 So next slide. 00:37:21.140 --> 00:37:22.960 Okay, and I think we're gonna end here 00:37:22.960 --> 00:37:25.580 and just note that Yevgeniy talked about 00:37:26.420 --> 00:37:28.810 the 2020 Biometric Technology Rally 00:37:28.810 --> 00:37:30.810 that we did sort of mid-pandemic. 00:37:30.810 --> 00:37:34.363 We actually just executed another one a couple months ago. 00:37:35.760 --> 00:37:36.830 This was in October. 00:37:36.830 --> 00:37:38.990 And this one is very similar 00:37:38.990 --> 00:37:41.970 to the rally Yevgeniy talked about 00:37:41.970 --> 00:37:43.980 where we're looking at masked and non-masked performance 00:37:43.980 --> 00:37:46.123 with these acquisition and matching systems. 00:37:47.380 --> 00:37:49.680 For the first time, this 2021 rally 00:37:49.680 --> 00:37:52.750 is also explicitly looking at biometric performance 00:37:52.750 --> 00:37:54.360 across demographic groups. 00:37:54.360 --> 00:37:57.460 That hasn't been an explicit goal of us in the past 00:37:57.460 --> 00:37:58.520 and part of the reason 00:37:58.520 --> 00:38:00.760 is because we didn't have sort of some of these ideas 00:38:00.760 --> 00:38:03.690 standardized about how to do that. 00:38:03.690 --> 00:38:05.020 So as the standard has progressed, 00:38:05.020 --> 00:38:07.550 it's allowed us to migrate some of those topics 00:38:07.550 --> 00:38:09.963 into our scenario testing model as well. 00:38:11.410 --> 00:38:16.050 Reports on how this went will be coming out shortly 00:38:16.050 --> 00:38:17.527 and so I encourage everybody to check back with Arun 00:38:17.527 --> 00:38:18.560 and with the Tech Center 00:38:18.560 --> 00:38:20.783 and get the results of those studies. 00:38:22.880 --> 00:38:24.640 And with that, I think that's my last slide. 00:38:24.640 --> 00:38:28.453 We can turn it back to Arun or open it up for questions, 00:38:28.453 --> 00:38:30.532 whatever you wanna do. 00:38:30.532 --> 00:38:32.949 - Okay, I have a question in. 00:38:34.360 --> 00:38:35.657 It's a question, I'll read it off. 00:38:35.657 --> 00:38:37.990 "When you looked at iris matching, 00:38:37.990 --> 00:38:40.100 did you have any mismatches of people 00:38:40.100 --> 00:38:42.320 across eye color types? 00:38:42.320 --> 00:38:44.510 Or were they of the same demographic groups 00:38:44.510 --> 00:38:46.517 of the same eye color?" 00:38:48.470 --> 00:38:50.963 Think maybe, John, you wanna take this one? 00:38:52.930 --> 00:38:54.730 - Yeah, I can take this one. 00:38:54.730 --> 00:38:56.320 It's a really good question actually. 00:38:56.320 --> 00:38:57.500 So eye color, 00:38:57.500 --> 00:39:01.910 we mentioned age, race, and gender as demographic groups 00:39:01.910 --> 00:39:04.630 that presumably would affect face recognition. 00:39:04.630 --> 00:39:07.210 Eye color is one we didn't bring up on this slide, 00:39:07.210 --> 00:39:08.670 but that's one that might affect, 00:39:08.670 --> 00:39:10.960 you could reasonably presume 00:39:10.960 --> 00:39:13.190 it would affect iris recognition. 
00:39:13.190 --> 00:39:15.050 I don't actually have an answer to the question 00:39:15.050 --> 00:39:17.550 because we didn't pull that particular piece of information, 00:39:17.550 --> 00:39:21.760 but I will just kind of mention that (indistinct) 00:39:21.760 --> 00:39:23.300 these iris recognition algorithms work 00:39:23.300 --> 00:39:25.630 is one, they're all looking at irises 00:39:25.630 --> 00:39:26.990 in what's called the near-IR range, 00:39:26.990 --> 00:39:30.110 so outside of actual visible light, 00:39:30.110 --> 00:39:32.280 and so they sort of look like black and white images 00:39:32.280 --> 00:39:33.163 to begin with. 00:39:34.560 --> 00:39:35.580 And then the second point 00:39:35.580 --> 00:39:39.450 is sort of the way that iris matching works 00:39:39.450 --> 00:39:42.090 uses these patterns called Gabor wavelets. 00:39:42.090 --> 00:39:44.000 They're not really correlated with eye color 00:39:44.000 --> 00:39:45.650 so we wouldn't expect to find that there, 00:39:45.650 --> 00:39:47.440 but it's, again, a good question 00:39:47.440 --> 00:39:49.290 and something that you could reasonably assume 00:39:49.290 --> 00:39:51.143 might happen with iris recognition. 00:39:52.620 --> 00:39:56.290 - Yeah, I'll add one thing, John, to this reply, 00:39:56.290 --> 00:39:59.890 is that, you know, the choice of demographic groups 00:39:59.890 --> 00:40:02.190 is really important 00:40:02.190 --> 00:40:04.500 and it's something that I think the standard also, 00:40:04.500 --> 00:40:07.970 this -10 standard that John mentioned, it will address. 00:40:07.970 --> 00:40:09.780 But, you know, which demographic groups 00:40:09.780 --> 00:40:11.300 should we worry about assessing? 00:40:11.300 --> 00:40:13.450 Because, you know, ultimately you could imagine 00:40:13.450 --> 00:40:15.870 creating a demographic group of people 00:40:15.870 --> 00:40:18.520 for whom the technology works better than for others, 00:40:18.520 --> 00:40:21.330 and that could be a demographic group in its own right. 00:40:21.330 --> 00:40:23.720 But we have these protected demographic groups 00:40:23.720 --> 00:40:25.210 in different jurisdictions 00:40:25.210 --> 00:40:26.600 and those are really the ones 00:40:26.600 --> 00:40:28.793 that we've been focusing on so far. 00:40:29.680 --> 00:40:32.020 But it's a great question because different technologies 00:40:32.020 --> 00:40:34.220 may have different demographic (indistinct). 00:40:36.360 --> 00:40:38.483 And I've got the next question here. 00:40:39.670 --> 00:40:41.557 And this question is, 00:40:41.557 --> 00:40:46.557 "What are the confidence intervals on identification 00:40:46.660 --> 00:40:49.970 relative to the individual features 00:40:49.970 --> 00:40:52.660 used in the facial recognition algorithm 00:40:52.660 --> 00:40:55.567 for different demographics?" 00:40:56.840 --> 00:40:59.660 And I think what the question is trying to ask is 00:41:01.677 --> 00:41:05.350 is there a difference in the confidence scores maybe 00:41:07.160 --> 00:41:09.340 of the identifications 00:41:09.340 --> 00:41:11.530 based on different demographic groups, 00:41:11.530 --> 00:41:14.540 or the variability of the measured identification 00:41:14.540 --> 00:41:16.800 performance across demographic groups? 00:41:16.800 --> 00:41:19.710 The charts that we had on the screen today, 00:41:19.710 --> 00:41:23.800 we did not put error bars on those charts.
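[Editor's note: to make the iris-matching description above concrete, here is a minimal sketch of Daugman-style comparison, in which near-IR iris texture is encoded with Gabor-type filters into a binary "iris code" and two codes are compared by normalized Hamming distance. This is an illustrative toy, not any vendor's implementation; the filter frequencies, the strip dimensions, and the random stand-in "iris strips" are all placeholders.]

```python
# Minimal sketch of Daugman-style iris code comparison (illustrative only).
# Real systems segment and normalize the iris and use tuned 2D Gabor filter
# banks; the crude 1D carriers and parameters below are placeholders.
import numpy as np

def gabor_phase_bits(iris_strip: np.ndarray, freqs=(0.1, 0.2, 0.3)) -> np.ndarray:
    """Quantize complex Gabor-like responses of a normalized near-IR iris strip
    (rows = radius, cols = angle) into a binary iris code."""
    bits = []
    angles = np.arange(iris_strip.shape[1])
    for f in freqs:
        carrier = np.exp(2j * np.pi * f * angles)   # complex carrier along the angular axis
        resp = (iris_strip * carrier).sum(axis=0)   # crude projection per frequency
        bits.append(resp.real > 0)                  # phase sign -> one bit
        bits.append(resp.imag > 0)
    return np.concatenate(bits)

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> float:
    """Fraction of disagreeing bits; lower means more likely the same iris."""
    return float(np.mean(code_a != code_b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    strip1 = rng.normal(size=(16, 256))                        # stand-in normalized iris strip
    strip2 = strip1 + 0.05 * rng.normal(size=strip1.shape)     # same iris, slight noise
    strip3 = rng.normal(size=(16, 256))                        # different iris
    c1, c2, c3 = map(gabor_phase_bits, (strip1, strip2, strip3))
    print("same-iris Hamming distance:", hamming_distance(c1, c2))   # small
    print("diff-iris Hamming distance:", hamming_distance(c1, c3))   # near 0.5
    # The texture phase, not eye color, drives the code, which is consistent
    # with the speakers' point that near-IR iris codes are essentially color-blind.
```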
00:41:23.800 --> 00:41:27.830 So our sample is typically numbering in the, 00:41:27.830 --> 00:41:31.830 I say for folks identifying as Black or African American, 00:41:31.830 --> 00:41:34.480 we had a sample, you know, in the hundreds, 00:41:34.480 --> 00:41:38.200 and for folks that self-identified as white as well. 00:41:38.200 --> 00:41:39.810 So those confidence intervals 00:41:39.810 --> 00:41:43.773 will typically be in the order of a few percent. 00:41:44.850 --> 00:41:48.750 And so if that addresses the question, 00:41:48.750 --> 00:41:51.543 or if not, I apologize and maybe I misunderstood it. 00:41:55.030 --> 00:41:58.080 - Yeah, maybe just to add on here, 00:41:58.080 --> 00:42:01.530 and again, please rephrase the question if that wasn't it, 00:42:01.530 --> 00:42:02.363 per Yevgeniy there, 00:42:02.363 --> 00:42:03.610 but the general question 00:42:03.610 --> 00:42:07.320 about how do you put confidence intervals on these numbers 00:42:07.320 --> 00:42:10.160 I think is actually a very important one, right? 00:42:10.160 --> 00:42:12.710 So we usually report out things like false match rate, 00:42:12.710 --> 00:42:15.640 false non-match rate, and asking the question 00:42:15.640 --> 00:42:17.440 of how similar these things need to be 00:42:17.440 --> 00:42:19.810 to be sort of be determined as equal 00:42:19.810 --> 00:42:22.310 is a really challenging one in some situations. 00:42:22.310 --> 00:42:23.160 And it's one of the things 00:42:23.160 --> 00:42:25.830 that also the standard attempts to go into 00:42:26.940 --> 00:42:28.410 to sort of give some guidance on, 00:42:28.410 --> 00:42:30.110 you know, okay, so you have two different numbers 00:42:30.110 --> 00:42:31.373 from two different rates. 00:42:33.460 --> 00:42:34.760 You know, can you say it's operating 00:42:34.760 --> 00:42:39.760 at statistically the same rate across those groups? 00:42:39.810 --> 00:42:40.710 Really good question, 00:42:40.710 --> 00:42:43.300 really hard question, actually, to answer too. 00:42:43.300 --> 00:42:47.320 - So a clarification that came in with the question is, 00:42:47.320 --> 00:42:50.500 the clarification is the features used, 00:42:50.500 --> 00:42:53.210 iris, ears, nose, mouth, 00:42:53.210 --> 00:42:57.250 do the confidences associated with different facial features 00:42:57.250 --> 00:42:59.490 vary differently for different demographics? 00:42:59.490 --> 00:43:01.610 And I think to answer that question, 00:43:01.610 --> 00:43:06.020 I would have to say that we simply don't know. 00:43:06.020 --> 00:43:07.780 These face recognition algorithms 00:43:07.780 --> 00:43:11.880 are essentially black boxes, at least the modern ones. 00:43:11.880 --> 00:43:15.010 They take the entire face image 00:43:15.010 --> 00:43:18.980 and they perform a complex, 00:43:18.980 --> 00:43:21.550 convolutional operations on this image 00:43:21.550 --> 00:43:23.710 so that we can't tell whether or not 00:43:23.710 --> 00:43:27.730 a particular specific feature is driving that score. 00:43:27.730 --> 00:43:30.400 But if you look at our technical paper, 00:43:30.400 --> 00:43:33.920 we're actually trying to unravel at least 00:43:35.343 --> 00:43:38.570 what kind of features these algorithms might be using 00:43:38.570 --> 00:43:40.360 that are related to demographics. 
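[Editor's note: for readers wondering what confidence intervals "on the order of a few percent" and "statistically the same rate" can look like in practice, here is a minimal sketch. It uses a Wilson score interval for each group's error rate and a pooled two-proportion z-test to compare two groups; the error counts are invented placeholders, not rally data, and these particular statistical choices are one common option rather than what the 19795-10 standard will ultimately require.]

```python
# Minimal sketch: Wilson confidence intervals for per-group error rates and a
# two-proportion z-test for "are these rates statistically distinguishable?".
# The counts below are invented placeholders, NOT rally results.
import math

def wilson_interval(errors: int, n: int, z: float = 1.96):
    """95% Wilson score interval for an error rate estimated from n attempts."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def two_proportion_z(e1: int, n1: int, e2: int, n2: int) -> float:
    """z statistic for the difference between two error rates (pooled estimate)."""
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

if __name__ == "__main__":
    # Hypothetical per-group failure counts over a few hundred attempts each.
    groups = {"group A": (6, 300), "group B": (14, 350)}
    for name, (errs, n) in groups.items():
        lo, hi = wilson_interval(errs, n)
        print(f"{name}: rate={errs / n:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
    z = two_proportion_z(*groups["group A"], *groups["group B"])
    print(f"two-proportion z = {z:.2f}  (|z| > 1.96 suggests a difference at the 5% level)")
```

With sample sizes in the hundreds, as described above, these intervals end up a few percentage points wide, which is why small observed differences between groups are often not statistically distinguishable.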
00:43:40.360 --> 00:43:42.890 So we can't pin it down to like, oh, it's the nose 00:43:42.890 --> 00:43:44.320 or it's the ears, 00:43:44.320 --> 00:43:47.130 but we can say that we have evidence 00:43:47.130 --> 00:43:49.780 that they're using some sort of a set of features 00:43:49.780 --> 00:43:51.260 that are linked with demographics 00:43:51.260 --> 00:43:53.380 because we see some patterns 00:43:53.380 --> 00:43:55.880 in the way that this algorithm performs 00:43:55.880 --> 00:43:59.270 across specific demographic groups. 00:43:59.270 --> 00:44:02.830 So, unfortunately I can't answer the question directly. 00:44:02.830 --> 00:44:04.803 John or Arun, do you have anything? 00:44:06.000 --> 00:44:07.290 - Yeah, I was just kind of, 00:44:07.290 --> 00:44:09.180 so I had to drop off and come back on, 00:44:09.180 --> 00:44:12.110 but I think, you know, one of the points to make here too 00:44:12.110 --> 00:44:14.760 is most of these modern technologies, 00:44:14.760 --> 00:44:16.200 we've seen this massive improvement 00:44:16.200 --> 00:44:17.900 with facial recognition algorithms 00:44:17.900 --> 00:44:19.040 in the last couple of years, 00:44:19.040 --> 00:44:23.890 largely driven by the adoption of AIML technologies 00:44:23.890 --> 00:44:25.360 into this space. 00:44:25.360 --> 00:44:27.260 But with the adoption of the AIML, 00:44:27.260 --> 00:44:29.980 we don't exactly know what features 00:44:29.980 --> 00:44:32.030 these models are using all the time. 00:44:32.030 --> 00:44:34.730 And we can try to go back and dissect that a little bit 00:44:34.730 --> 00:44:37.230 but we're limited because these are trained models 00:44:38.292 --> 00:44:40.640 that are commercial and proprietary, right? 00:44:40.640 --> 00:44:43.450 So there's only so much insight we can actually get 00:44:43.450 --> 00:44:47.430 into what's going on within the models themselves. 00:44:47.430 --> 00:44:49.910 So it's really hard to pinpoint what are the features 00:44:49.910 --> 00:44:53.450 let alone whether the features that are more salient 00:44:53.450 --> 00:44:55.713 vary between different demographic groups. 00:45:00.080 --> 00:45:02.313 - Okay, next question, 00:45:03.447 --> 00:45:06.280 "Regarding the system that performed the best 00:45:07.580 --> 00:45:10.060 on the unmasked faces, 00:45:10.060 --> 00:45:11.740 is it possible that this system 00:45:11.740 --> 00:45:14.300 was using some ocular inputs?" 00:45:14.300 --> 00:45:16.710 And I think the answer is yes. 00:45:16.710 --> 00:45:20.870 I think it certainly, again, as Arun pointed out, 00:45:20.870 --> 00:45:23.330 these are sort of black box systems 00:45:23.330 --> 00:45:28.070 and we don't know exactly what features they're using, 00:45:28.070 --> 00:45:30.230 but it's absolutely possible that the system 00:45:30.230 --> 00:45:33.420 may have been using what we call periocular information, 00:45:33.420 --> 00:45:35.833 sort of information around the eye region. 00:45:38.250 --> 00:45:40.360 But it's unlikely that they're doing something 00:45:40.360 --> 00:45:42.820 that is akin to iris recognition 00:45:42.820 --> 00:45:46.070 just because the irises are such small portions 00:45:46.070 --> 00:45:49.163 of a face recognition type image. 00:45:50.200 --> 00:45:51.450 But yeah, so the answer is yes, 00:45:51.450 --> 00:45:54.801 it's probably using some periocular information. 00:45:54.801 --> 00:45:56.250 - And I'll just kind of add onto it. 00:45:56.250 --> 00:45:58.340 It's probably not using iris information. 
00:45:58.340 --> 00:46:01.810 As John mentioned, iris is in the near infrared, right? 00:46:01.810 --> 00:46:04.290 And the features that are kind of discernible in that domain 00:46:04.290 --> 00:46:05.750 are very different than what would appear 00:46:05.750 --> 00:46:07.240 in the visible domain. 00:46:07.240 --> 00:46:10.110 So, you know, it's almost certainly using information here 00:46:10.110 --> 00:46:12.880 where the algorithms are saying, when there's a mask, 00:46:12.880 --> 00:46:15.120 maybe I'm weighting these features differently 00:46:15.120 --> 00:46:18.190 than I would be if the person's not wearing a mask at all. 00:46:18.190 --> 00:46:20.360 - Yeah, I think it was interesting 00:46:20.360 --> 00:46:22.090 that, you know, if you were gonna do a study 00:46:22.090 --> 00:46:25.733 of a black box, say an algorithm, as Arun pointed out, 00:46:26.820 --> 00:46:28.820 to figure out what features it was using, 00:46:28.820 --> 00:46:30.040 you know, what you would do 00:46:30.040 --> 00:46:32.130 is essentially start masking different features out 00:46:32.130 --> 00:46:33.660 and running recognition performance 00:46:33.660 --> 00:46:36.070 and seeing when, you know, masking out the nose 00:46:36.070 --> 00:46:38.550 caused the scores to go down. 00:46:38.550 --> 00:46:41.670 With the COVID pandemic and the application of masks, 00:46:41.670 --> 00:46:43.680 we actually had this really interesting opportunity 00:46:43.680 --> 00:46:46.090 to sort of do a natural experiment there 00:46:46.090 --> 00:46:50.410 and say can face recognition systems still work 00:46:50.410 --> 00:46:52.610 when we've removed facial information 00:46:52.610 --> 00:46:54.780 from the lower part of the face? 00:46:54.780 --> 00:46:57.410 And, you know, I think the results that Yevgeniy presented 00:46:57.410 --> 00:46:59.620 sort of led us to the conclusion that it can. 00:46:59.620 --> 00:47:02.960 And so it must be using, again, not iris-like features 00:47:02.960 --> 00:47:04.030 but ocular-like features, 00:47:04.030 --> 00:47:06.690 I think that's a very good assumption. 00:47:06.690 --> 00:47:10.867 - So we have more questions. Here's the next one. 00:47:10.867 --> 00:47:13.530 "Has there been or is there planned 00:47:13.530 --> 00:47:18.290 any research on genealogy diversity versus performance? 00:47:18.290 --> 00:47:20.770 In other words, do some demographics 00:47:20.770 --> 00:47:22.880 have greater genealogy diversity 00:47:22.880 --> 00:47:25.190 and therefore a greater difference 00:47:25.190 --> 00:47:27.850 in facial features than others 00:47:27.850 --> 00:47:31.770 that underpin some of the performance differences?" 00:47:31.770 --> 00:47:33.110 So I'll take a part of that 00:47:33.110 --> 00:47:35.280 and I know that John will wanna weigh in. 00:47:35.280 --> 00:47:39.430 So I talked about false non-match rates 00:47:39.430 --> 00:47:41.360 in my portion of the talk. 00:47:41.360 --> 00:47:44.702 And we believe that a lot of the errors 00:47:44.702 --> 00:47:46.310 and failures to match 00:47:46.310 --> 00:47:49.808 can be tied to some of the acquisition components 00:47:49.808 --> 00:47:50.641 of the systems. 00:47:50.641 --> 00:47:53.350 So for instance, I showed you that the predominance 00:47:53.350 --> 00:47:55.790 of the errors made in this scenario test 00:47:57.400 --> 00:47:58.730 wasn't actually due to matching 00:47:58.730 --> 00:48:01.190 but was primarily due to image acquisition.
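[Editor's note: the "mask different features out and re-run recognition" experiment Dr. Howard describes above is essentially occlusion analysis of a black-box matcher. The sketch below shows the general shape of such an experiment; match_score is a stand-in for whatever vendor SDK or API is under test, not a real library call, and the facial-region coordinates are made up.]

```python
# Sketch of an occlusion experiment against a black-box face matcher.
# `match_score(img_a, img_b)` is a stand-in for a vendor SDK call and does not
# refer to any real API; the regions are rough placeholders, not real landmarks.
import numpy as np

def match_score(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Placeholder black-box matcher: higher means more similar (toy cosine score)."""
    a, b = img_a.ravel().astype(float), img_b.ravel().astype(float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def occlude(img: np.ndarray, region: tuple) -> np.ndarray:
    """Return a copy of the image with a rectangular region blanked out."""
    y0, y1, x0, x1 = region
    out = img.copy()
    out[y0:y1, x0:x1] = 0
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    enrolled = rng.integers(0, 256, size=(112, 112))   # stand-in enrollment image
    probe = np.clip(enrolled + rng.integers(-10, 10, size=enrolled.shape), 0, 255)
    regions = {
        "eyes":  (30, 50, 15, 97),    # rough, made-up coordinates
        "nose":  (50, 75, 40, 72),
        "mouth": (75, 100, 30, 82),   # roughly what a face mask covers
    }
    baseline = match_score(enrolled, probe)
    for name, region in regions.items():
        drop = baseline - match_score(enrolled, occlude(probe, region))
        print(f"occluding {name}: score drop = {drop:.4f}")
    # A large drop for a region suggests the matcher leans on those features;
    # COVID masks gave the rally a natural version of the "mouth" condition.
```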
00:48:01.190 --> 00:48:03.800 And we think that at least one component 00:48:03.800 --> 00:48:05.100 that is responsible for that 00:48:05.100 --> 00:48:07.900 is the quality with which cameras 00:48:07.900 --> 00:48:11.140 can image different shades of a human's skin. 00:48:11.140 --> 00:48:16.010 And there's a number of pieces of research out there 00:48:16.010 --> 00:48:19.210 showing that, you know, that really can affect, 00:48:19.210 --> 00:48:21.630 you could be overexposed or underexposed, 00:48:21.630 --> 00:48:24.100 depending on exactly how that camera is set up 00:48:24.100 --> 00:48:25.890 and we think that's really important. 00:48:25.890 --> 00:48:28.680 So there, I think, it's this acquisition part 00:48:28.680 --> 00:48:29.620 and the way it interacts 00:48:29.620 --> 00:48:32.620 with the skin phenotype that's important, 00:48:32.620 --> 00:48:35.040 but when it comes to that other kind of error, 00:48:35.040 --> 00:48:37.780 false matches, that's where I'll pass it over to John 00:48:37.780 --> 00:48:40.000 'cause I'm sure he'll have some thoughts. 00:48:40.000 --> 00:48:43.590 - Yeah, this is a absolutely fantastic question. 00:48:43.590 --> 00:48:47.380 There is almost certainly something happening here 00:48:47.380 --> 00:48:51.533 that is more deep than simple race self-reporting. 00:48:53.288 --> 00:48:54.500 The simple answer to your question 00:48:54.500 --> 00:48:56.580 is there has been some research on this 00:48:56.580 --> 00:48:58.030 but not really at the genealogy level. 00:48:58.030 --> 00:49:03.030 So we know, for example, that twins who share underlying DNA 00:49:03.300 --> 00:49:05.160 give face recognition a problem, right? 00:49:05.160 --> 00:49:07.110 That face recognition algorithms and humans 00:49:07.110 --> 00:49:12.110 will think that twin A and twin B look very, very similar. 00:49:12.620 --> 00:49:16.040 We also know that taking it sort of one step further removed 00:49:16.040 --> 00:49:19.433 from genealogical identity, 00:49:21.080 --> 00:49:24.220 parents and siblings also share facial features, 00:49:24.220 --> 00:49:25.980 so there's a genealogy link 00:49:25.980 --> 00:49:28.580 that comes across in facial recognition. 00:49:28.580 --> 00:49:31.990 We don't know, and it's really important in my mind, 00:49:31.990 --> 00:49:34.980 an area of research, 00:49:34.980 --> 00:49:37.840 where that sort of genealogical breakdown stops, 00:49:37.840 --> 00:49:39.910 where people stop looking similar 00:49:39.910 --> 00:49:42.330 because they happen to share genealogy. 00:49:42.330 --> 00:49:44.810 And so the planned part of your question 00:49:44.810 --> 00:49:46.210 is it's sorely needed. 00:49:46.210 --> 00:49:49.440 We don't have anything right now to sort of work on this. 00:49:49.440 --> 00:49:51.080 Although there has been a little bit of work 00:49:51.080 --> 00:49:54.963 in doing, like, reconstructions from DNA to face, 00:49:56.070 --> 00:49:58.160 but we haven't done any of that to date. 00:49:58.160 --> 00:50:02.293 But we need to. It's a really good area to look into. 00:50:05.380 --> 00:50:07.000 - Yeah, one thing I'll add here 00:50:07.000 --> 00:50:08.940 is that I think the question asks 00:50:08.940 --> 00:50:11.360 could there be greater diversity of genealogy 00:50:11.360 --> 00:50:12.780 in some groups than others? 
00:50:12.780 --> 00:50:16.160 And I think, you know, as a neuroscientist, 00:50:16.160 --> 00:50:21.160 you know, we all needed to recognize conspecifics, 00:50:21.420 --> 00:50:24.020 as John mentioned, we need to tell friend from foe, 00:50:24.020 --> 00:50:28.090 we need to tell apart the individuals in our groups 00:50:28.090 --> 00:50:30.000 and individuals outside our groups. 00:50:30.000 --> 00:50:32.000 I don't think that those evolutionary pressures 00:50:32.000 --> 00:50:33.830 have ever been different. 00:50:33.830 --> 00:50:35.340 So it's not clear, 00:50:35.340 --> 00:50:38.640 you know, and we can all achieve these tasks well 00:50:38.640 --> 00:50:42.580 regardless, you know, what our origins are. 00:50:42.580 --> 00:50:45.000 So I think that really these are, 00:50:45.000 --> 00:50:47.420 the questions that we raised on the false match side 00:50:47.420 --> 00:50:48.253 with face recognition 00:50:48.253 --> 00:50:50.810 is that, you know, people that are similar 00:50:50.810 --> 00:50:53.190 genetically to each other, like twins, 00:50:53.190 --> 00:50:56.350 are gonna be similar in their face characteristics. 00:50:56.350 --> 00:50:59.240 But I don't know of any evidence of differences 00:50:59.240 --> 00:51:02.493 in diversity for specific groups. 00:51:04.310 --> 00:51:05.833 So we do have other questions. 00:51:06.937 --> 00:51:09.990 "The science here is potentially evolving 00:51:09.990 --> 00:51:11.910 past the public debate, 00:51:11.910 --> 00:51:14.110 and will these findings be published? 00:51:14.110 --> 00:51:17.567 And if so, where? They could be helpful." 00:51:21.690 --> 00:51:23.570 So I think John mentioned 00:51:23.570 --> 00:51:26.560 that we have publications available 00:51:28.400 --> 00:51:32.723 on the Biometrics and Identity Technology Center website. 00:51:34.599 --> 00:51:37.840 And a lot of this research is also published 00:51:37.840 --> 00:51:40.433 in academic journals as well. 00:51:42.386 --> 00:51:47.260 So I believe you could go to BI-TC website today 00:51:47.260 --> 00:51:52.260 and download this technical paper series that John briefed, 00:51:52.640 --> 00:51:54.560 so that's available. 00:51:54.560 --> 00:51:56.170 And the components 00:51:56.170 --> 00:51:59.130 of the demographically disaggregated analysis 00:51:59.130 --> 00:52:00.510 that I briefed earlier 00:52:01.540 --> 00:52:03.130 are available in brief format 00:52:03.130 --> 00:52:05.050 but have not been explicitly published. 00:52:05.050 --> 00:52:06.720 For the 2021 rally, 00:52:06.720 --> 00:52:10.560 it's one of our goals to brief 00:52:10.560 --> 00:52:13.410 and to make this demographically disaggregated analysis 00:52:13.410 --> 00:52:17.210 of commercial technology available as well. 00:52:17.210 --> 00:52:20.290 Arun, do you wanna weigh in here? 00:52:20.290 --> 00:52:22.870 - Yeah, I'll just point out that we have, 00:52:22.870 --> 00:52:26.550 so from our Biometric and Identity Technology Center page, 00:52:26.550 --> 00:52:30.320 you can also find a link to our MDTF page, mdtf.org, 00:52:30.320 --> 00:52:31.553 and we have a number of papers there 00:52:31.553 --> 00:52:33.390 that are linked as well. 00:52:33.390 --> 00:52:35.830 You know, there's this constant balance 00:52:35.830 --> 00:52:39.680 we're trying to strike about putting out good content 00:52:39.680 --> 00:52:41.640 and material and analyses here, 00:52:41.640 --> 00:52:44.870 and also going through like, peer-reviewed processes, right? 
00:52:44.870 --> 00:52:49.250 Peer review is not always a timely process. 00:52:49.250 --> 00:52:50.570 It can take a lot of time 00:52:50.570 --> 00:52:53.750 and sometimes it's just because the editors are busy, right? 00:52:53.750 --> 00:52:55.390 So we are trying to make sure 00:52:55.390 --> 00:52:57.640 we are putting out information in a timely basis, 00:52:57.640 --> 00:52:59.793 that it's informative and useful, 00:53:00.640 --> 00:53:03.300 but yeah, it is also beneficial 00:53:03.300 --> 00:53:05.690 to get a peer review but sometimes, 00:53:05.690 --> 00:53:08.580 and that's what we've almost exclusively done in the past. 00:53:08.580 --> 00:53:11.250 It's just that that process was just taking so long 00:53:11.250 --> 00:53:13.693 and it was preventing us from helping to get content out 00:53:13.693 --> 00:53:15.910 into some of the public forum, 00:53:15.910 --> 00:53:20.320 'cause otherwise people send out misinformation 00:53:20.320 --> 00:53:22.060 via tweets in seconds. 00:53:22.060 --> 00:53:25.610 It's very hard for us to go through a peer review process 00:53:25.610 --> 00:53:27.770 and then put something out to contest that 00:53:27.770 --> 00:53:30.170 when our processes are so much longer. 00:53:30.170 --> 00:53:31.620 So we do go through the internal, 00:53:31.620 --> 00:53:33.940 so we do go through S&T processes to review it 00:53:33.940 --> 00:53:35.730 before we publish it. 00:53:35.730 --> 00:53:37.670 But anyway, so we're trying to do a combination of both 00:53:37.670 --> 00:53:39.630 so we can make sure better information is available 00:53:39.630 --> 00:53:42.030 to people who need this information 00:53:42.030 --> 00:53:43.600 to help inform public policy 00:53:43.600 --> 00:53:45.300 and public debate on these topics. 00:53:46.350 --> 00:53:48.450 I think we have a couple more questions here. 00:53:48.450 --> 00:53:50.720 - [Yevgeniy] Yep, so the next one here is, 00:53:50.720 --> 00:53:52.640 oops- - I sent it away, sorry. 00:53:52.640 --> 00:53:55.120 It's on the published, I'll- - Okay. 00:53:55.120 --> 00:53:56.397 Go ahead, Arun, please. 00:53:56.397 --> 00:53:59.160 - "The other race effect has been in literature 00:53:59.160 --> 00:54:00.960 for a long time. 00:54:00.960 --> 00:54:03.280 Wouldn't a way of mitigating equitability 00:54:03.280 --> 00:54:05.517 be to have a more diverse training set?" 00:54:06.810 --> 00:54:08.943 - So I'll start with this one. 00:54:10.005 --> 00:54:11.520 What we showed, 00:54:11.520 --> 00:54:13.980 what John showed on his slides about face recognition 00:54:13.980 --> 00:54:16.400 is not the other race effect. 00:54:16.400 --> 00:54:17.853 The other race effect, 00:54:18.960 --> 00:54:23.060 it says that if I am raised in an environment 00:54:23.060 --> 00:54:26.050 where I'm exposed to people of a certain demographic group, 00:54:26.050 --> 00:54:29.030 that I am better at discriminating faces 00:54:29.030 --> 00:54:31.130 that belong to that same demographic group, 00:54:31.130 --> 00:54:33.820 and usually it's my own demographic group. 00:54:33.820 --> 00:54:35.460 What John was talking about 00:54:35.460 --> 00:54:38.750 is that face recognition in general 00:54:38.750 --> 00:54:41.410 has a greater propensity to confuse people 00:54:41.410 --> 00:54:45.053 that match in race, gender, and age to each other, 00:54:46.070 --> 00:54:48.060 which is a very different thing. 
00:54:48.060 --> 00:54:50.340 And it's something that took a while 00:54:50.340 --> 00:54:52.100 to wrap our heads around 00:54:52.100 --> 00:54:54.760 because of this cognitive bias that we have 00:54:54.760 --> 00:54:58.520 that says, hey, faces are more similar to us perceptually. 00:54:58.520 --> 00:55:01.090 But that's because we have this neural circuitry 00:55:01.090 --> 00:55:02.740 that tells us so. 00:55:02.740 --> 00:55:05.110 We don't have this type of neural circuitry or intuition 00:55:05.110 --> 00:55:07.920 for iris recognition or fingerprint recognition. 00:55:07.920 --> 00:55:10.050 And in fact, these systems don't make 00:55:10.050 --> 00:55:12.123 those same kind of demographic, 00:55:13.150 --> 00:55:14.740 the demographic confusion matrix 00:55:14.740 --> 00:55:17.080 of those systems looks very different. 00:55:17.080 --> 00:55:19.000 John, do you wanna add to that? 00:55:19.000 --> 00:55:21.330 - Yeah, so this is actually 00:55:21.330 --> 00:55:24.610 a really clean case 00:55:24.610 --> 00:55:27.360 of where a diverse training set certainly helps 00:55:27.360 --> 00:55:28.760 but it doesn't solve this problem. 00:55:28.760 --> 00:55:32.170 So we hear a lot from people 00:55:32.170 --> 00:55:33.440 that are in the face recognition space 00:55:33.440 --> 00:55:36.060 that, oh, if we only had a more diverse dataset, 00:55:36.060 --> 00:55:37.880 a lot of these problems would disappear. 00:55:37.880 --> 00:55:41.210 And here I think is a great sort of counter example. 00:55:41.210 --> 00:55:44.690 So imagine you're sort of training a face recognition system 00:55:44.690 --> 00:55:47.720 and you set the optimization parameters, 00:55:47.720 --> 00:55:49.940 or you are telling the computer program 00:55:49.940 --> 00:55:52.090 what doing a good job looks like. 00:55:52.090 --> 00:55:55.100 And in most ways that these things are trained right now, 00:55:55.100 --> 00:55:56.270 it has two objectives. 00:55:56.270 --> 00:55:58.510 It needs to think pictures of me 00:55:58.510 --> 00:56:01.640 look like other pictures of me, they get high scores, 00:56:01.640 --> 00:56:04.230 and pictures of me and other people get low scores. 00:56:04.230 --> 00:56:08.170 And you sort of say, okay, computer program, neural net, 00:56:08.170 --> 00:56:09.530 accomplish these two things. 00:56:09.530 --> 00:56:10.900 And if you accomplish these two things, 00:56:10.900 --> 00:56:12.950 I have a working face recognition system. 00:56:13.839 --> 00:56:16.260 What this actually says is there's a third criterion 00:56:16.260 --> 00:56:18.460 so even getting a more diverse dataset 00:56:18.460 --> 00:56:19.293 wouldn't solve this problem. 00:56:19.293 --> 00:56:22.260 You need to add this third optimization parameter 00:56:22.260 --> 00:56:26.650 and it's that the person who looks most like me 00:56:26.650 --> 00:56:28.860 shouldn't also share my demographic group, 00:56:28.860 --> 00:56:30.750 shouldn't be a 30-year-old Caucasian male. 00:56:30.750 --> 00:56:32.610 So it's a great example 00:56:32.610 --> 00:56:34.260 of where, yeah, a diverse training set would help 00:56:34.260 --> 00:56:35.623 but you'd then also need to make this recognition 00:56:35.623 --> 00:56:40.623 that you have to update your optimization steps 00:56:41.230 --> 00:56:42.543 and add this new thing. 00:56:43.870 --> 00:56:44.980 Short answer, yes, it would help, 00:56:44.980 --> 00:56:47.183 no, it doesn't solve this particular problem. 00:56:48.420 --> 00:56:50.193 - [Yevgeniy] Yep. And we have one more.
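[Editor's note: to make the "third optimization criterion" concrete, here is a minimal sketch of a training objective with three terms: genuine pairs pushed together, impostor pairs pushed apart, and an extra penalty when an impostor pair also shares a demographic label. This illustrates the idea only; it is not a published DHS or vendor training recipe, and the weights, margin, and toy data are arbitrary.]

```python
# Sketch of a three-term objective: (1) genuine pairs score high, (2) impostor
# pairs score low, (3) an extra penalty when an impostor pair also shares the
# same demographic label. Weights and margin are arbitrary; illustration only.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def three_term_loss(embeddings, identities, demographics,
                    margin: float = 0.3, w_demographic: float = 0.5) -> float:
    genuine, impostor, demo_penalty = [], [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            if identities[i] == identities[j]:
                genuine.append(1.0 - s)                    # want genuine scores high
            else:
                impostor.append(max(0.0, s - margin))      # want impostor scores low
                if demographics[i] == demographics[j]:
                    # third criterion: same-demographic impostors get an extra push apart
                    demo_penalty.append(max(0.0, s - margin))
    loss = np.mean(genuine) + np.mean(impostor)
    if demo_penalty:
        loss += w_demographic * np.mean(demo_penalty)
    return float(loss)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    emb = rng.normal(size=(6, 128))                 # toy embeddings
    ids = ["a", "a", "b", "c", "d", "e"]            # subject identities
    demo = ["g1", "g1", "g1", "g1", "g2", "g2"]     # toy demographic labels
    print("loss:", three_term_loss(emb, ids, demo))
```

The design point is simply that the objective, not just the data, has to encode the goal: a perfectly balanced training set optimized only on the first two terms can still leave same-demographic impostors as the closest non-matches.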
00:56:50.193 --> 00:56:52.850 - Oh, I'm sorry. - Go ahead, Arun. 00:56:52.850 --> 00:56:55.850 - Did you guys go over, like, some of the reasons 00:56:55.850 --> 00:56:57.380 why we're looking into this in particular, 00:56:57.380 --> 00:56:59.290 the whole thing about protected classes 00:56:59.290 --> 00:57:01.420 and trying to have equitable performance across... 00:57:01.420 --> 00:57:02.340 Yep. Okay, nevermind. 00:57:02.340 --> 00:57:04.420 - Yeah, so we didn't go through it in detail 00:57:04.420 --> 00:57:06.520 and I think it makes sense to do that now. 00:57:08.450 --> 00:57:12.720 Yeah, Arun, so protected groups came up a little bit earlier 00:57:12.720 --> 00:57:14.670 and I wanted to point out a ramification 00:57:14.670 --> 00:57:18.320 and I don't think we put that slide into this brief. 00:57:18.320 --> 00:57:23.320 But there's a ramification that people need to consider 00:57:23.410 --> 00:57:27.930 about face recognition making these more confusions 00:57:27.930 --> 00:57:30.870 between people of the same race, gender, and age. 00:57:30.870 --> 00:57:34.590 And these happen because let's say I have a gallery 00:57:34.590 --> 00:57:38.490 of people who share my demographic characteristics. 00:57:38.490 --> 00:57:42.713 Everybody is, let's say, a white male of a certain age. 00:57:43.830 --> 00:57:46.550 And if I am not in that gallery 00:57:46.550 --> 00:57:48.840 but other people that look like me are, 00:57:48.840 --> 00:57:52.040 and I go to match against that gallery, 00:57:52.040 --> 00:57:54.370 then, in general, if you have an accurate system, 00:57:54.370 --> 00:57:57.020 my likelihood of a match will be low 00:57:57.020 --> 00:57:59.110 but that low likelihood of a match 00:57:59.110 --> 00:58:02.750 may still be higher against that gallery of white males 00:58:03.973 --> 00:58:08.640 than, say, an Asian female against that same gallery, right? 00:58:08.640 --> 00:58:10.550 So I would have a greater hazard 00:58:10.550 --> 00:58:13.770 of matching falsely to this gallery of white males 00:58:13.770 --> 00:58:16.700 than somebody else that doesn't share my demographics. 00:58:16.700 --> 00:58:18.090 And in many circumstances, 00:58:18.090 --> 00:58:21.870 that could be considered, you know, not desirable or unfair, 00:58:21.870 --> 00:58:24.080 because if that gallery is something 00:58:24.080 --> 00:58:26.430 that, you know, perhaps I don't wanna match to, 00:58:28.140 --> 00:58:29.220 maybe it's a list of people 00:58:29.220 --> 00:58:32.220 that would not be allowed to fly that day, 00:58:32.220 --> 00:58:34.070 then I would have a greater sort of hazard 00:58:34.070 --> 00:58:36.510 of having that error occur. 00:58:36.510 --> 00:58:38.460 And with face recognition, 00:58:38.460 --> 00:58:41.383 that could be something that we need to worry about. 00:58:45.630 --> 00:58:48.070 So I'll go through the last question here, 00:58:48.070 --> 00:58:50.150 we're running up on our time, 00:58:50.150 --> 00:58:51.207 and that last question is, 00:58:51.207 --> 00:58:54.620 "Have infrared systems been used in face recognition? 00:58:54.620 --> 00:58:58.770 And are those system less prone to demographic effects?" 00:58:58.770 --> 00:59:00.803 And I think the answer is yes, 00:59:01.640 --> 00:59:03.070 and we've actually, in the rally, 00:59:03.070 --> 00:59:07.470 have had some systems that have used infrared light 00:59:07.470 --> 00:59:09.690 for their acquisition. 
00:59:09.690 --> 00:59:12.160 We did see some different demographic characteristics 00:59:12.160 --> 00:59:13.860 but it's a very small sample. 00:59:13.860 --> 00:59:16.700 Most face recognition systems today 00:59:16.700 --> 00:59:19.010 that have participated in our testing 00:59:19.010 --> 00:59:24.010 have been visible spectrum face acquisition systems 00:59:24.310 --> 00:59:26.703 using typical RGB sensors. 00:59:28.100 --> 00:59:30.190 But it is an open question of what would happen 00:59:30.190 --> 00:59:33.250 to these acquisition demographic effects 00:59:33.250 --> 00:59:35.633 if infrared systems were used. 00:59:40.010 --> 00:59:44.440 So that takes us through the end of the questions. 00:59:44.440 --> 00:59:46.640 Arun, I'll hand it off to you. 00:59:46.640 --> 00:59:47.820 - Yeah, thank you so much. 00:59:47.820 --> 00:59:52.033 So Yevgeniy and John, thank you so much for doing the call, 00:59:53.350 --> 00:59:54.880 for sharing the webinar, 00:59:54.880 --> 00:59:56.790 and thanks to all the participants 00:59:56.790 --> 00:59:58.180 who joined us this afternoon 00:59:58.180 --> 00:59:59.810 to learn more about some of our research 00:59:59.810 --> 01:00:02.440 and learn some of the work that we're doing. 01:00:02.440 --> 01:00:03.410 If you have any questions, 01:00:03.410 --> 01:00:05.890 please feel free to reach out to me 01:00:05.890 --> 01:00:09.310 within the Biometric and Identity Technology Center. 01:00:09.310 --> 01:00:12.100 You can always get me on Teams or email. 01:00:12.100 --> 01:00:14.500 In fact, I think I just saw a couple of emails come through, 01:00:14.500 --> 01:00:17.700 so I'm happy to help follow up and help answer questions 01:00:17.700 --> 01:00:20.150 and share any information as it's relevant. 01:00:20.150 --> 01:00:24.253 So, yeah, thanks again and please, please keep in touch.