DHS Science and Technology - The 2019 Biometrics Technology Rally

[Arun] This webinar will focus on the camera systems, or the acquisition systems. We will be hosting a second webinar in a few weeks, and that will focus on the matching algorithms. One of the things we wanted to do was to provide some information about how these technologies work, at least the collection systems, to give you a better insight into how these may play out in normal operations. So let's go ahead and jump into the presentation; we will go to the first slide. Today we're going to give you a brief overview of what the biometrics rally was and how we executed it. We're going to talk about the goals for the rally that we just completed, how we measured performance, what the results were, as well as some observations about how technology companies performed this year compared to last year. The Biometric and Identity Technology Center within DHS S&T is a relatively new construct. It focuses on centralizing and providing core biometric research, development, test and evaluation capabilities so that we can look at how biometric technologies can be used across multiple different mission sets. The goal is to support and accelerate learning, to help different DHS components understand what technologies may be relevant for their specific use case, to inform their planning, and to identify issues that they may have to mitigate.
We really want to focus on improving communications between stakeholders, as well as ensuring knowledge sharing so that best practices and information can be shared across different missions and different groups. Today we are going to focus a lot on some test and evaluation capabilities that are more broadly applicable. So, biometric technology rallies: this is a new mechanism that we established just last year. It focuses on defining a specific use case that is relevant to DHS and DHS stakeholders. In this particular case we identified a use case which we called "high throughput biometric screening". Basically, this is different from most other types of biometric applications because we focus on the situation where you may need to verify, using biometrics, or recognize the identity of individuals in a very high throughput scenario. And when we say high throughput, we're talking about under 10 seconds, maybe under five seconds, with minimal staffing, and being able to process not hundreds but maybe thousands, tens of thousands or more people in a matter of perhaps minutes or hours. The goal here really is to try to make these technologies much faster, much easier to use, much more accurate, and to reduce all situations where there are any types of errors.
So really what we're doing here, too, is helping to identify these types of risks and issues before technologies are acquired and deployed. We provide this independent test capability and we share the results, whether it's with DHS mission stakeholders, our interagency partners, or our international partners, so that this is more broadly available to all of our stakeholders, so that we can work together, increase the consumer base, and make sure we're buying the right technologies that meet all of our different missions. And with that, let's go to the next slide and I'll turn things over to my colleague Jake Hasselgren.

[Jake] Thanks, Arun. First, as Arun mentioned, we did target the high throughput unmanned use case. So we were looking at systems that were able to process an individual in ten seconds or less, and there was also no human in the loop in this use case. We did focus on face, fingerprint and iris systems that collected that imagery. In addition, we looked at acquisition systems and algorithms. Arun mentioned this is the first of two webinars; this one is going to focus mostly on the acquisition systems. We did implement a test design very similar to the rally that we performed last year, to enable us to measure progress from one to the other, and I believe Yevgeniy will be discussing that. The difference between this rally and last year's rally is that we did allow for high throughput systems.
This was the first time that we had done that. We did assess the ability of acquisition systems across multiple algorithms. Additionally, we assessed the ability of algorithms across multiple acquisition systems. Again, these two bullet points are going to be out of scope for this webinar, but they will be included in a future webinar. And finally, what we did for acquisition systems is we had them sign a cooperative research and development agreement, and that allowed us to collaborate with them and also guide any promising technologies to better fit DHS's use cases. Okay, so the 2019 biometric rally was on a pretty aggressive timeline. We started our test design in September of 2018, and we made our first call for submissions and applications in November of 2018. Once we made that call, we gave any acquisition system a month to submit their applications. These applications included things that described their system maturity and how they planned on having the interaction with the volunteers. We took that information, and once we reviewed it and decided which systems we wanted to look at, we delivered conditional acceptances. Once they received the conditional acceptances, they had three months to integrate their system with our API, which we hosted in the cloud so they could do that testing and integration remotely.
And then finally, once they had completed that integration, we delivered a final acceptance notification, and starting in May of 2019 we had the systems installed at the MdTF. The actual rally collection began on May 9th of 2019. Each acquisition system had a number of requirements that they had to adhere to. First, as we mentioned, they had to operate in unmanned mode, so there could not be any human in the loop; the system itself had to give all the feedback and instruction. The system had to operate within the MdTF infrastructure, or the API, and also physically had to operate in a six foot by eight foot area defined by us. Again, the system had to collect either face, iris or fingerprint imagery. Each system had to submit at least one biometric probe per test volunteer and had to do so within the time constraints defined by DHS S&T. Optionally, the systems could provide multiple algorithms should they choose to do so; we did have a few that did. And optionally, the systems were able to submit three sets of images per modality, per volunteer. Okay, so this slide gives you an idea of the applications that we received. This graphic shows that we got applications from multiple countries, I think seven to be specific. We did receive 26 applications for acquisition systems, which is an increase from last year; we only got 19 last year. So, a good news story. We took those 26 applications and distributed them to a panel of reviewers. That panel of reviewers consisted of representatives from DHS, the Department of State, IARPA, DOJ, NIST, a number of different government agencies, and those reviewers looked at the application packages and reviewed them. We took those reviews and we down-selected to 14 acquisition systems that we wanted to evaluate in the 2019 rally. So this slide gives you an idea of how it looked in the facility. The image on the right shows the six foot by eight foot area that the acquisition systems had to install into. Also, at the MdTF, we instrumented those stations with a number of measurement devices: we had ground truth scans, beam breaks, and satisfaction kiosks that we could use to get these measures. For a specific test day, how would we run this? We would have a group of volunteers line up at each station. They would then get individually scanned into the station; each person had a unique ID on their wristband that we would scan for ground truth. As they entered the station they would cross a beam break, starting a transaction. They would interact with the system as the system prompted them to, and then they would leave the station, crossing another beam break designating the end of the transaction. Within those two beam breaks, the system was sending images to our back-end.
And once they exited the station, they did submit a satisfaction score using a satisfaction kiosk that we had developed in-house. A couple of differences between the 2018 and 2019 rallies: we didn't require anyone to submit face images. In 2018 we did. They submitted whatever modality they wanted to. Okay, so we took extra care to make sure this was a fair test, and we focused on a couple of aspects to get us the data that we needed. First, we wanted an appropriate sample size, so we wanted to collect on at least 400 test volunteers for each acquisition system. We ended up getting 430, so that was good. We also wanted to provide a diverse blocking of demographics across the test volunteers, and we did that through age, gender, race, and prior experience with different MdTF evaluations. I see a question, "What is MdTF?" It's the Maryland Test Facility, where this evaluation occurred. Given that we had multiple systems and each volunteer was using multiple systems per test session, we implemented a counterbalancing scheme so we could mitigate learning effects and carry-over effects across these different acquisition systems. We didn't want one system to always occur before another system, and we didn't want the same order, because we didn't want learning effects to take over.
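A counterbalanced visit order like the one described can be sketched with a Latin square, in which every station appears exactly once in every position. This is only an illustrative sketch: the transcript does not specify the rally's actual scheme, and a simple cyclic square (rather than, say, a digram-balanced one) is an assumption.

```python
# Sketch of a counterbalancing scheme for station visit order.
# Assumption: a simple cyclic Latin square; the rally's actual
# scheme is not detailed in the webinar.

def latin_square_orders(stations):
    """Return len(stations) visit orders in which each station
    appears exactly once in every position."""
    n = len(stations)
    return [[stations[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Each volunteer group is assigned one row, so no station always
# precedes another and position effects are spread evenly.
orders = latin_square_orders(["Wood", "Sanford", "Hunter", "Teton"])
```

Assigning groups of volunteers evenly across the rows spreads any learning or carry-over effect uniformly over the stations.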
And then finally, we did consider the test schedule, because we wanted to be able to recover in case of any sort of anomaly, such as inclement weather or any other reason why we would have to reschedule a test date. So, like I said, we did want to block on a diverse distribution of demographics. This graphic gives you an idea of the resulting distributions. We had a pretty even spread across gender, so male and female were about 50/50. We did have a pretty wide distribution of ages, from 18 to 81. As far as race, we had a majority of African Americans, a pretty close number of Caucasians, and then "other" being Hispanic and Asian. And we also had a blocking of different anthropometrics, heights, across gender. And like I said, we did end up recruiting more than four hundred test volunteers for each acquisition system. Okay, so I'm going to pass the mic over to John and he'll talk about some of the metrics.

[John] Thanks, Jake. Jake talked a lot about the test design. These are the actual metrics that we evaluated each system on; we'll run through each one of them at a high level here. The first one is efficiency: basically, how fast could people use every one of the 14 systems that were in our test. We defined "how fast" as basically the time between the two beam breaks that Jake discussed back on slide nine. The next thing we looked at was satisfaction.
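The efficiency metric just described, the average time a person spends between the entry and exit beam breaks, is a simple mean over transactions. The sketch below is illustrative only; the record fields (enter_ts, exit_ts, in seconds) are hypothetical, not the MdTF's actual data schema.

```python
# Efficiency: average transaction time between the two beam breaks.
# Field names (enter_ts, exit_ts) are hypothetical.

def average_transaction_time(transactions):
    """Mean seconds between the entry and exit beam breaks."""
    durations = [t["exit_ts"] - t["enter_ts"] for t in transactions]
    return sum(durations) / len(durations)

sample = [
    {"enter_ts": 0.0, "exit_ts": 2.7},
    {"enter_ts": 10.0, "exit_ts": 14.1},
]
avg = average_transaction_time(sample)  # about 3.4 seconds on average
```

Against the rally's criteria, a system with an average under five seconds would meet the goal, and one under ten seconds the threshold.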
Jake mentioned that there is this little four-button happy/not-happy kiosk at the very end. Our top-line satisfaction metric was how many people either hit the green button, which was very happy, or the yellow button, which was just happy. So we wanted to know what percentage of those 430 people were happy or very happy after interacting with every single biometric device. Effectiveness is how well each one of these systems worked. It's a little bit more complicated, because there are a couple of different measures here. The first one we looked at is the failure to acquire rate. This is how many times a test volunteer walked up to a given system and that system was unable to capture a usable biometric sample: so they tried to take a picture of the face and accidentally took a picture of a light in the background, or it was a fingerprint sensor and the person didn't know to press hard enough on it, and it didn't get a good fingerprint sample. That's our failure to acquire rate. The next metric we look at is the true identification rate. For every single person that crossed each station, we matched the samples to one-to-four-year-old historical samples on a per-modality basis. So we have faces that we acquired from our test volunteers during their previous interactions with us; we've got fingerprints; we've got iris samples.
The true identification rate is how many people transitioned through each station where we got a biometric that matched something we collected at a previous time. For both of these effectiveness measures, failure to acquire and true identification rate, we measure them as a function of time. So we wanted to look at the failure to acquire rate five seconds after the volunteer enters the bay, and the failure to acquire rate 20 seconds after the volunteer enters the bay, and the same thing with true identification rate. We do this because, in a lot of the travel environments that Arun mentioned, we're looking at this high throughput use case, but that can mean different things. A five-second use case might be something like a ticketing process on a jet bridge, while something that takes 10 to 15 to 20 seconds is maybe okay at something like a Global Entry station, so we really wanted to break those apart. There were a couple of stations in the rally, not many but a few, that were multimodal, that did a fingerprint and a face, for example. Those were evaluated independently: we treated it as, what if it was just doing fingerprint, what if it was just doing face? The next six to seven slides go through all of the metrics for the rally. I will mention that if you have questions about these, since I'm going to go through them kind of quickly, just type them into the box.
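The time-sliced effectiveness measures described here, failure to acquire and true identification rate evaluated at five and at twenty seconds after the volunteer enters the bay, can be sketched as follows. This is a sketch only: the record fields (acquire_s and match_s, seconds from the entry beam break, or None if the event never happened) are hypothetical, not the rally's actual log format.

```python
# Time-sliced effectiveness metrics. For each transaction we record
# when (if ever) a usable sample was acquired and when (if ever) it
# was correctly identified, measured from the entry beam break.
# Field names are hypothetical.

def failure_to_acquire(transactions, cutoff):
    """Fraction of transactions with no usable sample by `cutoff` seconds."""
    failed = sum(1 for t in transactions
                 if t["acquire_s"] is None or t["acquire_s"] > cutoff)
    return failed / len(transactions)

def true_identification_rate(transactions, cutoff):
    """Fraction of transactions correctly matched by `cutoff` seconds."""
    hits = sum(1 for t in transactions
               if t["match_s"] is not None and t["match_s"] <= cutoff)
    return hits / len(transactions)

log = [
    {"acquire_s": 3.0, "match_s": 4.0},    # acquired and matched fast
    {"acquire_s": 12.0, "match_s": 15.0},  # needed the extra time
    {"acquire_s": None, "match_s": None},  # never acquired
]
fta5 = failure_to_acquire(log, 5)            # 2 of 3 not acquired by 5 s
tir20 = true_identification_rate(log, 20)    # 2 of 3 identified by 20 s
```

Evaluating the same log at both cutoffs is what separates a jet-bridge-style five-second use case from a more relaxed twenty-second one.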
They will queue up and we will get back to all of them a little bit later. I will also mention that all of the graphics from these slides are going to be up on MdTF.org, that's the Maryland Test Facility website, by the end of the week. So you'll have ample time to digest them in your own free time as well. I am going to describe how to read the graph on the right; you are going to see a series of very similar graphs. What you're seeing here is one metric that was evaluated during the 2019 rally, this particular one being efficiency. Across the bottom, on the x-axis, are the 14 stations that participated in the 2019 rally: Wood, Sanford, Hunter, and so on. You probably don't recognize any of these biometric system providers, and that's on purpose. What we did is we took each company's real name and aliased it, so these are just mountain peaks, I think actually in the Rockies. Each company knows who they are, they know what their key is, but nobody else knows other companies' keys. The circle [unintelligible] is a filled circle if that particular company, for this particular metric, met the goal of the rally. A half-filled circle means that particular company, for this metric, met the threshold for the rally. And an empty circle means they didn't. The y-axis here is average transaction time.
That's our efficiency metric: how long, on average, did a person spend between those two beam breaks. This is a pretty good news story for the rally. We can see seven of the fourteen systems actually met the goal, which was to take less than five seconds on average per subject. Those seven were Rainier, Ritter, Teton, Adams, Dana, Jarvis and Baker. Another five systems met the threshold for the rally, which was to take less than 10 seconds per subject: Wood, Sanford, Atna, Bear and Gabb. So between the seven that met the goal and the five that met the threshold, that's twelve out of the fourteen systems that either met the goal or the threshold. That was pretty good, better than the 2018 rally; there were actually only two that didn't meet that 10 second threshold. Our fastest was a station called Teton, which averaged 2.7 seconds per volunteer. The other thing I'll point out on this chart is that if you look across the very top you'll see the modalities. The iris systems are on the left, the face systems in the middle, and you can see there are a couple that overlap: there are two iris-face systems and one finger-face system, and then the fingerprint systems are all the way over on the right. So, satisfaction. This we think is another pretty good news story. There's sort of this narrative out there in the public that people don't like biometric systems.
Our tests sort of quantify this and find otherwise. Our goal for the rally was that we wanted 95% of the people that interacted with the systems to be either happy or very happy with their interaction. There were five systems that were able to do that: Rainier, Atna, Teton, Dana and Jarvis. There were another four that got above 90%, which was our threshold for the rally: Sanford, Bear, Adams and Baker. So that's nine of the fourteen systems where at least 90% of these 430 people that interacted with them had a happy or very happy sort of taste in their mouths afterwards. The most satisfying system, the one that left people happiest, was station Teton: 98.1% of our population was either happy or very happy after they were done having their biometrics captured by that station, so we thought that was pretty good. So now we are going to start looking at the effectiveness measures. The way we are going to do this is by modality: we're going to go face, iris, fingers, and we are going to look first at failure to acquire and then at true identification rate. These slides are very similar to the ones we just looked at. We've got all the stations across the bottom and the metric, in this case failure to acquire, on the y-axis. For failure to acquire you want to be lower, right, so having a low rate is good. That means you captured biometric samples from the majority of the population. Along the right-hand side you can see we have broken this up by the five- and twenty-second marks I mentioned: on the top you see the five-second mark and on the bottom the twenty-second mark. Our goal for the rally was that stations would acquire on 95% of people, or that their failure to acquire rate would be under 5%, by five seconds, and five of our systems met that. By twenty seconds we wanted stations to acquire on 99% of people; so, given a little bit of extra time, could you acquire on more of the population? There were four systems that met that goal of a less than 1% failure to acquire rate by twenty seconds. And then the threshold, the thing we said you should be able to do to consider yourself a participant in the rally, was to acquire on 95% of our population by twenty seconds. Two additional stations, Atna among them, met that mark. As for our lowest failure to acquire rate, we actually did have a system this year that acquired a usable biometric sample on every one of the 430 people that crossed through the station. That was station Teton. You can see they had a zero percent failure to acquire rate by five seconds and, obviously, a zero percent failure to acquire rate by twenty seconds as well. So, true identification rate. This is probably the top-line number that most people are interested in.
This is the percentage of people that interacted with the station and were successfully identified. So on this chart on the right, you want to be higher; higher is better. And again you see we have it broken down by five and twenty seconds. We did have a lot of stations that were able to meet our goal, which was 95%: those were Atna, Teton, Bear and Jarvis. They all identified greater than 95% of the people that interacted with them in under five seconds. Only one, though, met our more stringent twenty-second goal, which was that we want you to identify 99% of our participants, and that was station Teton. You can see at the top there, it actually identified 99.4% of our 430 subjects, that's 428 of them, within those five- and twenty-second marks. There were a few others, Atna and Bear, Adams and Jarvis, that met the threshold of identifying 95% of people by 20 seconds. So this is how well the face acquisition systems did at identifying people in the various time slices. You can also note that there are a couple of NA's on these charts on the right; those are stations that didn't capture face samples, and this is face only. So now we are going to go from the face systems to the iris systems, but with the same metrics I talked about: first, failure to acquire, again within five and within twenty seconds, so here you want to be lower on the chart.
The story for this one is that nobody met the goals that we laid out, which were a lower than 5% failure to acquire rate by five seconds and a lower than 1% failure to acquire rate by twenty seconds. Our best iris station was only able to capture 75% of the participants with an iris sample, so it had a 25% failure to acquire rate by five seconds, and then a 14% failure to acquire rate by twenty seconds. None of those were quite up to the goals and thresholds that we set for the rally. It's a similar story here on true identification rates: since most of the systems weren't able to capture on enough people to meet our thresholds and goals for the rally, with the iris modality they also weren't able to match that percentage of people. Our best performing iris system was station Wood, which identified almost 70% of people within five seconds and almost 78% of people within twenty seconds. Again, that's a little short of the 95 and 99 percent numbers we were looking for in the rally. The next slides are the same slides, but with fingerprints. So again, looking at fingerprint acquisition systems: what percentage of our population were you able to acquire a usable biometric sample on? Station Baker did the best; it got about 80% of our population, or a 20% failure to acquire rate, within five seconds.
And then, given a little more time, we had station Gabb, which took a little longer but overall was actually able to get more of the population; it was our best station within twenty seconds, getting 82% of the population, or an 18% failure to acquire rate. Again, none of these met the 95 and 99 percent goals we were looking for in acquisition systems. And then this is just the true identification rates for fingerprints. Station Baker was able to identify about 70% of people within five seconds, and station Gabb was able to identify about 72% of people within twenty seconds. Those were our best performing fingerprint stations. And I'll turn it over to Jean to wrap us up.

[Jean] John just presented the results of each rally metric separately. What I want to do on this slide is orient everybody to how we will be presenting the summary of these acquisition results for all acquisition systems across all metrics. For those joining us now who remember last year's rally event, this visualization should be very familiar. The chart shows the rally acquisition systems arranged by column, and they're labeled with their aliases along the top: Wood, Sanford, Hunter, and so on. Now, in the rally we had primarily face systems, and you see the horizontal line that groups the face systems together. On the left of the chart you see three systems that provided iris.
The ones where the iris 276 00:23:46,440 --> 00:23:52,840 line overlaps with face are multi-modal, providing both face and iris; these are Sanford and 277 00:23:52,840 --> 00:23:58,490 Hunter. And on the very right you see there were some systems that provided only finger, Fork and Baker, and one 278 00:23:58,490 --> 00:24:04,100 system, Gabb, that provided both finger and face; that was a multi-modal system. This is how the acquisition 279 00:24:04,100 --> 00:24:09,570 systems are arranged along the top. On the left hand side, 280 00:24:09,570 --> 00:24:14,880 the rows are labeled with the different metrics. You have efficiency, satisfaction, 281 00:24:14,880 --> 00:24:20,020 and then a series of effectiveness metrics. 282 00:24:20,020 --> 00:24:25,740 The first cluster, within five seconds, holds those that 283 00:24:25,740 --> 00:24:32,260 quantify the performance five seconds after the person walked into the station, 284 00:24:32,260 --> 00:24:38,100 and the within twenty seconds metrics below that quantify the performance after twenty seconds of interaction 285 00:24:38,100 --> 00:24:43,810 with the system. Within each circle you see the value of each metric 286 00:24:43,810 --> 00:24:49,480 for a particular system. So the top left circle is actually 287 00:24:49,480 --> 00:24:56,220 the efficiency of station Wood, and when a system 288 00:24:56,220 --> 00:25:02,070 meets the goal for a metric the circle is fully filled, when it 289 00:25:02,070 --> 00:25:07,440 meets the threshold it's half full, and when it falls short it's open. 290 00:25:07,440 --> 00:25:12,670 So what this summary does is let you see at a glance how well systems did 291 00:25:12,670 --> 00:25:17,820 overall, and one thing we see from this chart is that face systems overall performed pretty well 292 00:25:17,820 --> 00:25:22,870 as compared to iris and finger, as John described. And then we had one system, 293 00:25:22,870 --> 00:25:28,120 system Teton, that actually swept the rally.
294 00:25:28,120 --> 00:25:33,300 The green color marks the best performing 295 00:25:33,300 --> 00:25:38,400 system within each metric, and you can see that system Teton met the goal 296 00:25:38,400 --> 00:25:43,720 within every metric that applied to it. I should also say that 297 00:25:43,720 --> 00:25:48,970 the black dots that you see mark metrics that don't apply to a given system. 298 00:25:48,970 --> 00:25:54,130 So we are not going to compute iris failure to acquire on a system that doesn't capture iris, and so forth. 299 00:25:54,130 --> 00:25:59,210 But you can see that system Teton met 300 00:25:59,210 --> 00:26:04,210 the goal on all the metrics and was actually the best performer across 301 00:26:04,210 --> 00:26:09,240 all the metrics. So overall, face acquisition 302 00:26:09,240 --> 00:26:14,340 systems did very well, and this chart will be available 303 00:26:14,340 --> 00:26:19,340 this week at MdTF.org so that you can go over it in detail at 304 00:26:19,340 --> 00:26:24,360 your leisure. So overall this rally showed excellent performance by face acquisition systems. 305 00:26:24,360 --> 00:26:29,420 Like I said in the previous slide, overall 12 technologies 306 00:26:29,420 --> 00:26:34,660 met the efficiency threshold of capturing biometrics in under 10 seconds 307 00:26:34,660 --> 00:26:39,840 on average, and seven of the systems actually met the goal, which is capturing 308 00:26:39,840 --> 00:26:44,840 biometrics within five seconds on average. Satisfaction was high: 309 00:26:44,840 --> 00:26:50,170 nine technologies met the threshold of 90% positive satisfaction and five met the goal of 310 00:26:50,170 --> 00:26:55,650 better than 95% positive satisfaction. On effectiveness, 311 00:26:55,650 --> 00:27:01,080 five technologies met the five second failure-to-acquire goal of under 5%. 312 00:27:01,080 --> 00:27:06,220 Four technologies met the true identification rate goal of better than 95%.
313 00:27:06,220 --> 00:27:11,400 Within 20 seconds, six technologies met the threshold of better than 5%, 314 00:27:11,400 --> 00:27:16,710 and four met the goal of better than, rather, 315 00:27:16,710 --> 00:27:21,730 less than, a 1% failure-to-acquire rate. And for true 316 00:27:21,730 --> 00:27:26,820 identification rate, one system met the goal of better than 99% true identification 317 00:27:26,820 --> 00:27:32,140 rate, so just two failures there. And as we've been pointing 318 00:27:32,140 --> 00:27:37,170 out, no fingerprint or iris system met the threshold or goal 319 00:27:37,170 --> 00:27:42,320 for effectiveness at either time point within the rally. So with face acquisition systems setting such a high 320 00:27:42,320 --> 00:27:47,360 performance bar, fingerprint and iris systems have some ways to go to catch up for the high 321 00:27:47,360 --> 00:27:52,400 throughput unattended use case. So this rally further showed that industry has improved from 322 00:27:52,400 --> 00:27:57,620 last year. A larger share of systems met the five second efficiency and satisfaction goals 323 00:27:57,620 --> 00:28:02,770 in the 2019 rally as compared with 2018. 324 00:28:02,770 --> 00:28:07,920 Further, a larger share of systems met rally thresholds for efficiency, satisfaction and effectiveness. 325 00:28:07,920 --> 00:28:12,970 Whereas no system met all rally goals in 2018, one system did so in 2019. 326 00:28:12,970 --> 00:28:18,260 I should note that we kept the goals and thresholds as well 327 00:28:18,260 --> 00:28:23,460 as the general test framework fixed between 2018 328 00:28:23,460 --> 00:28:28,490 and 2019, making this direct comparison possible.
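The goal/threshold circle coding used throughout this summary can be sketched as a small function. This is a toy illustration, not the rally's actual scoring code; the cutoff values shown are the twenty-second failure-to-acquire numbers stated in the webinar (goal under 1%, threshold under 5%), and lower rates are better.

```python
# Sketch of the goal/threshold/fell-short coding from the summary chart.
# Lower failure-to-acquire (FTA) rates are better; the 20-second cutoffs
# stated in the webinar were a goal of <1% and a threshold of <5%.
def classify_fta(fta_rate, goal=0.01, threshold=0.05):
    """Map an FTA rate to the chart's circle coding."""
    if fta_rate < goal:
        return "full"   # met the goal: fully filled circle
    if fta_rate < threshold:
        return "half"   # met the threshold: half-filled circle
    return "open"       # fell short: open circle

print(classify_fta(0.005))  # a 0.5% FTA rate meets the goal
print(classify_fta(0.03))   # 3% meets only the threshold
print(classify_fta(0.18))   # 18%, like the best fingerprint station, falls short
```

The same full/half/open coding applies to the other metrics, with the comparison direction flipped for metrics where higher is better, such as true identification rate.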
So these data indicate major improvements 329 00:28:28,490 --> 00:28:33,840 in meeting the rally use case of unattended high throughput acquisition, and the metrics 330 00:28:33,840 --> 00:28:39,090 indicate that most industry systems now meet efficiency and satisfaction thresholds, although some fall short. 331 00:28:39,090 --> 00:28:44,270 However, there are still performance challenges, primarily in achieving and maintaining 332 00:28:44,270 --> 00:28:49,360 high effectiveness. Only five systems met true identification rate thresholds or goals. 333 00:28:49,360 --> 00:28:54,690 And the primary cause for this failure is challenges 334 00:28:54,690 --> 00:28:59,970 in unassisted usability, you know, how easy it was for participants to interact with the technology, 335 00:28:59,970 --> 00:29:05,160 or reliability, that is, you know, breakdowns of the technology 336 00:29:05,160 --> 00:29:10,240 even during the short test period. Face-only systems continued improving 337 00:29:10,240 --> 00:29:15,570 and maturing in 2019, but we saw no gain in performance for iris systems 338 00:29:15,570 --> 00:29:20,790 as compared with 2018. The detailed acquisition system 339 00:29:20,790 --> 00:29:25,940 results from the 2019 rally presented by John, and the summary figure that I showed you earlier, 340 00:29:25,940 --> 00:29:30,990 will be made available at MdTF.org this week. The website 341 00:29:30,990 --> 00:29:36,290 will also continue to host results from the 2018 rally to facilitate comparisons of 342 00:29:36,290 --> 00:29:41,370 industry performance over time, and I'll turn this over to Arun to close us out. [Arun] So at this point, 343 00:29:41,370 --> 00:29:46,660 first of all, thanks for joining us today.
As Yevgeniy and John 344 00:29:46,660 --> 00:29:51,870 and Jake have mentioned, the charts and graphs from this presentation will 345 00:29:51,870 --> 00:29:57,150 be made available at MdTF.org, and then shortly thereafter, within about two to 346 00:29:57,150 --> 00:30:02,150 two and a half weeks, we plan to make this video available for people to go back and watch as well. 347 00:30:02,150 --> 00:30:07,160 That will be done once we have a chance to go back and insert closed captions. 348 00:30:07,160 --> 00:30:12,450 In a few weeks, on August 13th, we'll go back 349 00:30:12,450 --> 00:30:17,720 and we'll provide the second webinar; as I mentioned before, this is the first of two. 350 00:30:17,720 --> 00:30:22,890 The second webinar will be on August 13th and that will focus on matching algorithm performance. So 351 00:30:22,890 --> 00:30:27,970 in addition to the 14 systems that we tested during this process we also received about 15 matching 352 00:30:27,970 --> 00:30:33,300 algorithms, so we will talk about how well those matching algorithms work and how well they work across the 353 00:30:33,300 --> 00:30:38,510 different collection systems. If you are interested in 354 00:30:38,510 --> 00:30:43,650 contacting any of these companies, we have currently aliased those companies 355 00:30:43,650 --> 00:30:48,650 because we did ask companies to do something that was relatively hard 356 00:30:48,650 --> 00:30:53,670 and we didn't want people to be penalized for taking risks and participating in these tests. 357 00:30:53,670 --> 00:30:58,810 That being said, if there are some companies that you are interested in reaching out to, please feel free to email 358 00:30:58,810 --> 00:31:03,870 peoplescreening@hq.dhs.gov with the specific company that you would like to reach out to, 359 00:31:03,870 --> 00:31:09,130 and we will forward your inquiry to them. And in conclusion, 360 00:31:09,130 --> 00:31:14,330 basically, stay tuned for more information.
We are planning to do additional tests in the future that are focused around 361 00:31:14,330 --> 00:31:19,710 these types of issues and these types of use cases and more, and we welcome your interest. 362 00:31:19,710 --> 00:31:24,760 And with that, let's turn over to the questions. So the first question is 363 00:31:24,760 --> 00:31:30,050 "How are false match rates accounted for?" So John, do you want to talk about false 364 00:31:30,050 --> 00:31:35,260 negative matches, false negatives in this particular test? [John] Yeah, really good question. 365 00:31:35,260 --> 00:31:40,370 So all the results you saw today were with one single algorithm. It's an MdTF algorithm that 366 00:31:40,370 --> 00:31:45,740 we purchased, a top tier NIST performer, and it's representative of something 367 00:31:45,740 --> 00:31:51,040 DHS would use in an operational environment. The true match rates you saw reported 368 00:31:51,040 --> 00:31:56,230 were at a given false match rate, right. So we set a threshold and we used that same threshold across all of 369 00:31:56,230 --> 00:32:01,320 the different acquisition systems. Jake mentioned we had 370 00:32:01,320 --> 00:32:06,660 430 test volunteers come in; what I sort of breezed over was that we have historic 371 00:32:06,660 --> 00:32:11,660 images for all but 75 of those. So when we ran these 372 00:32:11,660 --> 00:32:16,810 identifications, 75 of the people did not have images in the gallery, and so 373 00:32:16,810 --> 00:32:21,890 the result of that should have been that the highest match score we got back was below 374 00:32:21,890 --> 00:32:27,190 that threshold. If that wasn't true, that would have been a false identification, and that 375 00:32:27,190 --> 00:32:32,430 would have come off the true identification rate, right. So that 376 00:32:32,430 --> 00:32:37,550 wasn't a true identification, it was a false one.
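The thresholded identification check John describes can be sketched as follows. This is a minimal illustration, not the rally's actual analysis code; the threshold value, identifiers, and scores are all made up for the example.

```python
# Sketch of the thresholded identification decision described above:
# probes without gallery images should score below the decision threshold,
# and a top score at or above it for an out-of-gallery probe is a false
# identification, which counts against the true identification rate (TIR).
THRESHOLD = 0.80  # illustrative value, not the rally's actual threshold

def identify(top_score, top_gallery_id, probe_id, in_gallery):
    if top_score < THRESHOLD:
        return "no match"                # correct outcome for out-of-gallery probes
    if in_gallery and top_gallery_id == probe_id:
        return "true identification"
    return "false identification"        # counted against the TIR

print(identify(0.55, "g123", "p9", in_gallery=False))  # no match
print(identify(0.91, "p9", "p9", in_gallery=True))     # true identification
print(identify(0.86, "g042", "p9", in_gallery=False))  # false identification
```

In other words, matching someone who has no gallery image is scored exactly like matching the wrong person: both come off the true identification rate.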
So we did that consistently across all the different acquisition 377 00:32:37,550 --> 00:32:42,570 systems, and that would have hurt your true identification rate if you were matching the wrong 378 00:32:42,570 --> 00:32:47,830 person. I think when we do the matching system report here in about a month 379 00:32:47,830 --> 00:32:53,020 there are going to be a lot more interesting things in terms of false match rates and so on, 380 00:32:53,020 --> 00:32:58,090 and we will report that out separately. [Arun] Okay, next question. "Do you know what contributed 381 00:32:58,090 --> 00:33:03,390 to the relatively poor performance of the iris and fingerprint modalities?" [Yevgeniy] So I'll take 382 00:33:03,390 --> 00:33:08,410 this one. I think it was a combination of factors. Part of 383 00:33:08,410 --> 00:33:13,530 it was that these systems had more challenges instructing 384 00:33:13,530 --> 00:33:18,550 folks what to do within the rally; we did not 385 00:33:18,550 --> 00:33:23,810 train our users to use each particular biometric system. 386 00:33:23,810 --> 00:33:29,040 We informed the participants, the users, 387 00:33:29,040 --> 00:33:34,150 you know, that they were going to have a biometric 388 00:33:34,150 --> 00:33:39,560 acquired, but it was up to the companies to provide the user interface and instruction to each individual. 389 00:33:39,560 --> 00:33:44,570 So in some instances the iris and fingerprint systems were more 390 00:33:44,570 --> 00:33:49,750 challenging for people to figure out how to use, but 391 00:33:49,750 --> 00:33:54,810 reliability also played a role. Some of the systems were 392 00:33:54,810 --> 00:33:59,830 functioning really well, but experienced breakdowns during our five days of testing.
393 00:33:59,830 --> 00:34:05,070 And that unfortunately penalized them on true identification 394 00:34:05,070 --> 00:34:10,180 rate metrics in the end, so that even if they were working very well when they were on, 395 00:34:10,180 --> 00:34:15,600 if the system failed during the middle of testing, we kept that information in. 396 00:34:15,600 --> 00:34:20,930 [Arun] The other thing I would add, too, is in this particular 397 00:34:20,930 --> 00:34:26,190 use case, as Yevgeniy mentioned, it was unstaffed. So there was nobody there to specifically help 398 00:34:26,190 --> 00:34:31,310 a user in case something went wrong. This was really to kind of help push the idea that the 399 00:34:31,310 --> 00:34:36,710 solution providers need to accommodate that and provide really good user instruction. 400 00:34:36,710 --> 00:34:41,720 The other thing, too, is this was time constrained. So the question was "Could these biometrics be collected within 401 00:34:41,720 --> 00:34:46,870 that five second time frame or within that twenty second time frame?", and if instruction is ambiguous, 402 00:34:46,870 --> 00:34:51,930 people might have difficulty complying with the instructions. 403 00:34:51,930 --> 00:34:57,220 Next question: "What is a Delta airport operator 404 00:34:57,220 --> 00:35:02,460 expecting?" I think this question is, "What are the learnings we should take into 405 00:35:02,460 --> 00:35:07,600 the live production environment?", and I think there are a couple of things that I would highlight, and I'll invite 406 00:35:07,600 --> 00:35:12,620 the folks here at this table to add to it. One is the 407 00:35:12,620 --> 00:35:17,900 collection process matters. This is the instructions to the users. This is the 408 00:35:17,900 --> 00:35:23,070 usability and affordance of the technology, of the camera 409 00:35:23,070 --> 00:35:28,150 systems. Also, not all cameras work equally 410 00:35:28,150 --> 00:35:33,490 well.
In many cases you think it's just a face camera, just a webcam. Well, 411 00:35:33,490 --> 00:35:38,730 there are some things going on in these cameras that make some better at this task than others, and they 412 00:35:38,730 --> 00:35:43,910 collect images that work more broadly across different demographic groups as well. 413 00:35:43,910 --> 00:35:48,970 So it's not just a matter of buying a camera, or the lowest cost camera; it's about 414 00:35:48,970 --> 00:35:54,270 selecting a good camera and a good process for your particular operation and application. 415 00:35:54,270 --> 00:35:59,490 Anything else to add? [Yevgeniy] Yeah, I would add to that, saying that, you know, the 416 00:35:59,490 --> 00:36:04,510 way that the camera interacts with the traveler really sets the bar for 417 00:36:04,510 --> 00:36:09,530 your ultimate biometric performance, because failure to acquire in our 2019 418 00:36:09,530 --> 00:36:14,780 rally, as well as our 2018 rally and in testing that we've done in the MdTF 419 00:36:14,780 --> 00:36:19,970 in years prior, was always the largest 420 00:36:19,970 --> 00:36:25,040 single category of error in biometric systems. So for roughly 421 00:36:25,040 --> 00:36:30,340 60% of biometric systems, failure to acquire is going to 422 00:36:30,340 --> 00:36:35,570 outpace failure to match as a source of error. So camera 423 00:36:35,570 --> 00:36:40,740 selection is really important, to make sure that the user interaction is done right so that 424 00:36:40,740 --> 00:36:45,790 the system actually acquires those images in a timely fashion. [Jake] To add 425 00:36:45,790 --> 00:36:51,060 to that, reliability is also something that should be considered.
A number of 426 00:36:51,060 --> 00:36:56,280 these systems were actually doing pretty well as the start of the evaluation occurred, and 427 00:36:56,280 --> 00:37:01,380 then one visit their system went down for any number of reasons, 428 00:37:01,380 --> 00:37:06,710 and they really suffered from that; their numbers did go down from there. 429 00:37:06,710 --> 00:37:11,990 So reliability is something that should be considered as well in the live production environment. 430 00:37:11,990 --> 00:37:17,190 [John] Maybe just to round it all off, as a teaser for the next webinar as well: 431 00:37:17,190 --> 00:37:22,300 I think what Yevgeniy mentioned is true, the largest source of error is this failure to acquire portion. 432 00:37:22,300 --> 00:37:27,620 But also the greatest variability comes from what camera you pick. So there is a lot of emphasis on how 433 00:37:27,620 --> 00:37:32,910 you select algorithms, and that's important, but the best and 434 00:37:32,910 --> 00:37:38,100 the worst algorithms actually don't perform that differently. The best and the worst cameras perform very, 435 00:37:38,100 --> 00:37:43,190 very differently. So that camera selection is crucial, and then I think the original question 436 00:37:43,190 --> 00:37:48,520 had something about the airport operator in there. The takeaway for you is that that's a very, very 437 00:37:48,520 --> 00:37:53,800 important part of this process and is the largest source of error, as Yevgeniy mentioned. 438 00:37:53,800 --> 00:37:59,000 [Arun] Okay, next question. "Does the fact that the majority of the submissions related to the 439 00:37:59,000 --> 00:38:04,080 face modality reflect a choice on your side, or rather is it indicative of vendors' 440 00:38:04,080 --> 00:38:09,390 current preference in working on face?" So I would kind of 441 00:38:09,390 --> 00:38:14,630 qualify my response here. So, 442 00:38:14,630 --> 00:38:19,780 I can't definitively say why we received more applications for face.
443 00:38:19,780 --> 00:38:24,790 I would just go ahead and say that one of the things we communicated was that 444 00:38:24,790 --> 00:38:30,050 we have these different performance objectives with this specific use case. 445 00:38:30,050 --> 00:38:35,250 We opened it up to more modalities, hoping to get strong 446 00:38:35,250 --> 00:38:40,350 applications. I think what we've seen so far is that people are pretty good at making 447 00:38:40,350 --> 00:38:45,710 the face biometric modality work fairly well in this particular use case. 448 00:38:45,710 --> 00:38:50,960 Now, could other biometric modalities like iris and fingerprint work 449 00:38:50,960 --> 00:38:56,140 equally well? I think so. I think it 450 00:38:56,140 --> 00:39:01,230 would take some funding and some resourcing from the companies to make the technologies work better. 451 00:39:01,230 --> 00:39:06,530 So it's not so much that it is our preference; this is what we're seeing from industry when we opened 452 00:39:06,530 --> 00:39:11,750 up the competition to include face, finger or iris for this particular use case. 453 00:39:11,750 --> 00:39:16,920 Next question. "Did the aliases carry over from last year too? 454 00:39:16,920 --> 00:39:21,990 Same vendor name, same alias?" No. So we issued new aliases this 455 00:39:21,990 --> 00:39:27,300 year, so if a company participated in a rally over the last two 456 00:39:27,300 --> 00:39:32,500 years, and several did, they would have a different alias for each particular test or each particular 457 00:39:32,500 --> 00:39:37,630 system. Next question. "Is the size 458 00:39:37,630 --> 00:39:42,660 of the other race pool small enough that it is not statistically 459 00:39:42,660 --> 00:39:47,910 significant? Wondering if there is a plan to further break down races into additional categories 460 00:39:47,910 --> 00:39:53,100 or subcategories beyond African American or Caucasian?"
[Yevgeniy] So 461 00:39:53,100 --> 00:39:58,200 I think that what we've done here so far is to 462 00:39:58,200 --> 00:40:03,540 sort of block on these categories so that the technologies are actually tested on 463 00:40:03,540 --> 00:40:08,790 a diverse population. In some of our published work, we've 464 00:40:08,790 --> 00:40:13,950 been trying to figure out how to best relate phenotypes 465 00:40:13,950 --> 00:40:19,020 as well as race categories to biometric performance, and 466 00:40:19,020 --> 00:40:24,300 that's something that we are going to continue doing. In this rally we've certainly been 467 00:40:24,300 --> 00:40:29,530 well representing folks who identify as Black or African American and folks who identify 468 00:40:29,530 --> 00:40:34,650 as Caucasian, and we have been working to increase the participation of other groups. However, 469 00:40:34,650 --> 00:40:39,690 at this point we are not breaking down those results per 470 00:40:39,690 --> 00:40:44,700 group; in further scientific work we may report 471 00:40:44,700 --> 00:40:49,850 more on, for example, how match scores relate to 472 00:40:49,850 --> 00:40:54,920 demographics, including race and phenotypes such as 473 00:40:54,920 --> 00:41:00,240 skin reflectance. [Arun] So I guess to summarize, I think the short answer is yes, we 474 00:41:00,240 --> 00:41:05,460 do intend to try to increase the diversity of our test volunteer population. 475 00:41:05,460 --> 00:41:10,620 Part of it is, you know, the test will be dictated by who 476 00:41:10,620 --> 00:41:15,650 actually comes in and participates, but there is a lot more going on than just 477 00:41:15,650 --> 00:41:20,920 self-reported race categories. Next question. "Were there 478 00:41:20,920 --> 00:41:25,920 test volunteers that also participated in last year's rally, or was everyone new to this test?" 479 00:41:25,920 --> 00:41:30,990 John, you want to take that one?
[John] Yeah, there was almost certainly overlap, although I don't think we have a 480 00:41:30,990 --> 00:41:36,300 good number. Our test population does come back time after time. We also 481 00:41:36,300 --> 00:41:41,510 bring in new people; as I mentioned, we had about 75 people who had never been to the Maryland Test Facility before. 482 00:41:41,510 --> 00:41:46,650 So some of the people were repeats over multiple years 483 00:41:46,650 --> 00:41:51,690 and then some people were brand new to the test. [Arun] The 484 00:41:51,690 --> 00:41:57,030 systems had to recognize, not necessarily that those people were identified; they had to say, 485 00:41:57,030 --> 00:42:02,220 hey, we've never seen this person before. [John] Right, if we didn't have any historic records for them, 486 00:42:02,220 --> 00:42:07,320 it was the responsibility of the systems to provide a sample that didn't match anyone. 487 00:42:07,320 --> 00:42:12,690 And right, there was likely some overlap from last year's rally, and there was certainly 488 00:42:12,690 --> 00:42:17,970 some overlap from testing we have done in the past. And I think maybe the last point on this one is, 489 00:42:17,970 --> 00:42:23,140 you know, we do this day in and day out. We think about these systems a lot. 490 00:42:23,140 --> 00:42:28,230 It's probably unlikely that people who participated in last year's rally, even if they were back, 491 00:42:28,230 --> 00:42:33,560 took away with them some memory of exactly what happened and 492 00:42:33,560 --> 00:42:38,810 then behaved much differently. It was kind of a long time ago. 493 00:42:38,810 --> 00:42:43,970 [Arun] Okay, next question. "On average, how many users were not served as 494 00:42:43,970 --> 00:42:49,050 a result of system failures?" So I think that's failure 495 00:42:49,050 --> 00:42:54,380 to acquire rate.
[Yevgeniy] I think in the end it's going to be the true identification rate, because if the 496 00:42:54,380 --> 00:42:59,610 system made any kind of error it would be reflected there. And what I would note here is 497 00:42:59,610 --> 00:43:04,750 that it varied quite a bit across the different acquisition 498 00:43:04,750 --> 00:43:09,790 systems. Go back to that statement that the acquisition system and camera really do matter. I'll point out that 499 00:43:09,790 --> 00:43:15,060 system Teton, which was the best performing system within the rally, only 500 00:43:15,060 --> 00:43:20,250 failed, only didn't serve, two out of the 430 people who 501 00:43:20,250 --> 00:43:25,280 attempted to use it, and that was due to some issues with 502 00:43:25,280 --> 00:43:30,590 matching, not an issue on acquisition, so that could improve when we look at 503 00:43:30,590 --> 00:43:35,840 other algorithms. So on average you can take a look at the true identification rate across 504 00:43:35,840 --> 00:43:40,850 systems. It varied from somewhat low, 505 00:43:40,850 --> 00:43:45,910 about two thirds for the worst performing systems, to, really, many of them were 506 00:43:45,910 --> 00:43:50,910 well within 95% and above. So I think that it is 507 00:43:50,910 --> 00:43:56,120 really dependent on the camera that you are talking about. [Arun] So just to put it another way, 508 00:43:56,120 --> 00:44:01,230 with the slides we talked about the measure 509 00:44:01,230 --> 00:44:06,540 MTIR, which is our MdTF true identification rate, so 510 00:44:06,540 --> 00:44:11,790 the rate that you're talking about. The people who were not correctly processed, if you're doing the math, 511 00:44:11,790 --> 00:44:16,960 is one minus that number. So it would vary anywhere from like .5% 512 00:44:16,960 --> 00:44:22,030 up to, what was the other one? [Yevgeniy] I would say about 33% or so.
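The arithmetic behind those endpoints is simple. A minimal sketch, using only the figures mentioned in this exchange (Teton serving all but 2 of 430 people, and the worst systems at roughly two thirds):

```python
# The share of people a system failed to serve is one minus its
# true identification rate (TIR), per the discussion above.
def not_served_rate(tir):
    return 1.0 - tir

# Station Teton correctly served all but 2 of 430 people.
teton_tir = 1 - 2 / 430
print(f"{not_served_rate(teton_tir):.1%}")  # 0.5%

# The worst performing systems had a TIR of about two thirds.
print(f"{not_served_rate(2 / 3):.1%}")      # 33.3%
```

That is where the roughly 0.5% to 33% range comes from.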
[Arun] Yeah, so that 513 00:44:22,030 --> 00:44:27,330 dominates all your issues with matching algorithms. This is really about the importance of the camera and the 514 00:44:27,330 --> 00:44:32,340 collection process. Next question. "Were continuous motion cameras tested, 515 00:44:32,340 --> 00:44:37,460 or were they all pause and go, where the volunteer needed to stop?" 516 00:44:37,460 --> 00:44:42,460 So we didn't actually dictate pause and go versus continuous motion. 517 00:44:42,460 --> 00:44:47,730 Many of the systems were actually continuous motion, essentially. 518 00:44:47,730 --> 00:44:52,890 You'll notice that the fastest system was about 2.7 seconds 519 00:44:52,890 --> 00:44:57,980 on average, and that's basically not only to have the biometric processed but to traverse 520 00:44:57,980 --> 00:45:03,310 eight feet. So in the span of eight feet they 521 00:45:03,310 --> 00:45:08,550 completed the biometrics, and the amount of time it took them to go that distance was about 2.3 seconds. 522 00:45:08,550 --> 00:45:13,710 They basically didn't stop. So that's what you'll see with many of the very fast, 523 00:45:13,710 --> 00:45:18,750 efficient systems. Next question. 524 00:45:18,750 --> 00:45:24,010 "Were there notable differences in biometric performance between the demographic groups?" 525 00:45:24,010 --> 00:45:29,030 I think right now it's fair to say that for this particular test we haven't completed that portion of the 526 00:45:29,030 --> 00:45:34,100 analysis. So right now we're still analyzing the data. We are 527 00:45:34,100 --> 00:45:39,430 going to have high level results on the matching performance of the different systems and different 528 00:45:39,430 --> 00:45:44,440 algorithms in a couple of weeks, and we plan to have that initial discussion on August 13th.
529 00:45:44,440 --> 00:45:49,460 However, the analysis across different demographic groups, or of systems that may have performed 530 00:45:49,460 --> 00:45:54,470 differently for different groups, will probably take a little bit more time to dig through. We have published 531 00:45:54,470 --> 00:45:59,730 some papers on that recently. Do you guys want to mention the papers that were just recently published or 532 00:45:59,730 --> 00:46:04,900 will be published soon? [John] I think the exact answer to that question depends on your definition of performance. 533 00:46:04,900 --> 00:46:10,190 We have looked at performance in terms of mated match scores across different race categories 534 00:46:10,190 --> 00:46:15,500 and different skin reflectance values, how light or dark your skin is. That paper is published in a journal called 535 00:46:15,500 --> 00:46:20,780 [unintelligible] last February; if you're interested, I'm sure Arun can provide 536 00:46:20,780 --> 00:46:25,790 a copy if you send an email. [Arun] If you send an email to peoplescreening@hq.dhs.gov 537 00:46:25,790 --> 00:46:30,850 we can send you the references. [Yevgeniy] One thing I would add to that is that while in the 538 00:46:30,850 --> 00:46:36,130 scientific work we look at things like match scores, when it 539 00:46:36,130 --> 00:46:41,330 comes to the previous question regarding, you know, how many people these technologies did 540 00:46:41,330 --> 00:46:46,440 not serve, I will again point out that some of the systems performed extremely well, 541 00:46:46,440 --> 00:46:51,450 where they only failed on two individuals out of the 430.
542 00:46:51,450 --> 00:46:56,700 So overall you can find systems that perform extremely well across that full group of 543 00:46:56,700 --> 00:47:01,850 diverse people, and remember our demographic breakdown: roughly 60% 544 00:47:01,850 --> 00:47:06,920 African American, 30% Caucasian and then the rest, 545 00:47:06,920 --> 00:47:12,190 so you can see that the systems worked very well 546 00:47:12,190 --> 00:47:17,410 overall. [Arun] Okay, next question. "Where might I find 547 00:47:17,410 --> 00:47:22,550 published recommended camera standards?" This is a great question and honestly 548 00:47:22,550 --> 00:47:27,580 I don't have a really clear answer for you. [John] There is an ICAO standard. [Arun] Right, that's the one 549 00:47:27,580 --> 00:47:32,850 I would point you to, but that doesn't apply to all use cases. But go ahead, talk about the ICAO standard. 550 00:47:32,850 --> 00:47:38,030 [John] It just defines the international standard for what goes into a passport photo, which is generally considered to be 551 00:47:38,030 --> 00:47:43,330 a good, well-framed, well-lit, high-quality standard. It sort of tells you the distances and the lighting 552 00:47:43,330 --> 00:47:48,700 conditions and gradients you need to hit that particular standard. I will say, 553 00:47:48,700 --> 00:47:54,040 we don't have quantitative numbers for this, but a lot of the images coming off 554 00:47:54,040 --> 00:47:59,250 the acquisition systems for the rally probably didn't meet that ICAO standard, but they 555 00:47:59,250 --> 00:48:04,370 biometrically matched just fine. So like Arun mentioned, it depends exactly on your use case 556 00:48:04,370 --> 00:48:09,710 as to what you're going to need in terms of camera standards. [Jake] Beyond the ICAO standards, 557 00:48:09,710 --> 00:48:14,990 there are also groups like ISO, NIST and [unintelligible].
These standards relate to 558 00:48:14,990 --> 00:48:20,170 the data, the image that comes off these cameras, not necessarily to the camera itself. So that's one thing to 559 00:48:20,170 --> 00:48:25,270 keep in mind. They're not recommending certain pieces of hardware; they are more recommending what 560 00:48:25,270 --> 00:48:30,580 an image should look like and what the result of that capture should be. [Arun] And the ICAO 561 00:48:30,580 --> 00:48:35,850 standard does talk about the process, but again, it's like a passport quality photo collection process, 562 00:48:35,850 --> 00:48:41,050 which doesn't necessarily fit well in most operational settings, where you can't get a really nice 563 00:48:41,050 --> 00:48:46,140 background or perfect lighting that's adjusted for the person and perfectly positioned. 564 00:48:46,140 --> 00:48:51,140 So it is an area where there is some work that's needed on how to better 565 00:48:51,140 --> 00:48:56,350 specify this or how to simplify procurement. But this is one of the things where, 566 00:48:56,350 --> 00:49:01,470 if this use case for the rally is useful to you or relevant to 567 00:49:01,470 --> 00:49:06,480 you, you can reach out to us at peoplescreening@hq.dhs.gov 568 00:49:06,480 --> 00:49:11,720 and inquire about certain performers or 569 00:49:11,720 --> 00:49:16,900 certain companies if you think they might be a good fit for your organization, if you have a five second time 570 00:49:16,900 --> 00:49:22,200 requirement, a twenty second time requirement, whatever that might be. But as far as 571 00:49:22,200 --> 00:49:27,500 objective performance measures or standards around this, it hasn't really been defined.
572 00:49:27,500 --> 00:49:32,770 And one of the reasons there, too, is the technologies have been changing so quickly 573 00:49:32,770 --> 00:49:37,950 that there really hasn't been a lot of harmonization on how well 574 00:49:37,950 --> 00:49:43,030 these things work. As John mentioned, some of the images that we collect from the rally right now are good 575 00:49:43,030 --> 00:49:48,330 enough, really good enough to match. If we go back five years, 576 00:49:48,330 --> 00:49:53,570 they probably wouldn't have matched with the technologies available then. 577 00:49:53,570 --> 00:49:58,710 Next question. "What I meant to ask is how many people 578 00:49:58,710 --> 00:50:03,760 were not able to be served as a result of downtime due to 579 00:50:03,760 --> 00:50:09,040 system failure? Speaking toward the practical applicability of expectations 580 00:50:09,040 --> 00:50:14,820 for manually performing an ID verification." So I think Yevgeniy's answer 581 00:50:14,820 --> 00:50:19,980 still applies, right. Anytime your system didn't match someone or didn't produce that true positive, 582 00:50:19,980 --> 00:50:25,040 you're going to have to have somebody there. So in most cases, because your error rates are 583 00:50:25,040 --> 00:50:30,400 non-zero for your false negative rate or your false negative 584 00:50:30,400 --> 00:50:35,640 identification rate, you're going to need somebody there who can perform that function. 585 00:50:35,640 --> 00:50:40,760 The higher that error rate is, the more people you might need to serve that function. 586 00:50:40,760 --> 00:50:45,810 But it is worth mentioning that we did see pretty high issues 587 00:50:45,810 --> 00:50:51,060 around reliability.
Is there anything to quantify here that we can kind of mention? 588 00:50:51,060 --> 00:50:56,260 It's difficult to say what percentage of people 589 00:50:56,260 --> 00:51:01,350 in an operational environment would not be served due to a system failure, because we are still 590 00:51:01,350 --> 00:51:06,680 trying to provide a fair test environment. So we completely 591 00:51:06,680 --> 00:51:11,900 separated vendors from any test volunteers. So if a system did go down 592 00:51:11,900 --> 00:51:17,090 in the middle of a test session, we would have to wait until the end of the test session for that vendor to go out 593 00:51:17,090 --> 00:51:22,180 and repair the system. So it may not be completely 594 00:51:22,180 --> 00:51:27,460 directly comparable, but there were a number of occasions where a system would randomly go 595 00:51:27,460 --> 00:51:32,760 out during the test session and a number of people would not be served. 596 00:51:32,760 --> 00:51:37,940 [Yevgeniy] Yeah, we had three groups of about 15 people 597 00:51:37,940 --> 00:51:43,020 working at the same time. Three? Three, yep, and 598 00:51:43,020 --> 00:51:48,590 so if a system were to be down for one set of three groups, 599 00:51:48,590 --> 00:51:53,840 then it would be 45 people that it wouldn't serve. [John] Right, we do have a 600 00:51:53,840 --> 00:51:59,000 chart, although we didn't present it in this slide deck, that looks at groups that were outliers. 601 00:51:59,000 --> 00:52:04,100 Right, so if the system is broken and it didn't get any of the 45 people, as Yevgeniy mentioned, that 602 00:52:04,100 --> 00:52:09,410 probably stands out as an outlier session. We could probably 603 00:52:09,410 --> 00:52:14,630 provide that if someone is really interested, and I will say there were systems that didn't have any technical issues 604 00:52:14,630 --> 00:52:19,660 and never went down.
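[Editor's illustration] The arithmetic the panel walks through, 45 people unserved when a system misses one set of three 15-person groups, and manual-check staffing driven by a non-zero false negative identification rate, can be sketched as follows. The function names and the example 2% error rate are assumptions for illustration, not rally results.

```python
# Back-of-the-envelope sketch of the figures discussed above.

def unserved_during_outage(groups, group_size):
    """People not served if a system is down for one full set of groups."""
    return groups * group_size

def expected_manual_referrals(throughput, false_negative_rate):
    """Expected number of travelers needing a manual ID check, given a
    non-zero false negative identification rate."""
    return throughput * false_negative_rate

# Three concurrent groups of about 15, as Yevgeniy describes:
print(unserved_during_outage(3, 15))                 # 45
# At an assumed 2% FNIR, 1,000 travelers imply ~20 manual checks:
print(round(expected_manual_referrals(1000, 0.02)))  # 20
```

The second function captures the panel's point directly: the higher the error rate, the more staff you need standing by to perform manual verification.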
[Arun] So if you see very low failure to acquire rates and 605 00:52:19,660 --> 00:52:24,700 a very high true identification rate, that's a good indication 606 00:52:24,700 --> 00:52:29,950 that it was able to collect and work fairly reliably. It didn't lose large groups of people because the system went 607 00:52:29,950 --> 00:52:35,120 down. Okay. 608 00:52:35,120 --> 00:52:40,220 And I think those are all of our questions so far. I'll just go ahead and say thanks again 609 00:52:40,220 --> 00:52:45,520 for participating. Please email us if you have any 610 00:52:45,520 --> 00:52:50,770 follow-up questions. If you're interested in learning more about the different algorithms and how well they 611 00:52:50,770 --> 00:52:55,940 performed, please come back, and we'll send out another email about that; that webinar will be on August 13th. 612 00:52:55,940 --> 00:53:01,000 And then again, the results as well as this video will be 613 00:53:01,000 --> 00:53:06,260 shared in the near future. The charts from this slide deck 614 00:53:06,260 --> 00:53:11,450 will be available on the website MdTF.org, and this webinar video we actually plan to share 615 00:53:11,450 --> 00:53:16,570 as well. It will probably be about three weeks, because we have to insert closed captioning, 616 00:53:16,570 --> 00:53:21,590 and that will happen in the near future here as well. Thank you very much and we appreciate your 617 00:53:21,590 --> 00:53:23,520 time today. [Jake waves goodbye]