[Arun] Hello, my name is Arun Vemury and I'm with the DHS Science and Technology Directorate. Thanks for joining our webinar today to discuss the results of the 2019 Biometric Technology Rally. This webinar in particular covers the results for matching systems. As you may recall, the rally tested both collection (acquisition) systems and matching systems, and this webinar is specifically focused on the matching system results. Today I'm joined by my colleagues Yevgeniy Sirotin, Jerry Tipton, and John Howard. Let's go ahead and jump into the slides; we have a lot of information to share with you, and I'd like to make sure we provide as much of it as possible.

Very briefly, today's webinar will start with some background and history: What is the S&T Biometric and Identity Technology Center? What are the biometric technology rallies? What were we doing with the 2019 rally, and how were we measuring performance? Then we'll cover specific results for the different biometric modalities, collection systems, and matching systems, including fingerprint, iris, and face recognition, and we'll wrap up with some summary conclusions and observations.

The S&T Biometric and Identity Technology Center is a group of subject matter experts within DHS Science and Technology. Its goal is to shorten the learning curve for DHS and its mission partners: to help them become familiar with new and emerging biometric and identity capabilities, understand how those capabilities may apply to various missions, and figure out whether the technologies are useful or applicable for their specific needs. The goal truly is to facilitate information sharing and accelerate the use of these technologies where appropriate. We follow a "build once, share widely" approach, and we aim to drive efficiency by making it as easy as possible for groups to adopt these technologies quickly and do it well. Right now a lot of our focus is on test and evaluation and on identifying technologies that may be very promising for different needs, and today we will focus on the Biometric Technology Rally.
The Biometric Technology Rally is an engagement between S&T and industry primarily focused on identifying how well specific technology offerings or solutions may support important DHS use cases. In this case, a lot of our focus has been on the need for high-throughput biometrics: the ability to recognize people very quickly, with minimal staffing, in a very small space, in a small amount of time. The goal is to support scenarios like border crossings, transportation security, and other security venues where we need to quickly recognize known individuals without a significant staffing requirement. We're really focused on recognizing hundreds, thousands, or even tens of thousands of people very quickly. With the rally, we define these use cases, define specific measures or metrics that we want to orient industry toward, and set goals for industry to meet so that their solutions will hopefully be applicable to different DHS users' needs.

The 2019 rally again focused on this high-throughput use case, but it was a little different from the rally we ran last year. Last year, every system had to provide at minimum at least one face image. This year we widened the scope so we could also receive applications from companies focused on high-throughput fingerprint systems or high-throughput iris systems. Multi-modal systems, combining some mix of finger, face, or iris, were of course also open to participate. We also opened a new category for participation: it wasn't just for collection or acquisition systems. We also encouraged companies to submit matching algorithms so that we could evaluate both the collection side of the capability and how well the images that were collected could then be matched by different algorithms. One of the things we are trying to do here is help stakeholders understand that there is a breadth of capabilities out there, both in collection systems and in matching algorithms, by looking at how well a collection system works across matching algorithms, or how well a matching algorithm can take in images from many different types of collection systems.
That gets us to this concept of a robust system. Depending on your specific use case or need, robustness may or may not be important. But imagine a system where one major matching algorithm is used by a lot of different users, and every user has a different camera. You want to make sure your results will hold up even though the images your system is matched against may come from different cameras, so you want the system to interoperate well and the matching to be robust to those different collections. In this webinar we'll spend quite a bit of time talking about this idea of robustness of algorithms to collection systems, as well as of collection systems to algorithms.

[Jerry] Before we get into the details of the results, we want to give a brief description of what the rally test was. Acquisition systems had to operate in an unmanned mode within a physical footprint of 6 by 8 feet. They could collect face, iris, fingerprints, or a combination of the three, and they needed to send back at least one biometric image per volunteer within time constraints defined by DHS. Optionally, they could send back up to three face images, or three pairs of iris images, or three sets of fingerprints with up to five fingerprints each. Matching systems had to provide their algorithm, their matching service, as a Docker container that we could load onto our Maryland Test Facility systems. It had to contain a commercially available matching algorithm, it had a size limit of 1.5 gigabytes, and it had to process templates in less than 1,000 milliseconds. Along with that, it needed to operate without contacting anything external, that is, without phoning home, for a period of one year, and it needed to operate on a limited set of resources: we offered four CPUs and two gigabytes of RAM, and the algorithm could not require a GPU. In addition, we had an open API document that we made available so vendors could test against it in advance of sending us their algorithm, which they did, and a working example of all of these specs was, and still is, hosted on our GitHub, linked from the MdTF.org website.
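To give a sense of what the per-template timing budget means in practice, here is a minimal, purely illustrative Python sketch. It is not the rally's actual interface (that is the open API and example on the MdTF GitHub); the function names and the stand-in matcher are placeholders.

```python
import time
from typing import Callable

def check_template_budget(make_template: Callable[[bytes], bytes],
                          image: bytes, budget_ms: float = 1000.0) -> bytes:
    """Time a single template-creation call against the rally's 1,000 ms budget."""
    start = time.monotonic()
    template = make_template(image)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    print(f"template creation took {elapsed_ms:.1f} ms (budget {budget_ms} ms)")
    if elapsed_ms >= budget_ms:
        raise RuntimeError("over the per-template time budget")
    return template

if __name__ == "__main__":
    # Stand-in for a real vendor SDK call -- placeholder only, not a real matcher.
    fake_make_template = lambda img: img[:64]
    check_template_budget(fake_make_template, b"\x00" * 1_000_000)
```

A real submission would also have to stay within the container limits described above (four CPUs, two gigabytes of RAM, no GPU) while passing a check like this.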
For the rally, we announced the call for participation in November 2018 and accepted applications through the end of that month. We provided conditional acceptances in early February, and we also provided a cloud-based API in that February time frame so vendors could start working with it. Once they completed a set of requirements, we provided final acceptance notifications in March. In May we held a stakeholder VIP day, which many of you attended, and we started and completed the data collection in early May. Since that time the team has been performing the analysis of that data set.

For matching systems, we received 22 applications and accepted 15. The evaluators who reviewed the applications came from the Department of State, IARPA, DOJ, NIST, and DHS S&T. The 15 matching systems included eight face algorithms, four iris algorithms, and three fingerprint algorithms. The results you'll hear today are based on the matching systems, but there's also an interaction with the acquisition system devices; that's very important and you'll hear about it as well. With that, I'll turn it over to John Howard to talk about the acquisition process.

[John] Thanks, Jerry. As Jerry mentioned, we know this is the matching system webinar, but we think it's important that everyone understand how we collected the images we used to evaluate the matching systems, so we'll go through the acquisition test process really quickly. What you see on slide 10 is broadly what the acquisition systems were designed to do. The dark gray box in the left-hand image is that 6-by-8-foot space Jerry mentioned. Acquisition system providers came to the Maryland Test Facility in early May; they had two days and could put whatever they wanted inside that space in terms of sensors, cameras, and instructions, as long as it was safe for our participants. Then we brought about 400 people through and queued them through the process you see on the left-hand side; the right-hand side is a picture of what that looked like. Basically, groups of about 15 would queue up in front of the system. They would be scanned one at a time into the broader test station. At some point they would approach the acquisition system, crossing a beam break. They would do whatever they were instructed to do with the system, and the system would send us images, either faces, fingers, or irises,
over a common acquisition system API. The test volunteer would then leave the station, tripping another beam break, and would rate their experience at a satisfaction kiosk. The ground truth identification information, which comes off the wristband scan there at step two, and the images that were sent via the API during this interaction are what we used as our probes for evaluating the rally matching systems.

The last thing I'll mention from a test process standpoint is that before the test groups entered the bay, they were given general instructions that each of these acquisition stations was going to collect biometric images for the purpose of performing an identification. They weren't trained specifically on how to use any of the systems, so they were essentially naive users approaching each acquisition system; they didn't necessarily know where to look, how to interact with the system, and so on. Prior to all of this, we also collected what you could call same-day ground truth images: we had a manned enrollment station where a trained biometric collector took a really good picture of each subject's face, a really good picture of each subject's irises, and a really good set of each subject's fingerprints, so we knew what each of those biometric samples looked like prior to the interaction with all of these different acquisition systems.

A couple of other considerations we took into account to make sure the test was fair and broadly applicable: our sample size, which we'll talk a little more about on the next slide, was over 400 people. That's not a gigantic sample size, it's not NIST-level testing with millions of images, but it is pretty good; it lets us report results with roughly plus or minus half a percent precision. Our demographics are very diverse: all sorts of ages, genders, and races, plus people with prior experience working with biometric systems at the MdTF and people who had never been to the MdTF before. And then there's counterbalancing: the order in which every group interacted with the acquisition systems was controlled and randomized so that no particular acquisition system was always seen first and none was always seen last.
This was to wash out learning effects: if volunteers learned how to interact with a biometric system on system A and then got better by the time they reached system B, the next group would see system B before system A. So we hope there was not any habituation effect.

Here's what the actual test population that participated in the 2019 Biometric Technology Rally looked like. There were 430 test volunteers, and every one of them used every acquisition system. In the upper left you can see they were almost evenly split, 50/50 male and female. Our age distribution is in the upper middle plot; you can see a pretty good spread from 18 to 81, with maybe a slight skew toward the 20-to-30-year-old demographic. Our self-reported race breakdown is in the upper right chart: about 45% Black or African-American, 35% White or Caucasian, and about 20% of people who identified as something other than those two. We also collected a few other things, like height and weight.

We presented this next chart back in the original webinar that Jerry mentioned in November. It was always out there describing how we were going to evaluate these matching systems, and it really came down to two things: their ability to template the biometric images that came from these diverse acquisition systems, that is, to work with the images, and then their ability to match those images. And when I say match, I mean correctly identify. So not a one-to-one match, but a one-to-N match, N being our gallery size, and that makes sense for this unattended high-throughput use case, right? If you're trying to build a biometric system that takes less than 10 seconds, you're not going to have a step where people stop to present an ID or a ticket. So it has to be a one-to-N evaluation, not a one-to-one evaluation. The galleries we used for that N are images we've collected over the last five years at the MdTF: there's a gallery for faces, a gallery for fingers, and a gallery for irises. Different images in each gallery, but about the same size for each one; each had about 500 subjects. The figure down in the lower right shows what that looked like: if you were face matcher B, for example, we gave you this gallery, the purple box, and we also sent you every probe image that was collected by each acquisition system that collected face.
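To make the one-to-N setup concrete, here is a minimal sketch of that kind of identification decision. It is illustrative only, not the rally's matcher interface: the function names, template type, and similarity score are all placeholders.

```python
from typing import Callable, Dict, Optional

def identify(probe_template: bytes,
             gallery: Dict[str, bytes],
             score: Callable[[bytes, bytes], float],
             threshold: float) -> Optional[str]:
    """One-to-N search: return the best-scoring gallery identity if its score
    clears the threshold, otherwise None ('unidentified'), which is the correct
    outcome for a probe whose subject is not in the gallery."""
    best_id: Optional[str] = None
    best_score = float("-inf")
    for subject_id, enrolled_template in gallery.items():
        s = score(probe_template, enrolled_template)
        if s > best_score:
            best_id, best_score = subject_id, s
    return best_id if best_score >= threshold else None
```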
So those all went into your matcher, we got identification results out, and then we looked at those identification results across acquisition systems, and that's the robustness measure we talked about. We wanted to find matching systems that could do well no matter where their images were coming from.

[Yevgeniy] Thanks, John. In operational biometric deployments, matching systems don't work in isolation, and for that reason the 2019 rally focused on evaluating operationally realistic combinations of matching systems and acquisition systems. The key metric we used to evaluate this performance is the true identification rate. We define the true identification rate as the percentage of transactions that result in a correct identification at a set threshold for each matching system. As John mentioned, we're doing identification operations against our gallery: it's not a 1-to-1 verification, it's a 1-to-N identification. This true identification rate was calculated separately for each combination of matching system and acquisition system. This is a key point. If you're used to looking at NIST evaluations, you can think of this approach the way NIST separately evaluates algorithms on classes of images, for example visa images versus mugshot images versus selfie images. We separately evaluated each matching system using the images acquired by each acquisition system. Another key point is that each acquisition system gathered images of the same 430 subjects, which is not the case for the NIST evaluations. In our case, the same 430 people had a shot at getting an image acquired on each acquisition system in the rally, and each matching system had a shot at matching the images acquired on each of those acquisition systems. For each system combination, the true identification rate was evaluated both excluding failures to acquire, to focus on matching system performance in isolation (something we won't focus on much in this brief), and inclusive of failures to acquire, to capture the overall expected operational performance of the system combination. The inclusive number is really the key value, because total system performance is inclusive of all sources of error, both in acquisition and in matching.
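As a rough illustration of the bookkeeping implied by that definition (a sketch, not the rally's actual scoring code; the Transaction fields are hypothetical), each transaction either fails to acquire, identifies the correct subject, or does not:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Transaction:
    true_id: str                  # ground truth from the wristband scan
    acquired: bool                # did the acquisition system return an image?
    returned_id: Optional[str]    # matcher decision at the fixed threshold (None = unidentified)

def true_identification_rate(transactions: List[Transaction],
                             include_fta: bool = True) -> float:
    """Share of transactions ending in a correct identification, in percent.

    include_fta=True  -> total system performance, inclusive of failures to acquire
    include_fta=False -> matcher-only performance on the images that were acquired
    """
    pool = transactions if include_fta else [t for t in transactions if t.acquired]
    if not pool:
        return float("nan")
    correct = sum(1 for t in pool if t.acquired and t.returned_id == t.true_id)
    return 100.0 * correct / len(pool)
```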
So how do we set this threshold? We fixed the threshold for calculating true identification rate at a setting suitable to generate a false match rate of one in a million. That threshold was reported to us separately by each matching system provider; we did not do our own evaluation to verify it, we simply used what the providers told us. The FMR setting used for matching system performance was chosen such that the expected number of false positives observed during the rally is zero. We had 430 volunteers, 76 of whom were not in the rally gallery, and the correct behavior for any matching system for a volunteer who is not in the gallery is to report that volunteer as unidentified, that is, as having no mate in the gallery. At a threshold of one in a million, the expected true negative identification rate should be 100%: every one of those 76 should be correctly classified as out of the gallery, and conversely the expected number of false matches is zero. So that's the threshold we picked. The chart on the right shows a theoretical curve of true negative identification rate as a function of false match rate, and where the different thresholds we asked the vendors to provide would sit; the one we chose, again, is one in a million. If you have trouble seeing the graphs, I recommend maximizing the slide window so you have a better shot at following along, because there are going to be a lot of numbers after this.

Here I want to pause and take a moment to go over how we are going to visualize these true identification rate results. For the 2019 matching system analyses, we focus on true identification rates for each combination, as I said, of acquisition systems, which are the column headers in the little matrix you see on the left, and matching systems, which are the row headers. Each circle in the visualization refers to one system combination, and now we can switch to the blue box and take a look inside. The black number inside the circle presents true identification rate performance inclusive of any failures to acquire by the acquisition system; that's the more operational measure of performance.
There is also going to be a red number within each circle, which will be of more interest if you are interested in the performance of the matching systems in isolation. I'm not going to go through those in a lot of detail today, but they will be there and available for you to look at, and I should note that every chart we present during this webinar will be available on our website, MdTF.org, for you to look at at your leisure, along with tutorials on how to read the charts.

The other metric John mentioned, which we computed for our matching system and acquisition system analyses, is this notion of robustness. As John mentioned, robustness quantifies the variability in matching system performance across the acquisition systems. We quantify that very simply as the range of observed true identification rate values across systems, again at a false match rate of one in a million. If a number of different acquisition systems supply images to your matching system, we're going to get a different true identification rate value for each system combination. Then for that matching system we look at the highest performance and the lowest performance and take the difference, the spread between those two, and that's our measure of robustness. Since robustness measures variability, we're looking for low-variability systems that ideally give you the same high performance across different acquisition systems, so we want that robustness value to be low. Our goal for the 2019 rally was at most a 5% variation in performance across acquisition systems. I should also note that you can turn this robustness metric around and look at how robust an acquisition system is across the different matching algorithms that work with its images; that's another robustness value that we can now report, tied back to the acquisition system.

This slide illustrates that a little better. On the left there are two plots for a system that is not meeting the robustness threshold, and on the right are two plots for a system that meets the robustness threshold, at least when discounting failures to acquire. If you look on the left first, at the plot with the black curve for this system,
you see that the performance, this is true identification rate on the Y-axis, varies quite a bit from acquisition system to acquisition system on the X-axis. Robustness basically quantifies how wide that variation is; you see that bracket, and in this case it's quite high, 61%. You can look at it counting failures to acquire, or at the algorithm performance by itself, discounting failures to acquire: no matter which way you look, the bracket is very wide and performance varies quite a bit across acquisition systems. For the system on the right, performance first of all varies less even counting failures to acquire, the black bracket is only 32% wide, and discounting failures to acquire you can see that the performance is uniformly very good. This system was able to reliably match images no matter which acquisition system the images were acquired on, and that variation is very small, just 1.2%. So this system met the robustness threshold; in fact it met the goal, discounting failures to acquire.

What we're going to show you, for each row, and again each row is associated with a particular matching system, are robustness values in the right margin for that matching system, using this same bracket notation to give you a sense of the variation across acquisition systems. If we look at the chart on the right, which is the same chart as on the previous slide for the system with good robustness, we'll show you two numbers: one for robustness without counting failures to acquire, those are the red numbers, and one for robustness counting failures to acquire, the black numbers. Three numbers are needed to describe the robustness. There's the maximum level of performance, which in both cases here was 100%. There's the minimum level of performance observed across acquisition systems, in this case 98.8% for the red numbers and 67.7% for the black numbers. And then there's the resulting range, which is just the difference between those two numbers: 1.2% for the red and 32.3% for the black. Those numbers will be present in that gray margin on the right.
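As a small illustration of that range calculation, here is a sketch using the example numbers just discussed. The maximum and minimum values come from the chart described above; the middle values and the system names are made up purely to fill out the example.

```python
def robustness(tir_by_acquisition_system: dict) -> float:
    """Robustness = range (max - min) of a matcher's TIR across acquisition systems.
    Lower is better; the 2019 rally goal was a spread of at most 5 percentage points."""
    values = list(tir_by_acquisition_system.values())
    return max(values) - min(values)

# Placeholder acquisition system names; only the max and min come from the example above.
red_numbers = {"peak_1": 100.0, "peak_2": 99.4, "peak_3": 98.8}    # excluding failures to acquire
black_numbers = {"peak_1": 100.0, "peak_2": 85.0, "peak_3": 67.7}  # including failures to acquire
print(round(robustness(red_numbers), 1))    # 1.2  -> meets the 5% goal
print(round(robustness(black_numbers), 1))  # 32.3 -> does not
```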
We're also going to look at robustness for acquisition systems. Those numbers are computed similarly, but looking at one acquisition system across matching systems, and they're presented below each column. To remind everyone, robust systems have smaller variation across their paired systems, and low numbers indicate more robust systems. So you want those numbers to be low, closer to 1.2 than to 32.3.

[Arun] Yevgeniy, can you reiterate the difference between the red numbers and the black numbers again?

[Yevgeniy] Sure. Just to reorient everyone: the red numbers focus specifically on matching system, that is algorithm, performance, and they discount any failures of the acquisition system to acquire images. While they don't describe the total operational performance of the system combination, they are relevant for evaluating the matcher specifically. The black numbers focus on total system performance, inclusive of any failures to acquire or match; these are probably what you would observe if you were to deploy that system combination in the field.

Here's our first actual results slide. Let's take a look at the performance of the fingerprint systems included in the 2019 rally. If you have previously looked at acquisition system results on MdTF.org, either for the 2018 rally or the 2019 rally, you'll recall that we obfuscate acquisition system names, in this case using the names of Rocky Mountain peaks, and we also obfuscated the matching system names to protect the privacy of these companies, using names of U.S. rivers. In the 2019 rally we had three fingerprint acquisition systems, systems Gabe, Foraker, and Baker; you can see those as the column headers. We also had three fingerprint matching systems, Iowa, Kansas, and Ohio, which are the row labels, so we've got nine circles altogether, nine system combinations, each with a couple of numbers. Again, the black number is the main one we're going to focus on, which is the total system performance for that combination. The circle in green is the combination with the best observed performance. You can see that overall, no fingerprint system combination met the 99% true identification rate goal when you count failures to acquire. The green number there is the best total system performance we observed, which was only 73.5%, well short of the 99% goal, primarily due to issues with acquisition.
However, counting failures to acquire, the matching system with the lowest performance variation, that is, the best robustness, was Kansas. It had a 5% variation in performance across acquisition systems. Even though that variation was reasonably small, the performance was typically only between about 55.5% and 60.5% correct, so not a very good overall net performance. Looking instead at acquisition systems and their robustness, system Foraker was somewhat robust: counting failures to acquire, Foraker achieved a 2.3% variation, which is very low, although again, many failures to acquire led to somewhat low performance estimates. The story for some of the matching systems by themselves, in red, is a little better; I'm not going to go through those numbers, but for some of these systems the performance was quite high, meeting rally thresholds. Here, however, we are focused on total system performance, so I'll let you take that offline once the results are available on the website and you can go back and look at those numbers. You can also ask a question at the end of this presentation if you're interested, and we can go back and look at some of them.

Next we'll look at the iris systems. We had two iris acquisition systems participate in the 2019 rally, systems Wood and Sanford, and we had four iris matching systems in the rally, Red, Black, White, and Green; these are all U.S. rivers. The best system combination for total system performance was the combination of system Red and system Wood, which achieved an 82.8% true identification rate. Again, no iris system combination met the 99% true identification rate goal for total system performance. Accounting for failures to acquire, the matching system with the lowest performance variation was system Green at 6%, and the acquisition system with the best robustness was system Sanford, which achieved a very good variation of under 1%, though a relatively poor net level of performance due to a lot of failures to acquire on that system. So the takeaway here is that the iris and fingerprint system combinations we tested didn't quite live up to the goals of the rally.
Now that we are used to reading these charts, here is a slide with a fair number more bubbles. Again, this information will be presented on our website and you'll be able to look at it at your leisure; my purpose here is to explain how the data is laid out and to draw out the major conclusions and takeaways. In the 2019 rally there were ten face acquisition systems, which you can see as the columns across the top, and eight face matching systems, which you can see as the rows down the left side.

Overall, there is a good news story for face: four out of eight matching systems were able to meet the 99% true identification rate goal in combination with at least one acquisition system, and in fact for those four matching systems the performance was flawless with one particular acquisition system, system Teton. Five of eight matching systems met the 95% true identification rate threshold in combination with at least one acquisition system. Just to orient you, the filled circles are system combinations that meet the goal for the operational true identification rate, and the half-filled circles are those that meet the threshold but don't quite meet the goal. So you can see that five of those matching systems met the threshold; that adds system Salmon to Pecos, Wabash, Trinity, and Mobile. Twelve of 80 face system combinations met the goal of the rally counting failures to acquire, and 23 of 80 face system combinations met the threshold counting failures to acquire. Overall, looking at the robustness results, we see that matching systems were generally not very robust across acquisition systems when you consider failures to acquire, although if you look at just the images that were acquired, some of the matching systems performed really well and were able to match almost all of those images. So that's something to take a look at as well. Acquisition systems in general were not robust across matching systems, either counting or discounting failures to acquire; there's a lot of variation in how well the matching systems worked with individual acquisition systems.

What are some of the overall conclusions we can take away from our analysis? First of all, some matching system and acquisition system combinations worked perfectly in our test.
That is to say, they matched all of the 430 diverse test volunteers that used each system. That's actually a pretty impressive feat considering the number of places where errors could have crept in: they could have failed to acquire, failed to extract, or failed to match. So flawless performance, very good. However, if we look across all the system combinations we tested, 97 across modalities, only about 12% of those, 12 system combinations, met the rally's goal, and all of them were face matching and acquisition system combinations. That should give some concern, because every system included in this evaluation passed subject matter expert review for inclusion in the rally and met all of the rally participation criteria. So if you're putting together an operational deployment and you're looking to combine acquisition systems and matching algorithms without some extra testing, your chances of success might be closer to 12%, which is not very encouraging.

The rally for the first time tested a new notion of system robustness: how well does a particular system work across the systems it would be operationally paired with? A matching system might be paired with images from different acquisition systems, and an acquisition system's images might be matched by different matching algorithms. Some face and iris algorithms did maintain good robustness across acquisition systems, but others did not. No face acquisition system was robust across all matching systems included in this evaluation. So carefully picked matching systems may perform well so long as the acquisition system acquires images; however, some matching systems did not perform well even with a good acquisition system. The overall conclusion from the analysis is that system combinations must be carefully considered to achieve optimal performance in operations. With that, I'm going to turn things back over to Arun to close things out.

[Arun] First of all, thank you all for joining us for the webinar today. I know we threw a lot of information at you, and there are a lot of numbers on some of these slides. We will have these charts available on MdTF.org, so you can go back and take a look at them at your convenience.
You can also reach out with additional questions, obviously right now during this webinar, but if you have questions later on please feel free to email us at peoplescreening@hq.dhs.gov. So again, there is a lot of information here and a lot of discussion about this robustness measure. As we were saying earlier, that one chart on face recognition systems, both collection and matching systems, has about 600 numbers on it, so we know it can be a lot to take in. We value your feedback and your questions, so please let us know and we'll do our best to answer them.

The first question is, "What would prevent a vendor from providing a generous threshold, at a lower setting, to improve their accuracy? For example, providing thresholds for an FMR of one in ten thousand?"

[John] I guess I can take this one. As Yevgeniy mentioned, we have these 430 people that went through every acquisition system, and like I mentioned, we had a gallery of about 500 people that we were matching back to; 76 of those volunteers didn't have any images in the gallery, so the correct response from the system for them is "unidentified." If a vendor had lowered their threshold generously to match more people, to match more of the 350-some that did have images in the gallery, the system would have started falsely matching the ones that didn't. That was actually one of the points of that chart with the true negative identification rate curve: at what point would we expect zero of the 76 out-of-gallery subjects to incorrectly match someone? The answer to that question is a one-in-a-million threshold.

[Yevgeniy] Yeah, just to add to that a little bit: we did specify which threshold we used in these charts, and it is the one in a million. We did run some of these numbers at the one in 10,000, one in 1,000, and one in 100,000 thresholds, and as expected, at those lower thresholds we started seeing false positives for some of the systems. One of the things we're analyzing that we don't have for you today is how capable vendors are of picking their thresholds, and whether the thresholds they report are the right ones to achieve a certain level of performance. In this case we simply took it at face value that the thresholds they reported to us for one in 100,000 or one in a million were correct.
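As a rough back-of-the-envelope check on that answer (our arithmetic, not a figure from the rally; it assumes the roughly 500-identity gallery described earlier, treats FMR as a per-comparison rate, and counts only the 76 out-of-gallery volunteers):

```latex
E[\text{false matches}] \approx 76 \times 500 \times \mathrm{FMR} \approx
\begin{cases}
0.04 & \text{at } \mathrm{FMR} = 10^{-6} \text{ (one in a million)}\\
3.8  & \text{at } \mathrm{FMR} = 10^{-4} \text{ (one in ten thousand)}
\end{cases}
```

That is consistent with expecting essentially zero false matches at the one-in-a-million setting but several at the looser settings the question asks about.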
[Arun] Next question: "The finger reliability had more black than red, but iris had the opposite trend. Can you comment?"

[Jerry] This was in relation to the acquisition systems?

[Yevgeniy] Yes, sure. I think some of the issues were the variation in fingerprint performance across systems, and I'm not sure whether the comment is about robustness of the algorithms or of the acquisition systems, but I can say that two of the fingerprint systems included in the rally were non-contact and one was contact based. So some of the variability in performance, especially between failure to acquire and failure to match, was a complex interaction between higher acquisition rates for some systems and much lower match rates for others. It would take a little time to explain how those interactions come together in total system performance and in matching-system-only performance.

[Arun] "Aside from the acquisition, what are the other essential inputs to building a high-performing face matching system algorithm?"

[Yevgeniy] So I think that... [Arun] It might just be "face matching system"; I think that's the intent there. [Yevgeniy] Yeah. Some of the issues we see, especially with regard to robustness: there are some face matching systems in our evaluation that performed very well no matter which acquisition system the image was acquired on. What really started to separate some of the matching systems from each other was the ability to process images from acquisition systems that could reliably acquire images but whose images were not of high enough quality to match using that matching system. So I think a high-performing face matching system should be robust across the different potential acquisition systems and the differences in image quality that they provide.

[John] Yeah, maybe just [unintelligible] to Yevgeniy's point: there were acquisition systems that spent a lot of time during the rally trying to figure out how to position their camera. One option is directly in front of the participants, where they would literally run into it if they didn't stop. That would obviously acquire a straight-on, nicely framed image, but it slows people down, because they need to go around it.
The other option is to move that acquisition camera off to the side a little, getting an angled photo, which may or may not work with the algorithms, but which speeds up the total process of moving through the station. So I think the broad answer to your question is that there are camera considerations you have to think about, and then there are these algorithm and acquisition system interaction effects to consider.

[Arun] Yeah, those are really important. If you are talking about a specific operation, you'd really want to focus first on the concept of operations, the use case: how are people supposed to interact with the system? I would start there. Then you'd want to pick the right types of cameras that work in that particular environment, and have matchers that work well across lots of different cameras, because you may have variability in your camera systems. You might go out and buy some cameras, and later on some of them break and you have to get new cameras, so there will be diversity in your cameras. It's good to think of it in those kinds of pieces.

[Jerry] Just to add one more thing: there's a lot of variability in lighting, and a lot of variability in how these cameras behave with differences in lighting. That's another consideration.

[Arun] Next question: "Given that the likelihood of selecting a working system combination is low, when will business requirements be published for airport operators?" This is a little bit of a targeted question, but what I will say is that right now, in the results as we have published them, we have applied aliases to the different company names: Rocky Mountain peaks for collection systems, rivers for matching algorithms. That being said, if there are combinations of collection systems or matching algorithms that you are very interested in, you can email peoplescreening@dhs.gov and we will forward your email to those companies so that they can follow up with you directly, so that you can understand which companies are performing well. Actually, I think that goes to the last question that was asked as well: "Will you be able to share the actual names of vendors, or are you held to an NDA?"
So we are not actually providing the vendor names, but what we 523 00:44:33,780 --> 00:44:38,120 will do is allow vendors who are very happy with their results 524 00:44:38,120 --> 00:44:42,260 to self-announce. This is what happened last year, where some companies did press releases 525 00:44:42,260 --> 00:44:46,510 and said "we were company X," "we were company Y." 526 00:44:46,510 --> 00:44:50,880 But the other thing we offered was, if there is specific interest in a company, 527 00:44:50,880 --> 00:44:55,090 again, tell us. Email peoplescreening@dhs.gov and 528 00:44:55,090 --> 00:44:59,130 tell us which companies you want to reach out to. 529 00:44:59,130 --> 00:45:03,280 Please don't say all of them; we're not going to take those 530 00:45:03,280 --> 00:45:07,520 seriously. We are trying to abide by the terms of our 531 00:45:07,520 --> 00:45:11,610 [unintelligible] with these companies. So if you give us a list of, let's say, two or three, we will 532 00:45:11,610 --> 00:45:15,790 forward your email to those two or three companies so that they can follow back up with you 533 00:45:15,790 --> 00:45:20,090 and hopefully answer your 534 00:45:20,090 --> 00:45:24,130 questions or share more information about their products. 535 00:45:24,130 --> 00:45:28,360 Next question: "Were any of the matchers and systems from the same vendor, such 536 00:45:28,360 --> 00:45:32,420 that they might be better optimized?" Good question. [John] Yeah, so I 537 00:45:32,420 --> 00:45:36,580 actually took a look at this recently. There were a number of matching systems 538 00:45:36,580 --> 00:45:41,080 and acquisition systems that were from the same company. 539 00:45:41,080 --> 00:45:45,220 If Arun tells me I can say it, I'll tell you exactly how many. And the 540 00:45:45,220 --> 00:45:49,490 majority of those actually had acquisition systems that did better with someone else's 541 00:45:49,490 --> 00:45:53,560 matcher. That was not uncommon. 542 00:45:53,560 --> 00:45:57,760 So I think there is definitely room for this sort of 543 00:45:57,760 --> 00:46:02,060 matchmaking process, as Arun likes to call it, where 544 00:46:02,060 --> 00:46:06,200 there might be opportunities for partnerships and collaborations out there. 545 00:46:06,200 --> 00:46:10,450 [Yevgeniy] Yeah, and I would add to that: we aliased 546 00:46:10,450 --> 00:46:13,740 matching system names differently than acquisition system names, 547 00:46:13,740 --> 00:46:16,760 so you won't be able to just look up 548 00:46:16,760 --> 00:46:19,780 the same alias in the row and the column 549 00:46:19,780 --> 00:46:23,070 on any of the charts that we put out. So 550 00:46:23,070 --> 00:46:26,100 if you want to know some of those combinations, 551 00:46:26,100 --> 00:46:29,110 you might have to reach out.
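(A minimal sketch of the kind of cross-pairing John and Yevgeniy describe, in Python; the acquisition and matcher names and the rates below are hypothetical placeholders, not the rally's peak and river aliases or results.)

# Hypothetical acquisition-system x matcher results; names and numbers are made up.
results = {
    ("acq_A", "match_A"): 0.91,  # same-vendor pairing
    ("acq_A", "match_B"): 0.95,  # another vendor's matcher does better on these images
    ("acq_B", "match_A"): 0.88,
    ("acq_B", "match_B"): 0.97,
}

# For each acquisition system, find the matcher that gives the best rate.
best = {}
for (acq, matcher), rate in results.items():
    if acq not in best or rate > best[acq][1]:
        best[acq] = (matcher, rate)

for acq, (matcher, rate) in sorted(best.items()):
    print(f"{acq}: best matcher is {matcher} ({rate:.2f})")

(Reading results per acquisition system rather than per vendor is what surfaces those potential pairings.)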
[Arun] Okay, 552 00:46:29,110 --> 00:46:32,150 next question: "To my knowledge, face image 553 00:46:32,150 --> 00:46:35,200 acquisition rate changes depending on the camera 554 00:46:35,200 --> 00:46:38,240 configuration. Does S&T have any camera specification 555 00:46:38,240 --> 00:46:41,290 standards for the estimation of FTA?" 556 00:46:41,290 --> 00:46:44,360 I would say 557 00:46:44,360 --> 00:46:47,430 the answer to that question is no. I think 558 00:46:47,430 --> 00:46:50,510 the point here is that the face 559 00:46:50,510 --> 00:46:53,580 acquisition system configuration really needs to be 560 00:46:53,580 --> 00:46:56,580 tailored to the CONOPS or the use case. 561 00:46:56,580 --> 00:46:59,670 Depending on the use case, you have a specific 562 00:46:59,670 --> 00:47:02,680 intended behavior of the 563 00:47:02,680 --> 00:47:06,730 user as well as what the camera is expected to do. 564 00:47:06,730 --> 00:47:09,950 This will vary from system to system depending on 565 00:47:09,950 --> 00:47:12,960 the specific use case, and depending on that use 566 00:47:12,960 --> 00:47:16,180 case you can tailor these systems and 567 00:47:16,180 --> 00:47:19,430 pick the right type of camera system. So to be honest with 568 00:47:19,430 --> 00:47:22,700 you, if we were 569 00:47:22,700 --> 00:47:22,990 describing the rally in terms of specific DHS-type use 570 00:47:22,990 --> 00:47:28,230 cases, 571 00:47:28,230 --> 00:47:33,380 I would say it is really targeted at an exit type of 572 00:47:33,380 --> 00:47:38,440 scenario, or maybe a TSA travel document check as you are approaching the 573 00:47:38,440 --> 00:47:43,710 TSA representative; that would be a relevant use case. Another one might be 574 00:47:43,710 --> 00:47:48,900 if you're approaching a general security checkpoint and 575 00:47:48,900 --> 00:47:54,010 you have a direct line entering the system. But for other types 576 00:47:54,010 --> 00:47:59,050 of use cases, where you may have a camera off to the side or above, something like a CCTV camera system, 577 00:47:59,050 --> 00:48:04,290 I wouldn't use the same camera. So you really need to select 578 00:48:04,290 --> 00:48:09,440 the camera, the position and the interaction based on the specific scenario, and 579 00:48:09,440 --> 00:48:14,530 I wish I had an easy answer, like "yeah, go to aisle three and pick any camera there." 580 00:48:14,530 --> 00:48:19,540 We're not at the point where these technologies are fully commoditized. You have to, 581 00:48:19,540 --> 00:48:24,740 you need to do some careful selection here. I think some of the results 582 00:48:24,740 --> 00:48:29,840 that Yevgeniy and John talked about highlighted the fact that there really needs to be some 583 00:48:29,840 --> 00:48:35,180 careful selection if you're using these different systems, especially with different matchers. 584 00:48:35,180 --> 00:48:40,440 Do you have anything else to add there? [John] Maybe just one little point. So we didn't really have 585 00:48:40,440 --> 00:48:45,610 a ton of input into how those cameras were configured, right? We were very open and transparent 586 00:48:45,610 --> 00:48:50,710 with our test design: this is exactly how we're going to test your image, all those slides about 587 00:48:50,710 --> 00:48:56,150 the process and the number of people and what exactly this would look like. They have been up 588 00:48:56,150 --> 00:49:01,390 on the website from the earlier webinar since November, and then 589 00:49:01,390 --> 00:49:06,550 those acquisition system providers actually got to set their configurations to whatever mode they thought would 590 00:49:06,550 --> 00:49:11,630 give them the best performance, and I think that's a lot of times, well maybe not always, the goal 591 00:49:11,630 --> 00:49:16,950 that you'll find in the field as well. [Yevgeniy] Yeah, there are many 592 00:49:16,950 --> 00:49:22,170 reasons why an acquisition system might have high failure to acquire rates.
593 00:49:22,170 --> 00:49:27,330 And these reasons might be different depending on the 594 00:49:27,330 --> 00:49:32,390 use case and the technology. So sometimes it's a 595 00:49:32,390 --> 00:49:37,670 challenge in terms of usability: does the volunteer know 596 00:49:37,670 --> 00:49:42,850 where to look? Sometimes it's an ergonomics challenge: is the camera configured so that 597 00:49:42,850 --> 00:49:47,980 the volunteer can get into the frame? Sometimes it's a lighting challenge: 598 00:49:47,980 --> 00:49:53,000 can the system acquire an image in dim light? 599 00:49:53,000 --> 00:49:58,220 Although we've previously made the statement that failure to acquire is the highest single 600 00:49:58,220 --> 00:50:03,370 source of error, underneath that there are many different causes for failures to acquire, 601 00:50:03,370 --> 00:50:08,430 and that makes it challenging to give one prescription for how to solve it. [Arun] Alright, 602 00:50:08,430 --> 00:50:13,730 I think one thing to add: we did give the vendors the opportunity during the first two days to make human 603 00:50:13,730 --> 00:50:18,960 factors changes to try to improve the performance of their acquisition systems. [Yevgeniy] And frankly, 604 00:50:18,960 --> 00:50:24,110 one thing that we did observe, and we continue to observe, is that in addition to robustness there's 605 00:50:24,110 --> 00:50:29,160 a general issue with system reliability. Some of the acquisition systems 606 00:50:29,160 --> 00:50:34,410 experienced technical failures during testing, and that dinged their acquisition numbers, 607 00:50:34,410 --> 00:50:39,580 because if the systems weren't on, they weren't going to capture any images. 608 00:50:39,580 --> 00:50:44,670 And this was an issue for some systems in this 2019 rally 609 00:50:44,670 --> 00:50:49,680 and also for several systems in the 2018 rally. So that continues to be an issue. 610 00:50:49,680 --> 00:50:54,890 [Arun] Alright, next question. "To your knowledge, did any of the algorithms utilize multi-image 611 00:50:54,890 --> 00:51:00,010 templates, or was this outside the test design of this rally?" 612 00:51:00,010 --> 00:51:05,040 [John] Yeah, so no algorithms utilized multi-image templates, I think, is the way to answer this. 613 00:51:05,040 --> 00:51:10,340 So if we go back and look at, I forget exactly which slide, but it talks about the matching system requirements. 614 00:51:10,340 --> 00:51:15,520 There isn't an opportunity to provide multiple images per template. 616 00:51:15,530 --> 00:51:20,610 Our API specifies image in, template out. And 617 00:51:20,610 --> 00:51:25,620 that's still up on github.mdtf.org, if anyone's really interested. It's a little bit different from 618 00:51:25,620 --> 00:51:30,820 some of the NIST APIs that allow you to create this gallery, I think for 619 00:51:30,820 --> 00:51:35,930 speed optimization purposes, and some of the IARPA APIs, which let you build this 620 00:51:35,930 --> 00:51:40,950 super template concept. We don't have any of that.
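(A minimal sketch of the general "image in, template out" pattern John describes, in Python. The class and function names here are hypothetical, written for illustration only; the actual interface specification is the one published at github.mdtf.org.)

# Hypothetical illustration of an "image in, template out" matcher interface.
# Names are invented for this sketch; see github.mdtf.org for the real specification.
from abc import ABC, abstractmethod

class Matcher(ABC):
    @abstractmethod
    def create_template(self, image_bytes: bytes) -> bytes:
        """One image in, one opaque template out; no multi-image or 'super' templates."""

    @abstractmethod
    def compare(self, probe_template: bytes, gallery_template: bytes) -> float:
        """Return a similarity score between two single-image templates."""

def identify(matcher: Matcher, probe_template: bytes, gallery: dict) -> str:
    """A gallery is just a set of single-image templates; identification is a one-to-many comparison."""
    scores = {name: matcher.compare(probe_template, tmpl) for name, tmpl in gallery.items()}
    return max(scores, key=scores.get)

(The contrast with the gallery-style and super-template APIs mentioned above is that here every template corresponds to exactly one image.)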
621 00:51:40,950 --> 00:51:46,170 [Arun] Next question. "Were the acquisition vendors allowed to recalibrate the system after an initial 622 00:51:46,170 --> 00:51:51,330 period of capture, i.e., after one day?" So the answer is 623 00:51:51,330 --> 00:51:56,400 yes. Over the first two days, they were allowed to make what we called human factors adjustments. 624 00:51:56,400 --> 00:52:01,690 [Yevgeniy] Over the first two days they could adjust any sort of 625 00:52:01,690 --> 00:52:06,720 signage or layout of their station, after which they had to freeze that. However, 626 00:52:06,720 --> 00:52:11,820 they always had an opportunity for a break-fix in case their system experienced a technical issue. 627 00:52:11,820 --> 00:52:16,830 They could go back after the group of participants had gone and see if they could get their system up 628 00:52:16,830 --> 00:52:22,060 and running again. [Arun] Yeah, so actually I should probably provide a little context. During the first two 629 00:52:22,060 --> 00:52:27,220 days, what that really meant was that the vendors who were participating could monitor the performance of 630 00:52:27,220 --> 00:52:32,260 their systems. We actually gave them iPads, so that they could see videos of people interacting 631 00:52:32,260 --> 00:52:37,540 with their systems as well as see some of the imagery coming out of their systems, 632 00:52:37,540 --> 00:52:42,720 so that they could determine whether or not the system appeared to be working correctly based on their own 633 00:52:42,720 --> 00:52:47,820 expectations. So, for example, if people weren't doing something and they had instructions out there, they could 634 00:52:47,820 --> 00:52:53,150 go back and say, well, maybe I should change my instructions; or if they saw people looking in the wrong 635 00:52:53,150 --> 00:52:58,420 direction, maybe they could reorient or reposition things. So it was really up to the companies themselves to see whether 636 00:52:58,420 --> 00:53:03,600 or not what they put in that 6x8 foot space was eliciting the behaviors 637 00:53:03,600 --> 00:53:08,690 they were looking for from the people, the volunteers, interacting with the systems. 638 00:53:08,690 --> 00:53:13,700 [Yevgeniy] In fact, several systems did take advantage of that and changed signage, for example, 639 00:53:13,700 --> 00:53:18,920 to better deal with people wearing glasses. [John] And I think, to answer the next question, 640 00:53:18,920 --> 00:53:24,060 camera capture settings were a part of that, so if you wanted to change how many pixels between the eyes 641 00:53:24,060 --> 00:53:29,090 were required before you would capture a picture, you could do that in the first two days. [All agree] 642 00:53:29,090 --> 00:53:34,350 [Arun] I think the major point there is that this was left to the vendors. 643 00:53:34,350 --> 00:53:39,540 We basically provided a fair playing field, a fair set of rules, 644 00:53:39,540 --> 00:53:44,620 and it was up to them to figure out how they would innovate across the different 645 00:53:44,620 --> 00:53:49,900 variables they had available to them and how they would adjust 646 00:53:49,900 --> 00:53:55,130 their own systems to meet the goals that we had set forth. 647 00:53:55,130 --> 00:54:00,130 Alright, so the next question. "This is more of an acquisition question and I have not visited MDTF, 648 00:54:00,130 --> 00:54:05,180 but were lighting conditions equivalent for acquisition systems regardless of physical location, 649 00:54:05,180 --> 00:54:10,450 or were lighting conditions changed during the rally?" [Yevgeniy] Good question. So yes, I can address this.
650 00:54:10,450 --> 00:54:15,640 So all rally stations were designed to be identical, 651 00:54:15,640 --> 00:54:20,740 including the lighting conditions at the stations, which were calibrated to 652 00:54:20,740 --> 00:54:26,060 600 lux, and that ensured fairness 653 00:54:26,060 --> 00:54:31,310 in the design. There were a number of other things included in the design to ensure fairness, 654 00:54:31,310 --> 00:54:36,470 including habituation, and John mentioned the counterbalancing as well. 655 00:54:36,470 --> 00:54:41,550 So yes, the conditions at each station were kept as constant as possible. 656 00:54:41,550 --> 00:54:46,860 [Arun] Well, I think we are right at the end of our webinar today. 657 00:54:46,860 --> 00:54:52,080 So again, we don't want to cut things short. I am sure there are probably more questions; 658 00:54:52,080 --> 00:54:57,100 if nothing else, I'm sure you did not have enough time to look at those charts with all of those numbers. 659 00:54:57,100 --> 00:55:02,150 Those great, great numbers. We will put 660 00:55:02,150 --> 00:55:07,380 those up on the website; they will be there by the end of the week. Please feel free to take a look at them, and 661 00:55:07,380 --> 00:55:12,550 I'm sure you may have more questions. Again, we welcome your questions at peoplescreening@hq.dhs.gov. 662 00:55:12,550 --> 00:55:16,960 Thank you and have a great day.