[Arun] Hello, my name is Arun Vemury; I'm with the DHS Science and Technology Directorate. Thanks for joining our webinar today to discuss the results of the 2019 Biometric Technology Rally. This webinar in particular covers the results for matching systems. As you may recall, the biometrics rally tested both collection (acquisition) systems and matching systems, and this webinar is specifically focused on the matching system results. Today I'm joined by my colleagues Yevgeniy Sirotin, Jerry Tipton, and John Howard. Let's go ahead and jump into the slides; we have a lot of information to share with you, and I'd like to make sure we provide as much as possible. Very briefly, today's webinar will open with some background and history: What is the S&T Biometric and Identity Technology Center? What are the biometric technology rallies? What were we doing with the 2019 rally, and how were we measuring performance? Then we'll cover specific results for the different biometric modalities, collection systems, and matching systems, including fingerprint, iris, and face recognition, and we'll wrap up with summary conclusions and observations. Again, the S&T Biometrics & Identity Technology Center is a group of subject matter experts within DHS Science and Technology.
Its goal is to help DHS and its mission partners climb the learning curve: to become familiar with new and emerging biometric and identity capabilities and understand how they may apply to various missions, so that organizations that rely on these technologies across different mission areas can figure out whether the technologies are useful or applicable to their specific needs. The goal truly is to facilitate information sharing and to accelerate the use of these technologies where appropriate, and we follow an approach of building once, then using and sharing widely. The aim is to drive efficiency, making it as easy as possible for groups to understand how to adopt these technologies quickly and well. Right now much of our focus is on test and evaluation and on recognizing technologies that may be promising for different needs; today we will focus on the biometric technology rally. The biometric technology rally is an engagement between S&T and industry, primarily focused on identifying how well specific technology offerings or solutions may support important DHS use cases. In this particular case, much of our focus has been on the need for high-throughput biometrics.
What we mean by that is the ability to recognize people very quickly, with minimal staffing, in a very small space, in a small amount of time. The goal is to support settings like border crossings, transportation security, and other security arenas or venues: to quickly recognize known individuals without a significant staffing requirement. We're really trying to recognize hundreds, thousands, or even tens of thousands of people very quickly. With the rally, we define these use cases, define the specific measures or metrics we orient industry toward, and set goals for industry to meet, so the resulting systems will hopefully be applicable to different DHS users' needs. The 2019 rally in particular focused on this high-throughput use case. It was a little different from the rally we ran last year, when every system had to provide at a minimum at least one face image, or multiple face images. This year we widened the scope so we could also receive applications from companies focused on high-throughput fingerprint systems or high-throughput iris systems. Of course, multi-modal systems, combining finger, face, or iris in any combination, were also open to participate.
We also opened a new category of participation: it wasn't just for collection (acquisition) systems. We also encouraged companies to submit matching algorithms, so we could evaluate both the collection side of the capability and how well the collected images could then be matched by different algorithms. One of the things we're doing here is helping stakeholders understand that there is a breadth of capabilities out there, both in collection systems and in matching algorithms. By looking at how well a collection system works across matching algorithms, or how well a matching algorithm can take in images from many different types of collection systems, we get to the concept of a robust system. Depending on your specific use case or need, this robustness may or may not be important. But imagine a system where one major matching algorithm is used by many different users, and every user has a different camera. You want your results to hold up even though the images your system matches against may come from different cameras; you want your system to interoperate well and the matching to be robust to those different collections.
In this webinar we'll actually spend quite a bit of time talking about this idea of robustness: of algorithms to collection systems, as well as of collection systems to algorithms. [Jerry] Before we get into the details of the results, we want to go back and briefly describe what the rally test was. Acquisition systems had to operate in unmanned mode within a physical footprint of 6x8 feet. They could collect face, iris, fingerprints, or a combination of the three, and needed to provide at least one biometric image back per volunteer, within time constraints defined by DHS. Optionally, they could send back up to three face images, three pairs of iris images, or three sets of fingerprints with up to five fingerprints each. Matching systems had to provide us their algorithm, their matching service, as a Docker file we could load onto our Maryland Test Facility systems. It had to contain a commercially available matching algorithm, it had a size limit of 1.5 gigabytes, and it had to process templates in less than 1,000 milliseconds. It also needed to operate without contacting anything externally, i.e., without phoning home, for a period of one year, and it needed to operate on a limited set of resources.
We offered four CPUs and two gigabytes of RAM, and the algorithm could not require a GPU. In addition, we had an open API document available that vendors could test against before sending us their algorithm, which they did; the specs and a working example were hosted, and still are, on our GitHub via the MdTF.org website. For the rally, we announced the call for participation in November 2018 and accepted applications through the end of that month. We provided conditional acceptance in early February, along with the cloud-based API for vendors to start working with in that time frame, and once they completed a set of requirements, we provided final acceptance notification in March. In May we held a stakeholders' VIP day, which many of you attended. We started the data collection and completed it in early May, and since then the team has been analyzing that data set. For matching systems, we received 22 applications and accepted 15. The evaluators came from the Department of State, IARPA, DOJ, NIST, and DHS S&T. The 15 matching systems comprised eight face algorithms, four iris algorithms, and three fingerprint systems. The results you hear today are based on matching systems, but there's also an interaction with the acquisition system devices.
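As a rough illustration of the container constraints Jerry described, a matching service might be launched with Docker resource flags like the following. This is only a sketch: the image name is hypothetical, and the actual rally test harness surely differed.

```shell
# Hypothetical launch of a vendor matching service under the stated limits.
# --cpus=4       : four CPUs offered
# --memory=2g    : two gigabytes of RAM
# --network=none : no external connections (no "phoning home")
# The image itself was capped at 1.5 GB and had to run CPU-only (no GPU),
# processing templates in under 1,000 ms per request.
docker run --rm --cpus=4 --memory=2g --network=none vendor/matcher:latest
```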
That's very important, and you'll be hearing about it as well. With that, I'll turn it over to John Howard to talk about the acquisition process. [John] Thanks, Jerry. As Jerry mentioned, we know this is the matching system webinar, but we think it's important that everyone understand how we collected the images used to evaluate the matching systems, so we'll go through the acquisition test process really quickly. What you see on slide 10 is broadly what the acquisition systems were designed to do. The dark gray box in the left-hand image is the 6x8-foot space Jerry mentioned. Acquisition system providers came to the Maryland Test Facility in early May. They had two days and could put whatever they wanted inside that space in terms of sensors, cameras, and instructions, as long as it was safe for our participants. Then we brought about 400 people through, queued through the process you see on the left-hand side; the right-hand side is a picture of what that looked like. Basically, groups of about 15 would queue up in front of the system. They would be scanned one at a time into the broader test station. At some point they would approach the acquisition system, crossing a beam break. They would do whatever they were instructed to do with the system, and the system would send us images: faces, fingers, or irises.
Those images came over a common acquisition system API. The test volunteer would then leave the station, tripping another beam break, and would rate their experience at a satisfaction kiosk. The ground truth identification information, which comes off the wristband scan there at step two, together with the images sent via the API during the interaction, is what we used as our probe data for evaluating the rally matching systems. The last thing I'll mention from a test-process standpoint is that before the test groups entered the bay, they were given general instructions that each acquisition station was going to collect biometric images for the purpose of performing an identification. They weren't trained specifically on how to use any of these systems, so they were naive users approaching each acquisition system: they didn't necessarily know where to look, how to interact with the system, and so on. Prior to this test process, we also collected what you would call same-day ground truth images. We had a manned enrollment station where a trained biometric collector took a really good picture of each subject's face, each subject's irises, and each subject's fingerprints, so we knew what each of those biometric samples looked like before the interaction with all these different acquisition systems.
A couple of other considerations we took into account to make sure the test was fair and broadly applicable. First, our sample size, which we'll say more about on the next slide, was over 400 people. That's not a gigantic sample; it's not NIST-level testing with millions of images, but it is pretty good, and it allows us to report results with roughly plus or minus half a percent precision. Second, our demographics: the population is very diverse, with a wide range of ages, genders, and races, and a mix of people with prior experience at the MdTF working with biometric systems and people who had never been to the MdTF before. Third, counterbalancing: the order in which each group interacted with the acquisition systems was controlled and randomized, so no particular acquisition system was always seen first and none was always seen last. This washes out learning effects: if people learned how to interact with a biometric system on system A and then got better on system B, the next group would see system B before system A. So we hope there was no habituation effect. Here's what the actual test population for the 2019 Biometric Technology Rally looked like. There were 430 test volunteers, each of whom used every single acquisition system. In the upper left you can see they were almost evenly split 50/50 male and female.
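A quick back-of-the-envelope check (mine, not from the talk) of that "plus or minus half a percent" claim, using the usual normal approximation for the uncertainty of an observed rate:

```python
import math

def rate_precision(p, n, z=1.0):
    """Normal-approximation half-width for an observed rate.

    p: observed rate (e.g. a true identification rate of 0.99)
    n: number of independent test volunteers
    z: number of standard errors (1.0 = one SE; 1.96 ~ a 95% interval)
    """
    return z * math.sqrt(p * (1 - p) / n)

# With ~430 volunteers and rates near 99%, one standard error is roughly
# half a percentage point, consistent with the precision quoted above.
print(round(100 * rate_precision(0.99, 430), 2))
```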
Our age distribution is in the upper middle plot: a pretty good spread from 18 to 81, with maybe a slight skew toward the 20-to-30-year-old demographic. Our self-reported race breakdown is in the upper right chart: about 45% Black or African-American, 35% White or Caucasian, and about 20% who identified as something other than those two. We also collected a few other attributes, such as height and weight. We presented this chart back in the original webinars Jerry mentioned in November, so it was always out there how we were going to evaluate these matching systems, and it really came down to two things: their ability to template the biometric images that came from these diverse acquisition systems, that is, to work with those images, and their ability to match those images. And when I say match, I mean correctly identify: not a one-to-one match, but a one-to-N match, N being our gallery size. That makes sense for this unattended, high-throughput use case, right? If you're trying to build a biometric system that takes less than 10 seconds, you're not going to have a step where everyone stops to present an ID or a ticket. So it has to be a one-to-N evaluation, not a one-to-one evaluation. The galleries we used for that N are images we've collected over the last five years at the MdTF.
There's a gallery for faces, a gallery for fingers, and a gallery for irises: different images in each gallery, but each about the same size, roughly 500 subjects. The figure in the lower right shows what that looked like. If you are face matcher B, for example, we gave you this gallery, the purple box, and we also sent you every probe image collected by each acquisition system that collected face. Those all went into your matcher, we got identification results out, and then we looked at those identification results across acquisition systems; that's the robustness measure we talked about. We wanted to find matching systems that could do well no matter where their images were coming from. [Arun] Thanks, John. In operational biometric deployments, matching systems don't work in isolation, and for that reason the 2019 rally focused on evaluating operation-like combinations of matching systems and acquisition systems. The key metric we used to evaluate this performance is the true identification rate. We define the true identification rate as the percentage of transactions that result in a correct identification at a set threshold for each matching system. As John mentioned, we're doing identification operations against our gallery: not 1-to-1 verification, but 1-to-N identification.
This true identification rate was calculated separately for each combination of matching system and acquisition system. This is a key point. If you're used to looking at NIST evaluations, you can think of it this way: NIST separately evaluates algorithms on classes of images, for example visa images versus mug shot images versus selfie images. We separately evaluated each matching system using the images acquired by each acquisition system. Another key point is that each acquisition system gathered images of the same 430 subjects, which is not the case in the NIST evaluations. In our case, the same 430 people had a shot at getting an image acquired on each acquisition system in the rally, and each matching system had a shot at matching the images acquired on each of those acquisition systems. For each system combination, the true identification rate was evaluated two ways: excluding failures to acquire, to focus on matching system performance (something we won't dwell on in this brief), and including failures to acquire, to capture the overall expected operational performance of the system combination. The latter is really the key value, because total system performance is inclusive of all sources of error, both in acquisition and in matching. So how did we set the threshold?
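To make the metric concrete, here is a small illustrative sketch (my own, not the rally's actual code) of computing a true identification rate for one acquisition-system/matching-system combination, with and without failures to acquire. The record layout, field names, and threshold are all hypothetical.

```python
def true_identification_rate(transactions, threshold, count_ftas=True):
    """Percentage of transactions yielding the correct identity.

    Each transaction is a dict with:
      'acquired'  - whether the acquisition system returned an image
      'top_id'    - top candidate identity from the matcher (or None)
      'top_score' - similarity score of that candidate
      'true_id'   - ground-truth identity (e.g. from the wristband scan)
    """
    if not count_ftas:
        # Exclude failures to acquire: matcher-only performance.
        transactions = [t for t in transactions if t["acquired"]]
    hits = sum(
        1
        for t in transactions
        if t["acquired"]
        and t["top_id"] == t["true_id"]
        and t["top_score"] >= threshold
    )
    return 100.0 * hits / len(transactions)

# Four hypothetical transactions: one failure to acquire, one wrong match.
example = [
    {"acquired": True,  "top_id": "a", "top_score": 0.9, "true_id": "a"},
    {"acquired": True,  "top_id": "b", "top_score": 0.5, "true_id": "a"},
    {"acquired": False, "top_id": None, "top_score": 0.0, "true_id": "c"},
    {"acquired": True,  "top_id": "c", "top_score": 0.8, "true_id": "c"},
]
print(true_identification_rate(example, 0.7))                    # counts the FTA
print(true_identification_rate(example, 0.7, count_ftas=False))  # matcher only
```

The `count_ftas` flag mirrors the distinction drawn here between total system performance and matcher-only performance.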
We fixed the threshold for calculating the true identification rate at a setting suitable to generate a false match rate of one in a million. This setting was reported to us by each matching system provider; we did not do our own evaluation to verify it, we simply used what the providers told us. This FMR threshold was chosen so that the expected number of false positives observed during the rally would be zero. We had 430 volunteers, 76 of whom were not in the rally gallery, and the correct behavior for any matching system, for a volunteer who is not in the gallery, is to report that volunteer as unidentified, that is, as having no mate in the gallery. At a threshold of one in a million, the expected true negative identification rate should be 100%, so every one of those 76 should be correctly classified as out of gallery; conversely, the expected number of false matches is zero. That's the threshold we picked. The chart on the right shows a theoretical curve of true negative identification rate as a function of false match rate, and where the different thresholds we asked vendors to provide would sit; the one we chose, again, was one in a million.
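A rough check (my arithmetic, not from the talk) that one-in-a-million makes the expected number of rally false matches effectively zero. This approximates the 1:N false positive probability as N times the per-comparison FMR, which is reasonable when the FMR is small, and assumes the roughly 500-subject gallery described earlier.

```python
fmr = 1e-6            # per-comparison false match rate
gallery_size = 500    # approximate subjects in each gallery
out_of_gallery = 76   # volunteers with no mate in the gallery

# Expected false matches across all out-of-gallery searches.
expected_false_matches = out_of_gallery * gallery_size * fmr
print(expected_false_matches)  # well below one, i.e. effectively zero
```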
If you have trouble seeing the graphs, I recommend maximizing the slide so you have a better shot at following along, because there are going to be a lot of numbers after this. Here I want to pause and go over how we will visualize these true identification rate results. For the 2019 matching system analyses, we focus on true identification rates for each combination, as I said before, of acquisition system and matching system. Acquisition systems are the column headers in the small matrix you see on the left, and matching systems are the row headers. Each circle in the visualization refers to one system combination; now we can switch to the blue box and look inside one. The black number inside the circle presents true identification rate performance inclusive of any failures to acquire by the acquisition system; that's the more operational measure of performance. There is also a red number within each circle, which is of more interest if you care about the performance of the matching systems in isolation. I'm not going to go through those in much detail today.
But they'll be there, available for you to look at, and I should note that every chart we present during this webinar will be available on our website, MDTF.org, for you to examine at your leisure, along with tutorials on how to read the charts. The other metric John mentioned, which we computed for both our matching system and our acquisition system analyses, is this notion of robustness. As John mentioned, robustness quantifies the variability in matching system performance across acquisition systems. We quantify it very simply as the range of observed true identification rate values across systems, at a false match rate of one in a million. If a number of different acquisition systems supply images to your matching system, we get a different true identification rate value for each system combination. For that matching system, we then take the highest and lowest performance and look at the spread between the two; that spread is our measure of robustness. Since robustness measures variability, we're looking for low-variability systems that give you the same high performance, ideally, across different acquisition systems. So we want that robustness value to be low.
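The range computation itself is simple; here's an illustrative sketch with invented numbers (the acquisition system names below are the rally's, but these TIR values are hypothetical):

```python
def robustness(tir_by_acquisition_system):
    """Range (max - min) of TIR values across acquisition systems,
    in percentage points; lower means a more robust matcher."""
    values = tir_by_acquisition_system.values()
    return max(values) - min(values)

# Hypothetical TIRs for one matcher across three acquisition systems:
tirs = {"Gabe": 100.0, "Foraker": 98.8, "Baker": 99.4}
print(robustness(tirs))  # a spread well under the 5% rally goal
```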
Our goal for the 2019 rally was at most a 5% variation in performance across acquisition systems. I should note that you can also turn this robustness metric around and ask how robust an acquisition system is across the different algorithms that work with its images; that's another robustness value we can report, tied back to the acquisition system. This slide illustrates the idea. On the left are two plots for a system that does not meet the robustness threshold; on the right are two plots for a system that meets it, at least when discounting failures to acquire. Looking at the left first, at the plot with the curve in black: true identification rate is on the Y-axis, and you can see it varies quite a bit from acquisition system to acquisition system along the X-axis. Robustness quantifies how wide that variation is; you see the bracket, in this case quite high, 61%. Whether you count failures to acquire or look only at the algorithm performance by itself, discounting failures to acquire, the bracket is very wide and performance varies substantially across acquisition systems.
Whereas for the system on the right, performance varies less even when counting failures to acquire; that black bracket is only 32% high. And discounting failures to acquire, the performances are uniformly very good: this system was able to reliably match images no matter which acquisition system they were acquired on, and the variation is very small, just 1.2%. So this system met the robustness threshold; in fact, it met the goal when discounting failures to acquire. What we're going to show you, for each row (each row being associated with a particular matching system), are robustness values in the right margin for that matching system, using this bracket notation to convey the variation across acquisition systems. The chart on the right is the same chart as on the previous slide, for the system with good robustness. We'll show you two numbers: one for robustness without counting failures to acquire, the red numbers, and one for robustness counting failures to acquire, the black numbers. Three numbers describe robustness. First, the maximum level of performance; in this case it was 100% for both.
Then there's 00:25:02.060 --> 00:25:07.340 the minimum level of performance, the lowest TIR observed across acquisition systems, 00:25:07.340 --> 00:25:12.720 in this case it was 98.8% for the red numbers and 00:25:12.720 --> 00:25:18.110 67.7% for the black numbers. And then there's the resulting range, which is just the difference between those two 00:25:18.110 --> 00:25:23.170 numbers, 1.2% for the red and 32.3% for the black. 00:25:23.170 --> 00:25:28.470 And those numbers will be presented on that gray margin to the right. We're also gonna take a look at 00:25:28.470 --> 00:25:33.670 robustness for acquisition systems. Those numbers are gonna be similarly computed but only 00:25:33.670 --> 00:25:38.790 looking at that one acquisition system, now across matching systems, and they're going to be presented 00:25:38.790 --> 00:25:43.820 below each column. And to remind everyone, robust systems have 00:25:43.820 --> 00:25:49.100 smaller variations between pairs of systems and low numbers indicate more 00:25:49.100 --> 00:25:54.260 robust systems. So you want those numbers to be low, closer to 1.2 than to 32.3. 00:25:54.260 --> 00:25:59.280 [Arun] Yevgeniy can you kinda reiterate the difference between the red numbers and the black numbers again? 00:25:59.280 --> 00:26:04.590 [Yevgeniy] Sure. Just to reorient everyone again, the red numbers specifically 00:26:04.590 --> 00:26:09.610 focus on matching systems, algorithm performance, and they discount any failures 00:26:09.610 --> 00:26:14.730 of an acquisition system to acquire the images. While they don't describe the 00:26:14.730 --> 00:26:19.750 operational total performance of that system combination, they're relevant for specifically 00:26:19.750 --> 00:26:25.000 evaluating the matcher, whereas the black numbers focus 00:26:25.000 --> 00:26:30.160 in on total system performance inclusive of any failures to acquire or match.
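[Editor's note] For readers following along in text, the red/black bookkeeping described here, true identification rate counting versus discounting failures to acquire, plus the bracket range, is simple arithmetic. A minimal sketch in Python using invented per-attempt records; the record layout and the peak names are hypothetical, not rally data.

```python
# Toy per-attempt records for one matching system: (acquisition_system,
# image_acquired, identified). All values are invented for illustration.
transactions = [
    ("Teton", True, True), ("Teton", True, True), ("Teton", False, False),
    ("Baker", True, True), ("Baker", True, False), ("Baker", False, False),
]

def true_id_rates(records):
    """Return (black, red): TIR counting FTAs and TIR discounting FTAs."""
    total = len(records)
    acquired = sum(1 for _, ok, _ in records if ok)     # attempts with an image
    matched = sum(1 for _, _, hit in records if hit)    # correct identifications
    return matched / total, matched / acquired

def robustness(records):
    """Range (max - min) of each TIR across acquisition systems; lower is more robust."""
    per_system = [true_id_rates([r for r in records if r[0] == s])
                  for s in {r[0] for r in records}]
    black, red = zip(*per_system)
    return max(black) - min(black), max(red) - min(red)
```

On this toy data the "black" range comes out around 33 percentage points and the "red" range 50, so this hypothetical matcher would miss the 5% robustness goal either way.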
00:26:30.160 --> 00:26:35.250 These are probably what you would observe if you were, say, to deploy that system combination in the field. 00:26:35.250 --> 00:26:40.410 Here's our first actual results slide. Let's take a look 00:26:40.410 --> 00:26:45.500 at the performance of the fingerprint systems included in the 2019 rally. So if you 00:26:45.500 --> 00:26:50.790 have previously looked at acquisition system results at MDTF.org, either for the 2018 rally 00:26:50.790 --> 00:26:56.000 or 2019 rally, you should recall that we obfuscate acquisition system names. In this case 00:26:56.000 --> 00:27:01.230 we used names of Rocky Mountain peaks, and we obfuscated matching system names 00:27:01.230 --> 00:27:06.360 also, to protect the privacy of these companies. Here we used names of U.S. rivers. 00:27:06.360 --> 00:27:12.970 In the 2019 rally, we had three fingerprint acquisition systems, these are systems 00:27:12.970 --> 00:27:18.070 Gabe, Foraker and Baker. You can see those as the column headers and then we also have 00:27:18.070 --> 00:27:23.350 three fingerprint matching systems, Iowa, Kansas and Ohio, 00:27:23.350 --> 00:27:28.580 that are the row labels, and consequently we've got nine circles altogether, 00:27:28.580 --> 00:27:33.750 nine system combinations. Each circle has a couple of numbers. Again the black number is 00:27:33.750 --> 00:27:38.820 the main one we are going to focus on, which is the total system performance for that combination. 00:27:38.820 --> 00:27:44.110 The circle in green is the best observed performance. 00:27:44.110 --> 00:27:49.520 The combination with the best observed performance. And you can see that overall no fingerprint 00:27:49.520 --> 00:27:54.670 system combination met the 99% true identification goal when you count 00:27:54.670 --> 00:27:59.740 failures to acquire.
And that green number there is the best total system performance that we 00:27:59.740 --> 00:28:05.000 observed, which was only 73.5%, which is well short 00:28:05.000 --> 00:28:10.200 of that 99% goal, primarily due to some issues with acquisition. 00:28:10.200 --> 00:28:15.230 However, counting failures to acquire, the 00:28:15.230 --> 00:28:20.260 matching system with the lowest performance variation, that is, the best robustness, was Kansas. 00:28:20.260 --> 00:28:25.470 It had a 5% variation in performance across acquisition 00:28:25.470 --> 00:28:30.620 systems. However, even though that variation was reasonably small, the 00:28:30.620 --> 00:28:35.680 performance was typically, you know, between 60 and a half down to 55 and a half percent correct. 00:28:35.680 --> 00:28:40.920 So not a very good overall net performance. 00:28:40.920 --> 00:28:46.100 Looking instead at acquisition systems and looking 00:28:46.100 --> 00:28:51.210 at the robustness of those, system Foraker was somewhat robust: 00:28:51.210 --> 00:28:56.230 counting failures to acquire, Foraker achieved a 2.3% 00:28:56.230 --> 00:29:01.430 variation, which is a very low variation, 00:29:01.430 --> 00:29:06.570 though again many failures to acquire led to somewhat low performance estimates. 00:29:06.570 --> 00:29:11.580 The story for some of the matching systems by themselves, in red, is a little bit 00:29:11.580 --> 00:29:16.820 better. I'm not going to go through those numbers, but for some of these systems the performance was quite 00:29:16.820 --> 00:29:21.990 high, meeting rally thresholds. However, 00:29:21.990 --> 00:29:27.080 here we are focused on total system performance. So I'm going to let you kind of 00:29:27.080 --> 00:29:32.380 take that offline; once the results are available on the website, you'll be able to go back 00:29:32.380 --> 00:29:37.630 and look at those. You can also, if you're interested, ask a question at the end of this presentation.
00:29:37.630 --> 00:29:42.770 We can go back and take a look at some of those numbers if you're interested. So next 00:29:42.770 --> 00:29:47.830 will be iris systems. We had two iris acquisition systems 00:29:47.830 --> 00:29:53.080 participate in the 2019 rally. These were systems Wood and Sanford, and we 00:29:53.080 --> 00:29:58.250 actually had four iris matching systems in the rally, Red, Black, White and Green. 00:29:58.250 --> 00:30:03.360 These are all U.S. rivers. The best system combination 00:30:03.360 --> 00:30:08.390 for total system performance was the combination of system Red and system Wood, 00:30:08.390 --> 00:30:13.640 which achieved an 82.8% true identification rate. 00:30:13.640 --> 00:30:18.670 Again no iris system combination met the 99% 00:30:18.670 --> 00:30:23.690 true identification rate goal for total system performance. 00:30:23.690 --> 00:30:28.900 Overall, counting failures to acquire, the lowest performance variation for a matching system 00:30:28.900 --> 00:30:34.070 was system Green at 6%, and the acquisition system with 00:30:34.070 --> 00:30:39.140 the best robustness was system Sanford, which actually achieved a very good, 00:30:39.140 --> 00:30:44.160 under 1%, variation though a relatively 00:30:44.160 --> 00:30:49.350 poor net level of performance due to a lot of failures to acquire in that system. 00:30:49.350 --> 00:30:54.460 So the takeaway here is that the iris and fingerprint 00:30:54.460 --> 00:30:59.480 system combinations that we tested didn't quite live up to the goals of the 00:30:59.480 --> 00:31:04.630 rally. Now that we are used to reading these charts, here is a slide with 00:31:04.630 --> 00:31:09.700 a fair number more bubbles. Again this information will 00:31:09.700 --> 00:31:14.960 be presented on our website.
You'll be able to look at it at 00:31:14.960 --> 00:31:20.140 your leisure, and my purpose here is to sort of explain how the data is laid out 00:31:20.140 --> 00:31:25.260 and sort out major conclusions and takeaways. In the 2019 00:31:25.260 --> 00:31:30.320 rally there were 10 face acquisition systems and you can 00:31:30.320 --> 00:31:35.570 see those as columns across the top, and there were eight face matching systems and you can see those 00:31:35.570 --> 00:31:40.990 as rows down the left side. 00:31:40.990 --> 00:31:46.150 Overall, the good news story for face: four out of eight 00:31:46.150 --> 00:31:51.210 matching systems were able to meet the 99% true identification goal in combination 00:31:51.210 --> 00:31:56.760 with at least one acquisition system. In fact, for four of these matching systems 00:31:56.760 --> 00:32:02.020 the performance was flawless with one particular acquisition system, 00:32:02.020 --> 00:32:07.160 system Teton. Five of eight matching systems met the 95% 00:32:07.160 --> 00:32:12.220 true identification rate threshold in combination with at least one acquisition system. 00:32:12.220 --> 00:32:17.470 And just to orient you, the filled circles are those 00:32:17.470 --> 00:32:22.680 system combinations that meet the goal for the operational 00:32:22.680 --> 00:32:27.710 true identification rate, and the half-filled circles are those that meet the threshold 00:32:27.710 --> 00:32:33.010 but don't quite meet the goal. So you can see that five of those matching systems 00:32:33.010 --> 00:32:38.240 met the threshold. That also includes system Salmon in addition to 00:32:38.240 --> 00:32:43.400 Pecos, Wabash, Trinity and Mobile. 00:32:43.400 --> 00:32:48.500 12 of 80 face system combinations met the goal of the rally, 00:32:48.500 --> 00:32:53.770 counting failures to acquire.
23 of 80 face system combinations met the 00:32:53.770 --> 00:32:58.970 threshold counting failures to acquire, and overall, looking at 00:32:58.970 --> 00:33:03.970 the robustness results, we see that matching systems were generally not very robust 00:33:03.970 --> 00:33:09.010 across acquisition systems when you consider failures to acquire, although if you look at just the 00:33:09.010 --> 00:33:14.270 images acquired, some of the matching systems performed really well and were able to 00:33:14.270 --> 00:33:19.430 match almost all of those images. So that's 00:33:19.430 --> 00:33:24.520 something to take a look at as well. Acquisition systems in general were 00:33:24.520 --> 00:33:29.820 not robust across matching systems, either counting or discounting failures to acquire. 00:33:29.820 --> 00:33:35.050 There's a lot of variation in the quality of the matching systems and how well they worked 00:33:35.050 --> 00:33:40.200 00:33:40.200 --> 00:33:45.480 for individual acquisition systems. What are some of the overall conclusions that we can take away 00:33:45.480 --> 00:33:50.720 from our analysis? So first of all, some matching 00:33:50.720 --> 00:33:55.870 system and acquisition system combinations worked perfectly in our test. 00:33:55.870 --> 00:34:00.930 That is to say that they matched all of the 430 diverse 00:34:00.930 --> 00:34:05.960 test volunteers that used each system. 00:34:05.960 --> 00:34:11.140 That's actually a pretty impressive feat considering the number of places where errors could have crept in. 00:34:11.140 --> 00:34:16.250 They could have failed to acquire, failed to extract, or failed to match. So flawless performance, 00:34:16.250 --> 00:34:21.540 very good. However, if we look across all the systems, 00:34:21.540 --> 00:34:26.760 all 80, actually 97, system combinations across modalities 00:34:26.760 --> 00:34:31.930 that we tested, only 12% of those met the rally goal.
All 12 of those system 00:34:31.930 --> 00:34:36.990 combinations were face matching and acquisition system combinations. 00:34:36.990 --> 00:34:42.250 And that should give some concern, because all systems included in 00:34:42.250 --> 00:34:47.450 this evaluation passed subject matter expert review for inclusion into the rally and 00:34:47.450 --> 00:34:52.480 met all the rally participation criteria. So if you're putting together an operational 00:34:52.480 --> 00:34:57.490 system deployment and you're looking for acquisition systems and algorithms, 00:34:57.490 --> 00:35:02.720 matching systems, to kind of combine without some extra testing, 00:35:02.720 --> 00:35:07.880 your chances of success might be closer to 12%, which is 00:35:07.880 --> 00:35:12.950 not very encouraging. The rally for the first time tested a new notion, system robustness. 00:35:12.950 --> 00:35:18.220 You know, how well does a particular system work across 00:35:18.220 --> 00:35:23.430 systems that it would be operationally paired with? So a matching system might be paired with 00:35:23.430 --> 00:35:28.550 images from different acquisition systems, or an acquisition system's 00:35:28.550 --> 00:35:33.570 images might be matched by different matching 00:35:33.570 --> 00:35:38.600 algorithms. Some face and iris algorithms did maintain good robustness across acquisition systems 00:35:38.600 --> 00:35:43.620 but others did not. No face acquisition system was robust 00:35:43.620 --> 00:35:48.650 across all matching systems included in this evaluation. 00:35:48.650 --> 00:35:53.880 So carefully picked matching systems may perform well 00:35:53.880 --> 00:35:59.030 so long as the acquisition system acquires images. However, some matching systems 00:35:59.030 --> 00:36:04.110 did not perform well even with a good acquisition 00:36:04.110 --> 00:36:09.390 system.
So an overall conclusion from the analysis is that system combinations must be carefully considered 00:36:09.390 --> 00:36:14.600 to achieve optimal performance in operations. With that 00:36:14.600 --> 00:36:19.760 I'm going to turn things back over to Arun to close things out. [Arun] So first of all, 00:36:19.760 --> 00:36:24.820 thank you all for joining us for the webinar today. I know that we threw a lot of information 00:36:24.820 --> 00:36:29.930 at you. There are also a lot of numbers on some of these slides. We will have 00:36:29.930 --> 00:36:34.950 these charts available on MDTF.org. So you can go back 00:36:34.950 --> 00:36:40.170 and take a look at them at your own convenience. You can also reach out with additional questions, 00:36:40.170 --> 00:36:45.330 obviously right now during this webinar, but if you have questions later on please feel free to 00:36:45.330 --> 00:36:50.390 email us at peoplescreening@hq.dhs.gov. So again, 00:36:50.390 --> 00:36:55.650 a lot of information here. A lot of discussion about this robustness measure. 00:36:55.650 --> 00:37:00.830 We were talking about it earlier. We realized that on that one chart on facial 00:37:00.830 --> 00:37:05.940 recognition systems, both collection and matching systems, there are about 600 numbers. 00:37:05.940 --> 00:37:10.960 So we know that could be a lot to take in. So we do 00:37:10.960 --> 00:37:16.180 value your feedback and your questions, so please let us know and we'll do our best to help answer those 00:37:16.180 --> 00:37:21.320 questions. So the first 00:37:21.320 --> 00:37:26.370 question is "What would prevent a vendor from providing a generous threshold, a lower 00:37:26.370 --> 00:37:31.620 threshold, to improve their accuracy? I.e., providing thresholds 00:37:31.620 --> 00:37:36.810 for FMR at one in ten thousand?" 00:37:36.810 --> 00:37:41.910 [John] I guess I can take this a little bit.
00:37:41.910 --> 00:37:46.920 So as Yevgeniy mentioned, we have these 430 people that went through every acquisition system. 00:37:46.920 --> 00:37:52.150 Like I mentioned, we had a gallery of 500 people that we were matching back to, 00:37:52.150 --> 00:37:57.310 and 76 of those people actually didn't have any images in the gallery, so the correct response from the 00:37:57.310 --> 00:38:02.380 system is unidentified. If they had lowered their threshold 00:38:02.380 --> 00:38:07.650 generously to match more people, match more of the 00:38:07.650 --> 00:38:12.850 370-some that did have images, it would have started falsely matching the ones that didn't. 00:38:12.850 --> 00:38:17.990 So that was actually one of the points of that particular 00:38:17.990 --> 00:38:23.020 chart with the curve: at what point would we expect 00:38:23.020 --> 00:38:28.250 zero of the 76 people that are out-of-gallery subjects to incorrectly match someone? 00:38:28.250 --> 00:38:33.400 The answer to that question is a one in a million threshold. 00:38:33.400 --> 00:38:38.470 [Yevgeniy] Yeah, just to add to that a little bit, we did specify 00:38:38.470 --> 00:38:43.480 which threshold we used in these charts, and that is the one in a million. We did 00:38:43.480 --> 00:38:48.490 run some of these numbers with the one in 10,000, one in 1,000 00:38:48.490 --> 00:38:53.570 and one in 100,000 thresholds, and as expected, at those lower thresholds we started seeing 00:38:53.570 --> 00:38:58.840 false positives for some of the systems. One of the things we're analyzing, 00:38:58.840 --> 00:39:04.040 we don't have it for you today, is how 00:39:04.040 --> 00:39:09.170 capable vendors are of picking their thresholds and whether the thresholds they 00:39:09.170 --> 00:39:14.220 report are the right ones to achieve a certain level of performance.
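[Editor's note] The threshold logic John describes can be illustrated with a toy score set: pick the operating threshold just high enough that none of the out-of-gallery probes would match, then see what that does to the mated probes. A minimal sketch; the scores below are invented for illustration, not rally data.

```python
# Invented best-gallery similarity scores: out-of-gallery probes should all be
# rejected, mated probes should ideally all be accepted.
out_of_gallery = [0.21, 0.34, 0.29, 0.41, 0.37]
mated = [0.88, 0.91, 0.47, 0.79, 0.95, 0.36]

# Strictest threshold with zero false matches on this impostor set:
# accept only scores strictly above the highest out-of-gallery score.
threshold = max(out_of_gallery)

false_matches = sum(s > threshold for s in out_of_gallery)  # should be 0
true_ids = sum(s > threshold for s in mated)
tir = true_ids / len(mated)
```

Lowering the threshold, the "generous" option from the question, would rescue the 0.36 mated score but would start accepting the 0.37 and 0.41 out-of-gallery scores, which is exactly the trade the panel describes; in practice the threshold comes from a large impostor distribution at a target false match rate such as one in a million, not from a handful of scores.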
In this case 00:39:14.220 --> 00:39:19.450 we simply took it at face value that the threshold that should work at one in 100,000 00:39:19.450 --> 00:39:24.630 or one in a million is the correct one that they reported to us. 00:39:24.630 --> 00:39:29.720 [Arun] Next question, "The fingerprint reliability had more black than red, but 00:39:29.720 --> 00:39:34.730 iris had the opposite trend. Can you comment?" [Jerry] This was in relation to the 00:39:34.730 --> 00:39:39.930 fingerprint systems? [Yevgeniy] Yes, sure. I think that some of the issues were 00:39:39.930 --> 00:39:45.050 the variation in fingerprint performance across 00:39:45.050 --> 00:39:50.080 systems, and I'm not sure if the comment is about robustness of algorithms 00:39:50.080 --> 00:39:55.320 or of acquisition systems, but 00:39:55.320 --> 00:40:00.500 I can say that two of the fingerprint systems included in the rally 00:40:00.500 --> 00:40:05.590 were non-contact and one was contact based, so some of 00:40:05.590 --> 00:40:10.860 the variability in performance, especially with failure to acquire and failure to match, was 00:40:10.860 --> 00:40:16.070 a complex interaction between higher acquisition rates 00:40:16.070 --> 00:40:21.190 for some systems and much lower match rates for other systems. 00:40:21.190 --> 00:40:26.240 So it would take a little bit of time to explain how those interactions 00:40:26.240 --> 00:40:31.470 come together in total system performance 00:40:31.470 --> 00:40:36.660 and in matching-system-only performance. [Arun] Next question. "Aside from the acquisition, 00:40:36.660 --> 00:40:45.750 what are the other essential inputs to building a high performing face matching systems algorithm?" 00:40:45.750 --> 00:40:51.080 [Yevgeniy] So I think that. [Arun] It might just be face matching system, I think that's what the 00:40:51.080 --> 00:40:56.100 intent there is.
[Yevgeniy] Yeah, so I think some of the 00:40:56.100 --> 00:41:01.320 issues that we see, especially in regards to robustness: 00:41:01.320 --> 00:41:06.490 there are some face matching systems in our 00:41:06.490 --> 00:41:11.580 evaluation that performed very well no matter what acquisition system the image 00:41:11.580 --> 00:41:16.870 was acquired on. However, what really started to separate some 00:41:16.870 --> 00:41:22.090 of the matching systems from each other was the ability to process 00:41:22.090 --> 00:41:27.120 images from acquisition systems that could reliably acquire images, but where some of those images were not 00:41:27.120 --> 00:41:32.160 of high enough quality to match using that matching system. So I think 00:41:32.160 --> 00:41:37.380 that a highly performing face matching system should 00:41:37.380 --> 00:41:42.530 be robust across the different potential acquisition systems and the differences in image quality 00:41:42.530 --> 00:41:47.610 that those provide. [John] Yeah, I think maybe just 00:41:47.610 --> 00:41:52.910 [unintelligible] to Yevgeniy's point, there are acquisition systems 00:41:52.910 --> 00:41:58.150 that spent a lot of time during the rally sort of trying to figure out how to position their camera. 00:41:58.150 --> 00:42:03.280 So one option is directly in front of the participants, where they literally would run into 00:42:03.280 --> 00:42:08.340 it if they didn't stop. And that would obviously acquire a straight-on, nicely 00:42:08.340 --> 00:42:13.600 framed image but slow them down, because they need to go around it. And the other option is to move that 00:42:13.600 --> 00:42:18.810 acquisition system off to the side a little bit, sort of getting an angled photo, which may or may not work 00:42:18.810 --> 00:42:23.110 with algorithms, to sort of speed up that total process 00:42:23.110 --> 00:42:27.240 of moving through.
I think there's a broad answer to your question: there are 00:42:27.240 --> 00:42:31.490 camera placement factors and considerations that you have to think about, and then there are sort of these algorithm 00:42:31.490 --> 00:42:35.570 and acquisition system interaction effects to consider. 00:42:35.570 --> 00:42:39.750 [Arun] Yeah, I would say 00:42:39.750 --> 00:42:44.090 those are really important. I think if you are talking about a specific operation 00:42:44.090 --> 00:42:48.110 you'd really want to focus on what is the concept of operations or the use case. 00:42:48.110 --> 00:42:52.350 How are you supposed to interact with the system? I would start there first. You would want 00:42:52.350 --> 00:42:56.420 to pick the right types of cameras that work in that particular type of 00:42:56.420 --> 00:43:00.580 environment and then have matchers that work well across lots of 00:43:00.580 --> 00:43:04.600 different cameras, because you may have some variability 00:43:04.600 --> 00:43:08.700 with camera systems. You might go out and buy some cameras and then later on some of 00:43:08.700 --> 00:43:14.110 them break and you have to get new cameras. There will be diversity in your cameras, 00:43:14.110 --> 00:43:18.250 so it's kind of good to think of it in those 00:43:18.250 --> 00:43:22.510 kinds of pieces perhaps. [Jerry] Just to add one more thing, there's lots of variability 00:43:22.510 --> 00:43:26.610 in lighting and there's lots of variability in how these cameras behave with differences in lighting. 00:43:26.610 --> 00:43:30.820 And that's another consideration. 00:43:30.820 --> 00:43:35.150 [Arun] Next question. "Given that the likelihood of selecting a 00:43:35.150 --> 00:43:39.300 working system combination is low, when will business requirements be published 00:43:39.300 --> 00:43:43.560 for airport operators?"
So this is a little bit of a 00:43:43.560 --> 00:43:47.650 targeted question, but what I will say is, right now with 00:43:47.650 --> 00:43:51.850 the results as we have published them, we have applied aliases to the different company names. 00:43:51.850 --> 00:43:56.180 So aliases are given 00:43:56.180 --> 00:44:00.320 as Rocky Mountain peaks for collection systems and rivers for 00:44:00.320 --> 00:44:04.560 matching algorithms. That being said, if there are combinations of 00:44:04.560 --> 00:44:08.650 collection systems or matching algorithms that you are very interested in, 00:44:08.650 --> 00:44:12.830 you can email peoplescreening@dhs.gov and we will 00:44:12.830 --> 00:44:17.120 forward your email to those companies so that they can follow up with you directly, 00:44:17.120 --> 00:44:21.260 so that you can understand which companies 00:44:21.260 --> 00:44:25.530 are performing. Actually, I think that goes to the last question that was asked as well. 00:44:25.530 --> 00:44:29.610 "Will you be able to share the actual names of vendors you 00:44:29.610 --> 00:44:33.780 held to an NDA?" So we are not actually providing the vendor names, but vendors who 00:44:33.780 --> 00:44:38.120 are very happy with their results 00:44:38.120 --> 00:44:42.260 may self-announce. This is what happened last year, where some companies did press releases 00:44:42.260 --> 00:44:46.510 and they said we were company X, we were company Y. 00:44:46.510 --> 00:44:50.880 But the other thing we offered was, if there is specific interest in a company, 00:44:50.880 --> 00:44:55.090 again tell us, email peoplescreening@dhs.gov, 00:44:55.090 --> 00:44:59.130 tell us which companies you want to reach out to. 00:44:59.130 --> 00:45:03.280 Please don't say all of them. We're not gonna take those 00:45:03.280 --> 00:45:07.520 seriously.
We are trying to abide by the terms of our 00:45:07.520 --> 00:45:11.610 [unintelligible] with these companies. So if you give us a list of, let's say, two or three, we will 00:45:11.610 --> 00:45:15.790 forward your email to those two or three companies so that they can follow back up with you 00:45:15.790 --> 00:45:20.090 and hopefully answer your 00:45:20.090 --> 00:45:24.130 questions or share more information about their products. 00:45:24.130 --> 00:45:28.360 Next question. "Were any of the matchers and systems from the same vendor, such 00:45:28.360 --> 00:45:32.420 that they might be better optimized?" Good question. [John] Yeah, so I 00:45:32.420 --> 00:45:36.580 actually took a look at this recently. There were a number of matching systems 00:45:36.580 --> 00:45:41.080 and acquisition systems that were from the same company. 00:45:41.080 --> 00:45:45.220 If Arun tells me I can say it, I'll tell you exactly how many. And for the 00:45:45.220 --> 00:45:49.490 majority of those, the acquisition system actually did better with someone else's 00:45:49.490 --> 00:45:53.560 matcher. That was not uncommon. 00:45:53.560 --> 00:45:57.760 So I think there is definitely room for this sort of 00:45:57.760 --> 00:46:02.060 matchmaking process, as Arun likes to call it, where 00:46:02.060 --> 00:46:06.200 you know, there might be opportunity for partnerships and sort of collaborations out there. 00:46:06.200 --> 00:46:10.450 [Yevgeniy] Yeah, and I would add to that, we deliberately obfuscated 00:46:10.450 --> 00:46:13.740 matching system names differently than acquisition system names, 00:46:13.740 --> 00:46:16.760 so you won't be able to just look up 00:46:16.760 --> 00:46:19.780 the same alias in the row and in the column 00:46:19.780 --> 00:46:23.070 on any of the charts that we put out. So 00:46:23.070 --> 00:46:26.100 I think if you wanna know some of those combinations 00:46:26.100 --> 00:46:29.110 you might have to reach out.
[Arun] Okay, umm, 00:46:29.110 --> 00:46:32.150 next question. "To my knowledge, the face image 00:46:32.150 --> 00:46:35.200 acquisition rate changes depending on the camera 00:46:35.200 --> 00:46:38.240 configuration. Does S&T have any camera specification 00:46:38.240 --> 00:46:41.290 standards for the estimation of FTA?" 00:46:41.290 --> 00:46:44.360 Umm, 00:46:44.360 --> 00:46:47.430 I think the answer to that question is no. I think 00:46:47.430 --> 00:46:50.510 the point here is the face 00:46:50.510 --> 00:46:53.580 acquisition system configuration really needs to be 00:46:53.580 --> 00:46:56.580 tailored to the CONOPS or the use case. 00:46:56.580 --> 00:46:59.670 Depending on the use case, you have a specific 00:46:59.670 --> 00:47:02.680 intended behavior of the 00:47:02.680 --> 00:47:06.730 user as well as what the camera is expected to do. 00:47:06.730 --> 00:47:09.950 This will vary from system to system depending on 00:47:09.950 --> 00:47:12.960 the specific use case. And depending on that use 00:47:12.960 --> 00:47:16.180 case you can kind of tailor these systems, 00:47:16.180 --> 00:47:19.430 pick the right type of camera system. So to be honest with 00:47:19.430 --> 00:47:22.700 you, if we were 00:47:22.700 --> 00:47:22.990 describing the rally in terms of specific DHS type use 00:47:22.990 --> 00:47:28.230 cases, 00:47:28.230 --> 00:47:33.380 I would say it is really targeted for like an exit type of 00:47:33.380 --> 00:47:38.440 scenario, or maybe like a TSA travel document check as you are approaching the 00:47:38.440 --> 00:47:43.710 TSA representative; that would be a relevant use case.
Another one might be 00:47:43.710 --> 00:47:48.900 if you're just approaching a general security checkpoint and 00:47:48.900 --> 00:47:54.010 you have a direct line entering the system. But for other types 00:47:54.010 --> 00:47:59.050 of use cases, where you may have a camera off to the side or above, or something like a CCTV camera system, 00:47:59.050 --> 00:48:04.290 I wouldn't use the same camera. So you really need to kind of select 00:48:04.290 --> 00:48:09.440 the camera, the position and the interaction based on the specific scenario, and 00:48:09.440 --> 00:48:14.530 I wish I had an easy answer, like, yeah, go to aisle three and pick any camera there. 00:48:14.530 --> 00:48:19.540 We're not at the point where these technologies are fully commoditized. You 00:48:19.540 --> 00:48:24.740 need to do some careful selection here. I think the results, some of 00:48:24.740 --> 00:48:29.840 what Yevgeniy and John talked about, kind of highlighted the fact that there really needs to be some 00:48:29.840 --> 00:48:35.180 careful selection if you're using these different systems, especially with different matchers. 00:48:35.180 --> 00:48:40.440 Do you have anything else to add there? [John] Maybe just one little point. So we didn't really have 00:48:40.440 --> 00:48:45.610 a ton of input into how those cameras were configured, right? We were very open and transparent 00:48:45.610 --> 00:48:50.710 with our test design: this is exactly how we're going to test your system, all those slides about 00:48:50.710 --> 00:48:56.150 the process and the number of people and what exactly this would look like.
They have been up 00:48:56.150 --> 00:49:01.390 on the website from the representative webinar since November, and then 00:49:01.390 --> 00:49:06.550 those acquisition system providers actually got to set their configurations to whatever mode they thought would 00:49:06.550 --> 00:49:11.630 give them the best performance, and I think that's a lot of times, well maybe not always, the goal 00:49:11.630 --> 00:49:16.950 that you'll find in the field as well. [Yevgeniy] Yeah, there are many 00:49:16.950 --> 00:49:22.170 reasons why a camera, an acquisition system, might have high failure to acquire rates. 00:49:22.170 --> 00:49:27.330 And these reasons might be different depending on the 00:49:27.330 --> 00:49:32.390 use case and the technology. So sometimes it's a 00:49:32.390 --> 00:49:37.670 challenge in terms of usability: does the volunteer know 00:49:37.670 --> 00:49:42.850 where to look? Sometimes it's an ergonomics challenge: is the camera configured so 00:49:42.850 --> 00:49:47.980 the volunteer can get into the frame? Sometimes it's a lighting challenge: 00:49:47.980 --> 00:49:53.000 you know, can the system acquire an image in dim light? 00:49:53.000 --> 00:49:58.220 Although we've previously made the statement that failure to acquire is the highest single 00:49:58.220 --> 00:50:03.370 source of error, underneath that there are many different causes of failures to acquire. 00:50:03.370 --> 00:50:08.430 And that makes it challenging to give one prescription for how to solve that. [Arun] Alright, 00:50:08.430 --> 00:50:13.730 I think one thing to add. We did give the vendors the opportunity during the first two days to make human 00:50:13.730 --> 00:50:18.960 factors changes to try to improve the performance of their acquisition systems. [Yevgeniy] And frankly, 00:50:18.960 --> 00:50:24.110 one thing that we did observe, and we continue to observe, in addition to robustness, there's 00:50:24.110 --> 00:50:29.160 a general issue with system reliability.
Some of the acquisition systems 00:50:29.160 --> 00:50:34.410 experienced technical failures during testing and that dinged their acquisition numbers, 00:50:34.410 --> 00:50:39.580 because if the systems weren't on they weren't gonna capture any images. 00:50:39.580 --> 00:50:44.670 And this was an issue for some systems in this 2019 rally 00:50:44.670 --> 00:50:49.680 and also for several systems in the 2018 rally. So that continues to be an issue. 00:50:49.680 --> 00:50:54.890 [Arun] Alright, next question. "To your knowledge did any of the algorithms utilize multi-image 00:50:54.890 --> 00:51:00.010 templates or was this outside the test design of this rally?" 00:51:00.010 --> 00:51:05.040 [John] Yeah, so no algorithms utilized multi-image templates, I think, is the way to answer this. 00:51:05.040 --> 00:51:10.340 So if we go back and look at, I forget exactly which slide, but it talks about the matching system requirements. 00:51:10.340 --> 00:51:15.520 There isn't an opportunity to provide multiple images per template. 00:51:15.530 --> 00:51:20.610 Our API specifies image in, template out. And 00:51:20.610 --> 00:51:25.620 that's still up on github.mdtf.org, if anyone's really interested. It's a little bit different from 00:51:25.620 --> 00:51:30.820 some of the NIST APIs that sort of allow you to create this gallery, I think that's for 00:51:30.820 --> 00:51:35.930 optimizing speed purposes, and some of the IARPA APIs, which sort of let you build this 00:51:35.930 --> 00:51:40.950 super template concept. We don't have any of that. 00:51:40.950 --> 00:51:46.170 [Arun] Next question. "Were the acquisition vendors allowed to recalibrate the system after an initial 00:51:46.170 --> 00:51:51.330 period of capture, i.e., after one day?"
So the answer is 00:51:51.330 --> 00:51:56.400 yes. I think it was after two days; they were allowed to make what we called human factors adjustments 00:51:56.400 --> 00:52:01.690 over the first two days. [Yevgeniy] Over the first two days they could adjust any sort of 00:52:01.690 --> 00:52:06.720 signage or layout of their station, after which they had to freeze that. However, 00:52:06.720 --> 00:52:11.820 they always had an opportunity for break-fix in case their system experienced a technical issue. 00:52:11.820 --> 00:52:16.830 They could go back after the group of participants had gone and see if they could get their system up 00:52:16.830 --> 00:52:22.060 and running again. [Arun] Yeah, so actually I should probably provide a little context. During the first two 00:52:22.060 --> 00:52:27.220 days, what that really meant was that the vendors who were participating could monitor the performance of 00:52:27.220 --> 00:52:32.260 their systems. We actually gave them iPads so that they could see videos of people interacting 00:52:32.260 --> 00:52:37.540 with their systems as well as see some of the imagery coming out of their systems, 00:52:37.540 --> 00:52:42.720 so that they could determine whether or not the system appeared to be working correctly based on their own 00:52:42.720 --> 00:52:47.820 expectations. So for example, if people weren't doing something and they had instructions out there, they could 00:52:47.820 --> 00:52:53.150 go back and say, well, maybe I should change my instructions; or if they saw people looking in the wrong 00:52:53.150 --> 00:52:58.420 direction, maybe they could reorient or reposition things. So it was really up to the companies themselves to see whether 00:52:58.420 --> 00:53:03.600 or not what they put in that 6-by-8-foot space was eliciting the behaviors 00:53:03.600 --> 00:53:08.690 they were looking for from the people, the volunteers, interacting with the systems.
00:53:08.690 --> 00:53:13.700 [Yevgeniy] In fact, several systems did take advantage of that and changed signage, for example, 00:53:13.700 --> 00:53:18.920 to better deal with people wearing glasses. [John] And I think that also answers the next question: 00:53:18.920 --> 00:53:24.060 camera capture settings were a part of that, so if you wanted to change how many pixels between the eyes 00:53:24.060 --> 00:53:29.090 before you would capture a picture, you could do that in the first two days. [All agree] 00:53:29.090 --> 00:53:34.350 [Arun] I think the major point there is that that was left to the vendors. 00:53:34.350 --> 00:53:39.540 We basically provided a fair playing field, you know, a fair set of rules. 00:53:39.540 --> 00:53:44.620 And it was up to them to figure out how they would innovate across the different 00:53:44.620 --> 00:53:49.900 variables they had available to them to adjust and figure out how they would adjust 00:53:49.900 --> 00:53:55.130 their own systems to meet the goals that we had set forth. 00:53:55.130 --> 00:54:00.130 Alright, so the next question. "This is more of an acquisition question and I have not visited MDTF, 00:54:00.130 --> 00:54:05.180 but were lighting conditions equivalent for acquisition systems regardless of physical location, 00:54:05.180 --> 00:54:10.450 or were lighting conditions changed during the rally?" [Yevgeniy] Good question. So yes, I can address this. 00:54:10.450 --> 00:54:15.640 All rally stations were designed to be identical, 00:54:15.640 --> 00:54:20.740 including the lighting conditions at the stations, which were calibrated to be 00:54:20.740 --> 00:54:26.060 600 lux, and that ensured fairness 00:54:26.060 --> 00:54:31.310 in the design. There were a number of other things that were included in the design to ensure fairness, 00:54:31.310 --> 00:54:36.470 including habituation, and John mentioned the counterbalancing as well.
00:54:36.470 --> 00:54:41.550 So yes, the conditions at each station were kept as constant as possible. 00:54:41.550 --> 00:54:46.860 [Arun] Well, I think we are right at the end of our webinar today. 00:54:46.860 --> 00:54:52.080 So again, we don't want to cut things short. I am sure that there are probably more questions; 00:54:52.080 --> 00:54:57.100 if nothing else, I'm sure you did not have enough time to look at those charts with all of those numbers. 00:54:57.100 --> 00:55:02.150 Those great, great numbers. So we will put 00:55:02.150 --> 00:55:07.380 those up on the website. They will be there by the end of the week. Please feel free to take a look at them, and 00:55:07.380 --> 00:55:12.550 I'm sure you may have more questions. Again, we welcome your questions at peoplescreening@hq.dhs.gov. 00:55:12.550 --> 00:55:16.960 Thank you and have a great day.