Snapshot: Speech Analytics Evaluation for OpenSAT19

Release Date: July 23, 2019

Effective speech analytic technologies and voice-based communications services are critical for public safety responders, especially when their hands are occupied during an active response or when background noise reaches levels that render most digital speech assistants inoperable. To be effective, these technologies and services should be hands-free, able to handle background noise, and able to isolate important voices.

Following a pilot to improve speech analytic systems for public safety, the first speech analytics evaluation for the Open Speech Analytic Technologies Evaluation Series (OpenSAT19) was recently launched. This effort is supported with funding by the Department of Homeland Security Science and Technology Directorate (S&T) and spearheaded by the National Institute of Standards and Technology (NIST).

To test the efficacy of these technologies and services, this speech analytics evaluation will have three data domains with a variety of datasets, explained S&T’s Cuong Luu.  

The first domain will consist of data that simulates sound and speech conditions in English-speaking public safety communications. This dataset will be a newly obtained large collection of public-safety speech that contains simulated first-responder type background noises and affected speech. NIST expects to continue using this dataset for the next few years, as it will provide an opportunity to measure year-to-year performance.

The second domain will consist of conversational telephone speech in a low-resource language, meaning a language with little transcribed speech or other linguistic data available for training systems. This new dataset will be drawn from a previous Intelligence Advanced Research Projects Activity (IARPA) Babel collection.

The third domain will consist of English-language audio extracted from amateur online videos. This dataset presents challenging characteristics such as audio compression, diverse topics, and varied recording equipment and environments.

Speech analytic systems will perform specific tasks on these datasets, including speech activity detection, automatic speech recognition (speech-to-text), and keyword search. Researchers will choose the tasks that align with their system's design.

The objectives of the OpenSAT19 evaluation are to continue evaluating system performance on specific tasks in a variety of challenging environments; to provide a forum for the community to further test and develop speech analytic technologies; to enable knowledge sharing by bringing together developers who specialize in different speech analytic tasks; and to give developers an opportunity to apply different speech analytic systems to the same datasets. After the evaluation ends, participating researchers will publish papers and present their findings at industry conferences.

“We expect that the findings from this evaluation series will help researchers make advancements in this arena and ultimately assist the industry in making changes to improve technologies,” Luu said. “As speech analytic technologies continue to evolve and advance, S&T anticipates that the public safety community will begin to use them more and more.”

For additional information about OpenSAT19, visit the NIST OpenSAT website and download the OpenSAT19 Evaluation Plan.

If you are a developer who specializes in speech analytics and would like to have an opportunity to share and leverage your knowledge, S&T encourages you to register for the evaluation. 

Caption:
An audio engineer (wearing red pants) at the Linguistic Data Consortium at UPENN collects data from study participants who are playing board games. She is controlling the background noises being fed into players’ headsets (the players can hear each other’s voices and background noises) and also serving as a “game monitor” to impose time constraints during games and create a sense of urgency. The black barrier is being used to prevent study participants from seeing each other’s facial expressions while communicating.