Voice assistants, whether stand-alone devices or built into smartphones, are becoming increasingly popular among consumers. Currently, these systems respond only when a user addresses them directly with a specific wake word, such as “Alexa,” “Siri,” or “OK Google.” However, with advances in speech recognition, the next generation of voice assistants is expected to listen continuously to the acoustic environment and proactively offer services and recommendations based on human conversations or other audio signals, without being explicitly invoked.