Speech Recognition using Biologically-Inspired Neural Networks
Abstract
Automatic speech recognition (ASR) systems, such as the recurrent neural network transducer (RNN-T), have reached close to human-like performance and are deployed in commercial applications. However, their core operations depart from those of their powerful biological counterpart, the human brain. Conversely, current biologically-inspired ASR models lag behind in accuracy and focus primarily on small-scale applications. In this work, we revisit the incorporation of biologically-plausible models into deep learning and enhance their capabilities by taking inspiration from the brain's diverse neural and synaptic dynamics. In particular, we propose novel deep learning units that introduce neural connectivity concepts emulating axo-somatic and axo-axonic synapses, and we integrate them into the RNN-T architecture. We demonstrate for the first time that such a model can yield performance competitive with the state of the art. Moreover, our implementation has a significantly reduced computational cost and lower latency.