MSc thesis project proposal

Speech recognition and speech-based validation systems for web subscriptions

Topic: For visually impaired users, web subscription systems offer an audio-based alternative in which a distorted spoken code must be typed in. Can speech recognisers be used to bypass this validation?
Many web-based systems where people can subscribe (such as Gmail and online shops) use a standard procedure to check whether the subscriber is a human being or a computer programmed to sign up automatically and abuse the system. Typically, these validation systems show a picture of distorted text, made up of digits and letters, that the user must type in. The distortion of the image is there to prevent automatic character recognition algorithms from “reading” the code. For visually impaired people, there is usually also an audio-based version in which the code is spoken. As with the image-based approach, these spoken messages are usually degraded by a number of acoustic distortions (additive noise, reverberation, etc.) to prevent speech recognisers from recognising the code and using it to abuse the system.

This project focuses on how well the distortions added to these spoken messages protect against automated sign-ups, and possibly on showing whether these validation systems can indeed be bypassed using state-of-the-art speech recognition technology. As the vocabulary is limited to digits and letters, recognising a clean or mildly distorted message should be relatively straightforward. Since there is a limit to the amount of distortion that can be added (a human being should still be able to understand the message), there is most likely a trade-off between the protection the system offers and its user-friendliness.
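To make the two distortions mentioned above concrete, here is a minimal sketch that reverberates a clean signal and then adds noise at a chosen signal-to-noise ratio (SNR). It assumes reverberation can be modelled as convolution with a room impulse response, here a hypothetical synthetic one (a direct-path tap followed by exponentially decaying Gaussian noise), and that the additive noise is white and Gaussian. All function names, parameter values and the 8 kHz sampling rate are illustrative assumptions, not taken from any actual validation system. Only the Python standard library is used so the sketch is self-contained; in the project itself Matlab would be the natural tool.

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a signal given as a list."""
    return sum(v * v for v in x) / len(x) if x else 0.0


def convolve(x, h):
    """Full linear convolution of x with h, used here to model reverberation."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def synthetic_rir(length, decay_per_sample, rng):
    """Hypothetical room impulse response: a direct-path tap of 1.0 followed by
    exponentially decaying Gaussian noise (a crude stand-in for a measured RIR)."""
    if length <= 0:
        return []
    tail = [0.3 * rng.gauss(0.0, 1.0) * math.exp(-decay_per_sample * k)
            for k in range(1, length)]
    return [1.0] + tail


def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    if not x:
        return []
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    target_noise_power = power(x) / 10.0 ** (snr_db / 10.0)
    # Scale this particular noise realisation so the SNR is met exactly.
    scale = math.sqrt(target_noise_power / power(noise))
    return [xn + scale * wn for xn, wn in zip(x, noise)]


def measured_snr_db(reference, noisy):
    """SNR of `noisy` relative to `reference` (equal lengths), in dB."""
    residual = [b - a for a, b in zip(reference, noisy)]
    return 10.0 * math.log10(power(reference) / power(residual))


def distort(clean, snr_db, rir_length=400, decay_per_sample=0.01, seed=0):
    """Reverberate, then add noise: one point on a (reverberation, SNR) grid.
    Returns the reverberant signal and its noisy version."""
    rng = random.Random(seed)
    reverberant = convolve(clean, synthetic_rir(rir_length, decay_per_sample, rng))
    return reverberant, add_noise_at_snr(reverberant, snr_db, rng)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # A 440 Hz tone stands in for a recorded spoken digit or letter.
    clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 4)]
    for target in (20, 10, 0, -5):
        reverberant, noisy = distort(clean, target)
        print(target, round(measured_snr_db(reverberant, noisy), 3))
```

Sweeping `snr_db` and `decay_per_sample` over a grid gives a set of distortion conditions at which a recogniser's accuracy could then be measured. Note that the SNR here is computed over the whole signal; measurements that consider only active speech would give different absolute values.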


The goal is to determine under which distortion conditions standard (robust) speech recognisers can recognise the message correctly. Besides the type of speech recogniser, this most likely depends on the type and amount of distortion and on the size of the vocabulary. Depending on the progress and experience of the student, extensions could include improvements to these validation systems, or more theoretical derivations of, e.g., the minimum mutual information between the original and distorted message that an average human listener needs to understand the message, and the maximum mutual information for which speech recognisers still fail.
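One possible way to write down the theoretical extension, assuming the clean spoken code is modelled as a random signal X with density p(x) and its distorted version as Y, is through the mutual information between them. The thresholds I_human (minimum mutual information an average listener needs) and I_ASR (level below which speech recognisers fail) are hypothetical quantities that the project would have to determine; they are not known values.

```latex
I(X;Y) = \iint p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y ,
\qquad
I_{\mathrm{human}} \;\le\; I(X;Y) \;<\; I_{\mathrm{ASR}} .
```

A distortion that is both user-friendly and secure can then only exist if I_human < I_ASR; if the two thresholds cross, the trade-off between protection and user-friendliness cannot be resolved by distortion alone.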


This project requires at least:
  • knowledge of signal processing
  • basic knowledge of speech recognition
  • knowledge of stochastic/random processes
  • experience with Matlab and some C/C++
and preferably also knowledge of information theory.

Contact: Richard Hendriks

Circuits and Systems Group

Department of Microelectronics

Last modified: 2014-09-18