MAVA stands for “MARCS Auditory-Visual Australian recordings of IEEE sentences”. It consists of audiovisual recordings of 205 phonetically balanced sentences from the IEEE sentence database, spoken by a female native talker of Australian English. The auditory channel is annotated at the sentence, word and phoneme levels, and the video channel comes with frame-by-frame lip contour tracking. Both types of annotation have been manually checked. The video channel is available in four different frame dimensions, each referenced to the centre of the lip area, covering the full-face, upper-face, lower-face and lip regions.
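
Since the audio annotation is time-based while the lip contours are given per video frame, one may need to map an annotated segment onto frame indices. The following is a minimal sketch of such a mapping and not the corpus's own tooling: the `Segment` class, the `frame_range` function, the example labels and times, and the frame rate of 25 frames per second are all assumptions made for illustration, and the actual file formats and frame rate should be taken from the corpus documentation.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A labelled time interval in seconds (hypothetical layout).

    Nesting children gives the sentence > word > phoneme hierarchy.
    """
    label: str
    start: float
    end: float
    children: List["Segment"] = field(default_factory=list)


def frame_range(seg: Segment, fps: float) -> range:
    """Indices of the video frames whose start time lies in [seg.start, seg.end).

    Frame i is assumed to start at time i / fps.
    """
    first = math.ceil(seg.start * fps)
    stop = math.ceil(seg.end * fps)
    return range(first, stop)


# Made-up example: one word with two phonemes inside a one-second sentence.
FPS = 25.0  # assumed frame rate, not taken from the corpus
word = Segment("see", 0.25, 0.75, [
    Segment("s", 0.25, 0.5),
    Segment("i:", 0.5, 0.75),
])
sentence = Segment("example sentence", 0.0, 1.0, [word])

# Frame indices under which lip contours for each phoneme would be looked up.
phoneme_frames = {p.label: list(frame_range(p, FPS)) for p in word.children}
```

With half-open intervals, adjacent phonemes share no frame, and the frames of a word are exactly the union of the frames of its phonemes.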