The MAVA corpus (MARCS Auditory-Visual Australian recordings of IEEE sentences) is a collection of high-quality audiovisual recordings of 205 phonetically balanced sentences from the IEEE sentence database, recorded by a female native talker of Australian English. The audio channel is annotated at the word and phoneme level. In addition, for the video channel, frame-by-frame X and Y coordinates of the lip contour are provided. The center of the lip region is used as a reference for deriving four video regions: full face, upper face, lower face and lips. All files are freely available for download under the Creative Commons BY-NC-SA licence.
Click on the link below to download a detailed description of how the corpus was designed, recorded and post-processed. This file is also available in the “ATTACHMENT” side panel:
Use the links below to browse the entire collection or a 10-item subset. You need to be registered to access the data.
Use the above links to explore the MAVA collection and perform various actions, including downloading your own selection of files. Alternatively, you can use the direct-download links below to download predefined subsets of the corpus.
- Download all items, all file types (2050 files, 3.69 GB)
Items s1 to s10, all file types:
- Download items s1 to s10, all file types (100 files, 181.6 MB)
All items, select file type:
- Download all items, full-face audio-video (205 files, 1.17 GB)
- Download all items, full-face video (205 files, 1.16 GB)
- Download all items, upper-face video (205 files, 728.2 MB)
- Download all items, lower-face video (205 files, 431.1 MB)
- Download all items, lip region video (205 files, 96.8 MB)
- Download all items, 16kHz audio (205 files, 20.7 MB)
- Download all items, 48kHz audio (205 files, 62.1 MB)
- Download all items, word and phoneme annotation (205 files, 958 kB)
- Download all items, lip contour (205 files, 20.6 MB)
- Download all items, lip midpoint (205 files, 393 kB)
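As a quick consistency check, the per-file-type downloads listed above can be summed and compared with the figures given for the complete download (2050 files, 3.69 GB). The following Python sketch does this; it assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 kB), which this page does not state explicitly, and the names `SUBSETS` and `totals` are introduced here for illustration only.

```python
# Per-file-type subsets as listed above: (label, file count, size in MB).
# Sizes listed in GB or kB are converted assuming decimal units
# (an assumption; the page does not say which convention is used).
SUBSETS = [
    ("full-face audio-video", 205, 1170.0),
    ("full-face video", 205, 1160.0),
    ("upper-face video", 205, 728.2),
    ("lower-face video", 205, 431.1),
    ("lip region video", 205, 96.8),
    ("16 kHz audio", 205, 20.7),
    ("48 kHz audio", 205, 62.1),
    ("word and phoneme annotation", 205, 0.958),
    ("lip contour", 205, 20.6),
    ("lip midpoint", 205, 0.393),
]


def totals(subsets):
    """Return (total number of files, total size in GB) for the given subsets."""
    n_files = sum(count for _, count, _ in subsets)
    size_gb = sum(size_mb for _, _, size_mb in subsets) / 1000.0
    return n_files, size_gb


total_files, total_gb = totals(SUBSETS)
print(f"{total_files} files, {total_gb:.2f} GB")  # prints "2050 files, 3.69 GB"
```

Under these assumptions the totals agree with the complete download, which suggests that the ten file types above cover everything distributed for each of the 205 items.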
To cite MAVA, please use:
Aubanel, V., Davis, C., & Kim, J. (2017). The MAVA corpus. [online resource]. DOI: http://dx.doi.org/10.4227/139/59a4c21a896a3
The research leading to these results was partly funded by the Australian Research Council under grant agreement DP130104447. V. A. also acknowledges support from the European Research Council under the European Community’s Seventh Framework Program (FP7/2007-2013 Grant Agreement 339152).