Esben Skovenborg & Thomas Lund

Thomas Lund, our CTO for Broadcast & Production, and Esben Skovenborg (Ph.D.), our Senior Research Engineer, presented Convention Paper 8983, 'Level-Normalization of Feature Films Using Loudness vs Speech', at the 135th AES Convention in New York.

The paper was very well received. In fact, AES selected the authors to receive a 'Best Peer-Reviewed Paper' award for their work. "Needless to say, we're very honored to receive this award that recognizes our scientific research methods as well as our contribution to the field of loudness technology. Although TC and others have provided loudness solutions for years, considerable challenges remain, and by basing new products and technology on empirical research we are better able to serve the needs of the broadcasters and audio engineers," says Esben Skovenborg.

"We felt it was important to make this study as films may be made for cinema, but are often viewed on TV, where they have to co-exist with all kinds of program material in terms of audio loudness. Further, even if we look at film in an isolated context, there's also a loudness war going on in cinema, and we were trying to understand the reasons why. In broadcast as well as in cinema, it appears the number one goal is to get the overall loudness suitable without hampering speech intelligibility. Speech level is less important," adds Thomas Lund.

AES Paper Award

Paper Abstract

We present an empirical study of the differences between level-normalization of feature films using the two dominant methods: loudness normalization and speech ("dialog") normalization.

The sound of 35 recent 'blockbuster' DVDs was analyzed using both methods. The difference in normalization level between the two methods was up to 14 dB and 5.5 dB on average. For all films, the loudness method provided the lowest normalization level and hence the greatest headroom.
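As a rough illustration of how two normalization methods can land on different levels for the same film, the sketch below computes the gain each method would apply to reach a common target. The target and both measured levels are hypothetical values chosen for the example, not data from the paper, and the helper name `normalization_gain_db` is our own.

```python
def normalization_gain_db(measured_level_db: float, target_level_db: float) -> float:
    """Gain (in dB) that moves a measured level onto the target level."""
    return target_level_db - measured_level_db


# Hypothetical values for a single film (illustrative only, not from the paper).
TARGET_DB = -24.0          # assumed common target level
program_loudness = -20.0   # assumed loudness of the whole soundtrack
speech_level = -26.0       # assumed level of the dialog alone

# Loudness normalization uses the whole-soundtrack measurement,
# speech normalization uses the dialog measurement.
gain_loudness = normalization_gain_db(program_loudness, TARGET_DB)
gain_speech = normalization_gain_db(speech_level, TARGET_DB)

# How much higher the film ends up under speech normalization.
difference = gain_speech - gain_loudness

print(f"loudness method gain: {gain_loudness:+.1f} dB")
print(f"speech method gain:   {gain_speech:+.1f} dB")
print(f"difference:           {difference:.1f} dB")
```

In this made-up case the soundtrack as a whole measures louder than its dialog, so the loudness method applies less gain and leaves the film at a lower level, which is the direction of the effect the abstract reports for all 35 films.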

Comparison of automatic speech measurement to manual measurement of dialog anchors shows a typical difference of 4.5 dB, with the automatic measurement producing the higher level. When the speech classifier was used to process rather than measure the films, a listening test suggested that the automatic measure is positively biased because it sometimes fails to distinguish between "normal speech" and speech combined with "action" sounds.

Finally, the DialNorm values encoded in the AC-3 streams on DVDs were compared to both the automatically and the manually measured speech levels, and found to match neither one well.