Commercial Lip Sync Problems


In today’s digital (and hybrid digital/analog) broadcasting facilities, the opportunities for lip sync errors continue to multiply. CCD cameras, frame synchronizers, production switchers, digital video effects, noise reducers, MPEG encoders and decoders, and similar devices typically delay the video more than the audio. Worse yet, the amount of video delay frequently jumps by a frame or more as frames of video are dropped or repeated. In other cases, digital surround sound processing can delay the audio by several frames relative to the video.

If these lip sync problems occur during paid advertising, they can hurt the TV station’s business. When advertisers see the problem, they may refuse to pay for the commercial or demand a makegood.

Since lip sync errors are inevitable, we are faced with two issues: how to measure the errors, and how to correct them quickly and transparently.

Measuring Lip Sync Errors

In the last few years there has been little change in practical measurement methods for lip sync errors.

Watching and listening (the human eye and ear method) works. However, it does not precisely measure how many frames of error there are, and it cannot tell whether the problem occurs at one stage or is the sum of several errors occurring at multiple stages. It is also labor intensive and does not trigger an automatic alarm every time an error occurs.

Another method is to insert “watermarks”. One example is the Tektronix AVDC100, in which the audio envelope is carried in the video as least significant bits (LSBs). At first glance this method looks very elegant in concept and execution, but heavy compression may lose the watermark data because the LSBs are discarded as “insignificant, thus unnecessary”. Watermarking may be suitable in controlled environments, but as soon as the signal passes between two incompatible processing plants, the watermark may be lost or damaged, and with it the synchronization information. All other “watermarking” schemes face similar problems.
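As a rough illustration of why LSB watermarks are fragile, the Python sketch below hides one bit in the least significant bit of each 10-bit video sample and then passes the samples through a crude lossy stage that keeps only the top 8 bits. The data layout and the helpers `embed_lsb`, `extract_lsb` and `requantize` are hypothetical; this is not the AVDC100's actual format, and real compression discards detail in more complex ways than simple requantization.

```python
# Hypothetical illustration (not the AVDC100's format): hide one bit per
# 10-bit video sample in its LSB, then simulate a lossy stage.

def embed_lsb(samples, bits):
    """Overwrite the least significant bit of each sample with a watermark bit."""
    return [(s & ~1) | b for s, b in zip(samples, bits)]


def extract_lsb(samples):
    """Read the watermark bit back out of each sample."""
    return [s & 1 for s in samples]


def requantize(samples, keep_bits=8, total_bits=10):
    """Zero the low-order bits: a crude stand-in for heavy compression."""
    shift = total_bits - keep_bits
    return [(s >> shift) << shift for s in samples]


if __name__ == "__main__":
    video = [512, 513, 700, 701, 64, 940]      # example 10-bit luma values
    envelope_bits = [1, 0, 1, 1, 0, 1]         # example watermark payload
    marked = embed_lsb(video, envelope_bits)
    print(extract_lsb(marked) == envelope_bits)   # True: survives a clean path
    print(extract_lsb(requantize(marked)))        # [0, 0, 0, 0, 0, 0]: payload gone
```

The picture itself is barely changed by dropping two bits, which is exactly why a compression system treats them as expendable and why the embedded synchronization data does not survive.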

Another possible method is to carry VITC (Vertical Interval Time Code) with the video and LTC (Linear Time Code) with the audio. This works, for example, with Dolby E, which includes LTC. DATC (Digital Audio Time Code) can also be used to convey LTC with AES audio, but it requires additional equipment in the transmission path: a DATC inserter at the source and an accompanying decoder on a DA output at the destination.

The difference in seconds and frames between the VITC and the LTC could then be used to monitor lip sync. This assumes that the video is conveyed at the same rate as the VITC, which may not be the case unless the compression system is designed to halt or slow the VITC and/or DATC numbers whenever the video is halted (frozen) or slowed down. This method therefore depends on the compression system implementation, and since keeping VITC locked to the video and LTC locked to the audio is not part of the compression system specifications, there is no compulsion to implement it. In addition, inserting DATC is just another form of “watermarking”, so it is prone to being lost downstream.
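A minimal sketch of the VITC/LTC comparison, assuming both timecodes are read at the same instant at the monitoring point and are non-drop-frame at an integer frame rate (30 fps by default). Drop-frame 29.97 fps counting is not handled, and the function names are hypothetical. The result is only meaningful if each timecode has stayed locked to its own essence, which is the caveat raised above.

```python
# Sketch: lip sync offset from VITC (video) and LTC (audio) read at the same
# instant. Assumes non-drop-frame timecode at an integer frame rate.

def timecode_to_frames(tc, fps=30):
    """Convert 'HH:MM:SS:FF' (non-drop-frame) to a frame count since midnight."""
    hours, minutes, seconds, frames = (int(field) for field in tc.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def av_offset_frames(vitc, ltc, fps=30):
    """Frames by which the video lags the audio (negative: video leads).

    A positive value means the audio timecode is ahead of the video timecode,
    i.e. the video has been delayed more than the audio.
    """
    day = 24 * 60 * 60 * fps
    diff = (timecode_to_frames(ltc, fps) - timecode_to_frames(vitc, fps)) % day
    # wrap into (-half a day, +half a day] so a midnight rollover is handled
    return diff - day if diff > day // 2 else diff


if __name__ == "__main__":
    # Picture carries 10:00:00:12 while the sound already carries 10:00:00:14.
    print(av_offset_frames("10:00:00:12", "10:00:00:14"))  # 2
```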

The LipTracker™ lip sync analyzer is a non-invasive measurement tool for in-service lip sync analysis. After detecting a face in the video, LipTracker™ compares selected sounds in the audio with the mouth shapes that create them in the video. The relative timing of these sounds and corresponding mouth shapes (called Mutual Events or MuEvs) is analyzed to produce a measurement of the lip sync error. The sounds and mouth shapes that are used for MuEv analysis are commonly found in the natural speech patterns of many languages.
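The MuEv detection itself is beyond a short example, but the timing step can be illustrated. The Python sketch below is a generic, hypothetical approach and NOT the LipTracker™ algorithm, which is not described here: it assumes event detection has already produced lists of times (in seconds) for selected sounds and for the matching mouth shapes, searches for the shift that lines up the most events, and then takes the median spacing of the matched pairs. The search range, step and tolerance are assumed values.

```python
# Hypothetical sketch of offset estimation from matched audio/video events.
# This is NOT LipTracker's algorithm; event detection itself is assumed done.
from bisect import bisect_left


def _nearest(sorted_times, t):
    """Return the value in the non-empty sorted list closest to t."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - t))


def estimate_offset(audio_events, video_events, max_offset=0.5, step=0.005, tol=0.02):
    """Estimate how far (s) the mouth-shape events trail the matching sounds.

    Positive result: the video lags the audio (audio leads).
    Returns None if there is nothing to match.
    """
    video = sorted(video_events)
    if not audio_events or not video:
        return None

    def matched(offset):
        # audio events that land within tol of some video event after shifting
        return [t for t in audio_events
                if abs(_nearest(video, t + offset) - (t + offset)) <= tol]

    # coarse search: the candidate shift that aligns the most events
    n = int(round(max_offset / step))
    coarse = max((k * step for k in range(-n, n + 1)),
                 key=lambda off: len(matched(off)))
    # refine: median video-minus-audio spacing of the events matched there
    residuals = sorted(_nearest(video, t + coarse) - t for t in matched(coarse))
    if not residuals:
        return None
    return residuals[len(residuals) // 2]


if __name__ == "__main__":
    sounds = [1.00, 1.70, 2.30, 3.90, 5.20]   # times of detected sounds
    mouths = [t + 0.10 for t in sounds]       # mouth shapes 100 ms later
    print(round(estimate_offset(sounds, mouths), 3))  # 0.1
```

Using the median of the matched spacings, rather than the first shift found, keeps a few spurious event pairings from pulling the estimate.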

Numeric and graphic displays of the current audio offset are updated periodically until the current face is lost or a new face is detected. A history graph charts the most recent error profile, and event logging saves the results for scene-by-scene analysis. An Audio Offset Status indicator provides a visual warning of the current offset.
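An offset status indicator of this kind can be sketched as a simple threshold check. The limits below are hypothetical defaults, not the LipTracker™ settings (which are not stated here); they mirror the commonly cited ATSC IS-191 guideline of audio leading the video by no more than 15 ms and lagging by no more than 45 ms. The warning band (`warn_fraction`) and the short history buffer are likewise illustrative assumptions.

```python
# Illustrative offset status indicator. The limits are hypothetical
# placeholders, not LipTracker settings.
from collections import deque
from enum import Enum


class OffsetStatus(Enum):
    OK = "green"
    WARNING = "yellow"
    ALARM = "red"


def offset_status(audio_lead_ms, max_lead_ms=15.0, max_lag_ms=45.0, warn_fraction=0.75):
    """Classify an offset in ms: positive = audio leads video, negative = audio lags."""
    # audio leading the picture is less tolerable, so it gets the tighter limit
    limit = max_lead_ms if audio_lead_ms >= 0 else max_lag_ms
    magnitude = abs(audio_lead_ms)
    if magnitude > limit:
        return OffsetStatus.ALARM
    if magnitude > warn_fraction * limit:
        return OffsetStatus.WARNING
    return OffsetStatus.OK


if __name__ == "__main__":
    history = deque(maxlen=5)  # most recent readings, as a history graph might keep
    for reading in (3.0, 12.0, 20.0, -40.0, -60.0):
        history.append((reading, offset_status(reading).name))
    print(list(history))
```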

Because this unique approach analyzes the real-time video and audio content itself, it does not require the insertion of cues, codes or watermarks into the program material. Therefore, LipTracker™ can be used at any point in the transmission path.

Correcting Lip Sync Errors

Even in facilities that use tracking audio delays, the results may be unsatisfactory. Some competing audio synchronizers cannot track delay changes quickly, so if the lip sync is wrong at the start of a commercial, it will probably stay wrong for the entire commercial. Others introduce unwanted audio artifacts. The delay change problem stems from the fact that audio is continuous: you cannot simply drop or repeat a frame, as you do with video, to make an instant delay change. One competitor's product drops or repeats audio samples to make delay changes, and this manipulation causes clicks and pops in the audio while the delay is changing. Advertisers are not going to be happy with clicks and pops in their commercials.
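To see why dropping samples produces clicks, here is a small Python sketch, assuming a 1 kHz test tone sampled at 48 kHz and a naive delay change that skips 12 samples (0.25 ms) at the midpoint. The function `naive_delay_change` is hypothetical and models the general sample-drop technique, not any particular product. It compares the largest sample-to-sample step in the undisturbed tone with the step at the splice.

```python
# Sketch: why abruptly dropping samples to shorten a delay produces a click.
# Halfway through a sine tone, the read pointer jumps forward by `jump`
# samples (a naive delay change), leaving a discontinuity in the waveform.
import math


def naive_delay_change(freq=1000.0, rate=48000, n=4800, jump=12):
    """Return (largest normal step, step at the splice) for a sine wave."""
    source = [math.sin(2 * math.pi * freq * i / rate) for i in range(n + jump)]
    half = n // 2
    # first half plays samples 0..half-1, then playback skips `jump` samples
    out = source[:half] + source[half + jump:n + jump]
    steps = [abs(out[i + 1] - out[i]) for i in range(len(out) - 1)]
    normal = max(steps[:half - 1])   # steps entirely within the first half
    splice = steps[half - 1]         # step across the skipped samples
    return normal, splice


if __name__ == "__main__":
    normal, splice = naive_delay_change()
    print(f"normal step {normal:.3f}, splice step {splice:.3f}")
```

Here the splice step is more than eight times the largest normal step. How large the jump is depends on where in the waveform the splice lands, but with real program audio such discontinuities are typically heard as clicks and pops.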

Another competitor simply limits the rate of change of its audio delay. This eliminates the clicks, pops and distortion, and keeps the pitch shift very low. However, limiting the rate of change means it often takes two or three minutes for the audio delay to catch up with the video after a one- or two-frame delay change. By the time the audio catches up, the commercial is over. This also makes the advertisers unhappy.

The Pixel AD3000 and AD3100 audio synchronizers incorporate pitch shifting and fast track technology to avoid these problems. Our synchronizers can make a ½ second delay change in less than 2 seconds, and do it inaudibly. Most experienced television engineers cannot hear any delay change artifacts from these rapid corrections, even when told that a change is going to happen in the next minute.
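The AD3000's pitch-shifting method is not described here, but a little arithmetic shows why pitch correction matters. If a delay is changed simply by varying the playback speed, a delay growing at r seconds per second plays the audio at speed 1 − r (a shrinking delay at 1 + r), and the pitch shift in cents is 1200·log2(speed). The sketch below, with the hypothetical function `slew_pitch_cents`, compares a rate-limited change of one 29.97 fps frame spread over two minutes with a ½ second change made in 2 seconds.

```python
# Sketch: pitch shift caused by slewing an audio delay through speed change
# alone, with no pitch correction applied.
import math


def slew_pitch_cents(delay_change_s, duration_s):
    """Pitch shift in cents when a delay change is spread evenly over duration_s.

    Positive delay_change_s (delay increasing) slows playback and lowers pitch;
    negative (delay decreasing) speeds it up and raises pitch.
    """
    speed = 1.0 - delay_change_s / duration_s
    if speed <= 0:
        raise ValueError("delay cannot grow faster than real time")
    return 1200.0 * math.log2(speed)


if __name__ == "__main__":
    frame = 1001 / 30000  # one frame at 29.97 fps, about 33.4 ms
    # Rate-limited approach: one frame absorbed over two minutes
    print(f"{slew_pitch_cents(frame, 120):.2f} cents")  # -0.48 cents
    # Fast change: 0.5 s absorbed in 2 s
    print(f"{slew_pitch_cents(0.5, 2):.0f} cents")      # -498 cents
```

The slow change shifts the pitch by under half a cent (a semitone is 100 cents) but takes minutes. The fast change takes seconds, but without correction it would drop the pitch by nearly 500 cents, roughly a musical fourth. Avoiding that shift while still changing the delay quickly is the job of the pitch correction.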

The fast track technology of our AD3000 and AD3100 is easy to demonstrate. Simply program the synchronizer for a fixed delay of 0 seconds, run any program audio through the unit, and have the engineer listen to the audio while facing away from the AD3000. Next, tell the engineer that you are going to change the delay sometime in the next two minutes, and quietly program the delay to 0.5 second. The most anyone can possibly hear is a very slight tempo change in music, and it usually takes a musician to notice that. Most people, even experienced engineers, hear no artifacts at all. Repeat the demo and let them watch the front panel so they can see how fast the delay changes. For fun, set the delay to 2 seconds and let them listen to the audio during the delay change.

Try the same demo with any of our competitors’ products, and one or more of three audio artifacts will be immediately apparent: there will be clicks, pops or distortion; there may be a pitch change; or it may take a very long time for the delay to change. With all of our competitors, there is a tradeoff among pitch change, distortion and the time it takes to change the delay. Try changing the delay from 0 to 2 seconds and then back to 0, noting how long it takes and how noticeable the artifacts are. Our AD3000 family of products runs circles around the competition in all respects. In any case, it takes only one bad commercial to make the advertisers unhappy, and the revenue lost from one commercial may well exceed the cost of one of our audio synchronizers with pitch correction.

Another important point is that viewers rate speakers whose lip sync is off as less interesting, less trustworthy, and so on, compared with speakers whose lip sync is correct (1). This is of course a big concern for newscasters, reporters, politicians and others who are trying to convey trust and sincerity to their audience.

Stations, of course, want to keep their news ratings up, and they don't want their viewers thinking that their newscasters are uninteresting. This has also become an issue for a number of federal politicians, and Congress is looking to upgrade the entire C-SPAN system to guarantee that lip sync is always correct. After all, what politician wants to be perceived as a crook simply because his speech aired with a noticeable lip sync error?





1. Dr. Byron Reeves and Clifford Nass, The Media Equation (Stanford, California: Stanford University Center for the Study of Language and Information), pp. 211-218.