The following results show the replacement of a word or phrase. Our method did not use the ground-truth recordings to synthesize the new video.
After generating the video, we add the ground-truth voice so that the result can be evaluated.
For each example we show three videos: the original sentence, the edited result, and a debug video. The debug video contains, from left to right: the input to the network, the result, and the frames from which the expression data was taken.