Are video deepfakes powerful enough to influence political discourse?

FACE THE FACTS: Using emotional listener generation technology, University of Rochester computer scientists have generated deepfakes of Joe Biden displaying different emotional expressions in reaction to Donald Trump. (University of Rochester GIF / Luchuan Song)

An expert in AI video generation discusses the technology’s rapid advances—and its current limitations.

This presidential cycle has already seen several high-profile examples of people using deepfakes to try to influence voters. Deepfakes are images, audio recordings, or videos generated or modified using artificial intelligence (AI) models to depict real or fictional people. Recent deepfake examples include and .

It appears generative artificial intelligence is an increasingly prominent tool in the misinformation toolbox. Should voters be concerned about being bombarded with phony videos of politicians created with generative AI? An expert in computer vision and deep learning at the University of Rochester says that while the technology is rapidly advancing, deepfake video generation remains harder for bad actors to leverage due to its complex nature.

While OpenAI’s products, including ChatGPT for text generation and DALL-E 3 for image generation, are taking off in popularity, the company has yet to release an equivalent for video generation. According to Chenliang Xu, a computer scientist and associate professor at the University of Rochester, the company has released previews of its Sora video generation software but has yet to release the product, which is still undergoing testing and refinement.

“Generating video using AI is still an ongoing research topic and a hard problem because it’s what we call multimodal content,” says Xu. “Generating moving videos along with corresponding audio are difficult problems on their own—and aligning them is even harder.”

Xu says that his research group was among the first to use artificial neural networks to generate multimodal video in 2017. They started with tasks like . From there, they moved on to problems like , and then to .

“Now, we can generate real-time, fully drivable heads and even ,” says Xu.

TALKING HEADS: Computer scientist Chenliang Xu and his fellow researchers can generate lifelike talking head videos from an individual photo or even a painting, as demonstrated here with a looping video created from an image of the Mona Lisa and a headshot of Xu. (University of Rochester GIF / Luchuan Song)

Challenges with deepfake detection technology

Xu’s team has also developed technology for detecting deepfakes. He calls it an area that needs extensive further research, noting that it’s easier to build technology to generate deepfakes than to detect them because of the training data needed to build generalized deepfake detection models.

“If you want to build a technology that’s able to detect deepfakes, you need to create a database that identifies what are fake images and what are real images,” says Xu. “That labeling requires an additional layer of human involvement that generation does not.”

Another concern, he adds, is making a detector that is generalizable to different types of deepfake generators. “You can make a model that performs well against the techniques you know about, but if someone uses a different model, your detection algorithm will have a hard time capturing that,” he says.
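The generalization problem Xu describes can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two numeric "features," the two hypothetical generators, and the nearest-centroid classifier are invented here and are not Xu's method or any real detection system. The sketch trains a detector on labeled real samples and fakes from one generator, then measures how many fakes it catches from that generator and from a second one it has never seen.

```python
import random

def sample(center, n, rng, spread=0.05):
    """Draw n noisy 2-D feature vectors around a center point."""
    return [tuple(rng.gauss(c, spread) for c in center) for _ in range(n)]

def centroid(points):
    """Average the points coordinate by coordinate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(real, fake):
    """'Training' here is just storing one centroid per labeled class."""
    return centroid(real), centroid(fake)

def is_fake(model, x):
    """Flag x as fake if it sits closer to the fake centroid than the real one."""
    real_c, fake_c = model
    return dist2(x, fake_c) < dist2(x, real_c)

def detection_rate(model, fakes):
    """Fraction of known-fake samples that the detector flags."""
    return sum(is_fake(model, x) for x in fakes) / len(fakes)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical feature centers: (smoothness, motion jitter).
REAL = (0.5, 0.5)
GEN_A = (0.9, 0.5)  # assumed generator A: overly smooth output
GEN_B = (0.5, 0.1)  # assumed generator B: a different tell entirely

# The detector only ever sees labeled real samples and generator A fakes.
model = train(sample(REAL, 200, rng), sample(GEN_A, 200, rng))

rate_known = detection_rate(model, sample(GEN_A, 200, rng))
rate_unseen = detection_rate(model, sample(GEN_B, 200, rng))
print(f"caught from known generator:  {rate_known:.0%}")
print(f"caught from unseen generator: {rate_unseen:.0%}")
```

In this toy setup the detector learns to separate real from fake along the one feature where generator A differs, so fakes from generator B, which differ along another feature, mostly pass as real. Real detectors work on far richer signals, but the failure mode has the same shape.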

The easiest targets for video deepfakes

Having access to good training data is crucial for creating effective generative AI models. As a result, Xu says politicians and celebrities will be the earliest and easiest targets when video generators become widely available.

“Politicians and celebrities are easier to generate than normal people because there is simply more data about them,” says Xu. “Because so much video of them already exists, these models can use it to learn the expressions they show in different situations, along with their voices, their hair, movements, and emotions.”

But he expects that, at least initially, the training data that celebrity deepfakes in particular are built on may make them easier to spot.

“If you used only high-quality photographs to train a model, it will produce similar results,” says Xu. “It may result in an overly smooth style that you can pick out as a cue to tell it’s a deepfake.”

Other cues can include how natural a person’s reaction seems, whether they can move their heads, and even the number of teeth shown. But image generators have overcome similar early tells—such as —and Xu says enough training data can mitigate these limitations.

He calls on the research community to invest more effort into developing deepfake detection strategies and grappling with the ethical concerns surrounding the development of these technologies.

“Generative models are a tool that in the hands of good people can do good things, but in the hands of bad people can do bad things,” says Xu. “The technology itself isn’t good or bad, but we need to discuss how to prevent these powerful tools from ending up in the wrong hands and used maliciously.”