The term "deepfake" combines two concepts: "deep learning" (the deep learning of neural networks) and "fake". There are dozens of models for creating deepfakes, which are becoming more and more believable and are no longer perceived as a harmless toy for geeks.
What are deepfakes and what tools are used to create them
Deepfakes are synthetically created media content in which one image is replaced by another using machine learning algorithms. The technology creates a "mix" of images, thanks to which Arnold Schwarzenegger can play almost every character in "The Lord of the Rings", and Mr. Bean can replace Charlize Theron in an advertisement for the Dior J'adore fragrance.
Such content is usually created with a generative adversarial network (GAN), which consists of two systems: a generator and a discriminator. The first creates images and the second criticizes them, so the system learns from itself through rivalry between the two neural networks. A neural autoencoder processes a large set of media data, studies the features of a human face and its expressions, and then learns to reproduce them and synthesize new content.
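The rivalry between generator and discriminator can be illustrated with a deliberately tiny example. The sketch below is not how production deepfake models are built: it assumes one-dimensional "data" (numbers drawn around 4.0), a linear generator and a logistic discriminator, with all names and hyperparameters chosen purely for illustration. It only shows the alternating update pattern, in which the discriminator is pushed to separate real from generated samples and the generator is pushed to fool it.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Generator:      G(z) = a * z + b, with noise z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated P(x is real)
    The "real" data (assumed for illustration) is drawn from N(4.0, 0.5).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(G(z)), i.e. try to fool the critic.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # gradient of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x

    return (a, b), (w, c)
```

Calling `train_toy_gan()` returns the learned generator and discriminator parameters. Real image GANs replace these two-parameter models with deep convolutional networks, but the training loop has the same two-step shape.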
One popular GAN-based solution is FSGAN, a tool for transferring faces into video (face swapping), created by the Japanese developer Shaoanlu. One neural network learns to fit the donor's face to the parameters of the target video, the second transfers the facial features, and the third blends the images to produce a more realistic picture.
Generative neural networks find it hardest to synthesize object classes such as people, cars and gates. Researchers demonstrated this on the LSUN churches dataset. The Unified Perceptual Parsing network is used as the semantic segmentation model: it labels each pixel as belonging to an object of one of 336 classes and then selects the fragments of the original image that contain objects, which are fed to the AI as input. In this way a neural network can create a picture from a sketch and recognize objects.
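The "label every pixel, then cut out the fragments that contain objects" step can be sketched in a few lines. This is a simplified illustration, not the researchers' code: it assumes the segmentation network has already produced a label map (one class id per pixel), and the class id used below is a made-up placeholder.

```python
def bounding_box(label_map, class_id):
    """Return (top, left, bottom, right) enclosing all pixels of class_id, or None."""
    coords = [
        (r, c)
        for r, row in enumerate(label_map)
        for c, label in enumerate(row)
        if label == class_id
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def crop_fragment(image, box):
    """Cut the rectangular fragment described by box out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


PERSON = 7  # hypothetical class id, chosen only for this example

# A 3x4 label map as a segmentation model might output it (0 = background).
label_map = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],
    [0, 7, 0, 0],
]
# The matching 3x4 "image" (plain numbers standing in for pixel values).
image = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]

box = bounding_box(label_map, PERSON)  # (1, 1, 2, 2)
fragment = crop_fragment(image, box)   # [[21, 22], [31, 32]]
```

The fragment is what would then be passed on for further analysis. A real pipeline would work on image arrays and handle several separate objects of the same class.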
GANs can copy voices as well: to reproduce human speech as faithfully as possible, the neural network needs only a few minutes of audio recordings for training.
For example, Google created the Tacotron speech generator, which can copy a voice from 5 seconds of input audio. The system works in three steps: one neural network verifies the speaker, the second synthesizes sequences based on Tacotron 2, and the third is responsible for the final output.
Although deepfake technology had been used before, the term itself appeared in 2017, when Reddit users began posting this kind of modified content in the Deepfake section. These were usually adult videos that used the faces of famous actresses, or film episodes in which characters' faces were replaced with that of Nicolas Cage. In this way the famous actor "replaced" even Don Corleone from "The Godfather" and Maria from "The Sound of Music".
Successful cases of using the technology
Deepfakes are used successfully in the film industry and in advertising campaigns: for example, in 2014 Audrey Hepburn "presented" Dove chocolate in a short video. The Salvador Dali Museum in Florida used deepfakes to "revive" the artist, who welcomes visitors. Creating the digital twin took more than 6,000 photos of Dali and a thousand hours of machine learning.
The technology is also used for political purposes: for example, Manoj Tiwari, president of the Indian Bharatiya Janata Party (BJP), used a deepfake to create a version of his speech in the Haryanvi dialect. As a result, voters who do not know English and speak only this dialect were able to understand it.
"Every bank has an ambassador, and a deepfake could, for example, greet visitors at a branch or callers to the contact center, or present the news in videos on the website," the expert suggests.
Artificial intelligence can significantly reduce the budgets of music videos, films and TV shows, because actors no longer need dedicated doubles for dangerous scenes. Any stuntman can perform the stunt, and the actor's face is simply superimposed with the help of deepfake technology. Disney already has experience creating video at a resolution of 1024×1024 pixels with similar technology, and the company continues to improve its deepfakes.
Previously, students had access only to presentations or audio courses. With deepfakes it is now possible to generate hundreds of lecture videos from previously written courses, which makes learning easier. The Japanese project Data Grid proposes to simplify buying clothes: the client's face is superimposed on a virtual model, making it possible to judge how well a particular item of clothing suits them.
The dangers of "fake" content
The most obvious risk is the use of the technology by fraudsters. Last year, criminals stole $35 million from a bank in the UAE: they cloned the voice of the director of a financial institution and used it to pass the transfer off as part of a legitimate commercial transaction. The technology also carries risks for personal data: doubles can pose as trusted persons or employees and steal user information.
Voice cloning with deepfakes is also dangerous for many organizations. In 2019, the CEO of the British subsidiary of a large energy company fell for a scam. He took a call from his "supervisor" in Germany, who said that 243 thousand US dollars had to be transferred to the account of a Hungarian supplier.
Since the voice and manner of speech raised no suspicion, the top manager transferred the entire amount. It later turned out that the caller was a robot that had learned to copy the German colleague's voice from audio recordings of his public speeches.
Deepfakes can cause considerable damage not only to finances but also to reputation. A scandal erupted over a video of an official speech by Nancy Pelosi, Speaker of the US House of Representatives: because her speech was slowed down, the politician appears to be intoxicated. Only an expert examination was able to prove that artificial intelligence technology had been used to alter Pelosi's speech.
Deepfakes can also fool Facial Liveness Verification authentication systems. In 2022, a team of researchers from the University of Pennsylvania (USA) and Zhejiang and Shandong Universities (China) found that most of these systems are vulnerable to evolving forms of deepfakes, because they were tuned to outdated techniques or are too closely tied to a specific architecture. In addition, the authentication systems proved to be biased toward white men: the faces of women and people of color were verified less reliably, which poses a potential threat to these categories of customers. Another risk of the technology concerns children: it is harder for them to tell a parent's real voice and image on the phone from a fake.
According to research by the Amsterdam-based company Sensity, the number of deepfake videos has doubled every six months since December 2018. Today there are more than 100 thousand such fakes on the Internet. Deepfakes are often used to blackmail ordinary users: scammers create videos with their faces and threaten to send the incriminating material to family, relatives and friends.
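Doubling every six months is exponential growth, which is easy to underestimate. A minimal sketch of the arithmetic follows. The starting count in the usage lines is a hypothetical placeholder, since the text does not give a baseline figure for December 2018.

```python
import math


def projected_count(initial_count, months_elapsed, doubling_period_months=6):
    """Count after months_elapsed months if it doubles every doubling period."""
    return initial_count * 2 ** (months_elapsed / doubling_period_months)


def months_to_reach(initial_count, target_count, doubling_period_months=6):
    """Months needed to grow from initial_count to target_count."""
    return doubling_period_months * math.log2(target_count / initial_count)


# Hypothetical baseline of 1,000 videos, used only to illustrate the formula.
after_one_year = projected_count(1000, 12)  # two doublings: 4000.0
time_to_8000 = months_to_reach(1000, 8000)  # three doublings: 18.0 months
```

At this rate the count grows roughly a thousandfold in five years (ten doublings), whatever the starting figure.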
All these examples show that the use of deepfake technology matters less than the crime itself: extortion, slander or fraud can be committed without any technology at all.
"The victim whose image was used in a fraudulent video will have to prove that it was actually another person in the video and that the attackers used a deepfake double. Obviously, law enforcement will first check who they see in the video. So in the case of the founder of Dbrain, it was fortunate that the quality of the face overlay was poor."
In addition to criminal liability, the creator of a deepfake can face civil liability, since the victim has the right to demand a retraction of information that discredits their honor, dignity or business reputation, the removal of such information, and compensation for losses and moral damage from the creator of the forgery.
What is the reason for the spread of deepfakes
To generate deepfakes you can create and train your own neural network, but this is too expensive and complex. Many programs are available that let you modify content without programming or delving into the technical details.
You can create a quick-and-dirty deepfake in just 5 minutes, for example in the Reface or Impressions mobile applications. These programs superimpose the faces of famous people on videos recorded by the user. You just need to select a celebrity and upload a video, and the application will create a deepfake version. What's more, the services already include a catalog of audio clips from popular movies for lip-syncing.
How to neutralize the negative consequences of deepfakes
Some countries protect against deepfakes at the legislative level. For example, at the end of January 2022 China adopted a law prohibiting their use. Officials believe that AI-generated material poses a threat to national security and should therefore carry a special label. In the US, some states prohibit the distribution of deepfakes during presidential campaigns; in California, for example, it is forbidden to create and distribute audio and video deepfakes that damage the image of politicians. In France, editing a person's speech or image without their consent is punishable, and the law prohibits publishing retouched photos without a special label.
The Facial Liveness Verification system, an anti-fraud mechanism in facial recognition, can distinguish an interaction with a real person from an attempt by a fraudster using a fake identity.
Experts note that the security problem of face authentication systems can be solved by no longer relying on a single image, updating deepfake detection systems in both the image and voice domains, and synchronizing voice authentication with lip movements. As an additional check, users should be asked to perform actions that are difficult for deepfake systems to reproduce, such as turning to show their profile or partially covering their face.
"I haven't heard of deepfake protection systems yet. But the mechanics should be similar to those used to protect against phishing. Roughly speaking, a link is passed to a service that checks the file for signs of a deepfake and recommends whether the user should trust the video. Organizations that use face or voice authorization already need to think actively about protection against deepfakes. Steps to counter a fraudster include combined authorization using audio and video, requiring live authorization rather than pre-recorded files, and asking the user to perform random complex gestures on camera."
Ordinary Internet users can protect themselves by improving their media literacy and using reliable sources of information. Be cautious if a video is of low quality, with a fuzzy, blurred image or duplicated elements. Deepfakes are often given away by unnatural facial expressions, especially when blinking or moving the eyebrows and lips.
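For organizations, the idea mentioned above of requiring live authorization with random gestures can be sketched as a challenge-response check. This is a minimal illustration under stated assumptions, not a description of any existing product: the gesture list, function names and time limit are hypothetical, and the hard part (recognizing which gestures actually appear in the video) is assumed to be done by a separate component that supplies `observed_gestures`.

```python
import secrets
import time

# Hypothetical gesture catalogue, for illustration only.
GESTURES = [
    "turn head left",
    "turn head right",
    "cover part of face",
    "blink twice",
    "raise eyebrows",
]


def issue_challenge(n_gestures=3, ttl_seconds=20.0, now=None):
    """Create a one-time challenge: a random gesture sequence with a deadline."""
    now = time.time() if now is None else now
    rng = secrets.SystemRandom()  # OS-backed randomness, hard to predict
    return {
        "nonce": secrets.token_hex(8),
        "gestures": rng.sample(GESTURES, n_gestures),
        "expires_at": now + ttl_seconds,
    }


def verify_response(challenge, nonce, observed_gestures, now=None):
    """Accept only a timely response that repeats the gestures in order."""
    now = time.time() if now is None else now
    if now > challenge["expires_at"]:
        return False  # too slow: possibly rendered offline, not a live session
    if not secrets.compare_digest(nonce, challenge["nonce"]):
        return False  # response belongs to a different (or replayed) challenge
    return list(observed_gestures) == challenge["gestures"]
```

Because the sequence is random and expires quickly, a pre-recorded video cannot contain the right gestures in the right order, and generating a matching deepfake within the time limit becomes considerably harder.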