Build a face swapping app, part 1: computer vision algorithm


This article is the first in a series of three in which we build a mobile app that automatically performs face swapping.

  • Part 1: Computer vision algorithm
  • Part 2: REST API - Pending
  • Part 3: Mobile app - Pending

Computer vision has proved to be one of the most revolutionary fields in computer science because it offers solutions to some of the most challenging problems in the world.

It has also provided innovative techniques that have actively contributed to the advancement of technology across the globe. We can say computer vision is the closest we have come to connecting the digital world with the physical world.

It enables self-driving cars to understand and interpret their surroundings. Computer vision also plays a leading role in augmented and virtual reality, and many modern applications of artificial intelligence are based on computer vision concepts.

This has allowed AI applications to outperform many of the classical methods.

Facial recognition is another application of computer vision, one that has brought forward solutions to existing problems such as surveillance and crime and theft prevention.

Facial recognition is, basically, the recognition of a human face using a camera. A popular example is the face lock on mobile phones: a facial recognition algorithm detects a face, compares it with the faces stored on the device, and unlocks the phone automatically if they match.

Face swapping is another application of computer vision that builds on the same ideas. This article goes through the basics of face swapping in computer vision and shows how it can be performed with OpenCV and Dlib.


What is Face Swapping?

Many of you have probably seen filters that let you swap the faces of two people in a picture, or swap your own face with that of a celebrity or an animal.

Such filters are common on social media and in applications such as Snapchat.

This is a rather simple concept in theory but not so much in practice. The human eye is very efficient at facial recognition and can easily detect whether a face is real or fake.

It can also easily detect the boundary of a face and features such as the eyes and the nose. For a computer, interpreting all this information is not as easy.

Still, compared with other applications of computer vision, face swapping is not that difficult.

Funny result of face swapping



How can Face Swapping be done?

Ever since advanced computer vision techniques emerged, there have been multiple attempts at developing face swapping techniques. Image editing software such as Adobe Photoshop has since integrated tools that can be used to perform face swapping.

Similarly, people in the field of artificial intelligence have developed models for face swapping. DeepFaceLab is one example: it uses deep learning models and provides an integrated framework for swapping faces between pictures.

Aside from these, the method we use with dlib is also based on machine learning. It uses an ML model to detect landmarks in a picture, which locates the face and its features. This landmark detection can then be extended to perform face swapping between images. In summary, the methods for face swapping include:

  • Manual editing in Adobe Photoshop.
  • Artificial Intelligence based methods
    • DeepFaceLab
    • Landmark detection with dlib.

Landmark detection provides the face-locating part of the swapping process. Once the landmarks are known, we extract the face from the first image by discarding everything except the face, and in the second image we remove the face.

We then place the extracted face into the gap left in the second image. On its own, this does not give a good result because the outline of the new face does not match or blend with the new image. So, we have to smooth the boundary of the face and process both images a little along it, so that the new face merges well with the new image and looks realistic.
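The extract-and-paste step can be sketched on tiny grayscale images stored as nested lists. This is only an illustration of the idea, not the implementation used later in the article; the function name `paste_with_mask` and the pixel values are made up for the example.

```python
def paste_with_mask(face_img, body_img, mask):
    """Take pixels from face_img where mask is 1 and from body_img elsewhere."""
    result = []
    for face_row, body_row, mask_row in zip(face_img, body_img, mask):
        result.append([f if m else b
                       for f, b, m in zip(face_row, body_row, mask_row)])
    return result


# Hypothetical 3x3 images: a bright "face" and a dark "body"
face_img = [[200, 200, 200],
            [200, 200, 200],
            [200, 200, 200]]
body_img = [[10, 10, 10],
            [10, 10, 10],
            [10, 10, 10]]
# 1 marks the face region, 0 marks everything else
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]

swapped = paste_with_mask(face_img, body_img, mask)
```

The hard jump from 10 to 200 at the edge of the mask is the visible seam that the blending step has to hide.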


A guide to Landmark Detection

Landmark detection is a computer vision technique where we detect key points on a human face and track them.

Although landmark detection in computer vision is usually associated with faces, it is, in general, not restricted to them. It is closely related to object detection, in which objects in an image are located by creating a bounding box around each one. Such an algorithm usually outputs two things for us:

  • The probability that the object is present.
  • If the object is present, the coordinates of the bounding box enclosing it.

Usually, a crucial point in an image that a neural network needs to detect is referred to as a landmark. Landmarks are not limited to the 4 points that define a bounding box; there can be any number of them, depending on the application.

In an application where we want several landmarks for recognizing an object or a surface, we want the model to output the (x, y) coordinates of the detected landmarks instead of a bounding box.

Let us look at a specific example where we want our neural network model to recognize the two corners of the human lips. This gives two landmarks for each face, so the model outputs four numbers:

  • (x1, y1)
  • (x2, y2)

This is fairly simple to perform. But what if we want our neural network to detect not only the corners of the lips but also the whole outer lining of the lips, along with the eyes, nose, and other important landmarks on a face? The number of points (landmarks) then increases to n:

(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
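As a small sketch of this output format, the snippet below turns a flat list of 2n numbers into n (x, y) pairs. It assumes the model interleaves the values as x1, y1, x2, y2, and so on; a real model may use a different layout, and the name `to_landmarks` and the sample numbers are made up.

```python
def to_landmarks(flat_output):
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) landmark pairs."""
    if len(flat_output) % 2 != 0:
        raise ValueError("Expected an even number of values (x and y per landmark)")
    return [(flat_output[i], flat_output[i + 1])
            for i in range(0, len(flat_output), 2)]


# Hypothetical output for the two lip corners: four numbers, two landmarks
lip_corners = to_landmarks([120, 245, 180, 247])
```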

To do this with a neural network, we first need to decide where the landmarks will be and then label the whole training set accordingly.

This can turn out to be a tedious task if the training dataset is huge. The order of the landmarks is also important and needs to be consistent across the whole dataset. For example, if you place the first landmark on the tip of the nose, then every image should have its first landmark on the tip of the nose. The whole dataset has to be labeled this way.

After the labeling process, the training set is ready, and we feed it into a neural network, preferably a convolutional neural network, to start the training process.
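A simple sanity check can catch some labeling mistakes before training. The sketch below assumes labels are stored as a dictionary that maps a file name to a list of (x, y) tuples; that storage format, the file names, and the function name are all hypothetical. It only verifies the number and shape of the landmarks, so whether the first point really is the tip of the nose still has to be reviewed by a person.

```python
def find_bad_samples(labels, expected_count):
    """Return the names of samples whose landmark list is malformed."""
    bad = []
    for sample_name, points in labels.items():
        well_formed = all(len(point) == 2 for point in points)
        if len(points) != expected_count or not well_formed:
            bad.append(sample_name)
    return sorted(bad)


# Hypothetical labels with three landmarks per image
labels = {
    "img_001.jpg": [(50, 60), (40, 80), (62, 81)],
    "img_002.jpg": [(48, 59), (39, 79)],  # one landmark is missing
}

bad_samples = find_bad_samples(labels, expected_count=3)
```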

Check out our article on detecting face features with Python to learn more about this fascinating topic.

Landmark recognition



What is Dlib?

Dlib is a modern C++ toolkit for developing machine learning algorithms. It also contains tools for building complex software in C++ to solve real-world problems.

It is used on a huge scale in both industry and academia, in domains such as embedded systems, mobile phones, robotics, and large high-performance computing environments.

Moreover, it is an open-source library, so anyone can use it in any application they want to build.

Unlike many other open-source projects, Dlib provides extensive and precise documentation for every class and function in the library, which makes it easy to understand for anyone who uses it.

It also has good unit test coverage and is tested regularly on MS Windows, Mac OS X, and Linux. The library is standalone, so no other packages are needed to run its code.

It offers a huge set of machine learning and numerical algorithms. It also contains deep learning tools, so one can use it to create ML models ranging from regression and support vector machines to deep neural networks.

Other extensive tools offered by the library include:

  • Graphical Model Inference Algorithms
  • Image Processing
  • Threading
  • Networking
  • GUI
  • Data Compression and Integrity Algorithms
  • Testing

Implementation in OpenCV and Dlib

As always, you can access the full code for this article on Google Colab, and we describe and explain all of it next.

Before starting on the code, we need to install Dlib and OpenCV. Using pip:

pip install dlib
pip install opencv-contrib-python

After installing the libraries, we need to download the pre-trained landmark detection model which we will use in Dlib. Download the model here.
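The model file is often distributed as a bzip2 archive. Assuming that is what you downloaded, the sketch below decompresses it with the Python standard library. The helper name and the archive file name in the usage comment are assumptions; adjust them to match your download.

```python
import bz2
import shutil
from pathlib import Path


def decompress_model(archive_path, target_path):
    """Decompress a .bz2 archive to target_path unless the target already exists."""
    target = Path(target_path)
    if target.exists():
        return target
    with bz2.open(archive_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Intended use (the archive name is an assumption):
# decompress_model("shape_predictor_68_face_landmarks.dat.bz2",
#                  "shape_predictor_68_face_landmarks.dat")
```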

Now we can start the implementation.

import cv2
import dlib
import numpy as np
from matplotlib import pyplot as plt

# Loading the base images (they are converted to grayscale in the next step)
face = cv2.imread("chris.jpeg")
body = cv2.imread("trump.jpeg")

We first import the dependencies: OpenCV, Dlib, Matplotlib, and NumPy. Matplotlib is used to display images, and NumPy is used for numerical computations. We then use the ‘imread’ function of OpenCV to load the two images, one for the face and the other to be used as the body.
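One detail worth guarding against: `cv2.imread` returns `None` instead of raising an error when a file cannot be read, and the failure only shows up later in an unrelated call. A small helper (hypothetical, not part of the article's code) makes the problem visible right away:

```python
def ensure_loaded(image, path):
    """Return the image unchanged, or raise if the reader returned None."""
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    return image


# Intended use with the article's code (requires OpenCV):
# face = ensure_loaded(cv2.imread("chris.jpeg"), "chris.jpeg")
# body = ensure_loaded(cv2.imread("trump.jpeg"), "trump.jpeg")
```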

face_gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
body_gray = cv2.cvtColor(body, cv2.COLOR_BGR2GRAY)

# Create empty matrices in the images' shapes
height, width = face_gray.shape
mask = np.zeros((height, width), np.uint8)

height, width, channels = body.shape

Next, we convert the images from color to grayscale. We then initialize an empty matrix (a mask) with the same shape as the image containing the face we will use, and read the shape of the body image.
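To give an idea of what the grayscale conversion does, the sketch below applies the weighting that OpenCV documents for color-to-gray conversion, Y = 0.299 R + 0.587 G + 0.114 B, to a single pixel in BGR order. The function name is made up, and OpenCV's internal rounding may differ slightly from Python's `round`.

```python
def bgr_to_gray(pixel):
    """Convert one (blue, green, red) pixel to a single gray level."""
    blue, green, red = pixel
    return round(0.299 * red + 0.587 * green + 0.114 * blue)


white = bgr_to_gray((255, 255, 255))
pure_blue = bgr_to_gray((255, 0, 0))
black = bgr_to_gray((0, 0, 0))
```

Blue contributes least to perceived brightness, which is why a fully blue pixel ends up as a dark gray.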

# Loading models and predictors of the dlib library to detect landmarks in both faces
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./shape_predictor_68_face_landmarks.dat")


# Getting landmarks for the face that will be swapped into the body
rect = detector(face_gray)[0]

# This creates an object with 68 pairs of integer values: the (x, y)-coordinates of the facial structures
landmarks = predictor(face_gray, rect)
landmarks_points = [] 

def get_landmarks(landmarks, landmarks_points):
    for n in range(68):
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        landmarks_points.append((x, y))

get_landmarks(landmarks, landmarks_points)

points = np.array(landmarks_points, np.int32)

Now we use the pre-trained landmark model we downloaded to locate the face in the first image. The detector finds the face rectangle, the predictor returns 68 landmarks inside it, and the for loop extracts the (x, y) coordinates of each of the 68 landmarks.
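The 68 points come in a fixed order, so slices of the list correspond to facial regions. The index ranges below are the ones commonly cited for this 68-point annotation scheme; they are not stated in this article, so treat them as an assumption and verify them by plotting the points on your own image. Left and right are from the subject's point of view.

```python
# Commonly cited index ranges for the 68-point layout (an assumption, see above)
LANDMARK_REGIONS = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def group_by_region(landmarks_points):
    """Split a list of 68 (x, y) tuples into named facial regions."""
    if len(landmarks_points) != 68:
        raise ValueError("Expected exactly 68 landmark points")
    return {name: [landmarks_points[i] for i in indexes]
            for name, indexes in LANDMARK_REGIONS.items()}


# Dummy points stand in for the output of get_landmarks
dummy_points = [(i, i) for i in range(68)]
regions = group_by_region(dummy_points)
```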

convexhull = cv2.convexHull(points) 

face_cp = face.copy()
plt.imshow(cv2.cvtColor((cv2.polylines(face_cp, [convexhull], True, (255,255,255), 3)), cv2.COLOR_BGR2RGB))

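# Fill the hull area in the mask so that only the face region is kept
cv2.fillConvexPoly(mask, convexhull, 255)
face_image_1 = cv2.bitwise_and(face, face, mask=mask)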

Next, we use the convexHull function of the OpenCV library to compute the contour around the face from the detected landmarks, and we draw it on a copy of the image.
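To show what a convex hull is, here is a plain-Python sketch using Andrew's monotone chain algorithm. It is a well-known textbook method chosen for brevity and is not necessarily what OpenCV uses internally; the function names and sample points are made up.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of a set of (x, y) points, without duplicates."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it would make a non-left turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]


# A square with one point in the middle: the middle point is not on the hull
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

In the face swap, the hull of the 68 landmarks plays the same role: it is the outer outline of the face, and the inner landmarks are ignored.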

rect = cv2.boundingRect(convexhull)

subdiv = cv2.Subdiv2D(rect) # Creates an instance of Subdiv2D
subdiv.insert(landmarks_points) # Insert points into subdiv
triangles = subdiv.getTriangleList()
triangles = np.array(triangles, dtype=np.int32)

indexes_triangles = []
face_cp = face.copy()

def get_index(arr):
    # arr is the tuple returned by np.where; return the first matching index, or None if nothing matched
    return int(arr[0][0]) if arr[0].size else None

for triangle in triangles :

    # Gets the vertices of the triangle as plain Python ints
    pt1 = (int(triangle[0]), int(triangle[1]))
    pt2 = (int(triangle[2]), int(triangle[3]))
    pt3 = (int(triangle[4]), int(triangle[5]))
    
    # Draws a line for each side of the triangle
    cv2.line(face_cp, pt1, pt2, (255, 255, 255), 3,  0)
    cv2.line(face_cp, pt2, pt3, (255, 255, 255), 3,  0)
    cv2.line(face_cp, pt3, pt1, (255, 255, 255), 3,  0)

    # Finds which landmark each vertex corresponds to
    index_pt1 = np.where((points == pt1).all(axis=1))
    index_pt1 = get_index(index_pt1)
    index_pt2 = np.where((points == pt2).all(axis=1))
    index_pt2 = get_index(index_pt2)
    index_pt3 = np.where((points == pt3).all(axis=1))
    index_pt3 = get_index(index_pt3)

    # Saves the landmark indexes only if all three vertices matched a landmark
    if index_pt1 is not None and index_pt2 is not None and index_pt3 is not None:
        vertices = [index_pt1, index_pt2, index_pt3]
        indexes_triangles.append(vertices)

# Draw delaunay triangles
plt.imshow(cv2.cvtColor(face_cp, cv2.COLOR_BGR2RGB))  

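Here, we split the face into triangles using a Delaunay triangulation of the landmark points. We store each triangle as three landmark indexes so that the same triangles can be located on the second face.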
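The reason for storing indexes rather than coordinates is that landmark number k marks the same facial feature on both faces. The lookup can be sketched without NumPy as follows; the function name and the sample data are made up, and a dictionary is used here instead of the `np.where` search.

```python
def triangles_to_indexes(triangles, landmarks_points):
    """Convert triangles given as (x1, y1, x2, y2, x3, y3) into landmark indexes."""
    index_of = {}
    for i, point in enumerate(landmarks_points):
        index_of.setdefault(point, i)  # keep the first index if a point repeats

    indexed = []
    for x1, y1, x2, y2, x3, y3 in triangles:
        corners = [(x1, y1), (x2, y2), (x3, y3)]
        # Skip triangles that have a corner which is not a landmark
        if all(corner in index_of for corner in corners):
            indexed.append([index_of[corner] for corner in corners])
    return indexed


# Hypothetical data: four landmarks and three triangles
sample_landmarks = [(0, 0), (10, 0), (0, 10), (10, 10)]
sample_triangles = [
    (0, 0, 10, 0, 0, 10),
    (10, 0, 10, 10, 0, 10),
    (0, 0, 10, 0, 99, 99),  # (99, 99) is not a landmark
]

indexed_triangles = triangles_to_indexes(sample_triangles, sample_landmarks)
```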

# Getting landmarks for the face that will have the first one swapped into
rect2 = detector(body_gray)[0]

# This creates an object with 68 pairs of integer values: the (x, y)-coordinates of the facial structures
landmarks_2 = predictor(body_gray, rect2)
landmarks_points2 = []

# Uses the function declared previously to get a list of the landmark coordinates
get_landmarks(landmarks_2, landmarks_points2)

# Generates a convex hull for the second person
points2 = np.array(landmarks_points2, np.int32)
convexhull2 = cv2.convexHull(points2)

body_cp = body.copy()
plt.imshow(cv2.cvtColor((cv2.polylines(body_cp, [convexhull2], True, (255,255,255), 3)), cv2.COLOR_BGR2RGB))

Now, we apply the same landmark detection to the second image as we did to the first.
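Both images are indexed with `[0]` right after detection, which raises an `IndexError` if the detector finds no face, since it then returns an empty sequence. A small guard (a hypothetical helper, shown here with plain lists standing in for the detector output) gives a clearer message:

```python
def first_face(rects, image_name):
    """Return the first detected face rectangle or raise a descriptive error."""
    if len(rects) == 0:
        raise ValueError(f"No face detected in {image_name}")
    return rects[0]


# Intended use with the article's code (requires Dlib):
# rect = first_face(detector(face_gray), "chris.jpeg")
# rect2 = first_face(detector(body_gray), "trump.jpeg")
```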

lines_space_new_face = np.zeros((height, width, channels), np.uint8)
body_new_face = np.zeros((height, width, channels), np.uint8)

height, width = face_gray.shape
lines_space_mask = np.zeros((height, width), np.uint8)


for triangle in indexes_triangles:

    # Coordinates of the first person's delaunay triangle
    pt1 = landmarks_points[triangle[0]]
    pt2 = landmarks_points[triangle[1]]
    pt3 = landmarks_points[triangle[2]]

    # Crops the bounding rectangle of the triangle from the first image
    (x, y, w, h) = cv2.boundingRect(np.array([pt1, pt2, pt3], np.int32))
    cropped_triangle = face[y: y+h, x: x+w]
    cropped_mask = np.zeros((h, w), np.uint8)

    # Fills triangle to generate the mask
    points = np.array([[pt1[0]-x, pt1[1]-y], [pt2[0]-x, pt2[1]-y], [pt3[0]-x, pt3[1]-y]], np.int32)
    cv2.fillConvexPoly(cropped_mask, points, 255)

    # Draws lines for the triangles
    cv2.line(lines_space_mask, pt1, pt2, 255)
    cv2.line(lines_space_mask, pt2, pt3, 255)
    cv2.line(lines_space_mask, pt1, pt3, 255)

    lines_space = cv2.bitwise_and(face, face, mask=lines_space_mask)

    # Coordinates of the matching triangle on the second person's face
    pt1 = landmarks_points2[triangle[0]]
    pt2 = landmarks_points2[triangle[1]]
    pt3 = landmarks_points2[triangle[2]]

    # Bounding rectangle of the triangle in the second image
    (x, y, w, h) = cv2.boundingRect(np.array([pt1, pt2, pt3], np.int32))
    cropped_mask2 = np.zeros((h, w), np.uint8)

    # Fills triangle to generate the mask
    points2 = np.array([[pt1[0]-x, pt1[1]-y], [pt2[0]-x, pt2[1]-y], [pt3[0]-x, pt3[1]-y]], np.int32)
    cv2.fillConvexPoly(cropped_mask2, points2, 255)

    # Deforms the triangles to fit the subject's face : https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html
    points =  np.float32(points)
    points2 = np.float32(points2)
    M = cv2.getAffineTransform(points, points2)  # Affine transform that maps the first triangle onto the second one
    dist_triangle = cv2.warpAffine(cropped_triangle, M, (w, h))
    dist_triangle = cv2.bitwise_and(dist_triangle, dist_triangle, mask=cropped_mask2)

    # Joins all the distorted triangles to make the face mask to fit in the second person's features
    body_new_face_rect_area = body_new_face[y: y+h, x: x+w]
    body_new_face_rect_area_gray = cv2.cvtColor(body_new_face_rect_area, cv2.COLOR_BGR2GRAY)

    # Creates a mask of the pixels that are still empty, so shared triangle edges are not added twice
    masked_triangle = cv2.threshold(body_new_face_rect_area_gray, 1, 255, cv2.THRESH_BINARY_INV)
    dist_triangle = cv2.bitwise_and(dist_triangle, dist_triangle, mask=masked_triangle[1])

    # Adds the piece to the face mask
    body_new_face_rect_area = cv2.add(body_new_face_rect_area, dist_triangle)
    body_new_face[y: y+h, x: x+w] = body_new_face_rect_area
  
plt.imshow(cv2.cvtColor(body_new_face, cv2.COLOR_BGR2RGB))

Here, we warp each triangle of the face from the first image so that it matches the position, orientation, and size of the corresponding triangle on the face in the second image.
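An affine transform is fully determined by three point pairs, which is why triangles are a convenient unit for warping. The sketch below solves for the 2x3 matrix in plain Python with Cramer's rule, to illustrate the kind of matrix `cv2.getAffineTransform` returns. The function names and the sample triangles are made up, and this is not how OpenCV computes it internally.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def affine_from_triangles(src, dst):
    """Return the 2x3 matrix that maps the three src points onto the dst points."""
    base = [[x, y, 1.0] for x, y in src]
    base_det = det3(base)
    if abs(base_det) < 1e-12:
        raise ValueError("Source points are collinear")
    matrix = []
    for axis in range(2):  # 0 solves the output x row, 1 the output y row
        targets = [point[axis] for point in dst]
        row = []
        for col in range(3):
            replaced = [r[:] for r in base]
            for k in range(3):
                replaced[k][col] = targets[k]
            row.append(det3(replaced) / base_det)
        matrix.append(row)
    return matrix


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Hypothetical triangles: scale x by 2, scale y by 3, then shift by (10, 20)
src_triangle = [(0, 0), (1, 0), (0, 1)]
dst_triangle = [(10, 20), (12, 20), (10, 23)]
M_sketch = affine_from_triangles(src_triangle, dst_triangle)
warped_point = apply_affine(M_sketch, (1, 1))
```

Every pixel inside the source triangle is moved by the same matrix, so the texture of the face stretches smoothly to fit the target triangle.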

body_face_mask = np.zeros_like(body_gray)
body_head_mask = cv2.fillConvexPoly(body_face_mask, convexhull2, 255)
body_face_mask = cv2.bitwise_not(body_head_mask)

body_maskless = cv2.bitwise_and(body, body, mask=body_face_mask)
result = cv2.add(body_maskless, body_new_face)

plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))

Now, we replace the face in image 2 with the warped face from image 1.
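The two images are combined with `cv2.add` rather than the `+` operator for a reason: `cv2.add` saturates 8-bit values at 255, whereas adding two `uint8` NumPy arrays with `+` wraps around modulo 256. The plain-Python sketch below imitates both behaviors for a single pixel value; the function names are made up.

```python
def saturating_add(a, b):
    """Imitates cv2.add on 8-bit values: results are capped at 255."""
    return min(a + b, 255)


def wrapping_add(a, b):
    """Imitates uint8 addition with +: results wrap around modulo 256."""
    return (a + b) % 256


capped = saturating_add(200, 100)
wrapped = wrapping_add(200, 100)
unchanged = saturating_add(0, 137)
```

Because the face area of the body image was set to 0 by the mask, adding the warped face there simply copies it in, as the last line illustrates.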

# Gets the center of the face for the body
(x, y, w, h) = cv2.boundingRect(convexhull2)
center_face2 = (int((x+x+w)/2), int((y+y+h)/2))

seamlessclone = cv2.seamlessClone(result, body, body_head_mask, center_face2, cv2.NORMAL_CLONE)

plt.imshow(cv2.cvtColor(seamlessclone, cv2.COLOR_BGR2RGB))

cv2.imwrite("./result.png", seamlessclone)

Finally, we blend the boundary of the face with the new body using the seamlessClone function of OpenCV.
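The center passed to seamlessClone is simply the middle of the bounding rectangle of the face hull. The sketch below reproduces that calculation in plain Python. It assumes integer pixel coordinates and an inclusive width and height (maximum minus minimum plus one), which is how we understand `cv2.boundingRect` to behave for integer points; the function names and sample points are made up.

```python
def bounding_rect(points):
    """Return (x, y, width, height) of the upright box around integer points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x + 1, max(ys) - y + 1)


def rect_center(rect):
    """Return the integer center of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w // 2, y + h // 2)


# Hypothetical hull points
sample_hull = [(10, 20), (30, 20), (30, 60), (10, 60)]
sample_rect = bounding_rect(sample_hull)
sample_center = rect_center(sample_rect)
```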


Conclusion

Computer vision is an extensive domain with a vast number of applications. Face swapping is one of them.

Modern computer vision based on machine learning is an effective way of developing a face-swapping model, and Dlib and OpenCV can be combined to build one.

Dlib is an extensive C++ library that can be used to develop machine learning models. It provides a pre-trained model for facial landmark detection. This shows the potential of machine learning-based computer vision solutions.

If you liked this article and want to learn more about applications of computer vision, check out our other articles.