This is a pix2pix demo that learns from facial landmarks and translates them into a face. A webcam-enabled application is also provided that translates your face into the trained face in real time.
This project comes in three repositories: this repository, which holds general-purpose scripts and documentation, plus forked versions of the face2face-demo and pix2pix-tensorflow repositories added as submodules. The presentation slides for this project are provided as Google Slides.
- Clone this repository recursively to include the two forked repositories mentioned above.
git clone https://github.com/alina1021/facial_expression_transfer.git --recursive
cd facial_expression_transfer
- Create the conda environment from file (macOS):
conda env create -f environment.yml
# activate this environment
conda activate facial_expression_transfer
cd face2face-demo
python generate_train_data.py --file ../iohannis_christmas_speech.mp4 --num 400 --landmark-model ../shape_predictor_68_face_landmarks.dat
Input:

- `file` is the name of the video file from which you want to create the dataset.
- `num` is the number of training images to be created.
- `landmark-model` is the facial landmark model that is used to detect the landmarks. A pre-trained facial landmark model is provided here.
Output:

- Two folders, `original` and `landmarks`, will be created.
If you prefer to download this dataset, the video file that I used and the generated training dataset (400 images, already split into training and validation sets) are also provided here.
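For orientation, here is a rough sketch of what the data-generation step does: read frames from the video, detect the 68 facial landmarks with dlib, draw them on a black canvas, and save matching images into `original` and `landmarks`. This is a simplified illustration, not the actual `generate_train_data.py`; the file names and drawing details are assumptions.

```python
# Sketch of the data-generation idea (assumed names, simplified drawing):
# detect 68 facial landmarks per frame with dlib and save paired images.
import os
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

os.makedirs("original", exist_ok=True)
os.makedirs("landmarks", exist_ok=True)

cap = cv2.VideoCapture("iohannis_christmas_speech.mp4")
count = 0
while count < 400:  # --num
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        continue  # skip frames without a detected face
    shape = predictor(gray, faces[0])
    canvas = np.zeros_like(frame)  # black background for the landmark image
    for i in range(68):
        p = shape.part(i)
        cv2.circle(canvas, (p.x, p.y), 2, (255, 255, 255), -1)
    cv2.imwrite("original/{}.png".format(count), frame)
    cv2.imwrite("landmarks/{}.png".format(count), canvas)
    count += 1
cap.release()
```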
cd ..
# Move the original and landmarks folder into the pix2pix-tensorflow folder
mkdir pix2pix-tensorflow/photos
mv face2face-demo/original pix2pix-tensorflow/photos/original
mv face2face-demo/landmarks pix2pix-tensorflow/photos/landmarks
rm -rf face2face-demo/landmarks
# Go into the pix2pix-tensorflow folder
cd pix2pix-tensorflow/
# Resize original images
python tools/process.py \
--input_dir photos/original \
--operation resize \
--output_dir photos/original_resized
# Resize landmark images
python tools/process.py \
--input_dir photos/landmarks \
--operation resize \
--output_dir photos/landmarks_resized
# Combine both resized original and landmark images
python tools/process.py \
--input_dir photos/landmarks_resized \
--b_dir photos/original_resized \
--operation combine \
--output_dir photos/combined
# Split into train/val set
python tools/split.py \
--dir photos/combined
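At this point each file in `photos/combined` should be a side-by-side pair: the landmark image (the A side, from `--input_dir`) in the left half and the resized original frame (the B side, from `--b_dir`) in the right half. A quick, hypothetical sanity check:

```python
# Inspect one combined training pair (paths assumed from the steps above).
import glob
import cv2

sample = sorted(glob.glob("photos/combined/train/*"))[0]
img = cv2.imread(sample)
h, w = img.shape[:2]
assert w == 2 * h, "expected a side-by-side A|B pair, got {}x{}".format(w, h)

cv2.imwrite("pair_A_landmarks.png", img[:, : w // 2])  # input (A)
cv2.imwrite("pair_B_original.png", img[:, w // 2:])    # target (B)
print(sample, img.shape)
```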
# Train the model on the data
python pix2pix.py \
--mode train \
--output_dir face2face-model \
--max_epochs 200 \
--input_dir photos/combined/train \
--which_direction AtoB
Training the model was done on AWS cloud services using EC2 p2.xlarge instances. The p2.xlarge instance has one NVIDIA K80 GPU with 2,496 parallel processing cores. Training takes up to 8 hours depending on the settings, such as the number of frames, epochs, etc. For example, training the model with 400 frames (320 for training and 80 for validation) and 200 epochs takes about 5 hours on one NVIDIA K80 GPU. Training on a CPU was ruled out from the start, since it takes 3-5 days with the above settings, depending on the CPU type.
See the Pix2Pix model graph in TensorFlow below:
Below are some results for the discriminator (left) and generator (right) loss functions as a function of step number at epoch 200 when using 400 frames. Note that the learning process for both the discriminator and the generator is quite noisy and improves with more steps.
Testing is done with `--mode test`. You should specify the checkpoint to use with `--checkpoint`; this should point to the `output_dir` that you created previously with `--mode train`:
# test the model
python pix2pix.py \
--mode test \
--output_dir face2face_test \
--input_dir photos/combined/val \
--checkpoint face2face-model
The testing mode will load some of the configuration options from the provided checkpoint, so you do not need to specify `which_direction`, for instance.
The test run will output an HTML file at `face2face_test/index.html` that shows input/output/target image sets like the following:
For more information around training and testing, have a look at Christopher Hesse's pix2pix-tensorflow implementation.
- First, we need to reduce the trained model so that we can use an image tensor as input:
python ../face2face-demo/reduce_model.py --model-input face2face-model --model-output face2face-reduced-model
Input:

- `model-input` is the model folder to be imported.
- `model-output` is the (reduced) model folder to be exported.
Output:

- It returns a reduced model with a smaller weights file than the original model.
- Second, we freeze the reduced model to a single file.
python ../face2face-demo/freeze_model.py --model-folder face2face-reduced-model
Input:

- `model-folder` is the model folder of the reduced model.

Output:

- It returns a frozen model file `frozen_model.pb` in the model folder.
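For reference, freezing a model amounts to loading the latest checkpoint, folding the variable values into the graph as constants, and serializing everything to one protobuf file. The sketch below shows the general TensorFlow 1.x recipe; the output tensor name is an assumption, so treat `freeze_model.py` as the authoritative version.

```python
# General TF 1.x freezing recipe (not the exact freeze_model.py).
import tensorflow as tf

model_folder = "face2face-reduced-model"
checkpoint = tf.train.get_checkpoint_state(model_folder).model_checkpoint_path
output_nodes = ["generate_output/output"]  # assumed name of the generator output

with tf.Session(graph=tf.Graph()) as sess:
    # Recreate the graph from the checkpoint's meta file and restore weights.
    saver = tf.train.import_meta_graph(checkpoint + ".meta")
    saver.restore(sess, checkpoint)
    # Replace variables with constants so the graph is self-contained.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, tf.get_default_graph().as_graph_def(), output_nodes)
    with tf.gfile.GFile(model_folder + "/frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```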
cd ..
# run recorded video
python scripts/run_video.py --source my_video.mov --show 1 --landmark-model ../shape_predictor_68_face_landmarks.dat --tf-model pix2pix-tensorflow/face2face-reduced-model/frozen_model.pb
# run webcam
python scripts/run_webcam.py --source 0 --show 1 --landmark-model ../shape_predictor_68_face_landmarks.dat --tf-model pix2pix-tensorflow/face2face-reduced-model/frozen_model.pb
Input:

- `source` is your video (my_video.mov) or the device index of the camera (default=0). My recorded video, the my_video.mov file, is provided here.
- `show` is an option to display either the normal input only (0) or the normal input and the facial landmarks (1) alongside the generated image (default=0).
- `landmark-model` is the facial landmark model that is used to detect the landmarks.
- `tf-model` is the frozen model file.
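Under the hood, the real-time demo is essentially the data-generation step plus a forward pass through the frozen generator for every frame. The sketch below outlines that loop; the tensor names and pre/post-processing are assumptions based on this setup, so `run_webcam.py` remains the reference implementation.

```python
# Simplified real-time loop (assumed tensor names and pre/post-processing).
import cv2
import dlib
import numpy as np
import tensorflow as tf

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Load the frozen generator graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile("pix2pix-tensorflow/face2face-reduced-model/frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")
# Assumed tensor names; inspect the frozen graph for the real ones.
image_in = graph.get_tensor_by_name("image_tensor:0")
image_out = graph.get_tensor_by_name("generate_output/output:0")

cap = cv2.VideoCapture(0)  # --source
with tf.Session(graph=graph) as sess:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Render the facial landmarks on a black canvas, as in training.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        canvas = np.zeros_like(frame)
        if faces:
            shape = predictor(gray, faces[0])
            for i in range(68):
                p = shape.part(i)
                cv2.circle(canvas, (p.x, p.y), 2, (255, 255, 255), -1)
        # Feed a 256x256 landmark image to the generator and show the result.
        landmarks_256 = cv2.resize(canvas, (256, 256))
        generated = sess.run(image_out, feed_dict={image_in: landmarks_256})
        generated = np.squeeze(generated).astype(np.uint8)
        cv2.imshow("generated", cv2.resize(generated, (frame.shape[1], frame.shape[0])))
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```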
Me, my face and output:
A pre-trained frozen model is available here. The video file used to generate the training and validation datasets for this example is here. This model was trained on 400 images for 200 epochs.
Me, my face and output:
The frozen model can be downloaded from here. This model was also trained on 400 images for 200 epochs.
Increasing the number of epochs helps reduce pixelation and blurriness. Training with more data will also improve the results and keep objects in the background from moving during facial expression transfer.
Thanks to Dat Tran for inspiration, code and model!
For training and testing, thanks to Christopher Hesse for Image-to-Image Translation in Tensorflow code and examples.
Thanks also to Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, and Alexei A. Efros¹ for their fantastic work on Image-to-Image Translation with Conditional Adversarial Networks.
¹ Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley
This project is licensed under the MIT License - see the LICENSE file for details, as well as the licenses of the other projects used within this repository.