MangaGAN: generation of high-quality manga images from photo

May 10, 2020

The authors propose an interesting approach to the problem of generating manga faces from photos.

Key achievements proposed in the article:

  • the authors collected the MangaGAN-BL dataset, which contains 109 noses, 179 mouths, and 106 manga faces with facial landmarks, all taken from frames of the Bleach manga (a sketch of what one such sample record might look like follows this list);
  • a GAN-based framework for unpaired photo-to-manga translation.
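
To make the dataset entries concrete, one sample presumably pairs a cropped manga image with its landmark coordinates; a minimal, hypothetical record could look as follows (the field names are my assumption, not from the paper):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MangaFaceSample:
    """Hypothetical record for one MangaGAN-BL face entry (field names assumed)."""
    image_path: str                       # cropped manga face from a Bleach frame
    landmarks: List[Tuple[float, float]]  # 2D facial landmark coordinates

# Usage with made-up values:
sample = MangaFaceSample("faces/0001.png", [(32.0, 41.5), (64.2, 40.8)])
print(len(sample.landmarks), "landmarks for", sample.image_path)
```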

The authors claim that current state-of-the-art approaches cannot produce good manga faces, for several reasons. One of them is that manga artists use different drawing styles for different facial features, and it is difficult for a single neural network to learn all of these patterns.

That is why the authors suggested the following architecture:

(Figure: MangaGAN architecture)

Broadly, the framework consists of three parts (a rough sketch of how they fit together follows the list):

  • The top branch detects facial features and transfers the style of each feature independently via a dedicated pre-trained GAN. In total, the authors trained four such GANs (for the mouth, eye, hair, and nose).
  • The bottom branch transfers facial landmarks (the face geometry).
  • An image synthesis module that produces the final result.
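
A rough Python sketch of how these three parts might compose at inference time; the function names and signatures below are my own illustration, not the authors' code:

```python
# Hypothetical composition of the MangaGAN pipeline described above.
# All components are passed in as callables so the sketch stays self-contained.

def translate_photo_to_manga(photo,
                             detect_features,    # photo -> {"eye": crop, "nose": crop, ...}
                             detect_landmarks,   # photo -> list of (x, y) points
                             feature_gans,       # {"eye": G_eye, "nose": G_nose, "mouth": G_mouth, "hair": G_hair}
                             landmark_transfer,  # CycleGAN-style mapping from photo geometry to manga geometry
                             synthesize):        # (manga_parts, manga_landmarks) -> final manga face
    # Top branch: translate each detected facial feature with its own pre-trained GAN.
    regions = detect_features(photo)
    manga_parts = {name: feature_gans[name](crop) for name, crop in regions.items()}

    # Bottom branch: map the photo's facial landmarks to manga-style geometry.
    manga_landmarks = landmark_transfer(detect_landmarks(photo))

    # Synthesis module: assemble the translated parts along the transferred landmarks.
    return synthesize(manga_parts, manga_landmarks)
```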

To train the eye-transferring GAN, the authors used:

    • an adversarial loss, adopting the stable least-squares (LSGAN) formulation:

(Formula: adversarial loss)

    • a cycle-consistency loss (standard forms of both of these losses are written out after this list):

(Formula: cycle-consistency loss)

    • a structural smoothing loss that encourages the network to produce manga with smooth stroke lines:

(Formula: structural smoothing loss)
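
For reference, the adversarial and cycle-consistency losses presumably follow the standard least-squares (LSGAN) and CycleGAN forms; for generators $G: X \to Y$, $F: Y \to X$ and a discriminator $D_Y$ they are usually written as

$$\mathcal{L}_{adv}(G, D_Y) = \mathbb{E}_{y \sim p(y)}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{x \sim p(x)}\big[D_Y(G(x))^2\big],$$

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p(y)}\big[\lVert G(F(y)) - y \rVert_1\big].$$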

Example of the trained model's output:

(Figure: eye regions generated by the trained model)

For facial landmark transferring, the authors trained a CycleGAN with adversarial and cycle-consistency losses.
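
A minimal sketch of how such a combined objective could be computed for one batch; PyTorch-style tensors are assumed here purely for illustration, and the cycle-loss weight is the common CycleGAN default rather than a value taken from the paper:

```python
def cyclegan_generator_loss(G, F, D_X, D_Y, x, y, lambda_cyc=10.0):
    """Least-squares adversarial + cycle-consistency loss (generator side).

    G: X -> Y and F: Y -> X are generators; D_X, D_Y are discriminators;
    x, y are batches of landmark vectors (PyTorch tensors) from the photo
    and manga domains respectively.
    """
    fake_y, fake_x = G(x), F(y)

    # LSGAN generator terms: push discriminator scores on fakes towards 1.
    loss_adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()

    # Cycle consistency: x -> G -> F should reconstruct x, and vice versa.
    loss_cyc = (F(fake_y) - x).abs().mean() + (G(fake_x) - y).abs().mean()

    return loss_adv + lambda_cyc * loss_cyc
```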

The resulting image is synthesized using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) method.
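
PCHIP itself is available off the shelf; the snippet below only demonstrates the generic SciPy routine on made-up points, not the authors' synthesis code:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Monotone cubic interpolation through a few sample points, e.g. coordinates
# along a facial contour (the values here are made up).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 2.5, 3.0])

pchip = PchipInterpolator(x, y)
dense_x = np.linspace(0.0, 4.0, 50)
smooth_curve = pchip(dense_x)  # smooth curve through the points, without overshoot
print(smooth_curve[:5])
```

Unlike an ordinary cubic spline, PCHIP preserves monotonicity and does not overshoot between the interpolation points, which presumably makes it a good fit for producing smooth, stable contours.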

Results:

(Figure: comparison of the suggested model with other cross-domain translation methods)

This figure compares the suggested model with state-of-the-art cross-domain translation methods. The last column corresponds to the proposed model, and it does indeed seem to generate better images.