Spatially-Adaptive Pixelwise Networks for Fast Image Translation
CVPR 2021
Tamar Rott Shaham
Michaël Gharbi
Richard Zhang
Eli Shechtman
Tomer Michaeli
[Paper]
[GitHub]
[Video]

Our model, built on A Spatially-Adaptive Pixelwise Network (ASAPNet), generates high-resolution images at significantly lower runtimes than existing methods while maintaining high visual quality. In particular, as the plot shows, our model is 2-18x faster than the baselines, depending on resolution.

Abstract

We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying, so they can represent a broader function class than simple 1x1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image by concatenating a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18x faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
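To make the third step concrete, here is a minimal sketch of concatenating a sinusoidal encoding of pixel coordinates to an input pixel. The function names (`sinusoidal_encoding`, `augment_pixel`), the number of frequencies, and the choice of periods `2**k` pixels are illustrative assumptions, not details confirmed on this page; see the paper and the GitHub code for the exact encoding.

```python
import math


def sinusoidal_encoding(x, y, num_freqs=3):
    """Encode integer pixel coordinates (x, y) as sines and cosines.

    Assumption: periods of 2**k pixels for k = 1..num_freqs. The exact
    frequencies used by the authors may differ.
    """
    features = []
    for k in range(1, num_freqs + 1):
        period = 2 ** k
        for coord in (x, y):
            angle = 2.0 * math.pi * coord / period
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


def augment_pixel(values, x, y, num_freqs=3):
    """Concatenate a pixel's channel values with its positional encoding."""
    return list(values) + sinusoidal_encoding(x, y, num_freqs)
```

Each pixel gains `4 * num_freqs` extra channels, so a pixelwise function that never sees its neighbors can still vary its output with position.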


5-Minute Video



Implementation

 [GitHub]

Our model first processes the input at a very low resolution, x_l, to produce a tensor of weights and biases φ_p. These are upsampled back to full resolution, where they parameterize pixelwise, spatially-varying MLPs f_p that compute the final output y from the high-resolution input x.
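The pipeline above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the low-resolution network is omitted and its output is represented by a hypothetical `param_grid`, the upsampling is assumed to be nearest-neighbor, the hidden nonlinearity is assumed to be ReLU, and all function names are invented here. A practical version would be vectorized on a GPU.

```python
def relu(v):
    """Elementwise ReLU (an assumed choice of nonlinearity)."""
    return [max(0.0, a) for a in v]


def affine(weights, bias, v):
    """Compute weights @ v + bias for a single pixel's feature vector."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def pixel_mlp(layers, v):
    """Apply one pixel's MLP: affine layers with ReLU between them.

    Assumption: no nonlinearity after the last layer.
    """
    for i, (weights, bias) in enumerate(layers):
        v = affine(weights, bias, v)
        if i < len(layers) - 1:
            v = relu(v)
    return v


def apply_pixelwise(image, param_grid, scale):
    """Run a spatially-varying MLP independently on every pixel.

    image:      H x W nested list of per-pixel feature vectors.
    param_grid: (H // scale) x (W // scale) nested list; each cell holds
                a list of (weights, bias) layers (hypothetical layout).
    scale:      integer upsampling factor. Nearest-neighbor upsampling
                is done implicitly by integer division of coordinates.
    """
    out = []
    for row_idx, row in enumerate(image):
        out_row = []
        for col_idx, pixel in enumerate(row):
            layers = param_grid[row_idx // scale][col_idx // scale]
            out_row.append(pixel_mlp(layers, pixel))
        out.append(out_row)
    return out
```

Because every output pixel depends only on its own input vector and its own parameters, the full-resolution computation is cheap; all spatial context enters through the parameters predicted at low resolution.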

Paper

T. Rott Shaham, M. Gharbi, R. Zhang,
E. Shechtman, T. Michaeli
Spatially-Adaptive Pixelwise Networks for Fast Image Translation
CVPR 2021
[ArXiv] [CVF] [Supplementals] [Bibtex]


References

Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang and Hongsheng Li, Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis, NeurIPS 2019

Taesung Park, Ming-Yu Liu, Ting-Chun Wang and Jun-Yan Zhu, Semantic Image Synthesis with Spatially-Adaptive Normalization, CVPR 2019

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro, High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, CVPR 2018


Xiaojuan Qi, Qifeng Chen, Jiaya Jia, and Vladlen Koltun, Semi-parametric Image Synthesis, CVPR 2018

Qifeng Chen and Vladlen Koltun, Photographic Image Synthesis with Cascaded Refinement Networks, ICCV 2017

