GIRAFFE

Representing Scenes as Compositional Generative Neural Feature Fields

Michael Niemeyer, Andreas Geiger
Max Planck Institute for Intelligent Systems and University of Tübingen
CVPR 2021 (oral, best paper award)
[Paper] [Supplementary] [Code] [Blog] [Video] [Interactive Slides] [Talk]

Abstract

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only a few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background as well as individual objects' shapes and appearances, while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.

TL;DR: We incorporate a compositional 3D scene representation into the generative model which leads to more controllable image synthesis.
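To make the idea concrete, below is a minimal Python/PyTorch sketch of the core representation: each object (and the background) is a small MLP feature field conditioned on latent shape and appearance codes, and the per-object outputs are composited by density-weighted averaging before being volume rendered into a low-resolution feature map and upsampled to an image by a 2D neural renderer. This is an illustrative sketch, not the released code; all module names, layer sizes, and code dimensions here are assumptions.

# Sketch (illustrative, not the authors' implementation) of compositional
# generative neural feature fields: per-object MLPs conditioned on latent
# shape/appearance codes, composited by density-weighted averaging.
import torch
import torch.nn as nn

class FeatureField(nn.Module):
    """One object's feature field: (x, z_shape, z_app) -> (density, feature)."""
    def __init__(self, z_dim=64, f_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 2 * z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.feature_head = nn.Linear(hidden, f_dim)

    def forward(self, x, z_shape, z_app):
        h = self.net(torch.cat([x, z_shape, z_app], dim=-1))
        return self.density_head(h).relu(), self.feature_head(h)

def composite(densities, features):
    """Sum densities over objects; average features weighted by density."""
    sigma = torch.stack(densities, dim=0)   # (n_obj, n_pts, 1)
    feat = torch.stack(features, dim=0)     # (n_obj, n_pts, f_dim)
    sigma_total = sigma.sum(dim=0)
    weights = sigma / (sigma_total + 1e-8)
    return sigma_total, (weights * feat).sum(dim=0)

# Toy usage: two fields (e.g. one object plus background), 1024 sample points.
fields = [FeatureField(), FeatureField()]
x = torch.rand(1024, 3)
outs = [f(x, torch.randn(1024, 64), torch.randn(1024, 64)) for f in fields]
sigma, feat = composite([o[0] for o in outs], [o[1] for o in outs])
# sigma/feat would then be volume rendered along camera rays into a feature
# image and passed to a 2D CNN renderer that outputs the final RGB image.

In the full model, the composited feature image is deliberately kept at low resolution so that volume rendering stays cheap, and the 2D neural renderer handles the upsampling to the final image resolution.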

Video

Results

Comparison Against a 2D-based GAN

Note how translating one object affects the other for a 2D-based GAN. In contrast, we incorporate compositional 3D scene structure into the generative model, leading to more consistent results.

We can perform more complex operations like circular translations or adding objects at test time.

Controllable Scene Generation

We show more examples where we control the scene during image synthesis.

Out-of-Distribution Generalization

As our model disentangles individual objects, we are able to generate out-of-distribution samples. For example, we can increase the horizontal translation range beyond what was seen during training.

We can increase the depth translation range.

We can add more objects at test time.
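These controls are possible because each object carries its own latent codes and its own pose in the scene. The short sketch below (illustrative only; function and parameter names are assumptions, not the released API) shows the underlying idea: sample points are mapped into each object's canonical frame via a per-object affine pose, so editing a pose translates or rotates that object, and adding an object at test time is simply appending another (latent code, pose) pair.

# Hedged sketch of test-time control: per-object affine poses (scale,
# rotation, translation) applied before querying each object's feature field.
import math
import torch

def object_pose(scale, angle, translation):
    """Rotation about the vertical axis plus uniform scale and translation."""
    c, s = math.cos(angle), math.sin(angle)
    R = torch.tensor([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return scale * R, torch.tensor(translation)

def to_object_frame(x, R_scaled, t):
    """Map world-space sample points into the object's canonical frame."""
    return (x - t) @ torch.inverse(R_scaled).T

# Scene = list of (latent code, pose). Widening the translation range or
# appending entries at test time yields out-of-distribution samples.
scene = [
    (torch.randn(64), object_pose(1.0, 0.0, [0.0, 0.0, 0.0])),
    (torch.randn(64), object_pose(1.0, math.pi / 4, [1.5, 0.0, 0.0])),  # translated/rotated object
]
x = torch.rand(1024, 3)
local_points = [to_object_frame(x, R, t) for _, (R, t) in scene]
# Each local point set would be fed to that object's feature field, and the
# resulting densities and features composited as in the sketch above.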

Citation

If you want to cite our work, please use:

        @inproceedings{Niemeyer2020GIRAFFE,
          author    = {Michael Niemeyer and Andreas Geiger},
          title     = {GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields},
          booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
          year      = {2021},
        }