CVPR 2019

Revealing Scenes by Inverting Structure from Motion Reconstructions

Francesco Pittaluga (Univ. of Florida), Sanjeev Koppal (Univ. of Florida), Sing Bing Kang (Microsoft Research), Sudipta Sinha (Microsoft Research)

SYNTHESIZING IMAGERY FROM AN SFM POINT CLOUD -- From left to right: (a) top view of an SfM reconstruction of an indoor scene, (b) the 3D points projected into the viewpoint of a source image, (c) the image reconstructed using our technique, and (d) the source image. The reconstructed image is highly detailed and closely resembles the source image.

Abstract

Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and, optionally, color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature-inversion methods, we deal with highly sparse and irregular 2D point distributions and with inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor's source image, and the 3D point visibility. We evaluate our attack algorithm on the MegaDepth and NYU datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.
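To make the input representation described above concrete, here is a minimal sketch (not the authors' code) of how sparse SfM points might be rendered into a 2D multichannel map for a given viewpoint. The function name, the channel layout (1 depth + 3 color + 128 SIFT channels), and the crude painter's-style z-buffer are illustrative assumptions.

import numpy as np

def render_point_map(points_3d, colors, sift_descs, K, R, t, height, width):
    """Project sparse SfM points into a multichannel 2D map for one viewpoint.

    Channel layout (an assumption, not the paper's exact encoding):
    channel 0 = depth, channels 1-3 = RGB color, channels 4-131 = SIFT.
    Pixels that receive no point stay zero, so the map is highly sparse.
    """
    feat_map = np.zeros((height, width, 1 + 3 + 128), dtype=np.float32)

    # Transform points into the camera frame and keep those in front of it.
    cam_pts = (R @ points_3d.T + t.reshape(3, 1)).T
    in_front = cam_pts[:, 2] > 0
    cam_pts = cam_pts[in_front]
    colors, sift_descs = colors[in_front], sift_descs[in_front]

    # Pinhole projection to pixel coordinates.
    proj = (K @ cam_pts.T).T
    px = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    py = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    depth = cam_pts[:, 2]

    # Keep points inside the image; paint far points first so that nearer
    # points overwrite them (a crude z-buffer for overlapping pixels).
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    order = np.argsort(-depth[inside])
    px, py = px[inside][order], py[inside][order]
    feat_map[py, px, 0] = depth[inside][order]
    feat_map[py, px, 1:4] = colors[inside][order]
    feat_map[py, px, 4:] = sift_descs[inside][order]
    return feat_map

The network then maps this sparse map back to a dense color image; a sketch of the two-stage CoarseNet + RefineNet idea appears further below.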

Cite

@inproceedings{pittaluga2019revealing,
  title={Revealing scenes by inverting structure from motion reconstructions},
  author={Pittaluga, Francesco and Koppal, Sanjeev J and Kang, Sing Bing and Sinha, Sudipta N},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={145--154},
  year={2019}
}

Results

Effect of Input Attributes

Four example results from models trained with different sets of input attributes. For each example, the left image is the original image and the right image is the reconstruction from CoarseNet + RefineNet.

[Image comparisons: Original Image | Reconstruction]
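The different attribute sets amount to feeding the network different channel subsets of the rendered map. Below is a small sketch of that selection step, assuming the channel layout from the rendering example above; the dictionary and function names are illustrative, not the authors' code.

import numpy as np

# Channel layout assumed from the rendering sketch above:
# channel 0 = depth, channels 1-3 = RGB color, channels 4-131 = SIFT.
ATTRIBUTE_CHANNELS = {
    "depth": [(0, 1)],
    "depth+sift": [(0, 1), (4, 132)],
    "depth+color+sift": [(0, 1), (1, 4), (4, 132)],
}

def network_input(feat_map, attributes):
    """Concatenate the channel ranges used by one attribute configuration."""
    parts = [feat_map[..., lo:hi] for lo, hi in ATTRIBUTE_CHANNELS[attributes]]
    return np.concatenate(parts, axis=-1)

# e.g. network_input(feat_map, "depth+sift") has 1 + 128 = 129 channels.

A separate model is trained per configuration, so the first convolution's input width changes with the chosen attribute set.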

Effect of Input Sparsity

Four example results from the same model, but for inputs with different degrees of sparsity (% of SfM points kept). For each example, the left image shows the sparse input points and the right image shows the result of CoarseNet + RefineNet. CoarseNet and RefineNet were trained with depth, color, and SIFT as input attributes.

[Image comparisons: Input Points | Reconstruction]
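As a rough illustration of how such sparsified inputs can be produced, here is a minimal sketch that keeps a given fraction of the SfM points at random. The retained indices are returned so the associated colors and descriptors can be filtered with the same mask; the function name and the uniform-random policy are assumptions, not the authors' exact protocol.

import numpy as np

def subsample_points(num_points, keep_fraction, rng=None):
    """Return sorted indices of a random subset with `keep_fraction` of the points."""
    rng = np.random.default_rng() if rng is None else rng
    num_keep = max(1, int(round(keep_fraction * num_points)))
    return np.sort(rng.choice(num_points, size=num_keep, replace=False))

# e.g. keep 10% of the points, then render the sparser map:
# idx = subsample_points(len(points_3d), 0.10)
# sparse_map = render_point_map(points_3d[idx], colors[idx], sift_descs[idx],
#                               K, R, t, height, width)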

Effect of RefineNet

Four examples comparing results from CoarseNet alone against results from CoarseNet + RefineNet. For each example, two images are shown: the left image is from a model trained with depth and SIFT as input attributes, and the right image is from a model trained with depth, color, and SIFT.

[Image comparisons: Depth + SIFT | Depth + SIFT + Color]
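To make the CoarseNet + RefineNet cascade concrete, below is a minimal PyTorch sketch of the two-stage idea: a first network maps the sparse multichannel point map to an initial color image, and a second network sees the original input together with that coarse prediction and produces the refined image. The class names, layer widths, and depths are placeholders, not the architecture from the paper.

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A deliberately small U-Net stand-in with a single skip connection."""
    def __init__(self, in_channels, out_channels=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(32 + 32, out_channels, 3, padding=1)  # skip + decoded features

    def forward(self, x):  # x: (B, C, H, W) with H, W even
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = self.dec1(e2)
        return self.out(torch.cat([e1, d1], dim=1))

class CascadedInverter(nn.Module):
    """CoarseNet predicts an initial image; RefineNet refines it using
    the original point map concatenated with the coarse prediction."""
    def __init__(self, in_channels):
        super().__init__()
        self.coarse_net = TinyUNet(in_channels)
        self.refine_net = TinyUNet(in_channels + 3)

    def forward(self, point_map):
        coarse_rgb = self.coarse_net(point_map)
        refined_rgb = self.refine_net(torch.cat([point_map, coarse_rgb], dim=1))
        return coarse_rgb, refined_rgb

A channels-last map from the earlier sketches would be fed as, e.g., torch.from_numpy(feat_map).permute(2, 0, 1).unsqueeze(0).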