Real-Time View Synthesis with Multiplane Image Network using Multimodal Supervision

27th International Workshop on Multimedia Signal Processing (MMSP 2025), Beijing, China

Manu Gond¹,², Mohammadreza Shamshirgarha¹, Emin Zerman¹, Sebastian Knorr³, Mårten Sjöström¹

¹Mid Sweden University ²Technical University Berlin ³HTW Berlin - University of Applied Sciences


RT-MPINet generates MPIs from a single image

Abstract

We present a real-time multiplane image (MPI) network. Unlike existing MPI-based approaches, which often rely on a separate depth estimation network to guide the prediction of MPI parameters, our method predicts these parameters directly from a single RGB image. To guide the network, we use a multimodal training strategy with joint supervision from view synthesis and depth estimation losses. More details can be found in the paper.
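For readers new to MPIs, the following is a minimal sketch of how a stack of fronto-parallel RGBA planes is commonly composited into an image and an expected depth map using the standard "over" operator, and of how a view synthesis term and a depth term could be summed into a single training loss. It is not the RT-MPINet implementation: the function names, the L1 losses, and the `depth_weight` parameter are assumptions made for illustration, and warping the planes to a novel viewpoint is omitted. The paper defines the actual losses.

```python
import numpy as np

def composite_mpi(rgb, alpha, depths):
    """Composite an MPI whose planes are ordered nearest (index 0) to farthest.

    rgb:    (D, H, W, 3) per-plane colour
    alpha:  (D, H, W)    per-plane opacity in [0, 1]
    depths: (D,)         depth assigned to each plane
    """
    # Transmittance of plane i: fraction of light not blocked by nearer planes.
    blocked = np.cumprod(1.0 - alpha, axis=0)
    transmittance = np.concatenate([np.ones_like(alpha[:1]), blocked[:-1]], axis=0)
    weights = alpha * transmittance                        # (D, H, W)
    image = (weights[..., None] * rgb).sum(axis=0)         # (H, W, 3)
    # Weighted plane depth; not normalised, so it is only a true expectation
    # when the weights sum to 1 along each ray.
    depth = (weights * depths[:, None, None]).sum(axis=0)  # (H, W)
    return image, depth

def joint_loss(pred_image, target_image, pred_depth, target_depth, depth_weight=0.5):
    """Hypothetical joint loss: L1 view synthesis term plus weighted L1 depth term."""
    view_term = np.abs(pred_image - target_image).mean()
    depth_term = np.abs(pred_depth - target_depth).mean()
    return view_term + depth_weight * depth_term
```

Because both the image and the depth map are produced from the same per-plane opacities, a depth loss of this kind constrains the MPI geometry without requiring a separate depth network at inference time.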

In-the-wild Examples


Results

COCO Dataset: Different View Synthesis Methods Against Ours

Visual comparison of our method against SinMPI, TMPI, and AdaMPI.

Comparison panels: Ours vs. TMPI, Ours vs. AdaMPI, Ours vs. SinMPI (two examples each).

FPS Rate on RTX 2070 Super

We compare end-to-end frame rates (FPS) against other methods at different resolutions.

Note: When rendering from already-predicted MPIs, the rendering speed is the same for all methods.
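As a rough illustration of how an end-to-end frame rate can be measured, the sketch below times repeated calls to a rendering function after a short warm-up. The helper `measure_fps` and its parameters are hypothetical and are not taken from the RT-MPINet code. For GPU workloads the callable would also need to wait for the device to finish (for example by calling `torch.cuda.synchronize()` in PyTorch) before returning, otherwise only kernel launch time is measured.

```python
import time

def measure_fps(render_fn, num_frames=100, warmup=10):
    """Return frames per second for a zero-argument callable (hypothetical helper)."""
    # Warm-up iterations exclude one-off costs such as caching or lazy initialisation.
    for _ in range(warmup):
        render_fn()
    start = time.perf_counter()
    for _ in range(num_frames):
        render_fn()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed
```

To time the full pipeline, `render_fn` would wrap both MPI prediction and rendering for one input image; wrapping only the rendering step corresponds to the case described in the note above.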

BibTeX

@inproceedings{gond2025rtmpi,
  title={Real-Time View Synthesis with Multiplane Image Network using Multimodal Supervision},
  author={Gond, Manu and Shamshirgarha, Mohammadreza and Zerman, Emin and Knorr, Sebastian and Sj{\"o}str{\"o}m, M{\aa}rten},
  booktitle={2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP)},
  pages={},
  year={2025},
  organization={IEEE}
}