MINR: Implicit Neural Representations with Masked Image Modelling

Seoul National University
OOD-CV Workshop, in conjunction with ICCV'23

Abstract

Self-supervised learning methods like masked autoencoders (MAE) have shown significant promise in learning robust feature representations, particularly in image reconstruction-based pretraining tasks. However, their performance often depends strongly on the masking strategy used during training and can degrade when applied to out-of-distribution data.

To address these limitations, we introduce Masked Implicit Neural Representations (MINR), a framework that synergizes implicit neural representations with masked image modeling. MINR learns a continuous function to represent images, enabling more robust and generalizable reconstructions irrespective of the masking strategy. Our experiments demonstrate that MINR outperforms MAE not only in in-domain scenarios but also in out-of-distribution settings, while reducing model complexity. The versatility of MINR extends to various self-supervised learning applications, confirming its utility as a robust and efficient alternative to existing frameworks.
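The abstract does not detail MINR's architecture, so the following is only a minimal PyTorch sketch of the general idea (encode visible pixels into a latent code, then decode RGB at arbitrary continuous coordinates), not the authors' implementation. The names MaskedINR and CoordMLP, the mean-pooled set encoder, and the 25% visible-pixel ratio are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Tiny MLP mapping (x, y) coordinates to RGB -- the INR decoder,
    conditioned on a latent code summarizing the visible pixels."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, coords, z):
        # coords: (B, N, 2) in [-1, 1]; z: (B, latent_dim)
        z = z.unsqueeze(1).expand(-1, coords.size(1), -1)
        return self.net(torch.cat([coords, z], dim=-1))  # (B, N, 3)

class MaskedINR(nn.Module):
    """Encode visible (coord, rgb) pairs, then decode RGB at *all*
    coordinates with a continuous function -- masked pixels included."""
    def __init__(self, latent_dim=128):
        super().__init__()
        # Hypothetical encoder: per-pixel (coord, rgb) -> token, mean-pooled.
        self.encoder = nn.Sequential(
            nn.Linear(2 + 3, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = CoordMLP(latent_dim)

    def forward(self, coords_vis, rgb_vis, coords_all):
        tokens = self.encoder(torch.cat([coords_vis, rgb_vis], dim=-1))
        z = tokens.mean(dim=1)                 # (B, latent_dim)
        return self.decoder(coords_all, z)

# One training step: reconstruct the full image from a visible subset.
B, H, W = 4, 32, 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
coords_all = torch.stack([xs, ys], dim=-1).view(1, -1, 2).expand(B, -1, -1)
image = torch.rand(B, H * W, 3)                      # stand-in for real images
keep = torch.rand(B, H * W).argsort(dim=1)[:, : (H * W) // 4]  # keep 25%
coords_vis = torch.gather(coords_all, 1, keep.unsqueeze(-1).expand(-1, -1, 2))
rgb_vis = torch.gather(image, 1, keep.unsqueeze(-1).expand(-1, -1, 3))

model = MaskedINR()
pred = model(coords_vis, rgb_vis, coords_all)        # (B, H*W, 3)
loss = nn.functional.mse_loss(pred, image)           # loss over all pixels
loss.backward()
```

Because the decoder is a function of continuous coordinates rather than fixed patch positions, the same model can reconstruct any subset of pixels, which is the property the abstract attributes to MINR: robustness irrespective of the masking strategy.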

Quantitative Results of Mask Reconstruction

Comparison of PSNR performance on in-domain (top) and out-of-domain (bottom) mask reconstruction. Dataset acronyms denote CelebA, Imagenette, and MIT-Indoor67, respectively. The arrow (→) in the bottom table indicates source-to-target domain transfer.
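For reference, PSNR here is the standard peak signal-to-noise ratio computed from the mean squared reconstruction error; a minimal sketch, assuming images are scaled to [0, 1]:

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```

Higher PSNR values indicate lower reconstruction error.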