GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis

Srikumar Sastry, Subash Khanal, Aayush Dhakal, Nathan Jacobs

CVPRW 2024

[paper] [code]

Abstract

We present GeoSynth, a model for synthesizing satellite images with global style and image-driven layout control. The global style control is via textual prompts or geographic location. These enable the specification of scene semantics or regional appearance, respectively, and can be used together. We train our model on a large dataset of paired satellite imagery, with automatically generated captions, and OpenStreetMap data. We evaluate various combinations of control inputs, including different types of layout controls. Results demonstrate that our model can generate diverse, high-quality images and exhibits excellent zero-shot generalization.

Method

Our goal is to train a suite of models capable of synthesizing satellite images (x) given a text prompt (τ), geographic location (l), and a control image (c). This is done by training diffusion models to learn the conditional distribution p(x | τ, l, c). To this end, we use Latent Diffusion Models (LDMs), which have shown state-of-the-art performance in conditional image synthesis. We use ControlNet to incorporate a layout image and fine-tune the pre-trained LDM. We use SatCLIP location embeddings to condition the LDM on geographic location. Finally, we use LLaVA-7b to generate text prompts, which control the style of the synthesized images.
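To make the conditioning setup concrete, below is a minimal sketch of layout- and text-conditioned inference with a ControlNet-augmented Stable Diffusion pipeline in HuggingFace diffusers. The checkpoint identifiers and file paths are hypothetical placeholders, not the released GeoSynth weights, and geographic conditioning via SatCLIP embeddings is omitted because its injection interface is specific to the released code.

```python
# A minimal sketch, assuming a ControlNet checkpoint trained on OSM layouts.
# Checkpoint names and paths are placeholders; see the official repo for the
# released GeoSynth weights and the SatCLIP location-conditioning hook.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Hypothetical checkpoint identifier for the layout-conditioned ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "path/to/geosynth-osm-controlnet", torch_dtype=torch.float16
)

# Pre-trained LDM backbone that the ControlNet branch attaches to.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Layout control image c, e.g. a rendered OpenStreetMap tile of the target area.
layout = load_image("path/to/osm_layout.png")

# Text prompt tau specifying the global style / scene semantics.
prompt = "Satellite image of a dense residential neighborhood beside a river"

# Sample x ~ p(x | tau, c); location conditioning (l) would additionally feed a
# SatCLIP embedding into the denoiser in the full model.
image = pipe(prompt, image=layout, num_inference_steps=50).images[0]
image.save("synthesized_tile.png")
```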

Geo-Aware Synthesis

Different Layouts