GlobalSplat

GlobalSplat is a feed-forward 3D Gaussian Splatting method that learns a compact set of global scene tokens instead of allocating primitives per pixel. By aligning first and decoding later, it produces globally consistent reconstructions with as few as 2K-32K Gaussians, a tiny disk footprint, and fast single-pass inference, while matching or surpassing the quality of dense baselines.

Baseline Per-Pixel Approaches

Baseline feed-forward 3DGS pipelines with dense per-pixel primitive allocation.

GlobalSplat (Ours)

GlobalSplat: align-first, decode-later feed-forward 3D Gaussian Splatting.

Latent Token Aggregation for Sparse 3DGS

Existing feed-forward 3DGS pipelines rely on view-centric, per-pixel primitive allocation, baking redundancy into the representation. In contrast, GlobalSplat aggregates multi-view inputs into a fixed set of global latent scene tokens before decoding geometry.
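The aggregation idea can be illustrated with a toy cross-attention step in which a fixed set of latent tokens queries the features of all input views. This is a minimal sketch under stated assumptions, not the GlobalSplat architecture: the names `cross_attend` and `random_tokens`, the dimensions, and the absence of learned projections are all hypothetical simplifications. It only shows that the number of fused scene tokens stays fixed as more views are added.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(latents, features):
    """Each latent token (query) attends over all view features (keys = values).

    Hypothetical simplification: no learned query/key/value projections.
    """
    dim = len(latents[0])
    fused = []
    for q in latents:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in features]
        weights = softmax(scores)
        fused.append([sum(w * f[c] for w, f in zip(weights, features)) for c in range(dim)])
    return fused

def random_tokens(n, dim, rng):
    """Stand-in for image features or learned latents (random values)."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
DIM, NUM_LATENTS, TOKENS_PER_VIEW = 8, 4, 16   # toy sizes, chosen for illustration
latents = random_tokens(NUM_LATENTS, DIM, rng)

for num_views in (2, 4, 8):
    features = random_tokens(num_views * TOKENS_PER_VIEW, DIM, rng)
    fused = cross_attend(latents, features)
    print(num_views, "views ->", len(fused), "scene tokens")
```

The input grows from 32 to 128 feature tokens, yet the output always contains `NUM_LATENTS` tokens, which is what allows a fixed Gaussian budget downstream.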

Gaussian centers visualized as a point cloud, showing sparse allocation.

Sparse 3D Support

Primitives are placed only in occupied regions of 3D space, yielding a sparse representation without committing to a dense per-pixel grid.

Gaussians visualized as disks according to their spatial scale.

Adaptive Allocation

By decoding from a global scene context without a fixed grid, GlobalSplat places primitives only at occupied 3D locations. It remains sparse while covering low-frequency regions with larger Gaussians.

QUALITATIVE RESULTS

RE10K & ACID COMPARISON

Qualitative comparison on RealEstate10K and ACID against baseline methods. GlobalSplat successfully preserves high-frequency details and multi-view consistency despite a strictly constrained primitive budget.

12-VIEWS QUALITATIVE VIDEO COMPARISON

Each panel compares the baselines, our method, and ground truth.

COMPACTNESS ABLATION (RE10K)

We compare our 2K, 16K, and 32K Gaussian variants to visualize the quality-compactness trade-off. As the Gaussian budget increases, detail improves while the representation remains compact.

ULTRA-COMPACT COMPARISON

We directly compare C3G and our 2K variant to visualize performance under a very small representation budget. Even in this ultra-compact regime, our method retains higher rendering quality.

3D GEOMETRY VISUALIZATION

Visualization of the Gaussian means predicted by our 32K variant, highlighting compactness, dynamic allocation, and geometric coherence.

QUANTITATIVE RESULTS

Radar chart comparing quality, compactness, and efficiency across methods.

REALESTATE10K METRICS

ACID GENERALIZATION METRICS

COMPACTNESS TRADE-OFF

ABLATION STUDY

Abstract

The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly due to the reliance on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or primitive-aligned. By unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size increases and global consistency becomes fragile.

To this end, we introduce GlobalSplat, a framework built on the principle of align first, decode later. Our approach learns a compact, global, latent scene representation that encodes multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. Using a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat prevents representation bloat by design. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while using as few as 2K-32K Gaussians, significantly fewer than dense pipelines require, with a light 4 MB footprint. Further, GlobalSplat enables significantly faster inference than the baselines, running in under 78 milliseconds in a single forward pass.

Architecture

Method: Global Latent Alignment

The GlobalSplat Solution: Align First, Decode Later

Scene-Centric Alignment

Fuses multi-view inputs into a fixed number of global latent scene tokens before decoding geometry, maintaining a strict Gaussian budget.
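To make the budget argument concrete, the sketch below contrasts how the primitive count scales under per-pixel allocation versus a fixed token budget. The 256x256 resolution is an assumption inferred from the DepthSplat rows of the results tables (786K Gaussians for 12 views is about 65,536 per view), and the function names are hypothetical.

```python
PIXELS_PER_VIEW = 256 * 256  # assumption: inferred from 786K Gaussians / 12 views

def per_pixel_gaussians(num_views, pixels_per_view=PIXELS_PER_VIEW):
    """One Gaussian per input pixel: the count grows linearly with the views."""
    return num_views * pixels_per_view

def global_token_gaussians(num_latents=2048, splats_per_token=8):
    """Fixed budget: independent of how many views are fed in."""
    return num_latents * splats_per_token

for views in (12, 24, 36):
    dense = per_pixel_gaussians(views)
    compact = global_token_gaussians()
    print(f"{views} views: per-pixel {dense:,} vs global tokens {compact:,} "
          f"({dense / compact:.0f}x)")
```

Under these assumptions the per-pixel count reproduces the 786K, 1572K, and 2359K figures reported for DepthSplat, while the 2,048 x 8 token configuration stays at 16,384 Gaussians regardless of view count.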

Dual-Branch Encoder

Iteratively processes latent tokens through parallel geometry and appearance streams to prevent texture from masking structural errors.

Coarse-to-Fine Curriculum

Progressively increases Gaussian capacity per latent slot during training to stabilize optimization and prevent representation bloat.
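A coarse-to-fine curriculum of this kind could be expressed as a schedule over the number of splats decoded per latent token. The page does not specify the actual schedule, so the doubling rule, the `splats_per_token` helper, and the 100,000-step training length below are purely illustrative assumptions.

```python
def splats_per_token(step, total_steps, max_splats=8):
    """Hypothetical schedule: start at 1 splat per token and double at evenly
    spaced stages until max_splats (assumed to be a power of two)."""
    num_stages = max_splats.bit_length()          # 8 -> 4 stages: 1, 2, 4, 8
    stage = min(step * num_stages // total_steps, num_stages - 1)
    return 1 << stage

NUM_LATENTS = 2048  # latent count of the 16K configuration in the compactness table

for step in (0, 25_000, 50_000, 75_000, 99_999):
    k = splats_per_token(step, total_steps=100_000)  # assumed training length
    print(f"step {step:>6}: {k} splats/token -> {NUM_LATENTS * k:,} Gaussians")
```

The alternative, decoding the full capacity from the first step, corresponds to the "Direct full-capacity prediction" row in the ablation table.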

BibTeX

@article{globalsplat,
  title   = {GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens},
  author  = {Itkin, Roni and Issachar, Noam and Keypur, Yehonatan and Chen, Xingyu and Chen, Anpei and Benaim, Sagie},
  journal = {arXiv preprint arXiv:2604.15284},
  year    = {2026}
}

Method	12 Views				24 Views				36 Views
Method	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓
LVSM (non-GS)	28.65	0.898	0.095	–	27.24	0.874	0.112	–	26.38	0.855	0.126	–
NoPoSplat	21.26	0.667	0.200	602	21.24	0.664	0.200	1204	21.19	0.663	0.200	1806
AnySplat	23.06	0.807	0.215	1500	24.11	0.838	0.198	2636	24.20	0.842	0.192	3309
EcoSplat	–	–	–	–	24.72	0.822	0.183	78	–	–	–	–
DepthSplat	21.35	0.809	0.190	786	19.66	0.743	0.239	1572	18.84	0.704	0.268	2359
GGN	18.24	0.661	0.327	138	17.20	0.634	0.343	512	–	–	–	–
Zpressor	28.46	0.910	0.098	393	28.51	0.911	0.097	393	28.50	0.911	0.097	393
C3G	23.61	0.740	0.203	2	23.80	0.747	0.198	2	23.81	0.747	0.199	2
GlobalSplat (Ours)	28.57	0.885	0.138	16	28.53	0.883	0.140	16	28.45	0.880	0.144	16

Method	12 Views				24 Views				36 Views
Method	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓
LVSM (non-GS)	28.65	0.898	0.095	–	27.24	0.874	0.112	–	26.38	0.855	0.126	–
NoPoSplat	21.26	0.667	0.200	602	21.24	0.664	0.200	1204	21.19	0.663	0.200	1806
AnySplat	23.06	0.807	0.215	1500	24.11	0.838	0.198	2636	24.20	0.842	0.192	3309
EcoSplat	–	–	–	–	24.72	0.822	0.183	78	–	–	–	–
DepthSplat	21.35	0.809	0.190	786	19.66	0.743	0.239	1572	18.84	0.704	0.268	2359
GGN	20.11	0.71	0.27	278	18.50	0.68	0.299	385	17.76	0.664	0.311	466
Zpressor6	28.46	0.910	0.098	393	28.51	0.911	0.097	393	28.50	0.911	0.097	393
Zpressor3	23.63	0.846	0.157	197	23.65	0.846	0.157	197	23.65	0.846	0.157	197
C3G	23.61	0.740	0.203	2	23.80	0.747	0.198	2	23.81	0.747	0.199	2
GlobalSplat 2K (Ours)	26.83	0.838	0.198	2	26.84	0.838	0.198	2	26.84	0.838	0.200	2
GlobalSplat 16K (Ours)	28.57	0.885	0.138	16	28.53	0.883	0.140	16	28.45	0.880	0.144	16
GlobalSplat 32K (Ours)	29.54	0.903	0.121	32	29.48	0.901	0.122	32	29.39	0.899	0.126	32

Method	12 Views				24 Views				36 Views
Method	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓	PSNR ↑	SSIM ↑	LPIPS ↓	#Gaussians (K) ↓
LVSM (non-GS)	29.23	0.849	0.142	–	28.29	0.826	0.161	–	27.61	0.807	0.178	–
DepthSplat	21.45	0.769	0.220	786	20.15	0.711	0.258	1572	19.60	0.681	0.279	2359
GGN	21.99	0.686	0.295	287	20.90	0.657	0.314	396	20.43	0.644	0.323	475
Zpressor	28.44	0.859	0.140	393	28.53	0.860	0.138	393	28.45	0.859	0.139	393
C3G	22.24	0.598	0.332	2	22.24	0.598	0.331	2	22.20	0.598	0.333	2
GlobalSplat (Ours 16K)	28.04	0.815	0.207	16	28.03	0.813	0.208	16	27.99	0.810	0.213	16

Variant	PSNR ↑	SSIM ↑	LPIPS ↓
Ours (full)	28.57	0.885	0.139
Plücker only	28.30	0.880	0.140
w/o consistency loss	28.15	0.876	0.143
Single-stream	28.02	0.873	0.151
Direct full-capacity prediction	27.69	0.867	0.150

Total #Gaussians	#Latents	Splats/Token	PSNR ↑	SSIM ↑	LPIPS ↓
2,048	256	8	25.25	0.785	0.250
2,048	2,048	1	26.83	0.838	0.198
16,384	2,048	8	28.57	0.885	0.138
32,768	2,048	16	28.58	0.884	0.135
32,768	4,096	8	29.54	0.903	0.121
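In the compactness table above, each total is the product of the latent count and the splats decoded per token. The sketch below recomputes those totals and adds a rough size estimate. The storage layout is an assumption, since the page does not state one: 59 float32 values per Gaussian, as in the common 3DGS parameterization with degree-3 spherical harmonics and no normals.

```python
# Assumed layout: position, scale, rotation quaternion, opacity, degree-3 SH colour.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_FLOAT = 4  # float32, an assumption

def total_gaussians(num_latents, splats_per_token):
    """Gaussian budget implied by a latent/splat configuration."""
    return num_latents * splats_per_token

def footprint_mb(num_gaussians):
    """Uncompressed size in megabytes (10^6 bytes) under the assumed layout."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e6

# (#Latents, Splats/Token) pairs taken from the table.
configs = [(256, 8), (2048, 1), (2048, 8), (2048, 16), (4096, 8)]

for latents, splats in configs:
    n = total_gaussians(latents, splats)
    print(f"{latents:>5} latents x {splats:>2} splats = {n:>6,} Gaussians, "
          f"~{footprint_mb(n):.2f} MB")
```

Under this assumed layout, the 16,384-Gaussian configuration comes to roughly 3.9 MB, in the same range as the reported 3.8 MB on disk. The actual file format may differ.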

Metric	LVSM	DepthSplat	Zpressor	C3G	GGN	GlobalSplat (Ours)
Peak Mem (GB)	4.60	29.84	3.70	6.04	25.08	1.79
Inf. Time (ms)	940.00	669.50	194.20	387.14	1800.64	77.88
Size on Disk (MB)	–	534	134	0.1	174	3.8
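The efficiency figures in the table above can be turned into relative factors with a few lines of arithmetic. The values are copied from the table. The dictionary layout and the `relative_to_ours` helper are illustrative names only.

```python
# Peak memory (GB) and inference time (ms), copied from the efficiency table.
metrics = {
    "LVSM":               {"mem_gb": 4.60,  "time_ms": 940.00},
    "DepthSplat":         {"mem_gb": 29.84, "time_ms": 669.50},
    "Zpressor":           {"mem_gb": 3.70,  "time_ms": 194.20},
    "C3G":                {"mem_gb": 6.04,  "time_ms": 387.14},
    "GGN":                {"mem_gb": 25.08, "time_ms": 1800.64},
    "GlobalSplat (Ours)": {"mem_gb": 1.79,  "time_ms": 77.88},
}

OURS = "GlobalSplat (Ours)"

def relative_to_ours(name):
    """Return (memory ratio, time ratio) of a baseline relative to GlobalSplat."""
    base, ours = metrics[name], metrics[OURS]
    return base["mem_gb"] / ours["mem_gb"], base["time_ms"] / ours["time_ms"]

for name in metrics:
    if name == OURS:
        continue
    mem_ratio, time_ratio = relative_to_ours(name)
    print(f"{name:<11} {mem_ratio:5.1f}x peak memory, {time_ratio:5.1f}x inference time")
```

Every baseline ratio is above 1 on both axes. The closest baseline, Zpressor, uses about 2.1x the peak memory and takes about 2.5x as long per forward pass.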