Spatial intelligence, encompassing 3D reconstruction, perception, and reasoning, is fundamental to applications such as robotics, aerial imaging, and extended reality. A key enabler is the real-time, accurate estimation of core 3D attributes (camera parameters, point clouds, depth maps, and 3D point tracks) from unstructured or streaming imagery. Inspired by the success of large foundation models in language and 2D vision, a new class of end-to-end 3D geometric foundation models (GFMs) has emerged, directly predicting dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters. Since late 2023, the field has exploded with diverse variants. With the rapid proliferation of 3D GFMs, we ask:
Method | DTU ACC ↓ | DTU Comp ↓ | DTU NC ↑ | 7-Scenes ACC ↓ | 7-Scenes Comp ↓ | 7-Scenes NC ↑ | NRGBD ACC ↓ | NRGBD Comp ↓ | NRGBD NC ↑ | ScanNet ACC ↓ | ScanNet Comp ↓ | ScanNet NC ↑ | TUM-RGBD ACC ↓ | TUM-RGBD Comp ↓ | TUM-RGBD NC ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DUSt3R/LSM | 1.731 | 1.936 | 0.786 | 0.146 | 0.181 | 0.744 | 0.144 | 0.154 | 0.867 | 0.474 | 0.420 | 0.714 | 1.108 | 0.746 | 0.724 |
MASt3R | 1.895 | 2.003 | 0.788 | 0.262 | 0.254 | 0.732 | 0.113 | 0.102 | 0.810 | 0.467 | 0.389 | 0.701 | 0.738 | 0.747 | 0.739 |
Spann3R | 6.275 | 5.460 | 0.705 | 0.255 | 0.188 | 0.653 | 0.262 | 0.262 | 0.628 | 0.487 | 0.408 | 0.617 | 1.561 | 1.002 | 0.621 |
FLARE | 3.406 | 3.950 | 0.491 | 0.152 | 0.154 | 0.704 | 0.060 | 0.056 | 0.839 | 0.357 | 0.302 | 0.561 | 0.515 | 0.486 | 0.677 |
CUT3R | 6.885 | 5.022 | 0.727 | 0.118 | 0.142 | 0.717 | 0.104 | 0.078 | 0.828 | 0.260 | 0.238 | 0.692 | 0.587 | 0.553 | 0.683 |
VGGT | 2.716 | 2.301 | 0.765 | 0.077 | 0.080 | 0.762 | 0.069 | 0.071 | 0.903 | 0.063 | 0.079 | 0.798 | 0.385 | 0.331 | 0.747 |
Fast3R | 4.493 | 3.681 | 0.735 | 0.149 | 0.116 | 0.692 | 0.361 | 0.201 | 0.782 | 0.546 | 0.306 | 0.621 | 0.955 | 0.630 | 0.627 |
MonST3R | 20.145 | 10.322 | 0.603 | 0.276 | 0.277 | 0.677 | 0.471 | 0.458 | 0.659 | 0.623 | 0.541 | 0.594 | 1.688 | 1.031 | 0.670 |
DUSt3R/LSM | 1.284 | 1.349 | 0.720 | 0.022 | 0.029 | 0.709 | 0.035 | 0.024 | 0.838 | 0.026 | 0.022 | 0.784 | 0.620 | 0.474 | 0.718 |
MASt3R | 1.374 | 1.409 | 0.723 | 0.025 | 0.028 | 0.697 | 0.043 | 0.042 | 0.809 | 0.035 | 0.020 | 0.757 | 0.209 | 0.211 | 0.708 |
Spann3R | 6.505 | 3.110 | 0.668 | 0.176 | 0.087 | 0.599 | 0.343 | 0.073 | 0.661 | 0.262 | 0.118 | 0.606 | 0.635 | 0.930 | 0.662 |
CUT3R | 4.710 | 2.413 | 0.699 | 0.025 | 0.028 | 0.665 | 0.076 | 0.029 | 0.782 | 0.042 | 0.030 | 0.693 | 0.740 | 0.595 | 0.665 |
VGGT | 2.103 | 1.925 | 0.748 | 0.019 | 0.032 | 0.659 | 0.015 | 0.012 | 0.874 | 0.016 | 0.017 | 0.728 | 0.065 | 0.091 | 0.692 |
Fast3R | 3.647 | 2.319 | 0.725 | 0.046 | 0.057 | 0.636 | 0.059 | 0.028 | 0.772 | 0.200 | 0.097 | 0.625 | 0.711 | 0.337 | 0.610 |
MonST3R | 14.455 | 7.508 | 0.636 | 0.100 | 0.091 | 0.648 | 0.336 | 0.246 | 0.665 | 0.346 | 0.293 | 0.599 | 1.138 | 0.948 | 0.591 |
Method | CO3Dv2 ATE ↓ | CO3Dv2 RPEtrans ↓ | CO3Dv2 RPErot ↓ | ScanNet & ADT & TUM-Dyn. ATE ↓ | ScanNet & ADT & TUM-Dyn. RPEtrans ↓ | ScanNet & ADT & TUM-Dyn. RPErot ↓ | KITTI Odometry ATE ↓ | KITTI Odometry RPEtrans ↓ | KITTI Odometry RPErot ↓ | Bonn & Sintel & Rel10k ATE ↓ | Bonn & Sintel & Rel10k RPEtrans ↓ | Bonn & Sintel & Rel10k RPErot ↓ | ACID & Syndrone ATE ↓ | ACID & Syndrone RPEtrans ↓ | ACID & Syndrone RPErot ↓ | ULTRRA RPEtrans ↓ | ULTRRA RPErot ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DUSt3R/LSM | 0.903 | 1.325 | 4.312 | 0.139 | 0.102 | 2.394 | 2.935 | 1.135 | 2.832 | 0.141 | 0.641 | 8.038 | 0.370 | 0.607 | 4.099 | 70.350 | 70.390 |
MASt3R | 0.987 | 1.407 | 3.999 | 0.131 | 0.098 | 2.889 | 1.492 | 0.399 | 0.407 | 0.127 | 0.642 | 7.714 | 0.372 | 0.607 | 3.849 | 71.519 | 78.036 |
Spann3R | 0.915 | 1.295 | 6.352 | 0.294 | 0.164 | 3.778 | 15.848 | 5.031 | 4.645 | 0.140 | 0.633 | 7.817 | 0.351 | 0.599 | 3.272 | 40.503 | 38.366 |
CUT3R | 0.847 | 1.209 | 6.361 | 0.185 | 0.133 | 4.471 | 2.421 | 0.747 | 0.669 | 0.109 | 0.633 | 7.569 | 0.303 | 0.593 | 2.864 | 55.135 | 54.395 |
VGGT | 1.639 | 8.702 | 71.350 | 0.654 | 0.425 | 30.787 | 5.012 | 3.546 | 3.885 | 0.062 | 0.111 | 0.592 | 0.300 | 0.462 | 0.818 | 53.688 | 110.521 |
Fast3R | 0.698 | 1.035 | 4.352 | 0.499 | 0.391 | 23.739 | 22.109 | 7.573 | 7.366 | 0.136 | 0.636 | 8.700 | 0.378 | 0.637 | 3.653 | 51.149 | 54.150 |
MonST3R | 2.456 | 3.327 | 23.458 | 0.448 | 0.286 | 12.817 | 2.426 | 0.782 | 0.949 | 0.118 | 0.632 | 6.666 | 0.320 | 0.568 | 2.167 | 70.388 | 77.325 |
Align3R | 1.027 | 1.550 | 6.499 | 0.425 | 0.215 | 9.430 | 4.611 | 0.817 | 0.600 | 0.134 | 0.628 | 6.810 | 0.378 | 0.550 | 2.414 | 72.010 | 70.638 |
Easi3R | 0.857 | 1.271 | 5.052 | 0.174 | 0.103 | 2.872 | 3.625 | 0.919 | 0.615 | 0.125 | 0.633 | 7.603 | 0.356 | 0.581 | 3.508 | 62.061 | 71.060 |
Geo4D | 0.798 | 1.264 | 5.692 | 0.436 | 0.175 | 10.565 | 1.662 | 0.497 | 0.696 | 0.151 | 0.457 | 2.652 | 0.391 | 0.622 | 0.964 | ✘ | ✘ |
Aether | - | - | - | 0.067 | 0.033 | 1.619 | 1.553 | 0.744 | 0.744 | - | - | - | - | - | - | ✘ | ✘ |
Method | DTU AbsRel ↓ | DTU δ<1.03 ↑ | ScanNet AbsRel ↓ | ScanNet δ<1.03 ↑ | KITTI AbsRel ↓ | KITTI δ<1.03 ↑ | ETH3D AbsRel ↓ | ETH3D δ<1.03 ↑ | T&T AbsRel ↓ | T&T δ<1.03 ↑ |
---|---|---|---|---|---|---|---|---|---|---|
Robust MVD | 2.490 | 80.056 | 7.468 | 35.651 | 9.419 | 30.505 | 9.302 | 42.909 | 6.379 | 58.409 |
DUSt3R/LSM | 2.741 | 75.685 | 4.732 | 61.337 | 9.113 | 39.495 | 3.132 | 74.851 | 3.106 | 77.033 |
MASt3R | 3.343 | 68.301 | 5.949 | 54.516 | 9.542 | 46.805 | 2.471 | 81.291 | 2.381 | 82.262 |
Spann3R | 6.431 | 38.339 | 7.779 | 33.713 | 10.195 | 30.858 | 5.121 | 54.708 | 5.580 | 52.812 |
CUT3R | 6.200 | 47.421 | 8.231 | 39.464 | 23.849 | 12.087 | 5.224 | 59.864 | 4.594 | 56.773 |
VGGT | 1.085 | 94.305 | 4.386 | 64.968 | 9.436 | 41.309 | 1.782 | 86.337 | 2.075 | 85.174 |
Fast3R | 3.940 | 62.120 | 6.271 | 50.283 | 13.390 | 26.734 | 4.692 | 62.663 | 4.423 | 64.873 |
MonST3R | 5.346 | 67.977 | 5.557 | 53.309 | 10.191 | 40.274 | 3.368 | 72.624 | 3.289 | 72.491 |
Robust MVD | 2.242 | 84.574 | 8.016 | 35.924 | 10.846 | 25.534 | 10.944 | 35.526 | 6.982 | 60.643 |
MASt3R | 84.904 | 0.000 | 93.584 | 0.000 | 99.069 | 0.000 | 97.021 | 0.000 | 98.234 | 0.000 |
CUT3R | 84.904 | 0.000 | 93.584 | 0.000 | 99.069 | 0.000 | 97.022 | 0.000 | 98.234 | 0.000 |
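The two depth columns above can be read with the following minimal sketch, which assumes the common definitions of these metrics: AbsRel is the mean of |pred - gt| / gt over valid pixels, and δ<τ is the fraction of valid pixels whose ratio max(pred/gt, gt/pred) falls below τ. The function name `depth_metrics`, the flat-list input, and the "gt > 0 means valid" convention are illustrative assumptions, not the benchmark's evaluation code. The tables appear to report both quantities as percentages, which this sketch does not apply.

```python
def depth_metrics(pred, gt, threshold=1.03):
    """Return (AbsRel, inlier ratio) over pixels with valid ground truth.

    pred, gt: flat sequences of depth values in the same units.
    A pixel is treated as valid when its ground-truth depth is positive
    (an assumption made for this sketch).
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no pixels with valid ground-truth depth")

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in valid) / len(valid)

    # Inlier ratio: share of pixels with max(pred/gt, gt/pred) < threshold.
    # Non-positive predictions are counted as outliers.
    inliers = sum(
        1 for p, g in valid if p > 0 and max(p / g, g / p) < threshold
    )
    return abs_rel, inliers / len(valid)


if __name__ == "__main__":
    abs_rel, ratio = depth_metrics([1.0, 2.0, 3.3, 5.0], [1.0, 2.0, 3.0, 0.0])
    print(f"AbsRel={abs_rel:.4f}, inlier ratio={ratio:.4f}")
```

Passing `threshold=1.25` gives the looser inlier ratio used in the video depth table.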
Method | Bonn AbsRel ↓ | Bonn δ<1.25 ↑ | TUM Dyn AbsRel ↓ | TUM Dyn δ<1.25 ↑ | KITTI AbsRel ↓ | KITTI δ<1.25 ↑ | PointOdyssey AbsRel ↓ | PointOdyssey δ<1.25 ↑ | Syndrone AbsRel ↓ | Syndrone δ<1.25 ↑ | Sintel AbsRel ↓ | Sintel δ<1.25 ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DepthAnyVideo | 0.515 | 25.3 | 0.184 | 84.6 | 0.074 | 95.3 | 0.417 | 61.7 | 0.299 | 83.1 | 0.455 | 47.9 |
VideoDepthAnything | 0.268 | 48.3 | 1.101 | 89.0 | 0.060 | 98.2 | 0.283 | 70.3 | 0.138 | 92.5 | 1.691 | 45.4 |
DepthCrafter | 0.107 | 88.3 | 0.159 | 79.5 | 0.120 | 86.2 | 0.144 | 81.3 | 0.380 | 87.5 | 0.354 | 58.2 |
Marigold | 0.329 | 52.2 | 0.600 | 32.8 | 0.332 | 43.3 | 0.346 | 47.5 | 1.331 | 16.8 | 0.417 | 45.4 |
DUSt3R/LSM | 0.174 | 83.5 | 0.187 | 79.2 | 0.124 | 84.9 | 0.168 | 77.8 | 0.063 | 96.9 | 0.475 | 59.1 |
MASt3R | 0.160 | 81.5 | 0.162 | 83.1 | 0.082 | 93.2 | 0.150 | 79.3 | 0.046 | 97.5 | 0.374 | 63.9 |
Spann3R | 0.205 | 77.4 | 0.204 | 70.6 | 0.449 | 49.1 | 0.303 | 58.4 | 0.241 | 74.5 | 0.587 | 43.3 |
CUT3R | 0.068 | 95.0 | 0.108 | 84.7 | 0.104 | 89.9 | 0.095 | 88.4 | 0.111 | 89.5 | 0.466 | 56.0 |
VGGT | 0.056 | 96.3 | 0.068 | 93.9 | 0.051 | 96.6 | 0.026 | 99.0 | 0.075 | 95.9 | 0.242 | 65.9 |
Fast3R | 0.232 | 69.4 | 0.221 | 71.1 | 0.308 | 46.8 | 0.271 | 66.2 | 0.368 | 44.8 | 0.565 | 48.7 |
MonST3R | 0.061 | 95.4 | 0.197 | 72.6 | 0.083 | 93.4 | 0.066 | 92.3 | 0.110 | 89.7 | 0.343 | 59.4 |
Align3R | 0.062 | 96.8 | 0.107 | 90.1 | 0.105 | 89.2 | 0.077 | 93.3 | 0.097 | 92.9 | 0.237 | 69.0 |
Easi3R | 0.061 | 95.8 | 0.192 | 76.9 | 0.150 | 76.2 | 0.143 | 82.1 | 0.095 | 94.0 | 0.323 | 53.9 |
Geo4D | 0.060 | 97.8 | 0.096 | 93.2 | 0.086 | 93.8 | 0.082 | 93.0 | 0.105 | 93.1 | 0.205 | 73.2 |
Aether | 0.582 | 61.2 | 0.192 | 80.6 | 0.065 | 96.2 | 0.123 | 87.9 | 0.145 | 91.1 | 0.343 | 69.4 |
GeometryCrafter | 0.061 | 96.8 | 0.115 | 87.7 | 0.410 | 53.8 | 0.124 | 83.6 | 0.123 | 90.8 | 0.280 | 72.4 |
MASt3R | 0.549 | 4.6 | 0.633 | 0.9 | 0.754 | 6.4 | 0.749 | 0.2 | 0.967 | 0 | 0.701 | 2.3 |
CUT3R | 0.097 | 90.3 | 0.135 | 80.6 | 0.118 | 87.4 | 0.127 | 88.1 | 0.824 | 0 | 1.020 | 23.6 |
Method | DTU PSNR ↑ | DTU SSIM ↑ | DTU LPIPS ↓ | RealEstate10k PSNR ↑ | RealEstate10k SSIM ↑ | RealEstate10k LPIPS ↓ | ScanNet++ PSNR ↑ | ScanNet++ SSIM ↑ | ScanNet++ LPIPS ↓ | ACID PSNR ↑ | ACID SSIM ↑ | ACID LPIPS ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LSM | 11.68 | 0.3294 | 0.5218 | 14.04 | 0.4388 | 0.4873 | 12.39 | 0.4596 | 0.5479 | 16.73 | 0.4562 | 0.4567 |
NoPoSplat | 17.91 | 0.6306 | 0.2810 | 24.53 | 0.8450 | 0.1634 | 22.15 | 0.7988 | 0.2359 | 25.35 | 0.7774 | 0.1875 |
FLARE | 17.01 | 0.5672 | 0.2901 | 22.15 | 0.7126 | 0.2363 | 23.19 | 0.8117 | 0.2201 | 22.44 | 0.6229 | 0.2818 |
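For readers unfamiliar with the PSNR column, the sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE), on pixel intensities assumed to lie in [0, 1]. The function name `psnr` and the flat-list input are illustrative assumptions; the benchmark's own evaluation code may differ in details such as color handling or masking.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    rendered, reference: flat sequences of pixel intensities.
    max_value: largest possible intensity (1.0 for normalized images,
    255 for 8-bit images).
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([0.5, 0.5], [0.4, 0.6]))
```

Because the scale is logarithmic, a 10 dB gap corresponds to a tenfold difference in mean squared error.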
Method | 2: Time ↓ | 2: GPU ↓ | 4: Time ↓ | 4: GPU ↓ | 8: Time ↓ | 8: GPU ↓ | 16: Time ↓ | 16: GPU ↓ | 32: Time ↓ | 32: GPU ↓ | 64: Time ↓ | 64: GPU ↓ | 128: Time ↓ | 128: GPU ↓ | 256: Time ↓ | 256: GPU ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DUSt3R | 0.35 ± 0.19 | 2.49 | 6.00 ± 0.30 | 2.6 | 13.96 ± 0.86 | 3.65 | 50.37 ± 2.28 | 8.38 | 196.81 ± 6.38 | 27.52 | OOM | OOM | OOM | OOM | OOM | OOM |
MASt3R | 9.43 ± 0.28 | 2.61 | 14.63 ± 0.52 | 2.68 | 21.38 ± 2.26 | 2.78 | 42.28 ± 9.06 | 3.35 | 117.77 ± 40.83 | 6.87 | 392.23 ± 184.36 | 28.78 | OOM | OOM | OOM | OOM |
Spann3R | 0.16 ± 0.12 | 2.79 | 0.28 ± 0.01 | 2.8 | 0.65 ± 0.00 | 2.81 | 1.38 ± 0.01 | 2.84 | 2.81 ± 0.07 | 2.89 | 5.51 ± 0.03 | 2.99 | 11.25 ± 0.16 | 3.19 | 23.64 ± 0.70 | 3.55 |
CUT3R | 0.19 ± 0.07 | 3.33 | 0.26 ± 0.04 | 3.38 | 0.42 ± 0.03 | 3.48 | 0.78 ± 0.03 | 3.65 | 1.50 ± 0.03 | 4.28 | 3.12 ± 0.31 | 5.54 | 5.76 ± 0.12 | 11.68 | 11.65 ± 0.16 | 17.36 |
VGGT | 0.32 ± 0.41 | 7.11 | 0.29 ± 0.40 | 7.72 | 0.24 ± 0.01 | 9.06 | 0.72 ± 0.49 | 10.29 | 2.35 ± 0.04 | 12.75 | 4.23 ± 0.07 | 17.66 | 11.76 ± 0.41 | 28.65 | 34.21 ± 2.51 | 50.92 |
Fast3R | 0.13 ± 0.14 | 4.05 | 0.11 ± 0.03 | 4.26 | 0.15 ± 0.02 | 4.75 | 0.30 ± 0.01 | 5.8 | 0.69 ± 0.02 | 7.25 | 1.78 ± 0.03 | 8.43 | 5.13 ± 0.06 | 10.91 | 16.55 ± 0.12 | 15.75 |
MonST3R | 0.32 ± 0.25 | 2.79 | 14.78 ± 0.52 | 4.8 | 18.77 ± 0.20 | 7.84 | 35.76 ± 0.35 | 8.9 | 73.19 ± 0.37 | 16.15 | 148.17 ± 0.99 | 32.99 | 605.83 ± 25.24 | 66.66 | OOM | OOM |
Easi3R | 0.35 ± 0.19 | 2.49 | 17.35 ± 1.10 | 3.41 | 24.18 ± 0.76 | 4.15 | 60.12 ± 2.67 | 7.69 | 137.16 ± 10.86 | 15.96 | 273.78 ± 2.08 | 32.53 | 901.05 ± 5.29 | 65.68 | OOM | OOM |
Takeaway 1: Current GFMs are promising but face significant challenges when learning from overly complex tasks. Recommendation: Carefully decomposing difficult tasks (e.g., jointly predicting geometry, pose, depth, and tracking) into simpler sub-problems can facilitate more effective learning, especially under limited 3D data.
Takeaway 2: Diverse, high-quality data is critical for strong generalization. To improve robustness in underrepresented domains, GFMs must be trained on data that covers broader distributions and includes metric-scale annotations.
Takeaway 3: No single backbone, whether feed-forward ViT or diffusion, dominates; architecture choice should align with task needs. Moreover, leveraging strong 2D feature extractors (e.g., DINO) substantially boosts 3D performance.
Takeaway 4: As GFMs scale to handle more views and complex tasks, efficiency becomes as critical as accuracy for enabling real-time 3D perception.
@article{cong2025e3dbench,
title={E3D-Bench: An End-to-End Benchmark for 3D Geometric Foundation Models},
author={Cong, Wenyan and Liang, Yiqing and Zhang, Yancheng and Yang, Ziyi and Wang, Yan and Ivanovic, Boris and Pavone, Marco and Chen, Chen and Wang, Zhangyang and Fan, Zhiwen},
journal={arXiv preprint arXiv:2506.01933},
year={2025}
}