EPFL researchers unveil FG2 at CVPR 2025: a new AI model that cuts localization error by 28% for autonomous vehicles in GPS-denied environments.

by Brenden Burgess


Navigating the dense urban canyons of cities like San Francisco or New York can be a nightmare for GPS systems. Towering skyscrapers block and reflect satellite signals, leading to localization errors of tens of meters. For you and me, that could mean a missed turn. But for an autonomous vehicle or a delivery robot, this level of imprecision is the difference between a successful mission and an expensive failure. These machines need pinpoint precision to operate safely and effectively. Addressing this critical challenge, researchers at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland introduced a revolutionary new method for visual localization at CVPR 2025.

Their new paper, “FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching”, presents an AI model that dramatically improves the ability of a ground-level system, such as an autonomous car, to determine its exact position and orientation using only a camera and an aerial (or satellite) image. The new approach demonstrated a remarkable 28% reduction in mean localization error compared to the previous state of the art on a challenging public dataset.

The key takeaways:

  • Higher precision: The FG2 model reduces the mean localization error by 28% on the VIGOR cross-area test set, a challenging benchmark for this task.
  • Human intuition: Instead of relying on abstract scene descriptors, the model mimics human reasoning by matching fine-grained, semantically consistent features – such as curbs, crosswalks, and buildings – between a ground-level photo and an aerial map.
  • Improved interpretability: The method lets researchers “see” what the AI is “thinking” by visualizing exactly which features match between the ground and aerial images, a major step beyond previous “black box” models.
  • Weakly supervised learning: Remarkably, the model learns these complex, consistent feature correspondences without any direct correspondence labels. It succeeds using only the final camera pose as a supervision signal (a sketch of what such supervision could look like follows this list).
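
To make that last takeaway concrete, here is a minimal sketch of what pose-only supervision could look like. Nothing below is taken from the FG2 code base: the function name, the equal weighting of the two terms, and the tensor shapes are illustrative assumptions. The point is that when the whole matching pipeline is differentiable, a plain regression loss on the final pose is enough, and the feature correspondences emerge as a by-product.

```python
import torch

def pose_only_loss(pred_xy, pred_yaw, gt_xy, gt_yaw):
    """Supervise localization with the final camera pose alone.

    pred_xy, gt_xy: (B, 2) planar positions; pred_yaw, gt_yaw: (B,) headings.
    No correspondence labels appear anywhere in the loss.
    """
    # Translation error: Euclidean distance in the map plane.
    trans_err = torch.norm(pred_xy - gt_xy, dim=-1).mean()
    # Rotation error: wrap the yaw difference into (-pi, pi] before penalizing.
    yaw_diff = torch.atan2(torch.sin(pred_yaw - gt_yaw),
                           torch.cos(pred_yaw - gt_yaw))
    rot_err = yaw_diff.abs().mean()
    # Equal weighting of the two terms is an assumption, not a paper detail.
    return trans_err + rot_err
```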

The challenge: seeing the world from two different angles

The central problem of cross-view localization is the dramatic difference in perspective between a street-level camera and a top-down aerial or satellite view. A building facade seen from the ground looks completely different from its rooftop signature in an aerial image. Existing methods have struggled with this. Some create a single “descriptor” for the whole scene, but this abstract approach does not reflect how humans naturally localize themselves by identifying specific landmarks. Other methods transform the ground image into a bird’s-eye view (BEV) but are often limited to the ground plane, ignoring crucial vertical structures such as buildings.

FG2: matching fine-grained features

The EPFL team’s FG2 method introduces a more intuitive and effective process. It aligns two sets of points: one generated from the ground-level image and another sampled from the aerial map.

Here is a breakdown of their innovative pipeline:

  1. 3D mapping: The process begins by taking ground-level image features and lifting them into a 3D point cloud centered on the camera. This creates a 3D representation of the immediate environment.
  2. Intelligent pooling to BEV: This is where the magic happens. Instead of simply flattening the 3D data, the model learns to intelligently select the most important features along the vertical (height) dimension for each point. It essentially asks: “For this spot on the map, is the road marking at ground level the best landmark, or is it the roof edge of this building?” This selection step is crucial because it lets the model correctly associate features such as building facades with their corresponding rooftops in the aerial view (see the first sketch after this list).
  3. Feature matching and pose estimation: Once the ground and aerial views are represented as 2D point sets with rich feature descriptors, the model computes the similarity between them. It then samples a sparse set of the most confident matches and uses a classical geometric algorithm called Procrustes alignment to compute the precise 3-DoF pose (x, y, and yaw); this step is also sketched below.
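
To illustrate step 2, here is one way such learned vertical selection could be implemented in PyTorch: score every height slice with a 1×1×1 convolution, then softmax along the height axis so each BEV cell soft-selects the slice that matters most. The module and its single-layer scorer are assumptions for illustration, not the actual FG2 architecture.

```python
import torch
import torch.nn as nn

class HeightSelect(nn.Module):
    """Collapse a 3D feature volume to a BEV plane by letting the network
    decide, per BEV cell, which height slice carries the best landmark.
    Minimal sketch; not the FG2 architecture."""

    def __init__(self, channels: int):
        super().__init__()
        # One score per (height, x, y) location in the volume.
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, C, H, X, Y), where H is the vertical axis.
        logits = self.score(volume)            # (B, 1, H, X, Y)
        attn = torch.softmax(logits, dim=2)    # soft selection along height
        return (attn * volume).sum(dim=2)      # (B, C, X, Y) BEV features
```

For a facade at street level and its rooftop in the aerial view, the softmax is free to put its weight at different heights in the two representations, which is precisely what allows them to line up.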

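For step 3, the pose can be recovered in closed form once matches are in hand. The sketch below is a generic weighted 2D Procrustes (Kabsch) alignment; the function signature and the use of match confidences as weights are assumptions, but the underlying algorithm is the classical one the paper names.

```python
import numpy as np

def procrustes_align_2d(ground_pts, aerial_pts, weights=None):
    """Estimate a 3-DoF pose (x, y, yaw) aligning matched 2D point sets.

    ground_pts, aerial_pts: (N, 2) arrays of matched BEV points.
    weights: optional (N,) match-confidence weights.
    Returns R (2x2), t (2,), and yaw such that aerial ≈ ground @ R.T + t.
    """
    if weights is None:
        weights = np.ones(len(ground_pts))
    w = weights / weights.sum()

    # Weighted centroids of both point sets.
    mu_g = (w[:, None] * ground_pts).sum(axis=0)
    mu_a = (w[:, None] * aerial_pts).sum(axis=0)

    # Weighted cross-covariance of the centered point sets.
    H = (ground_pts - mu_g).T @ (w[:, None] * (aerial_pts - mu_a))

    # SVD yields the optimal rotation (orthogonal Procrustes / Kabsch).
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_a - R @ mu_g

    yaw = np.arctan2(R[1, 0], R[0, 0])
    return R, t, yaw
```

Given, say, a few dozen confident matches between the ground-derived and aerial point sets, this returns the rotation, translation, and yaw that superimpose them in the weighted least-squares sense.
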
Unprecedented performance and interpretability

The results speak for themselves. On the challenging VIGOR dataset, whose cross-area test set contains images from cities not seen during training, FG2 reduced the mean localization error by 28% compared to the best previous method. It also demonstrated superior generalization on the KITTI dataset, a staple of autonomous driving research.

Perhaps even more importantly, the FG2 model offers a new level of transparency. By visualizing the matched points, the researchers showed that the model learns semantically consistent correspondences without ever being explicitly told to. For example, the system correctly matches zebra crossings, road markings, and even building facades in the ground view to their corresponding locations on the aerial map. This interpretability is extremely valuable for building trust in safety-critical autonomous systems.
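
Reproducing that kind of inspection takes only a few lines. The helper below is entirely illustrative (not from the paper): it draws matched points in the same color on the ground and aerial images so correspondences can be checked by eye.

```python
import matplotlib.pyplot as plt
import numpy as np

def show_matches(ground_img, aerial_img, g_pts, a_pts):
    """Plot matched points side by side, color-coded by match index.

    ground_img, aerial_img: images as arrays; g_pts, a_pts: (N, 2) pixel
    coordinates of corresponding points in each image.
    """
    fig, (ax_g, ax_a) = plt.subplots(1, 2, figsize=(10, 5))
    ax_g.imshow(ground_img)
    ax_a.imshow(aerial_img)
    # Same color in both panels marks the same correspondence.
    colors = plt.cm.hsv(np.linspace(0, 1, len(g_pts)))
    ax_g.scatter(g_pts[:, 0], g_pts[:, 1], c=colors, s=12)
    ax_a.scatter(a_pts[:, 0], a_pts[:, 1], c=colors, s=12)
    for ax in (ax_g, ax_a):
        ax.set_axis_off()
    plt.show()
```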

“A clearer path” for autonomous navigation

The FG2 method represents a significant leap in fine-grained visual localization. By developing a model that intelligently selects and matches features in a way that mirrors human intuition, the EPFL researchers have not only broken previous accuracy records but also made the decision-making process more interpretable. This work paves the way for more robust and reliable navigation systems for autonomous vehicles, drones, and robots, bringing us closer to a future where machines can navigate our world with confidence, even when GPS fails.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and subscribe to our newsletter.


Jean-Marc is a successful AI business executive. He leads and accelerates the growth of AI-powered solutions and launched a computer vision company in 2006. He is a recognized speaker at AI conferences and holds an MBA from Stanford.
