Simultaneous Localisation and Mapping

[ kinect  hololens  arkit  google-tango  oculus  arcore  slam  ]

Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it.

Survey

Workshop

The Future of Real-Time SLAM: Sensors, Processors, Representations, and Algorithms

Papers in 30 Years of SLAM

A Solution to the Simultaneous Localization and Map Building Problem

The simultaneous localization and map building (SLAM) problem asks if it is possible for an autonomous vehicle to start in an unknown location in an unknown environment and then to incrementally build a map of this environment while simultaneously using this map to compute absolute vehicle location. Starting from estimation-theoretic foundations of this problem, the paper proves that a solution to the SLAM problem is indeed possible. The underlying structure of the SLAM problem is first elucidated. A proof that the estimated map converges monotonically to a relative map with zero uncertainty is then developed. It is then shown that the absolute accuracy of the map and the vehicle location reach a lower bound defined only by the initial vehicle uncertainty. Together, these results show that it is possible for an autonomous vehicle to start in an unknown location in an unknown environment and, using relative observations only, incrementally build a perfect map of the world and to compute simultaneously a bounded estimate of vehicle location. The paper also describes a substantial implementation of the SLAM algorithm on a vehicle operating in an outdoor environment using millimeter-wave radar to provide relative map observations. This implementation is used to demonstrate how some key issues such as map management and data association can be handled in a practical environment. The results obtained are cross-compared with absolute locations of the map landmarks obtained by surveying. In conclusion, the paper discusses a number of key issues raised by the solution to the SLAM problem including suboptimal map-building algorithms and map management.
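
The estimation-theoretic formulation above is what the classic EKF-SLAM filter implements: a joint state of vehicle pose and landmark positions with a full covariance matrix, where motion grows pose uncertainty and relative observations shrink map uncertainty. Below is a minimal, hypothetical 2D sketch, not the paper's radar implementation; the additive motion model and direct relative (x, y) landmark observations are simplifying assumptions.

```python
import numpy as np

def motion_update(mu, Sigma, u, Q):
    """Propagate the vehicle pose by odometry u = (dx, dy, dtheta); landmarks are static."""
    mu = mu.copy()
    mu[0:3] += u
    G = np.zeros((len(mu), 3))      # process noise enters only through the pose
    G[0:3, 0:3] = np.eye(3)
    return mu, Sigma + G @ Q @ G.T  # motion Jacobian is identity for this additive model

def measurement_update(mu, Sigma, i, z, R):
    """Fuse a relative observation z = landmark_i - vehicle_xy (+ noise)."""
    H = np.zeros((2, len(mu)))
    H[:, 0:2] = -np.eye(2)                 # d z / d vehicle position
    H[:, 3 + 2*i : 5 + 2*i] = np.eye(2)    # d z / d landmark_i
    innov = z - (mu[3 + 2*i : 5 + 2*i] - mu[0:2])
    S = H @ Sigma @ H.T + R
    K = Sigma @ H.T @ np.linalg.inv(S)     # Kalman gain
    return mu + K @ innov, (np.eye(len(mu)) - K @ H) @ Sigma

# One landmark: pose uncertainty grows with motion, landmark uncertainty
# shrinks with each relative observation, mirroring the convergence result.
mu = np.array([0., 0., 0., 5., 5.])                  # (x, y, theta, l1x, l1y)
Sigma = np.diag([0., 0., 0., 10., 10.])
mu, Sigma = motion_update(mu, Sigma, np.array([1., 0., 0.]), 0.01 * np.eye(3))
mu, Sigma = measurement_update(mu, Sigma, 0, np.array([4.1, 5.0]), 0.1 * np.eye(2))
print(mu, np.diag(Sigma))
```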

Loop-Closing

Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot

This paper describes Minerva, an interactive tour-guide robot that was successfully deployed in a Smithsonian museum. Minerva’s software is pervasively probabilistic, relying on explicit representations of uncertainty in perception and control. During 2 weeks of operation, the robot interacted with thousands of people, both in the museum and through the Web, traversing more than 44 km at speeds of up to 163 cm/sec in the unmodified museum.

Optimization of the Simultaneous Localization and Map Building Algorithm for Real Time Implementation

This paper addresses real-time implementation of the simultaneous localization and map-building (SLAM) algorithm. It presents optimal algorithms that consider the special form of the matrices and a new compressed filter that can significantly reduce the computation requirements when working in local areas or with high-frequency external sensors. It is shown that by extending the standard Kalman filter models the information gained in a local area can be maintained with a cost ~O(N_a²), where N_a is the number of landmarks in the local area, and then transferred to the overall map in only one iteration at full SLAM computational cost. Additional simplifications are also presented that are very close to optimal when an appropriate map representation is used. Finally, the algorithms are validated with experimental results obtained with a standard vehicle running in a completely unstructured outdoor environment.
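
A hedged sketch of the compression idea: while the vehicle stays in a local region, only the local covariance block is updated (cost ~O(N_a²)) while correction factors accumulate; one transfer step then propagates everything to the rest of the map. The class and variable names are illustrative, and the paper's full algorithm also handles prediction steps and map management.

```python
import numpy as np

class CompressedEKF:
    def __init__(self, mu, P, na):
        self.mu, self.P, self.na = mu, P, na   # na = dimension of the local block
        self.U = np.eye(na)                    # accumulated cross-covariance factor
        self.Psi = np.zeros((na, na))          # accumulated global covariance correction
        self.psi = np.zeros(na)                # accumulated global mean correction

    def local_update(self, H_a, z, zhat, R):
        """EKF update touching only the local block: ~O(N_a^2), not O(N^2)."""
        a = slice(0, self.na)
        Paa = self.P[a, a]
        Sinv = np.linalg.inv(H_a @ Paa @ H_a.T + R)
        K = Paa @ H_a.T @ Sinv
        nu = z - zhat
        M = H_a.T @ Sinv                       # record what the global map will
        self.Psi += self.U.T @ M @ H_a @ self.U    # need later, using pre-update
        self.psi += self.U.T @ M @ nu              # quantities
        self.U = (np.eye(self.na) - K @ H_a) @ self.U
        self.mu[a] = self.mu[a] + K @ nu
        self.P[a, a] = (np.eye(self.na) - K @ H_a) @ Paa

    def transfer(self):
        """One full-cost step that propagates all local updates to the map."""
        a, b = slice(0, self.na), slice(self.na, len(self.mu))
        Pab0 = self.P[a, b]                    # cross-covariance, untouched so far
        self.mu[b] += Pab0.T @ self.psi
        self.P[b, b] -= Pab0.T @ self.Psi @ Pab0
        self.P[a, b] = self.U @ Pab0
        self.P[b, a] = self.P[a, b].T
        self.U = np.eye(self.na)
        self.Psi, self.psi = np.zeros_like(self.Psi), np.zeros_like(self.psi)

# usage: 2 local + 2 global state dimensions
ekf = CompressedEKF(np.zeros(4), np.eye(4), na=2)
ekf.local_update(np.eye(2), z=np.array([0.1, -0.2]), zhat=np.zeros(2), R=0.1 * np.eye(2))
ekf.transfer()
```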

Real-time SLAM using laser

FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem

The ability to simultaneously localize a robot and accurately map its surroundings is considered by many to be a key prerequisite of truly autonomous robots. However, few approaches to this problem scale up to handle the very large number of landmarks present in real environments. Kalman filter-based algorithms, for example, require time quadratic in the number of landmarks to incorporate each sensor observation. This paper presents FastSLAM, an algorithm that recursively estimates the full posterior distribution over robot pose and landmark locations, yet scales logarithmically with the number of landmarks in the map. This algorithm is based on a factorization of the posterior into a product of conditional landmark distributions and a distribution over robot paths. The algorithm has been run successfully on as many as 50,000 landmarks, environments far beyond the reach of previous approaches. Experimental results demonstrate the advantages and limitations of the FastSLAM algorithm on both simulated and real-world data.
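
The factorization the abstract describes is a Rao-Blackwellized particle filter: sample robot paths with particles, and conditioned on each path the landmarks decouple into independent low-dimensional EKFs. A toy 2D sketch with known data association follows; the log-scaling landmark trees are replaced by a plain dict for clarity, and all names are illustrative.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)
R = 0.1 * np.eye(2)                        # noise of relative (x, y) observations

class Particle:
    def __init__(self):
        self.pose = np.zeros(2)            # (x, y); heading omitted for brevity
        self.lm = {}                       # landmark id -> (mean, cov)
        self.weight = 1.0

def fastslam_step(particles, u, observations):
    for p in particles:
        p.pose = p.pose + u + rng.normal(0.0, 0.05, 2)   # sample the motion model
        for lid, z in observations:                      # z = landmark - pose
            if lid not in p.lm:
                p.lm[lid] = (p.pose + z, R.copy())       # initialize new landmark
                continue
            mean, cov = p.lm[lid]
            nu = z - (mean - p.pose)
            S = cov + R
            K = cov @ np.linalg.inv(S)                   # per-landmark 2x2 EKF update
            p.lm[lid] = (mean + K @ nu, (np.eye(2) - K) @ cov)
            p.weight *= np.exp(-0.5 * nu @ np.linalg.solve(S, nu))
    w = np.array([p.weight for p in particles]); w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)   # resampling
    out = [copy.deepcopy(particles[i]) for i in idx]
    for p in out:
        p.weight = 1.0
    return out

# usage: two landmarks observed while moving right
particles = [Particle() for _ in range(50)]
particles = fastslam_step(particles, np.array([1.0, 0.0]),
                          [(0, np.array([2.0, 1.0])), (1, np.array([0.0, 3.0]))])
```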

An Atlas Framework for Scalable Mapping

This paper describes Atlas, a hybrid metrical/topological approach to SLAM that achieves efficient mapping of large-scale environments. The representation is a graph of coordinate frames, with each vertex in the graph representing a local frame, and each edge representing the transformation between adjacent frames. In each frame, we build a map that captures the local environment and the current robot pose along with the uncertainties of each. Each map’s uncertainties are modeled with respect to its own frame. Probabilities of entities with respect to arbitrary frames are generated by following a path formed by the edges between adjacent frames, computed via Dijkstra’s shortest path algorithm. Loop closing is achieved via an efficient map matching algorithm. We demonstrate the technique running in real-time in a large indoor structured environment (2.2 km path length) with multiple nested loops using laser or ultrasonic ranging sensors.
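
The core data structure is small enough to sketch: frames as graph vertices, edges holding relative transforms plus an uncertainty weight, and Dijkstra's algorithm composing transforms along the lowest-uncertainty path. A minimal 2D illustration; the names and scalar edge weights are assumptions, not the paper's representation of uncertainty.

```python
import heapq, math

def compose(t1, t2):
    """Compose 2D rigid transforms t = (x, y, theta): t2 expressed in t1's frame."""
    x1, y1, a1 = t1; x2, y2, a2 = t2
    return (x1 + math.cos(a1) * x2 - math.sin(a1) * y2,
            y1 + math.sin(a1) * x2 + math.cos(a1) * y2,
            a1 + a2)

def invert(t):
    x, y, a = t
    return (-math.cos(a) * x - math.sin(a) * y,
            math.sin(a) * x - math.cos(a) * y, -a)

def pose_in_frame(graph, src, dst):
    """Dijkstra over the frame graph; graph[u] = list of (v, transform_u_to_v, w).
    Returns the pose of frame dst expressed in frame src."""
    pq = [(0.0, src, (0.0, 0.0, 0.0))]
    seen = set()
    while pq:
        cost, u, T = heapq.heappop(pq)
        if u == dst:
            return T
        if u in seen:
            continue
        seen.add(u)
        for v, T_uv, w in graph.get(u, []):
            if v not in seen:
                heapq.heappush(pq, (cost + w, v, compose(T, T_uv)))
    return None

# Two local frames A and B, linked in both directions.
T_ab = (2.0, 0.0, math.pi / 2)
graph = {"A": [("B", T_ab, 1.0)], "B": [("A", invert(T_ab), 1.0)]}
print(pose_in_frame(graph, "A", "B"))
```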

Vision-based SLAM using Poses and Images

Probabilistic Robotics

Probabilistic robotics is a new and growing area in robotics, concerned with perception and control in the face of uncertainty. Building on the field of mathematical statistics, probabilistic robotics endows robots with a new level of robustness in real-world situations. This book introduces the reader to a wealth of techniques and algorithms in the field. All algorithms are based on a single overarching mathematical foundation. Each chapter provides example implementations in pseudo code, detailed mathematical derivations, discussions from a practitioner’s perspective, and extensive lists of exercises and class projects.

Square Root SAM: Simultaneous localization and mapping via square root information smoothing

Solving the SLAM (simultaneous localization and mapping) problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. Smoothing approaches have been investigated as a viable alternative to extended Kalman filter (EKF)-based solutions to the problem. In particular, approaches have been looked at that factorize either the associated information matrix or the measurement Jacobian into square root form. Such techniques have several significant advantages over the EKF: they are faster yet exact; they can be used in either batch or incremental mode; are better equipped to deal with non-linear process and measurement models; and yield the entire robot trajectory, at lower cost for a large class of SLAM problems. In addition, in an indirect but dramatic way, column ordering heuristics automatically exploit the locality inherent in the geographic nature of the SLAM problem. This paper presents the theory underlying these methods, along with an interpretation of factorization in terms of the graphical model associated with the SLAM problem. Both simulation results and actual SLAM experiments in large-scale environments are presented that underscore the potential of these methods as an alternative to EKF-based approaches.
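
A toy linear illustration of the square-root idea: stack the constraints into a measurement Jacobian and solve by QR factorization (or, equivalently, Cholesky on the information matrix) instead of filtering. With nonlinear models this least-squares solve is iterated; everything here is linear and one-dimensional for brevity.

```python
import numpy as np

# unknowns: poses x0, x1, x2 of a 1D robot
# constraints: prior x0 = 0; odometry x1 - x0 = 1.0 and x2 - x1 = 1.1;
#              a loop-style measurement x2 - x0 = 2.0
A = np.array([[ 1.,  0., 0.],
              [-1.,  1., 0.],
              [ 0., -1., 1.],
              [-1.,  0., 1.]])
b = np.array([0., 1.0, 1.1, 2.0])

Q, Rfac = np.linalg.qr(A)            # Rfac is the square-root information matrix
x = np.linalg.solve(Rfac, Q.T @ b)   # back-substitution on the triangular factor
print(x)                             # least-squares trajectory estimate

# equivalent normal-equations route: Cholesky of the information matrix A^T A
L = np.linalg.cholesky(A.T @ A)
x2 = np.linalg.solve(L.T, np.linalg.solve(L, A.T @ b))
assert np.allclose(x, x2)
```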

A Tree Parameterization for Efficiently Computing Maximum Likelihood Maps using Gradient Descent

In 2006, Olson et al. presented a novel approach to address the graph-based simultaneous localization and mapping problem by applying stochastic gradient descent to minimize the error introduced by constraints. Together with multi-level relaxation, this is one of the most robust and efficient maximum likelihood techniques published so far. In this paper, we present an extension of Olson’s algorithm. It applies a novel parameterization of the nodes in the graph that significantly improves the performance and enables us to cope with arbitrary network topologies. The latter allows us to bound the complexity of the algorithm to the size of the mapped area and not to the length of the trajectory, as is the case with both previous approaches. We implemented our technique and compared it to multi-level relaxation and Olson’s algorithm. As we demonstrate in simulated and in real-world experiments, our approach converges faster than the other approaches and yields accurate maps of the environment.
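
A deliberately tiny 1D sketch of the underlying stochastic-gradient scheme with an incremental (chain) parameterization: poses are sums of increments, and each constraint's residual is spread over the increments on its path. The paper's actual contribution, the tree parameterization and distribution weights for arbitrary topologies, is not reproduced here.

```python
import numpy as np

inc = np.array([1.3, 0.7, 1.2, 0.9])      # increments; pose_i = sum(inc[:i])
constraints = [(0, 4, 4.0), (1, 3, 2.0)]  # (i, j, measured pose_j - pose_i)

def pose(i):
    return inc[:i].sum()

for it in range(1, 101):
    lr = 1.0 / it                          # decaying learning rate
    for i, j, meas in constraints:         # process one constraint at a time
        r = meas - (pose(j) - pose(i))     # residual of this constraint
        inc[i:j] += lr * r / (j - i)       # spread the correction along the path
print([round(pose(i), 3) for i in range(5)])
```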

Parallel Tracking and Mapping for Small AR Workspaces

This paper presents a method of estimating camera pose in an unknown scene. While this has previously been attempted by adapting SLAM algorithms developed for robotic exploration, we propose a system specifically designed to track a hand-held camera in a small AR workspace. We propose to split tracking and mapping into two separate tasks, processed in parallel threads on a dual-core computer: one thread deals with the task of robustly tracking erratic hand-held motion, while the other produces a 3D map of point features from previously observed video frames. This allows the use of computationally expensive batch optimisation techniques not usually associated with real-time operation: The result is a system that produces detailed maps with thousands of landmarks which can be tracked at frame-rate, with an accuracy and robustness rivalling that of state-of-the-art model-based systems.
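
The two-thread split is easy to sketch: a tracking loop that must return at frame rate, and a mapping thread that runs batch optimization over keyframes in the background. Everything below (the stubs, keyframe rule, and timing) is illustrative, not PTAM's actual implementation.

```python
import threading, queue, time

keyframes = []                             # shared map (keyframes + points)
map_lock = threading.Lock()
new_keyframes = queue.Queue()

def bundle_adjust(kfs):
    time.sleep(0.05)                       # stand-in for expensive batch optimization

def mapping_thread():
    while True:
        kf = new_keyframes.get()           # block until the tracker hands off a frame
        with map_lock:
            keyframes.append(kf)
            snapshot = list(keyframes)
        bundle_adjust(snapshot)            # optimize a snapshot outside the lock,
                                           # so the tracker is never blocked for long

def track_against_map(frame, kfs):
    return frame                           # stand-in for per-frame pose estimation

def is_keyframe(frame, pose):
    return frame % 10 == 0                 # crude keyframe selection rule

threading.Thread(target=mapping_thread, daemon=True).start()
for frame in range(30):                    # the time-critical tracking loop
    with map_lock:
        pose = track_against_map(frame, keyframes)
    if is_keyframe(frame, pose):
        new_keyframes.put(frame)           # hand off; tracking never waits on BA
    time.sleep(0.01)                       # ~frame rate
```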

FrameSLAM: From Bundle Adjustment to Real-Time Visual Mapping

Many successful indoor mapping techniques employ frame-to-frame matching of laser scans to produce detailed local maps as well as the closing of large loops. In this paper, we propose a framework for applying the same techniques to visual imagery. We match visual frames with large numbers of point features, using classic bundle adjustment techniques from computational vision, but we keep only relative frame pose information (a skeleton). The skeleton is a reduced nonlinear system that is a faithful approximation of the larger system and can be used to solve large loop closures quickly, as well as forming a backbone for data association and local registration. We illustrate the workings of the system with large outdoor datasets (10 km), showing large-scale loop closure and precise localization in real time.

FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance

This paper describes a probabilistic approach to the problem of recognizing places based on their appearance. The system we present is not limited to localization, but can determine that a new observation comes from a previously unseen place, and so augment its map. Effectively this is a SLAM system in the space of appearance. Our probabilistic approach allows us to explicitly account for perceptual aliasing in the environment—identical but indistinctive observations receive a low probability of having come from the same place. We achieve this by learning a generative model of place appearance. By partitioning the learning problem into two parts, new place models can be learned online from only a single observation of a place. The algorithm complexity is linear in the number of places in the map, and is particularly suitable for online loop closure detection in mobile robotics.
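
A much-simplified sketch of the appearance-space idea: each place is a distribution over binary visual-word presence, and an observation is scored against every known place plus a "new place" hypothesis. FAB-MAP's Chow-Liu tree over word co-occurrences is replaced here by a naive-Bayes assumption, so this is a caricature of the model, not the paper's algorithm.

```python
import numpy as np

def log_likelihood(z, theta):
    """log p(z | place) for a binary word vector z under per-word probabilities theta."""
    return np.sum(z * np.log(theta) + (1 - z) * np.log(1 - theta))

def localize(z, places, p_new=0.1, theta_bg=None):
    """Posterior over known places plus a final 'new place' hypothesis."""
    if theta_bg is None:                       # background word frequencies
        theta_bg = np.full(len(z), 0.5)
    logs = [np.log((1 - p_new) / max(len(places), 1)) + log_likelihood(z, th)
            for th in places]
    logs.append(np.log(p_new) + log_likelihood(z, theta_bg))
    logs = np.array(logs)
    p = np.exp(logs - logs.max())              # normalize in log space
    return p / p.sum()

# Two known places over a 6-word vocabulary, then a new observation.
places = [np.array([.9, .9, .1, .1, .1, .1]),  # place 0 word probabilities
          np.array([.1, .1, .9, .9, .1, .1])]  # place 1 word probabilities
z = np.array([1, 1, 0, 0, 0, 0])
print(localize(z, places))                     # most mass on place 0
```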

Relative Bundle Adjustment

This report derives a relative objective function for bundle adjustment – driven by the desire for a truly large-scale simultaneous localization and mapping algorithm that can operate incrementally in constant time. It is precisely the choice of a single privileged coordinate frame that makes bundle adjustment expensive to solve. This is especially true during loop closures, when the single frame approach necessitates adjusting all parameters in the loop. We give a relative formulation that is designed specifically to avoid the cost of optimizing all parameters at loop closure. Instead of optimizing in a single Euclidean space, relative bundle adjustment works in a metric space defined by a connected Riemannian manifold. We find evidence that in this space, the global maximum likelihood solution can be found incrementally in constant time – even at loop closure.

KinectFusion: Real-Time Dense Surface Mapping and Tracking

We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
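
The mapping half, fusing depth into a global implicit surface, reduces to a weighted running average of truncated signed distances per voxel. A toy 1D sketch follows; a real system uses a 3D volume and ICP tracking against raycasts of this model, both omitted here.

```python
import numpy as np

trunc = 0.1                     # truncation distance (meters)
xs = np.linspace(0.0, 1.0, 11)  # voxel centers along one viewing ray
tsdf = np.zeros_like(xs)
weight = np.zeros_like(xs)

def integrate(depth):
    """Fuse one depth measurement along the ray into the TSDF."""
    global tsdf, weight
    sdf = depth - xs                          # signed distance to the surface
    mask = sdf > -trunc                       # skip voxels far behind the surface
    d = np.clip(sdf, -trunc, trunc) / trunc   # truncated and normalized
    w = 1.0                                   # per-measurement weight
    tsdf[mask] = (tsdf[mask] * weight[mask] + w * d[mask]) / (weight[mask] + w)
    weight[mask] += w

for depth in [0.52, 0.50, 0.51]:              # three noisy depth frames
    integrate(depth)

# the zero crossing of the fused TSDF is the surface estimate (~0.51 m)
i = np.where(np.diff(np.sign(tsdf)) != 0)[0][0]
x0 = xs[i] - tsdf[i] * (xs[i + 1] - xs[i]) / (tsdf[i + 1] - tsdf[i])
print(round(x0, 3))
```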

Kintinuous: Spatially Extended KinectFusion

In this paper we present an extension to the KinectFusion algorithm that permits dense mesh-based mapping of extended scale environments in real-time. This is achieved through (i) altering the original algorithm such that the region of space being mapped by the KinectFusion algorithm can vary dynamically, (ii) extracting a dense point cloud from the regions that leave the KinectFusion volume due to this variation, and, (iii) incrementally adding the resulting points to a triangular mesh representation of the environment. The system is implemented as a set of hierarchical multi-threaded components which are capable of operating in real-time. The architecture facilitates the creation and integration of new modules with minimal impact on the performance of the dense volume tracking and surface reconstruction modules. We provide experimental results demonstrating the system’s ability to map areas considerably beyond the scale of the original KinectFusion algorithm, including a two-story apartment and an extended sequence taken from a car at night. In order to overcome failure of the iterative closest point (ICP) based odometry in areas of low geometric features we have evaluated the Fast Odometry from Vision (FOVIS) system as an alternative. We provide a comparison between the two approaches where we show a trade-off between the reduced drift of the visual odometry approach and the higher local mesh quality of the ICP-based approach. Finally we present ongoing work on incorporating full simultaneous localisation and mapping (SLAM) pose-graph optimisation.
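
Steps (i) and (ii) can be sketched on a toy 1D volume: when the camera moves past a threshold, the voxel window slides with it, and near-surface voxels falling off the back are emitted as points for the external mesh. This is purely illustrative; the real system shifts a 3D TSDF volume and extracts proper surface slices.

```python
import numpy as np

voxel = 0.1
tsdf = np.zeros(16)          # rolling window of voxels along one axis
weight = np.zeros(16)
origin = 0.0                 # world coordinate of voxel 0
cloud = []                   # points extracted from voxels leaving the window

def shift(camera_x):
    """Slide the volume so it stays roughly centered on the camera."""
    global tsdf, weight, origin
    n = int((camera_x - origin) // voxel) - len(tsdf) // 2
    if n <= 0:
        return
    for i in range(n):       # voxels falling off the back become map points
        if weight[i] > 0 and abs(tsdf[i]) < 0.5:
            cloud.append(origin + i * voxel)
    tsdf = np.concatenate([tsdf[n:], np.zeros(n)])       # new voxels enter empty
    weight = np.concatenate([weight[n:], np.zeros(n)])
    origin += n * voxel

# pretend some near-surface voxels were already fused, then move the camera
weight[:4] = 1.0
tsdf[:4] = 0.1
shift(1.0)
print(cloud, origin)
```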

SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

We present the major advantages of a new ‘object oriented’ 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, realtime 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

Google Tango

Tango (formerly named Project Tango, while in testing) was an augmented reality computing platform, developed and authored by the Advanced Technology and Projects (ATAP), a skunkworks division of Google. It used computer vision to enable mobile devices, such as smartphones and tablets, to detect their position relative to the world around them without using GPS or other external signals. This allowed application developers to create user experiences that include indoor navigation, 3D mapping, physical space measurement, environmental recognition, augmented reality, and windows into a virtual world.

SLAM in Real Life

Google ARCore

ARCore is Google’s platform for building augmented reality experiences. Using different APIs, ARCore enables your phone to sense its environment, understand the world and interact with information. Some of the APIs are available across Android and iOS to enable shared AR experiences. ARCore uses three key capabilities to integrate virtual content with the real world as seen through your phone’s camera:

  • Motion tracking allows the phone to understand and track its position relative to the world.
  • Environmental understanding allows the phone to detect the size and location of all types of surfaces: horizontal, vertical, and angled surfaces like the ground, a coffee table, or walls.
  • Light estimation allows the phone to estimate the environment’s current lighting conditions.

Microsoft HoloLens

HoloLens 2 offers the most comfortable and immersive mixed reality experience available with industry-leading solutions that deliver value in minutes. All backed by the reliability, security, and scalability of cloud and AI services from Microsoft. The HoloLens 2 Development Edition helps you jump-start your mixed reality plans with an offer that combines HoloLens 2 with free trials of Unity software and Azure credits for cloud services.

ARKit

ARKit 3 goes further than ever before, naturally showing AR content in front of or behind people using People Occlusion, tracking up to three faces at a time, supporting collaborative sessions, and more. And now, you can take advantage of ARKit’s new awareness of people to integrate human movement into your app.

Oculus

Oculus Insight Tracking is a state-of-the-art inside-out tracking system that uses an outward-facing sensor architecture to capture, trace, and navigate physical spaces, delivering a greater sense of immersion, presence, and mobility in VR.

Intel RealSense

The Intel® RealSense™ Tracking Camera T265 includes two fisheye lens sensors, an IMU and an Intel® Movidius™ Myriad™ 2 VPU. All of the V‑SLAM algorithms run directly on the VPU, allowing for very low latency and extremely efficient power consumption. The T265 has been extensively tested and validated for performance, providing under 1% closed loop drift under intended use conditions. It also offers sub 6ms latency between movement and reflection of movement in the pose. This is fast enough for even highly‑sensitive applications such as Augmented and Virtual Reality.
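
For reference, reading the T265's pose stream through librealsense's Python bindings looks roughly like the following; treat it as a sketch, since the exact API surface can vary across SDK versions.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.pose)       # the T265 publishes 6DoF pose frames
pipeline.start(config)
try:
    for _ in range(200):
        frames = pipeline.wait_for_frames()
        pose = frames.get_pose_frame()
        if pose:
            data = pose.get_pose_data()    # translation, rotation (quaternion),
                                           # velocities, tracker confidence
            print(data.translation, data.rotation, data.tracker_confidence)
finally:
    pipeline.stop()
```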

Written on July 25, 2019