Localization

What is Localization?

In simple terms, localization is just figuring out where the robot is. In the case of our rover, we need to know what point it is located at in 3D space, as well as what angle it is rotated to in each axis. This information is also known as a pose (see this page). We need to have this data at all times, and it needs to be updated frequently enough to keep up with the rover's motion. We typically estimate this data by measuring it with a variety of sensors and then processing that sensor data with a variety of fusion and filtering algorithms to make it more accurate.

Sensors

GPS

GPS (Global Positioning System) is a system that uses signals from an array of satellites orbiting earth to figure out where you are on the planet (read more about how this works here). We use a GPS unit mounted to the rover, which outputs our position on the earth in degrees latitude and degrees longitude.

GPS Driver

We use an ArduSimple RTK2B Budget GPS receiver. Once configured, this receiver outputs NMEA messages (full reference here) over USB, which contain all of the GPS data we need for our autonomy system. In order to get them from serial messages on a USB port to ROS messages sent to our autonomy nodes, we need a GPS driver.

Fortunately, there is already a ROS package that handles this for us, called nmea_navsat_driver. This package can be configured to read from a certain USB port at a certain baud rate, and to accept messages that produce either estimated position covariance or estimated velocity. These parameters are configured in config/esw.yaml. The driver will then publish NavSatFix messages to the \fix topic, which contains the information we want.

RTK

Real-Time Kinematics (RTK) is a technique used to improve the accuracy of satellite-based positioning (like GPS). It uses measurements of the phase of a signal's carrier wave in addition to the content of the signal (GPS data) in order make a positioning observation. In addition, the system relies on real-time corrections coming from a base station or an aggregate of base stations who also measure the phase of carrier wave signals. This allows the units to calculate their relative position to within millimeters, although their absolute position is accurate only to the same accuracy as the computed position of the base station. The typical nominal accuracy for these systems is 1 centimeter ± 2 parts-per-million (ppm) horizontally and 2 centimeters ± 2 ppm vertically.

The range to a satellite is essentially calculated by multiplying the carrier wavelength times the number of whole cycles between the satellite and the rover and adding the phase difference. Determining the number of cycles is non-trivial, since signals may be shifted in phase by one or more cycles. This results in an error equal to the error in the estimated number of cycles times the wavelength, which is 19 cm for the L1 signal. Solving this so-called integer ambiguity search problem results in centimeter precision. The error can be reduced with sophisticated statistical methods that compare the measurements from the coarse-acquisition (C/A) signals and by comparing the resulting ranges between multiple satellites.

How MRover Uses RTK

Our RTK configuration currently consists of one base station module and one rover module. The easiest way to remember our configuration is as follows, the base station rtk module will be located on the MRover base station and the rover rtk module will be located on the MRover rover.

Networking: the base station's RTCM corrections are sent over LCM to the rover where they are written over serial directly to the rover's gps module. The result being that if the base station is working correctly and you can communicate with the rover then the connection between the base station and rover gps modules has already been established.
On the base station, the base_station/gps program should be run to interface with the base station gps module, the device file for the base station gps module may vary based on what devices are currently plugged in but it is expected to be a ttyACM device where a number starting from zero is appended to the device name to specify which device is which. Expect the base station gps filepath to be /dev/ttyACM0 and you should probably be fine. If you need to change the file path to the mounted device file you can do so by changing the config file located in config/gps/config.json. The relevant key to change is filename for the base_station object.
On the rover, the jetson/gps program should be run to interface with the rover gps module, the device file for the rover gps module may vary based on the devices that are currently plugged in but will most likely be some variant of /dev/ttyACM. Expect the rover gps filepath to be /dev/ttyACM0 when running on the physical rover. If you are testing both the rover and base station gps modules connected to a single computer, then first plug in the base_station and then the rover, the rover gps module should be mounted to the device file /dev/ttyACM1. If you need to change the expected filepath you can do so on the config config/gps/config.json the object of interest is currently labeled onboard but may at some point be renamed to jetson. The relevant field is filename.

The Config

The config is located in config/gps and holds configuration information for both the base station and rover gps code.

For rover:

baudrate: The baud rate the rover gps module is configured to communicate at over serial
filename: The path to the device file that the rover gps module is mounted to
max_error_count: The number of consecutive exceptions that can occur before the error is propagated forward and crashes the program
sleep: The amount of time in seconds that the gps code should take to release priority to other co-routines (listen for rtcm corrections)

For basestation:

baudrate: The baud rate the base station gps module is configured to communicate at over serial
filename: The path to the device file that the base staion gps module is mounted to
buffsize: The size in bytes that each lcm message should be. Too small of a buffsize may result in the rover module not accepting the rtcm corrections and too large of a buffsize may cause networking issues

IMU

An IMU (Inertial Measurement Unit) is a device that contains several sensors that are used in combination to measure orientation and movement. A 9DoF (Degree of Freedom) IMU like the one we use consists of:

A 3 axis gyroscope, which can measure angular velocity in each axis
A 3 axis accelerometer, which can measure linear acceleration in each axis
A 3 axis magnetometer, which essentially acts as a compass and can measure the direction of magnetic north in each axis

All of this data is then combined using sensor fusion algorithms (more on this later) in order to produce a 3D orientation estimate.

IMU Driver

We use an Adafruit BNO055 IMU. It is centered around the Bosch BNO055 chip, which is what contains all of the sensors. This IMU has onboard sensor fusion capabilities, as it can (allegedly) produce a globally accurate 3D orientation measurement using its own onboard black box sensor fusion algorithms. It can provide a wide range of data over I2C and UART.

In order to access the data it measures, both the raw sensor readings and the filtered orientation estimate, we obviously need to read them from the sensor into our computer. Since the only good library for this sensor only supports I2C, and ordinary computers can't read I2C, we need an intermediary processor to read it for us and convert the data to serial data that the computer can actually read. For this we are using an Arduino Nano Every microcontroller. We have an Arduino sketch kept in the embedded-testbench repository which uses the Adafruit BNO055 Unified Sensor Library to read the IMU data over I2C and then publish it over a serial connection. The arduino is then plugged into our main Jetson computer over USB so the data is accessible on a serial port.

To get this data from serial to the ROS network, we have an IMU driver node. this node reads the IMU data over serial and then publishes it to several standard ROS messages, which other nodes can then subscribe to. The IMU driver node is configured in config/esw.yaml. The specific data published includes:

IMU: Imu messages on the /imu/data topic
Magnetometer: MagneticField messages on the /imu/magnetometer topic
IMU Temperature: Temperature messages on the /imu/temp topic
IMU Calibration: custom CalibrationStatus messages on the /imu/calibration topic

For information on the design process and decisions behind the IMU driver, read this discussion.

Cameras

We currently use the ZED 2i. Existing algorithms have the capability of detecting visual features seen by the camera and matching them across frames, allowing us to determine how they have shifted over time. We can then use this information to derive the camera's movement and estimate pose in a technique known as visual odometry.

feature_matching

Functions such as the 8-Point Algorithm can identify the rotation and translation matrices $R$ and $T$ from 8 (as its name suggests) matched features. Put together, these produce a odom->base_link transform that describes pose.

GPS Linearization

The geodetic coordinates (latitude, longitude, altitude) that the GPS gives us are very hard to work with, since they can be very inconvenient and unintuitive over small distances. To solve this problem, we can linearize them into cartesian coordinates (x, y, z) using an ENU local tangent plane approximation.

This approximation essentially assumes the earth is flat in the area where the rover is driving. In our case this is actually a pretty good approximation, since the rover only ever drives a few kilometers during the auton mission, and this distance is very small relative to the size of the earth.

The linearization is computed by first converting the geodetic coordinates from the GPS to ECEF (earth centered, earth fixed) coordinates using the WGS 84 ellipsoid model of the earth. These ECEF coordinates are cartesian coordinates, but their origin (0,0,0) is at the center of the earth. These coordinates are then transformed to be centered at the tangent plane around the rover. The equations for this transform can be found here.

The local tangent plane is centered at a reference geodetic coordinate, which is a (latitude, longitude, altitude) coordinate that we choose which is close enough to the area of operation that the linearization approximation is accurate. This means the reference coordinate needs to be changed when the rover is being operated in very different geographical location, such as when we move from Michigan to Utah for our competition.

Accuracy

It's difficult to measure the accuracy of the tangent plane linearization, since we don't have any pairs of ground truth geodetic coordinates that are a known distance apart (maybe you could find some of these online). What we've done to measure accuracy is use this calculator to calculate distance between geodetic coordinates (that were around 30 km apart) and compare that calculated distance to the distance calculated through linearization. The calculator uses Vincenty's Formulae to calculate distance, which is accurate within a millimeter. Unfortunately the calculator is only precise to 1 meter, so the only hard accuracy bound we can give is within 1 meter across 30km. If we want a tighter bound, we could find a higher precision calculator that uses the same formula and compare to that.

Having a tight accuracy bound probably isn't all that important, because small inaccuracies probably won't be reflected in our system's actual performance. This is because our target geodetic waypoints are linearized using the same algorithm, so effectively both our measured position and our target position are being mapped through the same function. The properties of this linearization function mean that even if it maps our current GPS geodetic coordinate to an incorrect cartesian coordinate, it will map our target coordinate to an incorrect cartesian coordinate in the same way. This means our distance to the target will still approach zero as we get closer to the target, and the distance between position and target will have all of the properties required for our navigation to work, even if the actual distances are inaccurate.

Guide to Localization Frames

When determining and defining where the robot is, we have to deal with an inherent tradeoff in data quality. Usually data sources can be either locally accurate or globally accurate, but it is much more difficult to get a single source of data that is both globally and locally accurate.

This problem is addressed by the standards introduced in REP 105 (which you should read to get a general overview). Unfortunately, the REP uses some misleading terms and gives a somewhat confusing explanation, so here is a hopefully more clear way of explaining it.

First, a few important notes and definitions:

local vs global accuracy

In the context of localization, we often talk about local and global accuracy, so we need to be very clear about what these things mean. Locally accurate data must change in a smooth way, i.e. no discrete/discontinuous jumps. However, it may drift over time without bounds, meaning it can gradually accumulate large errors. On the other hand, globally accurate data must not drift significantly over time, but may have discrete/discontinuous jumps at any time. These definitions are of course a little bit vague, since they somewhat depend on the idea of a short vs long period of time, but this isn't usually a problem since we don't deal with a lot of significant edge cases.

abstraction of sensor data sources into transforms

For the purposes of this explanation, we will abstract away any sensor processing/fusion algorithms and just imagine all localization data sources as providing a transform telling us where the robot is located. A global sensor transform will be relative to a fixed frame, in this case the map frame. A local sensor transform on the other hand may be relative to any arbitrary starting point, so long as the transform obeys the rules of local accuracy.

`map`

Since pose is relative (see SE3 wiki entry), we need to define where the robot is relative to some "fixed" frame. Fixed in this case means it does not move relative to the thing/place we are navigating in, which we will call the world. We will use the "map" frame as our fixed frame, and the pose of the robot will be defined relative to that frame, i.e. it will be defined as a transform from map to the robot.

`base_link`

The base_link frame is simply the frame of the robot. It is rigidly attached to the robot, and in our case is located at the center of the chassis.

`odom`

The odom frame is an intermediate frame in between map and base_link. It doesn't really have a physical representation, but it gives us a good way to separate local and global data sources. Odom essentially acts as a local reference frame for the robot. This means that the pose of the robot relative to the odom frame should always be locally accurate, but doesn't need to be globally accurate.

There are two transforms we need to define in order to connect these three frames, which thereby defines the pose of the robot:

`odom_to_base_link`

The transform from the odom frame to the base_link frame, which we will call odom_to_base_link, will be defined using locally accurate sensor data. This means the pose of the robot in the odom frame (which is the exact same thing as this transform) will be locally accurate, i.e it can drift over time, but must always change smoothly and continuously without discrete jumps.

`map_to_odom`

The transform from the map frame to the odom frame, which we will call map_to_odom in this case, will be defined using globally accurate sensor data. Our goal here is to use this data to make the transform from map to base_link (which is equal to map_to_odom * odom_to_base_link) globally accurate, i.e it will not drift over time, but it may change non-continuously with discrete jumps. We want to do this by only changing the map_to_odom frame, which means we have to first figure out what the odom_to_base_link transform is (by asking the TF tree) and then "subtract" that from our global sensor transform in order to determine the correct map_to_odom transform.

Using these frames in practice

The main benefit of this system is having an organized way of separately accessing local and global localization data. For applications where you need a locally accurate localization source, you can get the odom_to_base_link transform from the TF tree and use that as your robot pose. Similarly, for applications where you need globally accurate localization, you can get the map_to_odom transform from the TF tree and use it as your robot pose (for exact syntax, see the wiki section about the TF tree or just google it).

Here's a simple example of using these localization frames: suppose we have a simple 4 wheeled robot equipped with a GPS unit and wheel encoders. The GPS provides a globally accurate source of localization data, while the wheel encoder data can be fed into a sensor processing algorithm to produce a locally accurate source of localization data.

In order to use these two data sources, our first step is to publish the wheel encoder localization data as the odom_to_base_link transform to the TF tree. Similarly, we will also use the GPS data to publish the map_to_odom transform to the TF tree, as described in the map_to_odom section. Once this is done, we can actually use this data for some autonomous routines.

Suppose we want to write a function that rotates the robot 90 degrees in place. Since this is a routine that won't take very much time, is only based on relative measurements, and needs to be quite accurate on a local scale, we want to use a locally accurate data source. To do this, we will get the odom_to_base_link transform from the TF tree, look it's rotation, and then send the robot drive commands until the rotation we read from odom_to_baselink has increased by 90 degrees.

Now suppose we want to write a function to drive to a specific waypoint on our map, a mile away. Since this routine will take a while and is based on absolute measurements (relative to the map frame), we need to use a globally accurate data source to avoid drift and so we can measure relative to the map frame. To do this, we will get the map_to_odom transform from the TF tree, look at its position, and then use a drive controller to drive the robot in the direction of the waypoint until the position we read from map_to_odom is close enough to our target waypoint.

Kalman Filters and Sensor Fusion

A Kalman filter is a dynamic low-pass filter that reduces noise in sensor measurements. It is a 2-phase recursive algorithm consisting of a prediction step and estimation step, allowing it to accept many inputs at various points for the best possible estimation of the system's state. This kind of sensor fusion is integral to the localization subsystem to provide accurate pose information.

A general algorithm flowchart is shown below:

Inputs: sensor measurement(s) ($z$, n x 1 column matrix). Can be obtained from a variety of sources. For example, magnetometer and accelerometer data are often fused to produce a 3D orientation measurement.
Outputs: state variable ($x_k$, m x 1 column matrix). What you want to estimate - this does not have to be the same type as the input. For example, a very common use of the Kalman filter takes in position (measurement) to estimate velocity (state).
System model: Parameters for the Kalman filter (matrices $A$, $H$, $Q$, $R$). Dependent on the process being estimated and amount of noise in sensors. More on this below.
Additional Notes: hat symbol ($\hat{x_k}$) denotes estimation, minus symbol (${\hat{x_k}}^{-}$) denotes prediction.

Determining the System Model

At its core, the Kalman filter is a recursive function. We use this fact to determine the system model matrices $A$, $H$, $Q$, and $R.$

Linear state estimation is defined recursively by:

$x_k = Ax_{k-1} + w_k$

Where $w_k$ is the state error and $A$ is the n x n state transition matrix. Because the state estimation is linear, it follows that no elements within $A$ may depend on the state variable $x_k$. An extended Kalman filter (EKF) is a generalized version of the Kalman filter that allows for nonlinear state estimation.

Sensor measurement is defined by:

$z = Hx_k + v_k$

Where $v_k$ is the measurement error, and $H$ is the m x n matrix that maps state to measurement.

$Q$ and $R$ are the process covariance and sensor covariance matrices respectively. These are sometimes given by the manufacturer, otherwise they are tuned from the information given on the sensor's noise level and educated guesses on how it affects the state variable.

Prediction and Estimation

After providing the filter with an initial state and error covariance, the prediction and estimation steps loop to continuously receive new data and update the state estimate.

Prediction

Screenshot 2024-08-26 at 10 58 54 AM

The algorithm predicts the state estimate and the error covariance using the above equations. Notice that the equation used to predict the state estimate is identical to the linear state estimation function, only without the error term.

The error covariance is instead computed separately and makes use of the error matrix $Q$.

Estimation

Screenshot 2024-08-26 at 11 01 36 AM

First, the Kalman Gain $K_k$ is computed. This determines how much “weight” the algorithm gives incoming measurements. The more weight, the more the incoming measurement influences the state estimate.

The estimate is computed using the state prediction, Kalman gain, and the difference between the measurement and predicted measurement $H{\hat{x_k}}^{-}$. The error covariance is also calculated, and will be used to predict the ${k + 1}^{th}$ error covariance.

Global EKF

To get a globally accurate estimate of the rover's pose, we use an EKF that fuses GPS and IMU data. The EKF outputs a more accurate, less noisy pose estimate which is then published to the TF tree effectively as the map->base_link transform.