Rover Written in Rust
A Guide to Visual Odometry
2025.11.05 18:54
Studying the code you've built is the best way to learn.
I've created a detailed, step-by-step tutorial for you in a single text block. You can copy and paste this into a new file or a text document for your personal study.
This tutorial explains the entire system we just built, from a single camera frame to the final 3D render, and then gives you concrete ideas on how to use your data for a real robot.
Tutorial: Understanding Your Rust Visual Odometry System
This document explains how our program works, step-by-step. We will trace the journey of a single stereo image from the camera all the way to a 3D point on your screen.
The Core Architecture: Two Threads
First, it's critical to understand that our program runs in two parallel threads. This is the key to making it fast: one thread does the heavy math while the other draws the picture.
- The VO Thread (The "Factory")
  - File: src/main.rs
  - Function: run_live_vo_thread
  - Job: Does all the hard work. It talks to the camera, runs expensive OpenCV functions (rectification, stereo, feature detection), and calculates the 3D points and camera pose. Its only goal is to produce "results."
- The Main Thread (The "Showroom")
  - File: src/main.rs
  - Function: main
  - Job: Does all the drawing. It creates the OpenGL window and the OpenCV imshow windows. It spends 99% of its time just drawing the last-known results and waiting for new ones. It must never do heavy calculations, or the display will freeze.
- The MPSC Channel (The "Conveyor Belt")
  - Code: let (tx, rx) = mpsc::channel::<(...)>();
  - Job: This is the high-speed "conveyor belt" that connects the two threads.
    - The tx (Transmitter) lives in the VO Thread. It sends the final results.
    - The rx (Receiver) lives in the Main Thread. It receives the results.
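Here is a minimal sketch of that two-thread layout using std::sync::mpsc and std::thread. The payload type and the loop bodies are simplified placeholders, not the actual code from src/main.rs:

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-in for the real payload: (tracking image, point data, pose).
// In the actual program these are an OpenCV Mat, a Vec<f32>, and nalgebra types.
type VoPacket = (Vec<u8>, Vec<f32>, ([f32; 9], [f32; 3]));

fn main() {
    // The "conveyor belt": tx goes to the VO thread, rx stays on the main thread.
    let (tx, rx) = mpsc::channel::<VoPacket>();

    // The "Factory": owns the camera and does all the heavy math.
    thread::spawn(move || loop {
        // ... acquire, rectify, compute disparity, run VO ...
        let packet: VoPacket = (Vec::new(), Vec::new(), ([0.0; 9], [0.0; 3]));
        if tx.send(packet).is_err() {
            break; // The main thread has exited; stop producing.
        }
    });

    // The "Showroom": polls without blocking, then re-draws the last-known state.
    for _ in 0..100 {
        if let Ok((_viz, _points, _pose)) = rx.try_recv() {
            // Update the imshow window, the GPU point buffer, and the view matrix here.
        }
        // ... render the last-known point cloud every iteration ...
    }
}
```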
Part 1: The Journey of a Frame (VO Thread)
This is the step-by-step life of a single frame inside the run_live_vo_thread function.
Step 1: Initialization (Happens Once)
Before the loop starts, we prepare everything:
- load_calibration(): We open stereo_params_PC.yml. This function loads all the critical matrices (K1, D1, R1, P1, Q, etc.) that define your camera. It also calculates the true vertical FOV from the P1 matrix and image_height (which we found was ~30.15 degrees).
- init_rectification(): We use the calibration data to pre-compute two "lookup tables" (RectificationMaps). These maps will be used to warp our raw images, making them perfectly aligned and distortion-free.
- cap_combined.set(...): We tell the camera to send us a single, wide image (e.g., 640x240), which is actually two 320x240 images side by side.
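The vertical FOV calculation mentioned above follows from the pinhole model: vfov = 2 * atan(image_height / (2 * f_y)), where f_y is the focal length in pixels taken from P1. A small sketch; the focal length below is illustrative, not your actual calibration value:

```rust
/// Vertical field of view (degrees) from the rectified projection matrix P1.
/// f_y is P1's focal length in pixels (row 1, col 1); image_height is in pixels.
fn vertical_fov_deg(f_y: f64, image_height: f64) -> f64 {
    2.0 * (image_height / (2.0 * f_y)).atan().to_degrees()
}

fn main() {
    // Illustrative numbers only: a 240 px tall image with f_y of about 445 px
    // gives roughly the ~30 degree vertical FOV mentioned in the tutorial.
    let fov = vertical_fov_deg(445.0, 240.0);
    println!("vertical FOV ~= {:.2} degrees", fov);
}
```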
Step 2: Acquire & Split (The Loop Starts)
- cap_combined.read(): We grab one wide frame from the camera.
- Mat::roi(): We use our roi_l and roi_r (Regions of Interest) to slice this wide frame into two separate Mat objects: curr_left_gray and curr_right_gray.
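A minimal sketch of this acquire-and-split step with the opencv crate. The device index, frame size, and ROI geometry are assumptions, and the grayscale conversion is omitted; the return type of Mat::roi varies between opencv crate versions (a borrowed view in newer releases), so the clones below keep the sketch version-agnostic:

```rust
use opencv::{core, prelude::*, videoio};

fn main() -> opencv::Result<()> {
    // Open the combined stereo camera and ask for the side-by-side frame size.
    let mut cap_combined = videoio::VideoCapture::new(0, videoio::CAP_ANY)?;
    cap_combined.set(videoio::CAP_PROP_FRAME_WIDTH, 640.0)?;
    cap_combined.set(videoio::CAP_PROP_FRAME_HEIGHT, 240.0)?;

    // Left half and right half of the 640x240 combined frame.
    let roi_l = core::Rect::new(0, 0, 320, 240);
    let roi_r = core::Rect::new(320, 0, 320, 240);

    let mut combined = core::Mat::default();
    cap_combined.read(&mut combined)?;

    // Slice the wide frame into two independent images (grayscale conversion omitted here).
    let curr_left_gray = core::Mat::roi(&combined, roi_l)?.try_clone()?;
    let curr_right_gray = core::Mat::roi(&combined, roi_r)?.try_clone()?;

    println!("left: {}x{}, right: {}x{}",
        curr_left_gray.cols(), curr_left_gray.rows(),
        curr_right_gray.cols(), curr_right_gray.rows());
    Ok(())
}
```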
Step 3: Rectification (The First "Magic" Step)
This is the most important step for stereo vision.
- imgproc::remap(): We call this function twice. It uses the RectificationMaps from Step 1 to "un-warp" curr_left_gray and curr_right_gray.
- Result: We get rectified_left and rectified_right. These images are now perfectly aligned, as if taken by "perfect" pinhole cameras. This is required for the next step to work.
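A sketch of one of those two remap calls, assuming map1 and map2 are the precomputed rectification maps for that camera (the tables built once by init_rectification):

```rust
use opencv::{core, imgproc, prelude::*};

/// Warp a raw camera image into its rectified version using precomputed maps.
/// map1/map2 are the lookup tables produced once by init_rectification().
fn rectify(raw: &core::Mat, map1: &core::Mat, map2: &core::Mat) -> opencv::Result<core::Mat> {
    let mut rectified = core::Mat::default();
    imgproc::remap(
        raw,
        &mut rectified,
        map1,
        map2,
        imgproc::INTER_LINEAR, // bilinear interpolation between source pixels
        core::BORDER_CONSTANT, // fill pixels that map outside the source image
        core::Scalar::default(),
    )?;
    Ok(rectified)
}
```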
Step 4: Disparity & 3D Reprojection
- compute_disparity_map(): We feed the two rectified images to the StereoSGBM algorithm. It compares them and produces a disparity_map_curr. This is a 2D image where the color of each pixel represents its depth (brighter = closer).
- calib3d::reproject_image_to_3d(): This is the second "magic" step. We give it the 2D disparity map and our Q matrix (from the calibration file). It uses this information to convert the 2D depth image into a full 3D point cloud (point_cloud_curr). Each pixel now has a real-world (X, Y, Z) coordinate in millimeters.
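A condensed sketch of this step with the opencv crate. The SGBM tuning numbers are placeholders, not your tuned values; note that SGBM outputs 16-bit fixed-point disparities (scaled by 16), so they are converted to float before reprojection:

```rust
use opencv::{calib3d, core, prelude::*};

/// Disparity + reprojection: rectified stereo pair -> 3-channel float point cloud.
/// `q` is the 4x4 reprojection matrix loaded from the calibration file.
fn disparity_to_point_cloud(
    rectified_left: &core::Mat,
    rectified_right: &core::Mat,
    q: &core::Mat,
) -> opencv::Result<core::Mat> {
    // Placeholder SGBM parameters; the real code uses its own tuned values.
    let mut sgbm = calib3d::StereoSGBM::create(
        0,          // min_disparity
        64,         // num_disparities (multiple of 16)
        9,          // block_size
        8 * 9 * 9,  // P1 smoothness penalty
        32 * 9 * 9, // P2 smoothness penalty
        1, 63, 10, 100, 32,
        calib3d::StereoSGBM_MODE_SGBM,
    )?;

    // SGBM writes CV_16S disparities scaled by 16.
    let mut disparity_raw = core::Mat::default();
    sgbm.compute(rectified_left, rectified_right, &mut disparity_raw)?;

    // Convert to true (float) disparity values before reprojecting.
    let mut disparity = core::Mat::default();
    disparity_raw.convert_to(&mut disparity, core::CV_32F, 1.0 / 16.0, 0.0)?;

    // Turn every pixel's disparity into an (X, Y, Z) point using Q.
    let mut point_cloud_curr = core::Mat::default();
    calib3d::reproject_image_to_3d(&disparity, &mut point_cloud_curr, q, true, core::CV_32F)?;
    Ok(point_cloud_curr)
}
```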
Step 5: Visual Odometry (Where are we?)
This step figures out how we moved relative to the last frame.
- detector.detect_and_compute(): We find hundreds of "keypoints" (like corners and interesting textures) in the rectified_left image.
- knn_match(): We compare the keypoints from this frame to the keypoints we saved from the last frame (prev_frame_data). This gives us hundreds of 2D matches.
- Build 3D Correspondences: We loop through all 2D matches and, for each one, we look up its 3D (X, Y, Z) coordinate in both the current point_cloud_curr and the previous prev_frame_data. We now have a list of (Point_A_in_3D, Point_B_in_3D) pairs. This is our correspondences list (e.g., "70 valid 3D correspondences"); a sketch of this lookup follows after this list.
- odom.update(): We send this list to our Odometry class. It uses RANSAC to find the best rotation and translation that explains the movement of all these points. It then accumulates this motion into global_rotation and global_translation to keep track of our position from the start.
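A sketch of the 3D-correspondence lookup, assuming the keypoints and a flat list of DMatch results are already available; the point clouds are the 3-channel CV_32F Mats produced by reproject_image_to_3d. Names like prev_keypoints and prev_cloud are illustrative:

```rust
use opencv::{core, prelude::*};

/// For each 2D feature match, look up the 3D point under the keypoint's pixel
/// in both the previous and the current point cloud, keeping only finite points.
fn build_correspondences(
    matches: &core::Vector<core::DMatch>,
    prev_keypoints: &core::Vector<core::KeyPoint>,
    curr_keypoints: &core::Vector<core::KeyPoint>,
    prev_cloud: &core::Mat, // 3-channel CV_32F, one (X, Y, Z) per pixel
    curr_cloud: &core::Mat,
) -> opencv::Result<Vec<(core::Vec3f, core::Vec3f)>> {
    let mut correspondences = Vec::new();
    for m in matches.iter() {
        // DMatch indexes the query (previous frame) and train (current frame) keypoints.
        let kp_prev = prev_keypoints.get(m.query_idx as usize)?;
        let kp_curr = curr_keypoints.get(m.train_idx as usize)?;

        // Round the sub-pixel keypoint location to a pixel in the point cloud.
        let p_prev = *prev_cloud.at_2d::<core::Vec3f>(kp_prev.pt().y as i32, kp_prev.pt().x as i32)?;
        let p_curr = *curr_cloud.at_2d::<core::Vec3f>(kp_curr.pt().y as i32, kp_curr.pt().x as i32)?;

        // Keep the pair only if both 3D points are valid (finite Z).
        if p_prev[2].is_finite() && p_curr[2].is_finite() {
            correspondences.push((p_prev, p_curr));
        }
    }
    Ok(correspondences)
}
```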
Step 6: Process for Renderer (Build the Vec<f32>)
Now we create the final product. We loop through every single pixel in the point_cloud_curr:
- Get (X, Y, Z): We get the (X, Y, Z) value (in mm).
- Filter: We check if it's valid: if z_cv.is_finite() && z_cv > 300.0 && z_cv < 10000.0. This is our 30cm-to-10m "depth filter" that removes bad data.
- Flip Coords: We flip the Y and Z axes (y_gl = -y_cv, z_gl = -z_cv) to convert from OpenCV's coordinate system to OpenGL's.
- Calculate Color: We calculate the grayscale color based on depth, from white (close) to dark gray (far).
- points_vec.push(...): We push all 6 numbers (x_gl, y_gl, z_gl, r, g, b) into our final Vec<f32>.
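A sketch of that per-pixel loop, assuming point_cloud_curr is the 3-channel CV_32F Mat from Step 4. The grayscale ramp below is one plausible mapping (white at 300 mm fading toward dark gray at 10 m), not necessarily the exact formula in the real code:

```rust
use opencv::{core, prelude::*};

/// Convert the raw point cloud into an OpenGL-ready flat buffer:
/// [x, y, z, r, g, b, x, y, z, r, g, b, ...] with a 0.3 m - 10 m depth filter.
fn build_points_vec(point_cloud_curr: &core::Mat) -> opencv::Result<Vec<f32>> {
    let mut points_vec = Vec::new();
    for row in 0..point_cloud_curr.rows() {
        for col in 0..point_cloud_curr.cols() {
            let p = *point_cloud_curr.at_2d::<core::Vec3f>(row, col)?;
            let (x_cv, y_cv, z_cv) = (p[0], p[1], p[2]);

            // Depth filter: drop infinities and anything outside 30 cm - 10 m.
            if !(z_cv.is_finite() && z_cv > 300.0 && z_cv < 10000.0) {
                continue;
            }

            // OpenCV (Y down, Z forward) -> OpenGL (Y up, Z toward the viewer).
            let (x_gl, y_gl, z_gl) = (x_cv, -y_cv, -z_cv);

            // Grayscale by depth: near points white, far points dark gray (illustrative ramp).
            let shade = 1.0 - 0.8 * ((z_cv - 300.0) / (10000.0 - 300.0));
            points_vec.extend_from_slice(&[x_gl, y_gl, z_gl, shade, shade, shade]);
        }
    }
    Ok(points_vec)
}
```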
Step 7: Draw Text & Send
- put_text(): We use your custom settings (red, 0.8 scale, etc.) to draw the point_count, centroid, and global_translation onto the tracking_viz image.
- data_tx.send(...): We bundle up our three results and send them down the "conveyor belt" to the Main Thread:
  - tracking_viz (the image with text)
  - points_vec (the giant list of 3D points)
  - (global_rot, global_trans) (our final pose)
The VO thread's loop is now done and it immediately starts over at Step 2.
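A sketch of the annotate-and-send step, assuming a channel payload of (Mat, Vec<f32>, pose). The overlay text, its position, and the pose type are placeholders:

```rust
use opencv::{core, imgproc, prelude::*};
use std::sync::mpsc::Sender;

type Pose = ([f32; 9], [f32; 3]); // placeholder for (global_rot, global_trans)

/// Draw a status line on the visualization image and push everything to the main thread.
fn annotate_and_send(
    data_tx: &Sender<(core::Mat, Vec<f32>, Pose)>,
    mut tracking_viz: core::Mat,
    points_vec: Vec<f32>,
    pose: Pose,
) -> opencv::Result<()> {
    let label = format!("points: {}", points_vec.len() / 6);
    imgproc::put_text(
        &mut tracking_viz,
        &label,
        core::Point::new(10, 20),                // placeholder text position
        imgproc::FONT_HERSHEY_SIMPLEX,
        0.8,                                     // the 0.8 scale mentioned above
        core::Scalar::new(0.0, 0.0, 255.0, 0.0), // red in BGR
        1,
        imgproc::LINE_AA,
        false,
    )?;

    // If the receiver is gone (main thread closed), just ignore the error here.
    let _ = data_tx.send((tracking_viz, points_vec, pose));
    Ok(())
}
```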
Part 2: Receiving & Using The Data (Main Thread)
This part is much simpler and faster. This is the main function's event loop.
Step 1: Check the "Conveyor Belt"
- rx.try_recv(): This is the most important part of the main thread. It peeks at the "conveyor belt" to see if a new package from the VO thread has arrived.
  - If Ok(...): A new package is here! We enter the if block.
  - If Err(...): No new package. We skip the if block completely and just re-draw the old data. This is why your 3D view still feels smooth even if the VO thread is slow.
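A minimal sketch of this non-blocking poll; the payload type is simplified and the names are illustrative:

```rust
use std::sync::mpsc::Receiver;

type Pose = ([f32; 9], [f32; 3]); // simplified stand-in for the real pose types

/// Non-blocking poll of the "conveyor belt": returns the newest package if one
/// arrived, otherwise None so the caller just re-draws the last-known state.
fn poll_vo_results(rx: &Receiver<(Vec<u8>, Vec<f32>, Pose)>) -> Option<(Vec<u8>, Vec<f32>, Pose)> {
    match rx.try_recv() {
        Ok(package) => Some(package), // new results: update windows, VBO, and pose
        Err(_) => None,               // nothing new (or sender gone): keep old data
    }
}
```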
Step 2: Update All State (When a new package arrives)
- highgui::imshow(...): We immediately display the tracking_img in the OpenCV window.
- point_cloud.update_data(points_vec): We send the entire Vec<f32> to our PointCloud object. This uploads all the new vertex data to the GPU's VBO.
- Convert Pose to View Matrix: This is the logic for our "first-person" camera (sketched below):
  - We convert the vo_rot_na and vo_trans_na into cgmath types (r_gl, t_gl).
  - We store the camera's 3D position in vo_camera_pos_gl.
  - We combine them into a 4x4 vo_pose_matrix.
  - We invert this matrix to create the vo_view_matrix. (This is a key 3D concept: the "view" matrix is always the inverse of the camera's "pose" or position in the world.)
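A sketch of that pose-to-view conversion with cgmath, assuming the rotation and translation have already been converted from their nalgebra types into a cgmath Matrix3 and Vector3 (that conversion is omitted here):

```rust
use cgmath::{Matrix3, Matrix4, SquareMatrix, Vector3};

/// Build the OpenGL view matrix from the camera's world pose.
/// The pose places the camera in the world; the view matrix is its inverse.
fn view_from_pose(r_gl: Matrix3<f32>, t_gl: Vector3<f32>) -> Matrix4<f32> {
    // 4x4 pose = translation * rotation (camera-to-world).
    let vo_pose_matrix = Matrix4::from_translation(t_gl) * Matrix4::from(r_gl);

    // World-to-camera: invert the pose. A rigid transform is always invertible.
    vo_pose_matrix.invert().expect("rigid pose matrices are invertible")
}
```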
Step 3: Render the Scene (Happens every loop)
- let view = vo_view_matrix: We tell our renderer to use the vo_view_matrix we just calculated (or the old one if no new package arrived).
- let proj = perspective(...): We create the projection matrix using the correct FOV we loaded at the start.
- let model_matrix = Matrix4::identity(): We tell the renderer to draw the point cloud at the origin (0, 0, 0).
- point_cloud.draw(...): This function tells the GPU, "Draw all the points in your VBO, using this model_matrix, view matrix, and proj matrix."
- axis.draw(...): This does the same for the world axis.
- windowed_context.swap_buffers(): This presents the final image to your screen.
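A sketch of the per-frame matrix setup with cgmath; the aspect ratio and clip planes are illustrative, and the draw calls stand in for the program's own PointCloud and Axis renderers:

```rust
use cgmath::{perspective, Deg, Matrix4, SquareMatrix};

/// Build the three matrices handed to the GPU each frame.
/// `vo_view_matrix` comes from the pose inversion shown earlier;
/// `fov_v_deg` is the vertical FOV loaded from the calibration file.
fn frame_matrices(
    vo_view_matrix: Matrix4<f32>,
    fov_v_deg: f32,
) -> (Matrix4<f32>, Matrix4<f32>, Matrix4<f32>) {
    let view = vo_view_matrix;

    // 320x240 rectified images -> 4:3 aspect; near/far planes are illustrative.
    let proj = perspective(Deg(fov_v_deg), 320.0 / 240.0, 10.0, 50_000.0);

    // The point cloud is drawn at the world origin; the camera moves instead.
    let model_matrix = Matrix4::identity();

    // These would then be passed to point_cloud.draw(...) and axis.draw(...),
    // followed by windowed_context.swap_buffers().
    (model_matrix, view, proj)
}
```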
Part 3: How to Use Your "End Data"
You now have a program that, on every frame, produces three incredibly valuable pieces of data. They live in the VO thread right before the data_tx.send(...) call.
- point_cloud_curr (The Raw Mat): The full 3D point cloud (320x240 = 76,800 points, including invalid ones) as an OpenCV Mat.
- points_vec (The Filtered Vec<f32>): The clean point cloud, filtered by depth (e.g., ~29,000 points), in a format ready for OpenGL.
- global_translation (The Pose): An na::Vector3<f32> holding your camera's (X, Y, Z) position in millimeters.
Here are 3 ideas for what you can do with this data:
Idea 1: Record Your Path (Simple SLAM)
You can save the camera's path to a file to see where it went.
- In run_live_vo_thread (before the loop): open a log file for the path.
- In run_live_vo_thread (inside the loop, after odom.update): append the current global_translation.
- At the very end (after the loop): flush the file so nothing is lost.
A combined sketch covering all three spots follows below.
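Since the original snippets for these three spots are not included here, this is a minimal, hedged reconstruction: a CSV path log using only the standard library. The file name and column format are assumptions, and the loop below just fakes a few poses to show where the real calls would go:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    // Before the loop: open the path log once.
    let mut path_log = BufWriter::new(File::create("camera_path.csv")?);
    writeln!(path_log, "frame,x_mm,y_mm,z_mm")?;

    // Inside the loop, after odom.update(): append the accumulated position.
    // (global_translation is the na::Vector3<f32> described above; faked here.)
    for frame in 0..3 {
        let global_translation = [frame as f32 * 10.0, 0.0, 5.0];
        writeln!(
            path_log,
            "{},{:.1},{:.1},{:.1}",
            frame, global_translation[0], global_translation[1], global_translation[2]
        )?;
    }

    // After the loop: make sure everything reaches the disk.
    path_log.flush()?;
    Ok(())
}
```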
Idea 2: Simple Robot Navigation
You can use the pose for basic automation.
- In run_live_vo_thread (inside the loop, after odom.update): check the accumulated pose and decide what the robot should do (a sketch follows below).
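The original snippet is not included here, so this is an illustrative sketch only: a trivial decision rule on the accumulated translation. The thresholds and the motor-command interface are hypothetical:

```rust
/// Hypothetical navigation check run each frame after odom.update().
/// `global_translation` is the camera's accumulated (x, y, z) position in millimeters.
fn navigation_step(global_translation: [f32; 3]) {
    let [x, _y, z] = global_translation;

    // Distance travelled in the ground plane (mm); z is forward in OpenCV coordinates.
    let distance_mm = (x * x + z * z).sqrt();

    // Example rule: drive forward until we are 2 m from the start, then stop.
    if distance_mm < 2000.0 {
        println!("keep driving (travelled {:.0} mm)", distance_mm);
        // send_motor_command(Forward)  // hypothetical robot interface
    } else {
        println!("goal reached, stopping");
        // send_motor_command(Stop)     // hypothetical robot interface
    }
}
```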
Idea 3: Create a 3D Scan File (.xyz)
You can save the 3D environment to view in a program like MeshLab.
- In run_live_vo_thread (after the loop): dump the points from points_vec to a .xyz text file (a sketch follows below).
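The original snippet is not included here either, so this is a minimal sketch: writing the filtered points_vec (6 floats per point, built in Step 6) to a plain .xyz file that MeshLab can open. The file name is an assumption:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

/// Dump the filtered point cloud to a simple "X Y Z" text file (one point per line).
/// points_vec is the OpenGL buffer built in Step 6: 6 floats per point (x, y, z, r, g, b).
fn save_xyz(points_vec: &[f32], path: &str) -> std::io::Result<()> {
    let mut file = BufWriter::new(File::create(path)?);
    for point in points_vec.chunks_exact(6) {
        // Only the position is needed for a .xyz file; the color is skipped.
        writeln!(file, "{} {} {}", point[0], point[1], point[2])?;
    }
    file.flush()
}

fn main() -> std::io::Result<()> {
    // Tiny fake cloud just to show the format; the real call uses the live points_vec.
    let points_vec = vec![100.0, -50.0, -800.0, 0.9, 0.9, 0.9];
    save_xyz(&points_vec, "scan.xyz")
}
```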