
Robotics with Object Pascal

Rover written in Rust

A Guide to Visual Odometry

2025.11.05 18:54


Studying the code you've built is the best way to learn.

I've created a detailed, step-by-step tutorial for you in a single text block. You can copy and paste this into a new file or a text document for your personal study.

This tutorial explains the entire system we just built, from a single camera frame to the final 3D render, and then gives you concrete ideas on how to use your data for a real robot.


 

 Tutorial: Understanding Your Rust Visual Odometry System

 

This document explains how our program works, step-by-step. We will trace the journey of a single stereo image from the camera all the way to a 3D point on your screen.

 

 The Core Architecture: Two Threads

 

First, it's critical to understand that our program runs in two parallel threads. This is the key to making it fast: one thread does the heavy math while the other draws the picture.

  1. The VO Thread (The "Factory")

    • File: src/main.rs

    • Function: run_live_vo_thread

    • Job: Does all the hard work. It talks to the camera, runs the expensive OpenCV functions (rectification, stereo matching, feature detection), and calculates the 3D points and the camera pose. Its only goal is to produce "results."

  2. The Main Thread (The "Showroom")

    • File: src/main.rs

    • Function: main

    • Job: Does all the drawing. It creates the OpenGL window and the OpenCV imshow windows. It spends 99% of its time just drawing the last-known results and waiting for new ones. It must never do heavy calculations, or the display will freeze.

  3. The MPSC Channel (The "Conveyor Belt")

    • Code: let (tx, rx) = mpsc::channel::<(...)>();

    • Job: This is the high-speed "conveyor belt" that connects the two threads.

    • The tx (Transmitter) lives in the VO Thread. It sends the final results.

    • The rx (Receiver) lives in the Main Thread. It receives the results.
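
As a minimal sketch of this pattern (with a simplified payload standing in for our real tuple of tracking image, point list, and pose), the wiring looks roughly like this:

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // Simplified payload: in our program this is (tracking image, Vec<f32> of points, pose).
        let (tx, rx) = mpsc::channel::<Vec<f32>>();

        // VO thread (the "factory"): does the heavy work and sends results.
        thread::spawn(move || loop {
            let points_vec = vec![0.0_f32; 6]; // placeholder for the real VO output
            if tx.send(points_vec).is_err() {
                break; // the main thread has closed its receiver
            }
        });

        // Main thread (the "showroom"): drains results without ever blocking the draw loop.
        for _ in 0..3 {
            if let Ok(points_vec) = rx.try_recv() {
                println!("received {} floats", points_vec.len()); // upload to GPU / redraw here
            }
            // ...render the last-known data every iteration...
        }
    }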


 

 Part 1: The Journey of a Frame (VO Thread)

 

This is the step-by-step life of a single frame inside the run_live_vo_thread function.

 

Step 1: Initialization (Happens Once)

 

Before the loop starts, we prepare everything:

  1. load_calibration(): We open stereo_params_PC.yml. This function loads all the critical matrices (K1, D1, R1, P1, Q, etc.) that define your camera. It also calculates the true vertical FOV from the P1 matrix and image_height (which we found was ~30.15 degrees); the sketch after this list shows that calculation.

  2. init_rectification(): We use the calibration data to pre-compute two "lookup tables" (RectificationMaps). These maps will be used to warp our raw images, making them perfectly aligned and distortion-free.

  3. cap_combined.set(...): We tell the camera to send us a single, wide image (e.g., 640x240) which is actually two 320x240 images side-by-side.
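
For reference, the vertical FOV in step 1 falls straight out of the rectified focal length f_y (entry P1[1][1]) and the image height: fov_y = 2 * atan(height / (2 * f_y)). A standalone sketch of that calculation (the f_y value below is a placeholder, not our exact calibration):

    // Vertical FOV from the rectified projection matrix P1:
    //   fov_y = 2 * atan( image_height / (2 * f_y) ),  with f_y = P1[1][1]
    fn vertical_fov_degrees(fy: f64, image_height: f64) -> f64 {
        2.0 * (image_height / (2.0 * fy)).atan().to_degrees()
    }

    fn main() {
        let fy = 445.0;           // placeholder for P1[1][1] from stereo_params_PC.yml
        let image_height = 240.0; // our rectified image height in pixels
        println!("vertical FOV ~= {:.2} degrees", vertical_fov_degrees(fy, image_height));
    }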

 

Step 2: Acquire & Split (The Loop Starts)

 

  1. cap_combined.read(): We grab one wide frame from the camera.

  2. Mat::roi(): We use our roi_l and roi_r (Regions of Interest) to slice this wide frame into two separate Mat objects: curr_left_gray and curr_right_gray.
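
A condensed sketch of this acquire-and-split step, assuming the opencv crate's Mat::roi and core::Rect (the variable name combined_frame and the 320x240 halves are illustrative):

    use opencv::{core, prelude::*};

    // Inside the loop, after `cap_combined.read(&mut combined_frame)?`:
    // slice the wide 640x240 frame into its left and right halves.
    let roi_l = core::Rect::new(0, 0, 320, 240);
    let roi_r = core::Rect::new(320, 0, 320, 240);

    // `Mat::roi` gives a view into the wide frame; `try_clone` turns it into an owned Mat.
    let curr_left_gray = Mat::roi(&combined_frame, roi_l)?.try_clone()?;
    let curr_right_gray = Mat::roi(&combined_frame, roi_r)?.try_clone()?;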

 

Step 3: Rectification (The First "Magic" Step)

 

This is the most important step for stereo vision.

  • imgproc::remap(): We call this function twice. It uses the RectificationMaps from Step 1 to "un-warp" curr_left_gray and curr_right_gray.

  • Result: We get rectified_left and rectified_right. These images are now perfectly aligned, as if taken by "perfect" pinhole cameras. This is required for the next step to work.
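
A sketch of the two remap calls, assuming the opencv crate's imgproc::remap and that maps_l / maps_r hold the map pairs precomputed by init_rectification() (those field names are illustrative):

    use opencv::{core, imgproc, prelude::*};

    // Inside the loop: warp both raw images with the precomputed lookup tables.
    let mut rectified_left = Mat::default();
    let mut rectified_right = Mat::default();

    imgproc::remap(
        &curr_left_gray, &mut rectified_left,
        &maps_l.map1, &maps_l.map2,              // illustrative field names
        imgproc::INTER_LINEAR, core::BORDER_CONSTANT, core::Scalar::default(),
    )?;
    imgproc::remap(
        &curr_right_gray, &mut rectified_right,
        &maps_r.map1, &maps_r.map2,
        imgproc::INTER_LINEAR, core::BORDER_CONSTANT, core::Scalar::default(),
    )?;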

 

Step 4: Disparity & 3D Reprojection

 

  1. compute_disparity_map(): We feed the two rectified images to the StereoSGBM algorithm. It compares them and produces a disparity_map_curr. This is a 2D image where the color of each pixel represents its depth (brighter = closer).

  2. calib3d::reproject_image_to_3d(): This is the second "magic" step. We give it the 2D disparity map and our Q matrix (from the calibration file). It uses this information to convert the 2D depth image into a full 3D point cloud (point_cloud_curr). Each pixel now has a real-world (X, Y, Z) coordinate in millimeters.
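
To make the Q-matrix step concrete, this is the arithmetic reproject_image_to_3d performs for every pixel, written out by hand (the Q values below are placeholders for an ideal rectified pair, not our real calibration; the relation reduces to Z = f * B / d):

    use nalgebra as na;

    // For a pixel at (x, y) with disparity d:
    //   [X, Y, Z, W]^T = Q * [x, y, d, 1]^T,  then divide by W to get (X, Y, Z) in mm.
    fn reproject_pixel(q: &na::Matrix4<f64>, x: f64, y: f64, d: f64) -> na::Vector3<f64> {
        let h = q * na::Vector4::new(x, y, d, 1.0);
        na::Vector3::new(h.x / h.w, h.y / h.w, h.z / h.w)
    }

    fn main() {
        // Placeholder Q: focal length 440 px, baseline 60 mm, principal point (160, 120).
        let q = na::Matrix4::new(
            1.0, 0.0, 0.0,        -160.0,
            0.0, 1.0, 0.0,        -120.0,
            0.0, 0.0, 0.0,         440.0,
            0.0, 0.0, 1.0 / 60.0,    0.0,
        );
        // A pixel with a 26.4 px disparity lands roughly 1 m (1000 mm) away.
        println!("{:?}", reproject_pixel(&q, 200.0, 130.0, 26.4));
    }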

 

Step 5: Visual Odometry (Where are we?)

 

This step figures out how we moved relative to the last frame.

  1. detector.detect_and_compute(): We find hundreds of "keypoints" (like corners and interesting textures) in the rectified_left image.

  2. knn_match(): We compare the keypoints from this frame to the keypoints we saved from the last frame (prev_frame_data). This gives us hundreds of 2D matches.

  3. Build 3D Correspondences: We loop through all 2D matches and, for each one, we look up its 3D (X,Y,Z) coordinate in both the current point_cloud_curr and the previous prev_frame_data. We now have a list of (Point_A_in_3D, Point_B_in_3D) pairs. This is our correspondences list (e.g., "70 valid 3D correspondences").

  4. odom.update(): We send this list to our Odometry class. It uses RANSAC to find the best rotation and translation that explains the movement of all these points. It then accumulates this motion into global_rotation and global_translation to keep track of our position from the start.
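
A condensed sketch of the correspondence-building part (step 3), assuming the matched keypoint pixel coordinates are already collected in an illustrative matched_pixels list and that prev_frame_data keeps its point cloud in a field called point_cloud (both names are assumptions):

    use opencv::{core, prelude::*};
    use nalgebra as na;

    // Hypothetical helper: read the (X, Y, Z) behind one pixel and reject invalid depths.
    fn lookup_3d(cloud: &Mat, x: f32, y: f32) -> Option<na::Point3<f32>> {
        let p: &core::Point3f = cloud.at_2d(y as i32, x as i32).ok()?;
        if p.z.is_finite() && p.z > 0.0 {
            Some(na::Point3::new(p.x, p.y, p.z))
        } else {
            None
        }
    }

    // Inside the loop: pair up the 3D points behind each 2D keypoint match.
    let mut correspondences: Vec<(na::Point3<f32>, na::Point3<f32>)> = Vec::new();
    for (pt_prev, pt_curr) in &matched_pixels {
        if let (Some(a), Some(b)) = (
            lookup_3d(&prev_frame_data.point_cloud, pt_prev.x, pt_prev.y),
            lookup_3d(&point_cloud_curr, pt_curr.x, pt_curr.y),
        ) {
            correspondences.push((a, b));
        }
    }
    // `correspondences` is what odom.update() consumes (e.g. "70 valid 3D correspondences").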

 

Step 6: Process for Renderer (Build the Vec<f32>)

 

Now we create the final product. We loop through every single pixel in the point_cloud_curr:

  1. Get (X,Y,Z): We get the (X, Y, Z) value (in mm).

  2. Filter: We check if it's valid: if z_cv.is_finite() && z_cv > 300.0 && z_cv < 10000.0. This is our 30cm-to-10m "depth filter" that removes bad data.

  3. Flip Coords: We flip the Y and Z axes (y_gl = -y_cv, z_gl = -z_cv) to convert from OpenCV's coordinate system to OpenGL's.

  4. Calculate Color: We calculate the grayscale color based on depth, from white (close) to dark gray (far).

  5. points_vec.push(...): We push all 6 numbers (x_gl, y_gl, z_gl, r, g, b) into our final Vec<f32>.
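
Put together, the per-pixel filter, coordinate flip, and color pass look roughly like this (the depth-to-gray mapping shown is a simple linear ramp for illustration, not necessarily our exact formula):

    use opencv::{core, prelude::*};

    let (rows, cols) = (point_cloud_curr.rows(), point_cloud_curr.cols());
    let mut points_vec: Vec<f32> = Vec::with_capacity((rows * cols * 6) as usize);

    for row in 0..rows {
        for col in 0..cols {
            // 1. Get (X, Y, Z) in mm for this pixel.
            let p: &core::Point3f = point_cloud_curr.at_2d(row, col)?;
            let (x_cv, y_cv, z_cv) = (p.x, p.y, p.z);

            // 2. Depth filter: keep finite points between 30 cm and 10 m.
            if z_cv.is_finite() && z_cv > 300.0 && z_cv < 10000.0 {
                // 3. OpenCV -> OpenGL: flip Y and Z.
                let (x_gl, y_gl, z_gl) = (x_cv, -y_cv, -z_cv);

                // 4. Grayscale by depth: white (close) fading to dark gray (far).
                let t = (z_cv - 300.0) / (10000.0 - 300.0);
                let gray = 1.0 - 0.8 * t;

                // 5. Six floats per point: position + color.
                points_vec.extend_from_slice(&[x_gl, y_gl, z_gl, gray, gray, gray]);
            }
        }
    }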

 

Step 7: Draw Text & Send

 

  1. put_text(): We use your custom settings (red, 0.8 scale, etc.) to draw the point_count, centroid, and global_translation onto the tracking_viz image.

  2. data_tx.send(...): We bundle up our three results and send them down the "conveyor belt" to the Main Thread:

    • tracking_viz (The image with text)

    • points_vec (The giant list of 3D points)

    • (global_rot, global_trans) (Our final pose)
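
A compact sketch of this final step inside the VO loop, assuming the opencv crate's imgproc::put_text (the overlay text and its position are illustrative; the red color and 0.8 scale match the settings above):

    use opencv::{core, imgproc, prelude::*};

    // Draw one status line onto the tracking image: red (BGR), scale 0.8.
    let text = format!(
        "points: {}  t: ({:.0}, {:.0}, {:.0}) mm",
        point_count, global_trans.x, global_trans.y, global_trans.z
    );
    imgproc::put_text(
        &mut tracking_viz,
        &text,
        core::Point::new(10, 20),                 // illustrative position
        imgproc::FONT_HERSHEY_SIMPLEX,
        0.8,
        core::Scalar::new(0.0, 0.0, 255.0, 0.0),  // red in BGR
        2,
        imgproc::LINE_AA,
        false,
    )?;

    // Hand the three results to the main thread. `send` only fails if the
    // receiver is gone (window closed), in which case we stop the loop.
    if data_tx.send((tracking_viz, points_vec, (global_rot, global_trans))).is_err() {
        break;
    }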

The VO thread's loop is now done and it immediately starts over at Step 2.


 

 Part 2: Receiving & Using The Data (Main Thread)

 

This part is much simpler and faster. This is the main function's event loop.

 

Step 1: Check the "Conveyor Belt"

 

  • rx.try_recv(): This is the most important part of the main thread. It peeks at the "conveyor belt" to see if a new package from the VO thread has arrived.

    • If Ok(...): A new package is here! We enter the if block.

    • If Err(...): No new package. We skip the if block completely and just re-draw the old data. This is why your 3D view still feels smooth even if the VO thread is slow.
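
A minimal sketch of this non-blocking check, using only std's TryRecvError (the match arms are stand-ins for the updates described in Step 2 below):

    use std::sync::mpsc::TryRecvError;

    // Inside the main event loop:
    match rx.try_recv() {
        Ok((tracking_img, points_vec, (vo_rot_na, vo_trans_na))) => {
            // New package: update the imshow window, the VBO, and the view matrix (Step 2).
        }
        Err(TryRecvError::Empty) => {
            // No new package yet: fall through and re-draw the old data.
        }
        Err(TryRecvError::Disconnected) => {
            // The VO thread has exited; time to shut down.
        }
    }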

 

Step 2: Update All State (When a new package arrives)

 

  1. highgui::imshow(...): We immediately display the tracking_img in the OpenCV window.

  2. point_cloud.update_data(points_vec): We send the entire Vec<f32> to our PointCloud object. This uploads all the new vertex data to the GPU's VBO.

  3. Convert Pose to View Matrix: This is the logic for our "first-person" camera:

    • We convert the vo_rot_na and vo_trans_na into cgmath types (r_gl, t_gl).

    • We store the camera's 3D position in vo_camera_pos_gl.

    • We combine them into a 4x4 vo_pose_matrix.

    • We invert this matrix to create the vo_view_matrix. (This is a key 3D concept: the "view" matrix is always the inverse of the camera's "pose" or position in the world).
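
Here is that pose-to-view conversion as a small cgmath sketch (assuming the nalgebra rotation has already been converted into a cgmath Matrix4 whose upper-left 3x3 is the rotation, and the position into a Vector3):

    use cgmath::{Matrix4, SquareMatrix, Vector3};

    // Build the camera pose (camera-to-world), then invert it to get the view matrix.
    fn pose_to_view(rot4: Matrix4<f32>, vo_camera_pos_gl: Vector3<f32>) -> Matrix4<f32> {
        // Pose = translation * rotation: where the camera sits and how it is oriented.
        let vo_pose_matrix = Matrix4::from_translation(vo_camera_pos_gl) * rot4;
        // The view matrix is always the inverse of the camera's pose in the world.
        vo_pose_matrix.invert().unwrap_or_else(|| Matrix4::identity())
    }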

 

Step 3: Render the Scene (Happens every loop)

 

  1. let view = vo_view_matrix;: We tell our renderer to use the vo_view_matrix we just calculated (or the old one if no new package arrived).

  2. let proj = perspective(...): We create the projection matrix using the correct FOV we loaded at the start.

  3. let model_matrix = Matrix4::identity();: We tell the renderer to draw the point cloud at the origin (0,0,0).

  4. point_cloud.draw(...): This function tells the GPU, "Draw all the points in your VBO, using this model_matrix, view matrix, and proj matrix."

  5. axis.draw(...): This does the same for the world axis.

  6. windowed_context.swap_buffers(): This presents the final image to your screen.
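
Condensed, the per-frame render pass looks like this sketch (the draw methods on our PointCloud and axis objects are shown schematically in comments, not with their exact signatures; the near/far planes are illustrative):

    use cgmath::{perspective, Deg, Matrix4, SquareMatrix};

    // 1. Camera: the view matrix from the VO pose (or the previous one if nothing new arrived).
    let view = vo_view_matrix;

    // 2. Projection: use the true vertical FOV loaded from the calibration (~30.15 degrees).
    let aspect = 320.0 / 240.0_f32;
    let proj: Matrix4<f32> = perspective(Deg(30.15), aspect, 10.0, 20_000.0); // mm

    // 3. Model: the point cloud stays at the world origin.
    let model_matrix = Matrix4::<f32>::identity();

    // 4-6. Draw calls and buffer swap (project-specific, shown schematically):
    // point_cloud.draw(&model_matrix, &view, &proj);
    // axis.draw(&model_matrix, &view, &proj);
    // windowed_context.swap_buffers()?;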


 

 Part 3: How to Use Your "End Data"

 

You now have a program that, on every frame, produces three incredibly valuable pieces of data. They live in the VO thread right before the data_tx.send(...) call.

  1. point_cloud_curr (The Raw Mat): The full 3D point cloud (320x240 = 76,800 points, including invalid ones) as an OpenCV Mat.

  2. points_vec (The Filtered Vec<f32>): The clean point cloud, filtered by depth (e.g., ~29,000 points), in a format ready for OpenGL.

  3. global_translation (The Pose): A na::Vector3<f32> holding your camera's (X, Y, Z) position in millimeters.

Here are three ideas for what you can do with this data:

 

Idea 1: Record Your Path (Simple SLAM)

 

You can save the camera's path to a file to see where it went.

  • In run_live_vo_thread (before the loop):

     
    let mut camera_path: Vec<na::Vector3<f32>> = Vec::new();
     
  • In run_live_vo_thread (inside the loop, after odom.update):

     

    let (global_rot, global_trans) = odom.get_pose();
    camera_path.push(global_trans.clone());

 

  • At the very end (after the loop):

     
    // When the program quits, save the path to a file
    use std::io::Write;
    let mut file = std::fs::File::create("path_output.txt")?;
    for point in camera_path {
        writeln!(file, "{} {} {}", point.x, point.y, point.z)?;
    }
    println!("Saved camera path to path_output.txt");
     

Idea 2: Simple Robot Navigation

 

You can use the pose for basic automation.

  • In run_live_vo_thread (inside the loop, after odom.update):

     
    let (global_rot, global_trans) = odom.get_pose();

    // Example: Check if the robot has moved more than 1 meter (1000mm)
    if global_trans.norm() > 1000.0 {
        println!("ALERT: I am more than 1 meter away from home!");
        // (Here you would send a command to your robot's motors,
        // like "reverse" or "turn around")
    }

Idea 3: Create a 3D Scan File (.xyz)

 

You can save the 3D environment to view in a program like MeshLab.

  • In run_live_vo_thread (after the loop):

     

     
    // Note: This only saves the *last* frame. A real 3D scanner
    // would need to combine points from *all* frames using the pose.

    // We can reuse the `points_vec` we already built. Because `data_tx.send(...)`
    // takes ownership of it each frame, keep a clone that outlives the loop
    // (e.g. clone it just before the send) and write that copy out here.
    use std::io::Write;
    let mut file = std::fs::File::create("scan.xyz")?;

    // `points_vec` is [x,y,z,r,g,b, x,y,z,r,g,b, ...]
    // We step 6 items at a time.
    for chunk in points_vec.chunks_exact(6) {
        let x = chunk[0];
        let y = chunk[1];
        let z = chunk[2];
        let r = (chunk[3] * 255.0) as u8;
        let g = (chunk[4] * 255.0) as u8;
        let b = (chunk[5] * 255.0) as u8;

        // Write in "XYZRGB" format
        writeln!(file, "{} {} {} {} {} {}", x, y, z, r, g, b)?;
    }
    println!("Saved last point cloud to scan.xyz");