Research & Notes

imx219 code reading notes

USB bridge ch569 research

Motivations & Background

In the XR field, visual odometry has long been the preferred way of estimating device locations. It's a computationally intensive task that has to be optimized for a tiny device that sits on your face.

Not only that, the device has to fully localize itself within a few milliseconds; delays should be minimized in an XR device.

The T261/T265 has 3 sensors: 2 cameras and 1 IMU. Together with its custom silicon, this tiny module is able to output stable poses without customized sensor placement.

Intel’s CEO decided to go on a side quest and make something for us (again). It is called the RealSense T261/T265: decently accurate SLAM completely contained in a 22-gram package.

The T261/T265 is widely used in robotics and is especially popular in the Project North Star series of headsets.

Before Intel discontinued all of this, they left us a treat that is extremely proprietary and hard to work with, the Movidius Myriad X VPU: https://www.intel.com/content/www/us/en/products/sku/125926/intel-movidius-myriad-x-vision-processing-unit-4gb/specifications.html

So, we wonder: what if we made something that combines the flexibility and the power of specialized computing into something even better? We could then offload parallelizable algorithms (corner detection, non-maximum suppression, or other image convolution algorithms) onto the FPGA, with another upstream processor handling the Kalman filters or optimization algorithms!
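
To make "parallelizable" concrete, here is a minimal sketch of 3×3 non-maximum suppression over a corner-score map, written in NumPy rather than RTL. The window size and threshold are arbitrary illustrative choices, and the score map is assumed to come from a FAST-style detector; the point is that every output pixel depends only on a small local window, so the loop body maps naturally onto parallel hardware.

```python
# NumPy stand-in for a per-pixel FPGA pipeline stage.
# 'scores' is assumed to be a corner-response map from a FAST-style detector.
import numpy as np


def nms_3x3(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Keep a pixel only if it is the maximum of its 3x3 neighborhood
    and above the detection threshold. Every output pixel depends only
    on a small local window, so the loop body parallelizes trivially."""
    h, w = scores.shape
    keep = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = scores[y - 1:y + 2, x - 1:x + 2]
            if scores[y, x] >= threshold and scores[y, x] == window.max():
                keep[y, x] = True
    return keep


# Example with a random score map standing in for real FAST responses.
corners = nms_3x3(np.random.rand(480, 640).astype(np.float32), threshold=0.99)
print(int(corners.sum()), "corner candidates")
```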

Currently, @Vincent Xie’s Northstar uses 2 camera modules: the T261 for SLAM and the Ultraleap SIR170 (Rigel) for hand tracking in the near-IR range. Any additional function requires extra compute or extra camera modules. Not very convenient at all!

What if we combined these cameras into one unified, hackable, and flexible system?

Isaac is so camera rich…

@Anonymous is so camera rich…

Hardware Brainstorming

Since we want to guarantee that specific needs can be met, our hardware is built with flexibility in mind. The diagram below sketches the planned split between the FPGA and the host PC:

```mermaid
graph TD;
    subgraph FPGA["FPGA"]
        MEM --> FEAT["Feature Processing"]
        IMUDATA["IMU Data"] --> PREINT["IMU Pre-integration"]
        
        subgraph Feature_Pipeline["Parallel Feature Pipeline"]
            FEAT --> FD["FAST Corner Detection"]
            FEAT --> BRIEF["BRIEF Descriptors"]
            FEAT --> MATCH["Feature Matching"]
        end
        
        MEM["Framebuffer Memory"]
        USB["USB 3 PHY"]
    end
    
    subgraph Camera_System["Camera System"]
        CAM1["Camera 1"] --> MEM
        CAM2["Camera 2"] --> MEM
        IMU["IMU"] --> IMUDATA
    end

    subgraph PC["PC"]
        BA["Bundle Adjustment"]
        LCD["Loop Closure Detection"]
        ATLAS["Atlas Management"] 
        RELOC["Relocalization"]
    end

    MEM --> USB
    Feature_Pipeline --> USB
    PREINT --> USB
    USB --> |"Features, IMU, Video Stream"| PC
```

Requirements & Rationales

| Hardware Requirement | Rationale |
| --- | --- |
| At least 2 cameras on the final board | Stereo vision is needed to lower compute costs |
| Hardware capable of low-latency processing | VR is extremely latency sensitive |
| Onboard IMU | Assists vision-only algorithms |
| High-speed output to a host device | Allows more flexibility in usage, from a simple stereo camera to streaming debugging information |
| A frame buffer large enough to store at least 3 frames from at least 2 cameras | Allows onboard inter-frame processing (see the sizing sketch below) |
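
As a rough sanity check on the frame-buffer requirement, here is a small sizing calculation. The resolution and bit depth are assumptions for illustration only; the actual sensor mode has not been chosen yet.

```python
# Rough frame-buffer sizing. The resolution and bit depth are assumptions
# (1280x800, 8-bit monochrome per camera); the actual sensor mode is open.
WIDTH, HEIGHT = 1280, 800   # assumed per-camera resolution
BYTES_PER_PIXEL = 1         # 8-bit mono
CAMERAS = 2                 # stereo pair (hardware requirement)
FRAMES_BUFFERED = 3         # at least 3 frames per camera (hardware requirement)

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = frame_bytes * CAMERAS * FRAMES_BUFFERED

print(f"per frame: {frame_bytes / 1024:.0f} KiB")           # 1000 KiB
print(f"total:     {total_bytes / (1024 * 1024):.2f} MiB")  # 5.86 MiB
```

Even at this modest assumed resolution the requirement lands around 6 MiB, which is why the block diagram above gives the FPGA a dedicated framebuffer memory.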

Rev 1 Devboard Hardware Requirements