In the XR field, visual odometry has long been the preferred way of estimating device locations. It's a computationally intensive task that has to be optimized for a tiny device that sits on your face.
Not only that, the device has to localize itself within a few milliseconds: on an XR device, latency must be kept to an absolute minimum.
Intel's CEO decided to go on a side quest and make something for us (again): the RealSense T261/T265, decently accurate SLAM contained entirely in a 22-gram package.
The T261/T265 has 3 sensors, 2 cameras and 1 IMU. Together with Intel's custom silicon, this tiny module outputs stable poses without requiring any customized sensor placement.
The T261/T265 is widely used in robotics and is especially well known in the Project North Star series of headsets.
Before Intel discontinued all of this, they left us a treat that is extremely proprietary and hard to work with: https://www.intel.com/content/www/us/en/products/sku/125926/intel-movidius-myriad-x-vision-processing-unit-4gb/specifications.html
So we wondered: what if we made something that combines flexibility with the power of specialized computing, and ended up with something even better? We could then offload the parallelizable algorithms (corner detection, non-maximum suppression, and other image-convolution kernels) onto the FPGA, with an upstream processor handling the Kalman filters and optimization algorithms!
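To make that split concrete, here is a minimal Python/numpy sketch (not FPGA RTL, and not our actual pipeline) of the kind of per-pixel work we mean: a simplified FAST-style corner score followed by 3×3 non-maximum suppression. The function names, thresholds, and vote counts are purely illustrative assumptions. The point is that every pixel is processed independently, which is exactly what unrolls nicely into FPGA pipelines, while the sequential filtering/optimization stays upstream.

```python
import numpy as np

# Offsets (dy, dx) of the 16-pixel Bresenham circle of radius 3 used by FAST.
FAST_CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
               (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def fast_score(gray, threshold=20):
    """Per-pixel corner score: count circle pixels differing from the centre by more
    than `threshold`. (Real FAST requires a contiguous arc; a plain vote count keeps
    this sketch short.) Every pixel is independent of every other pixel."""
    g = gray.astype(np.int32)
    score = np.zeros_like(g)
    for dy, dx in FAST_CIRCLE:
        shifted = np.roll(g, (-dy, -dx), axis=(0, 1))  # shifted[y, x] == g[y+dy, x+dx]
        score += (np.abs(shifted - g) > threshold).astype(np.int32)
    # Zero the 3-pixel border where np.roll wrapped around the image.
    score[:3, :] = 0
    score[-3:, :] = 0
    score[:, :3] = 0
    score[:, -3:] = 0
    return score

def non_max_suppression(score, min_votes=12):
    """Keep only pixels whose score beats every 3x3 neighbour -- also embarrassingly parallel."""
    keep = score >= min_votes
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            keep &= score >= np.roll(score, (-dy, -dx), axis=(0, 1))
    return keep

if __name__ == "__main__":
    frame = (np.random.rand(480, 640) * 255).astype(np.uint8)  # stand-in for one camera frame
    corners = non_max_suppression(fast_score(frame))
    print("corner candidates:", int(corners.sum()))
```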
Currently, @Vincent Xie’s Northstar uses 2 camera modules: the T261 for SLAM and the Ultraleap SIR170 (Rigel) for hand tracking in the near-IR range. Any additional function requires yet another compute or camera module. Not very convenient at all!
What if we combined these cameras into one unified, hackable, and flexible system?
@Anonymous is so camera rich…
Since we wanted to guarantee that specific needs could be met, the hardware is built up with those needs in mind: the block diagram below shows how the pieces fit together, and the table that follows lists each requirement and its rationale.
```mermaid
graph TD;
    subgraph FPGA["FPGA"]
        MEM --> FEAT["Feature Processing"]
        IMUDATA["IMU Data"] --> PREINT["IMU Pre-integration"]
        subgraph Feature_Pipeline["Parallel Feature Pipeline"]
            FEAT --> FD["FAST Corner Detection"]
            FEAT --> BRIEF["BRIEF Descriptors"]
            FEAT --> MATCH["Feature Matching"]
        end
        MEM["Framebuffer Memory"]
        USB["USB 3 PHY"]
    end
    subgraph Camera_System["Camera System"]
        CAM1["Camera 1"] --> MEM
        CAM2["Camera 2"] --> MEM
        IMU["IMU"] --> IMUDATA
    end
    subgraph PC["PC"]
        BA["Bundle Adjustment"]
        LCD["Loop Closure Detection"]
        ATLAS["Atlas Management"]
        RELOC["Relocalization"]
    end
    MEM --> USB
    Feature_Pipeline --> USB
    FPGA --> USB
    PREINT --> USB
    USB --> |"Features, IMU, Video Stream"| PC
```
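The "IMU Pre-integration" block in the diagram condenses the high-rate IMU samples between two camera frames into a single relative-motion term, so the PC side consumes one constraint per frame pair instead of hundreds of raw samples. Here is a minimal Python sketch of that idea, under simplifying assumptions of our own (no bias estimation, no noise propagation, made-up sample rates); it is an illustration, not our firmware.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of a vector, used by the exponential map."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def so3_exp(w):
    """Rodrigues formula: map a rotation vector to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3) + skew(w)
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1 - np.cos(theta)) * (k @ k)

def preintegrate(gyro, accel, dt):
    """Accumulate delta-rotation, delta-velocity, and delta-position over all IMU
    samples between two camera frames.

    gyro, accel: (N, 3) body-frame angular rate [rad/s] and specific force [m/s^2].
    dt: time between consecutive IMU samples [s].
    """
    dR = np.eye(3)    # relative rotation since the last frame
    dv = np.zeros(3)  # relative velocity change (gravity not removed in this sketch)
    dp = np.zeros(3)  # relative position change
    for w, a in zip(gyro, accel):
        dp += dv * dt + 0.5 * (dR @ a) * dt ** 2
        dv += (dR @ a) * dt
        dR = dR @ so3_exp(w * dt)
    return dR, dv, dp

if __name__ == "__main__":
    # 20 made-up IMU samples at 200 Hz between two camera frames.
    gyro = np.tile([0.0, 0.0, 0.1], (20, 1))    # slow yaw
    accel = np.tile([0.0, 0.0, 9.81], (20, 1))  # roughly stationary, still sensing gravity
    dR, dv, dp = preintegrate(gyro, accel, dt=1 / 200)
    print("delta rotation:\n", dR.round(4))
    print("delta velocity:", dv.round(3), "delta position:", dp.round(4))
```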
| Hardware Requirement | Rationale |
|---|---|
| We need at least 2 cameras on the final board. | Stereo views, matching the T261/T265's two-camera setup. |
| We need hardware capable of low-latency processing of large amounts of pixels. | XR leaves only a few milliseconds to localize, so the pixel pipeline cannot be the bottleneck. |
| We need an IMU on board. | Assists vision-only algorithms. |
| We need high-speed output to a host device. | Allows more flexibility in usage, from a simple stereo camera to all-on-board algorithms. |
| VITracker needs a frame buffer large enough to store at least 3 frames from at least 2 cameras. | |
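To put rough numbers on the last two requirements, here is a back-of-the-envelope calculation using assumed sensor parameters (640×480 8-bit global-shutter streams at 60 fps, 3 buffered frames per camera); the final board may well use different sensors, so treat these purely as sizing estimates.

```python
# Assumed parameters, not final hardware choices.
CAMERAS = 2
WIDTH, HEIGHT, BYTES_PER_PIXEL = 640, 480, 1
FPS = 60
BUFFERED_FRAMES = 3  # per camera, per the requirement above

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
buffer_bytes = CAMERAS * BUFFERED_FRAMES * frame_bytes
stream_bytes_per_s = CAMERAS * FPS * frame_bytes

print(f"framebuffer needed : {buffer_bytes / 2**20:.1f} MiB")      # ~1.8 MiB
print(f"raw video stream   : {stream_bytes_per_s / 1e6:.1f} MB/s")  # ~36.9 MB/s
```

Even at these modest settings the raw stereo stream is tens of MB/s, which fits comfortably within USB 3 but rules out slower links if we want to ship full video alongside features and IMU data.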
components: