Tech Lead/PM | Justin Lin |
---|---|
GitHub | https://github.com/uwrealitylabs/universal-gestures-unity https://github.com/uwrealitylabs/universal-gestures-lab |
Scrum Board | https://www.notion.so/7413f4e3318642aba04e34b2f83869a2?v=14850a97b88c470e922605a3cf40e0f5&pvs=4 |
Expected Delivery | EOT W25 |
Changes to Spec:
Change Date | Change Author | Change Reason |
---|---|---|
Feb. 13, 2024 | Justin Lin | Initial Author |
Feb. 22, 2024 | Justin Lin | Created Arch Diagram |
Aug. 26, 2024 | Justin Lin | Update Links on Front Page |
Point Persons:
Role | Name | Contact Info |
---|---|---|
Sedra Lead | Peter Teertstra | [email protected] |
Team Lead | Justin Lin | [email protected] |
UW Reality Labs Leads | Vincent Xie | [email protected] |
| Kenny Na | [email protected] |
| Justin Lin | [email protected] |
This specification is a proposed plan for an R&D project that a software subteam of UW Reality Labs could work on. Alongside the development of a virtual reality headset from first principles, UW Reality Labs also hopes to provide students with the opportunity to learn the technical skills needed to develop software for existing XR platforms, such as Meta Quest or Apple Vision Pro.
This project would involve building a software package for Meta Quest headsets geared towards developers of mixed-reality video games and apps that use hand tracking. For developers who build using Unity, the most popular editor for creating immersive experiences, Meta provides the XR All-in-One SDK (UPM). This SDK enables developers to interface with the sensors on Quest headsets and easily enable hand tracking functionality within their app. However, its functionality for detecting specific hand poses is limited. For example, the All-in-One SDK can recognize when you make a thumbs-up gesture with one of your hands. It does this by letting you create a configuration using the flexion and curl values detected from your hand.
The drawback of this approach is the limited customization of the curl and flexion settings for each finger. The tool only lets you define a gesture in terms of coarse states: fingers curled, slightly curled, not curled, flexed, slightly flexed, unflexed, and so on. It does provide some threshold customization for defining your own poses, but it is difficult to create a profile for a complex gesture (like making a heart with your hands) that behaves the way you would expect. Additionally, the existing package does not take into account the position of your hands relative to one another. Without writing additional code, making a circle with your hands together is recognized the same as making two half-circles with your hands spread apart. This makes it very difficult to recognize a specific, complex shape formed with two hands together.
The goal of our project is to create an importable Unity package that expands and simplifies the hand gesture recognition system provided by Meta. Instead of using an on-or-off approach to finger poses, we will take the float values given by the headset's sensors for finger positions and apply a machine learning approach. By collecting a training set of popular hand gestures, we will train neural networks to classify them. The output of these trained neural networks will be accessible from C# scripts that we create, which in turn can be attached to GameObjects within Unity.
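As a rough illustration of what the C#-facing side of this pipeline could look like, the sketch below runs a single hidden-layer network over the finger feature floats and returns a confidence value. It assumes the weights have already been exported from the PyTorch training step; all class and member names here are hypothetical, not a finalized API.

```csharp
using UnityEngine;

// Hypothetical sketch of the inference side: a single hidden-layer network whose
// weights would be exported from the PyTorch training step. Class and member
// names are illustrative only.
public class GestureClassifier
{
    private readonly float[,] hiddenWeights; // [hiddenSize, inputSize]
    private readonly float[] hiddenBias;     // [hiddenSize]
    private readonly float[] outputWeights;  // [hiddenSize]
    private readonly float outputBias;

    public GestureClassifier(float[,] hiddenWeights, float[] hiddenBias,
                             float[] outputWeights, float outputBias)
    {
        this.hiddenWeights = hiddenWeights;
        this.hiddenBias = hiddenBias;
        this.outputWeights = outputWeights;
        this.outputBias = outputBias;
    }

    // Input: per-finger curl/flexion floats (optionally plus the hands' relative position).
    // Output: confidence in [0, 1] that the target gesture is currently being performed.
    public float Predict(float[] features)
    {
        int hiddenSize = hiddenBias.Length;
        var hidden = new float[hiddenSize];
        for (int i = 0; i < hiddenSize; i++)
        {
            float sum = hiddenBias[i];
            for (int j = 0; j < features.Length; j++)
                sum += hiddenWeights[i, j] * features[j];
            hidden[i] = Mathf.Max(0f, sum); // ReLU activation
        }

        float logit = outputBias;
        for (int i = 0; i < hiddenSize; i++)
            logit += outputWeights[i] * hidden[i];

        return 1f / (1f + Mathf.Exp(-logit)); // sigmoid -> confidence score
    }
}
```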
From the developer’s perspective, all they have to do is import our package, attach the “UWRL Hand Pose Detection” script to an object within their Scene, choose which gesture they want to recognize, and specify a function to be run when it is recognized. Just like that, the difficult part of recognizing a gesture is handled for them, and they can immediately get to work on their creative project. An ML approach has the advantage of being able to learn from the hands’ relative positions to one another, provided we pass that information as part of the input vector. Additionally, a neural network’s output can be a confidence level from 0 to 1, and it can be trained on the different ways people naturally form shapes with their hands. Both of these advantages enable greater flexibility in recognition while ideally avoiding accidental activations (another issue with the existing system).
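To make this concrete, the developer-facing hookup might look something like the sketch below, assuming the detection script exposes a confidence threshold and a UnityEvent that fires when the chosen gesture is recognized. Field names and the exact event shape are placeholders rather than a finalized API.

```csharp
using UnityEngine;
using UnityEngine.Events;

// Hypothetical shape of the developer-facing component described above.
// Field names, the gesture dropdown, and the threshold default are placeholders.
public class UWRLHandPoseDetection : MonoBehaviour
{
    public string gestureName = "ThumbsUp";        // would become a dropdown of supported gestures
    [Range(0f, 1f)] public float threshold = 0.9f; // confidence required to fire the event
    public UnityEvent onGestureRecognized;         // developer assigns their callback here

    // Called by the package each frame with the classifier's confidence
    // for the selected gesture.
    public void ReportConfidence(float confidence)
    {
        if (confidence >= threshold)
            onGestureRecognized?.Invoke();
    }
}

// Example consumer: the developer only writes the reaction to the gesture and
// wires OnHeartGesture into onGestureRecognized in the Inspector.
public class HeartGestureResponder : MonoBehaviour
{
    public GameObject heartEffectPrefab;

    public void OnHeartGesture()
    {
        Instantiate(heartEffectPrefab, transform.position, Quaternion.identity);
    }
}
```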
Further ideas for expanding the project include a tool that records a dataset of your own hand gesture and then trains a neural network on it locally on your machine. This would work by simply going into Play Mode and performing the hand gesture to collect a training set. After training, the gesture would become available to select from the dropdown menu of hand gestures. This feature aims to abstract away all the technical details of machine learning and reuse our pre-designed deep learning architecture to recognize any gesture that we have not already pre-trained.
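As a sketch of how such a recorder might work (file format, labeling scheme, and feature source are all assumptions at this stage), a MonoBehaviour could sample the same feature vector the classifier sees each frame during Play Mode and append it to a CSV for the PyTorch side to train on:

```csharp
using System.Globalization;
using System.IO;
using System.Text;
using UnityEngine;

// Hypothetical Play Mode recorder: while the scene runs, sample the finger
// feature vector each frame and append it, with a 0/1 label, to a CSV file
// that the PyTorch training script would consume. Names and paths are placeholders.
public class GestureRecorder : MonoBehaviour
{
    public string outputPath = "Assets/GestureData/my_gesture.csv";
    public bool isPerformingGesture; // toggle in the Inspector while holding the pose

    void Update()
    {
        float[] features = CollectFeatures();
        var row = new StringBuilder();
        foreach (float f in features)
            row.Append(f.ToString(CultureInfo.InvariantCulture)).Append(',');
        row.Append(isPerformingGesture ? 1 : 0).Append('\n');

        Directory.CreateDirectory(Path.GetDirectoryName(outputPath));
        File.AppendAllText(outputPath, row.ToString());
    }

    // Placeholder: the real package would read per-finger curl/flexion values
    // and the hands' relative position from the headset's hand-tracking data.
    private float[] CollectFeatures()
    {
        return new float[0];
    }
}
```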
The tech stack for this project involves Unity and PyTorch, as well as the C# and Python programming languages. The target platform is Meta Quest, including the Meta Quest 2, Meta Quest 3, and Meta Quest Pro headsets. The project will build on top of Meta’s XR All-in-One SDK (UPM), which will therefore be a dependency.
The architecture starts with a Unity GameObject placed in the user’s scene. The user attaches our “UWRL Hand Pose Detection” script to the object. The script will have the following configurable fields: