Tech Lead/PM | Justin Lin & Nathan Reilly |
---|---|
GitHub | https://github.com/uwrealitylabs/universal-text-unity |
Scrum Board | https://www.notion.so/uwrl/1f9bc072402f8056b481d64fa56b4ef5?v=1f9bc072402f81b7a7ca000c4e8d9e22&pvs=4 |
Expected Delivery | July 2025 |
Changes to Spec:
Change Date | Change Author | Change Reason |
---|---|---|
Aug. 17, 2024 | Justin Lin | Initial Author |
Aug. 31, 2024 | Nathan Reilly & Justin Lin | Technical revisions for Text Label composition. Added Introduction |
Sep 27, 2024 | Nathan Reilly | Large-scale revision of the implementation |
Jan 20, 2025 | Nathan Reilly | Revision of the UTT and UTS implementation & other updates for W25 |
Point Persons:
Role | Name | Contact Info |
---|---|---|
Sedra Lead | Peter Teertstra | [email protected] |
Team Lead | Justin Lin | [email protected] |
| Nathan Reilly | [email protected] |
UW Reality Labs Leads | Vincent Xie | [email protected] |
| Kenny Na | [email protected] |
| Justin Lin | [email protected] |
Google Docs version of the tech spec here.
When you prompt a virtual assistant (for example, Meta AI on Ray-Ban glasses) and ask “What am I looking at?”, what actually happens? Currently, the pipeline seems rather simplistic: the cameras on the glasses take a picture, that picture is passed through a model that assigns a text caption to the image, and finally that single caption describing the whole image is passed into an LLM. This process, especially the step where one model must describe everything in an image using words, is often inaccurate.
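For illustration, a minimal sketch of that naive pipeline might look like the following. The captioning model choice, the `ask_llm` helper, and the `what_am_i_looking_at` function are assumptions made for this example only; they are not how Meta AI (or any particular assistant) is actually implemented.

```python
# Illustrative sketch of the naive pipeline described above:
# camera frame -> image-captioning model -> caption text -> LLM.
# The model choice and ask_llm() are placeholders, not a real assistant's internals.
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM backend the assistant calls."""
    raise NotImplementedError("Swap in a real LLM client here.")

def what_am_i_looking_at(image_path: str, question: str) -> str:
    # 1. The glasses' camera captures a single frame.
    frame = Image.open(image_path)
    # 2. A captioning model collapses the whole frame into one short text label.
    caption = captioner(frame)[0]["generated_text"]
    # 3. Only that lossy caption, not the image itself, ever reaches the LLM.
    return ask_llm(f"The user is looking at: {caption}. They ask: {question}")
```

The weakness is concentrated in step 2: any object or detail the captioner omits or mislabels is unrecoverable by the LLM downstream, which is why answers from this kind of pipeline are often inaccurate.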
What if we could build a system that…
If we created this, we could use it for…