Tech Lead/PM: Justin Lin & Nathan Reilly
GitHub: https://github.com/uwrealitylabs/universal-text-unity
Scrum Board: Link
Expected Delivery: EOT W25

Changes to Spec:

Change Date | Change Author | Change Reason
Aug. 17, 2024 | Justin Lin | Initial authoring
Aug. 31, 2024 | Nathan Reilly & Justin Lin | Technical revisions for Text Label composition. Added Introduction.
Sep. 27, 2024 | Nathan Reilly | Large-scale revision of the implementation

Point Persons:

Role | Name | Contact Info
Sedra Lead | Peter Teertstra | [email protected]
Team Lead | Justin Lin | [email protected]
Team Lead | Nathan Reilly | [email protected]
UW Reality Labs Leads | Vincent Xie | [email protected]
UW Reality Labs Leads | Kenny Na | [email protected]
UW Reality Labs Leads | Justin Lin | [email protected]


Introduction


When you prompt a virtual assistant (for example, Meta AI on Ray-Ban Meta smart glasses) with "What am I looking at?", what actually happens? Currently, the pipeline appears fairly simple. The cameras on the glasses take a picture, that picture is passed through a model that assigns text labels to images, and finally the single text label describing the whole image is passed to an LLM. This process is often inaccurate, especially at the step where one model must describe everything in the image in words.

What if we could build a system that…

If we created this, we could use it for…