Tech Lead/PM | Justin Lin & Nathan Reilly |
---|---|
GitHub | https://github.com/uwrealitylabs/universal-text-unity |
Scrum Board | Link |
Expected Delivery | EOT W25 |
Changes to Spec:
Change Date | Change Author | Change Reason |
---|---|---|
Aug. 17, 2024 | Justin Lin | Initial Author |
Aug. 31, 2024 | Nathan Reilly & Justin Lin | Technical revisions for Text Label composition. Added Introduction |
Sep. 27, 2024 | Nathan Reilly | Large-scale revision of the implementation |
Point Persons:
Role | Name | Contact Info |
---|---|---|
Sedra Lead | Peter Teertstra | [email protected] |
Team Leads | Justin Lin | [email protected] |
| Nathan Reilly | [email protected] |
UW Reality Labs Leads | Vincent Xie | [email protected] |
| Kenny Na | [email protected] |
| Justin Lin | [email protected] |
When you prompt a virtual assistant (for example, Meta AI on Ray-Ban smart glasses) with "What am I looking at?", what actually happens? Currently, the pipeline appears rather simplistic: the cameras on the glasses capture an image, that image is passed through a captioning model that assigns a text label to it, and that single text label describing the whole image is then passed to an LLM. This process, especially the step where a model must compress everything in an image into a short piece of text, is often inaccurate.
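To make that lossiness concrete, here is a minimal sketch of the caption-then-LLM pipeline described above. The captioning model name and the query_llm helper are illustrative placeholders, not part of this project or Meta's actual stack; the point is that the LLM never sees the image itself, only whatever single caption the upstream model produced.

```python
# Minimal sketch of the naive caption-then-LLM pipeline (assumptions:
# the BLIP captioning model and query_llm() are illustrative stand-ins).
from transformers import pipeline

# Step 1: the glasses' camera produces a single frame.
image_path = "camera_frame.jpg"

# Step 2: a captioning model reduces the entire frame to one short caption.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner(image_path)[0]["generated_text"]

# Step 3: only the caption (never the pixels) reaches the LLM.
def query_llm(prompt: str) -> str:
    """Placeholder for whatever LLM backend the assistant uses."""
    raise NotImplementedError

answer = query_llm(
    'The user asked: "What am I looking at?"\n'
    f"A captioning model described the camera frame as: {caption}\n"
    "Answer the user's question."
)
```

Any object, text, or spatial detail the captioning model omits in step 2 is simply unavailable to the LLM in step 3, which is the core weakness this project targets.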
What if we could build a system that…
If we created this, we could use it for…