Creating tutorials in Augmented Reality


My Role




CHI Conference
Carnegie Mellon
HCI Researcher
UX Designer
Dena Sabha
Judy Kong
Apple ARKit
14 weeks
Summer 2020



Using complex or unfamiliar interfaces can be challenging and frustrating. I wanted to explore guiding a person through a task in real time using Augmented Reality. Augmented Reality tutorials are a great way to guide people through a situation as it unfolds. However, while consuming a tutorial in Augmented Reality is intuitive and easy, creating one is extremely challenging.


My team and I created TutorialLens: a system for authoring interactive AR tutorials through narration and demonstration. To use TutorialLens, authors demonstrate tasks step-by-step while verbally explaining what they are doing. TutorialLens automatically detects and records finger positions and guides authors to capture important changes on the device. Using the created tutorials, TutorialLens then provides AR visual guidance and feedback for new users to complete the demonstrated tasks. TutorialLens is automated, friendly to users without AR development experience, and applicable to a variety of devices and tasks.
How might we create Augmented Reality authoring tools that are practical and accessible for anyone to use?

My Process


Video Coding Analysis
User Study


User Flows
Digital Prototype

User Testing

Usability Tests
Quantitative Analysis
Future Work

01. Formative Study

To identify the key challenges and needs in authoring user tutorials, I conducted a formative study consisting of two parts: a video coding analysis of 12 tutorial videos on YouTube, and a remote user study with 10 participants to observe and understand how they create tutorial videos.

The goal was to make it easy to author content in AR. To do that, I had to understand two different user groups: those who create tutorial content (creators) in the authoring mode, and those who consume tutorial content (consumers, or new users) in the access mode:
Creators and Consumers: Diagram of how AR tutorial works for each stakeholder

Looking at a system with two different user experiences forced me to piece things together and figure out how one user's flow translates to the other's. In other words, what do authors have to do, say, or show that will matter to the people consuming the content?

Video Coding Analysis + Codebook

The goal of our video coding analysis was to observe how existing tutorials are made and to summarize the types of feedback and guidance used to convey a step in a tutorial. After analyzing 12 tutorial videos, I created a codebook that lays out a hierarchy of the types of device feedback that tutorial creators show. View the codebook here.

User Study

For the second part of the formative study, I designed a study to explore how people create video tutorials. Analyzing the descriptions and patterns participants share when creating tutorial content could help me identify what could be incorporated using Augmented Reality.

Research Questions




What was challenging about creating the video tutorials? 
What kinds of things would authors want to change?
How is the tutorial process narrated? 




How does the author feel about their tutorial? 
How was the video tutorial produced? 
What kinds of objects were used? 

Study Set Up

During the study, participants were asked to independently create a tutorial video that would walk a new user through a task on a device they had at home. Once a tutorial video was created and uploaded, I used the codebook to code it. This approach allowed me to observe what participants did in their videos and to prepare questions about their process. During each interview, the participant and I watched the tutorial video together while the participant thought aloud about it. This allowed participants to review their video and answer questions about their experience creating it, as well as about their own consumption of tutorial videos. I then grouped the responses in an affinity diagram to help my team with analysis and guide us through our design.

Key Insights & Results

01 Project Context
Participants felt the need to explain device features in depth.

02 Clear Instructions
Participants mentioned that they would have liked to be clearer in their instructions.

03 Ability to Edit
Participants reported that they did not have sufficient video editing skills to edit their videos efficiently.

Design Recommendations






multi-modal feedback

voice, text & visual feedback
step-by-step guidance

Explore mode
speech-to-text recognition
enable easy editing
combine user steps for many tasks

02. System Design

In the authoring mode, an author is guided to demonstrate tasks step-by-step while the system captures the author’s narration and demonstration for each step. From these, the system generates an action sequence containing, for each user step, the status of the device display screen at the beginning of the step, the user actions that complete the step, and a text description of the step. In the access mode, a new user follows the AR guidance along with text and audio instructions to access the interface and complete the task. The system design was driven by insights from the Formative Study.

User Flow + Mid-Fidelity Mock-Ups

User flow of AR tutorial system Design
My research partner and I began to scope out the design of the system in Figma. We talked through the general flow the user should go through and discussed what was possible to build in the time we had.
Working in parallel with a developer helped me learn how to prioritize features or change a design to solve the same problem.
Wireframe of AR tutorial System

Digital Prototype + How the System Works

Modeling Tasks:
Our system models a task on a user interface as an action sequence. Each action sequence contains a sequence of "states" of the interface, each of which can be uniquely identified by a corresponding reference image.
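A minimal sketch of how such an action sequence might be represented is shown here. It is written in Python purely for illustration and is not the TutorialLens implementation; the names `Step` and `ActionSequence`, the image file names, the example task, and the choice of meters as the unit are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x, y, z) position; treating the unit as meters is an assumption for this sketch.
Point3D = Tuple[float, float, float]


@dataclass
class Step:
    """One user step: the interface state it starts from, plus what the author did and said."""
    reference_image: str   # hypothetical file name of the image identifying the starting state
    description: str       # text description of the step, e.g. transcribed narration
    finger_path: List[Point3D] = field(default_factory=list)  # recorded finger positions


@dataclass
class ActionSequence:
    """A task modeled as an ordered list of steps."""
    task_name: str
    steps: List[Step] = field(default_factory=list)

    def add_step(self, step: Step) -> None:
        self.steps.append(step)

    def step_for_state(self, reference_image: str) -> Step:
        """Return the first step that starts from the state shown in reference_image."""
        for step in self.steps:
            if step.reference_image == reference_image:
                return step
        raise KeyError(f"no step starts from state {reference_image!r}")


# Hypothetical example task with two steps.
sequence = ActionSequence(task_name="Start a print job")
sequence.add_step(Step("home_screen.png", "Press the menu button.", [(0.02, 0.01, 0.0)]))
sequence.add_step(Step("menu_screen.png", "Select Print."))
```

Keeping the reference image on each step means that, once a state is recognized, looking up what to show next is a single search through the sequence.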
Detecting User Actions:
The user action detection mechanism of TutorialLens is based on the Apple ARKit framework, and specifically uses its image tracking functionality to locate authors’ fingers in 3D space. For ARKit to track fingers more accurately, authors need to wear finger labels printed with QR code mask patterns. Our system also uses speech-to-text from the Apple Speech framework to transcribe the author's narration, and the Apple ReplayKit framework to record the screen while capturing user actions.
Using Anchor Points for Better Detection and Tracking:
To make user action detection more accurate and robust, we record 3D finger locations relative to an "anchor point", which improves the accuracy of the recorded finger movements. We define an "anchor point" to be a static location on the interface, for example the machine logo or the control panel. An anchor point is expected to be within the same camera view as the demonstrated user actions, such as pressing a button.
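The idea of storing finger locations relative to an anchor point can be sketched as follows. This is an illustrative Python sketch, not the TutorialLens code: the function names and coordinates are hypothetical, and it assumes the anchor keeps the same orientation between recording and playback, so only its position is subtracted. A fuller version would also account for the anchor's rotation.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def to_anchor_relative(finger: Point3D, anchor: Point3D) -> Point3D:
    """Express a finger position as an offset from the anchor point."""
    return (finger[0] - anchor[0], finger[1] - anchor[1], finger[2] - anchor[2])


def from_anchor_relative(offset: Point3D, anchor: Point3D) -> Point3D:
    """Place a stored offset back into the scene using the anchor's current position."""
    return (offset[0] + anchor[0], offset[1] + anchor[1], offset[2] + anchor[2])


# Authoring: the anchor (e.g. the machine logo) and the finger are both in view.
recorded_anchor = (0.40, 0.10, -1.00)
recorded_finger = (0.50, 0.20, -1.00)
offset = to_anchor_relative(recorded_finger, recorded_anchor)

# Access: the same anchor is detected at a different position in the new session.
playback_anchor = (1.00, 0.00, -2.00)
overlay_position = from_anchor_relative(offset, playback_anchor)
```

Because only the offset is stored, the guidance lands in the right place on the interface even when the device sits somewhere else in the new user's camera view.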
Design System of AR tutorial System

03. User Evaluation

Once we had our application running, we recruited 7 pairs of participants (14 in total) to test it. This evaluation allowed us to see the pain points participants had using the application as well as their responses to the application itself. Most participants in our study had little or no experience with AR development.

Usability Test Set Up

Our user study consisted of two major parts. In the first part, we evaluated the authoring part of TutorialLens by asking one of the two participants to be the tutorial author. In the second part, the other participant was asked to be the "new" user and to follow the AR user tutorial created in the first part of the study to complete the multi-step task on the home device. We recorded the time of completion for creating the tutorial, and for finishing the task using the tutorial. We aimed to address the following: 
1. How well our system allowed participants to create AR tutorials.

2. How usable the created AR tutorials are in helping new users interact with unfamiliar interfaces.

Research Questions




How learnable is our application to authors? 
How useful is our application to new users? 
What parts of the application did our participants find confusing? 



How long did it take for authors to create their tutorials? 
How did the system work as a whole?

Summary of Results + Key Takeaways

1. Participants believed that our system could be especially helpful if the tasks are very complicated or if they are really unfamiliar with the devices.  
2. Most author participants liked the step-by-step design of TutorialLens and thought it was “very helpful” and “pretty natural”. Participants felt like the design “helps you to get organized” and “forces you to do step-by-step.”
3. New-user participants overall liked the feedback and guidance given by TutorialLens, and were very positive about the potential use of TutorialLens. Participants found the created tutorials “pretty intuitive” and “really easy to use”.
4. Participants also liked the feedback given by TutorialLens upon detection of step completion when there’s an update on the display screen, which gives them a feeling that “I know when I’m done with my step.”
5. Although some author participants didn’t feel much need for TutorialLens right after creating their tutorials, they were surprised and expressed more confidence in TutorialLens when they learned about the very positive feedback from new-user participants on the tutorials they had created.
1. All authors were able to create interactive AR tutorials

2. All users were able to complete the specified task with guidance

Discussion + Future Work


Successful Features

1. Generalizability
TutorialLens is generalizable to a variety of devices and tasks. As long as the device gives visual feedback to users during interactions, authors can capture these visual updates when creating tutorials with our system.
2. Multi-Modal Guidance
Multi-modal feedback (text, voice and visual instructions) especially helps when the finger movements are too vague for a novice user to identify what action to take with the interface, or when the fingers move too far away from the anchor points, and can't be tracked by the system.
3. Supporting tasks with many pathways

Limitations and Technical Improvements

1. Finger Tracking with QR Code Labels

The current finger tracking method in our prototype requires paper labels to be attached to users' fingers, which is not ideal for the user experience. This issue could be resolved with more mature hand and finger tracking algorithms in AR development platforms.
2. Anchors Not Being Recognized

Another limitation of our system is that it requires the anchors to be relatively close to the points of interaction on every step. For instance, when fingers move too far from the display screen and control panel during demonstration, most of the finger movements won’t be captured by the camera. To make up for this, our prototype provides multi-modal feedback in the access mode - the user can just follow the text and audio instructions when they don’t see AR simulations of finger movements on the screen.
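One way to picture this fallback is the small Python sketch below. It is an illustration under stated assumptions, not the prototype's logic: the function name `select_guidance`, the mode labels, and the `min_tracked_ratio` threshold are hypothetical. Each recorded sample is either a finger position or `None` when the finger label was not visible to the camera.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def select_guidance(
    samples: Sequence[Optional[Point3D]],
    min_tracked_ratio: float = 0.5,  # hypothetical threshold, chosen only for this sketch
) -> List[str]:
    """Pick guidance modes for one step.

    Text and audio instructions are always offered. The AR finger simulation
    is added only when enough of the demonstration was actually tracked.
    """
    modes = ["text", "audio"]
    if not samples:
        return modes
    tracked = sum(1 for sample in samples if sample is not None)
    if tracked / len(samples) >= min_tracked_ratio:
        modes.append("ar_overlay")
    return modes


# Finger stayed near the anchor: every frame was tracked.
well_tracked = [(0.10, 0.10, 0.0), (0.12, 0.10, 0.0)]

# Finger moved out of the camera view for most of the step.
mostly_lost = [None, None, (0.10, 0.10, 0.0)]

modes_when_tracked = select_guidance(well_tracked)
modes_when_lost = select_guidance(mostly_lost)
```

Always including text and audio means a step stays usable even when no finger movement was captured at all.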

Conclusion + Reflection

I had so much fun working on this project and learned so much about user research. I designed and conducted a formative study and a user evaluation, all in a remote setting. Conducting studies remotely was a challenge, but through pilot studies and iterations I was able to make it work! I learned how to do video coding analysis, create a codebook, and drive a study. I conducted interviews, analyzed information through affinity diagrams, and worked with a developer to build our prototype. Working in parallel with a developer helped me understand the limitations we had in development, communicate what the design needed, and find different ways to address those needs. In addition to conducting a formative study, we had users test our prototype to help evaluate our application. It was incredible to be a part of a project from start to finish. I was excited to see the final product after all the research we had done. This project solidified my interest in the UX field and I cannot wait to learn more!
Designed by Dena 2021