Couger wins third place in Facebook Research's VR/AR Eye Tracking Accuracy Competition

Couger Inc. is pleased to announce that an AI model developed by Devanathan Sabarinathan and Dr. Priya Kansal won third place in the OpenEDS Challenge hosted by Facebook, and that their paper on the model was accepted for publication at ICCV, one of the world's top computer vision conferences. The competition measures the accuracy of AI models that track a person's gaze and eye movements, a technology expected to improve the performance of VR/AR devices such as smart glasses.


The spread of AR/VR has increased the demand for eye tracking, which follows the wearer's gaze (where they are looking) and eye movements while they wear smart glasses. Hardware in smartphones and other devices has evolved to the point where demanding workloads such as gaming and video playback are taken for granted, yet CPU performance is still limited. VR/AR hardware faces the same limits and therefore relies on distributed processing across the cloud and the edge to work for any user in any environment.

Deep learning has already produced success stories in the area of eye tracking. However, due to hardware resource limitations, machine learning solutions face challenges in terms of real-time performance.

Furthermore, creating a stable and efficient machine learning solution requires the acquisition of large amounts of accurate training data from thousands of users in different environments. However, collecting such data is impractical and expensive.

Against this backdrop, Facebook sponsored a competition on the accuracy of AI models. Facebook operates the Oculus Store, a core VR business whose sales are already estimated to exceed 10 billion yen.

Competition Overview

The OpenEDS Challenge, sponsored by Facebook, consists of two challenges, one for each of the issues described above.

  1. Semantic Segmentation Challenge: Eye position estimation in 2D images
  2. Synthetic Eye Generation Challenge: Efficient data generation

The Couger team entered the first of these, the Semantic Segmentation Challenge, and placed third.
Eye tracking requires accurate recognition of 2D images. This means that the important eye regions (sclera, iris, and pupil) must be separated pixel by pixel from the rest of the image.
The ideal solution is accurate, stable, and resource efficient. The challenge was therefore judged on both model accuracy and model size.
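
Pixel-by-pixel accuracy of this kind is commonly measured with mean intersection-over-union (mIoU), the accuracy figure reported later in this article. The following is a minimal sketch of that metric in plain Python, not the organizers' evaluation code. The class labels (0 = background, 1 = sclera, 2 = iris, 3 = pupil) and the tiny 4x4 label maps are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes=4):
    """Mean intersection-over-union across classes for two label maps.

    pred, target: 2D lists of integer class labels with the same shape.
    Assumed labels: 0 = background, 1 = sclera, 2 = iris, 3 = pupil.
    """
    ious = []
    for c in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == c), (t == c)
                intersection += in_pred and in_target
                union += in_pred or in_target
        if union:  # ignore classes that appear in neither map
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Hypothetical ground truth and a prediction with one mislabeled pixel.
target = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 3, 1],
    [0, 1, 1, 0],
]
pred = [
    [0, 1, 1, 0],
    [1, 2, 3, 1],  # one iris pixel predicted as pupil
    [1, 2, 3, 1],
    [0, 1, 1, 0],
]

print(f"mIoU = {mean_iou(pred, target):.4f}")
```

A single wrong pixel lowers the score for both the iris and the pupil class, which is why small regions such as the pupil weigh heavily on this metric.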

The following were recommended for this challenge.

  1. Semantic segmentation that is accurate and generalizable
  2. Training focused on natural recognition of the human eye region using the OpenEDS dataset*1
  3. Balance between accuracy and model complexity
  4. Use of data synthesis techniques such as UnityEyes and NVGaze

*1 OpenEDS dataset: A dataset of eye images provided by Facebook, captured by a VR device fitted with two eye-facing cameras.

About "EyeNet," a proprietary model developed by Couger

EyeNet is based on SkeletonNet, a model for skeletal recognition developed by Couger and presented at another top conference, CVPR, held in June 2019 in the United States. The difficulty was keeping the model lightweight while maintaining the high accuracy required by the OpenEDS Challenge (the competition required a model size under 2MB and fewer than 400,000 parameters).
While the top-ranked teams in this competition focused mainly on data pre-processing to improve recognition accuracy, the Couger team improved accuracy through the design of the model itself. The model uses multiple attention mechanisms*2, which determine which parts of the input data to focus on, and combines them with techniques from the "Residual Network"*3, a neural network model with world-class accuracy in image recognition. Another key feature is that the model is ultra-lightweight despite its high accuracy.

Photo: Actual "EyeNet" identification

Figures for "EyeNet," the model developed by Couger (changes are relative to the baseline model):
mIoU: 0.95112 (6.3% improvement)
Model Complexity: 258,021 (38% reduction)
Total score: 0.97556 (28% improvement)

Baseline model values:
mIoU: 0.89478
Model Complexity: 416,088
Total score: 0.76240
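
The article does not state how the total score is derived from mIoU and model complexity. The sketch below uses an assumed formula, the average of mIoU and a capped inverse of the model size in MB, with model complexity read as a parameter count stored as 4-byte floats. These are assumptions rather than figures from the article, chosen because they reproduce both reported totals.

```python
BYTES_PER_PARAM = 4  # assumption: each parameter is a 32-bit float


def model_size_mb(num_params):
    """Approximate model size in MB from the parameter count."""
    return num_params * BYTES_PER_PARAM / (1024 * 1024)


def total_score(miou, num_params):
    """Assumed scoring rule: mean of mIoU and min(1 / size_in_MB, 1)."""
    size_term = min(1.0 / model_size_mb(num_params), 1.0)
    return (miou + size_term) / 2


eyenet = total_score(0.95112, 258_021)
baseline = total_score(0.89478, 416_088)

print(f"EyeNet:   {eyenet:.5f}")    # 0.97556
print(f"Baseline: {baseline:.5f}")  # 0.76240
print(f"Relative gain: {(eyenet - baseline) / baseline:.1%}")  # 28.0%
```

Under this assumed rule the size term is capped at 1, so any model below about 1 MB receives full marks for size. EyeNet's 258,021 parameters come to roughly 0.98 MB, which would explain why its total is simply the midpoint between its mIoU and 1.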

*2 Attention mechanism: A method for determining which parts of input data to focus on in natural language processing and image processing.

*3 Residual Network: A neural network architecture devised by Microsoft Research in 2015.
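
The two ideas in notes *2 and *3 can be illustrated with a toy example. This is a minimal sketch in plain Python and is not the EyeNet architecture: the channel-attention rule (a sigmoid of each channel's mean, with no learned weights) and the function names are hypothetical simplifications chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features):
    """Return one weight in (0, 1) per channel.

    Toy rule (an assumption): sigmoid of the channel mean, so channels
    with stronger average activation receive more focus.
    """
    return [sigmoid(sum(ch) / len(ch)) for ch in features]


def residual_attention_block(features, transform):
    """Compute output = input + attention_weight * transform(input).

    The "input +" term is the residual (skip) connection; the weight
    decides how much of each transformed channel is let through.
    """
    transformed = [transform(ch) for ch in features]
    weights = channel_attention(transformed)
    return [
        [x + w * t for x, t in zip(ch, t_ch)]
        for ch, t_ch, w in zip(features, transformed, weights)
    ]


# Two hypothetical feature channels and a stand-in for a learned layer.
features = [[1.0, 2.0], [-3.0, -1.0]]
output = residual_attention_block(features, lambda ch: [2.0 * v for v in ch])
print(output)
```

Because of the skip connection, a block whose transform outputs zeros passes its input through unchanged, which is one reason residual designs are comparatively easy to train.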

Reference Information
OpenEDS Challenge Official Site


