Our engine uses basic, affordable hardware, namely 2D stereoscopic cameras and an IMU (inertial measurement unit), to provide:
- An accurate digital 3D representation of the user’s current physical environment, built from a depth map and a 3D reconstruction, which enables intelligent understanding of the mapped scene
- Information about a broad range of factors that influence the environment and are crucial for building a high-quality, true-to-life AR experience, such as:
  - Light sources, reflections, transparency, shadows, etc.
  - Recognition of real-world objects, their physical characteristics and how they affect the scene
- Ongoing analysis of the user’s position and orientation in the environment, so that the environment is sensed and presented from the user’s point of view even as that viewpoint keeps moving
The technology behind our engine
The engine meets the most basic need of all other components: the ability to accurately and robustly identify the same points in different image frames (taken from different locations and/or at different times). This involves diverse feature extraction (with sub-pixel refinement), a descriptor for each of those features, mechanisms for matching descriptors between frames, and robust techniques such as RANSAC for removing outlier matches.
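The outlier-removal step can be illustrated with a minimal RANSAC sketch. It assumes the matches are already available as pairs of 2D points and that the motion between the two frames is a pure 2D translation; a real engine would fit a richer model (a homography or an essential matrix, for example). The function name, the threshold and the sample data are illustrative assumptions, not part of our engine.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D shift from ((x1, y1), (x2, y2)) matches, ignoring outliers.

    Returns ((dx, dy), inlier_indices). Hypothetical helper for illustration.
    """
    if not matches:
        raise ValueError("at least one match is required")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by one match, so sample one.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with this candidate shift.
        inliers = [
            i
            for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if ((bx - ax - dx) ** 2 + (by - ay - dy) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set by averaging the inlier shifts.
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers


# Made-up data: six consistent matches (shift of about (5, -3)) and two bad ones.
demo_matches = [
    ((10.0, 10.0), (15.1, 7.0)),
    ((20.0, 15.0), (25.0, 11.9)),
    ((30.0, 40.0), (34.9, 37.1)),
    ((50.0, 5.0), (55.0, 2.0)),
    ((60.0, 60.0), (65.1, 57.0)),
    ((5.0, 80.0), (9.9, 77.0)),
    ((12.0, 12.0), (40.0, 70.0)),  # outlier
    ((70.0, 20.0), (10.0, 25.0)),  # outlier
]
shift, inlier_ids = ransac_translation(demo_matches)
print(shift, inlier_ids)
```

The two inconsistent matches never gather more than one vote each, so they are excluded from the final fit.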
Position and orientation
This module determines the exact viewpoint of the device, which is needed for correct rendering of the scene as well as for interpreting hand movements (see below) in accurate 3D space. The process fuses data sources (the IMU and the cameras) with techniques such as the Kalman filter. It also obtains high-accuracy position and orientation from the visual information alone, using highly efficient variations of optimization techniques such as single-photo resection.
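As a rough illustration of IMU and camera fusion, the sketch below runs a scalar Kalman filter on a single rotation angle. The gyroscope rate drives the prediction step and an absolute angle derived from the camera drives the correction step. All class names, noise values and the simulated signals are assumptions chosen for the example; the actual engine tracks a full 3D pose.

```python
class AngleKalmanFilter:
    """Scalar Kalman filter: gyro rate predicts, camera angle corrects."""

    def __init__(self, angle=0.0, variance=1.0, process_var=0.001, camera_var=0.25):
        self.angle = angle              # current estimate, in degrees
        self.variance = variance        # uncertainty of the estimate
        self.process_var = process_var  # uncertainty added per prediction step
        self.camera_var = camera_var    # variance of the camera measurement

    def predict(self, gyro_rate, dt):
        # Integrate the gyro reading; uncertainty grows with each step.
        self.angle += gyro_rate * dt
        self.variance += self.process_var

    def update(self, camera_angle):
        # Blend in the camera measurement according to the Kalman gain.
        gain = self.variance / (self.variance + self.camera_var)
        self.angle += gain * (camera_angle - self.angle)
        self.variance *= 1.0 - gain


def simulate(steps=300, dt=0.01, true_rate=10.0, gyro_bias=1.0):
    """Rotate at a constant rate; the gyro is biased, the camera is noisy."""
    kf = AngleKalmanFilter()
    true_angle = 0.0
    gyro_only = 0.0
    for step in range(1, steps + 1):
        true_angle += true_rate * dt
        gyro_rate = true_rate + gyro_bias      # biased IMU reading
        gyro_only += gyro_rate * dt            # naive integration drifts
        kf.predict(gyro_rate, dt)
        if step % 10 == 0:                     # camera is slower than the IMU
            noise = 0.5 if (step // 10) % 2 == 0 else -0.5
            kf.update(true_angle + noise)
    return true_angle, gyro_only, kf.angle


true_angle, gyro_only, fused = simulate()
print("gyro-only error:", gyro_only - true_angle)
print("fused error:", fused - true_angle)
```

In this simulation the gyro-only estimate drifts steadily because of its bias, while the fused estimate stays bounded because each camera measurement pulls it back toward the true angle.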
Physical world digitization
This module creates a digital replica of the physical world: where things are, what they look like, whether they are reflective or transparent, and what light sources exist in the scene. Data is initially created by dense stereo matching, followed by techniques such as Structure-from-Motion, bundle adjustment, SLAM and PTAM, as well as shape analysis and ray tracing.
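The first step, dense stereo matching, can be sketched on a single rectified scanline. The example below uses plain sum-of-absolute-differences (SAD) block matching and then converts disparity to depth with the standard relation Z = f * B / d. The function names, window size, focal length and baseline are illustrative assumptions; a production matcher works on full images and adds smoothness constraints and sub-pixel refinement.

```python
def disparity_row(left, right, window=1, max_disp=4):
    """SAD block matching along one rectified scanline.

    A point at column x in the left image is searched at column x - d
    in the right image, for d in 0..max_disp.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for o in range(-window, window + 1):
                xl = min(max(x + o, 0), n - 1)      # clamp at the borders
                xr = min(max(x + o - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity means the point is at infinity
    return focal_px * baseline_m / disparity_px


# Made-up intensities: the left row is the right row shifted by 2 pixels.
right_row = [10, 50, 90, 30, 70, 20, 60, 40, 80, 15]
left_row = [10, 10] + right_row[:-2]
row_disparities = disparity_row(left_row, right_row)
print(row_disparities)
print(depth_from_disparity(row_disparities[5], focal_px=700.0, baseline_m=0.1))
```

Pixels near the row borders are unreliable because the matching window is clamped there, which is one reason real pipelines mask or post-filter the disparity map before reconstruction.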
Control and gesture NUI
This module enables control through direct “contact” with virtual objects as well as through gestures. It requires learning typical hand poses and gestures, with tools such as neural nets, both for locating the hands and for interpreting gestures. Tracking from a moving camera (via background subtraction) is used to provide low latency. Lastly, predictive analysis, based on both machine learning of user behavior and the current scene structure, is also used to reduce latency.
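To give a feel for how prediction hides latency, the sketch below extrapolates a tracked hand position forward by the expected system delay. It uses a simple constant-velocity model as a stand-in for the learned predictor described above, so it is not the method the engine uses; the function name, units and sample values are assumptions for illustration.

```python
def predict_position(samples, latency_s):
    """Extrapolate the latest 3D position forward by latency_s seconds.

    samples: list of (timestamp_seconds, (x, y, z)) in time order.
    Uses the velocity between the last two samples (constant-velocity model).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate velocity")
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    velocity = tuple((b - a) / dt for a, b in zip(p0, p1))
    return tuple(p + v * latency_s for p, v in zip(p1, velocity))


# Made-up track: the hand moves 1 cm along x in 20 ms (0.5 m/s).
hand_track = [
    (0.00, (0.00, 0.00, 0.50)),
    (0.02, (0.01, 0.00, 0.50)),
]
predicted = predict_position(hand_track, latency_s=0.04)
print(predicted)
```

Rendering the virtual object against the predicted position rather than the last measured one compensates for the delay between capture and display, at the cost of some overshoot when the hand changes direction abruptly.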