Robots need to see, hear, speak, and express themselves as naturally as humans can. Furhat is designed for human social interaction using a multimodal system with discrete modular subsystems that handle functions such as facial animation, neck motion, visual perception, audio processing, cloud service integration, and other operations that allow it to interact with humans just as we interact with each other.
One of Furhat’s unique features is its simple and powerful back-projection technology, with real-time, high-fidelity rendering of dynamic facial expressions and lip-syncing through a smart 3D graphics engine. The projection is cast onto replaceable polymer masks that give Furhat its human-like appearance.
Furhat uses computer vision to track users in real time, analyze facial expressions, and estimate head pose and user distance. This includes a highly accurate face detection system based on a state-of-the-art deep learning (single-shot detector) model, depth estimation, and spatial modelling.
The combination of visual and audio input enables multi-user tracking and interactions with up to 10 people simultaneously.
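One way to picture this audio-visual fusion is attributing speech to whichever tracked user sits closest to the sound's direction of arrival. The sketch below is purely illustrative (it is not Furhat's actual sensor-fusion code, and it ignores angle wraparound); the user list and bearings are hypothetical output from a vision tracker.

```python
def attribute_speech(doa_deg, users):
    """Pick the tracked user whose visual bearing best matches the
    audio direction of arrival (DOA), in degrees.

    users: list of (name, bearing_deg) pairs from a vision tracker.
    Illustrative only -- not Furhat's actual implementation.
    """
    return min(users, key=lambda u: abs(u[1] - doa_deg))[0]

users = [("alice", -30.0), ("bob", 5.0), ("carol", 40.0)]
print(attribute_speech(8.0, users))  # bob: closest bearing to the 8-degree DOA
```

A real system would also weigh lip movement, user distance, and temporal smoothing before committing to a speaker.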
The combination of facial animation and replaceable masks makes it easy to create expressive robot characters of any ethnicity, age, or gender. Furhat today comes with a generic adult mask and over 20 animated faces that can be further customized and edited directly on the robot. Child and "Anime" masks are available separately, and additional masks can be created on demand.
In addition to facial animations, Furhat has built-in gestures such as blinking, smiling, nodding, and other micro eye, mouth, and head gestures that make the robot feel alive. Furhat's gestures can be fully automated or controlled individually, either through a Wizard-of-Oz style interface or through a skill framework built in your preferred programming language.
Furhat is integrated with leading speech recognition models from Microsoft Azure and Google Cloud that accurately capture human speech. These models offer fast, predictive response times and currently support more than 40 languages and dialects.
Furhat is designed for natural conversations with rapid turn-taking. You have detailed controls for managing initiative, turn-taking, interruptions, error handling, and priming the speech recognizer for expected utterances. The integrated NLU engine lets you easily define intents and entities to extract meaning from user utterances in multiple languages.
Furhat comes with more than 200 voices with varying regional dialects, and is integrated with Text-to-Speech (TTS) models from Microsoft Azure, Amazon Polly, Acapela, and ElevenLabs. You can choose the voice that best suits your Furhat character, whether it’s young or old, male or female, or fictional. With ElevenLabs voice cloning you can also easily clone the voices of real people and run them on the robot.
The robot comes with a microphone array with noise cancellation and direction-of-arrival estimation. You can also plug in any external microphone to optimize the robot for a specific environment. Powerful, high-end stereo speakers allow the robot to be heard in noisy environments.
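For intuition on direction-of-arrival estimation, the textbook two-microphone case derives the angle from the time difference of arrival (TDOA) between the mics: theta = arcsin(c·tau / d). This is a generic sketch, not the robot's actual array algorithm, and the 10 cm spacing is an assumed example value.

```python
import math

def doa_from_tdoa(tdoa_s, mic_spacing_m=0.1, c=343.0):
    """Estimate direction of arrival (degrees off-axis) from the time
    difference of arrival between two microphones: theta = arcsin(c*tau/d).
    Textbook two-mic sketch; real arrays use more mics and beamforming."""
    s = max(-1.0, min(1.0, c * tdoa_s / mic_spacing_m))  # clamp for noise
    return math.degrees(math.asin(s))

print(round(doa_from_tdoa(0.0)))         # straight ahead → 0 degrees
print(round(doa_from_tdoa(0.05 / 343)))  # 5 cm path difference → 30 degrees
print(round(doa_from_tdoa(2.9e-4)))      # near the array axis → 84 degrees
```

Multi-microphone arrays like Furhat's combine several such pairwise delays (plus beamforming) to get a robust bearing even with reverberation and background noise.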
The Furhat SDK comes with a powerful set of programming tools and APIs for researchers, educators, developers, and students.
Explore SDK
A zero-code LLM-driven conversation designer to rapidly ideate, create, and test interactions through prompting.
Explore Creator