Behind the scenes: programming the world’s most advanced social robot
What’s it like building a brand new program on the world’s most advanced social robot? With no how-to, no experience, and high ambitions?
As part of our focus on #FurhatCreators, Yoav shares his experience creating on the Furhat platform.
Tell us about yourself. Who are you and what do you work with at Prototyp?
My name is Yoav, and I’m a software engineer working at Prototyp. Before moving to Stockholm and joining Prototyp, I worked with many different programming environments: everything from embedded systems, through web-based systems, to games and visualization. As a code lab helping companies break new ground, we work on a wide range of projects, and each developer usually has several running in parallel, so technologies, industries, and approaches vary accordingly.
Since joining Prototyp in the fall of 2018, I’ve worked on several projects, including ones involving Furhat.
Did you have any experience with social robots at that point?
I had worked a little with robotics during my high school days, but those were remote-controlled robots, not AI-powered and certainly not social. I first heard of Furhat when I came to Stockholm in the spring of 2018 to interview for different positions. So Furhat ended up being my first experience working with social robots.
So, how did you go about developing your first skill for Furhat? Where did you start?
If I remember correctly, we started from the example skills. They covered pretty much all of the basics and were mostly easy to read and understand, especially when combined with the documentation.
Most of the work was done on a tight schedule. I do remember that most of the effort went into polishing the experience rather than getting the basic interaction to work.
Ok, we want to hear more about that. But first, what was the learning curve like for the Furhat Robot Development Kit (RDK)? Was it easy to use, difficult? In what way?
With my background in games and embedded systems, I was already familiar with state machines as a way to model programs, which is the main technique used in the Furhat RDK, so it was pretty easy for me to jump in and get started.
And although it’s not always intuitive in the beginning, from what I’ve observed of other developers on my team, the approach also seems relatively easy to grasp for developers who are new to state machines.
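For readers new to the idea, here is a minimal sketch of a dialog modeled as a state machine. It is written in Python for brevity and is not the Furhat RDK API (which is Kotlin-based); the state names, prompts, and the `run_dialog` helper are all invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Each state pairs a prompt with a transition function that maps the user's
# reply to the name of the next state. All names here are hypothetical.
State = Tuple[str, Callable[[str], str]]

STATES: Dict[str, State] = {
    "greet": (
        "Hello! Shall we start?",
        lambda reply: "ask_age" if reply.strip().lower() == "yes" else "goodbye",
    ),
    "ask_age": (
        "How old are you?",
        # Stay in the same state until the reply looks like a number.
        lambda reply: "goodbye" if reply.strip().isdigit() else "ask_age",
    ),
    "goodbye": ("Thanks, bye!", lambda reply: "goodbye"),
}


def run_dialog(replies: List[str], start: str = "greet") -> List[str]:
    """Feed scripted replies through the machine and return the states visited."""
    state = start
    visited = [state]
    for reply in replies:
        _prompt, transition = STATES[state]
        state = transition(reply)
        visited.append(state)
        if state == "goodbye":
            break
    return visited
```

Calling `run_dialog(["yes", "forty", "40"])` walks from `greet` to `ask_age`, stays there for the unusable reply, and then moves on to `goodbye`. Keeping each state's reactions in one place is what makes this style approachable even for developers who have not used state machines before.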
I was also happy with the choice of Kotlin as the programming language behind the Furhat RDK. As a functional programming advocate, I find Kotlin to be a delightful programming language that encourages a clean programming style on the one hand, while still being very approachable to new developers on the other. I feel like the RDK is making good use of Kotlin’s features, and most of the code I’ve seen in example skills was self-explanatory.
What are the biggest opportunities and challenges when developing for social robots in general?
Surprisingly, the greatest challenge has little to do with the robot itself, and more to do with understanding how we humans behave in the situation in which the robot participates.
We are so geared for social interaction that most things happen automatically, which sometimes makes it difficult to grasp why a certain interaction with the robot feels awkward.
There were some technical issues, and some of the bugs we came across turned out to be difficult to reproduce or even understand because of how complex the interaction is. We started to develop our own tools and procedures for testing and validating skills and parts of skills, but there is still plenty of work to be done.
Tell us more. What was it that needed polishing with PETRA? What were the greatest challenges you faced there?
The polishing we had to do with PETRA was mostly about handling the great variety of answers potential users can give. In a regular form-based interview, you usually have to select an option from a limited list, for example choosing “yes” or “no”, or picking your age from a list of valid options.
But in spoken interaction, people can answer all kinds of unexpected, but relevant, things.
For example, the robot asks you “Have you been suffering from constipation?” and you answer, “a very long time ago,” or, “I’m not sure what that means”.
In a good interaction, the robot should respond in a convincing manner. In the first case, it should probably treat the answer as a “no,” because you implied this isn’t relevant anymore. For the second answer, it should provide you with a short explanation.
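To make the idea concrete, here is a rough Python sketch of mapping free-form spoken answers onto a small set of intents. It uses plain keyword matching, which is an assumption made for illustration and not how PETRA or the Furhat RDK actually interpret speech; the `Intent` names and phrase lists are hypothetical.

```python
import re
from enum import Enum


class Intent(Enum):
    YES = "yes"
    NO = "no"
    NEEDS_EXPLANATION = "needs_explanation"
    UNKNOWN = "unknown"


# Hypothetical phrase lists; a real skill would rely on a proper NLU component.
# Order matters: earlier entries win when several patterns match.
PATTERNS = [
    (Intent.NEEDS_EXPLANATION,
     [r"\bnot sure what\b", r"\bwhat does that mean\b", r"\bdon'?t understand\b"]),
    (Intent.NO,
     [r"\bno\b", r"\bnever\b", r"\blong time ago\b", r"\bnot anymore\b"]),
    (Intent.YES,
     [r"\byes\b", r"\byeah\b", r"\bsometimes\b"]),
]


def interpret(answer: str) -> Intent:
    """Return the first intent whose patterns match the answer."""
    # Normalize case and typographic apostrophes before matching.
    text = answer.lower().replace("’", "'")
    for intent, patterns in PATTERNS:
        if any(re.search(p, text) for p in patterns):
            return intent
    return Intent.UNKNOWN
```

With these lists, “a very long time ago” is read as `Intent.NO` and “I’m not sure what that means” as `Intent.NEEDS_EXPLANATION`, while anything unmatched falls through to `Intent.UNKNOWN` so the robot can ask again instead of guessing.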
In addition to this inherent complexity, some rough edges emerged from the limitations of the speech recognizer, which doesn’t work well with short answers like “male” or “female”. These were often recognized as “mail” or “Gmail”, respectively.
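One common workaround for this kind of problem, sketched below in Python, is to map known mishearings back to the answer the question expects. The alias table uses only the two examples mentioned above; the table and function names are hypothetical, and this is not necessarily how the PETRA skill handled it.

```python
from typing import Optional

# Known mishearings for the gender question, taken from the examples above.
# The table and the function name are hypothetical, not a Furhat API.
GENDER_ALIASES = {
    "male": "male",
    "mail": "male",
    "female": "female",
    "gmail": "female",
}


def normalize_gender(transcript: str) -> Optional[str]:
    """Map a raw transcript to "male" or "female", or None if unrecognized."""
    # Ignore surrounding whitespace, case, and trailing punctuation.
    key = transcript.strip().lower().rstrip(".!?")
    return GENDER_ALIASES.get(key)
```

A table like this should only be applied while the matching question is active, since “mail” is a perfectly valid word in other parts of a conversation.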
We had some frustrating days where, because of the complexity of the interaction and the different responses, a fix for one thing caused some other, seemingly unrelated part of the interaction to stop working correctly.
As one of the first developers to build applications for robots, where do you think the industry is heading?
I guess that, like all new technologies, it will first be adopted by the adventurous few, then it will be opposed by conservatives who fail to see the benefits or are afraid of imagined or exaggerated consequences. And then, finally, it will become part of everyday life.
There are just two questions which are difficult to answer: how long this process will take, and whether mediocre, cheap-to-produce solutions will win over the better-designed but more expensive ones.
I believe that it is inevitable, and probably for the best, that robots will replace people in the more boring and routine parts of the service industry, in the same way that machines replaced humans in dangerous and dull factory jobs a hundred years ago.
Would you like to try creating something on Furhat on your own, from scratch?
I have thought about several uses for the robot that I would pursue if I had the time. One is a roleplaying or storytelling skill in which the robot acts as the game master for a traditional roleplaying game and uses Furhat’s ability to change skins and voices to enhance the interaction, while still retaining the intimate feeling of the original experience. A friend of mine who works with image processing suggested that it might be possible to use live actors to control the facial expressions of the robot, to create an even more sophisticated experience.
Another idea that came up at one of our Prototyping Days, a weeklong internal bootcamp that we hold twice a year, is to use Furhat as a tutor for children with developmental problems, helping them with the recognition of facial expressions or other face-to-face interactions. While a human therapist might slip, get bored, or be inconsistent, a robot can repeat the exact same expression, or conduct the exact same dialog, again and again until the child learns to recognize it correctly.