Golem-II+ - Waiter

The following describes how we integrated the functionality known as Multiple Direction of Arrival Estimation into a Human-Robot Interaction scheme. We followed three strategies: Contextualized Long-Distance Calls, Facing the User, and Detecting Multiple Users. To better describe these strategies, we made Golem, our service robot, act as a waiter in a restaurant with two tables.

[Golem: Hello, my name is Golem, I am your host.] [Golem: Let me know if you need something.] [Customers: Waiter! Waiter!]

We have three very rude customers who each want to place their order first. Golem is able to estimate the direction from which a customer is calling and, using knowledge of its surroundings, can decide which table to navigate to and attend.

[Customers: Waiter! Waiter!] [Golem: Going to T2.] [Golem: T2] [Golem: I will go to table T2.]

Golem has decided to go to table 2. However, the customers at table 1 still want to be attended. For this reason, Golem's navigation can be interrupted by a calling customer.

[Golem: T1]

When this happens, Golem stops, acknowledges the customers, and tells them to wait while it attends table 2.

[Golem: I will be there in a second.]

Now at table 2, Golem takes the order from the customer.

[Golem: Hello there.] [Golem: What would you like to drink?] [Customer1: I want a Coke.] [Golem: Coke for you.] [Golem: Do you want something else?] [Customer1: No.] [Golem: I will bring your order soon.] [Golem: I will go to table T1.]

Now that Golem has finished taking the order, it navigates to table 1, since the customers there interrupted its navigation earlier and surely want to place an order as well. At this table, however, we have two customers who like to talk over each other while the order is being taken.

[Customer2: I want a Coke.] [Customer3: I want a water.]

Golem is able to estimate the directions of several sound sources at the same time. The following is an example of what Golem hears when two people are talking over each other.
Each blue circle indicates an estimate of the direction of a sound source. Because typical automatic speech recognizers don't perform well when more than one person is talking at a time, this functionality can be useful here.

[Customer2: I want a Coke.] [Customer3: I want a water.]

If Golem detects more than one person talking, it discards the order and tells the customers not to speak over each other.

[Golem: Can you speak one at a time?] [Golem: What would you like to drink?] [Customer2: I want a Coke.]

Golem uses the estimated direction to turn its head toward the user, letting her know in a subtle but effective way that it is her turn.

[Golem: What would you like to drink?] [Customer3: I want water.] [Golem: Water for you.] [Golem: Do you want something else?] [Customer3: No.] [Golem: I will bring your order soon.]

Since all tables have now been attended, it is time to pick up the orders at the bar.

[Golem: Give me a Coke.]

The bartender is asked to provide the drinks in the order in which they were requested. In this case, the order from table 2 is fulfilled first.

[Golem: Here you have Coke.] [Golem: Enjoy your drinks.] [Golem: Give me a Coke and water.]

The order from table 2 has been fulfilled; it is now time to fulfill the order from table 1.

[Golem: Here you have Coke.] [Golem: Here you have water.]

We hope we have demonstrated how the functionality known as Multiple Direction of Arrival Estimation can be integrated into a Human-Robot Interaction scheme. We believe that it can provide a good complement to how robots interact with humans in the future.

[Golem: Enjoy your drinks.]
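
As a rough illustration of two of the decisions narrated above (choosing a table from an estimated direction, and rejecting an order when more than one talker is detected), the following is a minimal Python sketch. It is not Golem's actual implementation: the function names, the table coordinates, and the angular thresholds are all assumptions made for the example, and the direction estimates themselves are taken as given.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def bearing_deg(robot_xy, robot_heading_deg, target_xy):
    """Direction of a known location relative to the robot's heading."""
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    return wrap_deg(math.degrees(math.atan2(dy, dx)) - robot_heading_deg)

def table_for_call(doa_deg, robot_xy, robot_heading_deg, tables,
                   tolerance_deg=30.0):
    """Return the table whose bearing is closest to the call direction.

    Returns None if no table lies within tolerance_deg (assumed value).
    """
    best, best_err = None, tolerance_deg
    for name, xy in tables.items():
        expected = bearing_deg(robot_xy, robot_heading_deg, xy)
        err = abs(wrap_deg(doa_deg - expected))
        if err <= best_err:
            best, best_err = name, err
    return best

def count_speakers(doas_deg, min_separation_deg=20.0):
    """Count distinct directions, merging estimates that are close together."""
    clusters = []
    for d in doas_deg:
        if not any(abs(wrap_deg(d - c)) < min_separation_deg for c in clusters):
            clusters.append(d)
    return len(clusters)

def should_discard_order(doas_deg):
    """Discard the order when more than one simultaneous talker is detected."""
    return count_speakers(doas_deg) > 1

# Hypothetical layout: robot at the origin facing along +x, two tables.
TABLES = {"T1": (2.0, 2.0), "T2": (2.0, -2.0)}
```

With this layout, a call estimated at about -50 degrees would be attributed to T2, and two estimates roughly 90 degrees apart would cause the order to be discarded so the robot can ask the customers to speak one at a time.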