SCIENTIFIC MEETINGS ON BIOMEDICAL RESEARCH OF THE OSPEDALE PEDIATRICO BAMBINO GESÙ AND THE UNIVERSITÀ ROMA TRE. Tuesday 28 May 2013. "Oxidative stress and related pathologies". Università Roma Tre, Dipartimento di Scienze, Aula 2 (ground floor), Viale Marconi, 446, Rome. 14:30 – 19:00. NATURAL ANTIOXIDANTS. ARE THEY AN ANSWER TO "OXIDATIVE STRESS" OF MOD
Collaborating on Utterances with a Spoken Dialogue System Using an ISU-based Approach to Incremental Dialogue Management

[...] overlapping turn-taking, a whole range of [...] available. We explore the use of one such [...] immediate feedback, trial intonations and [...] the incremental system was judged as significantly [...]

In human–human dialogue, most utterances have only one speaker.1 However, the shape that an utterance ultimately takes on is often determined not just by the one speaker, but also by her addressees. A speaker intending to refer to something may start with a description, monitor while they go on whether the description appears to be understood sufficiently well, and if not, possibly extend it, rather than finishing the utterance in the form that was initially planned. This monitoring within the utterance is sometimes even made very explicit, as in the following example from (Clark, 1996):

A: Allegra, uh, replied and, uh, . . .

1 Though by far not all; see (Clark, 1996; Purver et al., [...]

In this example, A makes use of what Sacks and Schegloff (1979) called a try marker, a "questioning upward intonational contour, followed by a brief pause". As discussed by Clark (1996), this device is an efficient solution to the problem posed by uncertainty on the side of the speaker whether a reference is going to be understood, as it checks for understanding in situ, and lets the conversation partners collaborate on the utterance that is in progress.

Spoken dialogue systems (SDS) typically cannot achieve the close coupling between production and interpretation that is needed for this to work, as normally the smallest unit on which they operate is the full utterance (or, more precisely, the turn). (For a discussion see e.g. (Skantze and Schlangen, 2009).) We present here an approach to managing dialogue in an incremental SDS that can handle this phenomenon, explaining how it is implemented in a system (Section 4) that works in a micro-domain (which is described in Section 3). As we will discuss in the next section, this goes beyond earlier work on incremental SDS, combining the production of multimodal feedback (as in (Aist et al., 2007)) with fast interaction in a semantically more complex domain (compared to (Skantze and Schlangen, 2009)).

Collaboration on utterances has not often been modelled in SDS, as it presupposes fully incremental processing, which itself is still something of a rarity in such systems. (There is work on collaborative reference (DeVault et al., 2005; Heeman and Hirst, 1995), but that focuses on written input, and on collaboration over several utterances and not within utterances.) There are two systems [...]

The system described in (Aist et al., 2007) is able to produce some of the phenomena that we [...] reference game (as we will see, the domain we have chosen is very similar), where users can refer to objects shown on the screen, and the SDS gives continuous feedback about its understanding by performing on-screen actions. While we do produce similar non-linguistic behaviour in our system, we also go beyond this by producing verbal feedback that responds to the certainty of the speaker (expressed by the use of trial intonation). Unfortunately, very few technical details are given in that paper, so that we cannot compare [...]

Even more closely related is some of our own previous work (Skantze and Schlangen, 2009), where we modeled fast system reactions to delivery of information in installments in a number sequence dictation domain. In a small corpus study, we found a very pronounced use of trial or installment intonations, with the first installments of numbers being bounded by rising intonation, and the final installment of a sequence by falling intonation. We made use of this fact by letting the system distinguish these situations based on prosody, and giving it different reaction possibilities (back-channel feedback vs. explicit confirmation).

The work reported here is a direct scaling up of that work. For number sequences, the notion of utterance is somewhat vague, as there are no syntactic constraints that help demarcate its boundaries. Moreover, there is no semantics (beyond the individual number) that could pose problems – the main problem for the speaker in that domain is ensuring that the signal is correctly identified (as in, the string could be written down), and the trial intonation is meant to provide opportunities for grounding whether that is the fact. Here, we want to go beyond that and look at utterances where it is the intended meaning whose recognition the speaker is unsure about (grounding at level 3 rather than (just) at level 2 in terms of (Clark, 1996)). This difference leads to differences in the follow-up potential: where in the numbers domain, typical repair follow-ups were repetitions, in semantically more complex domains we can expect [...]

To investigate these issues in a controlled setting, we chose a domain that makes complex and possibly underspecified references likely, and that also allows a combination of linguistic and non-linguistic feedback. In this domain, the user's goal is to instruct the system to pick up and manipulate Tetris-like puzzle pieces, which are shown on the screen. We recorded human–human as well as human–(simulated) machine interactions in this domain, and indeed found frequent use of "packaging" of instructions, and immediate feedback, as in (2) (arrow indicating intonation).

We chose these as our target phenomena for the implementation: intra-utterance hesitations, possibly with trial intonation (as in line 2);2 immediate execution of actions (line 4), and their grounding role as display of understanding ("yeah" in line 3). The system controls the mouse cursor, e.g. moving it over pieces once it has a good hypothesis about a reference; other actions are visualised similarly.

2 Although we chose to label this "intra-utterance" here, it doesn't matter much for our approach whether one considers this example to consist of one or several utterances; what matters is that differences in intonation and pragmatic com- [...]

Our system is realised as a collection of incremental processing modules in the InproToolKit (Schlangen et al., 2010), a middle-ware package that implements some of the features of the model of incremental processing of (Schlangen and Skantze, 2009). The modules used in the implementation will be described briefly below.

For speech recognition, we use Sphinx-4 (Walker et al., 2004), with our own extensions for incremental speech recognition (Baumann et al., 2009), and our own domain-specific acoustic model. For the experiments described here, we used a recog- [...]

Another module performs online prosodic analysis, based on pitch change, which is measured in semitones per second over the turn-final word, using a modified YIN (de Cheveigné and Kawahara, 2002). Based on the slope of the f0 curve, we classify pitch as rising or falling.

This information is used by the floor tracking module, which notifies the dialogue manager (DM) about changes in floor status. These status changes are classified by simple rules: silence following rising pitch leads to a timeout signal sent to the DM faster (200ms) than silence after falling pitch (500ms). (Comparable to the rules in [...]

Natural language understanding finally is performed by a unification-based semantic composer, which builds simple semantic representations out of the lexical entries for the recognised words; and a resolver, which matches these representations against knowledge of the objects in the domain.

The DM reacts to input from three sides: semantic material coming from the NLU, floor state signals from the floor tracker, and notifications about execution of actions from the action manager.

The central element of the information state used in the dialogue manager is what we call the iQUD (for incremental Question under Discussion, as it's a variant of the QUD of (Ginzburg, 1996)). Figure 1 gives an example. The iQUD collects all relevant sub-questions into one structure, which also records what the relevant non-linguistic actions are (RNLAs; more on this in a second, but see also (Buß and Schlangen, 2010), where we've sketched this approach before), and what the grounding status is of that sub-question.

Figure 1 (fragment): ;17 execute(A,T) ;18 U) >}

Let's go through example (2). The iQUD in Figure 1 represents the state after the system has asked "what shall I do now?". The system anticipates two alternative replies, a take request, or a delete request; this is what the specification of the slot value in 1 and 10 in the iQUD indicates. Now the user starts to speak and produces what is shown in line 1 in the example. The floor tracker reacts to the rising pitch and to the silence of appropriate length, and notifies the dialogue manager. In the meantime, the DM has received updates from the NLU module, has checked for each update whether it is relevant to a sub-question on the iQUD, and if so, whether it resolves it. In this situation, the material was relevant to both 4 and 13, but did not resolve it. This is a precondition for the continuer-questioning rule, which is triggered by the signal from the floor tracker. The system then back-channels as in the example, indicating acoustic understanding (Clark's level 2), but failure to operate on the understanding (level 3). (As an aside, we found that it is far from trivial to find the right wording for this prompt. We settled on [...]

The user then indeed produces more material, which together with the previously given information resolves the question. This is where the RNLAs come in: when a sub-question is resolved, the DM looks into the field for RNLAs, and if there are any, puts them up for execution to the action manager. In our case, slots 4 and 13 are both applicable, but as they have compatible RNLAs, this does not cause a conflict. When the action has been performed, a new question is accommodated (not shown here), which can be paraphrased as "was the understanding displayed through this action correct?". This is what allows the user reply in line 3 to be integrated, which otherwise would need to be ignored, or even worse, would confuse a dialogue system. A relevant continuation, on the other hand, would also have resolved the question. We consider this modelling of grounding effects of actions an important feature of our approach.

Similar rules handle other floor tracker events; not elaborated here for reasons of space. In our current prototype the rules are hard-coded, but we are preparing a version where rules and information-states can be specified externally and [...]

Evaluating the contribution of one of the many modules in an SDS is notoriously difficult (Walker et al., 1998). To be able to focus on evaluation of the incremental dialogue strategies and avoid interference from ASR problems (and more technical problems; our system is still somewhat fragile), we opted for an overhearer evaluation. (Such a setting was also used for the test of the incremen- [...]

We implemented a non-incremental version of the system that does not give non-linguistic feedback during user utterances and has only one, fixed, timeout of 800ms (comparable to typical settings in commercial dialogue systems). Two of the authors then recorded 30 minutes of interactions with the two versions of the system. We then identified and discarded "outlier" interactions, i.e. those with technical problems, or where recognition problems were so severe that a non-understanding state was entered repeatedly. These criteria were meant to be fair to both versions of the system, and indeed we excluded similar numbers of failed interactions from both versions (around 10 % of interactions in total).

We measured the length of interactions in the two sets, and found that the interactions in the incremental setting were significantly shorter (t-test, p < 0.005). This was to be expected, of course, as the incremental strategies allow faster reactions (execution time can be folded into the user utterance); other outcomes would have been possible, though, if the incremental version had systemati- [...]

We then had 8 subjects (university students, not involved in the research) watch and directly judge (questionnaire, Likert-scale replies to questions about human-likeness, helpfulness, and reactivity) 34 randomly selected interactions from either condition. Human-likeness and reactivity were judged significantly higher for the incremental version (Wilcoxon rank-sum test; p < 0.05 and p < 0.005, respectively), while there was no effect [...]

We described our incremental micro-domain dialogue system, which is capable of reacting to subtle signals from the user about expected feedback, and is able to produce overlapping non-linguistic actions, modelling their effect as displays of understanding. Interactions with the system were judged by overhearers to be more human-like and reactive than with a non-incremental variant. We are currently working on extending and generalising our approach to incremental dialogue management.

Acknowledgments Funded by an ENP grant from DFG.

Gregory Aist, James Allen, Ellen Campana, Carlos Gomez Gallo, Scott Stoness, Mary Swift, and Michael K. Tanenhaus. 2007. Incremental understanding in human-computer dialogue and experimental evidence for advantages over nonincremental methods. In Proceedings of Decalog (Semdial 2007), Trento, Italy.

[...] Performance of Speech Recognition for Incremental [...]

Okko Buß and David Schlangen. 2010. Modelling sub-utterance phenomena in spoken dialogue systems. In Proceedings of Semdial 2010 ("Pozdial"), pages 33–41, Poznan, Poland, June.

Herbert H. Clark. 1996. Using Language. Cambridge [...]

Alain de Cheveigné and Hideki Kawahara. 2002. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, [...]

David DeVault, Natalia Kariaeva, Anubha Kothari, Iris Oved, and Matthew Stone. 2005. An information-state approach to collaborative reference. In Short Papers, ACL 2005, Michigan, USA, June.

[...]tions, facts and dialogue. In Shalom Lappin, editor, The Handbook of Contemporary Semantic Theory.

Peter A. Heeman and Graeme Hirst. 1995. Collaborating on referring expressions. Computational Linguistics, 21(3):351–382.

Massimo Poesio and Hannes Rieser. 2010. Completions, coordination, and alignment in dialogue. Dia- [...]

[...] utterances in dialogue: a corpus study. In Proceedings of the SIGDIAL 2009, pages 262–271, London, UK, September.

Harvey Sacks and Emanuel A. Schegloff. 1979. Two preferences in the organization of reference to persons in conversation and their interaction. In George Psathas, editor, Everyday Language: Studies in Ethnomethodology, pages 15–21. Irvington Publishers, Inc., New York, NY, USA.

David Schlangen and Gabriel Skantze. 2009. A general, abstract model of incremental dialogue processing. In Proceedings of EACL 2009, pages 710– [...]

[...] Buschmeier, Okko Buß, Stefan Kopp, Gabriel Skantze, and Ramin Yaghoubzadeh. 2010. Middleware for incremental processing in conversational agents. In Proceedings of SIGDIAL 2010, Tokyo, [...]

Gabriel Skantze and David Schlangen. 2009. Incremental dialogue processing in a micro-domain. In Proceedings of EACL 2009, pages 745–753, Athens, [...]

Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, and Alicia Abella. 1998. Evaluating spoken dialogue agents with PARADISE: Two case studies. Computer Speech and Language, 12(3).

Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, and [...] source framework for speech recognition. Techni- [...]
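The prosody and floor-tracking rules described in the paper (pitch change measured in semitones per second, classified as rising or falling from the slope of the f0 curve; a timeout after 200 ms of silence following rising pitch and 500 ms following falling pitch) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the least-squares slope fit, the use of the first f0 sample as the semitone reference, and the zero threshold separating rising from falling are all assumptions made for illustration; only the two timeout values come from the text. The f0 values are assumed to be voiced frames from some pitch tracker.

```python
import math
from typing import Sequence

# Timeout values stated in the paper; everything else here is hypothetical.
RISING_TIMEOUT_MS = 200
FALLING_TIMEOUT_MS = 500


def semitone_slope(times_s: Sequence[float], f0_hz: Sequence[float]) -> float:
    """Least-squares slope of an f0 contour, in semitones per second.

    Assumes all f0 values are positive (voiced frames only).
    """
    n = len(times_s)
    if n != len(f0_hz) or n < 2:
        raise ValueError("need at least two (time, f0) samples of equal length")
    ref = f0_hz[0]  # assumption: first sample is the semitone reference
    semis = [12.0 * math.log2(f / ref) for f in f0_hz]
    mean_t = sum(times_s) / n
    mean_s = sum(semis) / n
    var_t = sum((t - mean_t) ** 2 for t in times_s)
    if var_t == 0.0:
        raise ValueError("time stamps must not all be identical")
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_s, semis))
    return cov / var_t


def classify_pitch(slope: float, threshold: float = 0.0) -> str:
    """Label the contour; the zero threshold is an assumed default."""
    return "rising" if slope > threshold else "falling"


def timeout_ms(pitch_class: str) -> int:
    """Silence duration after which the floor tracker would signal the DM."""
    return RISING_TIMEOUT_MS if pitch_class == "rising" else FALLING_TIMEOUT_MS


def floor_timeout_reached(pitch_class: str, silence_ms: float) -> bool:
    """True once the silence is long enough for the given pitch class."""
    return silence_ms >= timeout_ms(pitch_class)


if __name__ == "__main__":
    # Contour rising by one semitone every 100 ms over the turn-final word.
    times = [0.0, 0.1, 0.2]
    f0 = [100.0 * 2 ** (k / 12) for k in range(3)]
    label = classify_pitch(semitone_slope(times, f0))
    print(label, floor_timeout_reached(label, 250))
```

With these rules, 250 ms of silence already triggers the timeout after a rising contour, but not after a falling one, which is what lets a system react quickly to trial intonation.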
Irritable bowel syndrome (IBS) is one of the most common conditions that is encountered in general medical practices. It has the potential for protean manifestations, but generally is characterized by abdominal pain, bloating, and disturbed defecation. Based upon survey data from the general population, the prevalence of symptoms that are suggestive of IBS is between 14% and 24% in women and from 5%