# 2 Background

# Contents
* 2.1 Human-Computer Interaction
* 2.2 Dialogue Strategy Development
 - 2.2.1 Conventional Development Life cycle
 - 2.2.2 Evaluation and Strategy Quality Control
 - 2.2.3 Strategy Implementation 
 - 2.2.4 Challenges for Strategy Development
* 2.3 Literature Review: Learning Dialogue Strategies 
 - 2.3.1 Machine Learning Paradigms 
 - 2.3.2 Supervised Learning for Dialogue Strategies 
 - 2.3.3 Dialogue as Decision Making under Uncertainty
 - 2.3.4 Reinforcement Learning for Dialogue Strategies 
* 2.4 Summary

# 2.1 Human-Computer Interaction

#### dialogue strategy & dialogue designer

* For computers, holding a conversation is difficult. Engaging in a conversation requires more than just technical language proficiency.
* Humans acquire these communicative skills over time, but for a dialogue system, they need to be developed by a dialogue designer.
* This is usually an expert who defines a dialogue strategy, which “tells” the system what to do in specific situations.

#### HCI

* Human-Computer Interaction (HCI) is the study of interaction between people (users) and computers (such as dialogue systems). 
* Human-machine dialogue differs from human-human dialogue in various ways. 
* The most prominent features are the lack of deep language understanding and the lack of pragmatic competence (communicative skills) of the system.

#### error handling, uncertainty handling

* A substantial amount of recent work targets the problem of limited language understanding capabilities with so-called 
 - “error handling”, e.g. (Bohus, 2007; Frampton, 2008; Skantze, 2007a), or 
 - “uncertainty handling” mechanisms, e.g. (Thomson and Young, 2010; Williams, 2006; Williams and Young, 2007a).

This book addresses the problem of pragmatic competence: how to improve the communicative skills of a system by providing effective mechanisms to develop better dialogue strategies.

# 2.2 Dialogue Strategy Development
* 2.2.1 Conventional Development Life cycle
* 2.2.2 Evaluation and Strategy Quality Control
* 2.2.3 Strategy Implementation 
* 2.2.4 Challenges for Strategy Development

Academic systems often aim to emulate human behaviour in order to generate ‘natural’ behaviour, whereas commercial systems are required to be robust interfaces in order to solve a specific task.

* In the following we first describe the general development cycle for dialogue strategies (which is commonly used in industry as well as in research). 
* We then focus on two central aspects of this cycle, where techniques in research and industry differ widely: 
 - strategy evaluation/quality control and 
 - strategy implementation/formalisation. 
* We later argue for a computational learning-based approach, where the standard development cycle is replaced by data-driven techniques.

## 2.2.1 Conventional Development Life cycle

## 2.2.2 Evaluation and Strategy Quality Control
* 2.2.2.1 Quality Control in Industry
* 2.2.2.2 Evaluation Practises in Academia
* 2.2.2.3 The PARADISE Evaluation Framework
* 2.2.2.4 Strategy Re-Implementation

### 2.2.2.1 Quality Control in Industry

In industry, the initial design is commonly motivated by guidelines and ‘best practises’, which should help to assure the system’s usability.

### 2.2.2.2 Evaluation Practises in Academia

Dialogue strategies developed in academia are usually extensively tested against some baseline in order to make scientific claims, e.g. by showing some significant differences in system behaviour. 

### 2.2.2.3 The PARADISE Evaluation Framework

* PARADISE is a widely used framework for automatic dialogue evaluation introduced by Walker et al (1997, 1998b, 2000). 
* The main idea behind PARADISE is to estimate subjective user ratings (obtained from questionnaires) from objective dialogue performance measures (such as dialogue length) which are available at system run-time. 
* Walker et al (1997) propose to model “User Satisfaction” (US) using multiple linear regression (see Equation 2.1).
* User Satisfaction is calculated as the arithmetic mean of nine user judgements related to different quality aspects (see Table 2.1), which are rated on Likert scales. 
* A Likert scale is a discrete rating scale on which subjects indicate their level of agreement with a statement (e.g. from “strongly agree” to “strongly disagree”).

* κ : a parameter related to task success (either the coefficient κ calculated from an external annotation of correctly identified concepts, or a direct user judgment on perceived task success)
* $C_i$ : additional interaction parameters measuring dialogue efficiency and quality 
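
The regression idea behind PARADISE can be sketched in a few lines. In Walker et al's formulation the performance function is usually written as $US = \alpha \cdot \mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i \cdot \mathcal{N}(C_i)$, where $\mathcal{N}$ is a z-score normalisation and $\alpha$, $w_i$ are the fitted weights. The code below is only a minimal illustration under that assumption: the function names, the added intercept term, and all data values are hypothetical and not taken from the book or from any real study.

```python
import numpy as np


def zscore(x):
    """Normalise each column to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def fit_paradise(kappa, costs, user_satisfaction):
    """Least-squares fit of US = b + alpha*N(kappa) - sum_i w_i*N(C_i).

    The intercept b is an addition of this sketch, so that raw
    (un-normalised) satisfaction scores can be used as the target.
    """
    kappa_n = zscore(kappa).reshape(-1, 1)
    costs_n = zscore(costs)
    # Cost columns are negated so that a positive w_i means "lowers satisfaction".
    design = np.hstack([np.ones_like(kappa_n), kappa_n, -costs_n])
    target = np.asarray(user_satisfaction, dtype=float)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return {"intercept": coef[0], "alpha": coef[1], "weights": coef[2:]}


# Hypothetical measurements for six dialogues.
kappa = np.array([0.2, 0.9, 0.5, 0.7, 0.4, 1.0])      # task success
costs = np.array([[12, 3], [6, 0], [9, 2],
                  [7, 1], [11, 2], [5, 0]])           # [turns, error count]
ratings = np.array([2.1, 4.6, 3.0, 4.0, 2.8, 4.8])    # mean questionnaire score

model = fit_paradise(kappa, costs, ratings)
print(model)
```

Once the weights are fitted on rated dialogues, the same linear function can be evaluated on new dialogues using only the run-time measures, which is what makes the evaluation automatic.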

### 2.2.2.4 Strategy Re-Implementation

* After testing and evaluation, an error analysis is performed and the results are then used to re-design the strategy. 
* However, there is no framework which describes how evaluation results are best transferred into code.

## 2.2.3 Strategy Implementation
* 2.2.3.1 Implementation Practises in Industry
* 2.2.3.2 Implementation Practises in Academia

### 2.2.3.1 Implementation Practises in Industry

* Most commercial systems rely on Finite State Automata (FSA) controlled by menus, forms, or frames.
* However, this development methodology is limited by the fact that every change in the conversation must be explicitly represented by a transition between two nodes in the network. 
* Dialogue strategies designed as FSA are based on hand-crafted rules which usually lack context-sensitive behaviour, are not very flexible, cannot handle unseen situations, and are not reusable from task to task.
* Furthermore, FSA easily become intractable for more complex tasks and cannot model complex reasoning.
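
The limitation described above, that every possible change in the conversation needs its own explicit transition, becomes visible in even a tiny example. The following is a minimal sketch of an FSA-style dialogue controller; the state names, event names, and the flight-booking task are hypothetical and chosen only for illustration.

```python
# Hypothetical flight-booking FSA: every (state, user event) pair that the
# system should handle must be listed explicitly as a transition.
FSA_TRANSITIONS = {
    ("ask_origin", "got_city"): "ask_destination",
    ("ask_origin", "no_input"): "ask_origin",
    ("ask_destination", "got_city"): "confirm",
    ("ask_destination", "no_input"): "ask_destination",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_origin",
}


def step(state, event):
    """Follow one hand-crafted transition; fail on anything unforeseen."""
    try:
        return FSA_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, start="ask_origin"):
    """Feed a sequence of user events through the automaton."""
    state = start
    for event in events:
        state = step(state, event)
    return state


final_state = run(["got_city", "no_input", "got_city", "yes"])
print(final_state)
```

A user who volunteers a travel date while being asked for the origin produces an event such as `"got_date"`, for which no transition exists, so `step` raises an error. Handling it means adding new entries by hand, and the table grows quickly as the task becomes more complex.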

### 2.2.3.2 Implementation Practises in Academia

* Most research systems to date have been based either on 
 - planning with logical inference, e.g. (Blaylock and Allen, 2005; Steedman and Petrick, 2007), or 
 - the “Information State Update” (ISU) approach, using frames or tree sub-structures as control mechanism, e.g. (Larsson and Traum, 2000; Lemon et al, 2001). 
* More recently, statistical systems using machine learning approaches have become more prevalent, for example (Griol et al, 2008; Henderson et al, 2008; Thomson and Young, 2010; Young et al, 2007, 2009); see (Frampton and Lemon, 2009) for a survey.

* Planning approaches are mostly used for complex tasks, like collaborative problem solving, intelligent assistants, and tutorial dialogues. 
* ISU-based systems are used for a variety of applications with different complexity (see Table 2.2 for references). 
* Both approaches have a higher expressive power than simple FSA, and can lead to more sophisticated (e.g. context-dependent) strategies. 
* On the other hand, these systems are harder to maintain and debug.
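
To make the contrast with FSA concrete, the ISU idea can be sketched as a record of dialogue context plus update rules that fire when their preconditions hold. This is a strongly simplified illustration, not the design of any of the cited systems: the `InfoState` fields, the rule names, and the move format are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InfoState:
    """Hypothetical information state: what is known and what is confirmed."""
    slots: dict = field(default_factory=dict)
    confirmed: set = field(default_factory=set)
    agenda: list = field(default_factory=list)


@dataclass
class UpdateRule:
    name: str
    precondition: Callable[[InfoState], bool]
    effect: Callable[[InfoState], None]


def integrate_user_move(state, move):
    """Update the information state with a user move (kind, slot, value)."""
    kind, slot, value = move
    if kind == "inform":
        state.slots[slot] = value
        state.confirmed.discard(slot)
    elif kind == "affirm":
        state.confirmed.add(slot)
    elif kind == "deny":
        state.slots.pop(slot, None)
        state.confirmed.discard(slot)


RULES = [
    UpdateRule("ask",
               lambda s: "destination" not in s.slots,
               lambda s: s.agenda.append("ask(destination)")),
    UpdateRule("confirm",
               lambda s: "destination" in s.slots and "destination" not in s.confirmed,
               lambda s: s.agenda.append("confirm(destination)")),
    UpdateRule("book",
               lambda s: "destination" in s.confirmed,
               lambda s: s.agenda.append("book(destination)")),
]


def select_move(state):
    """Apply the first rule whose precondition holds; return the system move."""
    for rule in RULES:
        if rule.precondition(state):
            rule.effect(state)
            return state.agenda[-1]
    return None


state = InfoState()
print(select_move(state))
integrate_user_move(state, ("inform", "destination", "Paris"))
print(select_move(state))
```

The next system move depends on the content of the information state rather than on a position in a graph, which is where the context-dependent behaviour comes from. The price is that the interaction between rules is harder to follow than a path through an automaton.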

## 2.2.4 Challenges for Strategy Development

* How can the chasm between industrial and academic practice be bridged? 
* Is there a third option which can meet the challenges for both 
 - cost-effective industrial speech interfaces and 
 - the advanced dialogue agents of academic research? 
* What requirements does it have to meet?

#### organic interface
Zue calls the dialogue system of the future an “organic interface” that can learn, grow, reconfigure, and repair itself.
* robust towards unseen events
 - generalise to unseen events
* context sensitive
 - dynamically adapt to every possible system context
* adaptive to the application environment
 - automatically adapt to different situations

# 2.3 Literature Review: Learning Dialogue Strategies 
* 2.3.1 Machine Learning Paradigms 
* 2.3.2 Supervised Learning for Dialogue Strategies 
* 2.3.3 Dialogue as Decision Making under Uncertainty
* 2.3.4 Reinforcement Learning for Dialogue Strategies

## 2.3.1 Machine Learning Paradigms

In general, there are three major learning paradigms, each corresponding to a particular abstract learning task:

* Supervised Learning (SL)
* Unsupervised Learning (UL)
* Reinforcement Learning (RL)

To date, different Machine Learning approaches have been applied to automatic dialogue management:

* Supervised approaches, which learn a strategy that mimics a given data set;
* Approaches based on decision theory, which are supervised approaches in the sense that they optimise action choice with respect to some local costs as observed in the data. In contrast to SL, they explicitly model uncertainty in the observation;
* Reinforcement Learning-based approaches, which are related to decision theoretic approaches, but optimise action choice globally as a sequence of decisions.

## 2.3.2 Supervised Learning for Dialogue Strategies 

* example-based learning
* human assisted design
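
As a rough illustration of what it means for a supervised approach to mimic a data set, the sketch below learns, for each dialogue state, the action that occurs most often in a corpus. The state and action labels and the corpus itself are hypothetical, and a majority vote is only one very simple stand-in for the classifiers used in the literature.

```python
from collections import Counter, defaultdict


def learn_strategy(corpus):
    """Map each observed state to its most frequent action in the corpus."""
    counts = defaultdict(Counter)
    for state, action in corpus:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical annotated corpus of (state, action) pairs.
corpus = [
    ("slot_empty", "ask"),
    ("slot_empty", "ask"),
    ("slot_empty", "greet"),
    ("slot_filled_low_conf", "confirm"),
    ("slot_filled_low_conf", "confirm"),
    ("slot_filled_low_conf", "ask"),
    ("slot_filled_high_conf", "close"),
]

strategy = learn_strategy(corpus)
print(strategy)
print(strategy.get("slot_confirmed"))  # None: this state never occurs in the data
```

The learned strategy reproduces whatever the data contains, including its mistakes, and it has no answer for states that were never observed.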

## 2.3.3 Dialogue as Decision Making under Uncertainty

Action selection is guided by the following optimisation:

$$A = \arg\max_{a} EU(a \mid o) = \arg\max_{a} \sum_{s} P(s \mid o)\,\mathrm{utility}(a, s)$$

* In this framework, the agent selects the action A = a that maximises expected utility, EU(a|o), where o are the observed events.
* Here, utility(a,s) expresses the utility of taking action a when the state of the world is s. 
* The utility function is trained via “local” user ratings. 
* Users rate the appropriateness of an action in a certain state via a GUI while they are interacting with the system.
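
The decision rule can be sketched as follows: the expected utility of an action is its utility in each possible state, weighted by the probability of that state given the observations, and the agent picks the action with the highest value. The belief values, action names, and utility table below are hypothetical; in the approach described above the utilities would come from user ratings.

```python
def expected_utility(action, belief, utility):
    """EU(a|o): sum over states s of P(s|o) * utility(a, s)."""
    return sum(p * utility[(action, state)] for state, p in belief.items())


def select_action(actions, belief, utility):
    """Return the action with maximal expected utility."""
    return max(actions, key=lambda a: expected_utility(a, belief, utility))


ACTIONS = ["play_jazz", "play_rock", "ask_again"]

# Hypothetical utility table, indexed by (action, true state of the world).
UTILITY = {
    ("play_jazz", "wants_jazz"): 10, ("play_jazz", "wants_rock"): -10,
    ("play_rock", "wants_jazz"): -10, ("play_rock", "wants_rock"): 10,
    ("ask_again", "wants_jazz"): 3, ("ask_again", "wants_rock"): 3,
}

# P(s|o) after a poorly recognised and a well recognised utterance.
uncertain = {"wants_jazz": 0.6, "wants_rock": 0.4}
confident = {"wants_jazz": 0.9, "wants_rock": 0.1}

print(select_action(ACTIONS, uncertain, UTILITY))
print(select_action(ACTIONS, confident, UTILITY))
```

With the uncertain belief, asking again has the highest expected utility; with the confident belief, acting on the most likely interpretation wins. Each decision is still made in isolation, without regard to its effect on later turns.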

## 2.3.4 Reinforcement Learning for Dialogue Strategies

In contrast to the above approaches, Reinforcement Learning treats dialogue strategy learning as a sequential optimisation problem, leading to strategies which are globally optimal.
* Markov Decision Processes (MDPs)
* Partially Observable Markov Decision Processes (POMDPs)
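
The difference between local and global optimisation can be illustrated on a toy MDP. The sketch below is an assumption-laden example, not a method from the book: the slot-filling states, transition probabilities, and rewards are invented, and it uses value iteration, a dynamic-programming solver that assumes the transition model is known, rather than an RL algorithm that learns from interaction.

```python
# Hypothetical slot-filling MDP:
# MDP_TRANSITIONS[state][action] = [(probability, next_state, reward), ...]
MDP_TRANSITIONS = {
    "empty": {
        "ask": [(0.8, "filled", -1.0), (0.2, "empty", -1.0)],
        "confirm": [(1.0, "empty", -1.0)],
        "close": [(1.0, "done", -20.0)],
    },
    "filled": {
        "ask": [(1.0, "filled", -1.0)],
        "confirm": [(0.9, "confirmed", -1.0), (0.1, "empty", -1.0)],
        "close": [(0.7, "done", 20.0), (0.3, "done", -20.0)],
    },
    "confirmed": {
        "ask": [(1.0, "confirmed", -1.0)],
        "confirm": [(1.0, "confirmed", -1.0)],
        "close": [(1.0, "done", 20.0)],
    },
}


def q_value(state, action, values, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt])
               for p, nxt, r in MDP_TRANSITIONS[state][action])


def value_iteration(gamma=0.95, tol=1e-9):
    """Compute state values and the policy that is optimal over whole dialogues."""
    values = {s: 0.0 for s in MDP_TRANSITIONS}
    values["done"] = 0.0  # terminal state
    while True:
        delta = 0.0
        for s in MDP_TRANSITIONS:
            best = max(q_value(s, a, values, gamma) for a in MDP_TRANSITIONS[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(MDP_TRANSITIONS[s], key=lambda a: q_value(s, a, values, gamma))
              for s in MDP_TRANSITIONS}
    return values, policy


def greedy_policy():
    """Locally optimal baseline: maximise expected immediate reward only."""
    zero = {s: 0.0 for s in list(MDP_TRANSITIONS) + ["done"]}
    return {s: max(MDP_TRANSITIONS[s], key=lambda a: q_value(s, a, zero, 0.0))
            for s in MDP_TRANSITIONS}


values, policy = value_iteration()
print(policy)
print(greedy_policy())
```

In the `filled` state the greedy baseline closes the dialogue immediately because that has the best expected immediate reward, whereas the sequentially optimised policy first confirms the slot: it accepts a small cost now for a more reliable success later.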

# 2.4 Summary

# References
* [1] Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation - https://www.amazon.com/Reinforcement-Learning-Adaptive-Dialogue-Systems/dp/3642439845