DE4104 - Disabled and Elderly

Deliverable D04.1, D04.2

Project Number: DE4104

Project Title: VOICE - Giving a VOICE to the deaf, by developing awareness of VOICE to text recognition capabilities

Deliverable Type: (PU/LI/RP)*PU

Deliverable Number: D04.1, D04.2

Contractual Date of Delivery: 4/99, 9/99

Actual Date of Delivery: 3/00, 3/00

Title of Deliverable: User needs analysis report and validation of the prototype report

Work-Package contributing to the Deliverable: WP4

Nature of the Deliverable: (PR/RE/SP/TO/OT)** RE

Author(s): K. Miesenberger, Johannes Kepler University Linz

Abstract: The two deliverables report on the usability studies of speech recognition systems for deaf and hard of hearing people performed during the VOICE project.

a) D04.1 reports on the user needs analysis giving special attention to the ‘scientific state-of-the-art’ in usability and human factors research. The focus is on the state of the art in user needs analysis for people with disabilities and the design for all issue in the man-machine-interface development.

b) D04.2 reports on the user needs analysis and user tests performed in the VOICE project, giving special attention to speech to text recognition systems available on the market and to the VOICE prototype.

Keyword List: Speech to Text Recognition, Deafness, Hearing Impairment, People with Disabilities

*Type: PU-public, LI-limited, RP-restricted

**Nature: PR-Prototype, RE-Report, SP-Specification, TO-Tool, OT-Other

VOICE project, DE4104, Johannes Kepler University

VOICE – Giving a VOICE to the deaf by developing awareness of voice to text recognition capabilities

JRC - ISIS, Ispra

FBL, Mortara

Associazione Lombarda Famiglie Audiolesi, Milan

Centro Comunicare è Vivere, Milan

Institut für Hör- und Sehbildung, Linz

Johannes Kepler Universität, Linz

User needs analysis report and

validation of the prototype report

Part A: Scientific state-of-the-art in usability and user needs analysis

Part B: Evaluation of speech recognition systems and the VOICE prototype

Author:

Dr. Klaus Miesenberger

Johannes Kepler University Linz

Institute for Applied Computer Science

Department Computer Science for Blind People

This report integrates deliverables D04.1 and D04.2 because the two documents are closely related. D04.1 concentrates on the state-of-the-art analysis of usability and human factors research; this body of knowledge is used for the usability analysis of speech recognition systems and the VOICE prototype in the second part of the report, deliverable D04.2. Separating the two documents would lead to several repetitions that are unnecessary in a single document.

Table of Contents

Executive Summary

Introduction

Part A: Scientific state-of-the-art in usability and user needs analysis

1.Introduction

1.1 Usability: The need for flexibility and adaptability

1.2 Usability and spoken language in HCI

2.The challenge of usability

3.What is usability and why is it important

4.Criteria, steps and principles of usability and user-centred design

4.1 Criteria to measure usability

4.2 Principles of user-centred design

4.3 The process of user-centred design

5.Tools, methodologies, guidelines and projects promoting usability, user centred design and evaluation and accessibility

5.1 The World Health Organisation

5.2 World Wide Web Consortium (W3C) and World Wide Web Accessibility Initiative (WAI)

5.3 Guidelines for software development on different platforms: Microsoft and Sun

5.4 ISO standards – and others

5.5 Design for All and Universal Design

5.6 INUSE

5.7 EU-CON

5.8 Heuristic Evaluation

5.8 INCLUDE

5.9 MUSiC

5.10 USERfit

5.11 Respect

5.12 AVANTI

5.13 FORTUNE

5.14 Other References/Further Sources of Information

6.Conclusion

PART B: Evaluation of speech recognition systems and the VOICE prototype

1.Introduction: How speech input technology works

2.Introduction: How the VOICE prototype works

2.1 Subtitling presentations and conferences

2.2 Subtitling of TV and video

2.3 On the telephone line

3.Evaluation of speech recognition systems for different groups of people with disabilities

3.1 Methods used in the evaluation of speech recognition systems and the VOICE prototype

3.1.1 User groups taken into account during the tests performed in the VOICE project

3.1.2 Questionnaires used during the VOICE project

3.1.3 Speech recognition questionnaire

3.1.4 Prototype questionnaire

3.2 Occasions where the evaluation was done and where information was collected

3.3 Result I: What speech recognition can do at the man-machine-interface

3.4 Result II: Potential and needs for different groups of people with disabilities at a speech input oriented man machine interface

3.4.1 Users with special needs

3.4.2 Potential and needs to foster age related abilities

3.4.3 Potential and needs to foster vision abilities

3.4.4 Potential and needs to foster hearing abilities

3.4.5 Potential and needs to foster cognitive abilities

3.4.6 Potential and needs to foster mobility and movement abilities

3.5 Result III: Quantitative results of the questionnaire survey

3.5.1 Report on users needs analysis of speech recognition systems

3.5.2 Report on voice prototype: listener and speaker

4.Conclusion

5.Appendix 1: Questionnaire for speech recognition systems

6.Appendix 2: Questionnaire for the VOICE prototype

Executive Summary

The VOICE project aimed to raise awareness of the evolving potential of speech recognition systems for people with disabilities, in particular for hard of hearing and deaf people. Although many experts and users in this target group know these systems or even use them frequently, they are not aware of the broad ways in which the systems can support efficiency at the man-machine-interface and support communication with people who have difficulties in communicating.

The main focus of the project therefore was on disseminating information and on building up contacts to user groups,

groups of users with disabilities,
their contact persons and
relatives and people/experts working in fields related to the enhancement of the situation and the quality of life of people with disabilities

in order to make them aware that this technology has a high potential for them.

Because speech recognition is a rather new technology that is already available at high quality and at a low price, the project additionally aimed to collect the estimations of users with disabilities and of those supporting them. The VOICE project therefore evaluated speech recognition systems already on the market to collect data on their usability in everyday professional and private life. When people are confronted with this technology and with the question of how it could be of use to them, their imagination and visionary thinking are stimulated. They bring forward very important information on where the technology could be of use, and they identify first basic problems in the complex situation of application.

As basic research, a review of the scientific state-of-the-art in usability and human factors research was performed, both to be able to handle the collection of data and to base the discussions on a profound body of knowledge. The review focused on the state of the art in user needs analysis for people with disabilities and on the design for all issue in man-machine-interface development. This body of knowledge contributed considerably to the evaluation itself and guided the evaluation process. The results of these literature studies are presented in part A of this document, which corresponds to deliverable D04.1 of the VOICE project.

Based on this body of knowledge, the user needs analyses were performed. They are reported in part B of this document, which corresponds to deliverable D04.2 of the VOICE project. Part B therefore reports on the user needs analysis and user tests performed in the VOICE project, giving special attention to

speech to text recognition systems available on the market and
the VOICE prototype developed at the beginning of the project.

Users’ estimations of speech recognition systems and the prototype, as elaborated in this deliverable, are based on presentations given by the VOICE staff on several occasions and on tests with speech recognition systems by different user groups.

The results have to be seen more as users’ estimations than as real needs of the users in everyday life. This seems adequate given the goal of the project, to raise awareness, which means that mostly new users were addressed who could only work with the systems for a short time. How these estimations compare with real users’ needs when systems are used intensively at work or at home cannot be quantified here and should be the subject of further research.

The evaluation made evident that estimations and users’ needs will differ considerably, because an effective application of speech recognition systems in practice requires a long period of training and a certain amount of usage. In the short run people understand the quality and potential of the systems, but because they would need to change their style of work they do not really start to use the systems in practice. These estimations will certainly change when the systems are used for a longer time. One of the major results of the evaluation is that getting used to working with the system takes a long time, and this often leads to a situation where systems which are seen as useful and of high quality are not applied in practice.

The analysis of the scientific state of the art showed that a remarkable body of knowledge in user-centred design, usability and design for all is available. Nevertheless, no specific and elaborated information could be found on the usability of speech recognition systems. Generally the body of knowledge on usability of Human Computer Interaction is applicable because these systems use GUI standards. However, most of this body of knowledge refers to the process of developing and/or evaluating Human Computer Interaction; it does not refer to the specific situation of applying speech recognition systems for the target groups mentioned. More information on the application of speech recognition systems to support people with disabilities is therefore needed because of the special situation of usage.

The evaluation of standard speech recognition systems showed that the quality of these systems is much higher than people expect. After almost every trial or presentation people reported that they were surprised by the quality of these systems. Nevertheless this positive impression does not automatically lead to increased use in practice. The evaluation made evident that the unfamiliar style of interacting with the system via speech, and the differences compared to normal “human-human” communication, lead to a situation in which standard methods of interaction are preferred and special uses for the specific needs of people with disabilities are often not considered.

Another important result is that people do not know how flexibly these systems adapt to the style of speaking of people who have speaking problems (e.g. people with speech impairments, hard of hearing and deaf people). Although a human being who does not know the person could hardly understand them, their voice is very often recognized very well by speech recognition systems.

Generally people were only aware of the possibilities for mobility and movement impaired people. The potential and the chances for other groups of people with disabilities (e.g. hard of hearing and deaf people, the main focus group of the VOICE project, as well as blind and visually handicapped users, mentally retarded and elderly people) were not seen unless people were explicitly made aware of them.

The biggest problems encountered in applying speech recognition systems were related to the training process. These interfaces should be made more adaptable so that they can take the specific needs of the users into account. The stories which have to be read during the training process, for example, are too complicated, too demanding or too tiring for different groups of people with disabilities, and for elderly people too. Possibilities to adapt these stories would be very much appreciated. Generally this would make the systems more usable, in particular for younger users. The arrangement of the training dialogue often causes problems for visually handicapped and blind people (e.g. finding the position from which to start reading again when the system did not recognize correctly, video information).

After the intensive training has been completed, problems occur when people do not remember what they can say to fulfil a certain task. Short and context-sensitive lists of commands are often not available as on-screen help, as output in accessible formats (e.g. Braille or large print) or as basic digital documents which could be adapted quickly and easily.

Although different user groups might have big problems in using the system to dictate text into a word processing application, these systems seem to be very suitable to be used as an alternative or additional device for handling the desktop.

Because an additional input channel is used, the complexity of the system increases. This often calls for simplification, reduction to simple commands and avoidance of confusing interactions, for example those using speech output.

Generally there is evidence that people, in order to use the systems in practice, would favour case studies and information on how the system could be integrated into and support their everyday life. In any case users have to be informed that getting trained and ready to use the system will take a longer period of time; having spoken words recognized by the system is simply not enough. When the system is really applied in a way that could improve efficiency and effectiveness at the man-machine-interface, or when dictating, it requires changes in the style of work and the working environment. The estimated potential of speech recognition can only be experienced if people are willing to employ the system over a longer period of time.

The evaluation of the prototype for using speech recognition for subtitling purposes for hard of hearing and deaf people showed that people are very interested in such systems. It was pointed out that these systems must not be seen as a replacement for, but as an addition to, sign language or lip reading.

Although the speed and accuracy of speech recognition are very high, even a very small time lag between speaking and the presentation of subtitles causes problems for the audience, who are forced to choose between lip-reading and the subtitles. The possibility of getting a transcript of a speech afterwards is of course a big help.

The effort needed to prepare all the technology, to train the system and to behave as the system prescribes makes speakers doubt whether the system is usable in practice.

The VOICE project was able to show that the system could be a valuable tool for subtitling TV broadcasts or for supporting subtitling work. Using the system over the telephone line still seems very problematic because of the low and varying quality of the speech signal.

The evaluation showed that people are still unaware of the power of speech recognition systems to increase efficiency at the man-machine-interface and to support people with disabilities. Many people pointed out that, because of the presentation, they will start to test the systems more intensively. More doubts were expressed as to whether a system like the prototype would really be applied in practice. These doubts are not due to the technology itself but to the fact that such a technology interferes too much with the standard style of interaction and communication.

The evaluation showed that there is a high potential in the use of speech recognition as an assistive device to

a) handle the man-machine-interface in standard and in special situations

b) support the communication of hard of hearing and deaf people and

c) support the communication of speech impaired people.

Although a considerable body of knowledge is available concerning the accessibility and usability of the man-machine-interface for people with disabilities, further research and engagement are needed to put this potential into action. A special focus should be placed on simple dialogues suitable for everyday life (e.g. smart houses, ATMs).

Introduction

The usability of man-machine-interaction has become more and more important over recent years. The increasing use of all kinds of tools calls for design and development to focus on usability, so that users are satisfied when using these tools in their environment for their specific purposes and so that usage is open to everybody.

Fundamental advances towards a more user-friendly interface to computers are already recognised as a democratic necessity. Without such advances a large portion of society will be excluded from the Information Society and put at a disadvantage. New IT-based systems replace older non-technical systems; the Information Society therefore not only offers possibilities to choose but also forces people to change when well-known systems disappear. For people with disabilities, of course, it is often not an alternative but the only possibility to gain better access to information.

Spoken language interfaces have a high potential to provide a higher level of flexibility and adaptability at the MMI. Voice recognition has to be seen as a step forward towards a media independent representation and handling of information which is provided by the multimedia power of computers. This flexibility is a prerequisite for more user friendly, adaptable and accessible interfaces. The more one can change the input as well as the output channels the more effective ways of interaction can be found. [Miesenberger, K.: Voice to Text Recognition and Text to Speech Synthesis – A Challenge for Rehabilitation and Integration of People with Disabilities, EMBEC conference, Wien, November 1999]

Modern concepts of Human-Computer-Interaction (HCI) have revolutionised the style of interacting not only with computers but with a broad variety of tools in which the interaction is separated from the actual process performing the final task and is organised according to the specific interface the computer provides. More and more processes at the interface to the user are oriented towards this specialised level of interaction, which is known as the evolved and tested standard of HCI. Separating the level of interaction from the level at which the machine actually performs the task makes standardisation possible, and therefore makes it easier for users to orient themselves across different tasks.