2003 IFA Congress: Montreal, Canada

Advanced Digital Capture Technology in Identification of Stuttering Disfluencies

Glen Tellis, Thomas Meloy, Michelle Henning, and Dawn Jarvie
Indiana University of Pennsylvania, Dept. of Special Education, 259 Davis Hall, Indiana, PA 15705


Video-capture technology is a powerful method of identifying stuttering disfluencies and secondary behaviors. We use video-capture technology to save sessions on Videotizer machines, on computer hard drives, on DVD, and on a server. Supervisors and clinicians use voice-over on the Videotizers so that clients can review sessions. Clients can use the computer cursor on movies to freeze any frame and identify disfluencies and secondary behaviors. We have found that identification of disfluencies and secondary behaviors with computerized video recordings of speech samples is more powerful than with videotape-only recordings. Procedures for assessment are discussed.

  1. Introduction
Currently, transcription and playback of audio- and video-recorded data are often used to tally disfluencies and secondary behaviors (Peters & Guitar, 1991). Even though these methods can be tedious and time-consuming, there is agreement that transcription-based procedures are needed for an in-depth analysis of disfluencies and secondary behaviors. The reliability of these measures, however, has been debated (Cordes, 1994). Many authors have discussed the need to measure stuttering reliably (Conture, 1990; Cordes, 1994; Young, 1984) given the variability from clinician to clinician in the accuracy of identifying disfluencies and secondary behaviors (Curlee, 1981; Ingham & Cordes, 1997; Kully & Boberg, 1988).

Reliability and accuracy of measurement may be enhanced by real-time analysis. Conture (1990) has suggested that stuttering may be analyzed in real time instead of in the transcription-based format. One of the benefits of real-time analysis is that the clinician can assess a client’s stuttering more quickly and easily. With improvements in technology, it is now possible to video-record a sample of speech and simultaneously play and save the speech sample on a computer hard drive. This method of recording data seems to be a promising avenue for future research because of its potential for (1) using real-time visual and speech feedback, (2) saving the data on a compact disk (CD), digital video disk (DVD), or videotape, (3) implementing frame-by-frame playback, (4) pausing the picture frame while the person is in a block without any distortion of the image, (5) recording the time spent speaking on-line, and (6) using the computer cursor to fast-forward, rewind, play, and pause the speech sample without the difficulty of searching for a speech segment.

Apart from the advantages outlined above, movie-clips can be arranged linearly for multiple images to be isolated and compared. Multiple images, therefore, can be viewed on a single 21-inch monitor and compared simultaneously. These images can either be of the same speech sample or of different speech samples. A single speech sample can also be enlarged to encompass the entire screen. Other advantages are that live recordings can be made from a video camera to a computer, video-recordings and direct computer recordings can be used simultaneously, and recordings can be transferred from a computer to the video-recorder. The above methods are very beneficial in enhancing identification of disfluencies and are used regularly in our clinical sessions with clients who stutter.

There are numerous commercially available computer programs that permit recording and playback of speech samples. Clinicians can use programs that are IBM Personal Computer (PC) or Macintosh compatible. For example, clinicians can use iMovie software from Apple to record, edit, and play back movie-clips. Software is also available for IBM PC compatible machines. For example, software from Pinnacle, Dazzle, and Vegas Video can be used to achieve results similar to those of the iMovie software. The IBM and Macintosh programs are excellent for recording, playing back, and editing; however, editing and compressing movies of clinical sessions is a time-consuming process with these programs.

To address the issue of time-consuming editing and compression, we decided to build a state-of-the-art laboratory that had the capability of live video-capture technology. We also saw a need to create a combined program of classroom instruction and clinical experience that uses modern technology to teach students to assess and treat fluency disorders. Initially, we researched the work of IRIS Technologies, a company that designs video-audio-computing systems and routers. The IRIS video system allows multiple clinical sessions to be observed, recorded on a Videotizer (digital video recorder with 100 hours of video storage), digitally stored on a secure internal Intranet central server, and then transferred to videotape or DVD format. Sessions can also be quickly retrieved from the central server any number of times for analysis in the classroom, an immensely helpful feature to complement classroom instruction.

This paper will discuss the latest video-capture technology and methods of using this new technology to improve reliability of identification of disfluencies and secondary behaviors.

  2. Method
For the IRIS technology to work, a system of eight cameras in four observation rooms allows supervisors to view clients and clinicians from different angles and hear multiple sessions at once in a control room. The system includes an audio feature that gives supervisors the option of providing students feedback by way of headphones. Supervisors no longer need to interrupt a session when immediate feedback is needed. Supervisors can observe all four sessions simultaneously from the control room or stay behind a one-way mirror and instruct discreetly from there with minimal interference via the wireless headphone set.


IRIS software and hardware are used. The IRIS components include a video and audio routing matrix, an RS232-422 converter to change the video signal from analog to digital, a video-audio controller, and a digital media Videotizer recorder. Apart from the IRIS software and hardware, other equipment includes a nonlinear editing system to edit the video. The IRIS software includes a system that allows for easy editing and conversion to Moving Picture Experts Group (MPEG) format. A Dell Pentium 4 computer with a 1.7 GHz processor, 512 MB RAM, an 80 GB hard drive, and a CD-RW/DVD-ROM drive is used with a 21” Dell P1130FD Trinitron monitor. A Dell RAID server (292 GB, dual processor, 512 MB, tape backup, floppy drive, CD drive, Windows 2000, Remote Assistant Card, and Veritas SAN) is used. A DVD with a CDRW-DVDRW burner is also used. A VCR and television are used to transfer movies to videotape. Several Sony mini-digital video recorders are used, as well as Pan/Tilt/Zoom unitized ceiling-mounted cameras. Clinicians have wired lapel microphones and earphones with receivers and transmitters. Supervisors have the option of using wireless headsets with microphones.


Supervisors sit in a control room and observe the stuttering treatment sessions on a quad-screen television monitor. Four sessions, therefore, are observed simultaneously. Supervisors record all four sessions simultaneously on four Videotizers by clicking the record button on the computer screen. Each Videotizer is connected to the computer. Supervisors can also use the camera joystick controls in the control room to pan around the therapy room or zoom in on the client’s face. During the sessions, supervisors can use the headphones to discreetly provide feedback to students to adjust clinical procedures. Supervisors can also include their comments on the video samples either by using voice-over (via a microphone) or by typing feedback on the video samples. The voice-over samples are recorded on a separate channel on the Videotizer. After or during each session, clinicians can review the supervisor’s voice-over or typewritten feedback and make adjustments to their assessment procedures.

Clip-marks (bookmarks) are another practical feature of the IRIS software. Supervisors and clinicians can use clip-marks when moments of stuttering or secondary behaviors occur. For example, in a 30-minute session, supervisors or clinicians can make clip-marks on the computer screen either during a session or after the session ends and analyze samples for disfluencies and secondary behaviors. The use of clip-marks is an extremely helpful feature of the software because it allows clinicians and clients to conduct real-time analysis of disfluencies. A clinician can use the super fast-forward feature in the software to go through an entire 30-minute session in 30 seconds. The clinician can also click the up arrow key on the computer screen to jump to clip-marks for identification purposes. The video can also be slowed down to slow-motion mode, which makes disfluencies and secondary behaviors easier to identify.
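The clip-mark workflow described above can be illustrated with a small sketch. This is not the IRIS software or its data format; the `Session` class and its method names are hypothetical, and the only assumptions are that a clip-mark is a time offset in seconds from the start of a session and that jumping moves to the next mark after the current playback position. The last function simply restates the fast-forward figure from the text (a 30-minute session reviewed in 30 seconds).

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical model of one recorded session (not the IRIS format)."""
    duration_s: float
    clip_marks: List[float] = field(default_factory=list)  # kept sorted

    def add_clip_mark(self, t: float) -> None:
        """Bookmark a moment of stuttering, t seconds from the start."""
        if not 0 <= t <= self.duration_s:
            raise ValueError("clip-mark lies outside the session")
        insort(self.clip_marks, t)

    def next_clip_mark(self, position_s: float) -> Optional[float]:
        """Return the first clip-mark after the playback position, if any."""
        i = bisect_right(self.clip_marks, position_s)
        return self.clip_marks[i] if i < len(self.clip_marks) else None


def playback_speed(session_s: float, review_s: float) -> float:
    """Speed multiplier needed to review session_s seconds in review_s seconds."""
    return session_s / review_s


session = Session(duration_s=30 * 60)
for t in (600, 95.5, 1200):          # marks may be added in any order
    session.add_clip_mark(t)

print(session.next_clip_mark(0))      # first mark in the session
print(playback_speed(30 * 60, 30))    # multiplier implied by the text
```

Keeping the marks sorted means a jump is a binary search rather than a scan, which matters little for a single session but keeps the sketch simple to extend to many marks.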

On completion of the session, the clinicians can make MPEG movies of the clip-marks. A 30-minute session can be compressed in less than 3 minutes, and a 2-minute clip-mark can be converted into an MPEG movie in less than 20 seconds. This is an excellent feature of the software because the clinician can burn the MPEG movies onto a CD or DVD and give the samples to the client for review at home. In addition to the advantages outlined in the introduction, these MPEG samples can be viewed in frame-by-frame playback mode and paused during a block to identify disfluencies and secondary behaviors.
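As a rough planning aid, the conversion figure quoted above (a 2-minute clip-mark in under 20 seconds) can be turned into an upper-bound estimate for a batch of clips. The sketch below assumes, without support from the source, that conversion time scales linearly with clip length; the function name and the `(start, end)` representation of a clip are hypothetical.

```python
from typing import Iterable, Tuple

# Figure reported in the text: a 2-minute clip converts in under 20 seconds.
CLIP_SECONDS_IN = 120
CLIP_SECONDS_OUT = 20


def estimate_conversion_s(clips: Iterable[Tuple[float, float]]) -> float:
    """Upper-bound conversion time for clips given as (start_s, end_s) pairs.

    Assumes linear scaling with clip length, which is an illustration only.
    """
    total = 0.0
    for start, end in clips:
        if end <= start:
            raise ValueError("clip must end after it starts")
        total += end - start
    return total * CLIP_SECONDS_OUT / CLIP_SECONDS_IN


# Two one-minute clips from the same session.
print(estimate_conversion_s([(10, 70), (100, 160)]))
```

By the same arithmetic, the whole-session figure in the text (30 minutes compressed in under 3 minutes) corresponds to processing at least ten times faster than real time.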

While the sessions are taking place, the speech samples are transferred from the Videotizer to the server and also saved in CD, DVD, or VHS format. Clinicians can later bring up the clinical sessions and identify every disfluency and secondary behavior with their clients. Clinicians can also take movies from mini-digital video recorders or videotape format and send them to the Videotizers or vice versa.

  3. Discussion

The use of the IRIS software and hardware has had a tremendous impact on our clinical program in fluency disorders. Students are now able to conduct real-time analysis without the time-consuming process of watching their clinical sessions on VHS format and losing quality because of distorted images while pausing frames. We have compared computerized video recordings of speech samples with videotape-only recordings in our clinic and have noted that computerized recordings provide a more powerful method of analyzing moments of disfluency and secondary behaviors than videotape-only recordings. Apart from the advantages listed in the preceding sections, it appears that clinicians using traditional videotape procedures for identification may fail to record numerous secondary behaviors. Various measurement instruments regard secondary behaviors as a major component in determining whether the person who stutters is identified as having a mild, moderate, or severe problem. Additionally, many clinicians advocate the reduction or elimination of secondary behaviors as part of effective therapy. We have found that when clients in our clinic use the advanced digital system, they increase their identification of stuttering disfluencies and secondary behaviors.

Our clinicians are now also able to transcribe their sessions in a timely manner because the IRIS software can slow down the speed of the digital video output. This has eased the problem of tedious and time-consuming transcription and has allowed our clinicians to spend more time analyzing their clients’ speech samples. Analysis of speech samples is, therefore, quicker and easier with this technology. It appears that this slowing down and pausing feature in the software has also resulted in more reliable and accurate identification of disfluencies and secondary behaviors as well as less variability in responses from clinician to clinician. Another advantage of this software is that editing and compression of video samples is no longer time-consuming. Video can be edited while sessions are taking place with the use of clip-marks. Clip-marks and entire sessions can be compressed and saved as MPEG files in seconds to minutes, and clinicians can, therefore, spend valuable clinical time identifying disfluencies and secondary behaviors with their clients instead of editing movies with the traditional video-capture technology or watching sessions on a VCR.

The IRIS software also allows us to create a combined program of classroom instruction and clinical experience that uses modern technology to teach students to assess and treat fluency disorders. A notable feature of the software is that the saved sessions can be retrieved from the secure internal Intranet server and displayed in classrooms via a projector. Beginning clinicians and students can view the saved sessions in a classroom, and supervisors and faculty can teach an entire class about various types of disfluencies and secondary behaviors. An entire class, therefore, can view speech samples from one client and conduct practice assessment procedures in class. These practice sessions have been very beneficial in our training program because we are able to train our students in identification in a stress-free environment.

The use of this advanced digital software seems to be a promising avenue for future research. IRIS software and hardware have resulted in improved identification of disfluencies and secondary behaviors by our clinicians and clients. This technology is not only beneficial in identifying disfluencies and secondary behaviors but can also be used during treatment sessions with persons who stutter. For example, if a client is incorrectly producing a cancellation, the clinician can save every frame that was incorrect and compare them to frames that had correct cancellations. Clinicians can, therefore, use the software as a learning tool for their clients. The IRIS system can also be used for training clinicians in an educational setting. Supervisors have the ability to observe four clinicians at once and provide discreet feedback without interrupting the flow of the treatment session. This software and hardware have many clinical applications as well as potential for future research in assessment and treatment of fluency disorders.

This project was funded by the Pennsylvania Department of Education’s 2002 Link to Learn Higher Education Technology Grant.

Conture, E. G. (1990). Stuttering (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Cordes, A. K. (1994). The reliability of observational data: I. Theories and methods of speech-language pathology. Journal of Speech and Hearing Research, 37, 264-278.

Curlee, R. F. (1981). Observer agreement on disfluency and stuttering. Journal of Speech and Hearing Research, 24, 595-599.

Ingham, R. J., & Cordes, A. K. (1997). Identifying the authoritative judgments of stuttering: Comparisons of self-judgments and observer judgments. Journal of Speech and Hearing Research, 40, 581-594.

Kully, D., & Boberg, E. (1988). An investigation of interclinic agreement in the identification of fluent and stuttered syllables. Journal of Fluency Disorders, 13, 308-318.

Peters, T. J., & Guitar, B. (1991). Stuttering: An integrated approach to its nature and treatment. Baltimore, MD: Williams & Wilkins.

Young, M. A. (1984). Identifications of stuttering and stutterers. In R. F. Curlee & W. H. Perkins (Eds.), Nature and treatment of stuttering: New directions. San Diego: College-Hill.

