Computers, Humans and ‘Daisies’: Becoming Machine through Voice.

This post commences with a brief video extract taken from the photo-essay that I wrote for the special issue of the TDPT journal (10.3: ‘What is New in Voice Training?’). I decided to share on the blog a different audio-visual standpoint on my work. In contemporary academia, the so-called ‘practice turn’ allows scholars to find new and creative ways to share their research. In this sense, Voice Studies has necessitated a vocal approach to dissemination, and performance training needs to be addressed inclusively. I felt the urge to ‘vocalise’ my project; this blog entry therefore aims to embed voices in the discussion and to offer a different way of listening to it.

What if a computer, or a machine, could teach us to sing or talk? As part of my practice-as-research Ph.D., I tried to train myself to ‘sound’ like an artificial voice, with an unusual coach: the computer itself. From November 2017 to April 2018, I worked on an experimental training of voice re-production, with the specific aim of inverting conventional approaches to the loop of vocal mimicry: normally, we shape artificial voices on the basis of ‘natural’ voices, making computers mimic humans. My idea was to reverse the process and investigate how humans could mimic computers. I decided to develop a training approach that started from artificial voices, exploring human-machine communication as well as approaching performance training differently. This blog entry contains audio-visual documentation of this process and is designed to accompany the self-reflexive and contextual account that can be found in the photo-essay. With these documents, I explore the work undertaken, explain the pitfalls and frustrations involved in the process, and outline future possibilities for performing machines differently.

Becoming Machine – a brief collection of my screen-recorded exercises

Screen-recorded, the first video shows the process of editing and recording through the DAWs – Digital Audio Workstations – Praat and Ableton Live. The plug-in Chipspeech was the primary tool for this research: it allowed me to digitally recreate the original IBM 704 speech synthesis that sang ‘Daisy Bell’ in 1961. In the first part, I have included one of the exercises that I created. My wish was to mix digital and real-life training, so I devised mixed-source exercises. In this case, I present my attempt to ‘be taught’ by the computer how to vocalise vowels. It is possible to see how I created the vowels on Chipspeech, how I tried to replicate them, and then how I filed my recordings on the computer and sorted them by frequency and in alphabetical order. Praat, the sound analysis DAW, was fundamental for investigating the files phonetically. At 01.45, the video shows the analysis and comparison of audio files (the letter ‘I’, for example) and the difference in their frequencies.

The second part – starting at 03.13 – introduces the other approach I developed for my project. Ableton Live is in vertical mode; the first column, on the top left, has a speech synthesis version of ‘Daisy Bell’ in each cell. The second to sixth columns are empty. The seventh column, on the top right, has my human-voice-produced version of ‘Daisy Bell’ in every cell. As the video continues, I fill the empty cells with ‘new’ recordings that attempt to increase the feel of ‘robot-ness’ in my voice: first by copying the speech synthesis while listening to it, and secondly by adding ‘robot-ness’ to my voice while listening to my human recording. On the left, you can find recordings based on me trying to replicate the speech synthesis; on the right, recordings based on me trying to emit a robotic version of ‘Daisy Bell’ while listening to the human version. The central columns are meant to be filled with ‘re-worked’ and improved versions of the recordings, made after a close listening to the ones in the second and sixth columns.

The second video is recorded with the front camera of my laptop. After a brief introduction to the work that I am about to do, I start vocalising what I understand as ‘human speech-synthesis’. I decided to upload this part to the blog to help the reader engage with my struggle of trying and failing. My intention was to show the numerous attempts through which I realised how hard – impossible, even – the project was, and to invite the viewer/listener to think about how a human could look and feel while ‘becoming a machine’ through newly devised voice pedagogy. This video documents my training on two separate days, one at the beginning of the project and the other towards the end, and allows me and the reader (or viewer/listener) to notice the differences in my voice.

Of the three audio files that I have chosen from among more than a hundred, the first represents my two best attempts at recreating the speech synthesis version of the song, and is included here under the name Robot.

Robot

The file Struggle is probably the most important: in less than 2 seconds, it embodies the struggle of months spent repeating the first two syllables of ‘Daisy Bell’.

Struggle

The third file, Robot-Human, is a comparison between my voice and the computer voice.

Robot-Human

This work invites and cultivates a different point of listening and, hopefully, provokes a discussion on how human practitioners might engage with computers, speech synthesis and robots. I hope that other practitioners are inspired to make a similar attempt, and to share their own efforts at vocally becoming a machine (perhaps as comments below). Will their struggle be the same as mine?

About the author: I was born in Italy in 1990. I am a Ph.D. student, a musician, a trained actor, a DJ, and a comedian. My fields of interest range across voice, artificial voice, voice training, hauntology, posthumanism, HRI and HAAI. I am currently studying at the University of Exeter, working on a project that analyses the Posthuman Condition through voice, looking at the differences between artificial voices and natural voices in Performance Practices. My work with voices echoes in my musical project called Mr Everett, where we investigate human and machine communication through voice, comedy, and dance: https://www.youtube.com/watch?v=wiyrp4qXTdc