Thoughts/Code by Mary Truong
Communication requires a signifier: a sensed thing that refers to another thing. Say I send an email saying,
"I want an apple." The text is sensed through the recipient's eyes which can see the word "apple," and the
recipient realizes that I am referring to a physical "apple." There's fluidity in the way we signify things.
Images are full of visual subtleties in meaning, text saturates with poetics, and words are carried with the lilt of a voice.
Let's divide communication into three parts: sender, message, and receiver. The voice works as a substrate
that carries meaning from the sender's body; therefore, it is intrinsically tied to an originating body.
A voice implies a body, but it is not part of a body; they are separate. And yet a voice is as integral to
identity as a body: to share pieces of ourselves, we "share our voice" and "let our voice be heard."
Deprived of context, speech consisting of the same words may seem to have the same meaning, but voice adds context.
Me saying, "I am tired," could be radically different from you saying the same phrase.
Michel Chion used the term "acousmatic voice" to describe a voice that is separate from its origin; a voice that
seemingly has no source. With the advent of new technology (radio, telephone), acousmatic voices have become mundane.
An acousmatic voice is still recognized as part of its origin, but it is digitally mutated and mangled beyond the
recognizable self. How many times has a person flinched after hearing a digital reproduction of their voice?
After a voice has been translated into digital signals and transmitted across networks, how can we confidently establish
the source of the voice? Is the source the person, or is it the machine?
A new acousmatic voice exists given our current technology: rather than a reproduction, these voices are productions,
and constantly in the state of producing and being produced. Digital voice assistants like Alexa are not tied to a
singular body. They permeate wherever their corresponding digital device exists. These voices were invented through
piecemeal construction, the concatenative synthesis of electronic samples (although speech is now produced using neural
networks, black boxes of computed rules). If Alexa has a body, we can speculate that her body is fragmented in the processes
of thousands of servers (or maybe that's incorrect: do we think of the separate cells in our body as fragmentation?).
But it is clear that Alexa is not an individual entity who has a will to convey information. In the paradigm of sender,
message, and recipient, what does Alexa become?
Alexa, the voice, cannot be the sender (after all, voices are not bodies), but Alexa is a responsive technology;
she exists to serve you, the user. If you initiate the exchange, then her response indicates your original request.
Alexa is an acousmatic voice, and the unseen source of her voice is you, the user.
This doesn't seem to be quite right. Yes, Alexa responds to her users, but she is not simply an echo: there are
processes involved in her response that transfigure wish into fulfilment. A monumental corporation decides Alexa's methods
and mannerisms; as with most computable things, Alexa is calculation. So Alexa's body is you sifted through the filter of
corporate constraints, repeatedly, as often as you use Alexa to your and Amazon's benefit (although the benefit is mostly
Amazon's). Alexa is you, but you reflected as a consumer.
This sound sequencer reverses how Alexa's voice is produced: rather than concatenating units of speech to build a voice,
I took Alexa's speech and divided it into distinct sounds that the user can choose. The user is still the source of Alexa's
voice, but the dynamic shifts: you do not use Alexa to consume; you use Alexa to produce (even if the production is a
simple beat-box pattern). Rather than the user's words being altered through language processing, Alexa's voice detunes.
This simple web-app is meant to be a source of creation.
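The reversal described above (dividing a recording into selectable sounds, then letting the user arrange and detune them) can be pictured with a minimal TypeScript sketch. It is an illustration under stated assumptions, not the web-app's actual code: the names `sliceSamples`, `detuneToRate`, `resample`, `renderPattern`, and `Step` are hypothetical, the recording is assumed to be a mono `Float32Array`, slices are assumed to be equal in length, and detuning is approximated by crude nearest-neighbour resampling.

```typescript
// Hypothetical sketch: none of these names come from the original web-app.

/** One sequencer step: which slice to play (null = rest) and how far to detune it. */
interface Step {
  sliceIndex: number | null;
  detuneCents: number;
}

/** Divide one recording into `count` equal-length slices (the inverse of concatenation). */
function sliceSamples(recording: Float32Array, count: number): Float32Array[] {
  const sliceLength = Math.floor(recording.length / count);
  const slices: Float32Array[] = [];
  for (let i = 0; i < count; i++) {
    slices.push(recording.slice(i * sliceLength, (i + 1) * sliceLength));
  }
  return slices;
}

/** Convert a detune amount in cents to a playback-rate multiplier (1200 cents = one octave). */
function detuneToRate(cents: number): number {
  return Math.pow(2, cents / 1200);
}

/** Resample a slice at `rate` with nearest-neighbour lookup; a rate above 1 shortens and raises it. */
function resample(slice: Float32Array, rate: number): Float32Array {
  if (slice.length === 0) return new Float32Array(0);
  const outLength = Math.max(1, Math.floor(slice.length / rate));
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const sourceIndex = Math.min(slice.length - 1, Math.floor(i * rate));
    out[i] = slice[sourceIndex] ?? 0;
  }
  return out;
}

/** Lay the chosen slices onto a timeline, one per step, truncating each to the step length. */
function renderPattern(
  slices: Float32Array[],
  pattern: Step[],
  stepLength: number
): Float32Array {
  const out = new Float32Array(pattern.length * stepLength);
  pattern.forEach((step, stepIndex) => {
    if (step.sliceIndex === null) return; // a rest: leave silence
    const slice = slices[step.sliceIndex];
    if (!slice) return; // unknown slice index: also leave silence
    const voiced = resample(slice, detuneToRate(step.detuneCents));
    const start = stepIndex * stepLength;
    const n = Math.min(voiced.length, stepLength);
    for (let i = 0; i < n; i++) {
      out[start + i] = voiced[i] ?? 0;
    }
  });
  return out;
}
```

In a browser, the rendered samples could be copied into an `AudioBuffer` for playback, or each slice could be played through an `AudioBufferSourceNode`, which exposes its own `detune` parameter in cents; the sketch keeps everything as plain arrays so the slicing and detuning logic stays visible.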