I’ll be presenting on Digital Sound Pedagogy at Hodgepodge Coffee in Atlanta this morning at an Atlanta Connected Learning meetup. The slides I’ll use for my presentation can be viewed below or at this link.
Last spring, my colleague Lauren Neefe recorded an interview with me about my work on sound, machine learning, and laughter. After the interview, she edited the interview into the first episode of Flash Readings, a podcast that showcases the research of the Marion L. Brittain Fellows at Georgia Tech. The podcast is now up at TECHStyle, our online forum for digital pedagogy and research.
Each episode of Flash Readings focuses on a particular sound in relation to a Brittain Fellow’s research. The episode featuring my research, titled “Laughter Worth Reading,” focuses on two instances of laughter on recordings of William Carlos Williams’s “This Is Just To Say,” and in the episode I consider the difference such laughter makes in terms of how audiences perceive poems and how critics should interpret them. It also gestures toward the work I’ve been doing in the wake of the HiPSTAS Institute.
I’ve posted a session proposal for this weekend’s THATCamp over at the THATCampVA 2013 blog.
The text of the proposal:
Tools for exploring big sound archives
Brandon Walsh has already proposed a session about tools for curating sound, so what I’m proposing here might well fit into his session, but in case what I’m proposing is too different, I wanted to elaborate.
At THATCamp VA 2012, I proposed and then participated in a discussion about how digital tools could help us not just think about tidily marked plain-text files, but also the messier multimedia data of image files, sound files, movie files, etc. We ended up talking at length about commercial tools that search images with other images (for example, Google’s Search By Image) and that search sound with sound (for example, Shazam). A lot of our discussion revolved around the limitations of such tools: yes, we can use them to search images with other images, but, we asked, would a digital tool ever be able to tell that a certain satiric cartoon is meant to represent a certain artwork? For example, would a computer ever be able to tell that this cartoon represents this artwork?
Our conversation was largely speculative (and if anyone wanted to continue it, I’d be happy to have a similar session this time around).
Since then, however, I’ve become involved with a project that takes such thinking beyond speculation. As a participant in the HiPSTAS institute, I’ve been experimenting with ARLO, a tool originally designed to train supercomputers to recognize birdcalls. With it, we can, for example, try to teach the computer to recognize instances of laughter, and have it query all of PennSound, a large archive of poetry recordings, for similar sounds. We might be able, then, to track intentional and unintentional instances when audiences laugh at poetry readings.
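I don’t have access to ARLO’s internals, but the general workflow it supports — tag some examples of a sound, learn what they have in common, then query an archive for similar clips — can be sketched as a toy nearest-centroid classifier over feature vectors. Everything below (the feature vectors, the numbers, the function names) is my own stand-in illustration, not ARLO’s actual code or real audio:

```python
import numpy as np

# Toy sketch of the tag-train-query workflow (not ARLO's actual code).
# Each "clip" is a stand-in feature vector, not real audio.
rng = np.random.default_rng(1)
laughter = rng.normal(loc=1.0, size=(20, 8))   # tagged laughter examples
other = rng.normal(loc=-1.0, size=(20, 8))     # tagged non-laughter examples

# "Training" here is just averaging each class's tagged examples.
c_laughter = laughter.mean(axis=0)
c_other = other.mean(axis=0)

def looks_like_laughter(clip):
    """Classify a clip by which class centroid its features sit nearer to."""
    return np.linalg.norm(clip - c_laughter) < np.linalg.norm(clip - c_other)

# Query a toy "archive" of unlabeled clips for laughter-like sounds.
archive = rng.normal(loc=1.0, size=(10, 8))  # ten clips that should match
hits = [i for i, clip in enumerate(archive) if looks_like_laughter(clip)]
```

Real systems would of course extract spectral features from audio and use far more sophisticated classifiers, but the tag-then-query loop is the same shape.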
The project involves both archivists and scholars–the archivists are interested in adding value to their collections (for example, by identifying instances of song in the StoryCorps archive), and the scholars are interested in how this new tool might help us better visualize and explore poetic sound and historical sound recordings.
My sound-related proposal, then, is this: to have a conversation about potential use cases for this and similar tools. Now that we know we can identify certain kinds of sounds in large sound collections, how should we use such a tool? Since Brandon’s already interested in developing sound collections using Audacity, I thought we might also add this big-data/machine-learning tool into the mix of the conversation.
The presentations at today’s HiPSTAS sessions laid the groundwork for understanding the scholarly context of current work in digital sound studies. The LBJ Library’s Sarah Cunningham emphasized the urgency of analog-to-digital conversion because of the impending deterioration of many analog formats, and she also stressed the difficulty of prioritizing such digitization efforts. Loriene Roy and, later in the day, Tim Powell emphasized tensions between scholars’ desires for freely accessible, constantly archived information and the need to respect sacred traditional knowledge and traditional cultural expressions in indigenous communities. Quinn Stewart showed the way that GLIFOS software has increased access to the collections he works with, and John Wheat, of the Briscoe Center for American History, emphasized the undiscovered treasures that currently remain hidden in many audio archives. Al Filreis, who reminded everyone that the first goals of PennSound are to provide access to sound files and to reach out to potential communities of poetry readers, wondered whether the curated character of PennSound was fully compatible with the ARLO tools.
Two of today’s presentations, though, were particularly relevant to the way I think about the potential of digital tools to promote our understanding of poetic sound.
First, J. Stephen Downie, of the School of Library and Information Science at the University of Illinois, blew me away with his demonstrations of the digital tools he and other musicologists have been using to explore music. I suppose I knew this sort of search and comparison was going on because I use Shazam regularly, but seeing the tools and interfaces Downie has been developing and using made me think that literary scholars are far behind in using such tools to consider sound. Everyone, myself included, was particularly impressed by a demonstration of a tool that compares each second of a song to every other second of the same song to digitally determine song structures (A parts, B parts, refrains, bridges, etc.). When Downie showed that he could click on one of his visualization’s diagonal lines (which imply a match between two parts of a song and thereby define a section) and have the two parts being compared play at once, many of us gasped. While the spoken word and recorded poetry most of us at HiPSTAS are working with aren’t as meticulously timed as most music, the demo of Downie’s music tools made clear that literary and historical scholars could be making significantly better use of the sound files now at our fingertips.
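I don’t know the details of Downie’s implementation, but the core idea — comparing each second of a song to every other second — is usually realized as a self-similarity matrix: slice the signal into frames, compute the similarity between every pair of frames, and look for bright off-diagonal streaks marking repeated sections. A minimal sketch over a synthetic signal (the frame size, signal, and function name are my own illustration):

```python
import numpy as np

def self_similarity(signal, frame_len):
    """Compare every frame of a signal to every other frame (cosine similarity)."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.where(norms == 0, 1, norms)
    return unit @ unit.T  # (n, n) matrix; off-diagonal streaks = repeated sections

# Toy "song": section A, then B, then A again (an ABA structure)
rng = np.random.default_rng(0)
a, b = rng.standard_normal(4000), rng.standard_normal(4000)
song = np.concatenate([a, b, a])
S = self_similarity(song, 1000)
# Frames 0-3 and 8-11 are both section A, so S[0, 8] sits near 1,
# while unrelated frames (e.g. S[0, 4]) hover near 0.
```

Real music tools compare spectral features rather than raw samples, but the matrix — and the diagonal lines Downie clicked on — works the same way.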
The other presentation that will stick with me most was Steve Evans’s talk on “The Phonotextual Braid” of timbre, text, and technology. While Evans’s emphasis on a poet’s performance as the most important scene of poetic sound is one I have some disagreement with, he showed that questions of sound need not be divorced from questions of context, politics, and society. In the talk, Evans worried that the current vogue for the digital humanities might displace scholarly attention to equally important theoretical concerns, but I think it’s worth pointing out that engaging with ARLO has already advanced my theoretical thinking about poetic sound. I’ve never been one to claim that building is the same as theory in DH, but I do think the tension between questions of theory and questions of the digital humanities is not so irresolvable as some seem to think.
The first meeting of the HiPSTAS institute begins in Austin tomorrow, and I want to use the occasion to clarify some of my recent ideas about poetic sound and to describe how I believe the digital tools associated with HiPSTAS can help us explore them.
A few years ago, as experiments in data mining and mass search began to seem like the next big thing in the digital humanities, I found myself dissatisfied with most of the results. Most big-data approaches struck me as glorified Google n-gram searches, and I thought big data might only confirm what we thought we already knew about literary history. For example, we might do a big-data search on faith in the 19th and 20th centuries to map a decline we already knew was there. Most results just seemed to confirm our presuppositions. At THATCamp Virginia 2012, I proposed and led a session about the place that messy data like image files and sound files have in such a digital-humanities environment, and while the discussion was interesting, we didn’t make much headway into specifics or practical applications.
I worried that what most interests me about literature, and especially about poetry—the seemingly unquantifiable, non-semantic aspects of poetic language whose appeal to readers is more elusive than the narrative representation of content—might be wholly inaccessible to the digital humanities. As I described two major sides of my academic interests in a job interview a while back (that is, the side of me that’s interested in modernism, nonsense, and the non-semantic properties of language and the side of me that’s interested in digital humanities and digital culture), an interviewer wondered if nonsense wasn’t finally incompatible with the digital humanities. What place could the messy, intentionally non-meaning words of a Dada sound poem, for example, have in a digital-humanities environment obsessed with and dependent on tidily structured textual data?
At the Society for Textual Scholarship conference 2012 (which was also in Austin at almost exactly this time last year), I presented a paper asking that very question. My paper asked how the digital humanities could begin to consider Hugo Ball’s “Karawane,” whose performance context I explored in my dissertation (and which I blogged about here), as an instance of poetic sound. What would the object of study for a DH exploration of “Karawane” even be in the first place?
The very term sound poem points to an ontology based in sound, but Ball himself never recorded a performance of the poem. Instead, we access the poem through surrogates, either textual surrogates like a famous printing of the poem (whose famous typography was designed by Richard Huelsenbeck, not by Ball) or recorded performance surrogates. The several examples of such surrogates captured by PennSound and Ubu, including a remarkable one by Christian Bök, a notably bizarre one by Marie Osmond, and a somber one by Trio Exvoco, are notably different from each other. Bök’s bravura song-chant contrasts sharply with Osmond’s anglicized earnestness, and Trio Exvoco adds numerous echo effects and uses multiple voices. The situation became even more complicated when I began to consider the numerous remediations of the poem found on YouTube, which audiovisually reimagine the scene at the Cabaret Voltaire and often invent details not backed by the historical record (rotten fruit thrown at a humiliated Ball in one such video is an example). Even as I noted these videos’ departures from fact, I began to wonder why archives like Pennsound would treat Bök’s version of the poem, or Marie Osmond’s, or Trio Exvoco’s, as any more authoritative (and worthy of archiving) than some student’s interpretation of the poem in a student project on YouTube. To be sure, I adore Bök’s reading of the poem—but it almost certainly isn’t much closer to what some original performance of the poem was than is Osmond’s, or a random YouTube videomaker’s.
Charles Bernstein has argued that we should treat authors’ performances of their poems with a special regard: “a poet’s reading of her or his own work has an entirely different authority,” he argues, from that of an ordinary reader. This philosophy guides the collection of sound at Pennsound. There are exceptions, of course—Jerome McGann’s readings of Edgar Allan Poe represent one, Bök’s readings of Ball another—but for the most part Pennsound is an archive of poets reading their own work.
Beyond his implicit rebuke to the intentional fallacy, Bernstein breaks with another major tradition of poetry scholarship. In considering poetic sound, the individual performance has generally been regarded as subordinate or even irrelevant to the abstract sound of the “poem itself,” by which critics tend to mean some transcendent version of a poem outside any material instantiation. Prosodists and formalists tend to emphasize this idealized version of poetic sound, marked neatly by stressed and unstressed syllables, feet, caesurae, and rhyme schemes (some evidence of which is on display at Herbert Tucker’s Scholars’ Lab project “For Better or For Verse”). In this version of poetic sound, sound emerges from the textuality of a poem but also somehow supersedes any given instance of the poem. It’s a model arguably shared by Garrett Stewart, who argues that literature is a “phonotext,” always laden with the potential of sound and, in reading, always actualized, whether silently voiced mentally or voiced aloud, as sound.
If Bernstein emphasizes poetic sound in performance and Stewart poetic sound as sonic potential in a text, another group of critics tends to react against sound altogether. At last year’s STS panel, Laura Mandell echoed some of her earlier comments on poetic sound, claiming that sound is finally a delusion among literary scholars. As she has argued elsewhere, “paper texts themselves cannot in any sense be said to contain sound: paper does not actually utter speech sounds.” Like Mandell, Johanna Drucker deemphasizes sound as the key sensory component of poetry, noting that “Sound is not on the page, even if a graphic transmission allows for its properties to be noted for reproduction in mental or verbal rendering.”
So, put another way, we’ve got three versions of the place sound should have in the study of poetry:
- Drucker and Mandell see a poetic critical discourse obsessed with sound but to varying degrees argue that sound has not been nearly so central to poetry as is usually assumed;
- New-Critical/post-New-Critical formalists and Stewart see a poetic text crackling with the potential for sound, yet insist upon the primacy of the elusive “text itself” over any particular performed instance of sound;
- and Bernstein, noting that “differences among the alphabetic, gramophonic, and live are not so much ones of textual variance as of ontological condition,” at once recognizes sound in the text and argues that scholars should be paying more attention to poetic sound in recorded and live contexts. The specific recordings he finds most valuable, however, are those of the author herself.
I’m sympathetic with aspects of all of these models of sound. Formalists and Stewart offer admirable models for detecting and making use of patterns of sound that reside in poetic texts, both when those patterns are intentional and when they are coincidental. The frequency with which I find formal analyses of poetic sound uninspiring or unenlightening, however, makes me sympathetic to Mandell’s sense that an emphasis on textual sound has hindered our understanding of how poetry actually affects readers. I admire PennSound, and I’m grateful to Bernstein for pushing scholars to understand recorded poetic sound better. Yet Bernstein’s emphasis on the author’s voice leaves me wondering what we can do with poems whose authors are long dead and unrecorded, and whether the voices that readers hear in their heads (Mandell would deny their existence) or pronounce aloud don’t also have clear value that can help us understand how poetic sound works.
My exploration of “Karawane” left me thinking that the sound of this sound-oriented poem cannot be understood without recordings. The same poem’s sound, however, is so variable across recordings that it’s hard to fix any of those recordings as the “best” or the most authoritative. To adequately listen to “Karawane,” I argued last year, we would need to listen to all the versions of the poem that circulate on the web at once.
While I’ve so far developed only a basic familiarity with the sound-analysis tools of HiPSTAS, I’ve begun to think that they might let us do so. If I’m right, these tools give us access to a new model of poetic sound that would have been hard to put to any practical use before digital tools: the sound of a poem as the sum total of performances of that poem, listened to all at once.
Now, as I began to prepare for the HiPSTAS meeting this week, I thought this might only be a theoretical assertion, but I’ve begun to think that the HiPSTAS tools might make such an aggregate version of poetic sound a practical object of study. I’d considered the idea (and am still considering the idea) of starting a large archive of amateur readings of poetry. Then, I realized that some of what I was looking for can be found over at Librivox, the site devoted to producing audiobooks recorded by volunteers for public-domain distribution on the web. Librivox has (had?) a weekly poetry project in which volunteers record a series of versions of a poem. And while “Karawane” would be too obscure for such a project, another key poem featuring nonsense language has been recorded by the project. So Librivox features fully 34 volunteer, public-domain readings of “Jabberwocky,” that most famous nonsense poem.
I’ve been trying to think of ways to listen to all these versions of “Jabberwocky,” or at least some portion of them, all at once. In doing so, I’ve taken some cues from Lev Manovich’s efforts to read massive corpora of images. Manovich, for example, arrays thousands of covers of Time magazine in an impressive visualization that he draws on to make claims about the history of Time magazine, magazine design, etc. Simply laying 34 recordings of “Jabberwocky” end to end might not be especially illuminating—we’d simply hear the poem over and over again in sequence. But isolating specific parts of the poem and laying those end to end might be more interesting. Here, for example, is an aggregate version of the phrase “mome raths,” as read by 34 Librivox volunteers:
And here is the same set of sounds, but pronounced all at once in a symphony of verbalized mome rath:
These files, which I made using the open-source audio-editing software Audacity, probably don’t offer much new insight into the nature of the nonsense words mome rath. But put together, I do think they offer an interestingly realized version of what mome rath sounds like. Together, these 34 different voices saying mome rath don’t become the “authoritative” version of what mome rath sounds like—there have been, and will be, many other voicings of the term. I’m enticed, however, by the possibility of putting together so many different versions of poetic phrases, lines, etc. and figuring out how those aggregate versions of poetic sound might help us think of prosody and performance differently.
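For anyone who would rather script this than click through Audacity, the two operations at work here — laying the excerpted clips end to end, and mixing them all at once — reduce to concatenation and padded summation over sample arrays. A sketch under stated assumptions: the sine bursts below are placeholders for the 34 excerpted “mome raths” clips, not the Librivox audio, and the function names are my own:

```python
import numpy as np

def sequence(clips):
    """Lay clips end to end, as in the first aggregate file."""
    return np.concatenate(clips)

def overlay(clips):
    """Mix all clips simultaneously, padding shorter ones with silence."""
    longest = max(len(c) for c in clips)
    mix = np.zeros(longest)
    for c in clips:
        mix[: len(c)] += c
    return mix / len(clips)  # average so the mix stays within [-1, 1]

# Stand-ins for 34 excerpted readings: sine bursts of varying length and pitch
sr = 8000
clips = [np.sin(2 * np.pi * (200 + 5 * i) * np.arange(int(sr * (0.4 + 0.01 * i))) / sr)
         for i in range(34)]
chain = sequence(clips)   # one reading after another
chorus = overlay(clips)   # all 34 voices at once
```

With real clips loaded from WAV files, `chain` and `chorus` could be written back out with a library like `soundfile` or the standard-library `wave` module; averaging in `overlay` is the simplest way to keep 34 summed voices from clipping.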
If I’m understanding their capabilities correctly, HiPSTAS’s ARLO tool might help us put to analytical use the kind of model of poetic sound I’ve proposed here. Over the last week, I’ve uploaded Librivox’s 34 amateur, public-domain readings of “Jabberwocky” into ARLO and begun tagging select terms—“brillig,” “borogoves,” and “mome raths” so far—in each of these versions. Here’s what tagging looks like in the ARLO interface, in a few different readings of “Jabberwocky”:
Once I’ve tagged a bunch of instances of “mome raths,” my plan is to ask ARLO to search for other sounds, probably in the Pennsound archive, that look like these various versions of “mome raths.”
“Jabberwocky” has always been recognizably English, even though its first and last stanzas are not written using English words that existed before it. My hope is that, if I ask ARLO to find words that look like “mome rath,” we can better understand the relationship between putative nonsense and ordinary poetic language.
This “Jabberwocky” experiment will be the first I undertake in the months ahead as I begin to use ARLO to test the viability of a model of poetic sound based in aggregate amateur poetic performance. My hope is that ARLO will reveal such an aggregate model as a useful way to think through the sound of poetry as it occurs in the context of ordinary readers, not just in the rarefied context of an author’s official poetry reading.