Meet etcML

etcML strikes me as an incredibly useful tool, in the right hands. I will explain how etcML works by walking through one highly successful use of it. All of the information that follows can be found on etcML’s webpage, which is about as user-friendly as any webpage I have ever visited.

Rob Voigt wants to predict whether proposals on Kickstarter will reach the level of funding desired by the architects of those proposals.  He inputs a sufficiently large set of past proposals (the more, the better), labeling each as either “success” or “failure” according to whether or not each reached its funding goal.  Here, Rob is training etcML, which ultimately functions as a categorizer.  Then, in order to test the effectiveness of his success/failure categorizer, he inputs the text of other past proposals which he has not labeled, and he asks etcML to categorize them.  The tool is 20% better than random odds at predicting whether a proposal got funded or not, based solely on the language of the proposal.  The tool also provides a readout of the words/phrases most highly correlated with successful proposals, as well as those most highly correlated with failed proposals.  Rob concludes from this that “concrete plans win”: the better defined one’s goals and available resources (in the language of the proposal), the more likely a proposal is to succeed.
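To make the train-then-test workflow above concrete, here is a minimal sketch in Python. It is not etcML's actual method, which the site does not spell out here. I am assuming a simple word-count (naive Bayes) categorizer as a stand-in, and the four training proposals and two held-out proposals are invented toy data, not real Kickstarter text. The function names (`train`, `predict`, `top_words`) are my own.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase each word and strip surrounding punctuation.
    words = (w.strip('.,!?;:"()').lower() for w in text.split())
    return [w for w in words if w]

def train(labeled_texts):
    # labeled_texts: list of (text, label) pairs supplied by the user.
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def log_prob(model, word, label):
    # Smoothed log-probability of a word given a label (add-one smoothing).
    counts = model["word_counts"][label]
    total = sum(counts.values()) + len(model["vocab"])
    return math.log((counts[word] + 1) / total)

def predict(model, text):
    # Pick the label whose training texts best explain the words in `text`.
    n_docs = sum(model["doc_counts"].values())
    scores = {}
    for label, n in model["doc_counts"].items():
        score = math.log(n / n_docs)
        for w in tokenize(text):
            if w in model["vocab"]:  # ignore words never seen in training
                score += log_prob(model, w, label)
        scores[label] = score
    return max(scores, key=scores.get)

def top_words(model, label, other, k=4):
    # Words most associated with `label` relative to `other` (log-odds),
    # with ties broken alphabetically so the result is repeatable.
    def key(w):
        return (-(log_prob(model, w, label) - log_prob(model, w, other)), w)
    return sorted(model["vocab"], key=key)[:k]

# Invented toy data, standing in for labeled past proposals.
training = [
    ("We have a finished prototype and a detailed budget.", "success"),
    ("Our schedule and budget are attached with the prototype photos.", "success"),
    ("I hope someday to maybe make something amazing.", "failure"),
    ("This idea will be amazing, I hope you believe.", "failure"),
]
model = train(training)

# Held-out proposals: the labels are known to us but hidden from the model.
held_out = [
    ("The prototype is finished and the budget is detailed.", "success"),
    ("I hope this will maybe be amazing someday.", "failure"),
]
predictions = [predict(model, text) for text, _ in held_out]
accuracy = sum(p == label for p, (_, label) in zip(predictions, held_out)) / len(held_out)
```

With only four training texts, the word rankings from `top_words` are dominated by ties and by filler words such as “a” and “and”; a real run would need far more proposals, which is why Rob’s “the more, the better” matters.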

He then retrains his success/failure categorizer to deal with just arts-related proposals, or just music-related proposals, and after analyzing the words/phrases most highly correlated with success or failure in each category of proposals, he comes up with suggestions for authors of proposals in those categories.

etcML’s best-known usage is to predict the overall tone of Tweets. Programmers input thousands upon thousands of words/phrases and label each with a connotation ranging from “very negative” to “very positive.” etcML then uses statistical analysis informed by this set of givens (given connotations for given words/phrases) in order to predict the overall connotation/tone/mood of a body of text (a Tweet, in these cases).
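As a rough illustration of how labeled words/phrases can add up to a prediction about a whole Tweet, here is a deliberately simple Python sketch. It just averages the scores of the known words in a text; etcML’s real statistical analysis is presumably more sophisticated. The tiny `LEXICON`, its numeric scores (-2 for “very negative” up to +2 for “very positive”), and the 0.5 cutoffs are all assumptions of mine for the sake of the example.

```python
# Hypothetical mini-lexicon: word -> connotation score,
# from -2 ("very negative") to +2 ("very positive").
LEXICON = {
    "love": 2, "great": 1, "good": 1,
    "okay": 0,
    "bad": -1, "slow": -1, "hate": -2,
}

def tone(text, cutoff=0.5):
    # Average the scores of the words we have labels for.
    words = [w.strip('.,!?;:"()').lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return "neutral"  # no labeled words, so nothing to go on
    mean = sum(scores) / len(scores)
    if mean > cutoff:
        return "positive"
    if mean < -cutoff:
        return "negative"
    return "neutral"
```

For instance, `tone("I love this great phone!")` averages the scores for “love” and “great” and comes out positive, while a text with no labeled words at all falls back to neutral.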

Anyone can develop a new categorizer by inputting and labeling bodies of text according to the way one wants the tool to label bodies of text. Anyone can also use any existing categorizer to analyze any existing set of texts, or to analyze a new set of texts one uploads.

How can we use etcML in our class projects? One could upload a large number of pieces of literary criticism and/or scholarly articles and/or layperson reviews/reactions pertaining to one’s text. Then, one could run the positive/negative/neutral categorizer to determine whether the reactions, of any one sort or of all sorts together, are overall positive, negative, or neutral. Additionally, one might use this tool in conjunction with biographical information about an author to speculate as to how the overall connotations/“sentiments” of an author’s works correlate with the connotations/“sentiments” of events in that author’s life (this is similar to but different from a suggested usage given on the “Learn more” page).

As stated on that “Learn more” page, the key to maximizing the utility of this tool is ‘framing interesting questions as problems of categorization,’ to paraphrase.  etcML allowed Rob Voigt to answer the highly interesting and practical question of ‘what sorts of language make for successful, or unsuccessful, Kickstarter proposals?’  There is no doubt it can answer other highly interesting questions of practical importance, and conveniently enough, etcML can be accessed at


