Tag Archives: Ngrams

The Quest for the Most “Popular” Fruit

The top Google search result for “world’s most popular fruit” leaves one rather baffled: it directs you to the website of the Detox Lounge, a San Diego-based wellness center specializing in detoxification juices. There, an anonymous author boldly declares that the mango is the most desired fruit around the globe. Could this be true?

Subsequent research on Google revealed that my suspicions were correct: assuming that the books cataloged in Google’s Ngram reflect our general patterns of consumption, this titian-shaded, succulent fruit has never been, and seems unlikely ever to be, the most popular fruit in the world. To illustrate my point, I thought of four other fruits that I deemed “popular” and entered them, along with mango, into this tool:

  1. Apple
  2. Banana
  3. Tomato
  4. Mango
  5. Orange

And the result is as follows:





Now, it is fairly obvious from the above what is happening: orange has apparently been, as Charlie Sheen would say, #winning since the turn of the 19th century. According to Wikipedia, the orange “probably” originated in Southeast Asia and has been “cultivated in China as far back as 2500 BCE.” Could the orange’s rich history be the cause of its popularity? That could certainly be true, but we must take into account that “orange” is a homonym: it also denotes a color (and could even refer to the House of Orange, a royal line of the Netherlands).

Apple, the runner-up, was apparently first cultivated in eastern Turkey, and Alexander the Great “is credited with finding dwarfed apples in Kazakhstan in 328 BCE.” Third place goes to the fruit often mistaken for a vegetable, the tomato. Although the peoples of Mesoamerica, such as the Aztecs, had domesticated this red fruit by 500 BCE, it was not until the 16th century, after their colonization of the Americas, that the Spaniards made it popular.

Fourth place goes to the fruit with the most interesting history: the banana. Farmers in Southeast Asia and Papua New Guinea apparently domesticated the banana sometime between 8000 and 5000 BCE. And as if this weren’t enough, other species of this yellow, almost crescent-shaped fruit were also found on the African continent, dating back to 1000 BCE. The banana remained in Africa, the neighboring Middle East, and Southeast Asia until it was “introduced to the Americas by Portuguese sailors” in the 16th century.

Finally, we turn our attention to the mango, which has apparently been cultivated “in South Asia for thousands of years.” Sometime between the 4th and 5th centuries it was transmitted to East Asia, and it reached East Africa by the 10th century. It was subsequently brought to the Caribbean and South America in the 14th century.

The histories of the banana and the mango certainly shatter my initial assumption of a positive correlation between a fruit’s apparent popularity and the length of time it has been cultivated.

A quick search of the OED reveals the first recorded appearances of these fruits’ names as follows: apple (1225), orange (1400), banana (1563), mango (1582), tomato (1604). But in evaluating the popularity of these words, we must take into account that the word “orange” carries multiple associations, so it is reasonable to assume that part of its popularity is due to this very fact. Similarly, the word “apple,” given its traditional association with the Garden of Eden and the cultural implications stemming from that tradition (or even Apple/Mac products), is also affected by the complex meanings attached to it. There is, nevertheless, a huge gap between these two words and the rest of the fruits, but does it necessarily represent their popularity as fruits? That remains uncertain, though certainly plausible. In reading this graph, however, one thing is certain: Detox Lounge is incorrect in claiming that the mango is the most popular fruit in the world.

*all historical facts on fruits are quoted from their respective Wikipedia pages



Woman v Lady

I am currently taking many classes dealing with the role of women in literature and religion, so I found it appropriate to base my research on the word “woman.” I decided first to compare the two words “woman” and “lady,” since both are associated with certain societal expectations of the female role.

[Ngram chart: “woman” vs. “lady,” 1700–2000]

“Woman” seems always to have been the ordinary word for a female person. However, “lady” was more prevalent in the 18th century, when a lady was expected to become a wife and mother, always catering to her husband’s needs. The split between the two words happens around the 1840s. Before this split, books like The Coquette and Charlotte Temple were written to emphasize a woman’s fidelity and chastity. These books were didactic, meant both to reflect the cultural values of the time and to teach women how to carry themselves. Unfortunately, this is not clearly reflected in the graph, probably because women were not often the main characters of literature at the time.

The graph from 1700 to 2000 also illustrates the difference between the two words today. The trend for “lady” is a fairly steady decline, whereas “woman,” which was used just as often as “lady” in earlier centuries, becomes more frequent approaching 2000. Nowadays, “woman” has taken on the additional meaning of someone who is strong and independent. Because the word has clearly taken on a new sense, I decided to search “woman” together with adjectives reflecting its possible associations: “strong,” “independent,” “powerful,” “proper,” and “motherly.” I hoped these pairings would track their respective cultural associations over time.

[Ngram chart: “woman” paired with “strong,” “independent,” “powerful,” “proper,” and “motherly”]

For the most part, this is the case. “Strong,” “independent,” and “powerful” become the most used of the five, in that order. These adjectives reveal hidden connotations in the depiction of women within literature. The graph suggests that women in literature become more associated with power than with the domestic realm, and that their independence is emphasized over their influence on their husbands and children.

I also found one particular spike interesting. In the 1950s, “motherly” jumps significantly. I suspect this reflects the husbands returning from WWII: while the men were away at war, women had taken over the workforce, and when the men returned, women were expected to return to the domestic realm. The spike seems to reflect the cultural values of the time.


The Transition from Machine to Technology

My discovery of an interesting shift in the use of the words “machine” and “technology” started with a search that yielded much more predictable results. Originally, I used Ngram to analyze the occurrence of the words “computer” and “machine”: use of “machine” declined while use of “computer” increased. The notable rise of “computer,” and its surpassing of “machine” around 1970, is understandable and expected given the early advances in computing at that time; the same cannot be said for “technology” versus “machine.”

The computer is a specific technology, or machine, far less abstract in its reference than “technology.” Indeed, technology refers more broadly to tools in general that provide a better means to an end. Though modern use of the word associates it more readily with digital tools, its definition is widely applicable: “machinery and equipment developed from the application of scientific knowledge.” Hence, the relatively recent surge in the use of this word seems strange, as it does not simply refer to modern machinery, but to machinery as a whole.

The fact is, however, that “technology” once meant something very different. According to the OED, “technology” in earlier centuries referred to “the systematic treatment of grammar,” and “a discourse or treatise on an art or arts.” Only in the late 18th century did it begin to refer to “the branch of knowledge dealing with the mechanical arts and applied sciences” and “the application of such knowledge for practical purposes, esp. in industry, manufacturing, etc.”

But this still leaves unanswered the question of why the use of “technology” correlates so strongly with “computer.” Researching the word’s history in the OED, I still came up short: for the most part, the explicit definition of “technology” did not change after the 19th century. I can only speculate that the word served as a convenient label for recent digital advancements; perhaps, linguistically, “technology” seemed best suited to describe the computer. We are also members of an increasingly technology-centered society, so it follows that use of the word “technology” increased.



Change of warfare

One aspect of literature I find interesting is the presence of war, a fluctuating but constant part of human history. I wanted to compare the use of the word “gun” with that of the word “sword.” After entering these words into the Google Ngram Viewer, I found some interesting results.


As you can see from the graph, “sword” was more widely used during the 18th century. This slowly changed as warfare evolved, and “gun” began to appear more often in writing.

Around the beginning of the 20th century, we can see “sword” begin to lose its place in literature.

This is right before World War I, so the shift makes sense. What is more interesting is the period right before World War II: “sword” spikes and becomes important in literature again. After that, “gun” takes a huge leap forward and never looks back.

When I clicked on this time period, I found that some of the most common texts from these years were weapons manuals. This again shows the influence of warfare on writing.

Another interesting point is the spike in “sword” in the 2000s. I think this can be attributed to the success of fantasy fiction like The Lord of the Rings and Game of Thrones.

It is interesting to see how warfare has affected writing, and how writing follows trends based on changing culture.


Good time and fun

I looked at “fun” and “good time” on the Ngram Viewer and got interesting results. Overall, “fun” is noticeably more popular than “good time.” The word “fun” was at its peak around 1800, but from 1820 to 1840 it plummeted in popularity. “Good time,” by comparison, remained consistently less common than “fun.”
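The chart embedded in this post is drawn with `smoothing=3` (the parameter appears in its URL), which matters when reading a sharp drop like the 1820–1840 one. As I understand Google’s description of the viewer, a smoothing of 3 reports each year as the average of that year and the three years on either side. Here is a minimal sketch of such a centered moving average; the truncated windows at the ends of the series are my assumption, and the yearly values are made-up placeholders, not Ngram data:

```python
def smooth(values, window=3):
    """Centered moving average: each point becomes the mean of itself and
    up to `window` neighbors on each side. Near the edges the window is
    simply truncated (an assumption; the viewer may handle edges differently)."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window)               # clip window at the start
        hi = min(len(values), i + window + 1) # clip window at the end
        span = values[lo:hi]
        smoothed.append(sum(span) / len(span))
    return smoothed


# Hypothetical yearly frequencies with a single sharp one-year dip.
raw = [10.0, 10.0, 10.0, 2.0, 10.0, 10.0, 10.0]
print(smooth(raw, window=0))  # window=0 leaves the series unchanged
print(smooth(raw, window=3))  # the dip is flattened and spread across neighbors
```

With `window=3`, the one-year dip to 2.0 comes out at about 8.86, and every neighboring year is pulled down slightly, so a smoothed chart can make a brief collapse look like a gentler, longer decline.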

The term “good time” seems to carry a negative connotation: when I googled it, the definition given was “recklessly pursuing pleasure.” The top result for “fun” was “enjoyment, amusement, or lighthearted pleasure.” I knew that “good time” did not have a purely innocent meaning, but I was still surprised to see it described as “reckless.”

<iframe name="ngram_chart" src="https://books.google.com/ngrams/interactive_chart?content=good+time%2C+fun&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cgood%20time%3B%2Cc0%3B.t1%3B%2Cfun%3B%2Cc0" width=900 height=500 marginwidth=0 marginheight=0 hspace=0 vspace=0 frameborder=0 scrolling=no></iframe>
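The embedded chart’s `src` URL carries the whole query as parameters: `content`, `year_start`, `year_end`, `corpus`, and `smoothing`. A small sketch that rebuilds such a link, so a search can be repeated or tweaked. The parameter names are copied from the URL above; the meaning of `corpus=15` (a particular English corpus version) is my assumption, the `share`/`direct_url` parameters are left out, and Google may change this URL scheme at any time:

```python
from urllib.parse import urlencode

# Endpoint copied from the embedded chart's src attribute.
BASE = "https://books.google.com/ngrams/interactive_chart"


def ngram_chart_url(phrases, year_start=1800, year_end=2000,
                    corpus=15, smoothing=3):
    """Build an Ngram Viewer chart URL from a list of phrases.
    Defaults mirror the chart in this post; corpus=15 is assumed to be
    the English corpus that chart used."""
    params = {
        "content": ",".join(phrases),  # phrases are comma-separated
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # urlencode turns spaces into "+" and commas into "%2C"
    return BASE + "?" + urlencode(params)


print(ngram_chart_url(["good time", "fun"]))
# https://books.google.com/ngrams/interactive_chart?content=good+time%2Cfun&year_start=1800&year_end=2000&corpus=15&smoothing=3
```

Setting `smoothing=0` in the same call would show the raw yearly values, which is a quick way to check whether a dip is real or partly an artifact of averaging.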

Lay vs Lie vs Lay/Lie

[Screenshot: Ngram search for “lay_INF”]

[Screenshot: Ngram search for “lie_INF”]

The topic of my post, the distinction (if any) between the verbs “to lay” and “to lie,” began as an exploratory probe into the more advanced features of Ngrams.  My initial search is displayed in the first image of this post; I searched “lay_INF” in order to see how the various forms of the verb stack up against one another.  As for why I decided to search that particular verb: I have no idea; it was the first word that came to mind.

Interestingly, as the second image in this post verifies, the breakdown of forms for the word “lay” also includes the breakdown of forms for the word “lie,” with the same frequencies. However, the breakdown of forms for “lie” does not include the forms of “lay.”

When one searches “lay_INF,” Ngrams seems to view the verbs “to lay” and “to lie” as indistinct from each other – that is, Ngrams seems to view them as the same word. But when one searches “lie_INF,” Ngrams seems to distinguish “to lay” from “to lie” – that is, Ngrams seems to view them as different words.
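One way to reproduce this asymmetry, offered as a guess rather than a description of how Ngrams actually works: suppose `_INF` expands a query by finding every verb whose set of forms contains the query word, then returning all forms of those verbs. Because “lay” is both the base form of “to lay” and the past tense of “to lie,” a search on “lay” would pull in both verbs, while “lie” matches only its own. The paradigm table and the `expand_inf` function below are hypothetical and hand-written for illustration (the table also ignores “lie” in the sense of telling an untruth):

```python
# Hand-written verb paradigms (illustrative only, not Ngrams' real data).
PARADIGMS = {
    "lay": {"lay", "lays", "laid", "laying"},
    "lie": {"lie", "lies", "lay", "lain", "lying"},  # "lay" = past tense of "lie"
}


def expand_inf(word):
    """Hypothetical _INF expansion: collect every form of every verb
    whose paradigm contains `word`."""
    forms = set()
    for paradigm in PARADIGMS.values():
        if word in paradigm:
            forms |= paradigm
    return forms


print(sorted(expand_inf("lay")))
# ['laid', 'lain', 'lay', 'laying', 'lays', 'lie', 'lies', 'lying']
print(sorted(expand_inf("lie")))
# ['lain', 'lay', 'lie', 'lies', 'lying']
```

Under this assumption, “lay” matches two paradigms and “lie” matches one, which would produce exactly the one-way overlap seen in the screenshots.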

The Oxford English Dictionary distinguishes the two verbs from each other.  “To lay” is a transitive verb (meaning it takes a direct object), whereas “to lie” is intransitive.  Thus, it is very clear that “to lay” and “to lie” are different words, and depending on whether or not a direct object figures in the sentence, only one of the two verbs applies properly.

What is the significance of this? It appears that I have found an error in Ngrams; at the very least, I have found an inconsistency. Despite this, Ngrams strikes me as a potentially useful tool. While playing around with other searches, I found that the ability to view each usage of a word in its particular context lets a curious mind determine precisely how the usage of that word, and the sort of texts it appears in, evolve over time.

Hipsters and Hepcats

Let’s talk about the etymology of the word “hipster.” According to my jazz-aficionado boyfriend, “hipster” (he actually said “hepster”) is a term that came out of bebop jazz culture in the early 1940s. “Hipster,” he claimed, was interchangeable with the term “hepcat” and has been grossly misused in the 21st century to refer to a subculture of Brooklynites who insist on wearing thrift-store clothing and large, heavy-framed glasses and on drinking copious amounts of PBR. Curious to see if he was right, and eager to complete my ENMC assignment, I plugged the words into our handy-dandy Ngram Viewer. Lo and behold, he was not making this up: “hipster” and “hepcat” first appeared, according to both Google and my boyfriend, in 1939. For obvious reasons, I extended my search parameters up to 2008, and after realizing that “hepcat” and “hipster” do not occur before the 1930s, I set the parameters to search books from 1935 to 2008.

But what about those PBR-chugging, chain-smoking Brooklynites? Regardless of how the word came into being, I now associate “hipster” with that piece of Brooklyn subculture. Looking at the Google Books results for the word reveals something rather intriguing: through the 1970s, “hipster” remains associated with jazz culture. The hipster is “painfully cultured and laid back.” The hipster is a “low-lying rebel.” The hipster is always referred to in the context of a musical subculture. But in 2002 there is a sudden shift in the referent. The signifier remains the same; the actual word “hipster” remains in use. But the culture to which the signifier refers becomes an urban class whose identity as hipster is defined by a way of speaking (the top three Google Books results were all urban dictionaries) rather than by a musical subculture.

“Hepcat,” the sister term to “hipster,” never really seemed to catch on. The word never moved beyond its nascent meaning; even in more contemporary books, “hepcat” remains a relic of 1940s bebop. The Ngram Viewer, together with its linked Google Books results, shows the changing use of the term “hipster,” but it cannot explain why one of two initially synonymous words retains its original function while the other shifts in use and meaning. This is an instance where Ngram serves well as a preliminary research tool: it reveals a pattern in the words’ use, which is often the first step in research. By its very nature, however, Ngram is of limited use for in-depth research. It displays the pattern and can help refine and visualize it, but to understand the pattern, the researcher must move beyond the initial data Ngram provides.

Perceptions of Perfection

As we move into a world that seems faster-paced than ever before, I wanted to delve into the literature of the English language and see what is at the forefront of what individuals consider “perfect.” I hypothesized that by searching this one word, I would find what people value and strive to achieve, depending on the year and on what was happening in the culture at the time. Exploring the Google Ngram Viewer’s advanced settings, I became especially interested in how America’s views of what is “perfect” are perceived and measured. I searched “The Perfect *” and studied how those views have changed over time.

The trend I found most alarming was that “The Perfect Man” and “The Perfect Woman” were among the most frequently used of these phrases from 1900 to the present. Since 1951–1952, these phrases have been on a fairly steady rise. Almost everyone is familiar with the storybook image of America in the 1950s: popular images persist of a simpler, happier time emerging from the aftermath of the Second World War. Families moved to the suburbs, fostered a baby boom, and forged a happy life of family togetherness in which everyone had a specified role. Women were considered domestic caregivers, with sole responsibility for the home and child rearing, while men ‘brought home the bacon.’ The ideals of “the perfect woman” and “the perfect man” gave a clear picture of what was desired at the time: that each emulate his or her proper gender role in society. In effect, men and women began to construct their identities around this image, and may still do so today. Accordingly, the use of “the perfect man” and “the perfect woman” is still rising.

However, since the 1980s, “the perfect place,” “the perfect time,” and “the perfect opportunity” have overtaken the emphasis on the man and woman. I attribute this to both the sexual revolution and the high divorce rate of the late 1970s and early 1980s, which shifted the focus toward the individual rather than toward finding the perfect person to complement one’s life. The 1920s, by contrast, saw the heaviest use of “the perfect way,” which makes sense given the ideals and norms of that decade: the parties, the glamour, and the excesses of a society bent on great indulgence. The 1980s reflected a very different perspective from both the 1920s and the 1950s.

The sexual revolution fueled the marital tumult of the times: Spouses found it easier in the Swinging Seventies to find extramarital partners, and came to have higher, and often unrealistic, expectations of their marital relationships. Thus, they were less focused on finding the right man or woman, and instead began to highlight their own personal circumstances, looking more into their opportunities, and current state and place. Increases in women’s employment as well as feminist consciousness-raising also did their part to drive up the divorce rate, as wives felt freer in the late ’60s and ’70s to leave marriages that were abusive or that they found unsatisfying.

As the sexual revolution reached full swing, the divorce rate rose with it. America’s divorce rate began climbing in the late 1960s and skyrocketed during the ’70s and early ’80s, as virtually every state adopted no-fault divorce laws. The rate peaked at 5.3 divorces per 1,000 people in 1981.

Ngram’s ability to measure changes in America’s ideals over time is quite amazing. It offers us a new perspective on this ever-changing world and gives us the opportunity to understand the variations and fluctuations in what we value most as culture and time turn their course and change pace.

Blog Post #1: Social Identification and Its Bearing

When examining societies, noting how their people identify themselves can lend insight into their values. A particularly significant form of identification is that of group vs. individual. A main difference between the two lies in how widely ideals vary within each. If everyone identifies as an individual, a greater variety of values is implied, but also a greater potential for conflict when the society is ruled by weaker majorities. Group identification, on the other hand, leads to higher levels of agreement but can also magnify conflicts when different groups interact.

Since research on the OED yielded noun and verb variants of “group” and noun and adjective variants of “individual”, the Ngram study was conducted using part-of-speech tags to ensure that only the usages related to personal identification were included.  The period of 1800 – 2000 was examined.

ENMC 3600 Short Blog Post #1, Picture 1

Before the 20th century, the two terms were used at roughly similar rates, with “individual” ahead of “group” at the beginning of the period. The 1900s, however, saw a massive increase in the relative use of “group.” This could perhaps be linked to the acceleration of industry and the formation of businesses: groups in which various individual talents were pooled under common ideals to reap profits. A wildcard search on “group” shows “group of” to be, by far, the context in which the word is most frequently used.

ENMC 3600 Short Blog Post #1, Picture 2

The word following this phrase is probably an occupation or some other class with which all members of the group associate, such as “group of bankers” or “group of scientists.” The rise in this phrase’s usage during the 1900s is consistent with the idea that it is connected to the industrial era.

But has this heightened identification with groups led, on the broad scale, to greater compatibility, or greater dissent?  To try and find out, I plotted “cooperation” against “competition”.

ENMC 3600 Short Blog Post #1, Picture 3

Both words saw a marked increase in usage during the industrial 1900s, with a slight divergence after 1980, when “cooperation” began to dip but “competition” continued to rise. This suggests that while the predominant group identification brings values together, perhaps making goals more attainable, it also creates a greater divide between the values of different entities than individual identification did.

Being able to separate usages of words with the part-of-speech and wildcard tools increases the value of Ngram as a device for measuring social change. However, Ngram measures just that: change. It draws no conclusions about why these changes occurred, leaving human speculation to accomplish what is perhaps too large a part of the task. Also, while Google Books is useful for seeing particular instances in which words or phrases were used, its results are not adequately filtered, which limits their usefulness in supporting claims based on Ngram.

Blog Post #1: Soups

Before playing with the parameters of the Ngrams, I thought they were most useful for finding out when certain words and phrases became popular and how their popularity changed over time. After learning about the many features available, I came to realize how much more extensive a history of the English-speaking world the Ngrams provide.

While exploring, I started mixing features and found an interesting result when combining the wildcard with the part-of-speech tag that narrows the search to nouns. Simply by searching “*_NOUN soup,” I ended up learning about the popularity of soups from 1800 to 2000 and which flavors gained or lost popularity in that time frame.
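The query “*_NOUN soup” combines two features: the wildcard `*` asks for the most frequent words filling that slot, and the `_NOUN` tag keeps only words tagged as nouns, so an adjective like “hot” in “hot soup” is excluded. A toy sketch of that filtering and ranking over a handful of made-up tagged bigrams; the data, the `wildcard_pos` helper, and the top-10 cutoff are illustrative assumptions, not Ngram output or Google’s implementation:

```python
from collections import Counter

# Made-up (word, tag) bigrams ending in "soup"; not real Ngram data.
TAGGED_BIGRAMS = [
    (("turtle", "NOUN"), ("soup", "NOUN")),
    (("vegetable", "NOUN"), ("soup", "NOUN")),
    (("hot", "ADJ"), ("soup", "NOUN")),
    (("turtle", "NOUN"), ("soup", "NOUN")),
    (("onion", "NOUN"), ("soup", "NOUN")),
]


def wildcard_pos(bigrams, tag, head, k=10):
    """Hypothetical helper: rank the words filling '*' in '*_<tag> <head>'
    by how often they occur, keeping only words carrying `tag`."""
    counts = Counter(
        first
        for (first, first_tag), (second, _) in bigrams
        if first_tag == tag and second == head
    )
    return counts.most_common(k)


print(wildcard_pos(TAGGED_BIGRAMS, "NOUN", "soup"))
```

In the real viewer each substitution becomes its own line on the chart, plotted year by year, which is what makes it possible to watch one flavor overtake another.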

Although this seemed superficially like a rather useless search, I ended up learning a lot about culinary history, especially thanks to these features. Without them, I would only have learned that soup did not really become popular until the 1900s. With the wildcard and the noun tag, I could narrow the search to find out which flavors of soup were popular and how that has changed over time. For example, I would never have guessed that before the 1900s the most popular soup was turtle soup, or that it now ranks among the least popular. It was not until the 1900s that vegetable soup became more popular than turtle soup. I also would never have guessed that chicken soup did not really become popular until the 1980s, and that it only became the second most popular soup in the late 1990s; before that, the second spot belonged to onion soup.

Once I learned to maneuver the Ngrams more effectively, it became evident how much one can learn from them. Although it is hard at first to specify exactly what one wants to find out, once one gets the hang of it the possibilities are vast: one can learn about even the simplest things, like soup.