Advanced Google N-Gram Use: Abuse

I first tried to investigate the prevalence of different types of drug abuse using the advanced wildcard feature, entering the search term “*abuse”. This returned the following message: “If you meant to multiply, use parentheses in your search. Wildcards can replace only entire words, not parts of words. Skipping “*abuse”. No valid ngrams to plot!” Believing the Ngram Viewer unable to plot a wildcard that precedes the search term, I instead tried a dependency search with a part-of-speech wildcard. The search term “abuse=>*_NOUN” did generate an ngram.

Still dissatisfied that my initial search had not worked, I then searched for “*_NOUN abuse”, which keeps the wildcard but drops the dependency relationship. This yielded a second ngram.

The most striking finding from these searches is that queries must follow very precise formatting rules, so a desired ngram may be difficult to create simply because the search term is not in exactly the right form. While “*abuse” does not generate an ngram, “*_NOUN abuse” does. Additionally, the two search terms “*_NOUN abuse” and “abuse=>*_NOUN” lead to slightly different results. For “abuse=>*_NOUN” the top ten words are substance, child, drug, alcohol, neglect, Child, power, authority, trust, and confidence. For “*_NOUN abuse” the results are substance, child, drug, alcohol, Child, Drug, wife, cocaine, spouse, and men. Unfortunately, wildcard searches cannot be combined with case-insensitive searches, so case variants such as child and Child are listed separately, which makes it difficult to judge the overall prevalence of a term. And what is the precise difference in meaning between “*_NOUN abuse” and “abuse=>*_NOUN”? Ultimately, advanced use of the Ngram Viewer demonstrates that how we search is just as important as what we search.
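To make the overlap between the two result lists easier to see, here is a minimal Python sketch that compares them after merging case variants by hand. The lists are copied from the results reported above; the variable and function names are my own, and nothing here talks to the Ngram Viewer itself. It only merges the labels, so it says nothing about the combined frequency of, say, child and Child.

```python
# Top-ten lists as reported above for the two search terms.
dependency_results = ["substance", "child", "drug", "alcohol", "neglect",
                      "Child", "power", "authority", "trust", "confidence"]
adjacent_results = ["substance", "child", "drug", "alcohol", "Child",
                    "Drug", "wife", "cocaine", "spouse", "men"]


def fold_case(words):
    """Lowercase each word and drop repeats, keeping first-seen order."""
    folded = []
    for word in words:
        lowered = word.lower()
        if lowered not in folded:
            folded.append(lowered)
    return folded


dep = fold_case(dependency_results)   # results for abuse=>*_NOUN
adj = fold_case(adjacent_results)     # results for *_NOUN abuse

shared = [w for w in dep if w in adj]
only_dep = [w for w in dep if w not in adj]
only_adj = [w for w in adj if w not in dep]

print("In both lists:         ", ", ".join(shared))
print("Only in abuse=>*_NOUN: ", ", ".join(only_dep))
print("Only in *_NOUN abuse:  ", ", ".join(only_adj))
```

Once case variants are merged, the two searches share four words (substance, child, drug, alcohol) and differ in everything else, which makes the question about their difference in meaning more concrete.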
