Tag Archives: Big Data

Quantitative Research and its Misconceptions

In this week’s reading, Franco Moretti examines trends in the novel over a span of several decades and argues that literary history is defined by data sets, not by individual works. He states that:

Quantitative research provides… data, not interpretation. Quantitative data can tell us when Britain produced one new novel per month, or week, or day, or hour for that matter, but where… and why–is something that must be decided on a different basis.

While I agree that quantitative research provides data, such data sets are capable of exhibiting a limited form of interpretation. Large data sets can be passed off to outside viewers as completely unbiased, but for the most part, data mining does not exist for the sake of mining data; the motivation is almost never that circular. Big data, therefore, presents a watered-down interpretation of a subject through both the data it provides and the data that has been purposefully omitted from it. Pairing a data set with a more concrete argument strengthens the claims made by both, but the fact is that data sets are assembled for specific, situational purposes, and therefore carry within them implicit arguments to be defined by the viewer or reader.

Southwest and Big Data

Looking back on the project, I think I could have spent a little more time developing my topic. I knew that I wanted to do my presentation on something related to my field of study, Aerospace Engineering, and I knew that there was a lot of data in the airline industry, so I thought I could merge the two. After researching how the two related, I came upon how Southwest uses big data and decided to use this as my idea. I quickly found all the information I could on the topic, which wasn’t enough, but I was too far into the project to change the topic. I stuck with what I had already come up with and decided to add a little more on how the airplane itself gathers data. So, I looked up what sensors an airplane might use and came up with this sensor:

Sensor

This water vapor sensor shows what an airplane’s data can do for itself and for other fields, such as weather forecasting. The picture shows what a sensor might look like and all the parts involved in it. Since it would be hard to describe verbally what the sensor looks like and how it works, I added this picture to give the audience the image that my words could not create. My argument for this whole project was meant to be about what data Southwest collects and how that data creates a safer, more efficient, and more customer-friendly airline. There was not as much information behind this as I thought there would be. I thought the part about them teaming up with NASA was really interesting, though. Southwest did not release much information on how they maintain such customer satisfaction, other than that they partnered with Aspect to analyze speech and social media to see what their customers want. All this data did work, though, because in the end they do have the fewest complaints.

Graph

It is amazing how they have such a low number of complaints and such a high number of customers. This visual shows that even airlines that have fewer customers have many more complaints. Other than Delta, all the other big airlines, such as Virgin, US Airways, American Airlines, and United, have many more complaints. Adding this image to the Pecha Kucha gave the audience more of an idea of what customer satisfaction was like at the other popular airlines.

If I were to do this project again, I would give myself more time and come to office hours to talk about how I could better formulate my argument and topic. If I had done that, my presentation would have been much better.

Language and Culture

The book Uncharted: Big Data as a Lens on Human Culture explores the relationship between language and culture. Aiden and Michel assert that the form and use of language change significantly as culture changes, so the study of language is crucial to improving the study of culture. This idea is closely tied to the program they created, the Ngram Viewer, which they used to study how the use of particular words changes over time. For example, the word ‘tea’ was used far more than the word ‘coffee’ for much of English history. Yet since the 1970s, coffee has become the main beverage among common people, and ‘coffee’ has become a much more frequently used word than ‘tea’. This example shows that tracing the use of language can lead to a better understanding of culture in general. The book compares culture to dinosaurs: both can be found and studied through the traces they left in the past. Just as dinosaurs are studied through fossils, culture can be studied through its traces from the past, which the book identifies as the use of language.

This assumption might not always be right. Sometimes language reacts later than the cultural trend it reflects, so language cannot directly mirror cultural trends or changes. Yet language is arguably the best way to observe cultural change. It is easily observable through books and other works of literature, and it is the most commonly used method of communicating and exchanging ideas. Tracing culture through language, which is a cultural fossil, may not be an exact method of examination, yet it is definitely a revolutionary one.

Information Technology In Automobiles

Pecha Kucha Reflection

I chose this topic, “Information Technology in Automobiles,” because I am personally interested in the car industry, and I thought that I could find a lot of relevant pictures about it. I didn’t want to choose a topic for which it would be hard to gather images, but I also wanted to present something that I’m interested in. I developed my argument around how big data is changing the way people drive and interact with their vehicles. I think that big data is beginning to create a whole new type of vehicle on the road, which will enhance the driving experience and keep people safer.

Originally, I wasn’t sure how to begin creating this presentation because of the tricky timing between narration and slides. I decided to focus on a few main points and find pictures for those points before I began writing a script. On reflection, I think this was the best decision, because my images were incorporated well into my presentation, and it also helped me stay on topic with my script. When looking for images, I tried to assemble pictures that showed examples, such as the driver fatigue system shown below. If that was not possible, I tried to enhance my argument with strong graphs or figures. Basically, for every slide, I wanted not only to incorporate the images in my talk but also to strengthen the argument I was making by incorporating them.

Driver fatigue

graph

One thing I did to help with the 20-second timing was to set up my script in separate paragraphs so that I knew where I needed to be when following the pages. This helped me make sure I finished at the right time. I also wanted to make sure I didn’t spend over 20 seconds on a slide, so that I could address every image instead of disregarding some. When rehearsing my initial script, I found the pace to be very fast. Therefore, I shortened a lot of lines and paragraphs so that I would have breathing room during the presentation. This took a lot of patience and timing, and it was a lot harder than I had anticipated.

script

The planning that went into making this Pecha Kucha really helped me with my presenting skills, because you have to keep everything precise and strong in 20 seconds or less. This was unlike anything I had ever worked on before, and I learned a lot about presenting because of it. If I had another chance to do a Pecha Kucha, I believe that I would try to work without a complete script. When I was presenting, I felt like I was staring down way too much and not interacting with the audience. I would set up note cards to highlight key terms and sentences, but I would try to communicate more with the audience in order to be more engaging.

G.K. Zipf and the Fossil Hunters – Reading Response

This chapter of Uncharted by Aiden and Michel focuses on how often certain words appear in the English language. More specifically, it focuses on irregular verbs. By analyzing the appearance of the irregular and “regular” versions of the same verb, the authors show that the phasing out of the irregular form can be predicted mathematically, based solely on how frequently the verb is used in English. The clearest example presented is throve vs. thrived. We now mostly use “thrived” instead of “throve,” but the same is not true of “drove” and “drived.” According to Aiden and Michel, the only difference between these two verbs, both of which have been irregular at some point, is that “drove” was used much more often than “throve.” As Aiden and Michel state on page 44,

“…once one took frequency into account, the process of regularization was mathematically indistinguishable from the decay of a radioactive atom. Moreover, if we knew the frequency of an irregular verb, we could use a formula to compute its half-life.”
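
The radioactive-decay comparison in this quote can be made concrete with a small sketch. The code below is only an illustration: the function names and the 2,000-year half-life are made-up values, not figures from the book. It uses the standard half-life formula, in which the fraction of a group of verbs that is still irregular after a given number of years is 0.5 raised to the power of years divided by half-life.

```python
def surviving_fraction(years: float, half_life: float) -> float:
    """Fraction of a group of irregular verbs still irregular after `years`.

    Standard exponential decay: half of the group regularizes
    every `half_life` years.
    """
    return 0.5 ** (years / half_life)


def expected_irregulars(initial_count: int, years: float, half_life: float) -> float:
    """Expected number of verbs that remain irregular after `years`."""
    return initial_count * surviving_fraction(years, half_life)


if __name__ == "__main__":
    # Hypothetical example: 100 equally frequent irregular verbs with an
    # assumed half-life of 2,000 years (an invented number for illustration).
    for years in (0, 1000, 2000, 4000):
        remaining = expected_irregulars(100, years, 2000)
        print(f"after {years} years: about {remaining:.1f} verbs still irregular")
```

In this sketch the half-life is simply passed in as a number. The book’s point is that it could instead be computed from how often the verb is used, with rarer verbs regularizing sooner.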

For the most part, Zipf, Aiden, and Michel used literary sources to make their predictions about verb frequency. They state that the sole factor influencing the “regularization” of an irregular verb is how frequently that verb appears in literature. Although their prediction may be correct to a certain extent, their main argument leaves out the effect of social influences. The second page of the anecdote “Burn, baby, burnt” states,

“A few days later, he saw another distressing headline, this one in the Los Angeles Times: “Kobe Bryant Says He Learned a Lot from Phil Jackson.” The student knew nothing about Phil Jackson, but was still shocked that Kobe had learned from Phil. If anything, he should have learnt.”

Although a pure analysis of the frequency of an irregular verb such as “learnt” may be a good predictor of that verb’s future regularization, it does not take into account social factors or any other determinants that affect the verb’s frequency. It may be that regularization starts with a simple news headline that uses the regularized version of the verb, which inadvertently sparks its popularity with the general public and eventually sets off the cycle by which the word makes its way into formal literature. These social effects may speed up or slow down the regularization of certain words. For example, if kids are taught generation after generation that the correct past tense of “drive” is “drove” and not “drived,” then these social pressures may affect the eventual outcome for the word, regardless of frequency. The significance of social factors in the regularization of irregular verbs can only be determined through further careful analysis.

The “Gold” of the Information Age

Abstract

A 2012 report by the World Economic Forum in Davos, Switzerland, recognized “Big Data” as a completely new class of economic asset, much like gold and currency (Lohr). Big data is becoming as valuable as gold to large companies and governments around the world in the “Information Age” of the 21st century. During the California Gold Rush, thousands of people moved to California from 1848 to 1855 in hopes of finding gold and becoming wealthy. The gold rush sparked the American economy because of the vast number of laborers and the amount of gold being acquired on U.S. soil, which helped fuel the United States through the Second Industrial Revolution. Today we are experiencing the “Rush of Big Data” around the globe. Thousands of businesses, such as Google, Yahoo, and IBM, are using large quantities of data to create new products and markets for consumers. The “Rush of Big Data” is fueling the Information Age of the 21st century and having major impacts on businesses and economies all over the world.


Big Data in Baseball

Abstract 

As Major League Baseball starts to implement player tracking, it will become a big data league. Once this system is implemented next year, a lot of things will change in baseball. Teams will be able to better evaluate defensive talent, but at what cost? This shift will cost fans excitement as teams limit hits by putting better defenses on the field.


Has Big Data Changed the Game of Soccer?

When selecting a Pecha Kucha topic, I chose to talk about soccer because it is something that interests me and I know a lot about it. But narrowing it down to a particular topic was very difficult. It took me over a week to finally decide on the right topic, but after reading an article on bleacherreport.com about the advantage the German national team had over other teams during the 2014 FIFA World Cup, I knew what I wanted to do. I had never really thought of data as an advantage in soccer. Soccer always appeared to be a game of skill and luck, but after reading the article I was surprised at the huge impact data had on the German national team’s tactics. So I finally decided on the topic “Has big data changed the game of soccer?” The process of making the presentation was very difficult. I read through over 20 articles to find facts that prove the benefits of data, and I also had to be sure these facts were 100% accurate. It was also hard to find the right images to convey my message. I couldn’t find the right image for some slides, so I had to edit pictures to suit the presentation. When practicing my presentation, I found it difficult to time my words to match the images on the slides, so I had to edit my script many times to let the slides flow with the script.


In my second slide, I talked about the origin of soccer. I mentioned that the ancient game most closely related to modern soccer was Cuju. I tried to give a brief description of the game, and the image was included to support that description. The image shows viewers how Cuju looked when played, and it helped to develop the relationship to modern-day soccer. Since the slide only lasted 20 seconds, I could not give a clear description of the game within the time frame, but the image easily showed viewers exactly how Cuju looked.


In my fourth slide, I talked about the popularity of soccer across the globe. I stated that soccer has gained wide popularity in many countries around the world, especially in Africa, Europe, and South America. The image was used to give viewers a more detailed look at how popular soccer has become. It would have been impractical for me to list all the countries, but the image makes the point easier to convey and understand.

The Backbone of an Argument

Claims used in an argument must be properly supported in order to contribute to the argument as a whole. If the original information is changed or exaggerated, the overall credibility of the work could be called into question. Darrell West’s report on big data’s application in education (link) retains its credibility because it uses reliable and accurate citations as a backbone for its argument.

To prove that West’s report can be trusted, one must look closely at how he cites his sources and how those sources shape his argument (or how he shapes his sources to match his argument). At the bottom of each page that contains an external reference, West points the reader to his sources.

To prove that West’s use of other researchers’ knowledge is consistent with their research, it is necessary to look closer at the reference in the footer. By taking the title of Joseph Beck and Jack Mostow’s work, listed in the footer as source number 5, and searching for the document online, one can easily find an abstract of the original document (link). While this work also contains references to external sources, the finding that West was referring to (reading one story multiple times does not lead to as much learning as reading a variety of stories) was researched and carried out by the authors of the source. This makes the document the primary source for this particular piece of information in West’s report.

The work by Joseph Beck and Jack Mostow contained information that was consistent with what West claimed in his report:

Screen Shot 2014-10-14 at 11.32.39 PM

West’s use of the source was honest and accurate. He brought in external information, properly cited it, and correctly reported the content of the source. His individual interpretation of the source (and how it affects education), as with any citation, is what provides backing for his argument. In this case, the source was referring to the effects of rereading on learning, and West showed that this can be applied to education through computer-aided education. The source provided backbone information, and West shaped it in a way that supports his argument.

Social Media: Weapon of Mass Datafication?

Sorry this is late.

In the reading “Small Change” by Malcolm Gladwell, the author expresses his disdain for social media’s effectiveness to solve a social issue, and he is correct in this assumption. His statement, “Social networks are effective at increasing participation—by lessening the level of motivation that participation requires…It makes it easier for activists to express themselves, and harder for that expression to have any impact,” rings true because it embodies the distinction between opinions and actions. Without physical action and response in the non-cyber world, a thought cannot have an effect on society.
However, social media is a wonderful way to collect information. For example, Big Data by Mayer-Schönberger and Cukier describes how Twitter was used to gauge attitudes toward vaccination: “Datafication is not just about rendering attitudes and sentiments into an analyzable form, but human behavior as well. This is otherwise hard to track, especially in the context of the broader community and subgroups within it. The biologist Marcel Salathe of Penn State University and the software engineer Shashank Khandelwal analyzed tweets to find that people’s attitudes about vaccinations matched their likelihood of actually getting flu shots.” By using software, almost anyone can keep track of public trends and use this information to make observations and predict social tendencies.
In all, social media is a device for facilitation, not a tool for action. When this concept is grasped, real changes will begin to occur, as opposed to the fleeting calls to action that have flooded our feeds. And when social media is finally recognized as purely an information-sharing program and not a political machine, people will begin to enjoy spending time online.

The emergence of new data

It is fascinating to realize that a lot of new data comes from old data. Sometimes new data will replace old data because it is more current. However, when old data is categorized and processed, the trends that emerge can be recorded as more data which could possibly be analyzed further.

8-31 reading-response

This passage (found on page 9) captures in a few words the extent to which data has increased over the past several decades. This should not be surprising, though. Throughout the excerpt of Big Data, the authors keep describing situations in which people have used large amounts of raw data to hypothesize trends. These hypotheses can be used to identify larger trends, and the cycle can continue. The analysis of raw data is what set off the surge of data the modern generation has now.

One example of this in Big Data is Matthew Fontaine Maury, a navigator who decided to find the best trade routes by building on what sailors already knew. He asked everyone he encountered for their knowledge of the seas and the routes they used. He asked old fishermen their secrets for learning the seas so that he could find routes that didn’t fight nature, but rather routes that nature helped along. He collected his own data in order to form his hypotheses because the data he needed wasn’t readily at his fingertips, and in the end he created trade routes far superior to the previous ones.

Modern day society has what Maury created for himself, a database just waiting to be examined. People have been able to predict when the price of airplane tickets will be cheapest or track packages all because they used the data in front of them. Big Data fosters the idea that society could have a lot of answers right in front of us that we just haven’t pieced together yet.

Data is Progress?

“In the future – and sooner than we may think – many aspects of our world will be augmented or replaced by computer systems that today are the sole purview of human judgment. Not just driving or match making, but even more complex tasks. After all, Amazon can recommend the ideal book, Google can rank the most relevant website, Facebook knows our likes, and LinkedIn divines who we know. The same technologies will be applied to diagnosing illnesses, recommending treatments, perhaps even identifying “criminals” before one actually commits a crime. Just as the internet radically changed the world by adding communications to computers,  so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before” – Big Data (page 12)

The quantity of data that our society produces and processes on a daily basis surpasses that of any other time in human history. Information and knowledge have become not only readily available, but in many ways vital to the technological world we live in. Although this is opening the doors to endless possibilities, we must be cautious of the negative aspects of this growth as big data takes over many parts of our lives.

Much of this information is analyzed by large companies that we use and interact with frequently. This may be referred to as big data or crowdsourced data. This data not only allows companies to reveal information about their individual users, but also allows them to apply their knowledge in more creative and constructive ways. While many uses for this data are still in the early stages, big data and crowdsourced information will soon become vital to our society, bringing the negative aspects of open information along with them.

The addition of large-scale data collection also raises some concerns, despite the possible benefits. Privacy is slowly becoming a thing of the past, as corporations like Google and Facebook track everything from what we search to where we go for lunch. Google even knows where I live and has given me directions to work without my ever telling it where I work. The same can be said for government agencies such as the NSA. In the world we live in, knowledge is power, power is money, and there is little legislation in place to prevent large corporate or governmental entities from abusing this information.

While there is likely little that can be done to stop the upcoming transition into a big data driven society, individuals need to be aware of the drawbacks in order to best prevent abuse of the system. Only by reflecting on the drawbacks will we as a society be able to stop the growth of abusive data before it becomes an irreparable aspect of life.

Big Data: The Clay of the Universe


Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt, 2013. 7. Print.

Big Data has opened my eyes to the inherent power of data and information. I have always thought of data and information as just numbers or facts, items with no true depth or importance. However, I have come to realize that data is like clay; it will lie idle and remain unimpressive until it is molded into something beautiful. One example is that of our Buzzcards. Data is constantly recorded about which buildings I enter and exit, which dining halls I eat at, and much more. Initially, this information seems unimportant. Who cares if I went to Woodies at 7PM? Datafication, however, involves gathering very large samples of data. When data is drawn from every GT student’s Buzzcard, suddenly one can determine which dining hall is the most popular, so that the least popular one can be inspected and improved. One can also determine what time students generally return to their dorms, and perhaps campus police can be told when they need to be most alert. Instances such as these shed light on how present datafication is in everyday life and how it can make people’s lives better.
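
To make the Buzzcard example concrete, here is a minimal sketch of how individual swipes could be rolled up into the kind of campus-wide answers described above. Everything in it is an assumption for illustration: the record layout, the student IDs, the location name “Hall B,” and the swipe times are invented, and nothing here reflects how Buzzcard data is actually stored.

```python
from collections import Counter

# Hypothetical swipe records: (student_id, dining_hall, hour_of_day).
# All values are made up for illustration.
swipes = [
    ("s1", "Woodies", 19),
    ("s2", "Woodies", 18),
    ("s3", "Hall B", 12),
    ("s1", "Hall B", 12),
    ("s2", "Woodies", 19),
    ("s4", "Woodies", 19),
]


def visits_per_location(records):
    """Count how many swipes each dining hall received."""
    return Counter(location for _, location, _ in records)


def least_popular_location(records):
    """Return the dining hall with the fewest swipes."""
    counts = visits_per_location(records)
    return min(counts, key=counts.get)


def busiest_hour(records):
    """Return the hour of day with the most swipes."""
    hours = Counter(hour for _, _, hour in records)
    return hours.most_common(1)[0][0]


if __name__ == "__main__":
    print("Visits per hall:", dict(visits_per_location(swipes)))
    print("Least popular hall:", least_popular_location(swipes))
    print("Busiest hour:", busiest_hour(swipes))
```

A single swipe says almost nothing, but the same three counting functions applied to every student’s records would answer the questions about popular dining halls and busy hours.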

Although datafication is useful in many ways, I am skeptical about its usefulness on smaller scales. One such example is social media; I do not feel it is worth allowing these platforms to track my every move simply so that I can be shown relevant advertisements.


Collecting such an intense amount of data seems excessive for that use. It is ultimately left up to the individual to decide how much clay he or she would like to add to the pot. Though I feel that datafication is not ideal in every situation, I find it difficult to deny that using data and information in this way is, as a whole, revolutionary. Understanding “data and how it can be used” will help us understand the world in ways we never have before.

Sources:

http://lifehacker.com/5994380/how-facebook-uses-your-data-to-target-ads-even-offline

The Minds Behind the Data

Revised Edition:

“Google took the 50 million most common search terms that Americans type and compared the list with the CDC data on the spread of seasonal flu between 2003 and 2008. The idea was to identify areas infected by the flu virus by what people searched for on the internet. Others had tried to do this with internet search terms, but no one else had as much data, processing power, and statistical know-how as Google….Thus when the H1N1 crisis struck in 2009, Google’s system proved to be a more useful and timely indicator than government statistics with their natural reporting lags. Public health officials were armed with valuable information.” – Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: A Revolution That Will Transform How We Live, Work, and Think

The amount of information that Google holds on every person is enormous. The data can be dangerous if it is used against us; however, it is only as dangerous as the people behind the screens. Using the data for a greater purpose, to benefit humans, is up to us. If we put data into the right hands, then the positive outcomes will outweigh the negative side effects. In the passage, Google demonstrated that its information could be used to design a formula for detecting the H1N1 virus, eventually helping to control and calm the pandemic. We have so much technology and information that could be potentially harmful; however, we have to realize the information itself is not the problem. We, the people who make conscious decisions, are the ones who make the choice. In Google’s case, they helped prevent what would have been thousands of cases of H1N1.

Many of us are engineers here at Tech. We are the minds behind the technology, and we can control the use of it. We are the ones who will use technology to benefit the human race. There are many people who fear that the change in technology has been for the worse. Even though many fear change, overall I believe it has had a positive effect. Data is a tool no different from a Swiss Army knife; we could use it to harm others or to help others, but the decision is up to us. We are now able to live longer, travel faster, and communicate farther thanks to technological advances. Technology can be dangerous, but with the right minds controlling it, it can lead to a better society ahead.

Original Edition:

“Google took the 50 million most common search terms that Americans type and compared the list with the CDC data on the spread of seasonal flu between 2003 and 2008. The idea was to identify areas infected by the flu virus by what people searched for on the internet. Others had tried to do this with internet search terms, but no one else had as much data, processing power, and statistical know-how as Google….Thus when the H1N1 crisis struck in 2009, Google’s system proved to be a more useful and timely indicator than government statistics with their natural reporting lags. Public health officials were armed with valuable information.” – Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: A Revolution That Will Transform How We Live, Work, and Think

The amount of information Google contains on every human being is a dangerous thing to have. However, it is only as dangerous as the people behind the screens. In actuality, the information is arguably the most useful resource to mankind. In the passage, Google demonstrated that by use of its information, we were able to design a formula for detecting the H1N1 virus, eventually helping to control and calm the pandemic. We have so much technology and information that could be potentially harmful, however, we have to realize the information itself is not the problem. We, the people who make conscious decisions are the ones who make the choice. In Google’s case they helped save what would have been thousands of cases of H1N1.

Many of us are engineers here at Tech. We are the minds behind the technology and we can control the use of it. We are the ones who will use technology to benefit the human race. There are many people who fear that the change in technology has been for the worse. Even though many fear change, overall I believe it has had a positive effect. We are now able to live longer, travel faster and communicate farther with technological advances. Technology is a tool that can be dangerous but with the right minds controlling it can lead to a better society ahead.

The Advantages of Big Data

“So in 2013 the amount of stored information in the world is estimated to be around 1,200 exabytes, of which less than two percent is non-digital.”
Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt, 2013. 9. Print.

Some people believe that the world gets too involved in our private lives by gathering our personal information, but where would we be without these types of services? It frightens some to know that they are being tracked on the internet, and they don’t like the feeling that “Big Brother” is always watching. These people are worried by the fact that their information is out there for almost anyone to see. However, this collection of data is vital, since it helps our society as a whole by helping people get the best experience out of their lives.

Big Data gives an example from just a few years ago of how the collection of information enhanced our well-being. The H1N1 virus was a global epidemic, and the CDC was doing a pretty good job, at that time, of tracking the virus’s location. The CDC’s process was tedious and lengthy, though, and Google saw a way to improve it. When Google unveiled its formula for tracking where H1N1 was located, it worked just as well, if not better, and allowed the data to be analyzed in real time rather than after 10 days or so. That wouldn’t be possible if Google weren’t able to track searches.

We can find almost anything we need on the internet in today’s society. Compared to the amount of information out there, there are few things you can find in a book that you can’t find on the web. The internet’s interface has become easier to use over the years, which has led to an increase in its use. Thanks to innovators like Etzioni, the creator of Farecast, the internet can help us with all sorts of decisions. It can give us fast food options based on past searches, or it can give us stats on our favorite player’s previous game just from searching for that player. One day, we might even be able to receive a weekly weather report every Monday because our device recognizes a pattern of us looking at the week’s weather every single Monday morning.

The ability to have these services is a huge asset for our lives. It has already greatly improved them, whether you want to recognize that or not. The world as we know it wouldn’t be the same if big data had not undergone the change that it has over the past quarter of a century, and there are still greater changes to come. With nearly all of the data in the world being digital, it’s important that we accept the enhancements from big data taking place around us, because they are making life easier for humanity. Cukier and Mayer-Schönberger understand this; they tell us that “in the future – and sooner than we may think – many aspects of our world will be augmented or replaced by computer systems that today are the sole purview of human judgment” (12), which is a fascinating thought that must constantly be tugging at the back of many of our minds.
