There’s More than Meets the Eye

Big Data Infographic

This is a link to the Infographic

        This infographic presents a number of statistics about the collection and transfer of data on the internet, giving the audience an idea of how massive “The World of Data” really is. The information is presented in such a way that the audience accepts it instead of questioning where it came from. Viewers, including myself, buy into the infographic's point because it homes in on specific facts, such as: Google collects 24 petabytes of data per day, 20 hours of video is uploaded to YouTube every minute, and 2.9 million emails are sent every second. These precise numbers make us trust what is otherwise a random image. But where does this information come from, and can its sources be trusted? To find out, we will examine one specific piece of data: “Google collects 24 petabytes of data per day.” By analyzing the source of this claim, we can judge the reliability and value of the infographic itself.

Big Data Infographic

The claim that “Google processes 24 petabytes of data per day” must have come from research or information that Google published itself. To find this research, I began by searching the web for “Google’s Data Consumption” (I actually used Bing as my search engine, in case Google was not willing to make this information freely available to the public). I was redirected a few times, but it didn’t take long before I found an article about MapReduce, the software Google uses to sort and process its large quantities of data. The article includes a table comparing the amount of data Google processed from August 2004 to September 2007. If you look at the 2007 figures and add up the data processed across all the machines used, it does indeed come out to over 20 petabytes.
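
Checking a table like this against a per-day claim takes a unit conversion, since a period total and a daily rate are not directly comparable. The short Python sketch below assumes the table reports monthly totals in terabytes and uses decimal units (1 PB = 1,000 TB). The function name `monthly_tb_to_daily_pb` and the 600,000 TB input are hypothetical, chosen only to show the arithmetic; they are not figures taken from the ACM article.

```python
# Hypothetical helper (not from the article): convert a monthly total in
# terabytes (TB) into an average daily rate in petabytes (PB).
# Assumes decimal units, where 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def monthly_tb_to_daily_pb(total_tb: float, days_in_month: int = 30) -> float:
    """Return the average petabytes per day for a month's total given in TB."""
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    # Spread the monthly total evenly over the days, then convert TB to PB.
    return total_tb / days_in_month / TB_PER_PB


if __name__ == "__main__":
    # Illustrative round number only, NOT a value from the ACM table.
    example_total_tb = 600_000
    print(monthly_tb_to_daily_pb(example_total_tb))  # 20.0
```

With a real table, you would plug in the reported total and the actual number of days in that month before comparing the result with a “per day” claim.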

Google MapReduce Statistics

Here’s the link to the magazine

        This article was published in 2008 in the magazine “Communications of the ACM.” “ACM (Association for Computing Machinery) is the world’s largest educational and scientific computing society, and they deliver resources that advance computing as a science and a profession.” Because this article was reviewed and published by such a reputable association, I believe it is highly credible. The original infographic also lists MapReduce as one of its sources, so I think this infographic uses reliable information and can be trusted.

ACM’s website is here

Big Data Infographic 2

        This infographic takes the reliable information that “Google collects 24 petabytes of data per day” and puts it in context to make a strong claim about “How Big the World of Data” really is. This is how most infographics work: the source of the information is usually treated as irrelevant, because strong claims and visual evidence lead the audience to accept the point being made. However, sources really do matter, especially when the information is used in other contexts, such as a lawsuit against Google or a scientific study about how information is collected online. That is why it’s important to judge the reliability and value of a piece of information by knowing its source. There’s a reason you cite all of your sources in a research paper, or in any other academic paper for that matter. It’s not just so you can sound smarter; it shows that your work is credible and that your facts come from actual data rather than being made up. This infographic turned out to be reliable, but not all infographics are. Depending on the context in which the information is used, most infographics should not be trusted without a little background research.