Wednesday, April 19, 2017

Status of Big Data


Status and future of big data in industry:
How are organizations and companies using data and converting it into actions, intelligent decisions and valuable operations?
By processing and mining big data, organizations use petabytes of information to gain insight into supply chain efficiency, customer behavior and other aspects of business performance.
How is big data different?
The internet generates data through new sources. Big data is also produced automatically by machines, for example by a sensor embedded in an engine. Moreover, not all of this data is valuable, so analysis needs to focus on the important parts.

Big data at some big companies:

According to Danial Price, Google is the largest big data company in the world. It handles 3.5 billion requests per day, and it is estimated that Google stores over 10 exabytes of data (10 billion gigabytes), while Facebook alone has 2.5 billion pieces of content, 2.7 billion ‘likes’ and 300 million photos, all of which adds up to more than 500 terabytes of data. Amazon extracts data from over 150 million customers’ purchases to help users choose items to buy.
Amazon uses a massive amount of historical purchasing data to make accurate forecasts of shopping needs. In fact, Amazon is estimated to have around 1 exabyte of data stored.
Target has focused on observing customers’ buying histories, assessing income, and estimating ages and marital statuses in order to predict potential buying patterns.
Types of tools used in big data:
Big data infrastructure relies on software such as:
Hadoop: software for data-intensive distributed applications, based on the MapReduce programming model and the Hadoop Distributed File System (HDFS), which is a distributed file system.
According to Wei Fan and Albert Bifet () “Hadoop allows writing applications that rapidly process
large amounts of data in parallel on large clusters of compute nodes, a MapReduce job divides the
input dataset into independent subsets that are processed by map tasks in parallel.”
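The idea in this quote can be sketched in plain Python, without Hadoop itself. This is only an illustration: the helper names (split_into_subsets, map_task, run_parallel_maps) are made up for this post, and a thread pool on one machine stands in for a cluster of compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subsets(records, n_subsets):
    """Divide the input into independent subsets (hypothetical helper)."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def map_task(subset):
    """Process one subset on its own: count the records that mention 'data'."""
    return sum(1 for line in subset if "data" in line.lower())


def run_parallel_maps(records, n_subsets=3):
    """Run one map task per subset; threads stand in for cluster nodes."""
    subsets = split_into_subsets(records, n_subsets)
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        return list(pool.map(map_task, subsets))


records = [
    "Big data is big",
    "sensor reading",
    "Data from sensors",
    "customer purchase",
    "purchase data",
]
partial_counts = run_parallel_maps(records)
total = sum(partial_counts)  # combine the partial results from each map task
```

Because no subset depends on another, the map tasks can run in any order or at the same time, which is what lets the real framework spread them over many machines.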
Processing big data:
Processing starts by integrating disparate data stores: the data is mapped to the programming framework, then connected to and extracted from storage. After that, the data needs to be transformed for processing. Finally, the data is prepared for Hadoop MapReduce by subdividing it.

Briefly, there are three stages: the Map stage, the Shuffle stage and the Reduce stage.
In the Map stage, the input data is read from the Hadoop file system (HDFS), where it is stored as a file or directory, and processed by the mapper.
In the Shuffle and Reduce stages, the data that comes from the Map stage is processed to generate new output, which is also stored in HDFS.
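To make the three stages concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop or HDFS: the input is an in-memory list standing in for a file, and the function names (map_stage, shuffle_stage, reduce_stage) are chosen for illustration only.

```python
from collections import defaultdict


def map_stage(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups):
    """Reduce: collapse each group of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# In Hadoop the input would come from HDFS; a list stands in for it here.
lines = ["big data big insight", "data at scale"]
counts = reduce_stage(shuffle_stage(map_stage(lines)))
```

In a real cluster the map and reduce functions run on many nodes and the framework performs the shuffle between them, but the flow of data is the same.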




