Status and future of Big Data in industry:
How are organizations and companies using data and converting it into actions, intelligent decisions and valuable operations?
By processing and mining big data, organizations use petabytes of information to gain insights into supply chain efficiency, customer behavior and other aspects of business performance.
How is big data different?
With the internet, data is generated through new sources. Big data is also produced automatically by machines, for example by sensors embedded in an engine. Moreover, not all of this data is valuable, so the focus needs to be on the important parts of it.
Big data at some big companies:
According to Danial Price, Google is the largest big data company in the world. It handles 3.5 billion requests per day, and it is estimated that Google stores over 10 exabytes of data (10 billion gigabytes), while Facebook alone has 2.5 billion pieces of content, 2.7 billion ‘likes’ and 300 million photos, all of which adds up to more than 500 terabytes of data. Amazon extracts data from over 150 million customers’ purchases to help users choose items to buy.
Amazon uses this massive amount of historical purchasing data to make accurate forecasts of shopping needs. In fact, Amazon is estimated to have around 1 exabyte of data stored.
Target has focused on observing customers’ buying histories and assessing their incomes, ages and marital statuses in order to predict potential buying patterns.
Types of tools used in big data:
Big data infrastructure relies on software such as:
Hadoop: a software framework for data-intensive distributed applications, based on the MapReduce programming model and the Hadoop Distributed File System (HDFS), a distributed file system.
According to Wei Fan and Albert Bifet, “Hadoop allows writing applications that rapidly process large amounts of data in parallel on large clusters of compute nodes, a MapReduce job divides the input dataset into independent subsets that are processed by map tasks in parallel.”
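As a concrete illustration of that model, here is a minimal word-count mapper and reducer that could be run with Hadoop Streaming. The file names mapper.py and reducer.py and the word-count task are assumptions for this sketch, not something taken from the sources above.

#!/usr/bin/env python3
# mapper.py - reads raw text lines from stdin and emits "word<TAB>1" pairs.
# Hadoop runs many copies of this script in parallel, one per input split.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word.lower()}\t1")

#!/usr/bin/env python3
# reducer.py - receives lines sorted by key (after the shuffle) and sums the counts per word.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

These two scripts would be passed to a Hadoop Streaming job through its -mapper and -reducer options; the framework then runs the mappers in parallel over the input splits and delivers all counts for a given word to a single reducer.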
Processing big data:
Processing starts with integrating disparate data stores: mapping the data to the programming framework, then connecting to the storage and extracting the data from it. After that, the data needs to be transformed for processing. Finally, the data is prepared for Hadoop MapReduce by subdividing it (sketched below).
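As a rough sketch of that last preparation step, the snippet below subdivides one large, already transformed text file into line-aligned chunks that could then be copied into an HDFS input directory. The file names, output directory and 128 MB chunk size are illustrative assumptions; in practice Hadoop computes input splits itself, so this only serves to make the idea of subdividing concrete.

#!/usr/bin/env python3
# split_input.py - subdivide a large transformed input file into chunks for a MapReduce job.
import os

CHUNK_SIZE = 128 * 1024 * 1024  # roughly one HDFS block (assumed value)

def split_file(source_path: str, output_dir: str) -> None:
    os.makedirs(output_dir, exist_ok=True)
    part, written = 0, 0
    out = open(os.path.join(output_dir, f"part-{part:05d}"), "w")
    with open(source_path) as src:
        for line in src:
            # Start a new chunk once the current one reaches the target size,
            # always breaking on a line boundary so no record is cut in half.
            if written >= CHUNK_SIZE:
                out.close()
                part += 1
                written = 0
                out = open(os.path.join(output_dir, f"part-{part:05d}"), "w")
            out.write(line)
            written += len(line)
    out.close()

if __name__ == "__main__":
    split_file("transactions_transformed.txt", "mapreduce_input")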
Briefly, there are three stages: the Map stage, the Shuffle stage and the Reduce stage.
In the Map stage, the input data is stored in the Hadoop Distributed File System (HDFS) in the form of a file or directory.
In the Shuffle and Reduce stages, the framework processes the data that comes from the Map stage and generates new output, which is also stored in HDFS.
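To make the flow between the three stages concrete, here is a small, self-contained simulation in plain Python, again using word counting as the example task. This is a teaching sketch of the data flow, not how Hadoop itself is invoked: the shuffle step groups the map output by key before each group is reduced.

#!/usr/bin/env python3
# A local simulation of the Map, Shuffle and Reduce stages for word counting.
from collections import defaultdict

def map_stage(records):
    """Map stage: turn each input record into (key, value) pairs."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle_stage(pairs):
    """Shuffle stage: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_stage(groups):
    """Reduce stage: combine each key's values into one output value per key."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    data = ["big data at big companies", "processing big data"]
    print(reduce_stage(shuffle_stage(map_stage(data))))
    # {'big': 3, 'data': 2, 'at': 1, 'companies': 1, 'processing': 1}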