Page 42 - Cloud Services and Big Data

Figure 15 - Functions of Hadoop (Source: Apache Software Foundation, n.y., n.p.)
The underlying framework of Apache’s Hadoop is based on Google’s MapReduce
algorithm. In essence, a large volume of Big Data is disassembled into various
pieces, which are “mapped” by different computer clusters. In a second step, the
Reduce algorithm filters the information (based on predefined values) and
reassembles it into a comprehensible result. In many cases it is not an easy task
to implement Hadoop in an established integrated system. The open-source software
provider Talend created an application called “Talend Open Studio for Big Data”,
intended to ease the burden of implementing Hadoop. Several modules, extensions,
and administration tools take away much of the initial work and simplify the
installation process (Weiss, 2012, p. 9).
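The map-and-reduce flow described above can be sketched in plain Python. This is an illustrative word-count example, not the actual Hadoop API: the input chunks stand in for the data pieces distributed across clusters, and the map, shuffle, and reduce steps are run sequentially in one process.

```python
from collections import defaultdict

# Illustrative input: two chunks that, in Hadoop, would be
# processed on different cluster nodes.
chunks = [
    "big data needs big tools",
    "hadoop maps big data",
]

# Map step: each chunk is processed independently,
# emitting intermediate (key, value) pairs.
def map_chunk(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle step: group the intermediate pairs by key,
# so all values for one word end up together.
grouped = defaultdict(list)
for chunk in chunks:
    for key, value in map_chunk(chunk):
        grouped[key].append(value)

# Reduce step: combine the values for each key into one
# comprehensible result.
word_counts = {key: sum(values) for key, values in grouped.items()}

print(word_counts["big"])  # 3
```

In Hadoop the map and reduce steps run in parallel across many machines and the shuffle moves data over the network; the sketch only mirrors the logical structure of the algorithm.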
5.4 Transforming Data into Action: Data Scientists
Big Data can hold great value for a company, but to derive that value it is
necessary to link the harvested information to business activities. Given the
variety found in data pools, skilled people are needed who combine programming
with mathematical understanding, creativity, and scientific instinct
(O'Reilly Media Inc., 2012, pp. 9-10). D.J. Patil (2012), former executive for