Page 42 - Cloud Services and Big Data

Figure 15 - Functions of Hadoop (Source: Apache Software Foundation, n.y., n.p.)
The underlying framework of Apache’s Hadoop is based on Google’s MapReduce
algorithm. In essence, a large volume of Big Data is disassembled into various
pieces, which are “mapped” by different computer clusters. In a second step, the
Reduce algorithm filters the information (based on predefined values) and
reassembles it into a comprehensible result. In many cases it is not an easy task
to implement Hadoop in an established integrated system. The open-source software
provider Talend created an application called “Talend Open Studio for Big Data”,
intended to ease the burden of implementing Hadoop. Several modules, extensions,
and administration tools take away much of the initial work and simplify the
installation process (Weiss, 2012, p. 9).
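The map-and-reduce flow described above can be sketched in plain Python. This is an illustrative word-count example, not the actual Hadoop API: the input chunks stand in for the data pieces distributed across clusters, and the map, shuffle, and reduce steps are run sequentially in one process.

```python
from collections import defaultdict

# Illustrative input: two chunks that, in Hadoop, would be
# processed on different cluster nodes.
chunks = [
    "big data needs big tools",
    "hadoop maps big data",
]

# Map step: each chunk is processed independently,
# emitting intermediate (key, value) pairs.
def map_chunk(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle step: group the intermediate pairs by key,
# so all values for one word end up together.
grouped = defaultdict(list)
for chunk in chunks:
    for key, value in map_chunk(chunk):
        grouped[key].append(value)

# Reduce step: combine the values for each key into one
# comprehensible result.
word_counts = {key: sum(values) for key, values in grouped.items()}

print(word_counts["big"])  # 3
```

In Hadoop the map and reduce steps run in parallel across many machines and the shuffle moves data over the network; the sketch only mirrors the logical structure of the algorithm.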
5.4 Transforming Data into Action: Data Scientists
Big Data can hold great value for a company, but to derive that value it is
necessary to link the harvested information to business activities. Given the
variety found in data pools, skilled people are needed who combine programming
with mathematical understanding, creativity, and scientific instinct
(O'Reilly Media Inc., 2012, pp. 9-10). D.J. Patil (2012), former executive for