Production-Ready Private Cloud Templates

About Hadoop

Apache Hadoop is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop can scale from a single server to thousands of machines, with each server contributing local computation and storage to the overall cluster.
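
To make the "simple programming models" concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps behind a word count. It is plain Python for illustration only and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen just for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce steps would run on many workers in parallel, and the framework would perform the shuffle between them; this sketch only shows the shape of the model on one machine.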

Our Hadoop Architecture

This solution template starts with Hortonworks HDP 1.3 because of its ubiquity across the enterprise. Existing Hadoop workloads can be ported to this deployment. Each Big Data application uses add-on services specific to its own use case, but HDFS and MapReduce are core to all of them. For these reasons, this template focuses on providing reliable, scalable, and automated HDFS and MapReduce.

Design Specifications

  • Sets up a Hadoop master running the NameNode and JobTracker services, plus a configurable number of Hadoop workers, each running a DataNode and a TaskTracker.

  • Configuration changes and the addition of new worker nodes are managed through SaltStack for ease of administration.

  • Uses Hortonworks HDP 1.3, but can be easily modified to install either Apache Hadoop or Cloudera CDH instead.
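
The master/worker layout described above can be sketched as a small Python helper that renders the Hadoop 1.x settings every node needs in order to find the master. This is an illustrative sketch, not the template's actual SaltStack states: the function names, the host names, and the default ports (8020 for the NameNode, 8021 for the JobTracker) are assumptions made for this example. The property names fs.default.name and mapred.job.tracker and the slaves file are standard in Hadoop 1.x.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def build_cluster_config(master_host, worker_hosts,
                         namenode_port=8020, jobtracker_port=8021):
    """Return a mapping of config file name to file content.

    The ports are assumed defaults for this sketch; use whatever the
    actual deployment is configured with.
    """
    core_site = {"fs.default.name": f"hdfs://{master_host}:{namenode_port}"}
    mapred_site = {"mapred.job.tracker": f"{master_host}:{jobtracker_port}"}
    return {
        "core-site.xml": render_site_xml(core_site),
        "mapred-site.xml": render_site_xml(mapred_site),
        # One worker host per line, as in Hadoop 1.x's conf/slaves file.
        "slaves": "\n".join(worker_hosts) + "\n",
    }


if __name__ == "__main__":
    # Hypothetical host names; adding a worker means adding one list entry.
    files = build_cluster_config("hadoop-master", ["worker-1", "worker-2"])
    for filename, content in files.items():
        print(f"--- {filename} ---")
        print(content)
```

Keeping the master address and the worker list as plain inputs mirrors the design goal above: a configuration change or a new worker is a one-line data change, which a configuration management tool such as SaltStack can then push to every node.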