Big Data


Front Ends and Extensions for Hadoop

"Front Ends and Extensions Take Hadoop in New Directions" describes extenders and connectors for Hadoop, with examples of how these tools can take Hadoop in new directions:

  • AtScale makes data stored in Hadoop’s file system accessible within popular Business Intelligence (BI) applications.
  • Microsoft is making it easier to work with Hadoop directly from Excel. Hortonworks has also published a straightforward tutorial on using Excel as a front end for drawing insights from Hadoop data.
  • Talend Open Studio for Big Data provides a friendly front end for easily working with Hadoop to mine large data sets.
  • Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed Hadoop applications, allowing developers to focus more on their application logic.
  • Kylin is an open source Distributed Analytics Engine designed to provide an SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop, supporting extremely large datasets.
  • Lens is a unified analytics platform. It provides a single view of data across multiple data stores and an optimal execution environment for analytical queries against that view.

Big Data System

Qunar's real-time stream processing system uses Apache Mesos for cluster management. Mesos manages Apache Spark, Flink, Logstash, and Kibana. Logs come from multiple sources and are consolidated with Kafka. The main computing frameworks, Spark Streaming and Flink, subscribe to the data in Kafka, process it, and then persist the results to HDFS (Hadoop Distributed File System).
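
To make the data flow concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Qunar's code and not the Kafka, Spark, Flink, or HDFS APIs: a list stands in for a Kafka topic, fixed-size slices stand in for micro-batches, and a dictionary keyed by path stands in for HDFS. All function names, the record format, and the `/logs/aggregates/...` path layout are assumptions made for illustration.

```python
from collections import Counter
from itertools import islice


def consolidate(sources):
    """Merge records from several log sources into one topic-like list.

    Stand-in for producers writing to a single Kafka topic (assumption).
    """
    topic = []
    for name, lines in sources.items():
        for line in lines:
            topic.append({"source": name, "line": line})
    return topic


def micro_batches(topic, batch_size):
    """Yield consecutive fixed-size batches, loosely like a streaming consumer."""
    it = iter(topic)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process(batch):
    """Count records per (source, level); the level is the first token of a line."""
    counts = Counter()
    for record in batch:
        level = record["line"].split(" ", 1)[0]
        counts[(record["source"], level)] += 1
    return counts


def persist(store, batch_id, counts):
    """Write one batch of results under an HDFS-style path (hypothetical layout)."""
    path = f"/logs/aggregates/batch={batch_id:05d}/part-00000"
    store[path] = sorted(f"{src}\t{lvl}\t{n}" for (src, lvl), n in counts.items())
    return path


def run_pipeline(sources, batch_size=3):
    """Consolidate, process in micro-batches, and persist; returns the fake store."""
    store = {}
    topic = consolidate(sources)
    for batch_id, batch in enumerate(micro_batches(topic, batch_size)):
        persist(store, batch_id, process(batch))
    return store
```

Calling `run_pipeline({"web": ["INFO a", "ERROR b"], "app": ["INFO c", "INFO d"]})` returns a dictionary with one entry per micro-batch. In a real deployment each stage would be a separate service scheduled by Mesos, and the consumer would track Kafka offsets rather than slicing a list.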

Written on September 20, 2016