When importing data with Flume, you may want to route Flume events to multiple destinations (for example, different directories in HDFS) based on their content. Flume provides a multiplexing channel selector for exactly this purpose; this article is a guide to its configuration.
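As a quick preview, the multiplexing selector is configured on the source. A minimal sketch might look like the following; the agent name `a1`, the header name `type`, and the mapping values are illustrative placeholders, not taken from the article:

```properties
# Hypothetical agent "a1": route events to c1 or c2 based on the "type" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.order = c1
a1.sources.r1.selector.mapping.invoice = c2
# Events with any other (or no) "type" header value fall back to c1
a1.sources.r1.selector.default = c1
```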
Flume ships with a built-in HDFS sink. Importing data into Hive is almost the same as saving data to HDFS directories, with one small difference. This is a guide to the Flume configuration and the corresponding HiveQL needed to load the data into a table.
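As a rough sketch of the idea (the agent name, paths, and table definition below are assumptions for illustration), the HDFS sink writes events into a directory, and a Hive external table is then pointed at that directory:

```properties
# Hypothetical agent "a1" persisting events as plain text under /flume/events
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = DataStream
```

```sql
-- Hypothetical HiveQL: expose the files written by the sink as a queryable table
CREATE EXTERNAL TABLE flume_events (line STRING)
LOCATION '/flume/events';
```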
Flume is an open-source Apache project: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. This article shows how to import XML files with Flume, including the development of a deserializer plugin and the corresponding Flume configuration. We are using Flume 1.5.0 as integrated in MapR.
The scenario is that XML files are synchronized to a directory periodically; we need to configure a Spooling Directory Source to load these XML files into Flume.
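A minimal Spooling Directory Source configuration for this scenario could look like the following (the agent name and the directory path are placeholders):

```properties
# Hypothetical agent "a1" watching a directory for incoming XML files
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /data/incoming/xml
# Fully ingested files are renamed with this suffix so they are not re-read
a1.sources.r1.fileSuffix = .COMPLETED
```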
The default deserializer of Flume's Spooling Directory Source is LineDeserializer, which simply parses each line as a Flume event. In our case, we need to implement a custom deserializer that splits the files into events based on their XML structure.
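The skeleton below is a minimal sketch of what such a plugin involves, not the article's actual implementation: a class implementing Flume's `EventDeserializer` interface plus the nested `Builder` that Flume uses to instantiate it. The class name `XmlDeserializer` and the `readNextRecord()` helper are hypothetical, and the XML-specific parsing is left as a stub.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.serialization.EventDeserializer;
import org.apache.flume.serialization.ResettableInputStream;

/**
 * Hypothetical skeleton of an XML deserializer. The real parsing logic
 * depends on your XML structure; readNextRecord() is a stub where you
 * would extract one logical record (e.g. with a StAX parser).
 */
public class XmlDeserializer implements EventDeserializer {

  private final ResettableInputStream in;
  private volatile boolean isOpen;

  XmlDeserializer(Context context, ResettableInputStream in) {
    this.in = in;       // context could carry custom parser settings
    this.isOpen = true;
  }

  @Override
  public Event readEvent() throws IOException {
    String record = readNextRecord(); // assumption: your XML-chunking logic
    return record == null ? null
        : EventBuilder.withBody(record.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public List<Event> readEvents(int numEvents) throws IOException {
    List<Event> events = new ArrayList<Event>();
    for (int i = 0; i < numEvents; i++) {
      Event e = readEvent();
      if (e == null) break; // end of file reached
      events.add(e);
    }
    return events;
  }

  @Override
  public void mark() throws IOException { in.mark(); }

  @Override
  public void reset() throws IOException { in.reset(); }

  @Override
  public void close() throws IOException {
    if (isOpen) { in.close(); isOpen = false; }
  }

  // Placeholder: pull the next logical XML record out of the stream,
  // or return null when the file is exhausted.
  private String readNextRecord() throws IOException {
    return null;
  }

  /** Builder required by Flume to instantiate the deserializer. */
  public static class Builder implements EventDeserializer.Builder {
    @Override
    public EventDeserializer build(Context context, ResettableInputStream in) {
      return new XmlDeserializer(context, in);
    }
  }
}
```

It would then be wired into the source configuration via something like `a1.sources.r1.deserializer = com.example.XmlDeserializer$Builder` (a hypothetical fully qualified class name).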
If you need a single-node MapR cluster and cannot use the official MapR sandbox image, you can use this guide to install MapR on a CentOS machine.