Oozie

Oozie is a workflow scheduler system for managing Hadoop jobs. It can combine multiple jobs, run in sequence, into one logical unit of work. It supports MapReduce, Pig, Hive, Sqoop, and other Hadoop job types, and it can also schedule system-specific jobs such as Java programs or shell scripts.

Oozie is typically used by data developers to build complex data transformations out of multiple component tasks. Grouping tasks this way gives greater control over the jobs and makes it easier to repeat them at predetermined intervals.

There are two basic types of Oozie jobs:

  1. Oozie Workflow - Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute. Each action waits for the actions it depends on to complete before it starts.

  2. Oozie Coordinator - recurrent Oozie Workflow jobs that are triggered by time and data availability.
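
The two job types can be pictured with a small Python sketch. This is only a toy model and not the Oozie API: the function names run_workflow and ready_runs, the action names, and the data_available callback are all hypothetical, chosen for illustration. The first function runs actions in dependency order, as a Workflow DAG would. The second lists the scheduled times at which a recurrent job has its input data, which is a simplified view of how a Coordinator combines time and data triggers.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(dependencies, actions):
    """Run each action only after every action it depends on has finished.

    dependencies maps an action name to the set of action names that must
    complete first (hypothetical structure, not an Oozie format).
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        actions[name]()  # blocks until this action returns
        completed.append(name)
    return completed


def ready_runs(start, end, frequency, data_available):
    """List the scheduled times in [start, end) whose input data is present.

    Simplification: a time whose data is missing is just left out here.
    """
    runs = []
    t = start
    while t < end:
        if data_available(t):
            runs.append(t)
        t += frequency
    return runs


# A three-step workflow: extract, then transform, then load.
log = []
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
actions = {name: (lambda n=name: log.append(n)) for name in deps}
order = run_workflow(deps, actions)

# A daily schedule over three days, with no input data on the second day.
runs = ready_runs(
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 4),
    frequency=timedelta(days=1),
    data_available=lambda t: t.day != 2,
)
```

In this sketch the workflow order is fixed by the dependency chain alone, and the daily schedule yields a run only for the days on which the assumed data check passes.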

For more details, visit the Oozie page on the Apache Hadoop website.
