What is Apache Pig?

Apache Pig is a high-level platform for creating MapReduce programs that run on Hadoop. Its language, Pig Latin, abstracts the Java MapReduce idiom into a higher-level notation, much as SQL does for relational database systems.

Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:

  • Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
  • Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
  • Extensibility. Users can create their own functions to do special-purpose processing.

Pig runs on Apache Hadoop YARN and makes use of MapReduce and the Hadoop Distributed File System (HDFS). Whereas SQL is designed to query data, Pig Latin lets you write a data flow that describes how your data will be transformed, for example by aggregating, joining, and sorting.
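To make the data-flow idea concrete, here is a minimal sketch in plain Java (standard library only, not Pig itself) that mirrors a load, filter, group, aggregate, and sort sequence. The Pig Latin shown in the comments is only an illustrative equivalent; the relation names, the "user,bytes" record layout, and the file name logs.csv are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DataFlowSketch {

    // Each step produces a new collection from the previous one, much as each
    // Pig Latin statement defines a new relation from an earlier relation.
    static List<Map.Entry<String, Long>> totalBytesByUser(List<String> lines) {
        // logs = LOAD 'logs.csv' USING PigStorage(',') AS (user:chararray, bytes:long);
        List<String[]> logs = lines.stream()
                .map(line -> line.split(","))
                .collect(Collectors.toList());

        // positive = FILTER logs BY bytes > 0;
        List<String[]> positive = logs.stream()
                .filter(fields -> Long.parseLong(fields[1].trim()) > 0)
                .collect(Collectors.toList());

        // grouped = GROUP positive BY user;
        // totals  = FOREACH grouped GENERATE group AS user, SUM(positive.bytes) AS total;
        Map<String, Long> totals = positive.stream()
                .collect(Collectors.groupingBy(
                        fields -> fields[0].trim(),
                        Collectors.summingLong(fields -> Long.parseLong(fields[1].trim()))));

        // sorted = ORDER totals BY total DESC;
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical input records in "user,bytes" form.
        List<String> lines = Arrays.asList("alice,120", "bob,300", "alice,80", "carol,0");
        for (Map.Entry<String, Long> entry : totalBytesByUser(lines)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With the sample input, the program prints bob with a total of 300 followed by alice with a total of 200; carol is dropped by the filter step. In Pig, the same sequence of statements would be compiled into jobs that run in parallel on the cluster rather than in a single JVM.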

The user can run Pig in two modes, using either the “pig” command or the “java” command:

  • MapReduce Mode. This is the default mode, which requires access to a Hadoop cluster.
  • Local Mode. This mode needs access to only a single machine; all files are installed and run using the local host and local file system.


posted Aug 5, 2015 by anonymous



Related Articles

What is Apache Velocity?

Apache Velocity is a Java-based template engine that provides a template language to reference objects defined in Java code. The Velocity (Apache) developers and the Spring developers have publicly disagreed over why Velocity is not supported on the Spring framework.

Velocity is open source and is designed to be used as the view component in an MVC architecture, where it provides an alternative to existing technologies such as JSP.

Velocity can be used to generate XML files, SQL, PostScript and most other text-based formats.

The core class of Velocity is the VelocityEngine. It orchestrates the whole process of reading, parsing and generating content using the data model and the Velocity template.

Here are the steps to follow in a typical Velocity application:

  • Initialize the Velocity engine
  • Read the template
  • Put the data model in the context object
  • Merge the template with the context data and render the view
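The four steps above can be illustrated with a small stand-in written against the Java standard library only. This is not Velocity's API: the class TemplateFlowSketch and its merge method are hypothetical names invented for this sketch, and it handles only simple $name references (no property access, no directives). In Velocity itself these roles are played by VelocityEngine, Template, and VelocityContext.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFlowSketch {

    // Step 1 (initialize): the toy "engine" only needs a compiled pattern
    // that recognizes simple references such as $message.
    private static final Pattern REFERENCE =
            Pattern.compile("\\$([A-Za-z][A-Za-z0-9_]*)");

    // Step 4 (merge): replace each reference with its value from the context.
    static String merge(String template, Map<String, Object> context) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = context.get(matcher.group(1));
            // A reference with no value in the context is left as written.
            String replacement = (value == null) ? matcher.group() : value.toString();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Step 2 (read the template): a real application would usually load
        // this text from a template file; it is inlined here for brevity.
        String template = "Welcome to $message, $name!";

        // Step 3 (put the data model in the context object).
        Map<String, Object> context = new HashMap<>();
        context.put("message", "Query Home");
        context.put("name", "Sandeep Bedi");

        // Step 4 (merge and render the view).
        System.out.println(merge(template, context));
    }
}
```

Running main prints "Welcome to Query Home, Sandeep Bedi!". The point of the sketch is the separation of concerns: the template text and the data model are prepared independently and only meet at the merge step.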

Velocity Template Language (VTL) provides a simple and clean way of incorporating dynamic content in a web page by using VTL references.

A VTL reference in a Velocity template starts with $ and is used to get the value associated with that reference.

VTL also provides a set of directives, which can be used to manipulate the output of the Java code. These directives start with #.

Example

#set ($message = "Query Home")
#set ($customer.name = "Sandeep Bedi")

Video for Apache Velocity

https://www.youtube.com/watch?v=DnWe-QJHEzQ
