Joins in Hadoop

I want to use Hadoop to perform operations on graph data. I have two files:

1) Edge list file
This file contains one line for each edge in the graph.
sample:

1  2 (here 1 is source and 2 is sink node for the edge)
1  5
2  3
4  2
4  3
5  6
5  4
5  7
7  8
8  9
8  10

2) Partition file
This file contains one line for each vertex. Each line has two values first number is and second number is
sample :

The edge list file is 32 GB, while the partition file is 10 GB. (The sizes are so large that a map/reduce task can read only the partition file. I have a 20-node cluster with 24 GB of memory per node.)

My aim is to get all vertices (along with their adjacency lists) that have the same partition id into one reducer, so that I can perform further analytics on a given partition in the reducer.
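
To make the goal concrete, here is a small in-memory sketch in plain Python (not Hadoop code) of the grouping I would like each reducer to receive. The edges are the ones from the sample above. The vertex-to-partition mapping is made up purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Edges from the sample above: (source, sink).
EDGES = [(1, 2), (1, 5), (2, 3), (4, 2), (4, 3), (5, 6),
         (5, 4), (5, 7), (7, 8), (8, 9), (8, 10)]

# Hypothetical vertex -> partition id mapping (invented for this sketch).
PARTITION = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2}


def adjacency_lists(edges):
    """Group sink nodes by their source node."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def group_by_partition(edges, partition):
    """Return {partition_id: {vertex: [sinks]}}, i.e. what one reducer per partition should see."""
    adj = adjacency_lists(edges)
    groups = defaultdict(dict)
    for vertex, pid in partition.items():
        # Vertices without outgoing edges get an empty adjacency list.
        groups[pid][vertex] = adj.get(vertex, [])
    return dict(groups)


if __name__ == "__main__":
    for pid, vertices in sorted(group_by_partition(EDGES, PARTITION).items()):
        print(pid, vertices)
```

In a real job the partition id would be the key emitted to the shuffle, and each inner dictionary corresponds to the values a single reducer call would iterate over.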

Is there any way in Hadoop to join these two files in the mapper, so that I can map based on the partition id?
