Joins in Hadoop

I want to use Hadoop to perform operations on graph data. I have two files:

1) Edge list file
This file contains one line for each edge in the graph.
Sample:

1  2 (here 1 is the source node and 2 is the sink node of the edge)
1  5
2  3
4  2
4  3
5  6
5  4
5  7
7  8
8  9
8  10
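A minimal sketch, simulated in plain Python rather than with the Hadoop API, of how this edge list becomes adjacency lists: the map step emits `(source, sink)` for each line, the shuffle groups the pairs by source, and the reduce step collects the sinks. The helper names are hypothetical.

```python
from collections import defaultdict

# Sample edge list from the question: "source sink" per line.
EDGE_LINES = ["1  2", "1  5", "2  3", "4  2", "4  3", "5  6",
              "5  4", "5  7", "7  8", "8  9", "8  10"]


def map_edge(line):
    """Map step (hypothetical): emit (source, sink) for one edge line."""
    src, dst = line.split()
    yield int(src), int(dst)


def build_adjacency(lines):
    """Simulated shuffle + reduce: group sinks by their source vertex."""
    adjacency = defaultdict(list)
    for line in lines:
        for src, dst in map_edge(line):
            adjacency[src].append(dst)
    return dict(adjacency)


if __name__ == "__main__":
    for vertex, neighbours in sorted(build_adjacency(EDGE_LINES).items()):
        print(vertex, neighbours)
```

Vertices that only appear as sinks (3, 6, 9 and 10 in the sample) get no entry, because they have no outgoing edges.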

2) Partition file
This file contains one line for each vertex. Each line has two values: the first number is the vertex id and the second is the partition id.
Sample:

2  1
3  1
4  1
5  2
6  2
7  2
8  1
9  1
10  1
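Assuming, as the goal stated in the question implies, that the first column is the vertex id and the second the partition id, the partition sample can be read into a vertex-to-partition mapping and inverted to show which vertices each partition (and so each reducer) should end up with. This is a plain-Python sketch; the function names are hypothetical.

```python
from collections import defaultdict

# Sample partition file from the question: "vertex partition" per line.
PARTITION_LINES = ["2  1", "3  1", "4  1", "5  2", "6  2",
                   "7  2", "8  1", "9  1", "10  1"]


def parse_partitions(lines):
    """Return {vertex_id: partition_id} parsed from "vertex partition" lines."""
    mapping = {}
    for line in lines:
        vertex, part = line.split()
        mapping[int(vertex)] = int(part)
    return mapping


def vertices_by_partition(mapping):
    """Invert the mapping: {partition_id: [vertex_id, ...]} in vertex order."""
    groups = defaultdict(list)
    for vertex, part in sorted(mapping.items()):
        groups[part].append(vertex)
    return dict(groups)


if __name__ == "__main__":
    print(vertices_by_partition(parse_partitions(PARTITION_LINES)))
```

Vertex 1 from the edge sample has no line in this sample, so it belongs to no partition.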

The edge list file is 32 GB and the partition file is 10 GB. (The files are so large that map/reduce can read only the partition file. I have a 20-node cluster with 24 GB of memory per node.)

My aim is to get all vertices that have the same partition id, along with their adjacency lists, into one reducer so that I can perform further analytics on each partition in the reducer.

Is there any way in Hadoop to join these two files in the mapper so that I can map based on the partition id?
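One common way to get this grouping without holding either file in memory is a reduce-side (repartition) join spread over two MapReduce jobs. The sketch below simulates it in plain Python rather than with the Hadoop API, so it only shows the data flow: job 1 keys records from both files by vertex id, tagging each value with its source file ("E" for an edge, "P" for a partition id), and its reducer emits `(partition id, (vertex, adjacency list))`; job 2 keys those records by partition id so that each partition lands in one reduce call. The tags and function names are hypothetical. In real Hadoop code the two inputs would typically get separate mapper classes via `MultipleInputs.addInputPath`.

```python
from collections import defaultdict

# Sample data from the question ("source sink" and "vertex partition" per line).
EDGE_LINES = ["1 2", "1 5", "2 3", "4 2", "4 3", "5 6",
              "5 4", "5 7", "7 8", "8 9", "8 10"]
PARTITION_LINES = ["2 1", "3 1", "4 1", "5 2", "6 2",
                   "7 2", "8 1", "9 1", "10 1"]


def shuffle(pairs):
    """Simulate the MapReduce shuffle: group values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


# ---- Job 1: reduce-side join keyed by vertex id ----
def map_edges(lines):
    """Mapper for the edge file: tag each sink with "E", keyed by source vertex."""
    for line in lines:
        src, dst = line.split()
        yield int(src), ("E", int(dst))


def map_partitions(lines):
    """Mapper for the partition file: tag the partition id with "P", keyed by vertex."""
    for line in lines:
        vertex, part = line.split()
        yield int(vertex), ("P", int(part))


def reduce_join(vertex, tagged_values):
    """Reducer: attach the vertex's partition id to its adjacency list."""
    partition = None
    neighbours = []
    for tag, value in tagged_values:
        if tag == "P":
            partition = value
        else:
            neighbours.append(value)
    # Vertices with no line in the partition file (vertex 1 in the sample) are dropped.
    if partition is not None:
        # Hadoop does not guarantee value order, so sort for a stable adjacency list.
        yield partition, (vertex, sorted(neighbours))


# ---- Job 2: identity map, shuffle by partition id ----
def partition_graph(edge_lines, partition_lines):
    """Return {partition_id: [(vertex, adjacency_list), ...]}."""
    job1_map_out = list(map_edges(edge_lines)) + list(map_partitions(partition_lines))
    joined = []
    for vertex, values in shuffle(job1_map_out):
        joined.extend(reduce_join(vertex, values))
    # Each key of this shuffle is what one reduce() call in job 2 would receive.
    return dict(shuffle(joined))


if __name__ == "__main__":
    for part, vertices in partition_graph(EDGE_LINES, PARTITION_LINES).items():
        print(part, vertices)
```

Job 1's reducer only sees one vertex's records at a time, so neither the 32 GB nor the 10 GB file has to fit in memory; the cost is shuffling every edge record. Job 2 still delivers each whole partition to a single reduce call, so a very large partition can become a bottleneck for the later analytics.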

posted Jun 24, 2015 by anonymous

...