Apache Spark GraphX is the Apache Spark component for graphs and graph-parallel computation. It lets you work with the same data both as a graph and as collections (RDDs).
GraphX includes a library of common graph algorithms and provides a flexible API for applying them.
Apache Spark GraphX provides the following types of operators - Property operators, Structural operators and Join operators.
Property Operators - Property operators modify the vertex or edge properties using a user-defined map function and produce a new graph.
Structural Operators - Structural operators operate on the structure of an input graph and produce a new graph.
Join Operators - Join operators add data to graphs and produce a new graph.
Property operators change only vertex or edge properties and do not affect the graph structure, so the resulting graph reuses the structural indices of the original graph.
Apache Spark GraphX provides the following property operators - mapVertices(), mapEdges() and mapTriplets().
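The three property operators can be compared on one small graph. This is a minimal Scala sketch, assuming Spark 2.x or later with the spark-graphx module on the classpath and a local master; the object name PropertyOperatorsSketch and the user data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical example: vertex property = user name, edge property = interaction count.
object PropertyOperatorsSketch {
  def run(sc: SparkContext): (Seq[(VertexId, String)], Seq[Int], Seq[String]) = {
    val users = sc.parallelize(Seq[(VertexId, String)]((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val interactions = sc.parallelize(Seq(Edge(1L, 2L, 3), Edge(2L, 3L, 5)))
    val graph: Graph[String, Int] = Graph(users, interactions)

    // mapVertices: rewrite every vertex property; the graph structure is untouched.
    val upper = graph.mapVertices((id: VertexId, name: String) => name.toUpperCase)

    // mapEdges: rewrite every edge property using only the edge itself.
    val doubled = graph.mapEdges((e: Edge[Int]) => e.attr * 2)

    // mapTriplets: rewrite every edge property with access to both endpoint properties.
    val labelled = graph.mapTriplets((t: EdgeTriplet[String, Int]) => s"${t.srcAttr}->${t.dstAttr}")

    (upper.vertices.collect().sortBy(_._1).toSeq,
      doubled.edges.map(_.attr).collect().sorted.toSeq,
      labelled.edges.map(_.attr).collect().sorted.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("property-operators").setMaster("local[*]"))
    try {
      val (vertices, weights, labels) = run(sc)
      println(vertices)
      println(weights)
      println(labels)
    } finally sc.stop()
  }
}
```

mapTriplets is the one to reach for when the new edge property depends on the endpoint properties; otherwise mapEdges avoids joining vertex data onto the edges.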
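Structural operators change the structure of the input graph and produce a new graph.
Apache Spark GraphX provides the following structural operators.
reverse() - Returns a new graph with the direction of every edge reversed.
subgraph() - Returns a graph containing only the vertices that satisfy a vertex predicate and the edges that satisfy an edge predicate and connect vertices satisfying the vertex predicate.
mask() - Returns a graph containing only the vertices and edges that are also found in another input graph.
groupEdges() - Merges parallel edges (multiple edges between the same pair of vertices) into a single edge using a user-defined function.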
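The four structural operators can be exercised on one small graph. This is a minimal Scala sketch, assuming Spark 2.x or later with spark-graphx on the classpath; the object name StructuralOperatorsSketch, the age/weight data and the result keys are hypothetical. The mask() step follows a common pattern: run an algorithm on the full graph, then restrict the result to a subgraph.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical example: vertex property = age, edge property = weight.
object StructuralOperatorsSketch {
  def run(sc: SparkContext): Map[String, Long] = {
    val people = sc.parallelize(Seq[(VertexId, Int)]((1L, 30), (2L, 17), (3L, 45), (4L, 22)))
    val links = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(1L, 2L, 4), // two parallel edges from 1 to 2
      Edge(2L, 3L, 2), Edge(3L, 4L, 5), Edge(4L, 1L, 3)))
    val graph = Graph(people, links)

    // reverse: same vertices, every edge points the other way.
    val reversed = graph.reverse

    // subgraph: keep vertices aged 18+ and edges with weight > 1 between such vertices.
    val adults = graph.subgraph(
      epred = (t: EdgeTriplet[Int, Int]) => t.attr > 1,
      vpred = (id: VertexId, age: Int) => age >= 18)

    // mask: compute components on the full graph, then keep only the part present in `adults`.
    val maskedComponents = graph.connectedComponents().mask(adults)

    // groupEdges: sum the weights of parallel edges. partitionBy co-locates edges
    // with the same (srcId, dstId), which groupEdges relies on.
    val merged = graph.partitionBy(PartitionStrategy.EdgePartition2D).groupEdges((a, b) => a + b)

    Map(
      "reversedEdgesFrom2To1" -> reversed.edges.filter((e: Edge[Int]) => e.srcId == 2L && e.dstId == 1L).count(),
      "adultVertices" -> adults.vertices.count(),
      "adultEdges" -> adults.edges.count(),
      "maskedVertices" -> maskedComponents.vertices.count(),
      "maskedComponentIds" -> maskedComponents.vertices.map(_._2).distinct().count(),
      "mergedEdges" -> merged.edges.count())
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("structural-operators").setMaster("local[*]"))
    try run(sc).toSeq.sortBy(_._1).foreach(println) finally sc.stop()
  }
}
```

Vertex 2 (age 17) is removed by subgraph(), which also drops every edge touching it. The masked graph keeps the component ID computed on the full graph, so vertices 1, 3 and 4 still share component 1.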
Join operators join data from external collections (RDDs) with graphs. Apache Spark GraphX provides the following join operators.
joinVertices() - The joinVertices() operator joins the vertices with the input RDD and returns a new graph. The new vertex properties are computed by applying the user-defined map function to each vertex's original property and its matching RDD value. Vertices without a matching value in the RDD retain their original value.
outerJoinVertices() - The outerJoinVertices() operator also joins the vertices with the input RDD and returns a new graph, but it applies the user-defined map function to all vertices, including those not present in the input RDD. Because a vertex may have no matching value, the function receives the RDD value as an Option, and it may change the vertex property type.
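The difference between the two join operators shows up on a vertex that has no entry in the external RDD. This is a minimal Scala sketch, assuming Spark 2.x or later with spark-graphx on the classpath; the object name JoinOperatorsSketch and the score/bonus data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical example: vertex property = score, external RDD = bonus points.
object JoinOperatorsSketch {
  def run(sc: SparkContext): (Seq[(VertexId, Int)], Seq[(VertexId, String)]) = {
    val scores = sc.parallelize(Seq[(VertexId, Int)]((1L, 0), (2L, 0), (3L, 0)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "a"), Edge(2L, 3L, "b")))
    val graph = Graph(scores, edges)

    // External data that covers only vertices 1 and 2.
    val bonus = sc.parallelize(Seq[(VertexId, Int)]((1L, 10), (2L, 20)))

    // joinVertices: the property type stays Int; vertex 3 keeps its original value.
    val joined = graph.joinVertices(bonus)((id, score, extra) => score + extra)

    // outerJoinVertices: the function sees every vertex, gets the RDD value as an
    // Option, and may return a different property type (String here).
    val described = graph.outerJoinVertices(bonus) { (id, score, extraOpt) =>
      extraOpt match {
        case Some(extra) => s"bonus $extra"
        case None        => "no bonus"
      }
    }

    (joined.vertices.collect().sortBy(_._1).toSeq,
      described.vertices.collect().sortBy(_._1).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-operators").setMaster("local[*]"))
    try {
      val (joined, described) = run(sc)
      println(joined)
      println(described)
    } finally sc.stop()
  }
}
```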
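Apache Spark GraphX provides the following neighborhood aggregation operators.
aggregateMessages() - Applies a user-defined sendMsg function to each edge triplet to send messages to its source and/or destination vertex, then combines the messages arriving at each vertex with a user-defined mergeMsg function. The result is a VertexRDD.
mapReduceTriplets() - The operator used for neighborhood aggregation in earlier GraphX versions; it has been replaced by aggregateMessages().
collectNeighbors() - Collects, for each vertex, the IDs and properties of its neighboring vertices. This can be expensive because it duplicates neighbor information and requires substantial communication, so expressing the computation with aggregateMessages() is usually preferable.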
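A small "follows" graph illustrates aggregateMessages() and collectNeighbors(); the legacy mapReduceTriplets() is not shown. This is a minimal Scala sketch, assuming Spark 2.x or later with spark-graphx on the classpath; the object name NeighborhoodAggregationSketch and the age data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical example: vertex property = age; an edge u -> v means "u follows v".
object NeighborhoodAggregationSketch {
  def run(sc: SparkContext): (Seq[(VertexId, Int)], Seq[(VertexId, Seq[VertexId])]) = {
    val ages = sc.parallelize(Seq[(VertexId, Int)]((1L, 40), (2L, 25), (3L, 31), (4L, 18)))
    val follows = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(3L, 2L, 0), Edge(4L, 2L, 0), Edge(3L, 4L, 0)))
    val graph = Graph(ages, follows)

    // aggregateMessages: sendMsg runs once per edge triplet and sends the follower's age
    // to the followed vertex; mergeMsg keeps the maximum. Only the source property is
    // read, so TripletFields.Src lets GraphX skip shipping destination properties.
    val oldestFollower: VertexRDD[Int] = graph.aggregateMessages[Int](
      ctx => ctx.sendToDst(ctx.srcAttr),
      (a, b) => math.max(a, b),
      TripletFields.Src)

    // collectNeighbors: for each vertex, the (id, property) pairs of its neighbors
    // in either direction. Convenient, but it replicates neighbor data.
    val neighbors = graph.collectNeighbors(EdgeDirection.Either)

    (oldestFollower.collect().sortBy(_._1).toSeq,
      neighbors.collect().map { case (id, ns) => (id, ns.map(_._1).sorted.toSeq) }.sortBy(_._1).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("neighborhood-aggregation").setMaster("local[*]"))
    try {
      val (oldest, neighbors) = run(sc)
      println(oldest)
      println(neighbors)
    } finally sc.stop()
  }
}
```

Vertices 1 and 3 have no followers, so they are absent from the aggregateMessages() result, whereas collectNeighbors() returns an entry for every vertex.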
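Apache Spark GraphX provides several ways to build a graph from RDDs of vertices and edges or from an edge list file.
GraphLoader.edgeListFile() - Loads a graph from a text file in which each line contains a source vertex ID and a destination vertex ID; lines beginning with # are skipped.
Graph.apply() - Builds a graph from an RDD of vertices and an RDD of edges, assigning a default property to vertices that appear only in the edges.
Graph.fromEdges() - Builds a graph from an RDD of edges alone, creating every vertex the edges mention with a default property.
Graph.fromEdgeTuples() - Builds a graph from an RDD of (source ID, destination ID) pairs, setting every edge property to 1; duplicate edges can optionally be merged.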
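The four graph builders listed above can be compared by building the same small graph with each one and counting vertices and edges. This is a minimal Scala sketch, assuming Spark 2.x or later with spark-graphx on the classpath and a local file system; the object name GraphBuildersSketch, the helper counts and the temporary edge file are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object GraphBuildersSketch {
  // Hypothetical helper: (vertex count, edge count) of any graph.
  private def counts(g: Graph[_, _]): (Long, Long) = (g.vertices.count(), g.edges.count())

  def run(sc: SparkContext): Seq[(Long, Long)] = {
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "likes"), Edge(2L, 3L, "follows")))

    // Graph.apply, called as Graph(...): vertex 3 appears only in the edges,
    // so it receives the default property "unknown".
    val users = sc.parallelize(Seq[(VertexId, String)]((1L, "alice"), (2L, "bob")))
    val fromVerticesAndEdges = Graph(users, edges, "unknown")

    // Graph.fromEdges: every vertex mentioned by an edge gets the default property.
    val fromEdgesOnly = Graph.fromEdges(edges, "unknown")

    // Graph.fromEdgeTuples: raw (srcId, dstId) pairs; each edge property is set to 1.
    val pairs = sc.parallelize(Seq[(VertexId, VertexId)]((1L, 2L), (2L, 3L)))
    val fromPairs = Graph.fromEdgeTuples(pairs, defaultValue = 0)

    // GraphLoader.edgeListFile: one "srcId dstId" pair per line; '#' lines are skipped.
    val path = Files.createTempFile("graphx-edges", ".txt")
    path.toFile.deleteOnExit()
    Files.write(path, "# comment\n1 2\n2 3\n3 1\n".getBytes(StandardCharsets.UTF_8))
    val fromFile = GraphLoader.edgeListFile(sc, path.toString)

    Seq(counts(fromVerticesAndEdges), counts(fromEdgesOnly), counts(fromPairs), counts(fromFile))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graph-builders").setMaster("local[*]"))
    try run(sc).foreach(println) finally sc.stop()
  }
}
```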
Apache Spark GraphX provides a set of algorithms to simplify analytics tasks.
PageRank - PageRank measures the importance of each vertex in a graph, treating an edge from u to v as an endorsement of v's importance by u.
Connected Components - The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex.
Triangle Counting - A vertex is part of a triangle when it has two adjacent vertices with an edge between them. GraphX implements a triangle counting algorithm in the TriangleCount object that determines the number of triangles passing through each vertex, providing a measure of clustering.
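The three algorithms can be run on one small graph through their graph-level entry points. This is a minimal Scala sketch, assuming Spark 2.x or later with spark-graphx on the classpath; the object name GraphAlgorithmsSketch and the edge data are hypothetical. Edges are given with srcId < dstId and the graph is partitioned with partitionBy, which older GraphX versions require for triangle counting.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical graph: triangle 1-2-3, vertex 4 attached to 3, and a separate pair 5-6.
object GraphAlgorithmsSketch {
  def run(sc: SparkContext): (Seq[(VertexId, Double)], Seq[(VertexId, VertexId)], Seq[(VertexId, Int)]) = {
    val pairs = sc.parallelize(Seq[(VertexId, VertexId)]((1L, 2L), (2L, 3L), (1L, 3L), (3L, 4L), (5L, 6L)))
    val graph = Graph.fromEdgeTuples(pairs, defaultValue = 1)
      .partitionBy(PartitionStrategy.RandomVertexCut)

    // PageRank, iterated until rank changes fall below the tolerance 0.0001.
    val ranks = graph.pageRank(0.0001).vertices

    // Connected components: each vertex is labelled with the lowest vertex ID in its component.
    val components = graph.connectedComponents().vertices

    // Triangle counting: number of triangles passing through each vertex.
    val triangles = graph.triangleCount().vertices

    (ranks.collect().sortBy(_._1).toSeq,
      components.collect().sortBy(_._1).toSeq,
      triangles.collect().sortBy(_._1).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graph-algorithms").setMaster("local[*]"))
    try {
      val (ranks, components, triangles) = run(sc)
      println(ranks)
      println(components)
      println(triangles)
    } finally sc.stop()
  }
}
```

Vertex 4 ends up with the highest rank because it receives all of vertex 3's rank, and vertex 3 in turn collects rank from both 1 and 2. The exact rank values depend on the Spark version, so only their relative order is worth relying on.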