Apache Spark - GraphX - Interview Questions

What is Apache Spark GraphX?

Apache Spark GraphX is the Apache Spark component for graphs and graph-parallel computation. It lets you work with the same data both as a graph and as collections (RDDs).

GraphX implements a variety of graph algorithms and provides a flexible API to utilize the algorithms.
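The two views of the same data can be illustrated with a minimal sketch. It assumes Spark and the GraphX module are on the classpath and runs Spark in local mode; the object name, the toy graph and its values are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphViewsSketch {
  // Hypothetical toy graph: vertex property = user name, edge property = message count.
  def toyGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(2L, 3L, 7)))
    Graph(vertices, edges)
  }

  // Collection view: the edges are an RDD that can be filtered and counted.
  def busyEdgeCount(graph: Graph[String, Int]): Long =
    graph.edges.filter(e => e.attr > 5).count()

  // Graph view: each triplet joins an edge with the properties of both endpoints.
  def describe(graph: Graph[String, Int]): Array[String] =
    graph.triplets.map(t => s"${t.srcAttr} -> ${t.dstAttr} (${t.attr})").collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("graph-views")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)
      println(s"vertices: ${graph.vertices.count()}, busy edges: ${busyEdgeCount(graph)}")
      describe(graph).foreach(println)
    } finally sc.stop()
  }
}
```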

What are the different types of operators provided in the Apache Spark GraphX library?

Apache Spark GraphX provides the following types of operators - Property operators, Structural operators and Join operators.

Property Operators - Property operators modify the vertex or edge properties using a user-defined map function and produce a new graph.

Structural Operators - Structural operators operate on the structure of an input graph and produce a new graph.

Join Operators - Join operators add data from external collections (RDDs) to a graph and produce a new graph.

What are the property operators provided in the GraphX library?

Property operators modify the vertex or edge properties using a user-defined map function and produce a new graph.

Property operators do not change the graph structure, so the resulting graph can reuse the structural indices of the original graph.

Apache Spark GraphX provides the following property operators - mapVertices(), mapEdges() and mapTriplets().
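The three property operators can be sketched as follows. This is a minimal example assuming Spark with GraphX on the classpath and local mode; the object name, the toy graph and the transformations are hypothetical and only illustrate the call shapes.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object PropertyOperatorsSketch {
  // Hypothetical toy graph: vertex property = name, edge property = weight.
  def toyGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(2L, 3L, 7)))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("property-operators")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)

      // mapVertices: new vertex property computed from (vertex ID, old property).
      val upper: Graph[String, Int] = graph.mapVertices((_, name) => name.toUpperCase)

      // mapEdges: new edge property computed from the edge alone.
      val doubled: Graph[String, Int] = graph.mapEdges(e => e.attr * 2)

      // mapTriplets: new edge property that may also read both endpoint properties.
      val labelled: Graph[String, String] =
        graph.mapTriplets(t => s"${t.srcAttr}->${t.dstAttr}")

      upper.vertices.collect().foreach(println)
      doubled.edges.collect().foreach(println)
      labelled.edges.collect().foreach(println)
    } finally sc.stop()
  }
}
```

In each case the set of vertices and edges is unchanged; only the attached properties differ.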

What are the structural operators provided in the GraphX library?

Structural operators operate on the structure of the input graph and produce a new graph.

Apache Spark GraphX provides the following structural operators.

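reverse() - Returns a new graph with the direction of every edge reversed.

subgraph() - Returns the graph containing only the vertices and edges that satisfy the given predicates.

mask() - Returns the subgraph containing only the vertices and edges that are also found in another graph.

groupEdges() - Merges parallel edges (duplicate edges between the same pair of vertices) into a single edge.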
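A minimal sketch of the four structural operators, assuming Spark with GraphX on the classpath and local mode. The object name, the helper edgeTuples and the toy graph are hypothetical. Two details are worth noting: in the Scala API reverse is called without parentheses, and the graph is repartitioned with partitionBy before groupEdges, which the GraphX documentation asks for so that parallel edges end up in the same partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

object StructuralOperatorsSketch {
  // Hypothetical toy graph with two parallel edges from vertex 1 to vertex 2.
  def toyGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 4), Edge(2L, 3L, 2)))
    Graph(vertices, edges)
  }

  // Hypothetical helper: flatten edges to plain tuples for printing and comparison.
  def edgeTuples[VD](g: Graph[VD, Int]): Set[(Long, Long, Int)] =
    g.edges.map(e => (e.srcId, e.dstId, e.attr)).collect().toSet

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("structural-operators")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)

      // reverse: every edge 1 -> 2 becomes 2 -> 1.
      val reversed = graph.reverse

      // subgraph: drop vertex 3; edges touching a dropped vertex are dropped too.
      val withoutC = graph.subgraph(vpred = (id, _) => id != 3L)

      // mask: keep only the part of `lengths` that also exists in `withoutC`.
      // Both graphs are derived from the same original graph.
      val lengths = graph.mapVertices((_, name) => name.length)
      val maskedLengths = lengths.mask(withoutC)

      // groupEdges: merge the parallel 1 -> 2 edges by summing their weights.
      val merged = graph.partitionBy(PartitionStrategy.EdgePartition2D).groupEdges(_ + _)

      println(edgeTuples(reversed))
      println(edgeTuples(withoutC))
      println(maskedLengths.vertices.collect().toSet)
      println(edgeTuples(merged))
    } finally sc.stop()
  }
}
```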

What are the join operators provided in the GraphX library?

Join operators join data from external collections (RDDs) with graphs. Apache Spark GraphX provides the following join operators.

joinVertices() - The joinVertices() operator joins the input RDD data with the vertices and returns a new graph. The new vertex properties are obtained by applying the user-defined map function to each vertex that has a matching value in the RDD. Vertices without a matching value in the RDD retain their original value.

outerJoinVertices() - The outerJoinVertices() operator joins the input RDD data with the vertices and returns a new graph. The user-defined map function is applied to all vertices, including those that have no matching value in the input RDD, so it receives the RDD value as an Option and can change the vertex property type.
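The difference between the two join operators can be sketched as follows, assuming Spark with GraphX on the classpath and local mode. The object name, the score graph and the bonus RDD are hypothetical; vertex 3 deliberately has no entry in the RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object JoinOperatorsSketch {
  // Hypothetical toy graph: vertex property = base score.
  def toyGraph(sc: SparkContext): Graph[Int, Int] = {
    val vertices = sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("join-operators")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)
      // External data keyed by vertex ID; vertex 3 has no entry.
      val bonus = sc.parallelize(Seq((1L, 5), (2L, 7)))

      // joinVertices: the function runs only for matched vertices and must
      // return the same property type, so vertex 3 keeps its original score.
      val joined: Graph[Int, Int] = graph.joinVertices(bonus)((_, score, b) => score + b)

      // outerJoinVertices: the function runs for every vertex, sees an Option,
      // and may change the property type (Int to String here).
      val outer: Graph[String, Int] = graph.outerJoinVertices(bonus) { (_, score, bonusOpt) =>
        bonusOpt match {
          case Some(b) => s"$score+$b"
          case None    => s"$score (no bonus)"
        }
      }

      joined.vertices.collect().foreach(println)
      outer.vertices.collect().foreach(println)
    } finally sc.stop()
  }
}
```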

What are the neighborhood aggregation operations provided in the GraphX library?

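Apache Spark GraphX provides the following neighborhood aggregation operations.

aggregateMessages() - Applies a user-defined sendMsg function to each edge triplet in the graph and then uses a user-defined mergeMsg function to combine the messages received at each vertex.

mapReduceTriplets() - The legacy neighborhood aggregation operator from earlier GraphX versions, superseded by aggregateMessages().

collectNeighborIds() and collectNeighbors() - Collect the neighboring vertex IDs, or the neighboring vertices and their properties, at each vertex.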
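A minimal sketch of neighborhood aggregation, assuming Spark with GraphX on the classpath and local mode. The object name, the helper functions and the weighted toy graph are hypothetical. The legacy mapReduceTriplets() operator is not shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeDirection, Graph, VertexId, VertexRDD}

object NeighborhoodAggregationSketch {
  // Hypothetical toy graph: edge property = weight.
  def toyGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 5), Edge(1L, 3L, 2), Edge(2L, 3L, 7)))
    Graph(vertices, edges)
  }

  // aggregateMessages: each edge sends its weight to its destination vertex,
  // and weights arriving at the same vertex are summed.
  // Vertices that receive no message do not appear in the result.
  def incomingWeight(graph: Graph[String, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(ctx.attr), _ + _)

  // collectNeighborIds: IDs of adjacent vertices, ignoring edge direction.
  def neighborIds(graph: Graph[String, Int]): VertexRDD[Array[VertexId]] =
    graph.collectNeighborIds(EdgeDirection.Either)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("neighborhood-aggregation")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)
      incomingWeight(graph).collect().foreach(println)
      neighborIds(graph).collect().foreach { case (id, ids) =>
        println(s"$id: ${ids.mkString(",")}")
      }
    } finally sc.stop()
  }
}
```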

How do you build graphs from a collection of vertices and edges in an RDD using the GraphX library?

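Apache Spark GraphX provides several operations to build graphs from RDDs of vertices and edges, or from an edge list file.

GraphLoader.edgeListFile() - Loads a graph from a text file in which each line holds a source vertex ID and a destination vertex ID.

Graph.apply() - Builds a graph from an RDD of vertices and an RDD of edges.

Graph.fromEdges() - Builds a graph from an RDD of edges only, creating the vertices mentioned by the edges and assigning them a default value.

Graph.fromEdgeTuples() - Builds a graph from an RDD of (source ID, destination ID) tuples, assigning the edges the value 1.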
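The four builders can be compared side by side in a minimal sketch, assuming Spark with GraphX on the classpath and local mode. The object name, the helper writeEdgeListFile, the temporary file and all data values are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, GraphLoader}

object GraphBuildersSketch {
  // Hypothetical helper: writes an edge list ("srcId dstId" per line, "#" for
  // comments) to a temporary file and returns its path.
  def writeEdgeListFile(): String = {
    val path = Files.createTempFile("edges", ".txt")
    Files.write(path, "# src dst\n1 2\n2 3\n".getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("graph-builders")
    val sc = new SparkContext(conf)
    try {
      val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
      val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))

      // Graph.apply: vertex 3 appears only in an edge, so it gets the default property.
      val full: Graph[String, String] = Graph(vertices, edges, "unknown")

      // Graph.fromEdges: vertices are derived from the edges and all get the default.
      val fromEdges: Graph[Int, String] = Graph.fromEdges(edges, 0)

      // Graph.fromEdgeTuples: plain (src, dst) pairs; each edge carries the value 1.
      val pairs = sc.parallelize(Seq((1L, 2L), (2L, 3L)))
      val fromTuples: Graph[Int, Int] = Graph.fromEdgeTuples(pairs, 0)

      // GraphLoader.edgeListFile: reads the pairs from a text file.
      val fromFile: Graph[Int, Int] = GraphLoader.edgeListFile(sc, writeEdgeListFile())

      full.vertices.collect().foreach(println)
      println(s"fromEdges: ${fromEdges.numVertices} vertices, ${fromEdges.numEdges} edges")
      println(s"fromTuples: ${fromTuples.numVertices} vertices, ${fromTuples.numEdges} edges")
      println(s"fromFile: ${fromFile.numVertices} vertices, ${fromFile.numEdges} edges")
    } finally sc.stop()
  }
}
```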

What are the analytic algorithms provided in Apache Spark GraphX?

Apache Spark GraphX provides a set of algorithms to simplify analytics tasks.

PageRank - PageRank measures the importance of each vertex in a graph.

Connected Components - The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex.

Triangle Counting - A vertex is part of a triangle when it has two adjacent vertices with an edge between them. GraphX implements a triangle counting algorithm in the TriangleCount object that determines the number of triangles passing through each vertex, providing a measure of clustering.
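The three algorithms can be run on a small graph as a minimal sketch, assuming Spark with GraphX on the classpath and local mode. The object name, the toy graph and the PageRank tolerance of 0.0001 are hypothetical choices. The edges are listed with the source ID smaller than the destination ID and the graph is partitioned with partitionBy, which the GraphX documentation asks for before triangle counting.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy}

object GraphAlgorithmsSketch {
  // Hypothetical toy graph: a triangle 1-2-3 plus a separate edge 4-5.
  def toyGraph(sc: SparkContext): Graph[Int, Int] = {
    val pairs = sc.parallelize(Seq((1L, 2L), (2L, 3L), (1L, 3L), (4L, 5L)))
    Graph.fromEdgeTuples(pairs, defaultValue = 0)
      .partitionBy(PartitionStrategy.RandomVertexCut)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("graph-algorithms")
    val sc = new SparkContext(conf)
    try {
      val graph = toyGraph(sc)

      // PageRank: iterate until the ranks change by less than the tolerance.
      val ranks = graph.pageRank(0.0001).vertices

      // Connected components: each vertex is labelled with the lowest vertex ID
      // in its component.
      val components = graph.connectedComponents().vertices

      // Triangle counting: number of triangles passing through each vertex.
      val triangles = graph.triangleCount().vertices

      ranks.collect().foreach { case (id, rank) => println(s"rank($id) = $rank") }
      components.collect().foreach { case (id, cc) => println(s"component($id) = $cc") }
      triangles.collect().foreach { case (id, n) => println(s"triangles($id) = $n") }
    } finally sc.stop()
  }
}
```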