Supersteps and Active Vertex Sets

The computation in SGL proceeds iteratively, in supersteps. Each superstep executes over a subset of the graph’s vertices (called the active vertex set). This set consists of vertices that were marked active by the algorithm (we will show in the next section how vertices can be marked active). For the initial superstep, all the graph’s vertices are considered active.

For each superstep, SGL executes the vertex operator on all active vertices. For each vertex (figure-1), the vertex operator initiates neighbor visitations. SGL takes care of updating its neighboring vertices with the result of the computation by applying the neighbor operators to the target vertices as needed. At this point, the neighboring vertices may become active due to the application of the neighbor operator. This happens iteratively until there are no active vertices, at which point the algorithm terminates.

We will now go over how to implement these operators, using the running example of a breadth-first traversal (BFS) algorithm.

The Vertex Operator

The vertex operator describes the computation to be performed on each vertex. We will assume for now that each vertex stores a value that can be read or modified. The vertex also has access to its list of outgoing edges. The purpose of the vertex operator is to read this stored value, perform the computation required for the algorithm, and produce a new value which will be propagated to the neighbors of this vertex. We start with the template of this operator:

struct VertexOp {
  using result_type = bool;
  template<typename Vertex, typename GraphVisitor>
  bool operator()(Vertex v, GraphVisitor graph_visitor) {
      // Computation for BFS goes here.
  }
};

This is the template for all vertex operators in SGL. Only the computation performed inside the operator() differs between algorithms. There are a few points to note in this template. The first, is that we define a result_type, which is always of type bool. This is also the type returned by the operator(), and is used to decide the termination condition of the algorithm -- if all operators return false, the computation terminates, while if any operator returns true, the computation proceeds to the next iteration.

Next, the operator() itself takes two arguments, a vertex v on which the computation is to be performed, and a GraphVisitor object. The GraphVisitor object contains methods to allow visiting neighboring vertices, and is provided by SGL to all vertex operators it invokes.

A Breadth-First Search (BFS) traversal starts with a ‘source’ vertex s, and visits all its neighbor vertices first, before iteratively visiting the neighbors’ neighbors and so on. On each visit, the algorithm stores the vertex’s distance from the source. To perform a BFS traversal in our paradigm, we need to ‘think like a vertex’, i.e., the operator must decide what to do on a single vertex ‘v’ which, when repeatedly applied on the vertices of the graph, will execute the traversal.

In a sequential BFS, there is a global queue that contains vertices to be processed. We pop a vertex from the queue, and first check if the vertex is active (i.e., if it is marked as ‘GREY’ colored). If so, we set the distance of all its neighbors to v’s distance+1, and add them to the queue. We keep iterating until the queue is empty.

In SGL, we can apply a similar logic, except here, the ‘queue’ is globally distributed, unordered, and handled by the SGL framework. Furthermore, SGL splits the handling vertices and their neighbors into two parts (one to check if vertex is active and obtain v’s distance (vertex-operator), and the other to update the distances of v’s neighbors with distance+1). Figure-2 shows the code for the vertex operator for BFS.

struct BFSVertexOp {
    using result_type = bool;
  template<typename Vertex, typename GraphVisitor>
  bool operator()(Vertex v, GraphVisitor graph_visitor) {
    if (v.property().active()) {
      v.property().active(false);
      int distance = v.property().distance();
      graph_visitor.visit_all_edges(v, BFSNeighborOp(v.descriptor(), distance+1));
      return true;
    }
    return false;
  }
};

In the operator above, the code required to implement the vertex operator for BFS is highlighted in blue. The operator first checks if the vertex is ‘active’, and gets the distance of the vertex. We assume for now that the vertex object has a ‘property’ stored on it that contains its distance from source and whether or not it is active. The operator then uses the GraphVisitor's visit_all_edges() method to apply the neighbor operator to all neighbors of vertex v. The current vertex’s descriptor (which is its unique identifier) is stored in the neighbor operator, along with the new distance. The descriptor will be used to set the ‘parent’ vertex of the neighbor for the traversal, while the distance will be the neighbor’s distance from source.

Finally, the operator returns true if the vertex was processed (or it sent out updates for its neighbors), or false otherwise. This is used for termination-detection of the algorithm by SGL -- when all vertex operators return false, the algorithm terminates. If even one vertex operator returns true, SGL starts another iteration.

Technical Specification of the Vertex-Operator

Must return a boolean indicating if the vertex was updated.
Can spawn visits on other vertices via the GraphVisitor::visit/visit_all_edges methods.
Can only directly access/manipulate the given vertex on which it is applied, as well as its edges.

The Neighbor Operator

The neighbor operator carries the result of the computation done in the vertex operator, and updates the neighbor vertices with that result. The operator is applied automatically by SGL to all vertices that are specified to be visited via the GraphVisitor in the vertex operator. In the BFS case, the vertex operator calls graph_visitor.visit_all_edges(v, BFSNeighborOp(...)), which applies the specified operator, along with the arguments passed to it, to all the neighbors of vertex v. The visitor also has a method to specify and visit individual vertices: graph_visitor.visit(vertex_descriptor, NeighborOp(...)), which can be used to visit an arbitrary (valid) vertex in the graph.

The neighbor operator for BFS looks like this:

class BFSNeighborOp {
  // Data members that are payloads to update target.
  size_t m_parent, m_level;

public:
  // Constructor to initialize payload.
  NeighborOp(size_t parent = 0, size_t level = 0)
    : m_parent(parent), m_level(level)
  {}

  using result_type = bool;
  template <typename Vertex>
  result_type operator()(Vertex target) const {
      // Update target vertex’s value if needed.
    if (target.property().level() > m_level) {
      target.property().level(m_level);
      target.property().parent(m_parent);
      target.property().active(true);
      return true;  // Indicates target was updated.
    }
    return false;  // Else ignore.
  }

  // Serialization of data members/payload.
  void define_type(typer& t) {
    t.member(m_parent);
    t.member(m_level);
  }
};

Figure-3: The BFS neighbor operator.

NOTE: An important point to note about the neighbor operator is that it should be idempotent to the algorithm, i.e., repeated application of the operator to a vertex should still produce the correct end-result. Notice, for example, that the BFS neighbor operator checks if the vertex’s property is greater than the incoming distance, and only sets it if this is true. Regardless of how many times it is applied on a vertex, even with different values, and in any order, the end result is still the minimum of all the incoming values. If the operator cannot be made idempotent, then care should be taken to only run the algorithm in the BSP (level-synchronous) execution mode.

Technical Specification of the Neighbor-Operator

Must be idempotent (refer to note above) for non-BSP/non-level-synchronous paradigm.
Must return a boolean indicating if the vertex was updated.
Can not spawn visits on other vertices.
Can only access/manipulate the given vertex on which it is applied.

Another Example -- Single-Source Shortest Paths

For the SSSP algorithm, the only difference from BFS is that the edges have weights, and the distance between two neighboring vertices is the weight of the edge connecting them, rather than 1. Therefore, the only modification needed is in the Vertex operator. Since we need to visit every neighboring vertex with a different distance, we use a different method of the graph_visitor that allows us to visit individual vertices. Below, we present the updated vertex operator for SSSP:

struct SSSPVertexOp {
    using result_type = bool;
  template<class Vertex, class GraphVisitor>
  result_type operator()(Vertex v, GraphVisitor graph_visitor) {
    if (v.property().active()) {
      int distance = v.property().distance();
      v.property().active(false);
      for (const auto& e : v.edges()) {
        graph_visitor.visit(e.target(), BFSNeighborOp(v.descriptor(), distance+e.property()));
      }
      return true;
    }
    return false;
  }
};

Figure-4: The SSSP vertex operator, with SSSP-specific code highlighted in blue.

The Initializer

Once we have the vertex and neighbor operators, before we apply them to the graph, we need to initialize the vertex/edge values as appropriate for the algorithm In the case of traversals, we can assign a distance of zero to the source vertex, and a distance of INT_MAX to all other vertices. Similarly, we can mark the source vertex as active, while all others are inactive. All vertices start out being their own parent initially. This can be done using an initializer functor as shown below:

class BFSInit {
    size_t m_source;
public:
  BFSInit(size_t source) : m_source(source) {}

  using result_type = void;
  template<typename Vertex>
  result_type operator()(Vertex v) {
    if (v.descriptor() == m_source) {
      v.property().distance(0);
      v.property().active(true);
    } else {
      v.property().distance(INT_MAX);
      v.property().active(false);
    }
    v.property().parent(v.descriptor());
  }
};

Figure-5: The BFS/SSSP intializer.

Technical Specification of the Initializer

Operates over vertices and their edges. Can directly manipulate the vertex and its edges.
May spawn visits on other vertices (if using graph_paradigm), or may not (if using map_func / map_reduce).
The initialization step can also be a graph traversal, if needed.

SGL Model