Royal Caliber Consulting for High Performance Computing

VertexAPI2: User Guide

VertexAPI2 is a CUDA/C++ library for the rapid development of high performance GPU programs that process large graphs. This page provides detailed information on working with the VertexAPI2 library. For a basic introduction to the GAS abstraction and other concepts, please first read the Introduction.

Obtaining and Building

To use VertexAPI2 and build the examples, you will need a working installation of CUDA-5.5 or higher and a Kepler or higher GPU. You will also need a recent copy of the moderngpu library. You will need the GNU make program and a unix-ish shell for the following commands to work.

\$ git clone https://github.com/NVlabs/moderngpu \$ git clone https://github.com/RoyalCaliber/vertexAPI2.git \$ cp vertexAPI2/Makefile.moderngpu moderngpu/Makefile \$ cd moderngpu \$ make \$ cd ../vertexAPI2 \$ make

Running Regression Tests

VertexAPI2 comes with a number of correctness tests. We use Graphlab to generate reference results and for benchmarking. You will need to separately obtain and install Graphlab separately. Then continuing from the previous commands, the following will run regressions:

\$ cd PowerGraphReferenceImplementations #now edit and change GRAPHLAB_DIR in the Makefile to point #to your installation of graphlabapi \$ vi Makefile \$ make \$ cd ../test-graphs \$ make \$ cd ../regressions \$ make

Anatomy Of A Vertex Program

A vertex program defines the state stored at each vertex and at each edge, as well as the gather function $g$, the gather reduction operator $\oplus$, the apply function $f$ and the scatter function $h$ (see the Introduction for the definition of these functions in the GAS context).

A skeleton vertex program is shown below. Each part is discussed in detail below.

//Skeleton Vertex Program struct Program { //declare per-vertex state here struct VertexData {}; //declare per-edge state here struct EdgeData {}; //define the type for the result of the gather phase typedef float GatherResult; //define the identity for the gather reduction operator static const GatherResult gatherZero = 0.0f; //whether gatherReduce is commutative enum { Commutative = true }; __host__ __device__ static GatherResult gatherMap(const VertexData* dst, const VertexData* src, const EdgeData* edge) { } __host__ __device__ static GatherResult gatherReduce(const GatherResult& left, const GatherResult& right) { } __host__ __device__ static bool apply(VertexData* vertexData, const GatherResult& gatherResult) { } __host__ __device__ static void scatter(const VertexData* src, const VertexData *dst, EdgeData* edge) { } };

Declaring Graph State

The Program::VertexData and Program::EdgeData types declare the per-vertex and per-edge state respectively. Typical members are native data types like int, float, etc. Presently variable length data structures and large arrays are not supported within the vertex or edge state structures. (They should still work correctly if the templates compile, but performance may be very poor). To run graph algorithms on graphs with rich data, the currently recommended work-flow is to first project down to a graph which has only the variables relevant to the specific algorithms that you wish to run with VertexAPI2.

Defining the Gather Phase

The gather function $g$ is codified by the prototype of gatherMap and the gather reduction operator $\oplus$ is codified by the gatherReduce prototype. The gatherMap function is invoked for every active edge (i.e. an edge whose destination is active) in the graph. Note that gatherMap takes const pointers to the graph, since no modifications are allowed in the gather phase. The values returned by gatherMap are aggregated using gatherReduce. Both gatherMap and gatherReduce return the same type, a variable of type Program::GatherResult. The GatherResult type is again typically a native scalar type like int or float. If the enumeration Commutative is false, then the reduction is done in the same order as the input graph, otherwise it may be performed in any order. The constant gatherZero must be the identity object for the gatherReduce operator, i.e. the following identity must hold for any value of foo.

gatherReduce(gatherZero, foo) == foo

Defining the Apply Phase

The apply function is invoked for each active vertex in the graph and receives the result of the gather phase in the gatherResult argument and may modify the vertex state through the vertexData pointer. The return value indicates whether outgoing vertices of this vertex should be active in the next iteration. Typically the apply function will update the state and compare to the previous state and will return true if the difference is larger than some threshold. Note that outgoing vertices do not get activated for the scatter phase of the current iteration.

Defining the Scatter Phase

The scatter function is invoked for each outgoing edge of each active vertex and allows you to modify the state of the outgoing edges through the edgeData pointer.

Example: PageRank

Here is an example of the use of the pattern described above to compute the Page Rank for each vertex of a graph. Please see the Introduction for a more verbose description of the algorithm itself. The code listed here should be hopefully now be self-explanatory.

//Vertex program for Pagerank struct PageRank { static const float pageConst = 0.15f; static const float tol = 0.01f; struct VertexData { float rank; int numOutEdges; }; struct EdgeData {}; typedef float GatherResult; static const float gatherZero = 0.0f; enum { Commutative = true }; __host__ __device__ static float gatherMap(const VertexData* dst, const VertexData* src, const EdgeData* edge) { //this division is being done too many times right? //should just store the normalized value in apply? return src->rank / src->numOutEdges; } __host__ __device__ static float gatherReduce(const float& left, const float& right) { return left + right; } __host__ __device__ static bool apply(VertexData* vertexData, const float& gatherResult) { float newRank = pageConst + (1.0f - pageConst) * gatherResult; bool ret = fabs(newRank - vertexData->rank) >= tol; vertexData->rank = newRank; return ret; } __host__ __device__ static void scatter(const VertexData* src, const VertexData *dst, EdgeData* edge) { //nothing } };

The snippet above is from vertexAPI2/pagerank.cu in the source distribution. If you compiled the example code and utility programs, you can run the example program with

$ ./pagerank [-t] [-d] input.mtx input.deg [output]

The input is expected in matrix-market (.mtx) format. The .deg file above is a listing of the out-degrees of each vertex. You can generate a .deg file with

$ ./outdeg foo.mtx foo.deg

The -d option will dump output to stdout and the -t option will run an internal test to make sure that the results on the GPU match the results on a simple CPU reference implementation.

Current Limitations and Known Issues

Graph topology modification is not permitted in this API. This may be added at some point if there are good reasons to do so.

Vector and/or variable-length data on edges or vertices is not supported at this time.

Scatter is disabled in the current version of the code on master branch. This will be enabled when we have completed some further development on the gather implementation

The code will not work correctly on large graphs on Fermi-class devices. This is a direct consequence of a limitation in the moderngpu library.