Sharded Formats

As the single-file formats are inherently sequential, we also provide ways to read graphs that have been sharded across multiple files.

Flat Sharded

This format assumes that all of the shards are in the same directory. It may not be the best choice if you have thousands of shards, as that many files in a single directory may stress your machine's parallel file system.

Format

The file structure for a graph named "foo" would look like:

.
|-- foo.0
|-- foo.1
`-- foo.metadata

This particular graph is sharded across two files, which are named "foo.XXX", where XXX is the shard number. The metadata file contains, in order, the number of vertices, the number of edges, and the number of shards:

$ cat foo.metadata
64 112 2
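To make the layout concrete, here is a minimal C++ sketch that parses the three metadata values and lists the shard file names that would be expected next to the metadata file. It is not part of the library: the names flat_metadata, parse_flat_metadata and flat_shard_names are hypothetical, and it assumes shards are numbered consecutively from 0, as in the "foo" example above.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the three values in a flat metadata file.
struct flat_metadata {
  std::size_t num_vertices = 0;
  std::size_t num_edges = 0;
  std::size_t num_shards = 0;
};

// Parse the contents of a metadata file: "#vertices #edges #shards".
inline flat_metadata parse_flat_metadata(const std::string& text) {
  std::istringstream in(text);
  flat_metadata md;
  if (!(in >> md.num_vertices >> md.num_edges >> md.num_shards)) {
    throw std::runtime_error("malformed metadata: expected three integers");
  }
  return md;
}

// Build the shard names "<prefix>.0" ... "<prefix>.<num_shards - 1>".
// Assumption: shards are numbered consecutively starting at 0.
inline std::vector<std::string> flat_shard_names(const std::string& prefix,
                                                 std::size_t num_shards) {
  std::vector<std::string> names;
  names.reserve(num_shards);
  for (std::size_t i = 0; i < num_shards; ++i) {
    names.push_back(prefix + "." + std::to_string(i));
  }
  return names;
}
```

Calling parse_flat_metadata on the contents of foo.metadata and passing the resulting shard count to flat_shard_names("foo", ...) gives a quick way to check that every shard is present before reading.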

Reading

Use the following function to read this kind of graph:

auto graph = sharded_graph_reader<graph_type>(metadata_filename, read_adj_list_line());

If the format for each line is an edge list, you can use read_edge_list_line instead of the adjacency list line reader.

Nested Sharded

If the number of shards is large (for example, when running on tens or hundreds of thousands of cores), it may be better to use the nested sharded format, which places files into a hierarchy of directories so that no single directory contains a large number of shards.

Format

The metadata file follows this format:

#vertices #edges #shards
dim_0
dim_1
...
dim_k

The #shards value is the total number of files that the graph was split into. Each dimension gives the number of files or directories at the corresponding level of the file system hierarchy.
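As an illustration of this layout, the following C++ sketch parses a nested metadata file into its header values and its list of dimensions. It is not the library's own parser: nested_metadata, parse_nested_metadata and leaf_capacity are hypothetical names, and the sketch only assumes that the values are whitespace-separated integers in the order shown above.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the contents of a nested metadata file.
struct nested_metadata {
  std::size_t num_vertices = 0;
  std::size_t num_edges = 0;
  std::size_t num_shards = 0;
  std::vector<std::size_t> dims;  // dim_0 ... dim_k, one per hierarchy level
};

// Parse "#vertices #edges #shards" followed by one dimension per line.
inline nested_metadata parse_nested_metadata(const std::string& text) {
  std::istringstream in(text);
  nested_metadata md;
  if (!(in >> md.num_vertices >> md.num_edges >> md.num_shards)) {
    throw std::runtime_error("malformed metadata: expected three integers");
  }
  std::size_t dim = 0;
  while (in >> dim) {
    md.dims.push_back(dim);
  }
  if (md.dims.empty()) {
    throw std::runtime_error("malformed metadata: expected at least one dimension");
  }
  return md;
}

// Product of all dimensions: the number of leaf slots in the hierarchy.
inline std::size_t leaf_capacity(const nested_metadata& md) {
  std::size_t capacity = 1;
  for (std::size_t dim : md.dims) {
    capacity *= dim;
  }
  return capacity;
}
```

Comparing leaf_capacity against num_shards is one possible sanity check; whether the two must be equal is not stated here, so the sketch computes the value without enforcing it.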

All of the lines from the original graph will be split into new files, each following the path format i_0/i_1/.../i_k, where each i_j is a single number. These numbers shall not exceed their corresponding dimension limit from the metadata file.

For example, the files could be organized as follows:

├── 0
│   ├── 0
│   ├── 1
...
│   ├── 10
├── 1
...
├── 10
├── foo.md

Each leaf of this tree is a shard file that contains a subset of the lines of the original file. In this example, 0/10 is a file that contains a part of the graph. The foo.md file is the metadata file.
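One way to picture the path format is as a mixed-radix number. The C++ sketch below converts a linear shard number into a path of the form i_0/i_1/.../i_k. It rests on two assumptions that are not confirmed by the format description: each index runs from 0 to its dimension minus one, and the last path component varies fastest. The function name shard_path is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a linear shard number into a relative path "i_0/i_1/.../i_k".
// Assumptions (not taken from the library): indices are 0-based, each
// index is smaller than its dimension, and the last component varies
// fastest (row-major order).
inline std::string shard_path(std::size_t shard,
                              const std::vector<std::size_t>& dims) {
  std::vector<std::size_t> index(dims.size(), 0);
  // Peel off one mixed-radix digit per level, starting at the deepest level.
  for (std::size_t level = dims.size(); level-- > 0;) {
    if (dims[level] == 0) {
      throw std::invalid_argument("dimension must be positive");
    }
    index[level] = shard % dims[level];
    shard /= dims[level];
  }
  if (shard != 0) {
    throw std::out_of_range("shard number exceeds the hierarchy capacity");
  }
  std::string path;
  for (std::size_t i = 0; i < index.size(); ++i) {
    if (i != 0) {
      path += '/';
    }
    path += std::to_string(index[i]);
  }
  return path;
}
```

With two hypothetical dimensions of 11 each, shard 10 maps to 0/10 and shard 11 maps to 1/0 under these assumptions. The actual ordering used by the reader may differ.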

Reading

Use the following function to read this kind of graph:

auto graph = nested_sharded_graph_reader<graph_type>(metadata_filename, read_adj_list_line());

If the format for each line is an edge list, you can use read_edge_list_line instead of the adjacency list line reader.
