Sharded Formats
As the single-file formats are inherently sequential, we also provide ways to read graphs that have been sharded across multiple files.
Flat Sharded
This format assumes that all of the shards are in the same directory. It may not be the best choice if you have thousands of shards, as that many files in a single directory may stress your machine's parallel file system.
Format
The file structure for a graph named "foo" would look like:
.
|-- foo.0
|-- foo.1
`-- foo.metadata
This particular graph is sharded across two files, named "foo.XXX", where XXX is the shard index. The metadata file contains three values describing the graph: the number of vertices, the number of edges, and the number of shards:
$ cat foo.metadata
64 112 2
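The metadata layout above can be sketched in C++ as follows. This is only an illustration and not part of the library: `FlatMetadata`, `parse_flat_metadata`, and `flat_shard_paths` are hypothetical names, and the sketch assumes that the shards sit next to the metadata file, share its prefix, and are numbered from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the three values in a flat metadata file.
struct FlatMetadata {
  std::size_t num_vertices = 0;
  std::size_t num_edges = 0;
  std::size_t num_shards = 0;
};

// Parse "<vertices> <edges> <shards>" from the contents of a metadata file.
FlatMetadata parse_flat_metadata(const std::string& contents) {
  std::istringstream in(contents);
  FlatMetadata md;
  if (!(in >> md.num_vertices >> md.num_edges >> md.num_shards)) {
    throw std::runtime_error("malformed metadata: expected three integers");
  }
  return md;
}

// Build the shard names "<prefix>.0" ... "<prefix>.<num_shards - 1>".
// Assumption: the prefix is the metadata filename without ".metadata".
std::vector<std::string> flat_shard_paths(const std::string& metadata_filename,
                                          std::size_t num_shards) {
  const std::string suffix = ".metadata";
  if (metadata_filename.size() <= suffix.size() ||
      metadata_filename.compare(metadata_filename.size() - suffix.size(),
                                suffix.size(), suffix) != 0) {
    throw std::runtime_error("expected a filename ending in .metadata");
  }
  const std::string prefix =
      metadata_filename.substr(0, metadata_filename.size() - suffix.size());

  std::vector<std::string> paths;
  paths.reserve(num_shards);
  for (std::size_t i = 0; i < num_shards; ++i) {
    paths.push_back(prefix + "." + std::to_string(i));
  }
  assert(paths.size() == num_shards);
  return paths;
}
```

For the "foo" example, parsing the contents of foo.metadata and passing the shard count to `flat_shard_paths("foo.metadata", ...)` would yield the names foo.0 and foo.1.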
Reading
Use the following function to read this kind of graph:
auto graph = sharded_graph_reader<graph_type>(metadata_filename, read_adj_list_line());
If the format for each line is an edge list, you can use read_edge_list_line instead of the adjacency list line reader.
Nested Sharded
If the number of shards is large (for running on tens or hundreds of thousands of cores), it may be better to use the nested sharded format, which places files into a hierarchy of directories such that no single directory contains a large number of shards.
Format
The metadata file follows this format:
#vertices #edges #shards
dim_0
dim_1
...
dim_k
The #shards value represents the total number of files that the graph was split into. Each dimension represents how many files or directories are in a given level of the file system hierarchy.
All of the lines from the original graph will be split into new files, each following the path format i_0/i_1/.../i_k where each i represents a single number. These numbers shall not exceed their corresponding dimension limit from the metadata file.
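The relationship between the metadata dimensions and a shard path can be sketched in C++ as follows. This is an illustration under stated assumptions, not the library's implementation: `NestedMetadata`, `parse_nested_metadata`, `indices_fit`, and `shard_relative_path` are hypothetical names. The sketch also assumes zero-based indices that are strictly less than their dimension; if the limit is meant to be inclusive, the comparison in `indices_fit` would need to change.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for a nested metadata file.
struct NestedMetadata {
  std::size_t num_vertices = 0;
  std::size_t num_edges = 0;
  std::size_t num_shards = 0;
  std::vector<std::size_t> dims;  // dim_0 ... dim_k, one per hierarchy level
};

// Parse "#vertices #edges #shards" followed by one dimension per line.
NestedMetadata parse_nested_metadata(const std::string& contents) {
  std::istringstream in(contents);
  NestedMetadata md;
  if (!(in >> md.num_vertices >> md.num_edges >> md.num_shards)) {
    throw std::runtime_error("malformed metadata: expected three integers");
  }
  std::size_t dim = 0;
  while (in >> dim) {
    md.dims.push_back(dim);
  }
  if (md.dims.empty()) {
    throw std::runtime_error("malformed metadata: expected a dimension");
  }
  return md;
}

// Check that a path i_0/i_1/.../i_k has one index per level and that every
// index stays below its dimension (assumed to be an exclusive limit).
bool indices_fit(const NestedMetadata& md,
                 const std::vector<std::size_t>& indices) {
  if (indices.size() != md.dims.size()) {
    return false;
  }
  for (std::size_t level = 0; level < indices.size(); ++level) {
    if (indices[level] >= md.dims[level]) {
      return false;
    }
  }
  return true;
}

// Join the indices into the relative path "i_0/i_1/.../i_k".
std::string shard_relative_path(const std::vector<std::size_t>& indices) {
  assert(!indices.empty());
  std::string path;
  for (std::size_t level = 0; level < indices.size(); ++level) {
    if (level > 0) {
      path += '/';
    }
    path += std::to_string(indices[level]);
  }
  return path;
}
```

With two dimensions of 11 each (a made-up value for illustration), the indices {0, 10} would pass `indices_fit` and map to the relative path 0/10.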
For example, the files could be organized as follows:
├── 0
│ ├── 0
│ ├── 1
...
│ ├── 10
├── 1
...
├── 10
├── foo.md
Each leaf of this tree will be a sharded file that contains a subset of the lines of the original file. In this example, 0/10 is a file that contains a part of the graph. The foo.md file is the metadata file.
Reading
Use the following function to read this kind of graph:
auto graph = nested_sharded_graph_reader<graph_type>(metadata_filename, read_adj_list_line());
If the format for each line is an edge list, you can use read_edge_list_line instead of the adjacency list line reader.