STAPL API Reference          
Adaptive Remote Method Invocation (ARMI)

Parallelism, communication and synchronization support.

Classes

class  stapl::gang
 Creates a new gang by partitioning the existing one from which the gang construction is invoked.
 

Modules

 ARMI Tags
 ARMI primitives tags.
 
 ARMI Type traits
 Type traits related to ARMI.
 
 Distributed objects
 Distributed object creation, registration, retrieval and destruction.
 
 ARMI One-sided primitives
 One-sided (point-to-point or point-to-many) communication primitives.
 
 ARMI Collective primitives
 Collective communication primitives.
 
 ARMI Synchronization primitives
 Synchronization primitives.
 
 ARMI Unordered primitives
 Communication primitives with relaxed consistency.
 
 Request aggregation control
 Primitives that control aggregation.
 
 ARMI Utilities
 Utility classes and variables.
 

Typedefs

typedef int stapl::process_id
 Process id type.
 
typedef std::uint8_t stapl::level_type
 Level type.
 

Functions

void stapl::abort (std::string const &)
 Displays the given std::string and aborts execution.
 
void stapl::abort (void)
 Aborts execution.
 
template<typename T >
void stapl::abort (T const &t)
 Outputs the given object to std::cerr as a string and aborts execution.
 
std::set< unsigned int > stapl::external_callers (void)
 Returns the location ids that are going to make the external call.
 
template<typename F , typename... T>
runtime::external_caller< typename std::result_of< F(T...)>::type >::result_type stapl::external_call (F &&f, T &&... t)
 Calls an external library function.
 
void stapl::initialize (option opts=option{})
 Initializes the STAPL Runtime System.
 
void stapl::initialize (int &argc, char **&argv)
 Initializes the STAPL Runtime System.
 
void stapl::finalize (void)
 Finalizes the STAPL Runtime System.
 
bool stapl::is_initialized (void) noexcept
 Returns true if the STAPL Runtime System is initialized.
 
std::vector< unsigned int > const & stapl::get_hierarchy_widths (void) noexcept
 Returns the widths of all hierarchy levels.
 
unsigned int stapl::get_available_levels (void) noexcept
 Returns the available parallelism levels.
 
void stapl::execute (std::function< void(void)> f, unsigned int n=1)
 Executes the given function in a newly created environment.
 
process_id stapl::get_process_id (void) noexcept
 Returns the current process id.
 
process_id stapl::get_num_processes (void) noexcept
 Returns the number of processes.
 
unsigned int stapl::get_location_id (void) noexcept
 Returns the current location id.
 
unsigned int stapl::get_num_locations (void) noexcept
 Returns the number of locations in the current gang.
 
std::pair< unsigned int, unsigned int > stapl::get_location_info (void) noexcept
 Returns the current location information consisting of the location id and the number of locations in the gang.
 
void stapl::rmi_poll (void)
 Causes the calling location to check for and process all available requests. If none are available it returns immediately.
 
template<typename Predicate >
void stapl::block_until (Predicate &&pred)
 Causes the calling location to block until the given predicate returns true.
 
stapl::exit_code stapl_main (int argc, char *argv[])
 The starting point for SPMD user code execution.
 
affinity_tag get_affinity (void) noexcept
 Returns the affinity of the current processing element.
 

Variables

constexpr process_id stapl::invalid_process_id = -1
 Invalid process id.
 
constexpr unsigned int stapl::invalid_location_id
 Invalid location id.
 

Detailed Description

Parallelism, communication and synchronization support.

ARMI (Adaptive Remote Method Invocation) primitives are designed to abstract the creation, registration, communication and synchronization of parallelism in a STAPL program, allowing for performance and portability on different systems.

The unit of parallel execution is called a location. Contrary to the concept of shared-memory threads, locations may or may not live in the same address space. As such, it is undefined behavior to try to share writeable global variables, references and pointers, including static class members, between locations.

Upon program startup, all locations begin SPMD execution in parallel. There are no purely sequential regions. The starting point for execution is

stapl::exit_code stapl_main(int argc, char* argv[])

which replaces the sequential standard

int main(int argc, char* argv[])

The primitives provide shared-object parallelism through distributed objects named p_objects. Locations communicate with each other using Remote Method Invocations (RMI) on distributed objects. As such, each location in which a distributed object has been constructed has a local part of the distributed object.

Distributed objects are identified by a handle, and their local objects are identified by that handle (stapl::rmi_handle) and a location id. As such, all objects that are communication targets must have a handle and register with it. This handle allows for proper address translation between locations.

Since each location owns a local portion of the distributed object, it is not necessary for a location to use RMI to access its local object. However, it is still valid to use RMI on the local objects. It is up to the distributed object implementation to keep track of which portions are local and which are remote.

Some communication primitives are collective, meaning all locations must call the function before it can complete. Collective calls typically need to perform complicated communication patterns among all locations, such as reductions and broadcasts. The rest of the communication calls are point-to-point or one-sided collective operations, and hence need to be called by only one location.

Point-to-point calls cannot be used to explicitly synchronize specific locations. Collective calls imply synchronization if they return a value.

Any RMI call may be aggregated and/or combined for improved performance, by decreasing the amount of network congestion that can happen due to many small messages. See stapl::set_aggregation() for more details.

To ensure portability, only these primitives should be used to express parallelism and synchronization within a STAPL program. The actual implementation varies (OpenMP, pthreads, MPI, etc.). Even if it is known that the primitives have been implemented a certain way for a certain system, using calls outside this specification (e.g., MPI calls) is non-portable and highly discouraged.

SEMANTICS OF RMIs:
RMIs make a number of guarantees. First, RMI requests always maintain order, i.e., a newer request may not overtake and execute before an older request, unless explicitly specified (e.g., the unordered primitives). However, there is no guarantee of fairness between locations. For example, although locations 0 and 1 may simultaneously issue requests to location 2, location 2 may receive all of location 1's requests before receiving any of location 0's requests.
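The ordering guarantee and the lack of fairness can be illustrated with a small toy model. This is only a sketch and does not use the STAPL API: `request`, `drain_in_source_order`, `preserves_per_source_order` and `run_ordering_demo` are hypothetical names, and the "destination" is simply a function that drains per-source FIFO queues in an arbitrary source order.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy request: which location issued it and its per-source sequence number.
struct request
{
  unsigned int source;
  unsigned int sequence;
};

// Models a destination that drains one FIFO queue per source. The order in
// which sources are served is arbitrary (no fairness), but requests of the
// same source are always executed in the order in which they were issued.
std::vector<request>
drain_in_source_order(std::map<unsigned int, std::deque<request>> queues,
                      std::vector<unsigned int> const& source_order)
{
  std::vector<request> executed;
  for (unsigned int src : source_order) {
    std::deque<request>& q = queues[src];
    while (!q.empty()) {
      executed.push_back(q.front());
      q.pop_front();
    }
  }
  return executed;
}

// Checks that no request overtook an older request from the same source.
bool preserves_per_source_order(std::vector<request> const& executed)
{
  std::map<unsigned int, unsigned int> next_expected;
  for (request const& r : executed) {
    if (r.sequence != next_expected[r.source])
      return false;
    ++next_expected[r.source];
  }
  return true;
}

// Locations 0 and 1 each issue three requests to the same destination, which
// happens to serve all of location 1's requests first.
std::vector<request> run_ordering_demo()
{
  std::map<unsigned int, std::deque<request>> queues;
  for (unsigned int seq = 0; seq < 3; ++seq) {
    queues[0].push_back({0, seq});
    queues[1].push_back({1, seq});
  }
  std::vector<request> executed = drain_in_source_order(queues, {1, 0});
  assert(preserves_per_source_order(executed));
  return executed;
}
```

In this model the execution log starts with the three requests of location 1 and ends with those of location 0, which is a legal outcome under the guarantees described above.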

Second, remotely invoked methods execute atomically, i.e., they will not be interrupted by other incoming requests or local operations. The only exception is if the remotely invoked method explicitly uses any of the primitives. In this case, all operations before the usage are atomic, as well as all operations after, until either the end of the method or the next RMI operation.

RMI also has a few semantic differences from traditional C++ method invocation. First, the arguments to RMI are pass-by-value, regardless of type (e.g., pointers and references), i.e., the calling location will not see any modifications made to the arguments. Likewise, the receiving location will not see any modifications made to a return value of an RMI. References and pointers are not allowed as return values.
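The pass-by-value rule can be sketched with a local stand-in. The helper `invoke_with_copied_arguments` below is hypothetical and is not part of STAPL; it copies the arguments into request-owned storage (standing in for a message buffer) before invoking the function, so the callee only ever modifies the copy. The sketch assumes a C++17 compiler for `std::apply`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Toy model of an RMI request (NOT the STAPL API): the arguments are copied
// into storage owned by the request and the function is invoked on the copies.
template<typename F, typename... Args>
void invoke_with_copied_arguments(F f, Args const&... args)
{
  std::tuple<Args...> buffer(args...);  // stands in for the message buffer
  std::apply(f, buffer);                // f sees the buffered copies only
}

// Takes a reference and modifies it, as a remotely invoked method might.
void append_marker(std::vector<int>& v)
{
  v.push_back(42);
}

// Returns the size of the caller's vector after the invocation.
std::size_t caller_size_after_invocation()
{
  std::vector<int> mine{1, 2, 3};
  invoke_with_copied_arguments(&append_marker, mine);
  assert(mine.size() == 3);  // the modification happened on the copy
  return mine.size();
}
```

Even though `append_marker` takes its parameter by reference, the caller's vector keeps its three elements, mirroring the statement that the calling location does not see modifications made to the arguments.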

Second, although remotely invoked methods may use and modify the supplied arguments freely, they should not store pointers or references to the arguments after the invocation completes. This allows the runtime to reuse buffers, instead of continuously allocating space. Also, since arguments may exist within RMI maintained buffers, remotely invoked methods should not try to delete/free the object, or perform a realloc().

In many cases, especially when using aggregation settings greater than 1, starting a request does not imply that it has been transferred to or executed by the destination location. A request goes through three stages: creation, issue, and execution. Only the creation stage, which gathers and stores enough information to ensure that the request may subsequently execute as expected, is guaranteed to have finished when an asynchronous call completes. Once the aggregation settings are met, a group of requests is issued to the destination location, performing the necessary data transfer.
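The difference between creating a request and executing it can be sketched with a toy channel. The class `aggregating_channel` below is hypothetical and is not how the STAPL runtime is implemented; it merges the issue and execution stages into a local `flush()` and only illustrates that an asynchronous call may return before its request has run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model of request aggregation (NOT the STAPL implementation).
class aggregating_channel
{
public:
  explicit aggregating_channel(std::size_t aggregation)
    : m_aggregation(aggregation)
  {
    assert(aggregation > 0);
  }

  // Creation stage: the request is recorded and the call returns. The request
  // is issued only once enough requests have been aggregated.
  void async_call(std::function<void()> request)
  {
    m_pending.push_back(std::move(request));
    if (m_pending.size() >= m_aggregation)
      flush();
  }

  // Issue and execution stages, collapsed into one local step for this toy.
  void flush()
  {
    std::vector<std::function<void()>> batch;
    batch.swap(m_pending);
    for (std::function<void()>& request : batch)
      request();
  }

  // Number of requests that have been created but not yet issued.
  std::size_t pending() const
  {
    return m_pending.size();
  }

private:
  std::size_t                        m_aggregation;
  std::vector<std::function<void()>> m_pending;
};
```

With an aggregation setting of 3, the first two calls to `async_call()` return while their requests are still pending; the third call triggers the issue of all three requests together.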

OPTIMAL USAGE:
As in traditional C++ method invocation, the style of passing arguments can have a significant impact on performance. Most arguments are passed more quickly as a const reference, since no intermediate copies are necessary. Although RMIs require a copy from the calling to the receiving location to preserve copy-by-value semantics, all other copies will be eliminated. It is almost always fastest to pass an object of type T as a T const& if it will not be mutated and sizeof(T)>sizeof(T*).
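The cost of the parameter-passing style can be observed locally with a type that counts its copy constructions. The sketch below uses plain C++ function calls and hypothetical names (`payload`, `sum_by_value`, `sum_by_cref`); it does not model the one copy an RMI needs for the transfer itself, only the additional copy that a by-value parameter introduces.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical payload that counts how often it is copy-constructed.
struct payload
{
  static int copies;
  std::vector<double> data;

  explicit payload(std::size_t n) : data(n, 1.0) { assert(n > 0); }
  payload(payload const& other) : data(other.data) { ++copies; }
};

int payload::copies = 0;

// By value: a call with an lvalue argument copies the whole payload.
double sum_by_value(payload p)
{
  return std::accumulate(p.data.begin(), p.data.end(), 0.0);
}

// By const reference: the call itself makes no copy.
double sum_by_cref(payload const& p)
{
  return std::accumulate(p.data.begin(), p.data.end(), 0.0);
}

// Returns the number of payload copies caused by one by-value call.
int copies_for_by_value_call()
{
  payload const p(1024);
  payload::copies = 0;
  sum_by_value(p);
  return payload::copies;
}

// Returns the number of payload copies caused by one const-reference call.
int copies_for_by_cref_call()
{
  payload const p(1024);
  payload::copies = 0;
  sum_by_cref(p);
  return payload::copies;
}
```

The by-value call copies the payload once, while the const-reference call makes no copy at all.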

Warning
Some compilers have problems with function template argument deduction. If your compiler issues such an error, it may have several causes: multiple member functions of the class matching the member function's name, arguments that require implicit conversion before they match the member function's expected arguments, etc. A simple solution is to specify the member function more explicitly:
Rtn (Class::*pmf)(Args...) = &Class::f;
async_rmi(..., ..., pmf, ...);
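The workaround can be tried in isolation with a stand-in for the primitive. In the sketch below, `call_member` and `counter` are hypothetical and are not part of STAPL; `call_member` merely deduces the pointer-to-member-function type from its argument in a way that is similar in spirit to what a primitive taking a member function pointer has to do.

```cpp
#include <cassert>
#include <utility>

// Hypothetical class with an overloaded member function, so that the bare
// expression &counter::add does not name a single function.
struct counter
{
  int total = 0;

  void add(int x)        { total += x; }
  void add(int x, int y) { total += x + y; }
};

// Stand-in for an RMI primitive (NOT the STAPL API): deduces the
// pointer-to-member-function type and invokes it on a local object.
template<typename Class, typename Rtn, typename... Params, typename... Args>
Rtn call_member(Class& obj, Rtn (Class::*pmf)(Params...), Args&&... args)
{
  assert(pmf != nullptr);
  return (obj.*pmf)(std::forward<Args>(args)...);
}

// Returns the accumulated total after invoking both overloads.
int run_disambiguation_demo()
{
  counter c;

  // call_member(c, &counter::add, 1);  // does not compile: ambiguous overload

  // Spelling out the member function type selects exactly one overload.
  void (counter::*add_one)(int)      = &counter::add;
  void (counter::*add_two)(int, int) = &counter::add;

  call_member(c, add_one, 1);
  call_member(c, add_two, 2, 3);
  return c.total;
}
```

Declaring the pointer with its full type moves overload resolution out of template argument deduction, which is why the same approach helps with the RMI primitives.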

Function Documentation

◆ external_call()

template<typename F , typename... T>
runtime::external_caller< typename std::result_of<F(T...)>::type>::result_type stapl::external_call ( F &&  f,
T &&...  t 
)

Calls an external library function.

Template Parameters
F    Function type.
T    Argument types.

This function is useful for calling functions that are not STAPL-aware or thread-safe, such as MPI-based libraries. It calls f from only one location per process.

It is the user's responsibility to call external_call() in a gang in which f can be called correctly. Most of the time, external_call() should be called in stapl_main().

Warning
Calling any runtime primitive inside f is undefined behavior.
Parameters
f    External library function to be called.
t    Arguments to pass to the function.
Returns
Let R be the result type of f(t...). If R is not void, the result of f(t...) is returned in a boost::optional<R> that has a value in all locations in which f has been called. If R is void, it returns true in all locations in which f has been called and false otherwise.
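The shape of the return value can be modelled without the runtime. The function `external_call_model` below is a hypothetical stand-in, not stapl::external_call(): it uses std::optional (C++17) in place of boost::optional, and the flag `is_caller` is an assumption that stands for "this location is one of those that make the external call". The sketch also assumes that the result type is not a reference.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Toy model of the return value rules (NOT the STAPL implementation).
template<typename F, typename... T>
auto external_call_model(bool is_caller, F&& f, T&&... t)
{
  using R = std::invoke_result_t<F, T...>;

  if constexpr (std::is_void_v<R>) {
    // void result: report whether f has been called on this location
    if (is_caller)
      std::forward<F>(f)(std::forward<T>(t)...);
    return is_caller;
  } else {
    // non-void result: the optional has a value only where f has been called
    std::optional<R> result;
    if (is_caller)
      result = std::forward<F>(f)(std::forward<T>(t)...);
    return result;
  }
}

// Hypothetical external library functions used for illustration.
int legacy_add(int a, int b)
{
  return a + b;
}

void legacy_increment(int* counter)
{
  assert(counter != nullptr);
  ++*counter;
}
```

A location that makes the call receives an engaged optional (or true for a void function); every other location receives an empty optional (or false), so the result has to be checked before it is used.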

◆ initialize() [1/2]

void stapl::initialize ( option  opts = option{})

Initializes the STAPL Runtime System.

Warning
This is an SPMD function.
Parameters
opts    Options to pass for initialization.

◆ initialize() [2/2]

void stapl::initialize ( int &  argc,
char **&  argv 
)

Initializes the STAPL Runtime System.

Warning
This is an SPMD function.
Parameters
argc    Number of arguments from main().
argv    Argument vector from main().

◆ finalize()

void stapl::finalize ( void  )

Finalizes the STAPL Runtime System.

Warning
This is an SPMD function.

◆ is_initialized()

bool stapl::is_initialized ( void  ) noexcept

Returns true if the STAPL Runtime System is initialized.

Warning
This is an SPMD function.

◆ get_available_levels()

unsigned int stapl::get_available_levels ( void  ) noexcept

Returns the available parallelism levels.

This is based on the environment variable STAPL_PROC_HIERARCHY that accepts a comma separated value list for the shared memory hierarchy.

Each time execute() is called, one or more levels are consumed.
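As an illustration of the comma separated format only, the following sketch splits such a list into unsigned values. The function name `parse_comma_separated_widths` is hypothetical, and the interpretation of each entry as the width of one hierarchy level is an assumption; the runtime's actual parsing and validation are not shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma separated list such as "2,4" into unsigned values.
// Assumption: every entry is a non-empty decimal number without whitespace.
std::vector<unsigned int>
parse_comma_separated_widths(std::string const& value)
{
  std::vector<unsigned int> widths;
  std::istringstream in(value);
  std::string entry;
  while (std::getline(in, entry, ',')) {
    assert(!entry.empty());
    widths.push_back(static_cast<unsigned int>(std::stoul(entry)));
  }
  return widths;
}
```

For experimentation, the input string could be obtained with std::getenv("STAPL_PROC_HIERARCHY") after checking the returned pointer for null.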

◆ execute()

void stapl::execute ( std::function< void(void)>  f,
unsigned int  n = 1 
)

Executes the given function in a newly created environment.

This function will consume n levels of the machine hierarchy.

Parameters
f    Function to be executed.
n    Parallelism levels that will be consumed.
See also
get_available_levels()

◆ rmi_poll()

void stapl::rmi_poll ( void  )

Causes the calling location to check for and process all available requests. If none are available it returns immediately.

The main purpose of rmi_poll() is to improve timeliness of request processing for a location that does not perform much communication, in support of a location that does.

Warning
User code should never call this function.

◆ block_until()

template<typename Predicate >
void stapl::block_until ( Predicate &&  pred)

Causes the calling location to block until the given predicate returns true.

While the predicate returns false, requests may be executed.
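The interplay between blocking and request execution can be sketched with a single-threaded toy. The class `toy_location` below is hypothetical and is not the STAPL implementation; it only shows the idea that incoming requests keep being executed while the predicate is false, and that such a request is typically what makes the predicate true.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded location with a queue of incoming requests
// (NOT the STAPL implementation).
class toy_location
{
public:
  // Models the arrival of a request from another location.
  void enqueue(std::function<void()> request)
  {
    m_requests.push_back(std::move(request));
  }

  // Executes queued requests until the predicate returns true.
  template<typename Predicate>
  void block_until(Predicate&& pred)
  {
    while (!pred()) {
      // In this toy nothing else can change the state, so an empty queue
      // would mean that the predicate can never become true.
      assert(!m_requests.empty());
      std::function<void()> request = std::move(m_requests.front());
      m_requests.pop_front();
      request();
    }
  }

  // Number of requests that have arrived but not been executed yet.
  std::size_t pending() const
  {
    return m_requests.size();
  }

private:
  std::deque<std::function<void()>> m_requests;
};
```

If three requests that each increment a counter are queued and the location blocks until the counter reaches 2, exactly two requests are executed and one remains pending.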

◆ stapl_main()

stapl::exit_code stapl_main ( int  argc,
char *  argv[] 
)

The starting point for SPMD user code execution.

It replaces the sequential equivalent:

int main(int argc, char* argv[])

Variable Documentation

◆ invalid_location_id

constexpr unsigned int stapl::invalid_location_id
Initial value:
= std::numeric_limits<unsigned int>::max()

Invalid location id.