This tool is used to measure torch.distributed.rpc throughput and latency for reinforcement learning.
The benchmark spawns one agent process and a configurable number of observer processes. As this benchmark focuses on RPC throughput and latency, the agent uses a dummy policy and observers all use randomly generated states and rewards. In each iteration, observers pass their state to the agent through torch.distributed.rpc and wait for the agent to respond with an action. If batch=False, then the agent will process and respond to a single observer request at a time. Otherwise, the agent will accumulate requests from multiple observers and run them through the policy in one shot. There is also a separate coordinator process that manages the agent and observers.
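The difference between the two modes can be illustrated with a minimal, single-process sketch. It deliberately does not use torch.distributed.rpc; the class `BatchingAgent`, its `select_action` method, and the dummy policy are hypothetical names chosen for illustration and are not taken from the benchmark's source. Each request receives a future, and with batching enabled the futures are only resolved once enough requests have accumulated.

```python
import threading
from concurrent.futures import Future


class BatchingAgent:
    """Toy stand-in for the agent; no RPC involved, all names are hypothetical."""

    def __init__(self, batch_size, batch=True):
        self.batch_size = batch_size
        self.batch = batch
        self.lock = threading.Lock()
        self.pending = []  # (observer_id, state, future) tuples awaiting a full batch

    def dummy_policy(self, states):
        # Placeholder policy: one action per state, here simply the state's length.
        return [len(state) for state in states]

    def select_action(self, observer_id, state):
        """Return a Future that resolves to the action for this observer."""
        future = Future()
        if not self.batch:
            # batch=False: respond to each request on its own.
            future.set_result(self.dummy_policy([state])[0])
            return future
        with self.lock:
            self.pending.append((observer_id, state, future))
            if len(self.pending) < self.batch_size:
                return future  # keep waiting for more observers
            ready, self.pending = self.pending, []
        # Batch is full: run every queued state through the policy in one shot.
        actions = self.dummy_policy([state for _, state, _ in ready])
        for (_, _, queued_future), action in zip(ready, actions):
            queued_future.set_result(action)
        return future


agent = BatchingAgent(batch_size=3)
futures = [agent.select_action(i, [0.0] * (i + 1)) for i in range(3)]
actions = [f.result() for f in futures]
print(actions)
```

In this sketch the first two requests stay queued until the third arrives, which is why batched observer latency varies with arrival order.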
In addition to printing measurements, this benchmark produces a JSON file. Users may provide multiple comma-separated entries for a single argument (e.g., world_size="10,50,100"), in which case the resulting JSON file can be passed to the plotting repo to visualize how the results differ. Each entry of the variable argument is then placed on the x axis.
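As a rough illustration of how a single comma-separated argument could be picked out as the x axis, consider the sketch below. The helper `find_variable_argument` is a hypothetical name and is not the launcher's actual parsing code; it only assumes that arguments arrive as strings keyed by name.

```python
def find_variable_argument(args):
    """Return (x_axis_name, x_axis_values, fixed_args) for a dict of string arguments.

    Hypothetical helper: at most one argument may carry several comma-separated entries.
    """
    x_axis_name = None
    x_axis_values = []
    fixed_args = {}
    for name, raw in args.items():
        entries = [entry.strip() for entry in str(raw).split(",")]
        if len(entries) > 1:
            if x_axis_name is not None:
                raise ValueError("only one argument may have multiple entries")
            x_axis_name, x_axis_values = name, entries
        else:
            fixed_args[name] = entries[0]
    return x_axis_name, x_axis_values, fixed_args


name, values, fixed = find_variable_argument(
    {"world_size": "10,50,100", "batch": "True", "nlayers": "5"}
)
print(name, values, fixed)
```

Each element of `values` would then correspond to one benchmark run and one point on the x axis.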
The benchmark results comprise four key metrics:
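1. Agent latency: the time the agent takes to process one batch of observer requests. If batch=False, you can think of it as batch_size=1.
2. Agent throughput: the number of requests the agent processes per second, computed as (batch_size / agent_latency). If not using batch, you can think of it as batch_size=1.
3. Observer latency: the time a single observer waits between sending a request and receiving the agent's response. If batch=False, observer latency is the agent latency plus the transit time it takes for the request to get from the observer to the agent plus the transit time it takes for the response to get from the agent back to the observer. When batch=True there will be more variation, because some observer requests are queued in a batch for longer than others, depending on the order in which they arrived.
4. Observer throughput: the number of requests a single observer completes per second, computed as (1 / observer_latency).
This benchmark depends on PyTorch.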
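The two throughput formulas, (batch_size / agent_latency) and (1 / observer_latency), can be tried out with a small sketch. The latency samples below are made up for illustration, and the nearest-rank `percentile` helper is an assumption; the benchmark may compute its p50/p75/p90/p95 columns differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile (an assumed method, not necessarily the benchmark's)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def agent_throughput(batch_size, agent_latency):
    # Requests processed per second for one batch.
    return batch_size / agent_latency


def observer_throughput(observer_latency):
    # Requests completed per second by a single observer.
    return 1 / observer_latency


# Made-up latency samples in seconds, one per processed batch.
agent_latencies = [0.25, 0.5, 0.125, 1.0]
batch_size = 8  # with batch=False this would be 1

throughputs = [agent_throughput(batch_size, latency) for latency in agent_latencies]
for pct in (50, 75, 90, 95):
    print(f"p{pct}: latency={percentile(agent_latencies, pct)} "
          f"throughput={percentile(throughputs, pct)}")
print("observer throughput at 0.5 s latency:", observer_throughput(0.5))
```

Note that the latency and throughput percentiles are taken over their own sorted samples, so a p95 throughput does not correspond to the p95 latency.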
For any environments you are interested in, pass the corresponding arguments to python launcher.py.
python launcher.py --world_size="10,20" --master_addr="127.0.0.1" --master_port="29501" --batch="True" --state_size="10-20-10" --nlayers="5" --out_features="10" --output_file_path="benchmark_report.json"
Example Output:
--------------------------------------------------------------
PyTorch distributed rpc benchmark reinforcement learning suite
--------------------------------------------------------------
master_addr : 127.0.0.1
master_port : 29501
batch : True
state_size : 10-20-10
nlayers : 5
out_features : 10
output_file_path : benchmark_report.json
x_axis_name : world_size
world_size | agent latency (seconds)     agent throughput            observer latency (seconds)  observer throughput
            p50    p75    p90    p95    p50    p75    p90    p95    p50    p75    p90    p95    p50    p75    p90    p95
10          0.002  0.002  0.002  0.002  4432   4706   4948   5128   0.002  0.003  0.003  0.003  407    422    434    443
20          0.004  0.005  0.005  0.005  4244   4620   4884   5014   0.005  0.005  0.006  0.006  191    207    215    220