fastvideo.v1.distributed.device_communicators.pynccl#

Module Contents#

Classes#

Data#

API#

class fastvideo.v1.distributed.device_communicators.pynccl.PyNcclCommunicator(group: Union[torch.distributed.ProcessGroup, fastvideo.v1.distributed.utils.StatelessProcessGroup], device: Union[int, str, torch.device], library_path: Optional[str] = None)[source]#

Initialization

Parameters:
  • group – the process group to work on. If None, it will use the default process group.

  • device – the device to bind the PyNcclCommunicator to. If None, it will be bound to f"cuda:{local_rank}".

  • library_path – the path to the NCCL library. If None, it will use the default library path.

It is the caller’s responsibility to make sure each communicator is bound to a unique device.
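
A minimal construction sketch (not part of this module's documentation), assuming the launcher (e.g. torchrun) sets LOCAL_RANK and that torch.distributed.init_process_group has already been called; the group and environment variable names are illustrative:

```python
import os

import torch
import torch.distributed as dist

from fastvideo.v1.distributed.device_communicators.pynccl import PyNcclCommunicator

# Assumption: LOCAL_RANK is set by the launcher and the default
# torch.distributed process group is already initialized.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device(f"cuda:{local_rank}")

# Each communicator must be bound to a unique device (see the note above).
comm = PyNcclCommunicator(
    group=dist.group.WORLD,  # or a fastvideo StatelessProcessGroup
    device=device,
)
```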

all_gather(output_tensor: torch.Tensor, input_tensor: torch.Tensor, stream=None)[source]#
all_reduce(in_tensor: torch.Tensor, op: torch.distributed.ReduceOp = ReduceOp.SUM, stream=None) → torch.Tensor[source]#
broadcast(tensor: torch.Tensor, src: int, stream=None)[source]#
recv(tensor: torch.Tensor, src: int, stream=None)[source]#
reduce_scatter(output_tensor: torch.Tensor, input_tensor: torch.Tensor, op: torch.distributed.ReduceOp = ReduceOp.SUM, stream=None)[source]#
send(tensor: torch.Tensor, dst: int, stream=None)[source]#
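
A hedged usage sketch of the collectives above, reusing `comm` and `device` from the construction example and assuming a world size of at least two:

```python
# Reuses `comm` and `device` from the construction sketch above.
x = torch.ones(4, device=device)

# Sum-reduce across all ranks; per the signature above, the reduced tensor is returned.
summed = comm.all_reduce(x, op=dist.ReduceOp.SUM)

# Point-to-point transfer: rank 0 sends, rank 1 receives into a pre-allocated buffer.
rank = dist.get_rank()
if rank == 0:
    comm.send(x, dst=1)
elif rank == 1:
    buf = torch.empty_like(x)
    comm.recv(buf, src=0)

# All-gather each rank's tensor into one output buffer of world_size * numel elements.
world_size = dist.get_world_size()
gathered = torch.empty(world_size * x.numel(), device=device)
comm.all_gather(gathered, x)
```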
fastvideo.v1.distributed.device_communicators.pynccl.logger[source]#

‘init_logger(…)’