🖥️ Multi-GPU Simulation#
Genesis supports multi-GPU execution for scaling simulations.
Single GPU Configuration#
import genesis as gs
# Call gs.init() once per process; pick one of the options below.
# Automatic GPU selection
gs.init(backend=gs.gpu)
# Or force a specific backend instead:
# gs.init(backend=gs.cuda)   # NVIDIA CUDA
# gs.init(backend=gs.metal)  # Apple Metal
# gs.init(backend=gs.cpu)    # CPU fallback
Parallel Environments (Single GPU)#
Scale by batching environments on one GPU:
scene.build(n_envs=2048, env_spacing=(1.0, 1.0))
# All environments run in parallel on same GPU
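The `scene.build` call above presupposes an existing `scene`. As a minimal sketch, assuming a headless scene that contains only a ground plane, the full sequence might look like the following. The helper name `build_batched_scene` and its default values are illustrative and not part of Genesis.

```python
def build_batched_scene(n_envs=2048, spacing=1.0):
    """Build a headless scene replicated n_envs times on one GPU (illustrative helper)."""
    # Imported lazily so this module can still be loaded where Genesis is not installed.
    import genesis as gs

    gs.init(backend=gs.gpu)
    scene = gs.Scene(show_viewer=False)
    # Placeholder content (assumption): a ground plane. Add robots or objects here.
    scene.add_entity(gs.morphs.Plane())
    # Replicate the scene into n_envs environments that step together.
    scene.build(n_envs=n_envs, env_spacing=(spacing, spacing))
    return scene
```

Calling `scene = build_batched_scene()` followed by `scene.step()` in a loop would then advance all environments at once.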
Multi-GPU with Multiprocessing#
Run separate processes per GPU:
import os
import multiprocessing
def run_simulation(gpu_id):
    # Pin this process to one GPU before importing Genesis
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    os.environ["TI_VISIBLE_DEVICE"] = str(gpu_id)
    os.environ["EGL_DEVICE_ID"] = str(gpu_id)
    import genesis as gs
    gs.init(backend=gs.gpu)
    # ... simulation code ...
if __name__ == "__main__":
    processes = []
    for i in range(2):  # 2 GPUs
        p = multiprocessing.Process(target=run_simulation, args=(i,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for every worker to finish
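A possible variation on this launcher, sketched under two assumptions: it uses the `spawn` start method (CUDA generally cannot be re-initialized in a forked child if the parent has already touched the GPU), and it reports each worker's exit code so that failures are visible. The helper names `gpu_env` and `launch_workers` are hypothetical.

```python
import multiprocessing
import os


def gpu_env(gpu_id):
    """Return the three device-selection variables for one GPU (illustrative helper)."""
    value = str(gpu_id)
    return {
        "CUDA_VISIBLE_DEVICES": value,
        "TI_VISIBLE_DEVICE": value,
        "EGL_DEVICE_ID": value,
    }


def run_simulation(gpu_id):
    # Pin the process to one GPU before any GPU library is imported.
    os.environ.update(gpu_env(gpu_id))
    import genesis as gs

    gs.init(backend=gs.gpu)
    # ... simulation code ...


def launch_workers(gpu_ids, target=run_simulation):
    """Start one process per GPU, wait for all of them, and return their exit codes."""
    ctx = multiprocessing.get_context("spawn")
    procs = [ctx.Process(target=target, args=(gpu_id,)) for gpu_id in gpu_ids]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return {gpu_id: p.exitcode for gpu_id, p in zip(gpu_ids, procs)}
```

In a script, `launch_workers([0, 1])` would be called under `if __name__ == "__main__":`; a non-zero exit code in the returned dictionary indicates that the corresponding worker failed.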
Distributed Training (DDP)#
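Use PyTorch Distributed Data Parallel. Launch one process per GPU with torchrun:
torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py
Each process then runs train.py:
import os
import torch
import torch.distributed as dist
import genesis as gs
# Restrict this process to its own GPU before initializing Genesis
local_rank = int(os.environ.get("LOCAL_RANK", 0))
os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
os.environ["TI_VISIBLE_DEVICE"] = str(local_rank)
os.environ["EGL_DEVICE_ID"] = str(local_rank)  # rendering GPU
gs.init(backend=gs.gpu, seed=local_rank)
# ... create scene, model, and optimizer ...
scene.build(n_envs=2048)
# Only one GPU is visible to this process, so it is always device 0
torch.cuda.set_device(0)
dist.init_process_group(backend="nccl", init_method="env://")
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[0])
# Training loop with gradient synchronization
for step in range(steps):
    scene.step()
    # ... compute loss from the simulation state ...
    optimizer.zero_grad()
    loss.backward()  # DDP handles all-reduce
    optimizer.step()
dist.barrier()
dist.destroy_process_group()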
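torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to every process it starts. The sketch below reads them in one place and derives two quantities that are often useful alongside the script above: a per-process seed and the total number of environments across all processes. Seeding from the global rank rather than the local rank is an assumption made here so that seeds would stay distinct across several nodes; the names `RankInfo`, `read_rank_info`, `per_rank_seed`, and `total_envs` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RankInfo:
    rank: int        # global index of this process
    local_rank: int  # index of this process on its node
    world_size: int  # total number of processes


def read_rank_info(environ=os.environ):
    """Read the rank variables set by torchrun, defaulting to a single process."""
    return RankInfo(
        rank=int(environ.get("RANK", 0)),
        local_rank=int(environ.get("LOCAL_RANK", 0)),
        world_size=int(environ.get("WORLD_SIZE", 1)),
    )


def per_rank_seed(base_seed, info):
    """Give every process a distinct seed (assumption: base_seed + global rank)."""
    return base_seed + info.rank


def total_envs(n_envs_per_rank, info):
    """Number of environments simulated across all processes."""
    return n_envs_per_rank * info.world_size
```

With two processes and `n_envs=2048` each, `total_envs` would report 4096 environments contributing to every synchronized gradient step.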
Environment Variables#
| Variable | Purpose |
|---|---|
| CUDA_VISIBLE_DEVICES | PyTorch/CUDA GPU selection |
| TI_VISIBLE_DEVICE | Taichi GPU selection |
| EGL_DEVICE_ID | Rendering GPU (OpenGL/EGL) |
Always set all three together for multi-GPU setups.
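One way to catch a partially configured process early is a small consistency check run before Genesis is imported. This is a sketch only: it assumes the one-GPU-per-process pattern used on this page, where all three variables hold the same single index, and the names `DEVICE_VARS` and `check_device_vars` are hypothetical.

```python
import os

DEVICE_VARS = ("CUDA_VISIBLE_DEVICES", "TI_VISIBLE_DEVICE", "EGL_DEVICE_ID")


def check_device_vars(environ=os.environ):
    """Return a list of problems with the device variables (empty if consistent)."""
    problems = []
    values = {name: environ.get(name) for name in DEVICE_VARS}
    # Report variables that were never set.
    missing = [name for name, value in values.items() if value is None]
    if missing:
        problems.append("unset: " + ", ".join(missing))
    # Assumption: with one GPU per process, every set variable holds the same index.
    distinct = {value for value in values.values() if value is not None}
    if len(distinct) > 1:
        problems.append("values differ: " + repr(values))
    return problems
```

A worker could call `check_device_vars()` right after setting the variables and abort if the returned list is non-empty.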
GPU Selection Patterns#
| Pattern | Method | GPUs | Complexity |
|---|---|---|---|
| Single GPU | gs.init(backend=gs.gpu) | 1 | Low |
| Batched envs | scene.build(n_envs=N) | 1 | Low |
| Multi-process | Multiprocessing + env vars | N | Medium |
| Distributed | torchrun + DDP | N | High |
Best Practices#
- Batch first: Use a large n_envs on a single GPU before scaling to multi-GPU
- Set all env vars: Always set the CUDA, Taichi, and EGL device variables together
- Synchronize DDP: Call dist.barrier() before destroying process groups
- Headless rendering: Set pyglet.options["headless"] = True on servers
- Monitor memory: Use nvidia-smi during batched simulation
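To act on the "batch first" advice, it can help to measure how throughput changes as n_envs grows on one GPU. The sketch below times an arbitrary step callable and reports environment steps per second; `measure_throughput` and its parameters are hypothetical, and the function makes no Genesis calls itself.

```python
import time


def measure_throughput(step_fn, n_envs, n_steps=100, warmup=10):
    """Return environment steps per second for a batched step function."""
    # Warm-up iterations let one-time costs (compilation, allocation) settle.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Each call to step_fn advances all n_envs environments by one step.
    return n_envs * n_steps / elapsed
```

For a built scene this might be called as `measure_throughput(scene.step, n_envs=2048)`. If GPU work is queued asynchronously, the timing is only meaningful when the device is synchronized before the clock is read, for example with `torch.cuda.synchronize()`.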
Device Access#
After initialization:
gs.device # PyTorch device (e.g., "cuda:0", "mps:0")
gs.backend # Backend type (gs.cuda, gs.metal, gs.cpu)