Installation Guide

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • Triton >= 2.0.0
  • CUDA-capable GPU
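
The requirements above can be checked from Python before installing anything else. The following is a minimal sketch, not part of triton-augment: the helper names (`version_tuple`, `check_package`, `check_environment`) are hypothetical, and it assumes PyTorch and Triton are installed under the distribution names `torch` and `triton`, which may differ for some builds.

```python
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu121' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_package(dist_name: str, minimum: str) -> str:
    """Return a one-line status for an installed distribution."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name}: not installed (need >= {minimum})"
    ok = version_tuple(installed) >= version_tuple(minimum)
    return f"{dist_name}: {installed} ({'ok' if ok else 'need >= ' + minimum})"


def check_environment() -> list:
    """Collect one status line per requirement listed above."""
    py_ok = sys.version_info >= (3, 8)
    lines = [f"python: {sys.version.split()[0]} ({'ok' if py_ok else 'need >= 3.8'})"]
    lines.append(check_package("torch", "2.0.0"))
    lines.append(check_package("triton", "2.0.0"))
    try:
        import torch  # imported lazily so the script still runs without PyTorch

        cuda = torch.cuda.is_available()
        lines.append(f"cuda gpu: {'available' if cuda else 'not detected'}")
    except ImportError:
        lines.append("cuda gpu: unknown (torch is not installed)")
    return lines


if __name__ == "__main__":
    print("\n".join(check_environment()))
```

Running the script prints one line per requirement, so a missing or outdated dependency is visible at a glance.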

Install the package from PyPI:

pip install triton-augment

Development Installation (From Source)

For contributors or those who want to modify the code:

git clone https://github.com/yuhezhang-ai/triton-augment.git
cd triton-augment
pip install -e ".[dev]"

Input Requirements

  • Range: Pixel values must be in [0, 1] (use transforms.ToTensor() if loading from PIL)
  • Device: Processing runs on the GPU only; CPU tensors are accepted and automatically moved to CUDA
  • Shape: Supports both 3D (C, H, W) and 4D (N, C, H, W) tensors (automatic batching)
  • Dtype: float32 or float16
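
A quick pre-flight check can catch inputs that violate these requirements before they reach a transform. This is a sketch under stated assumptions: `find_input_problems` and `check_tensor` are hypothetical helpers, not part of triton-augment, and the dtype is compared by its string form (for example `torch.float32`) so the block itself does not need to import PyTorch.

```python
ALLOWED_DTYPES = {"torch.float32", "torch.float16"}


def find_input_problems(shape, dtype, min_value, max_value):
    """Return a list of readable problems; an empty list means the input looks fine."""
    problems = []
    if len(shape) not in (3, 4):
        problems.append(
            f"expected (C, H, W) or (N, C, H, W), got {len(shape)}D shape {tuple(shape)}"
        )
    if dtype not in ALLOWED_DTYPES:
        problems.append(f"expected float32 or float16, got {dtype}")
    if min_value < 0.0 or max_value > 1.0:
        problems.append(f"expected values in [0, 1], got [{min_value}, {max_value}]")
    return problems


def check_tensor(img):
    """Convenience wrapper for a torch.Tensor; only tensor attributes and methods are used."""
    return find_input_problems(
        tuple(img.shape), str(img.dtype), img.min().item(), img.max().item()
    )
```

For example, a uint8 image tensor with values in 0 to 255 would be reported for both dtype and range; converting it with `img.float() / 255` resolves both.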

First Run Behavior

On first use, Triton will compile kernels for your GPU (~1-2 seconds per image size with default config). This is normal and only happens once per GPU and image size.

Optional: Cache Warm-Up

To avoid compilation delays during training, you can warm up the cache once after installation:

python -m triton_augment.warmup

For more details and auto-tuning optimization, see the Auto-Tuning Guide.
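
Because kernels are compiled per image size, a similar effect can be approximated by hand: run a transform once on a dummy batch for each size the training job will use. The sketch below illustrates that idea only and is not what `triton_augment.warmup` does internally; `warm_up` and `warm_up_on_gpu` are hypothetical helpers, and the batch size of 4 is an arbitrary choice.

```python
def warm_up(transform, make_batch, image_sizes):
    """Run `transform` once on a dummy batch for every (height, width) pair.

    Returns the sizes that were processed, in order.
    """
    done = []
    for height, width in image_sizes:
        transform(make_batch(height, width))
        done.append((height, width))
    return done


def warm_up_on_gpu(image_sizes, batch_size=4):
    """GPU variant using the transform from the Verification section (needs CUDA)."""
    import torch
    import triton_augment as ta

    transform = ta.TritonColorJitterNormalize(brightness=0.2)

    def make_batch(height, width):
        return torch.rand(batch_size, 3, height, width, device="cuda")

    done = warm_up(transform, make_batch, image_sizes)
    torch.cuda.synchronize()  # wait until all queued kernels have finished
    return done
```

Calling `warm_up_on_gpu([(224, 224), (384, 384)])` at the start of a training script would trigger compilation for those two sizes before the first real batch arrives.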

What to expect

  • First import: An informational message about auto-tuning status (can be suppressed with TRITON_AUGMENT_SUPPRESS_FIRST_RUN_MESSAGE=1)
  • First use of each image size: ~1-2 seconds (kernel compilation)
  • Subsequent uses: No compilation delay (kernels are cached)
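
The difference between the first and later calls can be observed with a small timing helper. This is a minimal sketch: `time_calls` and `measure_first_run` are hypothetical names, and setting the environment variable from Python assumes the package reads it at import time, which is why it is set before the import.

```python
import os
import time


def time_calls(fn, repeats=3, sync=None):
    """Call fn() `repeats` times and return each call's duration in seconds.

    `sync` is an optional zero-argument callable run after each call; on a GPU,
    pass torch.cuda.synchronize so queued kernels finish before the clock stops.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return durations


def measure_first_run():
    """Time the transform from the Verification section (needs a CUDA GPU)."""
    # Assumption: the variable is read when the package is imported, so set it first.
    os.environ.setdefault("TRITON_AUGMENT_SUPPRESS_FIRST_RUN_MESSAGE", "1")
    import torch
    import triton_augment as ta

    img = torch.rand(4, 3, 224, 224, device="cuda")
    transform = ta.TritonColorJitterNormalize(brightness=0.2)
    first, *rest = time_calls(
        lambda: transform(img), repeats=4, sync=torch.cuda.synchronize
    )
    print(f"first call: {first:.3f}s, fastest later call: {min(rest):.5f}s")
```

If kernels for this image size are already cached, for example after a warm-up, the first call should be about as fast as the later ones.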

Verification

Test your installation:

import torch
import triton_augment as ta

# Should work without errors
img = torch.rand(4, 3, 224, 224, device='cuda')
transform = ta.TritonColorJitterNormalize(brightness=0.2)
result = transform(img)
print("✅ Installation successful!")