Installation Guide
Requirements
- Python >= 3.8
- PyTorch >= 2.0.0
- Triton >= 2.0.0
- CUDA-capable GPU
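You can also check the requirements above programmatically. The sketch below uses only the standard library. The helper names (`parse_version`, `check_requirements`) are hypothetical, and the version parser is deliberately rough: it ignores pre-release and local suffixes. When exact PEP 440 ordering matters, `packaging.version` is the better choice.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions from the requirements list above.
MIN_VERSIONS = {"torch": (2, 0, 0), "triton": (2, 0, 0)}


def parse_version(text):
    """Roughly parse '2.1.0+cu118' into (2, 1, 0).

    Local suffixes (+cu118) and trailing pre-release tags (rc1, a0) are dropped,
    so '2.0.0rc1' compares equal to '2.0.0'.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))  # pad '2.1' to (2, 1, 0)
    return tuple(parts[:3])


def check_requirements():
    """Return human-readable problems; an empty list means every check passed.

    The Python >= 3.8 requirement is implied: the importlib.metadata import
    above would already have failed on older interpreters.
    """
    problems = []
    for name, minimum in MIN_VERSIONS.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(found) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} >= {wanted} required, found {found}")
    try:
        import torch  # imported here so the version checks work without PyTorch
    except ImportError:
        return problems  # a missing torch is already in the list
    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA-capable GPU")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All requirements satisfied."]:
        print(line)
```

When run as a script, it prints each unmet requirement. If everything is in order, it prints a single success line.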
Installation from PyPI (Recommended)
Development Installation (From Source)
For contributors or those who want to modify the code:
```bash
git clone https://github.com/yuhezhang-ai/triton-augment.git
cd triton-augment
pip install -e ".[dev]"
```
Input Requirements
- Range: Pixel values must be in [0, 1] (use `transforms.ToTensor()` if loading from PIL).
- Device: GPU only (CPU tensors are automatically moved to CUDA).
- Shape: Both 3D `(C, H, W)` and 4D `(N, C, H, W)` tensors are supported (automatic batching).
- Dtype: `float32` or `float16`.
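You can check the input rules above before a tensor reaches the augmentation kernels. This is a minimal sketch that assumes plain PyTorch tensors. `check_input` is a hypothetical helper and is not part of triton-augment.

```python
import torch

# Dtypes listed in the input requirements above.
ALLOWED_DTYPES = (torch.float32, torch.float16)


def check_input(x: torch.Tensor) -> list:
    """Return a list of problems with an image tensor; empty if it looks valid."""
    problems = []
    if x.dim() not in (3, 4):
        problems.append(f"expected (C, H, W) or (N, C, H, W), got shape {tuple(x.shape)}")
    if x.dtype not in ALLOWED_DTYPES:
        # uint8 images (0-255) usually land here; convert with ToTensor() or x.float() / 255
        problems.append(f"expected float32 or float16, got {x.dtype}")
    elif x.numel() > 0 and (x.min().item() < 0.0 or x.max().item() > 1.0):
        problems.append("pixel values must lie in [0, 1]")
    return problems
```

For a PIL image, torchvision's `transforms.ToTensor()` already yields a `float32` `(C, H, W)` tensor in [0, 1]. As a result, `x = transforms.ToTensor()(img).cuda()` should come back with an empty list. The min/max test calls `.item()`, which synchronizes with the GPU. Treat this as a debugging aid rather than something to run on every training step.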
First Run Behavior
On first use, Triton compiles kernels for your GPU. This takes about 1-2 seconds per image size with the default configuration. It is expected and happens only once per GPU and image size.
Optional: Cache Warm-Up
To avoid compilation delays during training, you can optionally warm up the cache after installation:
For more details and auto-tuning optimization, see the Auto-Tuning Guide.
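One generic way to warm up before training is to run your transform once on dummy tensors of every size you expect. This sketch rests on some assumptions: `warm_up` is a hypothetical helper, and `transform` stands for whichever triton-augment transform you actually use. Triton compiles a separate kernel specialization for each argument dtype, so the helper also loops over the dtypes you plan to use.

```python
import torch


def warm_up(transform, sizes, batch_size=1, channels=3,
            dtypes=(torch.float32,), device="cuda"):
    """Call `transform` once per (dtype, size) pair so kernels compile up front.

    `transform` is a placeholder for any callable that takes an (N, C, H, W) batch.
    """
    for dtype in dtypes:
        for height, width in sizes:
            # Random values in [0, 1], matching the input range requirement.
            x = torch.rand(batch_size, channels, height, width, dtype=dtype, device=device)
            transform(x)
    if torch.device(device).type == "cuda":
        torch.cuda.synchronize()  # wait until every launched kernel has finished
```

For example, `warm_up(my_transform, [(224, 224), (256, 256)], batch_size=32, dtypes=(torch.float32, torch.float16))` would compile kernels for both sizes and both dtypes. Here `my_transform` is a stand-in name.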
What to expect
- First import: an informational message about the auto-tuning status is printed (suppress it by setting `TRITON_AUGMENT_SUPPRESS_FIRST_RUN_MESSAGE=1`).
- First use of each image size: about 1-2 seconds of kernel compilation.
- Subsequent uses: no compilation delay (kernels are cached).
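To observe the first-use versus cached behavior described above, time a few consecutive calls. This is a minimal sketch, and `time_calls` is a hypothetical helper. For GPU work, `sync` should be `torch.cuda.synchronize`, because CUDA kernel launches return before the kernel finishes.

```python
import time


def time_calls(fn, *args, repeats=3, sync=None):
    """Return wall-clock seconds for each of `repeats` consecutive calls to fn(*args).

    For GPU work pass sync=torch.cuda.synchronize: CUDA kernel launches are
    asynchronous, so without it only the launch would be timed.
    """
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain previously queued work so it is not counted
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # wait for this call's kernels to finish
        timings.append(time.perf_counter() - start)
    return timings
```

Consider a fresh image size and a call such as `time_calls(transform, x, sync=torch.cuda.synchronize)`. You should see the compilation cost in the first entry only.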
Verification
Test your installation:
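The following is a minimal import check in Python. It is a sketch, not the project's official verification step. It assumes the package is importable as `triton_augment`. That name is inferred from the `TRITON_AUGMENT_...` environment variable and is not confirmed here. `verify_installation` is a hypothetical helper.

```python
import importlib


def verify_installation(module_names=("torch", "triton", "triton_augment")):
    """Try importing each module; map its name to a version string or an error message.

    'triton_augment' is an assumed import name for this package.
    """
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            report[name] = f"import failed: {exc}"
        else:
            report[name] = getattr(module, "__version__", "imported (no __version__ attribute)")
    return report


if __name__ == "__main__":
    for name, status in verify_installation().items():
        print(f"{name}: {status}")
```

On the first run, importing the package may also print the auto-tuning message described under "What to expect".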