# Comparison with other GPU-based data transform libraries

## 📊 Triton-Augment vs DALI vs Kornia

| Feature | Triton-Augment | NVIDIA DALI | Kornia |
|---|---|---|---|
| Primary goal | Fast, fused GPU augmentations for training | End-to-end input pipeline (decode → resize → augment) | Differentiable augmentations in pure PyTorch |
| Fused ops | ✔️ Crop + flip + brightness + contrast + saturation + grayscale + normalize (single kernel) | ⚠️ Only some fusions (e.g., `crop_mirror_normalize`); color ops are separate kernels | ❌ No fusion; each op is a separate CUDA/PyTorch op |
| Per-sample random params | ✔️ Built-in, torchvision-style API | ✔️ Supported (by feeding random tensors), but more manual | ✔️ Built-in |
| Ease of use | ✔️ Simple, torchvision-like | ⚠️ Steeper learning curve (pipeline graph) | ✔️ Very easy (just PyTorch ops) |
| Supported ops | ⚠️ Limited for now (crop, flip, color jitter, normalize, grayscale) | ✔️ Huge library (decode, resize, warp, color, video, audio) | ✔️ Wide set (geometry, color, filtering, keypoints) |
| Performance | 🚀 Very fast for augmentation (one fused kernel for all ops) | 🚀 Fast for full pipelines (GPU decode/resize), but augmentation uses multiple kernels (less fusion) | ⚠️ Moderate (PyTorch kernels, multiple launches) |
| Integration | PyTorch training pipelines | PyTorch, TensorFlow, JAX | PyTorch only |
| CPU preprocessing | ❌ None (expects tensors already on the GPU) | ✔️ Hardware-accelerated decode/resize possible | ✔️ Built on top of PyTorch |
| Autograd support | ❌ Not needed (augmentations only) | ❌ Most ops are not differentiable | ✔️ Yes (Kornia is differentiable by design) |
| Production readiness | ⚠️ Early-stage (fast but limited scope) | ✔️ Mature, used in industry | ✔️ Mature |
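Why fusing color ops is possible at all: brightness scaling and normalization (and, to a first approximation, contrast and saturation) are per-pixel affine transforms, so a chain of them algebraically collapses into a single scale-and-shift applied in one memory pass. A minimal sketch of that folding in plain Python (illustrative only; real color jitter also involves channel/image means, so a fused kernel carries slightly more state than this):

```python
def compose_affine(ops):
    """Fold a sequence of per-pixel affine ops y = s*x + b into one pair.

    Applying (s1, b1) then (s2, b2) gives y = (s2*s1)*x + (s2*b1 + b2),
    so any chain collapses to a single scale/shift -- the algebra a fused
    kernel exploits to touch each pixel only once.
    """
    scale, shift = 1.0, 0.0
    for s, b in ops:
        scale, shift = s * scale, s * shift + b
    return scale, shift

# Illustrative chain: brightness x * 1.2, then normalize (x - 0.5) / 0.25.
# Normalize written as an affine op is scale = 4.0, shift = -2.0.
scale, shift = compose_affine([(1.2, 0.0), (4.0, -2.0)])
print(scale, shift)  # one multiply-add per pixel replaces two passes
```

The same idea extends to crop and flip, which only change *which* pixel is read, not the arithmetic, so they fuse into the same single kernel as index remapping.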

πŸ“ Notes

- Triton-Augment is not a replacement for DALI or Kornia. It's a small, focused library aimed at speeding up a few high-impact augmentations via kernel fusion.

- DALI is still the best choice if the bottleneck is decode/resize or you need full data pipeline acceleration. However, for augmentation-only workloads (data already on the GPU as tensors), Triton-Augment is faster due to higher kernel fusion.

- Kornia is best if you need differentiable augmentation or a wide variety of transforms.

- Our advantage: for the operations that are supported, our one-kernel design beats both in raw speed and simplicity.

- Our limitation: fewer ops, no CPU pipeline, not designed for everything — just the common path.
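The "fewer kernel launches" argument can be made concrete with a toy cost model. All numbers below are illustrative assumptions, not measurements; the point is only that unfused pipelines pay both launch overhead and a full memory pass per op:

```python
def augment_time_us(n_ops, fused, launch_overhead_us=5.0, mem_pass_us=20.0):
    """Toy cost model for one augmentation step (assumed numbers).

    Assumptions: each kernel launch pays a fixed overhead, and each
    unfused op re-reads and re-writes the image (one memory pass).
    A fused kernel pays one launch and one pass regardless of how
    many ops it folds in.
    """
    launches = 1 if fused else n_ops
    passes = 1 if fused else n_ops
    return launches * launch_overhead_us + passes * mem_pass_us

# Five chained ops (e.g., crop, flip, brightness, contrast, normalize):
print(augment_time_us(5, fused=False))  # 5 launches + 5 memory passes
print(augment_time_us(5, fused=True))   # 1 launch  + 1 memory pass
```

Under these assumptions the fused path is 5x cheaper; real speedups depend on image size, memory bandwidth, and actual launch overhead.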