Modern CV bootcamp: go from OpenCV/NumPy to training, evaluating and deploying deep vision models.
Practice PyTorch-first workflows with optional TensorFlow/Keras parallels.
Cover CNNs and Vision Transformers with a practitioner’s focus.
Gain practical experience: roughly 70% of the course is hands-on labs on real datasets and projects.
How this helps: build reliable models and ship them to the edge with ONNX/TensorRT.
Who it’s for: individuals with Python/C++ basics tackling perception problems.
Emphasis on reproducibility, robust metrics and responsible evaluation.
Curriculum
Computer vision essentials with OpenCV
- Loading/saving, color spaces, drawing primitives, ROI/cropping
- Transforms: resize/rotate/flip; image arithmetic; masking; channel ops
- Kernels & morphology; gradients/edges; illumination issues and normalization
- Mini-lab: lane-detection pipeline refresher (see the sketch after this list)
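A minimal sketch of the OpenCV operations this module covers, assuming a local test image (the `road.jpg` path is a placeholder):

```python
import cv2
import numpy as np

# Load an image (BGR by default), convert color space, crop an ROI.
img = cv2.imread("road.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
roi = img[100:300, 50:400]              # ROI via NumPy slicing

# Geometric transforms and simple image arithmetic.
small = cv2.resize(img, None, fx=0.5, fy=0.5)
flipped = cv2.flip(img, 1)              # horizontal flip
brighter = cv2.add(img, np.full_like(img, 30))  # saturating add

# Edges + morphology: a common preprocessing chain for lane detection.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)
kernel = np.ones((3, 3), np.uint8)
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

cv2.imwrite("edges.png", closed)
```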
NumPy & array programming
- ndarray basics, broadcasting, views vs copies
- Vectorization, memory layout, dtype & precision trade-offs
- Small lab: implement a convolution and compare to OpenCV (see the sketch after this list)
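One way the convolution lab can be approached (a sketch, not the official solution): implement a naive 2D filter with loops and verify it against `cv2.filter2D` on the interior of the image, where border handling does not matter.

```python
import numpy as np
import cv2

def conv2d_naive(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """2D cross-correlation (what cv2.filter2D computes), 'valid' borders only."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.random.rand(64, 64).astype(np.float64)
k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)  # Sobel-x

ours = conv2d_naive(img, k)
ref = cv2.filter2D(img, -1, k, borderType=cv2.BORDER_CONSTANT)[1:-1, 1:-1]
print(np.allclose(ours, ref))  # should print True
```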
Neural networks from scratch (concepts)
- Perceptron, activations, logits & softmax, cross-entropy
- Backprop & gradient descent; initialization & normalization
- Overfitting vs generalization; regularization overview
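To make the logits → softmax → cross-entropy chain concrete, here is a small NumPy sketch using the numerically stable softmax and the well-known gradient of cross-entropy with respect to the logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    n = y.shape[0]
    return -np.log(probs[np.arange(n), y]).mean()

# Toy batch: 4 samples, 3 classes.
logits = np.random.randn(4, 3)
y = np.array([0, 2, 1, 2])

p = softmax(logits)
loss = cross_entropy(p, y)

# Gradient of the loss w.r.t. the logits: (p - one_hot(y)) / n.
grad = p.copy()
grad[np.arange(4), y] -= 1.0
grad /= 4
logits -= 0.1 * grad  # one plain gradient-descent step
print(loss)
```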
PyTorch essentials (with TensorFlow/Keras parallels)
- Tensors, autograd, modules; writing a training loop
- Data pipelines: Dataset/DataLoader, augmentations (Albumentations basics)
- Metrics & evaluation; confusion matrix; reproducibility (seeds/determinism)
- Mixed precision (AMP) and basic multi-GPU (DDP) — overview + demo
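A compact sketch of the pieces this module assembles: seeding for reproducibility, a `Dataset`/`DataLoader` pair, and a minimal training loop (synthetic tensors stand in for a real dataset; the `ToyImages` class is illustrative):

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(0)  # seed before creating model and data

class ToyImages(Dataset):
    """Synthetic stand-in for a real image dataset."""
    def __init__(self, n=256):
        self.x = torch.randn(n, 3, 32, 32)
        self.y = torch.randint(0, 10, (n,))
    def __len__(self):
        return len(self.x)
    def __getitem__(self, i):
        return self.x[i], self.y[i]

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(ToyImages(), batch_size=32, shuffle=True)

for epoch in range(2):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()          # autograd computes gradients
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```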
Convolutional networks, transfer learning & fine-tuning
- CNN building blocks; receptive fields and shapes; parameter counting
- Using pretrained backbones (torchvision/timm) and head design
- Freeze/partial freeze; discriminative learning rates; early stopping
- Lab: fine-tune a classifier; compare augmentation strategies (see the sketch after this list)
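A sketch of the freeze-and-fine-tune recipe with a torchvision backbone; the choice of which layers to unfreeze and the two-group learning rates (illustrating discriminative LRs) are placeholders, not fixed course values:

```python
import torch
from torch import nn
from torchvision import models

# Load a pretrained backbone and swap in a new classification head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 5)  # e.g. 5 target classes

# Partial freeze: train only the last stage and the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))

# Discriminative learning rates: smaller for the backbone, larger for the head.
opt = torch.optim.AdamW([
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(),     "lr": 1e-3},
], weight_decay=1e-2)
```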
Detection/segmentation quick tour
- Object detection families (one-stage vs two-stage) — practitioner’s view
- Modern choices: YOLO-family, DETR/RT-DETR (overview), instance vs semantic segmentation
- Dataset formats, annotation tools, and evaluation (mAP/IoU) basics
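IoU is the building block of the mAP evaluation mentioned above; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.143
```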
Attention & Vision Transformers (practical overview)
- Self-attention intuition; patches & embeddings; positional encodings
- ViT/DeiT-style fine-tuning workflow
- When to pick CNNs vs ViTs; compute/memory considerations
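The ViT/DeiT fine-tuning workflow looks much like the CNN one; a sketch using timm (the model name is one common choice, not prescribed by the course):

```python
import timm
import torch

# Pretrained ViT with a fresh head for 5 classes.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)

# timm exposes the matching preprocessing config for each model.
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

x = torch.randn(2, 3, 224, 224)   # dummy batch at the expected resolution
print(model(x).shape)             # torch.Size([2, 5])
```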
Optimization: training recipes that matter
- Schedulers (cosine/one-cycle), weight decay, label smoothing
- Regularization: dropout, mixup/cutmix (overview)
- Tips for stable training: gradient clipping, sane batch sizes, AMP pitfalls
- Experiment tracking: MLflow or Weights & Biases (brief)
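How several of these recipe pieces combine in code (hyperparameters are illustrative, and a dummy linear model stands in for a real network): AdamW with weight decay, a cosine schedule, label smoothing, and gradient clipping.

```python
import torch
from torch import nn

model = nn.Linear(128, 10)                     # stand-in model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

for step in range(100):
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    # Clip the global gradient norm to stabilize training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    sched.step()   # stepped per iteration here; often stepped per epoch
```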
Responsible CV & data quality
- Dataset curation and splits; leakage & shortcuts
- Bias, robustness, augmentations vs distribution shift
- Documentation of experiments and model cards (lightweight)
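One concrete guard against the leakage mentioned above: split by group (patient, camera, recording session) rather than by individual frame, so near-duplicate images never straddle train and test. A sketch with scikit-learn; the `video_id` grouping is a hypothetical example:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Frames sharing a video_id are highly correlated; a per-frame split
# would leak near-duplicates across train/test.
n = 1000
X = np.arange(n)                       # stand-in for frame indices
video_id = np.random.randint(0, 50, n)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=video_id))
assert set(video_id[train_idx]).isdisjoint(video_id[test_idx])
```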
Deploying to edge and production
- Export: TorchScript and ONNX; verifying numerical parity
- Acceleration: TensorRT / ONNX Runtime / OpenVINO — when to use what
- Quantization (PTQ/QAT), pruning & distillation — practical gains & trade-offs
- Serving options: Triton Inference Server; Jetson deployment notes
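The export-then-verify step in code (a sketch; the opset version and tolerances are typical choices, not mandated by the course):

```python
import numpy as np
import torch
import onnxruntime as ort
from torchvision import models

model = models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX.
torch.onnx.export(model, dummy, "model.onnx", opset_version=17,
                  input_names=["input"], output_names=["logits"])

# Verify numerical parity between PyTorch and ONNX Runtime.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": dummy.numpy()})[0]
with torch.no_grad():
    torch_out = model(dummy).numpy()
np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
```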
Optional modules
RNN/temporal models & tracking
- Temporal modeling options (RNN/LSTM/GRU vs 1D temporal convs)
- Basics of multi-object tracking (overview)
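To contrast the two temporal options in the first bullet, a sketch running an LSTM and a 1D temporal convolution over the same sequence of per-frame features (shapes are illustrative):

```python
import torch
from torch import nn

B, T, C = 4, 16, 256          # batch, frames, per-frame feature dim
feats = torch.randn(B, T, C)  # e.g. CNN embeddings of each frame

# Option 1: recurrent model over time.
lstm = nn.LSTM(input_size=C, hidden_size=128, batch_first=True)
out, _ = lstm(feats)                       # (B, T, 128)

# Option 2: 1D temporal convolution (expects channels-first: B, C, T).
tconv = nn.Conv1d(C, 128, kernel_size=3, padding=1)
out2 = tconv(feats.transpose(1, 2))        # (B, 128, T)
print(out.shape, out2.shape)
```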
Course Day Structure
- Part 1: 09:00–10:30
- Break: 10:30–10:45
- Part 2: 10:45–12:15
- Lunch break: 12:15–13:15
- Part 3: 13:15–15:15
- Break: 15:15–15:30
- Part 4: 15:30–17:30