Build a classification training worker supporting datasets (!43) · Workers / YOLO

Eva Bardou requested to merge classif-training-worker into main Dec 05, 2023

I've tested my implementation locally with this dataset, which was extracted with the generic worker here.

Configuration (model_id is a test model in Demo):

{
  "class_names": ["List", "Not a List"],
  "image_size": 640,
  "worker_runs": [],
  "training_kwargs": {},
  "num_epochs": 10,
  "dropout": 0.0,
  "model_id": "726128e3-0958-480f-b8ef-6f504c578372"
}
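JSON has no comment syntax, so the configuration file must be strict JSON for `json.loads` (or any conforming parser) to read it. Below is a minimal loader sketch, assuming the worker reads `config.json` as plain JSON. `parse_training_config`, `load_training_config`, the set of required keys and the individual checks are all hypothetical, not the worker's actual code:

```python
import json
from pathlib import Path

# Hypothetical schema: keys this sketch requires. worker_runs and
# training_kwargs are treated as optional (an assumption).
REQUIRED_KEYS = {"class_names", "image_size", "num_epochs", "dropout", "model_id"}


def parse_training_config(text: str) -> dict:
    """Parse strict JSON and run a few sanity checks on the training settings."""
    config = json.loads(text)  # raises json.JSONDecodeError on // comments

    missing = REQUIRED_KEYS.difference(config)
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")

    class_names = config["class_names"]
    # A classifier needs at least two distinct labels to learn anything.
    if len(class_names) < 2 or len(set(class_names)) != len(class_names):
        raise ValueError("class_names must list at least two distinct classes")
    for key in ("image_size", "num_epochs"):
        if not isinstance(config[key], int) or config[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    if not 0.0 <= config["dropout"] < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    return config


def load_training_config(path: Path) -> dict:
    """Read config.json from disk and validate it."""
    return parse_training_config(Path(path).read_text(encoding="utf-8"))
```

Since `json.JSONDecodeError` subclasses `ValueError`, a caller only has to catch `ValueError` to report both syntax errors and invalid values.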

Command:

worker-yolo-train --dataset 03536d70-5f91-4a15-8a1d-3426361e3f12 --config config.json --extras-dir extras
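Judging by the engine/trainer line in the worker logs below (task=classify, model=yolov8x-cls.pt, epochs=10, imgsz=640, dropout=0.0), this command boils down to an Ultralytics classification training call. A rough sketch of that mapping follows; `build_train_kwargs` and `train` are hypothetical names, and merging `training_kwargs` last as free-form overrides is an assumption, not confirmed by the MR:

```python
from pathlib import Path


def build_train_kwargs(config: dict, data_dir: Path) -> dict:
    """Hypothetical mapping from the worker configuration to YOLO.train() arguments."""
    kwargs = {
        "data": str(data_dir),  # folder holding the train/val/test splits
        "epochs": config["num_epochs"],
        "imgsz": config["image_size"],
        "dropout": config["dropout"],
    }
    # Assumption: extra settings from training_kwargs take precedence.
    kwargs.update(config.get("training_kwargs", {}))
    return kwargs


def train(config: dict, data_dir: Path, weights: str = "yolov8x-cls.pt"):
    """Start a classification training run with Ultralytics."""
    # Imported lazily so the mapping above works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # a *-cls.pt checkpoint loads as a classification model
    return model.train(**build_train_kwargs(config, data_dir))
```

With the configuration above, `build_train_kwargs` yields epochs=10, imgsz=640 and dropout=0.0, the same values that appear in the engine/trainer line.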

Worker logs

2023-12-08 12:23:00,916 WARNING/arkindex_worker: Missing ARKINDEX_WORKER_RUN_ID environment variable, worker is in read-only mode
2023-12-08 12:23:00,916 INFO/arkindex_worker: Worker will use /home/eva/.local/share/arkindex as working directory
2023-12-08 12:23:03,153 INFO/arkindex_worker: Running with local configuration from config.json
/home/eva/.virtualenvs/yolo-worker/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
2023-12-08 12:23:03,390 INFO/arkindex_worker: Processing Dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) (1/1)
2023-12-08 12:23:03,390 INFO/arkindex_worker: Downloading data for Dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) (1/1)
2023-12-08 12:23:05,355 INFO/arkindex_worker: Extracting the dataset archive at /tmp/tmp76a73bkm-03536d70-5f91-4a15-8a1d-3426361e3f12/archive
2023-12-08 12:23:05,488 INFO/arkindex_worker: Connected to cache on /tmp/tmp76a73bkm-03536d70-5f91-4a15-8a1d-3426361e3f12/archive/db.sqlite
2023-12-08 12:23:05,489 INFO/arkindex_worker: Training-related data will be available at /tmp/tmpt82urwqe-training-data/all
2023-12-08 12:23:05,489 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Training)
2023-12-08 12:23:05,491 INFO/arkindex_worker: Training dataset fully downloaded.
2023-12-08 12:23:05,492 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Validation)
2023-12-08 12:23:05,494 INFO/arkindex_worker: Validation dataset fully downloaded.
2023-12-08 12:23:05,494 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Testing)
2023-12-08 12:23:05,496 INFO/arkindex_worker: Testing dataset fully downloaded.
New https://pypi.org/project/ultralytics/8.0.225 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
engine/trainer: task=classify, mode=train, model=yolov8x-cls.pt, data=/tmp/tmpt82urwqe-training-data/all, epochs=10, patience=50, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=cpu, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=/home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅ 
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅ 
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅ 
Overriding model.yaml nc=1000 with nc=1

                   from  n    params  module                                       arguments                     
  0                  -1  1      2320  ultralytics.nn.modules.conv.Conv             [3, 80, 3, 2]                 
  1                  -1  1    115520  ultralytics.nn.modules.conv.Conv             [80, 160, 3, 2]               
  2                  -1  3    436800  ultralytics.nn.modules.block.C2f             [160, 160, 3, True]           
  3                  -1  1    461440  ultralytics.nn.modules.conv.Conv             [160, 320, 3, 2]              
  4                  -1  6   3281920  ultralytics.nn.modules.block.C2f             [320, 320, 6, True]           
  5                  -1  1   1844480  ultralytics.nn.modules.conv.Conv             [320, 640, 3, 2]              
  6                  -1  6  13117440  ultralytics.nn.modules.block.C2f             [640, 640, 6, True]           
  7                  -1  1   7375360  ultralytics.nn.modules.conv.Conv             [640, 1280, 3, 2]             
  8                  -1  3  27865600  ultralytics.nn.modules.block.C2f             [1280, 1280, 3, True]         
  9                  -1  1   1642241  ultralytics.nn.modules.head.Classify         [1280, 1]                     
YOLOv8x-cls summary: 183 layers, 56143121 parameters, 56143121 gradients, 154.3 GFLOPs
Transferred 300/302 items from pretrained weights
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA not detected, using default CPU batch-size 16
train: Scanning /tmp/tmpt82urwqe-training-data/all/train... 5 images, 0 corrupt: 100%|██████████| 5/5 [00:00<00:00, 3
train: New cache created: /tmp/tmpt82urwqe-training-data/all/train.cache
val: Scanning /tmp/tmpt82urwqe-training-data/all/val... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<00:00, 3264.
val: New cache created: /tmp/tmpt82urwqe-training-data/all/val.cache
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000714, momentum=0.9) with parameter groups 50 weight(decay=0.0), 51 weight(decay=0.0005), 51 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Starting training for 10 epochs...
Closing dataloader mosaic

      Epoch    GPU_mem       loss  Instances       Size
       1/10         0G          0          5        640: 100%|██████████| 1/1 [00:12<00:00, 12.29s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.22s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       2/10         0G          0          5        640: 100%|██████████| 1/1 [00:12<00:00, 12.85s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.04s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       3/10         0G          0          5        640: 100%|██████████| 1/1 [00:12<00:00, 12.13s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:02<00:00,  2.99s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       4/10         0G          0          5        640: 100%|██████████| 1/1 [00:12<00:00, 12.10s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:04<00:00,  4.01s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       5/10         0G          0          5        640: 100%|██████████| 1/1 [00:13<00:00, 13.69s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.02s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       6/10         0G          0          5        640: 100%|██████████| 1/1 [00:14<00:00, 14.50s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.22s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       7/10         0G          0          5        640: 100%|██████████| 1/1 [00:11<00:00, 11.83s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:04<00:00,  4.09s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       8/10         0G          0          5        640: 100%|██████████| 1/1 [00:11<00:00, 11.84s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.09s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
       9/10         0G          0          5        640: 100%|██████████| 1/1 [00:11<00:00, 11.66s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.14s/it]
                   all          1          1

      Epoch    GPU_mem       loss  Instances       Size
      10/10         0G          0          5        640: 100%|██████████| 1/1 [00:12<00:00, 12.28s/it]
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:03<00:00,  3.19s/it]
                   all          1          1

10 epochs completed in 0.052 hours.
Optimizer stripped from /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/last.pt, 112.5MB
Optimizer stripped from /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/best.pt, 112.5MB

Validating /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/best.pt...
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
YOLOv8x-cls summary (fused): 133 layers, 56124481 parameters, 0 gradients, 153.8 GFLOPs
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅ 
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅ 
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅ 
               classes   top1_acc   top5_acc: 100%|██████████| 1/1 [00:02<00:00,  2.95s/it]
                   all          1          1
Speed: 0.0ms preprocess, 674.3ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
YOLOv8x-cls summary (fused): 133 layers, 56124481 parameters, 0 gradients, 153.8 GFLOPs
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅ 
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅ 
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅ 
train: Scanning /tmp/tmpt82urwqe-training-data/all/train... 5 images, 0 corrupt: 100%|██████████| 5/5 [00:00<?, ?it/s
               classes   top1_acc   top5_acc: 100%|██████████| 5/5 [00:02<00:00,  1.98it/s]
                   all          1          1
Speed: 0.0ms preprocess, 438.9ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅ 
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅ 
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅ 
val: Scanning /tmp/tmpt82urwqe-training-data/all/val... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<?, ?it/s]
               classes   top1_acc   top5_acc: 100%|██████████| 4/4 [00:01<00:00,  2.03it/s]
                   all          1          1
Speed: 0.0ms preprocess, 428.1ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val2
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅ 
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅ 
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅ 
test: Scanning /tmp/tmpt82urwqe-training-data/all/test... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<00:00, 104
test: New cache created: /tmp/tmpt82urwqe-training-data/all/test.cache
               classes   top1_acc   top5_acc: 100%|██████████| 4/4 [00:02<00:00,  1.82it/s]
                   all          1          1
Speed: 0.0ms preprocess, 480.4ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val3
2023-12-08 12:26:37,908 WARNING/arkindex_worker: Cannot perform this operation as the worker is in read-only mode
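The logs report images "in 1 classes" for every split and `Overriding model.yaml nc=1000 with nc=1`, whereas class_names lists two classes. With a single output class the softmax always assigns probability 1 to it, so a loss of 0 and a top-1 accuracy of 1 say nothing about the model. A hypothetical pre-training check could surface this, assuming the folder-per-class layout Ultralytics uses for classification datasets (`<root>/<split>/<class_name>/<image>`) and that class folders are named after class_names; `count_split_classes`, `check_split_classes` and the accepted image extensions are illustrative, not part of this MR:

```python
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_split_classes(root: Path) -> dict[str, dict[str, int]]:
    """Map each split folder (e.g. train/val/test) to {class folder: image count}."""
    counts: dict[str, dict[str, int]] = {}
    # Only directories are splits; files such as train.cache are skipped.
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


def check_split_classes(root: Path, class_names: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks sane."""
    problems = []
    expected = set(class_names)
    for split, classes in count_split_classes(root).items():
        found = {name for name, n in classes.items() if n > 0}
        if missing := expected - found:
            problems.append(f"{split}: no images for {sorted(missing)}")
        if unknown := found - expected:
            problems.append(f"{split}: unexpected classes {sorted(unknown)}")
    return problems
```

Running such a check on the extracted training-data folder before starting training would turn a single-class dataset into an explicit error instead of a trivially perfect run.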

Edited Dec 08, 2023 by Eva Bardou
