Build a classification training worker supporting datasets
Closes #22
I tested my implementation locally with a dataset that was extracted with the generic worker.
Configuration (model_id refers to a test model in Demo):
{
  "class_names": ["List", "Not a List"],
  "image_size": 640,
  "worker_runs": [],
  "training_kwargs": {},
  "num_epochs": 10,
  "dropout": 0.0,
  "model_id": "726128e3-0958-480f-b8ef-6f504c578372"
}
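For anyone reproducing this locally, a minimal sketch of a sanity check on the configuration before starting a run. It assumes the file is parsed as strict JSON, in which case any inline annotation has to be stripped before saving it as config.json. The `load_config` helper, the `EXPECTED_KEYS` set and the specific checks are hypothetical and are not part of the worker.

```python
import json

# Keys copied from the configuration above; treating all of them as required
# is an assumption made for this sketch.
EXPECTED_KEYS = {
    "class_names", "image_size", "worker_runs", "training_kwargs",
    "num_epochs", "dropout", "model_id",
}


def load_config(text: str) -> dict:
    """Parse a JSON configuration and run a few basic sanity checks."""
    # Strict JSON has no comment syntax, so `//` notes raise JSONDecodeError.
    config = json.loads(text)
    missing = EXPECTED_KEYS - config.keys()
    if missing:
        raise ValueError(f"Missing configuration keys: {sorted(missing)}")
    if not config["class_names"]:
        raise ValueError("class_names must not be empty")
    if config["num_epochs"] < 1:
        raise ValueError("num_epochs must be at least 1")
    if not 0.0 <= config["dropout"] < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    return config


EXAMPLE = """
{
  "class_names": ["List", "Not a List"],
  "image_size": 640,
  "worker_runs": [],
  "training_kwargs": {},
  "num_epochs": 10,
  "dropout": 0.0,
  "model_id": "726128e3-0958-480f-b8ef-6f504c578372"
}
"""

config = load_config(EXAMPLE)
print(config["class_names"], config["num_epochs"])
```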
Command:
worker-yolo-train --dataset 03536d70-5f91-4a15-8a1d-3426361e3f12 --config config.json --extras-dir extras
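As a rough illustration of how the configuration could end up in the trainer arguments shown in the worker logs (`epochs=10`, `imgsz=640`, `dropout=0.0`, `batch=-1`, `model=yolov8x-cls.pt`), here is a sketch using the public Ultralytics `YOLO(...).train(...)` API. The `build_train_kwargs` and `train` helpers are hypothetical, and letting `training_kwargs` override the other values is an assumption, not necessarily what the worker does.

```python
def build_train_kwargs(config: dict, data_dir: str) -> dict:
    """Map worker configuration fields to Ultralytics train() arguments."""
    kwargs = {
        "data": data_dir,
        "epochs": config["num_epochs"],
        "imgsz": config["image_size"],
        "dropout": config["dropout"],
        # batch=-1 (AutoBatch) is the value reported in the logged arguments.
        "batch": -1,
    }
    # Assumption: entries in training_kwargs take precedence over the above.
    kwargs.update(config.get("training_kwargs", {}))
    return kwargs


def train(config: dict, data_dir: str, weights: str = "yolov8x-cls.pt"):
    """Start a classification training run (requires ultralytics)."""
    # Imported lazily so build_train_kwargs stays usable without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_kwargs(config, data_dir))
```

Keeping the argument mapping in a separate pure function makes it testable without downloading weights or starting a training run.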
Worker logs:
2023-12-08 12:23:00,916 WARNING/arkindex_worker: Missing ARKINDEX_WORKER_RUN_ID environment variable, worker is in read-only mode
2023-12-08 12:23:00,916 INFO/arkindex_worker: Worker will use /home/eva/.local/share/arkindex as working directory
2023-12-08 12:23:03,153 INFO/arkindex_worker: Running with local configuration from config.json
/home/eva/.virtualenvs/yolo-worker/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
2023-12-08 12:23:03,390 INFO/arkindex_worker: Processing Dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) (1/1)
2023-12-08 12:23:03,390 INFO/arkindex_worker: Downloading data for Dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) (1/1)
2023-12-08 12:23:05,355 INFO/arkindex_worker: Extracting the dataset archive at /tmp/tmp76a73bkm-03536d70-5f91-4a15-8a1d-3426361e3f12/archive
2023-12-08 12:23:05,488 INFO/arkindex_worker: Connected to cache on /tmp/tmp76a73bkm-03536d70-5f91-4a15-8a1d-3426361e3f12/archive/db.sqlite
2023-12-08 12:23:05,489 INFO/arkindex_worker: Training-related data will be available at /tmp/tmpt82urwqe-training-data/all
2023-12-08 12:23:05,489 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Training)
2023-12-08 12:23:05,491 INFO/arkindex_worker: Training dataset fully downloaded.
2023-12-08 12:23:05,492 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Validation)
2023-12-08 12:23:05,494 INFO/arkindex_worker: Validation dataset fully downloaded.
2023-12-08 12:23:05,494 INFO/arkindex_worker: Extracting data from dataset (03536d70-5f91-4a15-8a1d-3426361e3f12) for split (Testing)
2023-12-08 12:23:05,496 INFO/arkindex_worker: Testing dataset fully downloaded.
New https://pypi.org/project/ultralytics/8.0.225 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
engine/trainer: task=classify, mode=train, model=yolov8x-cls.pt, data=/tmp/tmpt82urwqe-training-data/all, epochs=10, patience=50, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=cpu, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=/home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅
Overriding model.yaml nc=1000 with nc=1
from n params module arguments
0 -1 1 2320 ultralytics.nn.modules.conv.Conv [3, 80, 3, 2]
1 -1 1 115520 ultralytics.nn.modules.conv.Conv [80, 160, 3, 2]
2 -1 3 436800 ultralytics.nn.modules.block.C2f [160, 160, 3, True]
3 -1 1 461440 ultralytics.nn.modules.conv.Conv [160, 320, 3, 2]
4 -1 6 3281920 ultralytics.nn.modules.block.C2f [320, 320, 6, True]
5 -1 1 1844480 ultralytics.nn.modules.conv.Conv [320, 640, 3, 2]
6 -1 6 13117440 ultralytics.nn.modules.block.C2f [640, 640, 6, True]
7 -1 1 7375360 ultralytics.nn.modules.conv.Conv [640, 1280, 3, 2]
8 -1 3 27865600 ultralytics.nn.modules.block.C2f [1280, 1280, 3, True]
9 -1 1 1642241 ultralytics.nn.modules.head.Classify [1280, 1]
YOLOv8x-cls summary: 183 layers, 56143121 parameters, 56143121 gradients, 154.3 GFLOPs
Transferred 300/302 items from pretrained weights
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA not detected, using default CPU batch-size 16
train: Scanning /tmp/tmpt82urwqe-training-data/all/train... 5 images, 0 corrupt: 100%|██████████| 5/5 [00:00<00:00, 3
train: New cache created: /tmp/tmpt82urwqe-training-data/all/train.cache
val: Scanning /tmp/tmpt82urwqe-training-data/all/val... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<00:00, 3264.
val: New cache created: /tmp/tmpt82urwqe-training-data/all/val.cache
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000714, momentum=0.9) with parameter groups 50 weight(decay=0.0), 51 weight(decay=0.0005), 51 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem loss Instances Size
1/10 0G 0 5 640: 100%|██████████| 1/1 [00:12<00:00, 12.29s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.22s/it]
all 1 1
Epoch GPU_mem loss Instances Size
2/10 0G 0 5 640: 100%|██████████| 1/1 [00:12<00:00, 12.85s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.04s/it]
all 1 1
Epoch GPU_mem loss Instances Size
3/10 0G 0 5 640: 100%|██████████| 1/1 [00:12<00:00, 12.13s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:02<00:00, 2.99s/it]
all 1 1
Epoch GPU_mem loss Instances Size
4/10 0G 0 5 640: 100%|██████████| 1/1 [00:12<00:00, 12.10s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:04<00:00, 4.01s/it]
all 1 1
Epoch GPU_mem loss Instances Size
5/10 0G 0 5 640: 100%|██████████| 1/1 [00:13<00:00, 13.69s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.02s/it]
all 1 1
Epoch GPU_mem loss Instances Size
6/10 0G 0 5 640: 100%|██████████| 1/1 [00:14<00:00, 14.50s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.22s/it]
all 1 1
Epoch GPU_mem loss Instances Size
7/10 0G 0 5 640: 100%|██████████| 1/1 [00:11<00:00, 11.83s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:04<00:00, 4.09s/it]
all 1 1
Epoch GPU_mem loss Instances Size
8/10 0G 0 5 640: 100%|██████████| 1/1 [00:11<00:00, 11.84s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.09s/it]
all 1 1
Epoch GPU_mem loss Instances Size
9/10 0G 0 5 640: 100%|██████████| 1/1 [00:11<00:00, 11.66s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.14s/it]
all 1 1
Epoch GPU_mem loss Instances Size
10/10 0G 0 5 640: 100%|██████████| 1/1 [00:12<00:00, 12.28s/it]
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:03<00:00, 3.19s/it]
all 1 1
10 epochs completed in 0.052 hours.
Optimizer stripped from /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/last.pt, 112.5MB
Optimizer stripped from /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/best.pt, 112.5MB
Validating /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train/weights/best.pt...
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
YOLOv8x-cls summary (fused): 133 layers, 56124481 parameters, 0 gradients, 153.8 GFLOPs
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅
classes top1_acc top5_acc: 100%|██████████| 1/1 [00:02<00:00, 2.95s/it]
all 1 1
Speed: 0.0ms preprocess, 674.3ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/train
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
YOLOv8x-cls summary (fused): 133 layers, 56124481 parameters, 0 gradients, 153.8 GFLOPs
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅
train: Scanning /tmp/tmpt82urwqe-training-data/all/train... 5 images, 0 corrupt: 100%|██████████| 5/5 [00:00<?, ?it/s
classes top1_acc top5_acc: 100%|██████████| 5/5 [00:02<00:00, 1.98it/s]
all 1 1
Speed: 0.0ms preprocess, 438.9ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅
val: Scanning /tmp/tmpt82urwqe-training-data/all/val... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<?, ?it/s]
classes top1_acc top5_acc: 100%|██████████| 4/4 [00:01<00:00, 2.03it/s]
all 1 1
Speed: 0.0ms preprocess, 428.1ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val2
Ultralytics YOLOv8.0.196 🚀 Python-3.10.12 torch-2.1.1+cu121 CPU (AMD Ryzen 5 PRO 4650U with Radeon Graphics)
train: /tmp/tmpt82urwqe-training-data/all/train... found 5 images in 1 classes ✅
val: /tmp/tmpt82urwqe-training-data/all/val... found 4 images in 1 classes ✅
test: /tmp/tmpt82urwqe-training-data/all/test... found 4 images in 1 classes ✅
test: Scanning /tmp/tmpt82urwqe-training-data/all/test... 4 images, 0 corrupt: 100%|██████████| 4/4 [00:00<00:00, 104
test: New cache created: /tmp/tmpt82urwqe-training-data/all/test.cache
classes top1_acc top5_acc: 100%|██████████| 4/4 [00:02<00:00, 1.82it/s]
all 1 1
Speed: 0.0ms preprocess, 480.4ms inference, 0.0ms loss, 0.0ms postprocess per image
Results saved to /home/eva/Documents/dev/arkindex/workers/yolo/runs/classify/val3
2023-12-08 12:26:37,908 WARNING/arkindex_worker: Cannot perform this operation as the worker is in read-only mode
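Note that the logs report `found 5 images in 1 classes` and `nc=1` although two class names are configured, so this test dataset most likely contains a single class; in that case a loss of 0 and a top-1 accuracy of 1 are expected rather than informative. The sketch below shows one way to check which configured classes are missing from each split. It assumes the usual Ultralytics classification layout (`<split>/<class name>/<image>`) and that folders are named after the entries of `class_names`; the helpers and the demo tree (which class is present, file names) are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_split(split_dir: Path) -> dict:
    """Return {class_name: image_count} for one split directory."""
    return {
        class_dir.name: sum(
            1 for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }


def missing_classes(data_dir: Path, class_names: list) -> dict:
    """List, for each split, the configured classes that have no folder."""
    report = {}
    for split in ("train", "val", "test"):
        found = summarize_split(data_dir / split)
        report[split] = [name for name in class_names if name not in found]
    return report


# Synthetic demo tree: one class only, with the image counts from the logs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, count in (("train", 5), ("val", 4), ("test", 4)):
        class_dir = root / split / "List"
        class_dir.mkdir(parents=True)
        for i in range(count):
            (class_dir / f"{i}.jpg").touch()
    counts = summarize_split(root / "train")
    report = missing_classes(root, ["List", "Not a List"])

print(counts)
print(report)
```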
Edited by Eva Bardou