CZII - CryoET Object Identification #3 Baseline YOLO11 Solution

dongsunseng 2025. 1. 16. 17:14

This post annotates the baseline YOLO11 solution kernel by @SERGIO ALVAREZ.

https://www.kaggle.com/code/sersasj/czii-yolo11-submission-baseline-with-kdtree-update

CZII YOLO11 Submission Baseline with KDTree Update - LB 0.682

Inspired by https://www.kaggle.com/code/itsuki9180/czii-yolo11-submission-baseline (LB: 0.625)
Problem: "It already takes 10 hours with the YOLO model - if I train a 2D UNET and aggregate the results in a similar way to YOLO, would it be possible to fit within the time limit?"
Introduced a KDTree to reduce runtime
- A KD-tree is a space-partitioning data structure that makes nearest-neighbor searches efficient
Also added @minfuka's multiprocessing idea
- https://www.kaggle.com/code/minfuka/czii-yolo11-submission-baseline-speed-up-ver
Runtime dropped to ~6500 seconds with the KDTree and to ~4500 seconds with multiprocessing.
Used synthetic data for training
- Data: https://www.kaggle.com/datasets/sersasj/czii-yolo-l-trained-with-synthetic-data/data
- Code for generating the synthetic data: https://www.kaggle.com/code/sersasj/czii-making-datasets-for-yolo-synthetic-data#CZII:-Creating-Datasets-for-YOLO-with-Additional-Data
  - My annotation of this code: https://dongsunseng.com/entry/CZII-CryoET-Object-Identification-4-Making-synthetic-data-for-Baseline-YOLO11-Solution#google_vignette
Used TS_5_4, TS_69_2, TS_6_4, and TS_6_6 as validation datasets
Used OPTUNA to optimize the following parameters:
- z_distance
- zy_distance
- first_conf
- conf_coef
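
To make the KDTree idea concrete, here is a minimal sketch of how nearby 3D detections could be merged with scipy's cKDTree. It is not the kernel's actual aggregation code: the function name merge_nearby_points, the greedy grouping strategy, and the sample coordinates are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def merge_nearby_points(points, radius):
    """Greedily merge points lying within `radius` of an unmerged seed point.

    Hypothetical helper for illustration; the kernel's own logic may differ.
    """
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    used = np.zeros(len(points), dtype=bool)
    merged = []
    for i in range(len(points)):
        if used[i]:
            continue
        # All points within `radius` of point i (the result includes i itself).
        neighbors = [j for j in tree.query_ball_point(points[i], r=radius)
                     if not used[j]]
        used[neighbors] = True
        # Represent the whole group by its centroid.
        merged.append(points[neighbors].mean(axis=0))
    return np.array(merged)


# Made-up coordinates: three detections of one particle plus one far away.
detections = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [50, 50, 50]]
centers = merge_nearby_points(detections, radius=5.0)
print(len(centers))  # 2
```

The tree answers each radius query without comparing every pair of points, which is where the speed-up over a brute-force distance matrix would come from when there are many detections.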

Score for TS_5_4: 0.658783812957661,{'apo-ferritin': {'total_tp': 42, 'total_fp': 20, 'total_fn': 2, 'fbeta': 0.9321148825065274}, 'beta-galactosidase': {'total_tp': 5, 'total_fp': 27, 'total_fn': 7, 'fbeta': 0.3794642857142858}, 'ribosome': {'total_tp': 20, 'total_fp': 29, 'total_fn': 10, 'fbeta': 0.6427221172022684}, 'thyroglobulin': {'total_tp': 23, 'total_fp': 104, 'total_fn': 7, 'fbeta': 0.6441515650741352}, 'virus-like-particle': {'total_tp': 11, 'total_fp': 2, 'total_fn': 0, 'fbeta': 0.9894179894179894}}

Score for TS_69_2: 0.8191956150699464,{'apo-ferritin': {'total_tp': 35, 'total_fp': 25, 'total_fn': 0, 'fbeta': 0.9596774193548387}, 'beta-galactosidase': {'total_tp': 13, 'total_fp': 46, 'total_fn': 3, 'fbeta': 0.7015873015873016}, 'ribosome': {'total_tp': 35, 'total_fp': 15, 'total_fn': 2, 'fbeta': 0.926791277258567}, 'thyroglobulin': {'total_tp': 28, 'total_fp': 84, 'total_fn': 6, 'fbeta': 0.7256097560975611}, 'virus-like-particle': {'total_tp': 9, 'total_fp': 1, 'total_fn': 0, 'fbeta': 0.9935064935064936}}

Score for TS_6_4: 0.685180923434018,{'apo-ferritin': {'total_tp': 45, 'total_fp': 34, 'total_fn': 12, 'fbeta': 0.7719475277497477}, 'beta-galactosidase': {'total_tp': 7, 'total_fp': 29, 'total_fn': 5, 'fbeta': 0.5219298245614036}, 'ribosome': {'total_tp': 54, 'total_fp': 59, 'total_fn': 12, 'fbeta': 0.7852865697177076}, 'thyroglobulin': {'total_tp': 24, 'total_fp': 77, 'total_fn': 6, 'fbeta': 0.70223752151463}, 'virus-like-particle': {'total_tp': 8, 'total_fp': 4, 'total_fn': 2, 'fbeta': 0.7906976744186046}}

Score for TS_6_6: 0.7575532250952666,{'apo-ferritin': {'total_tp': 37, 'total_fp': 39, 'total_fn': 2, 'fbeta': 0.8985714285714286}, 'beta-galactosidase': {'total_tp': 8, 'total_fp': 43, 'total_fn': 3, 'fbeta': 0.5991189427312775}, 'ribosome': {'total_tp': 17, 'total_fp': 11, 'total_fn': 6, 'fbeta': 0.7297979797979798}, 'thyroglobulin': {'total_tp': 31, 'total_fp': 120, 'total_fn': 4, 'fbeta': 0.7412095639943742}, 'virus-like-particle': {'total_tp': 19, 'total_fp': 2, 'total_fn': 0, 'fbeta': 0.9938461538461538}}
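
The per-class fbeta values above can be reproduced from the tp/fp/fn counts. The sketch below assumes beta = 4 and per-class weights of 1 or 2; neither is stated in this post, but both are consistent with the listed numbers (for example, TS_5_4 comes out at about 0.6588). The names fbeta, WEIGHTS, and weighted_score are chosen here for illustration.

```python
def fbeta(tp, fp, fn, beta=4.0):
    """F-beta from raw counts; beta > 1 weights recall more than precision."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Assumed class weights (not given in the post); they reproduce the scores above.
WEIGHTS = {
    'apo-ferritin': 1,
    'beta-galactosidase': 2,
    'ribosome': 1,
    'thyroglobulin': 2,
    'virus-like-particle': 1,
}


def weighted_score(counts):
    """Weighted mean of the per-class F-beta values."""
    total = sum(
        WEIGHTS[name] * fbeta(c['total_tp'], c['total_fp'], c['total_fn'])
        for name, c in counts.items()
    )
    return total / sum(WEIGHTS[name] for name in counts)


# Counts copied from the TS_5_4 line above.
ts_5_4 = {
    'apo-ferritin': {'total_tp': 42, 'total_fp': 20, 'total_fn': 2},
    'beta-galactosidase': {'total_tp': 5, 'total_fp': 27, 'total_fn': 7},
    'ribosome': {'total_tp': 20, 'total_fp': 29, 'total_fn': 10},
    'thyroglobulin': {'total_tp': 23, 'total_fp': 104, 'total_fn': 7},
    'virus-like-particle': {'total_tp': 11, 'total_fp': 2, 'total_fn': 0},
}

print(round(fbeta(42, 20, 2), 4))         # 0.9321
print(round(weighted_score(ts_5_4), 4))   # 0.6588
```

With beta = 4, a missed particle costs far more than a false positive, which is why thyroglobulin still scores around 0.64 to 0.74 despite having roughly 80 to 120 false positives per run.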

1) Ultralytics setup for the offline environment (external kernel linked to the main submission kernel)

Ultralytics is an open-source package for implementing and training YOLO (You Only Look Once) object detection models

https://www.kaggle.com/code/itsuki9180/ultralytics-for-offline-install

!pip download -d ./packages ultralytics
!tar cfvz archive.tar.gz ./packages

!pip download -d ./packages ultralytics
- -d ./packages: Specifies the download location as ./packages directory
- Downloads the package and all its dependencies
- Only downloads the distribution files (mostly wheels); nothing is installed
!tar cfvz archive.tar.gz ./packages
- Packs the downloaded packages into a gzip-compressed tar archive
- c: Create a new archive
- f: Specify filename
- v: Verbose (detailed output)
- z: Use gzip compression
- archive.tar.gz: Name of the compressed file to be created
- ./packages: Directory to be compressed
This is done because internet access is restricted in the competition environment
All necessary packages are downloaded and compressed in advance so they can be installed later in an offline environment
What are wheel files?
- A built distribution format for Python packages that can be installed without a build step
- May include compiled extension code, along with metadata and dependency information
- Has the .whl extension

!tar xfvz archive.tar.gz
!pip install --no-index --find-links=./packages ultralytics
!rm -rf ./packages

!tar xfvz archive.tar.gz
- x: Extract
- f: Specify filename
- v: Verbose (detailed output)
- z: Decompress with gzip
- Extracts archive.tar.gz to create ./packages directory
!pip install --no-index --find-links=./packages ultralytics
- --no-index: Ignore the package index (PyPI).
  - This means no packages are downloaded from the internet
- --find-links=./packages: Specify local directory to find packages
- Install ultralytics using locally downloaded wheel files
!rm -rf ./packages
- Delete the temporarily used packages directory after installation
- -r: Delete recursively (including all files in directory)
- -f: Force delete (without confirmation messages)
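
For readers who prefer Python to shell flags, the pack/unpack round trip can be sketched with the standard tarfile module. This is only an illustration of what the tar commands do, not part of either kernel; the helper names pack, unpack, and roundtrip_demo and the placeholder wheel filename are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def pack(src_dir, archive_path):
    """Roughly `tar cfz archive_path src_dir`: create a gzip-compressed archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)


def unpack(archive_path, dest_dir):
    """Roughly `tar xfz archive_path`, extracting into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def roundtrip_demo():
    """Pack a fake ./packages directory, unpack it elsewhere, list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        packages = tmp / "packages"
        packages.mkdir()
        # Placeholder standing in for a downloaded wheel (hypothetical name).
        (packages / "demo-0.1-py3-none-any.whl").write_bytes(b"not a real wheel")
        archive = tmp / "archive.tar.gz"
        pack(packages, archive)
        restored = tmp / "restored"
        restored.mkdir()
        unpack(archive, restored)
        return sorted(p.name for p in (restored / "packages").iterdir())


print(roundtrip_demo())  # ['demo-0.1-py3-none-any.whl']
```

The "w:gz" and "r:gz" modes play the role of the c+z and x+z flag combinations; the v (verbose) flag has no counterpart here because nothing is printed during archiving.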

2) Dependencies (Back to the kernel)

# Installing Ultralytics
!tar xfvz /kaggle/input/ultralytics-for-offline-install/archive.tar.gz
!pip install --no-index --find-links=./packages ultralytics
!rm -rf ./packages

# Installing Zarr package
!cp -r '/kaggle/input/hengck-czii-cryo-et-01/wheel_file' '/kaggle/working/'
!pip install /kaggle/working/wheel_file/asciitree-0.3.3/asciitree-0.3.3
!pip install --no-index --find-links=/kaggle/working/wheel_file zarr

Copy the wheel files from another Kaggle dataset to the working directory
Install asciitree first (a dependency of zarr)
Then install the zarr package
The reasons for this approach:
1. Kaggle notebooks have restricted internet access
2. Required packages are pre-uploaded as datasets
3. This enables package installation in offline environments
Specifically, zarr is a package for efficient storage and processing of large array data; here it is most likely used to handle the 3D image data.

import os
import glob
import time
import sys
import warnings
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import torch
from tqdm import tqdm
from ultralytics import YOLO
import zarr
from scipy.spatial import cKDTree
from collections import defaultdict

3) Loading model + Configuration

model_path = '/kaggle/input/czii-yolo-l-trained-with-synthetic-data/best_synthetic.pt'
model = YOLO(model_path)

Points model_path at the 'best_synthetic.pt' weights file
Then uses the YOLO class from Ultralytics to load the model

# Processing experiment data paths
runs_path = '/kaggle/input/czii-cryo-et-object-identification/test/static/ExperimentRuns/*'
runs = sorted(glob.glob(runs_path))
runs = [os.path.basename(run) for run in runs]

# Data Splitting
sp = len(runs)//2
runs1 = runs[:sp]
runs1[:5]

#add by @minfuka
runs2 = runs[sp:]
runs2[:5]

#add by @minfuka - GPU Checking
assert torch.cuda.device_count() == 2

Processing experiment data paths:
- Gets experiment data paths from test dataset
- Uses glob.glob to get all experiment folders
- Uses os.path.basename to extract only folder names from paths
Data splitting:
- Divides the experiments into two halves (runs1 and runs2)
- Prepares for parallel processing, with each half handled separately
GPU checking:
- Verifies that exactly 2 GPUs are available
- The assert statement raises an AssertionError otherwise
- This appears intended for multi-GPU processing, with each GPU handling half of the data
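
As a rough sketch of where this split is heading, the snippet below halves a list the same way and hands each half to a worker tagged with a device name. How the kernel actually dispatches work is not shown in this part of the post, so process_runs, run_on_two_devices, the use of ThreadPoolExecutor, and the run names are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_in_half(items):
    """Mirror the kernel's split; the second half gets the extra item if the count is odd."""
    sp = len(items) // 2
    return items[:sp], items[sp:]


def process_runs(run_names, device):
    # Hypothetical stand-in for per-GPU inference; it only tags each run with its device.
    return [(name, device) for name in run_names]


def run_on_two_devices(run_names, devices=("cuda:0", "cuda:1")):
    """Process both halves concurrently, one worker per device."""
    halves = split_in_half(run_names)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(process_runs, half, dev)
                   for half, dev in zip(halves, devices)]
        results = [f.result() for f in futures]
    return results[0] + results[1]


# Made-up run names, used only to show the split.
print(run_on_two_devices(["run_a", "run_b", "run_c"]))
# [('run_a', 'cuda:0'), ('run_b', 'cuda:1'), ('run_c', 'cuda:1')]
```

Because of the integer division, an odd number of runs leaves the second worker with one more run than the first.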

particle_names = [
    'apo-ferritin',
    'beta-amylase',
    'beta-galactosidase',
    'ribosome',
    'thyroglobulin',
    'virus-like-particle'
]

particle_to_index = {
    'apo-ferritin': 0,
    'beta-amylase': 1,
    'beta-galactosidase': 2,
    'ribosome': 3,
    'thyroglobulin': 4,
    'virus-like-particle': 5
}

index_to_particle = {index: name for name, index in particle_to_index.items()}

particle_radius = {
    'apo-ferritin': 60,
    'beta-amylase': 65,
    'beta-galactosidase': 90,
    'ribosome': 150,
    'thyroglobulin': 130,
    'virus-like-particle': 135,
}
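
The index mapping above follows the order of particle_names, so it could equally be derived with enumerate rather than written out by hand. The sketch below shows that, plus a small lookup from a predicted class index to its name and radius; the helper describe_class is hypothetical and not part of the kernel.

```python
particle_names = [
    'apo-ferritin',
    'beta-amylase',
    'beta-galactosidase',
    'ribosome',
    'thyroglobulin',
    'virus-like-particle',
]

# Same mapping as the hand-written dict, derived from the list order.
particle_to_index = {name: i for i, name in enumerate(particle_names)}
index_to_particle = {i: name for name, i in particle_to_index.items()}

# Radii copied from the configuration above.
particle_radius = {
    'apo-ferritin': 60,
    'beta-amylase': 65,
    'beta-galactosidase': 90,
    'ribosome': 150,
    'thyroglobulin': 130,
    'virus-like-particle': 135,
}


def describe_class(class_id):
    """Hypothetical helper: map a class index to (name, radius)."""
    name = index_to_particle[class_id]
    return name, particle_radius[name]


print(describe_class(3))  # ('ribosome', 150)
```

Deriving the dict from the list keeps the two from drifting apart if a class is ever added or reordered.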

4) Helper functions

I think that's the single best piece of advice: constantly think about how you could be doing things better and questioning yourself.
- Elon Musk -

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
