
CZII - CryoET Object Identification #3 Baseline YOLO11 Solution

dongsunseng 2025. 1. 16. 17:14

This post annotates the baseline YOLO11 solution kernel by @SERGIO ALVAREZ.

https://www.kaggle.com/code/sersasj/czii-yolo11-submission-baseline-with-kdtree-update

 


CZII YOLO11 Submission Baseline with KDTree Update - LB 0.682

Score for TS_5_4: 0.658783812957661,{'apo-ferritin': {'total_tp': 42, 'total_fp': 20, 'total_fn': 2, 'fbeta': 0.9321148825065274}, 'beta-galactosidase': {'total_tp': 5, 'total_fp': 27, 'total_fn': 7, 'fbeta': 0.3794642857142858}, 'ribosome': {'total_tp': 20, 'total_fp': 29, 'total_fn': 10, 'fbeta': 0.6427221172022684}, 'thyroglobulin': {'total_tp': 23, 'total_fp': 104, 'total_fn': 7, 'fbeta': 0.6441515650741352}, 'virus-like-particle': {'total_tp': 11, 'total_fp': 2, 'total_fn': 0, 'fbeta': 0.9894179894179894}}

Score for TS_69_2: 0.8191956150699464,{'apo-ferritin': {'total_tp': 35, 'total_fp': 25, 'total_fn': 0, 'fbeta': 0.9596774193548387}, 'beta-galactosidase': {'total_tp': 13, 'total_fp': 46, 'total_fn': 3, 'fbeta': 0.7015873015873016}, 'ribosome': {'total_tp': 35, 'total_fp': 15, 'total_fn': 2, 'fbeta': 0.926791277258567}, 'thyroglobulin': {'total_tp': 28, 'total_fp': 84, 'total_fn': 6, 'fbeta': 0.7256097560975611}, 'virus-like-particle': {'total_tp': 9, 'total_fp': 1, 'total_fn': 0, 'fbeta': 0.9935064935064936}}

Score for TS_6_4: 0.685180923434018,{'apo-ferritin': {'total_tp': 45, 'total_fp': 34, 'total_fn': 12, 'fbeta': 0.7719475277497477}, 'beta-galactosidase': {'total_tp': 7, 'total_fp': 29, 'total_fn': 5, 'fbeta': 0.5219298245614036}, 'ribosome': {'total_tp': 54, 'total_fp': 59, 'total_fn': 12, 'fbeta': 0.7852865697177076}, 'thyroglobulin': {'total_tp': 24, 'total_fp': 77, 'total_fn': 6, 'fbeta': 0.70223752151463}, 'virus-like-particle': {'total_tp': 8, 'total_fp': 4, 'total_fn': 2, 'fbeta': 0.7906976744186046}}

Score for TS_6_6: 0.7575532250952666,{'apo-ferritin': {'total_tp': 37, 'total_fp': 39, 'total_fn': 2, 'fbeta': 0.8985714285714286}, 'beta-galactosidase': {'total_tp': 8, 'total_fp': 43, 'total_fn': 3, 'fbeta': 0.5991189427312775}, 'ribosome': {'total_tp': 17, 'total_fp': 11, 'total_fn': 6, 'fbeta': 0.7297979797979798}, 'thyroglobulin': {'total_tp': 31, 'total_fp': 120, 'total_fn': 4, 'fbeta': 0.7412095639943742}, 'virus-like-particle': {'total_tp': 19, 'total_fp': 2, 'total_fn': 0, 'fbeta': 0.9938461538461538}}

1) Ultralytics setup for the offline environment (external kernel linked to the main submission kernel)

Ultralytics is an open-source package for training and running YOLO (You Only Look Once) object detection models.

!pip download -d ./packages ultralytics
!tar cfvz archive.tar.gz ./packages
  • !pip download -d ./packages ultralytics
    • -d ./packages: Sets ./packages as the download directory
    • Downloads the package together with all of its dependencies
    • Only downloads the distribution files (mostly wheels); nothing is installed
  • !tar cfvz archive.tar.gz ./packages
    • Packs the downloaded packages into a gzip-compressed tar archive
    • c: Create a new archive
    • f: Use the archive file named next on the command line
    • v: Verbose (list each file as it is processed)
    • z: Use gzip compression
    • archive.tar.gz: Name of the archive to create
    • ./packages: Directory to archive
  • This is done because internet access is restricted in the competition environment
  • All required packages are downloaded and archived in advance so they can be installed later in the offline environment
  • What is a wheel file?
    • A built-distribution format for Python packages that installs without a build step
    • Contains the package code (including any compiled extensions), metadata, and dependency information
    • Has the .whl extension
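A wheel's filename also records what the file was built for: distribution name, version, an optional build tag, then Python, ABI, and platform tags. The sketch below splits a filename into those fields. The helper name parse_wheel_filename and the example filename are made up for illustration, and the parsing is deliberately minimal rather than a full validator.

```python
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename into its dash-separated fields.

    Layout: {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelInfo(name, version, build, py, abi, plat)


# Hypothetical filename, used only to show the layout.
info = parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl")
print(info)
```

A py3-none-any wheel like the one above is pure Python and installs on any platform, while wheels with compiled extensions carry interpreter- and platform-specific tags, so the files downloaded in advance have to match the environment where they will be installed.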
!tar xfvz archive.tar.gz
!pip install --no-index --find-links=./packages ultralytics
!rm -rf ./packages
  • !tar xfvz archive.tar.gz
    • x: Extract
    • f: Specify filename
    • v: Verbose (detailed output)
    • z: Decompress with gzip
    • Extracts archive.tar.gz, recreating the ./packages directory
  • !pip install --no-index --find-links=./packages ultralytics
    • --no-index: Ignore the package index (PyPI)
      • Nothing is downloaded from the internet
    • --find-links=./packages: Look for packages in this local directory instead
    • Installs ultralytics from the locally downloaded wheel files
  • !rm -rf ./packages
    • Delete the temporarily used packages directory after installation
    • -r: Delete recursively (including all files in directory)
    • -f: Force delete (without confirmation messages)
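For readers less familiar with tar flags, the same pack-and-unpack round trip can be sketched with Python's standard tarfile module. This is only an illustration of what c/x and z do, not what the kernel runs. The directory layout and the placeholder file name are assumptions.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the ./packages directory produced by `pip download`.
    packages = os.path.join(workdir, "packages")
    os.makedirs(packages)
    placeholder = os.path.join(packages, "example_pkg-1.0.0-py3-none-any.whl")
    with open(placeholder, "wb") as f:
        f.write(b"placeholder")

    # Rough equivalent of `tar cfz archive.tar.gz ./packages`.
    archive = os.path.join(workdir, "archive.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(packages, arcname="packages")

    # Rough equivalent of `tar xfz archive.tar.gz`, run in another directory.
    target = os.path.join(workdir, "extracted")
    os.makedirs(target)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)

    restored = sorted(os.listdir(os.path.join(target, "packages")))
    print(restored)
```

The "w:gz" and "r:gz" modes play the role of the c and x flags combined with z. There is no counterpart to v here, since nothing is listed unless you print it yourself.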

2) Dependencies (Back to the kernel)

# Installing Ultralytics
!tar xfvz /kaggle/input/ultralytics-for-offline-install/archive.tar.gz
!pip install --no-index --find-links=./packages ultralytics
!rm -rf ./packages

# Installing Zarr package
!cp -r '/kaggle/input/hengck-czii-cryo-et-01/wheel_file' '/kaggle/working/'
!pip install /kaggle/working/wheel_file/asciitree-0.3.3/asciitree-0.3.3
!pip install --no-index --find-links=/kaggle/working/wheel_file zarr
  • Copies the wheel files from another Kaggle dataset into the working directory
  • Installs asciitree first (a dependency of zarr)
  • Installs the zarr package from the local wheel files
  • The reasons for this approach:
    1. Internet access is restricted in the competition environment
    2. The required packages are pre-uploaded as Kaggle datasets
    3. This enables package installation in an offline environment
  • zarr is a package for efficient storage and processing of large array data; it will likely be used in this competition to handle the 3D image data
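The offline install pattern is the same for every package, so it can be wrapped in a small helper. The sketch below only builds the command. The function name offline_install_command is hypothetical, and the kernel itself simply calls pip through notebook ! commands.

```python
import sys
from typing import List


def offline_install_command(package: str, wheel_dir: str) -> List[str]:
    """Build a pip command that resolves `package` only from `wheel_dir`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                 # never contact a package index
        f"--find-links={wheel_dir}",  # look for distributions here instead
        package,
    ]


cmd = offline_install_command("zarr", "/kaggle/working/wheel_file")
print(" ".join(cmd[1:]))
# To actually run it: subprocess.run(cmd, check=True)
```

Using sys.executable -m pip ties the install to the interpreter that is running the notebook, which avoids surprises when several Python installations are present.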
import os
import glob
import time
import sys
import warnings
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import torch
from tqdm import tqdm
from ultralytics import YOLO
import zarr
from scipy.spatial import cKDTree
from collections import defaultdict

3) Loading model + Configuration

model_path = '/kaggle/input/czii-yolo-l-trained-with-synthetic-data/best_synthetic.pt'
model = YOLO(model_path)
  • model_path points to the trained weights file best_synthetic.pt
  • The YOLO class from Ultralytics loads the model from that file
# Processing experiment data paths
runs_path = '/kaggle/input/czii-cryo-et-object-identification/test/static/ExperimentRuns/*'
runs = sorted(glob.glob(runs_path))
runs = [os.path.basename(run) for run in runs]

# Data Splitting
sp = len(runs)//2
runs1 = runs[:sp]
runs1[:5]

#add by @minfuka
runs2 = runs[sp:]
runs2[:5]

#add by @minfuka - GPU Checking
assert torch.cuda.device_count() == 2
  • Processing experiment data paths:
    • Builds a path pattern for the experiment runs in the test dataset
    • Uses glob.glob to collect all experiment folders, sorted for a stable order
    • Uses os.path.basename to keep only the folder names
  • Data splitting:
    • Divides the experiments into two groups at the midpoint sp = len(runs)//2
    • With an odd number of runs, runs2 gets the extra one
    • Appears to be preparation for parallel processing
  • GPU checking:
    • Asserts that exactly 2 GPUs are available; the kernel stops with an AssertionError otherwise
    • This appears to be preparation for multi-GPU processing, with each GPU handling one half of the runs
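The discovery and split steps above can be tried without the competition data. The sketch below recreates them on a temporary directory. The run names, the helper names, and the mapping of halves to cuda:0 and cuda:1 are assumptions for illustration, since this part of the kernel does not yet show how the halves are dispatched.

```python
import glob
import os
import tempfile
from typing import List, Tuple


def list_runs(pattern: str) -> List[str]:
    """Return the folder names matching a glob pattern, in sorted path order."""
    return [os.path.basename(path) for path in sorted(glob.glob(pattern))]


def split_in_half(runs: List[str]) -> Tuple[List[str], List[str]]:
    """Split at the midpoint; the second half gets the extra item if odd."""
    sp = len(runs) // 2
    return runs[:sp], runs[sp:]


with tempfile.TemporaryDirectory() as root:
    for name in ["TS_c", "TS_a", "TS_b"]:  # hypothetical run names
        os.makedirs(os.path.join(root, name))
    runs = list_runs(os.path.join(root, "*"))

runs1, runs2 = split_in_half(runs)

# Assumed mapping: one half of the runs per GPU.
assignments = {"cuda:0": runs1, "cuda:1": runs2}
print(assignments)
```

Sorting before splitting matters because glob.glob returns paths in arbitrary order, and an unsorted list could give a different split on each run.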
particle_names = [
    'apo-ferritin',
    'beta-amylase',
    'beta-galactosidase',
    'ribosome',
    'thyroglobulin',
    'virus-like-particle'
]

particle_to_index = {
    'apo-ferritin': 0,
    'beta-amylase': 1,
    'beta-galactosidase': 2,
    'ribosome': 3,
    'thyroglobulin': 4,
    'virus-like-particle': 5
}

index_to_particle = {index: name for name, index in particle_to_index.items()}

particle_radius = {
    'apo-ferritin': 60,
    'beta-amylase': 65,
    'beta-galactosidase': 90,
    'ribosome': 150,
    'thyroglobulin': 130,
    'virus-like-particle': 135,
}

4) Helper functions

 

