2D cell tracking with Ultrack and Stardist¶
The goal of this document is to present Ultrack’s programming interface. We use Stardist for segmentation because its available weights are reliable for 2D nuclei segmentation. However, any algorithm other could be used or even multiple segmentation masks are allowed.
The tracking procedure can be roughly divided into two stages: - Obtaining cell detections and their boundaries (e.g. edges); - Compute tracking from detection and edges.
First, we download demostration dataset from cell-tracking challenge.
[1]:
!wget -nc http://data.celltrackingchallenge.net/training-datasets/Fluo-N2DL-HeLa.zip
!unzip -n Fluo-N2DL-HeLa.zip
File ‘Fluo-N2DL-HeLa.zip’ already there; not retrieving.
Archive: Fluo-N2DL-HeLa.zip
We set some environment variables and import the required python packages.
[2]:
# stardist / tensorflow env variables setup
import os
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from pathlib import Path
import napari
import numpy as np
from napari.utils.notebook_display import nbscreenshot
from tqdm import tqdm
from rich.pretty import pprint
from stardist.models import StarDist2D
from ultrack import track, to_tracks_layer, tracks_to_zarr
from ultrack.imgproc import normalize
from ultrack.utils import estimate_parameters_from_labels, labels_to_contours
from ultrack.utils.array import array_apply
from ultrack.config import MainConfig
2023-09-14 15:38:40.943181: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1. Computing cell detection and boundaries¶
We set the download dataset path, open a napari viewer and open the images using napari’s default reader.
[3]:
dataset_path = Path("Fluo-N2DL-HeLa/01")
viewer = napari.Viewer()
viewer.window.resize(1800, 1000)
viewer.open(sorted(dataset_path.glob("*.tif")), stack=True)
image = viewer.layers[0].data
nbscreenshot(viewer)
[3]:
Using the predict
helper function we compute the Stardist results.
[4]:
%%capture
model = StarDist2D.from_pretrained("2D_versatile_fluo")
stardist_labels = np.zeros_like(image, dtype=np.int32)
def predict(frame: np.ndarray, model: StarDist2D) -> np.ndarray:
"""Normalizes and computes stardist prediction."""
frame = normalize(frame, gamma=1.0)
labels, _ = model.predict_instances_big(
frame, "YX", block_size=560, min_overlap=96, show_progress=False,
)
return labels
array_apply(
image,
out_array=stardist_labels,
func=predict,
model=model,
)
viewer.add_labels(stardist_labels, name="stardist")
2023-09-14 15:38:47.167529: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:47.167844: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:47.167933: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:47.168164: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-14 15:38:47.168562: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:47.168656: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:47.168738: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:48.214171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:48.214293: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:48.214384: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-14 15:38:48.214468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20508 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6
2023-09-14 15:38:49.480967: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:428] Loaded cuDNN version 8401
[5]:
nbscreenshot(viewer)
[5]:
The labels_to_contours
converts labels into Ultrack’s expected input, a detection and edges maps (cells’ boundaries). The sigma
parameter blurs the cell boundaries, assisting the segmentation hypothesis estimation, the goal is to make the boundaries similar to a distance transform inside the cells.
[6]:
detection, edges = labels_to_contours(stardist_labels, sigma=4.0) # multiple labels can be used with [labels_0, labels_1, ...]
Converting labels to edges: 100%|█████████████████████████████████████████████████| 92/92 [00:00<00:00, 92.67it/s]
[7]:
viewer.add_image(detection, visible=False)
viewer.add_image(edges, blending="additive", colormap="magma")
nbscreenshot(viewer)
[7]:
2. Tracking¶
Now that we have our detection
and edges
you can call the track
function for the tracking on the contour representation of the cells.
The track
function is composed of three steps that can also be called individually: - segment
: Computes the segmentation hypothesis for tracking; - link
: Links and assign edge weights to the segmentation hypothesis; - solve
: Solves the tracking problem by selecting the strongly connected segmentation hypothesis.
Each of these steps requires its own configuration, which we’ll set up below. Its documentation can be found here.
We create our configuration instance and print its default values.
[8]:
config = MainConfig()
pprint(config)
MainConfig( │ data_config=DataConfig(working_dir=PosixPath('.'), database='sqlite', address=None, n_workers=1), │ segmentation_config=SegmentationConfig( │ │ threshold=0.5, │ │ min_area=100, │ │ max_area=1000000, │ │ min_frontier=0.0, │ │ anisotropy_penalization=0.0, │ │ max_noise=0.0, │ │ ws_hierarchy=<function watershed_hierarchy_by_area at 0x7f135322aa70>, │ │ n_workers=1 │ ), │ linking_config=LinkingConfig( │ │ n_workers=1, │ │ max_neighbors=5, │ │ max_distance=15.0, │ │ distance_weight=0.0, │ │ z_score_threshold=5.0 │ ), │ tracking_config=TrackingConfig( │ │ appear_weight=-0.001, │ │ disappear_weight=-0.001, │ │ division_weight=-0.001, │ │ dismiss_weight_guess=None, │ │ include_weight_guess=None, │ │ window_size=None, │ │ overlap_size=1, │ │ solution_gap=0.001, │ │ time_limit=36000, │ │ method=0, │ │ n_threads=-1, │ │ link_function='power', │ │ power=4, │ │ bias=-0.0 │ ) )
To assist setting the parameters we inspect Stardist’s results using the function estimate_params_from_labels
from ultrack.utils
. The min_area
was selected to eliminate a few small segments which could be noise or incorrect segmentations. For the max_area
we also avoid the right tail of the distribution because it could also be outliers.
[9]:
params_df = estimate_parameters_from_labels(stardist_labels, is_timelapse=True)
params_df["area"].plot(kind="hist", bins=100, title="Area histogram")
Estimating params.: 100%|█████████████████████████████████████████████████████████| 92/92 [00:01<00:00, 74.68it/s]
[9]:
<Axes: title={'center': 'Area histogram'}, ylabel='Frequency'>
[10]:
config.segmentation_config.min_area = 50
config.segmentation_config.max_area = 950
config.segmentation_config.n_workers = 8
The remaining parameters are harder to estimate without ground-truth data, hence they were tuned by trial and error. From our experience setting the power
parameter to 3 or 4 yields better results, specially in challenging scenarios. Note that, you must adjust the other *_weight
accordingly when power
is updated.
[11]:
config.linking_config.max_distance = 25
config.linking_config.n_workers = 8
config.tracking_config.appear_weight = -1
config.tracking_config.disappear_weight = -1
config.tracking_config.division_weight = -0.1
config.tracking_config.power = 4
config.tracking_config.bias = -0.001
config.tracking_config.solution_gap = 0.0
pprint(config)
MainConfig( │ data_config=DataConfig(working_dir=PosixPath('.'), database='sqlite', address=None, n_workers=1), │ segmentation_config=SegmentationConfig( │ │ threshold=0.5, │ │ min_area=50, │ │ max_area=950, │ │ min_frontier=0.0, │ │ anisotropy_penalization=0.0, │ │ max_noise=0.0, │ │ ws_hierarchy=<function watershed_hierarchy_by_area at 0x7f135322aa70>, │ │ n_workers=8 │ ), │ linking_config=LinkingConfig( │ │ n_workers=8, │ │ max_neighbors=5, │ │ max_distance=25, │ │ distance_weight=0.0, │ │ z_score_threshold=5.0 │ ), │ tracking_config=TrackingConfig( │ │ appear_weight=-1, │ │ disappear_weight=-1, │ │ division_weight=-0.1, │ │ dismiss_weight_guess=None, │ │ include_weight_guess=None, │ │ window_size=None, │ │ overlap_size=1, │ │ solution_gap=0.0, │ │ time_limit=36000, │ │ method=0, │ │ n_threads=-1, │ │ link_function='power', │ │ power=4, │ │ bias=-0.001 │ ) )
Now, we only need to execute the track
functions with the configuration we just created and the input images.
[12]:
track(
detection=detection,
edges=edges,
config=config,
overwrite=True,
)
WARNING:ultrack.core.segmentation.processing:Found zarr with MemoryStore. Using an zarr with MemoryStore can lead to considerable memory usage.
WARNING:ultrack.core.segmentation.processing:Found zarr with MemoryStore. Using an zarr with MemoryStore can lead to considerable memory usage.
Adding nodes to database: 100%|███████████████████████████████████████████████████| 92/92 [00:15<00:00, 6.08it/s]
Linking nodes.: 100%|█████████████████████████████████████████████████████████████| 91/91 [00:05<00:00, 16.83it/s]
Academic license - for non-commercial use only - expires 2024-08-06
Using GRB solver
Solving ILP batch 0
Constructing ILP ...
Solving ILP ...
Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (linux64)
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 28432 rows, 48896 columns and 90543 nonzeros
Model fingerprint: 0x99b71700
Variable types: 0 continuous, 48896 integer (48896 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [9e-07, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective -0.0000000
Presolve removed 28430 rows and 48892 columns
Presolve time: 0.68s
Presolved: 2 rows, 4 columns, 6 nonzeros
Found heuristic solution: objective 3011.6957207
Variable types: 0 continuous, 4 integer (4 binary)
Explored 0 nodes (0 simplex iterations) in 0.69 seconds
Thread count was 16 (of 16 available processors)
Solution count 2: 3011.7 -0
Optimal solution found (tolerance 0.00e+00)
Best objective 3.011695720683e+03, best bound 3.011695720683e+03, gap 0.0000%
Saving solution ...
Done!
The to_tracks_layer
and tracks_to_zarr
export the solution into a napari compatible format.
[13]:
tracks_df, graph = to_tracks_layer(config)
labels = tracks_to_zarr(config, tracks_df)
Exporting segmentation masks: 100%|██████████████████████████████████████████████| 92/92 [00:00<00:00, 144.68it/s]
[14]:
viewer.add_tracks(tracks_df[["track_id", "t", "y", "x"]].values, graph=graph)
viewer.add_labels(labels)
viewer.layers["stardist"].visible = False
viewer.layers["edges"].visible = False
nbscreenshot(viewer)
[14]: