Tile sizing

`blockbuster.auto_tile_shape(shape: tuple[int, ...], dtype: Any, target_bytes: int = 64 * 1024 ** 2, use_gpu: bool = False, gpu_memory: int | None = None, available_memory: int | None = None, n_workers: int | None = None, verbose: bool = False) -> tuple[int, ...]`

Balanced tile shape for general-purpose 3-D processing.

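Sizes the spatial axes (the last three, or all of them for arrays with fewer dimensions) to stay within the memory budget while keeping the tile as close to cubic as possible. Axes shorter than the ideal side are kept whole, and the budget they free up is shared among the remaining spatial axes. Leading axes (t, c) are always set to 1.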
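As a rough illustration of what "as cubic as possible" means, the sketch below derives a spatial tile from a fixed voxel budget: it starts from the n-th root of the budget, clamps each axis to the array extent, then shares the leftover budget among the axes that were not clamped. `balanced_spatial` is a hypothetical stand-alone helper written for this example, not part of the `blockbuster` API, and it takes the voxel budget directly rather than querying memory.

```python
def balanced_spatial(spatial, target_voxels):
    """Return a near-cubic tile for `spatial` holding at most ~target_voxels voxels."""
    n = len(spatial)
    # First guess: a perfect cube (or square) whose volume matches the budget.
    side = int(target_voxels ** (1.0 / n))
    tile = [min(s, side) for s in spatial]

    # Axes shorter than the ideal side are "capped" at their full extent,
    # which leaves unused budget for the remaining axes.
    capped = [i for i in range(n) if tile[i] == spatial[i]]
    uncapped = [i for i in range(n) if i not in capped]
    if uncapped:
        used = 1
        for i in capped:
            used *= tile[i]
        remaining = target_voxels / max(1, used)
        new_side = int(remaining ** (1.0 / len(uncapped)))
        for i in uncapped:
            tile[i] = min(spatial[i], new_side)
    return tile


# 64 MiB of uint16 voxels (2 bytes each) is 2**25 voxels.
print(balanced_spatial((128, 2048, 2048), 64 * 1024**2 // 2))  # [128, 512, 512]
```

Here z (128) is shorter than the ideal cube side of 322, so it is kept whole and the freed budget lets y and x grow to 512.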

Parameters:

Name	Type	Description	Default
`shape`	`tuple[int, ...]`	Full array shape, e.g. `(z, y, x)` or `(t, c, z, y, x)`.	required
`dtype`	`Any`	Array dtype.	required
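`target_bytes`	`int`	Target memory ceiling per tile in bytes. Default 64 MiB. Doubled when `use_gpu=True`, further limited by the available memory; the effective budget is never below 32 MiB.	`64 * 1024 ** 2`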
`use_gpu`	`bool`	Size tiles against GPU VRAM rather than host RAM.	`False`
`gpu_memory`	`int \| None`	Available GPU VRAM in bytes; auto-queried when None.	`None`
`available_memory`	`int \| None`	Available host RAM in bytes; auto-queried when None.	`None`
`n_workers`	`int \| None`	Number of parallel workers (divides the RAM budget). Defaults to `os.cpu_count()` when None; ignored when `use_gpu=True`.	`None`
`verbose`	`bool`	Log the chosen shape and estimated tile size.	`False`
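
How these memory parameters combine into a single per-tile budget can be sketched as follows. The arithmetic follows the source listing for this function; `effective_budget` is a hypothetical helper for illustration only, and the memory figures are passed in explicitly instead of being auto-queried.

```python
MIB = 1024**2
GIB = 1024**3


def effective_budget(target_bytes, mem, n_workers=1, use_gpu=False):
    """Per-tile byte budget derived from the target and the available memory."""
    if use_gpu:
        # GPU: allow twice the target, but never more than half of VRAM.
        budget = min(target_bytes * 2, mem // 2)
    else:
        # CPU: each worker may use a quarter of its share of host RAM.
        budget = min(target_bytes, mem // (n_workers * 4))
    # The budget is never allowed to drop below 32 MiB.
    return max(32 * MIB, budget)


print(effective_budget(64 * MIB, 16 * GIB, n_workers=8) // MIB)   # 64
print(effective_budget(64 * MIB, 1 * GIB, n_workers=16) // MIB)   # 32 (floor applies)
print(effective_budget(64 * MIB, 8 * GIB, use_gpu=True) // MIB)   # 128
```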

Returns:

Type	Description
`tuple[int, ...]`	Tile shape with the same number of dimensions as shape.

Examples:

>>> tile = auto_tile_shape((128, 2048, 2048), "uint16", available_memory=16 * 1024**3, n_workers=1)
>>> tile
(128, 512, 512)
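
Once a tile shape is known, the number of tiles along each axis follows from ceiling division of the array shape by the tile shape. The snippet below is a small sketch of that bookkeeping; `tile_grid` is a hypothetical helper, and the 5-D shape and tile are made-up values chosen only to show that leading axes of size 1 yield one tile per time point and channel.

```python
import math


def tile_grid(shape, tile):
    """Number of tiles needed along each axis (the last tile may be partial)."""
    return tuple(math.ceil(s / t) for s, t in zip(shape, tile))


shape = (10, 2, 128, 2048, 2048)   # (t, c, z, y, x), illustrative only
tile = (1, 1, 128, 512, 512)       # leading axes are 1, spatial axes are tiled
grid = tile_grid(shape, tile)
print(grid, math.prod(grid))       # (10, 2, 1, 4, 4) 320
```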

Source code in src/blockbuster/_chunks.py

def auto_tile_shape(
    shape: tuple[int, ...],
    dtype: Any,
    target_bytes: int = 64 * 1024**2,
    use_gpu: bool = False,
    gpu_memory: int | None = None,
    available_memory: int | None = None,
    n_workers: int | None = None,
    verbose: bool = False,
) -> tuple[int, ...]:
    """Balanced tile shape for general-purpose 3-D processing.

    Sizes the last three axes (spatial) to stay within the memory budget while
    keeping the shape as cubic as possible. Leading axes (t, c) are always 1.

    Parameters
    ----------
    shape:
        Full array shape, e.g. ``(z, y, x)`` or ``(t, c, z, y, x)``.
    dtype:
        Array dtype.
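    target_bytes:
        Target memory ceiling per tile in bytes. Default 64 MiB. Doubled when
        ``use_gpu=True``, further limited by the available memory; the
        effective budget is never below 32 MiB.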
    use_gpu:
        Size tiles against GPU VRAM rather than host RAM.
    gpu_memory:
        Available GPU VRAM in bytes; auto-queried when None.
    available_memory:
        Available host RAM in bytes; auto-queried when None.
    n_workers:
        Number of parallel workers (divides the RAM budget). Defaults to
        ``os.cpu_count()`` when None; ignored when ``use_gpu=True``.
    verbose:
        Log the chosen shape and estimated tile size.

    Returns
    -------
    tuple[int, ...]
        Tile shape with the same number of dimensions as *shape*.

    Examples
    --------
    >>> tile = auto_tile_shape(
    ...     (128, 2048, 2048), "uint16", available_memory=16 * 1024**3, n_workers=1
    ... )
    >>> tile
    (128, 512, 512)
    """
    n_workers = n_workers or os.cpu_count() or 1
    itemsize = np.dtype(dtype).itemsize
    n_spatial = min(3, len(shape))

    if use_gpu:
        mem = gpu_memory if gpu_memory is not None else _get_gpu_memory()
        budget = min(target_bytes * 2, mem // 2)
    else:
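        # Use an explicit None check, matching the GPU branch, so only None triggers auto-query.
        mem = available_memory if available_memory is not None else _get_available_memory()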
        budget = min(target_bytes, mem // (n_workers * 4))

    budget = max(32 * 1024**2, budget)

    leading = [1] * (len(shape) - n_spatial)
    spatial = list(shape[-n_spatial:])
    target_voxels = budget / itemsize
    target_side = int(target_voxels ** (1.0 / n_spatial))
    chunk_spatial = [min(s, target_side) for s in spatial]

    capped = [i for i, (c, s) in enumerate(zip(chunk_spatial, spatial)) if c == s]
    uncapped = [i for i in range(n_spatial) if i not in capped]
    if uncapped:
        used_by_capped = np.prod([chunk_spatial[i] for i in capped]) if capped else 1
        remaining = target_voxels / max(1, used_by_capped)
        new_side = int(remaining ** (1.0 / len(uncapped)))
        for i in uncapped:
            chunk_spatial[i] = min(spatial[i], new_side)

    result = tuple(leading + chunk_spatial)

    if verbose:
        mib = np.prod(result) * itemsize / 1024**2
        logger.info(
            "auto_tile_shape: shape=%s dtype=%s → tiles=%s (~%.0f MiB/tile)",
            shape, np.dtype(dtype).name, result, mib,
        )

    return result

`blockbuster.auto_tile_shape_cellpose(shape: tuple[int, ...], dtype: Any, diameter: float | None = None, do_3D: bool = False, use_gpu: bool = False, gpu_memory: int | None = None, available_memory: int | None = None, n_workers: int | None = None, model_memory_bytes: int = 2 * 1024 ** 3, cellpose_memory_factor: int = 20, verbose: bool = False) -> tuple[int, ...]`

Cellpose-optimised tile shape.

Cellpose is fundamentally 2-D: even in 3-D mode it runs 2-D segmentation on orthogonal planes and takes a consensus.

`do_3D=False` (default): z is set to 1, so each tile is a single 2-D `(y, x)` slice.

`do_3D=True`: z is kept at its full extent in every tile. y and x are tiled to fit the available memory, allowing for the 3× overhead of the three plane orientations.
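
The memory arithmetic behind the two modes can be sketched as below. It follows the source listing for this function, with the memory total passed in explicitly; `cellpose_tile_side` is a hypothetical helper written for this example, not a `blockbuster` function.

```python
GIB = 1024**3


def cellpose_tile_side(total_mem, itemsize, z=None, diameter=None,
                       model_memory_bytes=2 * GIB, memory_factor=20):
    """Side length of a square (y, x) tile; pass `z` to model do_3D=True."""
    # Memory left after the model weights, with a 32 MiB floor.
    usable = max(32 * 1024**2, total_mem - model_memory_bytes)
    # Raw input bytes that fit once Cellpose's working memory is accounted for.
    max_raw_bytes = usable // memory_factor
    if z is None:
        # 2-D mode: one (y, x) slice per tile.
        max_pixels = max(1, max_raw_bytes // itemsize)
    else:
        # 3-D mode: full z extent per tile, divided by 3 for the plane orientations.
        max_pixels = max(1, (max_raw_bytes // 3) // (z * itemsize))
    # The diameter sets a lower bound on the side, even if memory is tight.
    min_tile = int(4 * diameter) if diameter is not None else 1
    return max(min_tile, int(max_pixels ** 0.5))


print(cellpose_tile_side(8 * GIB, itemsize=2))          # 12690
print(cellpose_tile_side(8 * GIB, itemsize=2, z=128))   # 647
```

The returned side is then clamped to the array's y and x extents, so a 2048 × 2048 plane on this hypothetical 8 GiB device would be kept whole in 2-D mode and split into 647-pixel tiles in 3-D mode.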

Parameters:

Name	Type	Description	Default
`shape`	`tuple[int, ...]`	Array shape, e.g. `(z, y, x)` or `(y, x)`. Any leading axes are set to 1 in the tile.	required
`dtype`	`Any`	Array dtype.	required
`diameter`	`float \| None`	Expected cell diameter in pixels. The y and x tile sides are at least `4 × diameter` (capped at the array extent), even if that exceeds the memory budget.	`None`
`do_3D`	`bool`	Whether Cellpose will run in 3-D mode.	`False`
`use_gpu`	`bool`	Size tiles for GPU VRAM.	`False`
`gpu_memory`	`int \| None`	Available GPU VRAM in bytes; auto-queried when None.	`None`
`available_memory`	`int \| None`	Available host RAM in bytes; auto-queried when None.	`None`
`n_workers`	`int \| None`	Number of parallel workers sharing host RAM. Defaults to `os.cpu_count()` when None; ignored when `use_gpu=True`.	`None`
`model_memory_bytes`	`int`	Memory consumed by the Cellpose model weights (default 2 GiB).	`2 * 1024 ** 3`
`cellpose_memory_factor`	`int`	Cellpose allocates roughly this multiple of raw input bytes (default 20×).	`20`
`verbose`	`bool`	Log the chosen shape and memory estimates.	`False`

Returns:

Type	Description
`tuple[int, ...]`	Tile shape with the same number of dimensions as shape.

Examples:

>>> tile = auto_tile_shape_cellpose((128, 2048, 2048), "uint16", diameter=30)
>>> tile
(1, 2048, 2048)
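
To sanity-check a tile against the `cellpose_memory_factor` assumption, the raw tile size can be multiplied by the factor, which is the same estimate the `verbose` log reports. A minimal sketch, where `cellpose_estimate_mib` is a hypothetical helper and the item size is passed in directly:

```python
import math


def cellpose_estimate_mib(tile, itemsize, memory_factor=20):
    """Return (raw MiB, estimated Cellpose MiB) for one tile."""
    raw_mib = math.prod(tile) * itemsize / 1024**2
    return raw_mib, raw_mib * memory_factor


# uint16 has an itemsize of 2 bytes.
print(cellpose_estimate_mib((1, 2048, 2048), itemsize=2))   # (8.0, 160.0)
```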

Source code in src/blockbuster/_chunks.py

def auto_tile_shape_cellpose(
    shape: tuple[int, ...],
    dtype: Any,
    diameter: float | None = None,
    do_3D: bool = False,
    use_gpu: bool = False,
    gpu_memory: int | None = None,
    available_memory: int | None = None,
    n_workers: int | None = None,
    model_memory_bytes: int = 2 * 1024**3,
    cellpose_memory_factor: int = 20,
    verbose: bool = False,
) -> tuple[int, ...]:
    """Cellpose-optimised tile shape.

    Cellpose is fundamentally 2-D: even in 3-D mode it runs 2-D segmentation
    on orthogonal planes and takes a consensus.

    **do_3D=False (default)**
        z is set to 1. Each tile is one 2-D ``(y, x)`` slice.

    **do_3D=True**
        z is kept at its full extent per tile. y and x are tiled based on the
        available memory, accounting for the 3× overhead of three plane orientations.

    Parameters
    ----------
    shape:
        Spatial shape, e.g. ``(z, y, x)``.
    dtype:
        Array dtype.
    diameter:
        Expected cell diameter in pixels. Tile will be at least ``4 × diameter``.
    do_3D:
        Whether Cellpose will run in 3-D mode.
    use_gpu:
        Size tiles for GPU VRAM.
    gpu_memory, available_memory, n_workers:
        Memory parameters (auto-queried when None).
    model_memory_bytes:
        Memory consumed by the Cellpose model weights (default 2 GiB).
    cellpose_memory_factor:
        Cellpose allocates roughly this multiple of raw input bytes (default 20×).
    verbose:
        Log the chosen shape and memory estimates.

    Returns
    -------
    tuple[int, ...]
        Tile shape with the same number of dimensions as *shape*.

    Examples
    --------
    >>> tile = auto_tile_shape_cellpose((128, 2048, 2048), "uint16", diameter=30)
    >>> tile
    (1, 2048, 2048)
    """
    n_workers = n_workers or os.cpu_count() or 1
    itemsize = np.dtype(dtype).itemsize

    if use_gpu:
        total_mem = gpu_memory if gpu_memory is not None else _get_gpu_memory()
    else:
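        # Use an explicit None check, matching the GPU branch, so only None triggers auto-query.
        host_mem = available_memory if available_memory is not None else _get_available_memory()
        total_mem = host_mem // n_workers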

    usable = max(32 * 1024**2, total_mem - model_memory_bytes)
    max_raw_bytes = usable // cellpose_memory_factor

    n_spatial = min(3, len(shape))
    leading = [1] * (len(shape) - n_spatial)
    min_tile = int(4 * diameter) if diameter is not None else 1

    if n_spatial == 2 or not do_3D:
        max_pixels_2d = max(1, max_raw_bytes // itemsize)
        tile_side = max(min_tile, int(max_pixels_2d**0.5))
        if n_spatial == 2:
            y, x = shape[-2], shape[-1]
            chunk_spatial = [min(y, tile_side), min(x, tile_side)]
        else:
            z, y, x = shape[-3], shape[-2], shape[-1]
            chunk_spatial = [1, min(y, tile_side), min(x, tile_side)]
    else:
        z, y, x = shape[-3], shape[-2], shape[-1]
        max_pixels_per_slice = max(1, (max_raw_bytes // 3) // (z * itemsize))
        tile_side = max(min_tile, int(max_pixels_per_slice**0.5))
        chunk_spatial = [z, min(y, tile_side), min(x, tile_side)]

    result = tuple(leading + chunk_spatial)

    if verbose:
        raw_mib = np.prod(result) * itemsize / 1024**2
        logger.info(
            "auto_tile_shape_cellpose: shape=%s dtype=%s do_3D=%s "
            "→ tiles=%s (~%.0f MiB raw, ~%.0f MiB Cellpose estimate)",
            shape, np.dtype(dtype).name, do_3D, result,
            raw_mib, raw_mib * cellpose_memory_factor,
        )

    return result