Trouble processing a large dataset

When batch-processing segmentations with Python, the groom stage crashed with exit code 139 (segmentation fault) while executing the find_reference_image_index function. My current workaround is to specify the reference index manually, which is obviously too simplistic. By reviewing the source code, I found that find_reference_image_index performs a toMesh operation. Is there any way to control its resource consumption, for example something along the lines of the sketch I have added after the script below?
The source code I used is as follows:

# -*- coding: utf-8 -*-
"""
====================================================================
Full Example Pipeline for Statistical Shape Modeling with ShapeWorks
====================================================================

In this example, we provide an unpre-processed left atrial dataset,
which consists of an LGE scan and its segmentation for some number of subjects.
Since the dataset is not pre-processed, it requires all pre-processing steps:
resampling, padding, alignment, etc. To run the ShapeWorks particle-based optimization,
the processed segmentation files are converted to signed distance transforms.
This example serves as a test case for users who want to process the raw (gray-scale)
images as well as their binary segmentation images.

First import the necessary modules
"""
import os
import subprocess
import shapeworks as sw
import pandas as pd


os.environ['PATH'] = '/v16data/root/adg/packages/ShapeWorks-v6.5.1-linux/bin' + os.pathsep + os.environ['PATH']
print(os.environ['PATH'])

def Run_Pipeline():
    print("\nStep 1. Acquire Data\n")
    """
    Step 1: ACQUIRE DATA
    We define dataset_name which determines which dataset to download from
    the portal and the directory to save output from the use case in.
    """
    # sampled_data = []
    # behavior_data = pd.read_excel('../data/label_with_cog_filted.xlsx').values.tolist()
    # for x in behavior_data:
    #     if x[5] == 0 and len(sampled_data) <= 100:
    #         sampled_data.append(x)
    #
    # for x in behavior_data:
    #     if x[5] != 0 and len(sampled_data) <= 200:
    #         sampled_data.append(x)
    #
    # valid_names = [i[-2] for i in sampled_data]
    # print(valid_names)
    input_path = '/v16data/root/adg/python_projects/ventriclecal_brain_age/shapework/'
    file_root = input_path + '3d_brain_ventricle_vol_smooth_nrrd/'
    # file_list = sorted([file_root + i for i in os.listdir(file_root) if i in valid_names])
    file_list = sorted([file_root + i for i in os.listdir(file_root)])
    print(file_list)


    print("\nStep 2. Groom - Data Pre-processing\n")
    """
    Step 2: GROOM
    The following grooming steps are performed:
        - Crop (cropping first makes resampling faster)
        - Resample to have isotropic spacing
        - Pad with zeros
        - Compute alignment transforms to use in optimization
        - Compute distance transforms
    For more information about grooming, please refer to:
    http://sciinstitute.github.io/ShapeWorks/workflow/groom.html
    """

    # Create a directory for groomed output
    groom_dir = input_path + 'groomed/'

    os.makedirs(groom_dir, exist_ok=True)

    """
    We loop over the shape segmentation files, load the segmentations,
    and apply the initial grooming steps
    """
    # list of shape segmentations
    shape_seg_list = []
    # list of shape names (shape files prefixes) to be used for saving outputs
    shape_names = []
    for shape_filename in file_list:
        print('Loading: ' + shape_filename)
        # get current shape name
        shape_name = shape_filename.split('/')[-1].replace('.nrrd', '')

        # load segmentation
        shape_seg = sw.Image(shape_filename)

        # do initial grooming steps
        print("Grooming: " + shape_name)
        # Individually crop each segmentation using a computed bounding box
        iso_value = 0.5  # voxel value for isosurface
        try:
            bounding_box = sw.ImageUtils.boundingBox([shape_seg], iso_value).pad(5)
            shape_seg.crop(bounding_box)
            # Resample to isotropic spacing using linear interpolation
            antialias_iterations = 30  # number of iterations for antialiasing
            iso_spacing = [1, 1, 1]  # isotropic spacing
            shape_seg.antialias(antialias_iterations).resample(iso_spacing, sw.InterpolationType.Linear).binarize()
            # Pad segmentations with zeros
            pad_size = 5  # number of voxels to pad for each dimension
            pad_value = 0  # the constant value used to pad the segmentations
            shape_seg.pad(pad_size, pad_value)
            shape_seg_list.append(shape_seg)
            shape_names.append(shape_name)
        except Exception as e:
            print(e)


    """
    To find the alignment transforms and save them for optimization,
    we must break the loop to select a reference segmentation
    """
    ref_index = sw.find_reference_image_index(shape_seg_list)
    # Workaround for the exit 139 crash mentioned above: hard-code the reference
    # index (here 400) instead of using the ref_index returned by the call above.
    ref_seg = shape_seg_list[400].write(groom_dir + 'reference.nrrd')
    ref_name = shape_names[400]
    print("Reference found: " + ref_name)

    """
    Now we can loop over all of the segmentations again to find the rigid
    alignment transform and compute a distance transform
    """
    rigid_transforms = []  # Save rigid transform matrices
    for shape_seg, shape_name in zip(shape_seg_list, shape_names):
        print('Finding alignment transform from ' + shape_name + ' to ' + ref_name)
        # Get rigid transform
        iso_value = 0.5  # voxel value for isosurface
        icp_iterations = 100  # number of ICP iterations
        rigid_transform = shape_seg.createRigidRegistrationTransform(
            ref_seg, iso_value, icp_iterations)
        # Convert to vtk format for optimization
        rigid_transform = sw.utils.getVTKtransform(rigid_transform)
        rigid_transforms.append(rigid_transform)

        # Convert segmentations to smooth signed distance transforms
        print("Converting " + shape_name + " to distance transform")
        iso_value = 0  # voxel value for isosurface
        sigma = 2  # for Gaussian blur
        antialias_iterations = 30  # re-defined here so it is in scope even if the grooming loop hit an exception
        shape_seg.antialias(antialias_iterations).computeDT(iso_value).gaussianBlur(sigma)

    # Save distance transforms
    groomed_files = sw.utils.save_images(groom_dir + 'distance_transforms/', shape_seg_list,
                                         shape_names, extension='nrrd', compressed=True, verbose=True)

    # Get data input (meshes if running in mesh mode, else distance transforms)
    domain_type, groomed_files = sw.data.get_optimize_input(groomed_files, True)

    print("\nStep 3. Optimize - Particle Based Optimization\n")
    """
    Step 3: OPTIMIZE - Particle Based Optimization

    Now that we have the distance transform representation of data we create
    the parameter files for the shapeworks particle optimization routine.
    For more details on the plethora of parameters for shapeworks please refer
    to docs/workflow/optimize.md
    http://sciinstitute.github.io/ShapeWorks/workflow/optimize.html
    """

    point_dir = input_path + 'option_set'
    os.makedirs(point_dir, exist_ok=True)

    # Create spreadsheet
    project_location = input_path
    subjects = []
    number_domains = 1
    for i in range(len(shape_seg_list)):
        subject = sw.Subject()
        subject.set_number_of_domains(number_domains)
        rel_mesh_files = sw.utils.get_relative_paths([file_list[i]], project_location)
        subject.set_original_filenames(rel_mesh_files)
        rel_groom_files = sw.utils.get_relative_paths([groomed_files[i]], project_location)
        subject.set_groomed_filenames(rel_groom_files)
        transform = [rigid_transforms[i].flatten()]
        subject.set_groomed_transforms(transform)
        subjects.append(subject)

    project = sw.Project()
    project.set_subjects(subjects)
    parameters = sw.Parameters()

    # Create a dictionary for all the parameters required by optimization
    parameter_dictionary = {
        "number_of_particles": 512,
        "use_normals": 0,
        "normals_strength": 10.0,
        "checkpointing_interval": 200,
        "keep_checkpoints": 0,
        "iterations_per_split": 1000,
        "optimization_iterations": 1000,
        "starting_regularization": 1000,
        "ending_regularization": 10,
        "relative_weighting": 10,
        "procrustes": 1,
        "initial_relative_weighting": 0.1,
        "procrustes_interval": 1,
        "procrustes_scaling": 1,
        "save_init_splits": 0,
        "verbosity": 0
    }
    print('Generating project file')
    # Add param dictionary to spreadsheet
    for key in parameter_dictionary:
        parameters.set(key, sw.Variant([parameter_dictionary[key]]))
    project.set_parameters("optimize", parameters)
    spreadsheet_file = input_path + "ventricle_option_set.swproj"
    project.save(spreadsheet_file)

    # Run optimization
    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd)




Run_Pipeline()
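
For reference, would something along these lines be a reasonable way to make the reference search cheaper? This is only a sketch: pick_reference_index is a helper name I made up, the coarse spacing of 2 mm is an arbitrary choice, it assumes sw.InterpolationType.NearestNeighbor is available in my build, and I have not verified that the index chosen on the coarsened copies matches the one chosen at full resolution.

def pick_reference_index(file_list, coarse_spacing=[2, 2, 2]):
    # Re-load each segmentation from disk and coarsen it before the search so
    # that the toMesh call inside find_reference_image_index has far fewer
    # voxels to work with.
    coarse_segs = []
    for filename in file_list:
        seg = sw.Image(filename)
        seg.resample(coarse_spacing, sw.InterpolationType.NearestNeighbor)
        coarse_segs.append(seg)
    return sw.find_reference_image_index(coarse_segs)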

Meshes are usually not too big, but your script might also be loading all of your images into memory at once.

About how many images are you processing and how big are the images?

Unfortunately, there are not many resource-limiting options for some of these operations.
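
If memory is the problem, one pattern worth trying is to write each groomed segmentation to disk inside the grooming loop and keep only the file paths, instead of holding every sw.Image in shape_seg_list; later stages can then re-load one image at a time. This is only a sketch built from the same calls as your script (groomed_paths is just an illustrative name), not something I have tested on your data:

groomed_paths = []  # keep file paths in memory, not Image objects
for shape_filename in file_list:
    shape_name = shape_filename.split('/')[-1].replace('.nrrd', '')
    shape_seg = sw.Image(shape_filename)
    # Same crop / resample / pad steps as in the script above
    bounding_box = sw.ImageUtils.boundingBox([shape_seg], 0.5).pad(5)
    shape_seg.crop(bounding_box)
    shape_seg.antialias(30).resample([1, 1, 1], sw.InterpolationType.Linear).binarize()
    shape_seg.pad(5, 0)
    out_path = groom_dir + shape_name + '.nrrd'
    shape_seg.write(out_path)       # persist the groomed segmentation to disk
    groomed_paths.append(out_path)  # only the path stays in memory
    del shape_seg                   # free the volume before loading the next one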

Hello, I am processing a total of 4,000 ventricular images, each about 200 KB. For now I have avoided the reference-search error by limiting the number of images that are loaded, but the problem has reappeared in the optimize step: with this many images, the optimization overloads my server's CPU. Is there any way to keep its resource usage under control?

    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd)

This is the code I am running; the sketch below shows how I was thinking of capping the thread count.
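
To clarify what I mean, this is the direction I was considering for limiting CPU usage: setting threading environment variables before launching the optimizer. I am only guessing here; whether the optimizer actually honours OMP_NUM_THREADS or ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS depends on how ShapeWorks was built, and the value 8 is arbitrary.

    env = os.environ.copy()
    # Hypothetical thread cap; only effective if the optimizer's threading
    # backend reads these variables (check your build / documentation).
    env['OMP_NUM_THREADS'] = '8'
    env['ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS'] = '8'

    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd, env=env)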

4,000 is quite a lot of images. We are currently working on improving scalability to numbers like these.

Are you running the optimization on distance transforms or on meshes? (When working with images in Studio, one option is to convert them to meshes after the initial processing.) Meshes need much less memory, particularly if the dev/nightly version of ShapeWorks is used.
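
For example, grooming to meshes instead of distance transforms could look roughly like this. It is only a sketch: Image.toMesh and Mesh.write are the calls I would reach for, but double-check them against the Python API docs for your version, and note that the 0.5 isovalue assumes an antialiased binary segmentation.

mesh_dir = groom_dir + 'meshes/'
os.makedirs(mesh_dir, exist_ok=True)

mesh_files = []
for shape_seg, shape_name in zip(shape_seg_list, shape_names):
    # Extract an isosurface mesh from the antialiased segmentation instead of
    # keeping a full distance-transform volume for the optimization.
    mesh = shape_seg.antialias(30).toMesh(0.5)
    out_path = mesh_dir + shape_name + '.vtk'
    mesh.write(out_path)
    mesh_files.append(out_path)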

Hi, recently I have been running DeepSSM for shape analysis, and I split my dataset to resolve this problem. Thank you!