Trouble processing a large dataset

When batch-processing segmentations with Python, the groom stage crashed with exit code 139 (segmentation fault) while executing the find_reference_image_index function. My current workaround is to specify the reference index manually, which is obviously too simplistic. By reviewing the source code, I found that find_reference_image_index performs a toMesh operation. Is there any way to control its resource consumption, for example something along the lines of the sketch I have added after the script below?
The source code I used is as follows:

# -*- coding: utf-8 -*-
"""
====================================================================
Full Example Pipeline for Statistical Shape Modeling with ShapeWorks
====================================================================

In this example, we provide an unpre-processed left atrial dataset,
which consists of an LGE scan and its segmentation for some number of subjects.
Since the dataset is not pre-processed, it requires all pre-processing steps:
resampling, padding, alignment, etc. To run the ShapeWorks particle-based optimization,
the processed segmentation files are converted to signed distance transforms.
This example serves as a test case for users who want to process the raw (gray-scale)
images as well as their binary segmentation images.

First import the necessary modules
"""
import os
import subprocess
import shapeworks as sw
import pandas as pd


os.environ['PATH'] = '/v16data/root/adg/packages/ShapeWorks-v6.5.1-linux/bin' + os.pathsep + os.environ['PATH']
print(os.environ['PATH'])

def Run_Pipeline():
    print("\nStep 1. Acquire Data\n")
    """
    Step 1: ACQUIRE DATA
    We define dataset_name which determines which dataset to download from
    the portal and the directory to save output from the use case in.
    """
    # sampled_data = []
    # behavior_data = pd.read_excel('../data/label_with_cog_filted.xlsx').values.tolist()
    # for x in behavior_data:
    #     if x[5] == 0 and len(sampled_data) <= 100:
    #         sampled_data.append(x)
    #
    # for x in behavior_data:
    #     if x[5] != 0 and len(sampled_data) <= 200:
    #         sampled_data.append(x)
    #
    # valid_names = [i[-2] for i in sampled_data]
    # print(valid_names)
    input_path = '/v16data/root/adg/python_projects/ventriclecal_brain_age/shapework/'
    file_root = input_path + '3d_brain_ventricle_vol_smooth_nrrd/'
    # file_list = sorted([file_root + i for i in os.listdir(file_root) if i in valid_names])
    file_list = sorted([file_root + i for i in os.listdir(file_root)])
    print(file_list)


    print("\nStep 2. Groom - Data Pre-processing\n")
    """
    Step 2: GROOM
    The following grooming steps are performed:
        - Crop (cropping first makes resampling faster)
        - Resample to have isotropic spacing
        - Pad with zeros
        - Compute alignment transforms to use in optimization
        - Compute distance transforms
    For more information about grooming, please refer to:
    http://sciinstitute.github.io/ShapeWorks/workflow/groom.html
    """

    # Create a directory for groomed output
    groom_dir = input_path + 'groomed/'

    os.makedirs(groom_dir, exist_ok=True)

    """
    We loop over the shape segmentation files, load the segmentations,
    and apply the initial grooming steps
    """
    # list of shape segmentations
    shape_seg_list = []
    # list of shape names (shape files prefixes) to be used for saving outputs
    shape_names = []
    for shape_filename in file_list:
        print('Loading: ' + shape_filename)
        # get current shape name
        shape_name = shape_filename.split('/')[-1].replace('.nrrd', '')

        # load segmentation
        shape_seg = sw.Image(shape_filename)

        # do initial grooming steps
        print("Grooming: " + shape_name)
        # Individually crop each segmentation using a computed bounding box
        iso_value = 0.5  # voxel value for isosurface
        try:
            bounding_box = sw.ImageUtils.boundingBox([shape_seg], iso_value).pad(5)
            shape_seg.crop(bounding_box)
            # Resample to isotropic spacing using linear interpolation
            antialias_iterations = 30  # number of iterations for antialiasing
            iso_spacing = [1, 1, 1]  # isotropic spacing
            shape_seg.antialias(antialias_iterations).resample(iso_spacing, sw.InterpolationType.Linear).binarize()
            # Pad segmentations with zeros
            pad_size = 5  # number of voxels to pad for each dimension
            pad_value = 0  # the constant value used to pad the segmentations
            shape_seg.pad(pad_size, pad_value)
            shape_seg_list.append(shape_seg)
            shape_names.append(shape_name)
        except Exception as e:
            print(e)


    """
    To find the alignment transforms and save them for optimization,
    we must break the loop to select a reference segmentation
    """
    ref_index = sw.find_reference_image_index(shape_seg_list)
    # Workaround for the exit 139 crash mentioned above: hard-code the reference
    # index (here 400) instead of using the ref_index returned by the call above.
    ref_seg = shape_seg_list[400].write(groom_dir + 'reference.nrrd')
    ref_name = shape_names[400]
    print("Reference found: " + ref_name)

    """
    Now we can loop over all of the segmentations again to find the rigid
    alignment transform and compute a distance transform
    """
    rigid_transforms = []  # Save rigid transform matrices
    for shape_seg, shape_name in zip(shape_seg_list, shape_names):
        print('Finding alignment transform from ' + shape_name + ' to ' + ref_name)
        # Get rigid transform
        iso_value = 0.5  # voxel value for isosurface
        icp_iterations = 100  # number of ICP iterations
        rigid_transform = shape_seg.createRigidRegistrationTransform(
            ref_seg, iso_value, icp_iterations)
        # Convert to vtk format for optimization
        rigid_transform = sw.utils.getVTKtransform(rigid_transform)
        rigid_transforms.append(rigid_transform)

        # Convert segmentations to smooth signed distance transforms
        print("Converting " + shape_name + " to distance transform")
        iso_value = 0  # voxel value for isosurface
        sigma = 2  # for Gaussian blur
        antialias_iterations = 30  # re-defined here so it is in scope even if the grooming loop hit an exception
        shape_seg.antialias(antialias_iterations).computeDT(iso_value).gaussianBlur(sigma)

    # Save distance transforms
    groomed_files = sw.utils.save_images(groom_dir + 'distance_transforms/', shape_seg_list,
                                         shape_names, extension='nrrd', compressed=True, verbose=True)

    # Get data input (meshes if running in mesh mode, else distance transforms)
    domain_type, groomed_files = sw.data.get_optimize_input(groomed_files, True)

    print("\nStep 3. Optimize - Particle Based Optimization\n")
    """
    Step 3: OPTIMIZE - Particle Based Optimization

    Now that we have the distance transform representation of data we create
    the parameter files for the shapeworks particle optimization routine.
    For more details on the plethora of parameters for shapeworks please refer
    to docs/workflow/optimize.md
    http://sciinstitute.github.io/ShapeWorks/workflow/optimize.html
    """

    point_dir = input_path + 'option_set'
    os.makedirs(point_dir, exist_ok=True)

    # Create spreadsheet
    project_location = input_path
    subjects = []
    number_domains = 1
    for i in range(len(shape_seg_list)):
        subject = sw.Subject()
        subject.set_number_of_domains(number_domains)
        rel_mesh_files = sw.utils.get_relative_paths([file_list[i]], project_location)
        subject.set_original_filenames(rel_mesh_files)
        rel_groom_files = sw.utils.get_relative_paths([groomed_files[i]], project_location)
        subject.set_groomed_filenames(rel_groom_files)
        transform = [rigid_transforms[i].flatten()]
        subject.set_groomed_transforms(transform)
        subjects.append(subject)

    project = sw.Project()
    project.set_subjects(subjects)
    parameters = sw.Parameters()

    # Create a dictionary for all the parameters required by optimization
    parameter_dictionary = {
        "number_of_particles": 512,
        "use_normals": 0,
        "normals_strength": 10.0,
        "checkpointing_interval": 200,
        "keep_checkpoints": 0,
        "iterations_per_split": 1000,
        "optimization_iterations": 1000,
        "starting_regularization": 1000,
        "ending_regularization": 10,
        "relative_weighting": 10,
        "procrustes": 1,
        "initial_relative_weighting": 0.1,
        "procrustes_interval": 1,
        "procrustes_scaling": 1,
        "save_init_splits": 0,
        "verbosity": 0
    }
    print('Generating project file')
    # Add param dictionary to spreadsheet
    for key in parameter_dictionary:
        parameters.set(key, sw.Variant([parameter_dictionary[key]]))
    project.set_parameters("optimize", parameters)
    spreadsheet_file = input_path + "ventricle_option_set.swproj"
    project.save(spreadsheet_file)

    # Run optimization
    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd)




Run_Pipeline()
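
For reference, would something along these lines be a reasonable way to make the reference search cheaper? This is only a sketch: pick_reference_index is a helper name I made up, the coarse spacing of 2 mm is an arbitrary choice, it assumes sw.InterpolationType.NearestNeighbor is available in my build, and I have not verified that the index chosen on the coarsened copies matches the one chosen at full resolution.

def pick_reference_index(file_list, coarse_spacing=[2, 2, 2]):
    # Re-load each segmentation from disk and coarsen it before the search so
    # that the toMesh call inside find_reference_image_index has far fewer
    # voxels to work with.
    coarse_segs = []
    for filename in file_list:
        seg = sw.Image(filename)
        seg.resample(coarse_spacing, sw.InterpolationType.NearestNeighbor)
        coarse_segs.append(seg)
    return sw.find_reference_image_index(coarse_segs)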

Meshes are usually not too big, but your script might also be loading all of your images into memory at once.

About how many images are you processing and how big are the images?

Unfortunately, there are not many resource-limiting options for some of these operations.
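
If memory is the problem, one pattern worth trying is to write each groomed segmentation to disk inside the grooming loop and keep only the file paths, instead of holding every sw.Image in shape_seg_list; later stages can then re-load one image at a time. This is only a sketch built from the same calls as your script (groomed_paths is just an illustrative name), not something I have tested on your data:

groomed_paths = []  # keep file paths in memory, not Image objects
for shape_filename in file_list:
    shape_name = shape_filename.split('/')[-1].replace('.nrrd', '')
    shape_seg = sw.Image(shape_filename)
    # Same crop / resample / pad steps as in the script above
    bounding_box = sw.ImageUtils.boundingBox([shape_seg], 0.5).pad(5)
    shape_seg.crop(bounding_box)
    shape_seg.antialias(30).resample([1, 1, 1], sw.InterpolationType.Linear).binarize()
    shape_seg.pad(5, 0)
    out_path = groom_dir + shape_name + '.nrrd'
    shape_seg.write(out_path)       # persist the groomed segmentation to disk
    groomed_paths.append(out_path)  # only the path stays in memory
    del shape_seg                   # free the volume before loading the next one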

Hello, I am processing a total of 4,000 ventricular images, each about 200 KB. For now I have avoided the reference-search error by limiting the number of images that are loaded, but the problem has reappeared in the optimize step: with this many images, the optimization overloads my server's CPU. Is there any way to keep its resource usage under control?

    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd)

This is the code I am running; the sketch below shows how I was thinking of capping the thread count.
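
To clarify what I mean, this is the direction I was considering for limiting CPU usage: setting threading environment variables before launching the optimizer. I am only guessing here; whether the optimizer actually honours OMP_NUM_THREADS or ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS depends on how ShapeWorks was built, and the value 8 is arbitrary.

    env = os.environ.copy()
    # Hypothetical thread cap; only effective if the optimizer's threading
    # backend reads these variables (check your build / documentation).
    env['OMP_NUM_THREADS'] = '8'
    env['ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS'] = '8'

    print('Running optimization')
    optimize_cmd = ('shapeworks optimize --progress --name ' + spreadsheet_file).split()
    subprocess.check_call(optimize_cmd, env=env)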

4,000 is quite a lot of images. We are currently working on improving scalability to numbers like these.

Are you running the optimization on distance transforms or on meshes? (When working with images in Studio, one option is to convert them to meshes after the initial processing.) Meshes need much less memory, particularly if the dev/nightly version of ShapeWorks is used.
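
For example, grooming to meshes instead of distance transforms could look roughly like this. It is only a sketch: Image.toMesh and Mesh.write are the calls I would reach for, but double-check them against the Python API docs for your version, and note that the 0.5 isovalue assumes an antialiased binary segmentation.

mesh_dir = groom_dir + 'meshes/'
os.makedirs(mesh_dir, exist_ok=True)

mesh_files = []
for shape_seg, shape_name in zip(shape_seg_list, shape_names):
    # Extract an isosurface mesh from the antialiased segmentation instead of
    # keeping a full distance-transform volume for the optimization.
    mesh = shape_seg.antialias(30).toMesh(0.5)
    out_path = mesh_dir + shape_name + '.vtk'
    mesh.write(out_path)
    mesh_files.append(out_path)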

Hi, recently I have been running DeepSSM for shape analysis, and I split my dataset to resolve this problem. Thank you!