Example FR2: Optimizing the performance of Example FR1

This examples is based on Example FR1: Custom file reader loading particle properties and the simulation cell, before continuing please make sure that your are familiar with the code in that example.

In this example we will address the two performance critical issues found in that file reader:

For long trajectories, seeking to a requested frame by reading all lines preceding that frame will become slower and slower for later frames.
For simulations containing many atoms, performing many property assignments such as type_property[i] = ... can become a bottleneck.

Seeking to the requested frame

As a first optimization we save the byte offset to the start of each frame instead of the line number. This allows the file reader to jump directly to the correct position inside the file instead of sequentially stepping through it line by line until the requested frame is found.

This requires two changes. First, the scan method needs to store byte offsets instead of line numbers. The current byte offset (from the start of the file) of the reading head can be obtained using the .tell() method. However, this method is deactivated in a for line in f: ... loop. Therefore, the for loop has to be replaced by a while loop and the readline method needs to be called manually. This leads to the following code:

def scan(self, filename: str, register_frame: Callable[..., None]):
    with open(filename, "r") as f:
        while line := f.readline():
            if line.startswith("Timestep "):
                offset = f.tell() - len(line.encode('utf-8'))
                register_frame(frame_info=offset, label=int(line[9:]))
            yield

Note that the := requires at least Python 3.8 to work. Unfortunately, the .tell() method itself is not too fast. Therefore, it is only call once a line starting a new frame, i.e., starting with the word Timestep has been identified. We need to shift the byte offset reported by the tell method by the (byte) length of the current line, as this line has already been read and therefore has advanced the file pointer. The stored by offset will point at the being of the Timestep line.

Once we have implemented the new scan method we can benchmark it against the one from Example FR1: Custom file reader loading particle properties and the simulation cell. Quick benchmarks show that the new scan method is about 1.5 - 2 times slower than the old one, however the speedup in the parse method will make up for this small difference.

Secondly, the parse method needs to be updated to handle the byte offset instead of a line number. To do so, the

# Seek to the beginning of the requested trajectory frame
starting_line_number = frame_info
for _ in range(starting_line_number):
    f.readline()

loop needs to be replaced by a simple file seek instruction:

# Seek to the beginning of the requested trajectory frame
starting_byte_offset = frame_info
f.seek(starting_byte_offset)

After the initial seek all remaining code can be left unchanged.

To benchmark these code changes we measured the total time to step through a trajectory file containing 30,000 frames of ~20 atoms per frame. Here we can see that the optimized version of the file reader takes about 14 seconds to open all 30,000 frames, while the old, unoptimized version, takes about 32 minutes.

../../../_images/custom_file_reader_benchmark_frames.png

Minimizing OVITO / Python data exchange

In a next step we want to optimize the the assignment of particle properties. Recall that the old version of the file reader developed in Example FR1: Custom file reader loading particle properties and the simulation cell copies both the positions and particle types for each particle individually from Python into OVITO memory. This takes time and can be optimized by collecting all data in Python / numpy and copying it into OVITO in one go.

This approach allows for an optimization of the conversion between reduced and Cartesian coordinates. While the original implementation transformed the coordinates for each particle individually, the usage of numpy.einsum now allow for vectorization and therefore acceleration of this calculation.

Be aware, that this optimization increases the memory requirements of the file reader.

We updated the parse method in the following way:

def parse(self, data: DataCollection, filename: str, frame_info: int, **kwargs: Any):
    ...
    particle_count = int(f.readline().split()[1])
    particles = data.create_particles(count=particle_count)
    type_property = particles.create_property("Particle Type")
    position_property = particles.create_property("Position")
    f.readline()

    # updated code:
    types = numpy.empty(particle_count, dtype=numpy.int32)
    positions = numpy.empty((particle_count, 4))
    positions[:, 3] = 1.0
    for i in range(particle_count):
        tokens = f.readline().split()
        types[i] = type_property.add_type_name(tokens[0], particles).id
        for j in range(3):
            positions[i, j] = float(tokens[j+1])
        yield i/particle_count
    type_property[...] = types
    position_property[...] = numpy.einsum('ij,kj -> ki', cell, positions)
    ...

First, we pre-allocate numpy arrays for both the type and the position property. Here we prepare 4 vector components for the position, as the conversion from reduced to cartesian coordinates requires the matrix multiplication of the simulation cell matrix \(M\) and the point \(p = (x_\mathrm{r}, y_\mathrm{r}, z_\mathrm{r}, 1)\), where subscript \(r\) denotes the reduced coordinates.

Afterwards we use a similar loop over all particles as already seen in Example FR1: Custom file reader loading particle properties and the simulation cell to extract the data from the actual file. Note, how that the data is written in the numpy arrays instead of the particle properties and that the coordinates are kept in reduced form.

Once we have loaded all particle data, the type array is copied into the corresponding OVITO particle property. The particle positions are also copied, however, the numpy.einsum function is used to perform bulk matrix multiplication to convert all particles from reduced to Cartesian coordinates at once.

To benchmark these code changes we measured the total time to load a single frames containing 10 to 10,000,000 atoms. Here we can see that the optimized version of the file reader takes about 37 seconds to open the largest file, while the old, unoptimized version, takes about 142 seconds.

../../../_images/custom_file_reader_benchmark_atoms.png

Final thoughts

In this example we showed two common optimization that can be implemented for a custom file reader. We saw that optimizing the seek behavior is really important for long trajectories, while the optimized data transfer becomes most relevant for trajectories containing a large number of atoms. Each project comes with its own challenges and only repeated measurements and optimizations can help you to build a fast and reliable file reader.