Question: Python element access performance

Question

Python element access performance

Answers 1
Added at 2017-10-04 15:10
Tags
Question

being a long time C++ developer I just started working on algorithms in Python. I'm currently profiling my code to get a feel for how to program efficiently in Python. One thing in particular stood out to me for which I would be very glad to get an expert explanation.

I wrote this wrapper function for a ray-triangle intersection:

def rayIntersectsTriangle( origin , direction , meshData , poly , worldCoordinateVertices ):
    return mathutils.geometry.intersect_ray_tri( worldCoordinateVertices[ meshData.loops[ poly.loop_start ].vertex_index ],
                                                 worldCoordinateVertices[ meshData.loops[ poly.loop_start + 1 ].vertex_index ],
                                                 worldCoordinateVertices[ meshData.loops[ poly.loop_start + 2 ].vertex_index ],
                                                 direction , origin ) != None

When profiling (using cProfile) code that executes this function a lot of times I have the following result:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
15694126   22.314    0.000   25.812    0.000 ****.py:176(rayIntersectsPoly)
[...]
15694126    3.497    0.000    3.497    0.000 {built-in method mathutils.geometry.intersect_ray_tri}

How come this wrapper adds so much overhead? The only thing I can really see going on here is the array element access. Coming from C++ this really confuses me :D

Any help on this would be super appreciated. I want to get my algorithms as fast as possible.

Thanks in advance! Cheers!

Answers to

Python element access performance

nr: #1 dodano: 2017-10-04 15:10

The time looks large in comparison because mathutils.geometry.intersect_ray_tri() is so fast. That method is implemented in an extension and executes at native speed.

The Python time for the method goes to:

  • Creating the new function frame (with only a single expression that time takes a relatively large proportion)
  • Global name lookups (these are done against a mapping, local names use an array).
  • Attribute lookups, like mathutils.geometry and mathutils.geometry.intersect_ray_tri and poly.loop_start
  • Indexing, so worldCoordinateVertices[ ... ].

You could make it a little faster by caching the results of some of these in local names or default parameters:

def rayIntersectsTriangle(
        origin, direction, meshData, poly, worldCoordinateVertices
        _intersect_ray_tri=mathutils.geometry.intersect_ray_tri):
    loop_start = poly.loop_start
    meshData_loops = meshData.loops
    return _intersect_ray_tri(
        worldCoordinateVertices[meshData_loops[loop_start].vertex_index],
        worldCoordinateVertices[meshData_loops[loop_start + 1].vertex_index],
        worldCoordinateVertices[meshData_loops[loop_start + 2].vertex_index],
        direction, origin) is not None

I also used is not None; that's a pointer operation that is recommended for testing for the None singleton.

This brings about 8 attribute lookups down to just 2, and removes the global name lookup for mathutils.

Still, these are micro-optimisations, only do these if they really have an impact (e.g. the method is called a lot in your code). If this really is a bottleneck for you consider using Cython as an easy path towards turning this code into a compiled extension that also can operate at native speeds.

Source Show
◀ Wstecz