This three-part series outlines an application of multiprocessing in conjunction with Arcpy. It describes the general method by which a script or model can be parallelised: splitting a dataset into small parts that can be solved independently, and running those parts concurrently on multiple processors.
There are a few requirements before you can make an Arcpy script/model suitable for use with multiprocessing:
- The most calculation-intensive (time-consuming) part of the code must be one that can be turned into a Python ‘module’ and parallelised; this process is described in the following posts.
- Once it is made into a module there must be no issues with data access – each invocation of the module should either write to a different output database (Arc locks the entire *.gdb in use, not just the feature class being accessed) or pass its results back in a Python structure, so they can be written to an output database only at a later stage.
To explain using multiprocessing with Python I will set out a hypothetical example. The objective of the example is, for each Point feature, to identify the number and type of all the Polygons within a certain distance, and to accumulate a weight value for them. This is just a simplified example that I thought up for explaining multiprocessing – it probably has no practical use whatsoever. The main purpose of these posts is to describe the process of making a model/script suitable for parallelisation.
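To make the objective concrete, here is a toy illustration using polygon centroids and straight-line distance in place of Arc's select-by-location. All data and names are made up for illustration only:

```python
# Toy version of the example's objective: find, for each Point, the
# Polygons (by type and weight) within a search distance. Straight-line
# distance to a centroid stands in for a real spatial selection.
import math

points = {1: (0.0, 0.0), 2: (10.0, 10.0)}      # PointID -> (x, y)
polygons = [((1.0, 1.0), 'forest', 2.5),       # (centroid, type, weight)
            ((9.0, 9.0), 'water', 4.0)]
search_distance = 3.0

near = {}
for point_id, (px, py) in points.items():
    # keep the (type, weight) of every polygon within the search distance
    near[point_id] = [
        (ptype, weight)
        for (cx, cy), ptype, weight in polygons
        if math.hypot(cx - px, cy - py) <= search_distance
    ]
```

Each point's selection is independent of every other point's, which is exactly what makes this kind of problem suitable for splitting across processors.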
The third post in this series will introduce multiprocessing techniques using both Python's built-in multiprocessing library and the Parallel Python library, as each of these has some limitations. ESRI also has a blog post describing the use of Python's multiprocessing library with Arcpy (in particular Spatial Analyst), which did not work for me in a complex situation involving Network Analyst. However, their application is much simpler and may be better in some cases (i.e. when using Spatial Analyst).
In the example the Polygon feature class has a polygon type field and a weighting field; in pseudocode, the serial version of the model looks like this:
# get variables from Arc
# check all inputs are valid
# for Polygon types:
#     make feature layer of Polygons
#     make feature layer of Points
#     for rows in Points:
#         get PointID
#         select [the] Point row corresponding to PointID (only way I could find to make a row selection)
#         select by location: Polygons within the search distance
#         for Polygon rows (within the selection):
#             store the sums of weighting and count to a Python dictionary by PointID and polygonType
# for rows in Points:
#     for Polygon types:
#         access data from dictionary by PointID and Type
#         write value to row
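The dictionary-accumulation step in the pseudocode above can be sketched in plain Python, with tuples standing in for Arcpy rows. The variable names (`point_id`, `poly_type`, `weight`) are hypothetical; in the real script the inner loop would iterate over the rows returned by the select-by-location:

```python
# Accumulate a count and a summed weight per (PointID, polygonType),
# mirroring the "store the sums of weighting and count" step. The
# input data is made up for illustration.
polygons_near_point = {
    1: [('forest', 2.5), ('forest', 1.0), ('water', 4.0)],
    2: [('water', 3.0)],
}

totals = {}  # (PointID, polygonType) -> (count, summed weight)
for point_id, polygons in polygons_near_point.items():
    for poly_type, weight in polygons:
        key = (point_id, poly_type)
        count, total = totals.get(key, (0, 0.0))
        totals[key] = (count + 1, total + weight)
```

Because all results live in this dictionary, the final "write value to row" pass can run once at the end – the same property that later lets the heavy loop be farmed out to worker processes.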