Upcoming Posts page
If you have suggestions or questions relating to the general blog content (rather than specific posts), or requests for new tutorials/posts check out the Upcoming Posts page. I will use comments posted there to help direct new content for the blog.
Arcpy: sending a result email
This was actually surprisingly easy to do. The hardest part is logging in to your SMTP server (well, finding all the specific details you need to know to do so); fortunately it’s pretty straightforward for Gmail. If you are bound to another service, search around on the net to see what you can find out about it.
This code has been written to use a Python module/function (third line: def…), which may be unfamiliar to many arcpy users. A module is a piece of code that is loaded into memory as it is read, which is why you have to put it at the top, and can then be called later. It isn’t necessary for this code, but comes in handy where you are doing the same thing more than once; for example, if there were three slow operations being performed in this code, and you wanted to send an email after each part was complete, having the module would significantly shorten and de-clutter the code (vs. repeating the sending script three times). The main reason I did it here is to make it easier to copy and paste to other scripts.
import smtplib
import arcpy
def sendResultEmail(msgContents, success_TF):
'''sendResultEmail(msgContents, success_TF)
This is the module that does the sending - you should only need to edit the top four active lines'''
to = 'receiver@email.com'
send_username = 'sender@email.com'
send_password = 'password' # warning - will just be an unencrypted string - so be careful not to leave this file lying around!
smtpserver = smtplib.SMTP("smtp.gmail.com",587)
if success_TF:
subject = 'Arcpy testing results: script SUCCESS.'
else:
subject = 'Arcpy testing results: script FAILURE.'
smtpserver.ehlo()
smtpserver.starttls()
smtpserver.ehlo
smtpserver.login(send_username, send_password)
header = 'To:' + to + '\n' + 'From: ' + send_username + '\n' + 'Subject:' + subject + '\n'
msg = header + '\nArcpy results: \n\t' + msgContents + '\n'
smtpserver.sendmail(send_username, to, msg)
arcpy.AddMessage('Results email sent!')
smtpserver.close()
### Here is the geoprocessing code; add (+=) to strResult every time you need to add something to the message (i.e. completion times)
### In a complex script this could be a lot of information, so seperate with '\n' to force new lines in the text...
inFC = arcpy.GetParameterAsText(0)
outLoc = arcpy.GetParameterAsText(1)
outFC = arcpy.GetParameterAsText(2)
outTxt = arcpy.GetParameterAsText(3)
success = False # assume failure (changed to True if script is successful)
strResult = '' # start empty string
try:
arcpy.FeatureClassToFeatureClass_conversion(inFC, outLoc, outFC)
strResult += "Converted successfully"
success = True
except arcpy.ExecuteError:
strResult += arcpy.GetMessages()
arcpy.AddMessage(strResult)
# Now write results to the text file - if it exists it will be overwritten
txtFile = open(outTxt, "w")
txtFile.write(strResult)
txtFile.close()
# Now send the email. This line calls the module above, it requires the string and the T/F success variable...
sendResultEmail(strResult, success)
Tutorial: Arcpy basics
The following tutorial develops some basics for using the Arcpy library with Python.
1. Every script using Arc functions must start with import arcpy
'''script.py ''' import arcpy
2. Get user inputs with ‘GetParameterAsText’
'''script.py ''' import arcpy # get input feature class inputFeatureClass = arcpy.GetParameterAsText(0)
This gets the first user entered parameter as text. If you haven’t specified any parameters when adding to Arc (see step 3) it will cause the script to fail.
3. Using the tool in Arc
Right click on a toolbox you can edit (if you haven’t already, you will have to create one in “My Toolboxes” or within a folder), go down to ‘Add’ and select ‘Script…’, give it a name and label, select the script file, then add the parameters. For this example we are getting a Feature Class, so call it InputFeatureClass (or something meaningful) and set the Data Type to Feature Class. Click Finish. Double click on your new script tool to use it. At the moment it does nothing, but if you give it a feature class and click OK, it should run happily and say it was successful.
From now on we can just edit the script where it is and the changes will carry through, you won’t need to import it again.
It is important to note that paths to feature classes are just strings, e.g. ‘C:\\GISData\\NYC\\2007\\Networks.gdb\\BusRoutes’.
4. Adding messages to the display with ‘AddMessage’
'''script.py
'''
import arcpy
# get input feature class
inputFeatureClass = arcpy.GetParameterAsText(0)
# inform user of selected feature class path
arcpy.AddMessage(' Input Feature Class: '+inputFeatureClass)
Save the script and run it again, and it will now print the feature class path before finishing happily.
5. Performing a geoprocessing operation
Let’s do a buffer on the input features, and also get the user to input the buffer size for the operation; you will need to right click on the tool in Arc, go down to Properties, and in the parameters tab add the parameter BufferSize_m as a Long (integer).
Buffer also requires an output feature class, so you will have to add that to the parameters as well; its data type should be feature class, but you will need to set the Direction (under Parameter Properties) to Output.
The script now becomes (with the message made a little more meaningful):
'''script.py
'''
import arcpy
# get input feature class
inputFeatureClass = arcpy.GetParameterAsText(0)
# get buffer size
buffer = arcpy.GetParameterAsText(1)
# get output feature class
outputFeatureClass = arcpy.GetParameterAsText(2)
# inform user of selected feature class path
arcpy.AddMessage(' Buffering input Feature Class, '+inputFeatureClass+' by '+buffer+' Meters and writing to: '+outputFeatureClass)
buffer_corrected = buffer + ' Meters'
arcpy.Buffer_analysis(inputFeatureClass, outputFeatureClass, buffer_corrected)
This should help you get started… Other tips:
- do some Pyhton tutorials, and
- constantly refer to the documentation (either in Arc, or on the ESRI website)
Using Arcpy with Multiprocessing – Part 3
Posted by StacyR in ArcGIS, Arcpy, Multiprocessing, Python on July 11, 2011
The following post builds upon the script and methods developed in Part 1 and Part 2. The Arcpy script refers to two GDBs which I have zipped into an archive; it can be downloaded from the right hand side of this page.
Now that the most calculation intensive part of the process has been made into a module, we can actually implement the Multiprocessing or Parallel Python libraries. Both use a Python list to which jobs are appended, and from which the result is then extracted. Each job is processed in turn by the first available worker (CPU/core). So, if there are six jobs in the list on a system with two CUPs, the first two jobs will start near simultaneously, then as each CPU finishes its task and becomes free it will start working on the next job in the list. Both methods can use all available CPUs on the system, or they can be limited to a predefined number (see below), or they may be limited by the method itself – in this example, as there are only 6 Polygon types, only 6 CPUs would be used.
'''
Step 4 of 4: Example of using Multiprocessing/Parallel Python with Arcpy...
Can be run either:
1. from the command line/a Python IDE (adjust paths to feature classes, as necessary)
2. as a Script tool within ArcGIS (ensure 'Run Ptyhon script in Process' is NOT checked when importing)
The Parallel Python library must be installed before it can be used.
'''
import arcpy
import multiprocessing
import time
try:
import pp
forceMP = False
except ImportError:
forceMP = True
def performCalculation(points_fC, polygons_fC, searchDist, typeList, calcPlatform_input=None):
'''performCalculation(pointsFeatureClass, polygonsFeatureClass, searchDistance, typeList, calcPlatform)
All inputs are specific.
calcPlatform is either 'mp', to use the inbuilt Multiprocessing library, or 'pp', to use Parallel Python (if it exists, otherwise defaults to mp)
'''
# ---------------------------------------------------------------------------
## Pre-calculation checks
# ---------------------------------------------------------------------------
# Check calculation platform is valid, or resort to default...
defaultCalcTpye = 'mp'
calcTypeExplainDict = {'pp':'Parallel Python', 'mp':'Multiprocessing'}
if forceMP: # unable to import Parallel Python library (it has to be installed seperately)
arcpy.AddMessage(" WARNING: Cannot load Parallel Python library, forcing Multiprocessing library...")
calcPlatform = 'mp'
elif (calcPlatform_input not in ['mp', 'pp']) or (calcPlatform_input == None): # Invalid/no input, resort to default
arcpy.AddMessage(" WARNING: Input calculation platform '%s' invalid; should be either 'mp' or 'pp'. Defaulting to %s..." % (calcPlatform_input, calcTypeExplainDict[defaultCalcTpye]))
calcPlatform = defaultCalcTpye
else:
calcPlatform = calcPlatform_input
# ---------------------------------------------------------------------------
## Data extraction (parallel execution)
# ---------------------------------------------------------------------------
searchDist = int(searchDist) # convert search distance to integer...
# check all datasets are OK; if not, provide some useful output and terminate
valid_points = arcpy.Exists(points_fC)
arcpy.AddMessage("Points Feature Class: "+points_fC)
valid_polygons = arcpy.Exists(polygons_fC)
arcpy.AddMessage("Polygons Feature Class: "+polygons_fC)
dataCheck = valid_points & valid_polygons
if not dataCheck:
if not valid_points:
arcpy.AddError("Points database or feature class, %s, is invalid..." % (points_fC))
if not valid_polygons:
arcpy.AddError("Polygons database or feature class, %s, is invalid..." % (polygons_fC))
else: # Inputs are OK, start calculation...
for type in typeList: # add fields to Points file
arcpy.AddField_management(points_fC, "polygon_type%s_Sum" % type, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Sum" % type, 0, "PYTHON")
arcpy.AddField_management(points_fC, "polygon_type%s_Count" % type, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Count" % type, 0, "PYTHON")
arcpy.AddMessage(" Added polygon_type%s_Sum and polygon_type%s_Count fields to Points." % (type,type))
pointsDataDict = {} # dictionary: pointsDataDict[pointID][Type]=[sum of Polygon type weightings, count of Ploygons of type]
jobs = [] # jobs are added to the list, then processed in parallel by the available workers (CPUs)
if calcPlatform == 'mp':
arcpy.AddMessage(" Utilising Python Multiprocessing library:")
pool = multiprocessing.Pool() # initate the processing pool - use all available resources
# pool = multiprocessing.Pool(2) # Example: limit the processing pool to 2 CPUs...
for type in typeList: # For this specific example
arcpy.AddMessage(" Passing %s to processing pool..." % type)
jobs.append(pool.apply_async(findPolygons, (points_fC, polygons_fC, type, searchDist))) # send jobs to be processed
for job in jobs: # collect results from the job server (waits until all the processing is complete)
if len(pointsDataDict.keys()) == 0: # first output becomes the new dictionary
pointsDataDict.update(job.get())
else:
for key in job.get(): # later outputs should be added per key...
pointsDataDict[key].update(job.get()[key])
del jobs
elif calcPlatform == 'pp':
ppservers=()
job_server = pp.Server(ppservers=ppservers) # initate the job server - use all available resources
# job_server = pp.Server(2, ppservers=ppservers) # Example: limit the processing pool to 2 CPUs...
arcpy.AddMessage(" Utilising Parallel Python library:")
for type in typeList: # For this specific example, it would only initate three processes anyway...
arcpy.AddMessage(" Passing %s to processing pool..." % type)
jobs.append(job_server.submit(findPolygons, (points_fC, polygons_fC, type, searchDist), (), ("arcpy",))) # send jobs to be processed
for job in jobs: # collect results from the job server (waits until all the processing is complete)
if len(pointsDataDict.keys()) == 0: # first output becomes the new dictioanry
pointsDataDict.update(job())
else:
for key in job():
pointsDataDict[key].update(job()[key]) # later outputs should be added per key...
del jobs
# ---------------------------------------------------------------------------
## Writing data back to points file
# ---------------------------------------------------------------------------
arcpy.AddMessage(" Values extracted; writing results to Points...")
pointsRowList = arcpy.UpdateCursor(points_fC)
for pointsRow in pointsRowList: # write the values for every point
pointID = pointsRow.getValue("PointID")
for type in typeList:
pointsRow.setValue("polygon_type%s_Sum" % type, pointsRow.getValue("polygon_type%s_Sum" % type) + pointsDataDict[pointID][type][0])
pointsRow.setValue("polygon_type%s_Count" % type, pointsRow.getValue("polygon_type%s_Count" % type) + pointsDataDict[pointID][type][1])
pointsRowList.updateRow(pointsRow)
del pointsRow
del pointsRowList
# just make sure any locks are cleared...
del calcPlatform, calcPlatform_input, calcTypeExplainDict, dataCheck, defaultCalcTpye, job, key, pointID, pointsDataDict, points_fC, polygons_fC, type, valid_points, valid_polygons, searchDist, typeList
def findPolygons(points_FC, polygons_FC, Type, search_dist):
funcTempDict = {}
arcpy.MakeFeatureLayer_management(polygons_FC, '%s_%s' % (polygons_FC, Type))
arcpy.MakeFeatureLayer_management(points_FC, '%s_%s' % (points_FC, Type))
pointsRowList = arcpy.SearchCursor('%s_%s' % (points_FC, Type))
for pointRow in pointsRowList: # for every origin
pointID = pointRow.getValue("PointID")
try:
funcTempDict[pointID][Type] = [0,0]
except KeyError: # first time within row
funcTempDict[pointID] = {}
funcTempDict[pointID][Type] = [0,0]
arcpy.SelectLayerByAttribute_management('%s_%s' % (points_FC, Type), 'NEW_SELECTION', '"PointID" = \'%s\'' % pointID)
arcpy.SelectLayerByLocation_management('%s_%s' % (polygons_FC, Type), 'INTERSECT', '%s_%s' % (points_FC, Type), search_dist, 'NEW_SELECTION')
arcpy.SelectLayerByAttribute_management('%s_%s' % (polygons_FC, Type), 'SUBSET_SELECTION', '"polyType" = %s' % Type)
polygonsRowList = arcpy.SearchCursor('%s_%s' % (polygons_FC, Type))
for polygonsRow in polygonsRowList:
funcTempDict[pointID][Type][0] += polygonsRow.getValue("polyWeighting")
funcTempDict[pointID][Type][1] += 1
return funcTempDict
if __name__ == '__main__':
# Read the parameter values:
# 1: points feature class (string to location, e.g. c:\GIS\Data\points.gdb\points01)
# 2: polygons feature class (string to location)
# 3: search distance for polygons, integer, e.g 500
# 4: calculation type ('mp' to use Multiprocessing library, 'pp' to use Parallel Python library (if available, otherwise defaults to mp))
pointsFC = arcpy.GetParameterAsText(0) # required
polygonsFC = arcpy.GetParameterAsText(1) # required
search_Distance = arcpy.GetParameterAsText(2) # required
calcType = arcpy.GetParameterAsText(3) # optional (will default to MP)
t_start = time.clock()
type_list = [1,2,3,4,5,6] # polygon types to search for...
# run calculation
if '' in [pointsFC, polygonsFC, search_Distance]:# if not running from Arc, the input parameters all come out as ''
pointsFC = "c:\\Work\\GIS\\Data\\Points.gdb\\points01"
polygonsFC = "c:\\Work\\GIS\\Data\\Polygons.gdb\\polygons01"
search_Distance = 1000
performCalculation(pointsFC, polygonsFC, search_Distance, type_list)
else:
performCalculation(pointsFC, polygonsFC, search_Distance, type_list, calcType)
arcpy.AddMessage(" ... complete in %s seconds." % (time.clock() - t_start))
When importing, ‘Run Python script in Process’ must be unchecked. If you forget this, the script will try to open new instances of ArcMap/ArcCatalog and fail.
Execution times:
On my computer, the scripts posted in Part 2 all take about 300 seconds to execute if run from the command line, and about 400 seconds if run as a tool within ArcMap or ArcCatalog. The parallelised script above takes about 170 seconds to execute if run from the command line, and 340 seconds if run as a tool within ArcMap or ArcCatalog. It seems that the first set of processes through the queue (i.e. two if two CPUs are being used, four if four CPUs are being used) are delayed for some reason, but latter processes run a lot faster.
Feel free to ask questions or point out improvements with regard to any of these posts. The following is a list of tips/issues I have noticed with regards to arcpy scripting, and Multiprocessing in particular.
Notes on using Multiprocessing:
- Creating and debugging complex Arcpy scripts is a real pain in the ass, and is even worse when you are using Multiprocesssing. I wouldn’t recommend doing it unless it is really necessary and you will be able to use the code extensively in the future. Errors are a little harder to debug, as error message output isn’t produced until the job result is being accessed.
- Multiprocessing vs. Parallel Python:There doesn’t seem to be much speed difference between the two (because they are both limited by the speed of the Arc functions). Parallel Python, however:
- requires an extra package to be installed (Multiprocessing is included with Python)
- will fail if you use arcpy.addMessage() or arcpy.addError() within the parallelised function, and
- has issues if you are using tools that must be ‘Checked Out’ before execution (such as Network Analyst).
- arcpy.Exists() is essential– it should be used to check every input dataset. Not just because it will catch input errors, but without it, Arc will regularly fail to do basic operations for no apparent reason. It may be that checking for existence forces a file refresh…?
- Try…Except- I have found that using these statements, as ESRI seem to like doing in the Arcpy help/tutorials, will often cause the script to hang indefinitely when running from Arc. The best thing to do is test rigorously in Python (passing the paths as strings), where you get useful outputs, then try the script out in Arc. If it ran fine directly, but has problems at this stage, make sure to check that the paths coming out of Arc are what you were actually expecting.
- Cancelling the script running in Arc does not kill all the Python processes – unlike hitting Ctrl-C from the command line. Use task manager to end them.
Installing GDAL (and OGR) for Python on Windows
Overview:
1. Get the GDAL core files
2. Get the Python bindings
3. Install GDAL
4. Edit Environment Variables
5. Install Python bindings
After years of having different GDAL versions on my Windows and Linux installations, I have finally found out how to install recent versions of GDAL (and OGR) on Windows; and it’s surprisingly easy… The latest version offers some significant speed increases for some operations over version 1.6. Note that you will need administrator rights on your computer.
1. Get the GDAL core files
This scary looking site: http://www.gisinternals.com/sdk/ is where you download GDAL. Scroll down to the bottom of the top-most table and click on the relevant link in the ‘Downloads’ column – either 32 bit (second to last row) or 64 bit (last row).
From the new page download the GDAL Core file: gdal-[version]-[build]-core.msi (e.g. gdal-18-1600-core.msi)
2. Get the Python Bindings
Fortunately the Python bindings are located right here as well, called GDAL-[gdalVersion.system]-py[pythonVersion].[exe/msi] (e.g. GDAL-1.8.0.win32-py2.6.exe), just make sure to download the version matching your installed version of Python.
3. Install GDAL
Simply run the GDAL Core installer downloaded in step 1, and note the install path if you change it.
4. Edit Environmental Variables
WARNING: Don’t make any mistakes here (like deleting things), or your system might not work…
To bring up the Environment Variables dialogue in Windows XP:
- Right click on My Computer, select Properties, go to the Advanced tab, then click Environment Variables
On Windows 7 (and, I presume, Vista):
- Right click on Computer, click on Advanced system settings on the left, then click Environment Variables
In the ‘System variables’ pane, scroll the box down and select the ‘Path’ variable, then go to Edit. Place the cursor in the box, and hit End to get the cursor at the end, add a semi-colon (;) and then enter the path to your GDAL installation – so, for my installation, I added this:
;C:\Program Files (x86)\GDAL
Just be careful not to insert any additional spaces before or after the path (see comments)! Click Ok, then click New and enter the following:
Variable name: GDAL_DATA
Variable value is the path to the GDAL data directory – so for me:
C:\Program Files (x86)\GDAL\gdal-data
To test the installation, open up your command prompt (type cmd in the Run dialogue), type ogr2ogr and hit enter. This is one of the programs included with GDAL; if your Path is set correctly it will be accessible from anywhere. If some stuff about usage and options is printed out, GDAL is installed correctly and the Environment variables worked. If it instead says: ‘ogr2ogr’ is not recognized as an internal or external command, operable program or batch file., there is possibly a problem. First off try restarting the computer, if ogr2ogr still doesn’t work check your Environment variables for spelling mistakes or typos.
5. Install Python bindings
Run the Python bindings installer downloaded in step 2.
To test the entire setup, open a Python prompt and enter:
from osgeo import ogr
Hit enter, then:
from osgeo import gdal
If both of these run without printing out anything, you are good to go!
Using Arcpy with Multiprocessing – Part 2
Posted by StacyR in ArcGIS, Arcpy, Multiprocessing, Python on July 6, 2011
The following Arcpy scripts refer to two GDBs which I have zipped into an archive; it can be downloaded from the right hand side of the page. If using these scripts from the command line or a Python IDE remember to change the paths to the Feature Classes!
Arcpy script of the pseudo-code from Part 1:
'''
Step 1 of 4: Example of using Multiprocessing/Parallel Python with Arcpy...
Can be run either:
1. from the command line/a Python IDE (adjust paths to feature classes, as necessary)
2. as a Script tool within ArcGIS (ensure 'Run Ptyhon script in Process' is checked when importing)
'''
import arcpy
import time
# Read the parameter values:
# 1: points feature class (string to location, e.g. c:\GIS\Data\points.gdb\points01)
# 2: polygons feature class (string to location)
# 3: search distance for polygons, integer, e.g 500
points_fC = arcpy.GetParameterAsText(0) # required
polygons_fC = arcpy.GetParameterAsText(1) # required
searchDist = arcpy.GetParameterAsText(2) # required
t_start = time.clock()
typeList = [1,2,3,4,5,6] # polygon types to search for...
# ---------------------------------------------------------------------------
## Data extraction
# ---------------------------------------------------------------------------
# run calculation
# if not running from Arc, the input parameters all come out as ''
if '' in [points_fC, polygons_fC, searchDist]:
arcpy.AddMessage("Resorting to defaults...")
points_fC = "c:\\Work\\GIS\\Data\\Points.gdb\\points01"
polygons_fC = "c:\\Work\\GIS\\Data\\Polygons.gdb\\polygons01"
searchDist = 1000
searchDist = int(searchDist) # convert search distance to integer...
# check all datasets are OK; if not, provide some useful output and terminate
valid_points = arcpy.Exists(points_fC)
arcpy.AddMessage("Points Feature Class: "+points_fC)
valid_polygons = arcpy.Exists(polygons_fC)
arcpy.AddMessage("Polygons Feature Class: "+polygons_fC)
dataCheck = valid_points & valid_polygons
arcpy.AddMessage("Search Distance: "+str(searchDist)+' '+str(type(searchDist)))
if not dataCheck:
if not valid_points:
arcpy.AddError("Points database or feature class, %s, is invalid..." % (points_fC))
if not valid_polygons:
arcpy.AddError("Polygons database or feature class, %s, is invalid..." % (polygons_fC))
else: # Inputs are OK, start calculation...
for polyType in typeList: # add fields to Points file
arcpy.AddField_management(points_fC, "polygon_type%s_Sum" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Sum" % polyType, 0, "PYTHON")
arcpy.AddField_management(points_fC, "polygon_type%s_Count" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Count" % polyType, 0, "PYTHON")
arcpy.AddMessage(" Added polygon_type%s_Sum and polygon_type%s_Count fields to Points." % (polyType,polyType))
pointsDataDict = {} # dictionary: pointsDataDict[pointID][Type]=[sum of Polygon type weightings, count of Ploygons of type]
for polyType in typeList: # For this specific example
arcpy.AddMessage(" Calculating for type %s..." % polyType)
arcpy.MakeFeatureLayer_management(polygons_fC, '%s_%s' % (polygons_fC, polyType))
arcpy.MakeFeatureLayer_management(points_fC, '%s_%s' % (points_fC, polyType))
pointsRowList = arcpy.SearchCursor('%s_%s' % (points_fC, polyType))
for pointRow in pointsRowList: # for every origin
pointID = pointRow.getValue("PointID")
try:
pointsDataDict[pointID][polyType] = [0,0]
except KeyError: # first time within row
pointsDataDict[pointID] = {}
pointsDataDict[pointID][polyType] = [0,0]
# need to iterate Point here...
arcpy.SelectLayerByAttribute_management('%s_%s' % (points_fC, polyType), 'NEW_SELECTION', '"PointID" = \'%s\'' % pointID)
arcpy.SelectLayerByLocation_management('%s_%s' % (polygons_fC, polyType), 'INTERSECT', '%s_%s' % (points_fC, polyType), searchDist, 'NEW_SELECTION')
arcpy.SelectLayerByAttribute_management('%s_%s' % (polygons_fC, polyType), 'SUBSET_SELECTION', '"polyType" = %s' % polyType)
polygonsRowList = arcpy.SearchCursor('%s_%s' % (polygons_fC, polyType))
for polygonsRow in polygonsRowList:
pointsDataDict[pointID][polyType][0] += polygonsRow.getValue("polyWeighting")
pointsDataDict[pointID][polyType][1] += 1
# ---------------------------------------------------------------------------
## Writing data back to points file
# ---------------------------------------------------------------------------
arcpy.AddMessage(" Values extracted; writing results to Points...")
pointsRowList = arcpy.UpdateCursor(points_fC)
for pointsRow in pointsRowList: # write the values for every point
pointID = pointsRow.getValue("PointID")
for polyType in typeList:
pointsRow.setValue("polygon_type%s_Sum" % polyType, pointsRow.getValue("polygon_type%s_Sum" % polyType) + pointsDataDict[pointID][polyType][0])
pointsRow.setValue("polygon_type%s_Count" % polyType, pointsRow.getValue("polygon_type%s_Count" % polyType) + pointsDataDict[pointID][polyType][1])
pointsRowList.updateRow(pointsRow)
del pointsRow
del pointsRowList
# just make sure any locks are cleared...
del dataCheck, pointID, pointsDataDict, points_fC, polygons_fC, polyType, valid_points, valid_polygons, searchDist, typeList
arcpy.AddMessage(" ... complete in %s seconds." % (time.clock() - t_start))
This should look familiar to anyone who has used Arcpy before. I have found that using the arcpy.Exists() function on every input dataset seems to prevent random unexplained failures when running the script within Arc (the kind where running the script repeatedly with identical inputs will mostly work fine, but fail occasionally for no apparent reason).
To be suitable for Multiprocessing, the script must be able to be made modular – that is where the calculation is entirely run within modules/functions. If you don’t know what this is, look around on the internet for some more advanced Python tutorials – here is one that has a bit of information on functions: http://www.tutorialspoint.com/python/python_functions.htm.
Being modularised, the script:
# someScript.py
n = 10
x = 0
for i in xrange(n):
x += i
print x
Would become:
# someScript.py
def sumItem(N):
‘’’sumItem(N)
Python module that returns the sum of numbers, up to the specified input integer.
‘’’
sum = 0
for i in xrange(N):
sum += i
return sum
n = 10
print sumItem(n)
Or, alternatively:
# someScript.py
def sumItem(N):
‘’’sumItem(N)
Python module that returns the sum of numbers, up to the specified input integer.
‘’’
sum = 0
for i in xrange(N):
sum += i
return sum
if __name__ == '__main__':
n = 10
print sumItem(n)
The advantage of the latter method, aside from the ability to run modules in parallel, is that they can be imported by other scripts (if they are in the same folder, or the system path), i.e.:
import someScript n = 10 print someScript.sumItem(n)
The contents of the if __name__ == ’__main__’: statement are only executed if the script is run as a script (i.e. it is the ‘main’ process) – if it is imported they won’t be. If an import is in turn required by the imported function, it should be placed at the top of the script out of the if __name__ == ’__main__’: statement… The Arcpy script from earlier can be modularised, giving:
'''
Step 2 of 4: Example of using Multiprocessing/Parallel Python with Arcpy...
Can be run either:
1. from the command line/a Python IDE (adjust paths to feature classes, as necessary)
2. as a Script tool within ArcGIS (ensure 'Run Ptyhon script in Process' is checked when importing)
'''
import arcpy
import time
def performCalculation(points_fC, polygons_fC, searchDist, typeList):
'''performCalculation(pointsFeatureClass, polygonsFeatureClass, searchDistance, typeList, calcPlatform)
Runs calculation
'''
# ---------------------------------------------------------------------------
## Data extraction
# ---------------------------------------------------------------------------
searchDist = int(searchDist) # convert search distance to integer...
# check all datasets are OK; if not, provide some useful output and terminate
valid_points = arcpy.Exists(points_fC)
arcpy.AddMessage("Points Feature Class: "+points_fC)
valid_polygons = arcpy.Exists(polygons_fC)
arcpy.AddMessage("Polygons Feature Class: "+polygons_fC)
dataCheck = valid_points & valid_polygons
arcpy.AddMessage("Search Distance: "+str(searchDist)+' '+str(type(searchDist)))
if not dataCheck:
if not valid_points:
arcpy.AddError("Points database or feature class, %s, is invalid..." % (points_fC))
if not valid_polygons:
arcpy.AddError("Polygons database or feature class, %s, is invalid..." % (polygons_fC))
else: # Inputs are OK, start calculation...
for polyType in typeList: # add fields to Points file
arcpy.AddField_management(points_fC, "polygon_type%s_Sum" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Sum" % polyType, 0, "PYTHON")
arcpy.AddField_management(points_fC, "polygon_type%s_Count" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Count" % polyType, 0, "PYTHON")
arcpy.AddMessage(" Added polygon_type%s_Sum and polygon_type%s_Count fields to Points." % (polyType,polyType))
pointsDataDict = {} # dictionary: pointsDataDict[pointID][Type]=[sum of Polygon type weightings, count of Ploygons of type]
for polyType in typeList: # For this specific example
arcpy.AddMessage(" Calculating for type %s..." % polyType)
######################################################################
## START: Calculation intensive part of the script
######################################################################
arcpy.MakeFeatureLayer_management(polygons_fC, '%s_%s' % (polygons_fC, polyType))
arcpy.MakeFeatureLayer_management(points_fC, '%s_%s' % (points_fC, polyType))
pointsRowList = arcpy.SearchCursor('%s_%s' % (points_fC, polyType))
for pointRow in pointsRowList: # for every origin
pointID = pointRow.getValue("PointID")
try:
pointsDataDict[pointID][polyType] = [0,0]
except KeyError: # first time within row
pointsDataDict[pointID] = {}
pointsDataDict[pointID][polyType] = [0,0]
# need to iterate Point here...
arcpy.SelectLayerByAttribute_management('%s_%s' % (points_fC, polyType), 'NEW_SELECTION', '"PointID" = \'%s\'' % pointID)
arcpy.SelectLayerByLocation_management('%s_%s' % (polygons_fC, polyType), 'INTERSECT', '%s_%s' % (points_fC, polyType), searchDist, 'NEW_SELECTION')
arcpy.SelectLayerByAttribute_management('%s_%s' % (polygons_fC, polyType), 'SUBSET_SELECTION', '"polyType" = %s' % polyType)
polygonsRowList = arcpy.SearchCursor('%s_%s' % (polygons_fC, polyType))
for polygonsRow in polygonsRowList:
pointsDataDict[pointID][polyType][0] += polygonsRow.getValue("polyWeighting")
pointsDataDict[pointID][polyType][1] += 1
######################################################################
## END: Calculation intensive part of the script
######################################################################
# ---------------------------------------------------------------------------
## Writing data back to points file
# ---------------------------------------------------------------------------
arcpy.AddMessage(" Values extracted; writing results to Points...")
pointsRowList = arcpy.UpdateCursor(points_fC)
for pointsRow in pointsRowList: # write the values for every point
pointID = pointsRow.getValue("PointID")
for polyType in typeList:
pointsRow.setValue("polygon_type%s_Sum" % polyType, pointsRow.getValue("polygon_type%s_Sum" % polyType) + pointsDataDict[pointID][polyType][0])
pointsRow.setValue("polygon_type%s_Count" % polyType, pointsRow.getValue("polygon_type%s_Count" % polyType) + pointsDataDict[pointID][polyType][1])
pointsRowList.updateRow(pointsRow)
del pointsRow
del pointsRowList
# just make sure any locks are cleared...
del dataCheck, pointID, pointsDataDict, points_fC, polygons_fC, polyType, valid_points, valid_polygons, searchDist, typeList
if __name__ == '__main__':
# Read the parameter values:
# 1: points feature class (string to location, e.g. c:\GIS\Data\points.gdb\points01)
# 2: polygons feature class (string to location)
# 3: search distance for polygons, integer, e.g 500
pointsFC = arcpy.GetParameterAsText(0) # required
polygonsFC = arcpy.GetParameterAsText(1) # required
search_Distance = arcpy.GetParameterAsText(2) # required
t_start = time.clock()
type_list = [1,2,3,4,5,6] # polygon types to search for...
# run calculation
# if not running from Arc, the input parameters all come out as ''
if '' in [pointsFC, polygonsFC, search_Distance]:
arcpy.AddMessage("Resorting to defaults...")
pointsFC = "c:\\Work\\GIS\\Data\\Points.gdb\\points01"
polygonsFC = "c:\\Work\\GIS\\Data\\Polygons.gdb\\polygons01"
search_Distance = 1000
performCalculation(pointsFC, polygonsFC, search_Distance, type_list)
else:
performCalculation(pointsFC, polygonsFC, search_Distance, type_list)
arcpy.AddMessage(" ... complete in %s seconds." % (time.clock() - t_start))
This in itself doesn’t add much to the process, but the body of the calculation is kept separate from the part of the code that loads the variables from Arc, which means it could be imported by another script – with the variables passed in as the main function, performCalculation(), is called. The most calculation intensive part of the process can likewise be converted into a function, but it is important to be wary of the inputs and outputs of the process. Having multiple processes accessing the same *.GDB at the same time will cause the script to fail. Because of this, models that work in Model Builder or Arcpy cannot necessarily be parallelised; or a reworking of the method/outputs may be necessary…
After creating a parallelisable sub module, we have:
'''
Step 3 of 4: Example of using Multiprocessing/Parallel Python with Arcpy...
Can be run either:
1. from the command line/a Python IDE (adjust paths to feature classes, as necessary)
2. as a Script tool within ArcGIS (ensure 'Run Ptyhon script in Process' is checked when importing)
'''
import arcpy
import time
def performCalculation(points_fC, polygons_fC, searchDist, typeList):
'''performCalculation(pointsFeatureClass, polygonsFeatureClass, searchDistance, typeList, calcPlatform)
Runs calculation
'''
# ---------------------------------------------------------------------------
## Data extraction
# ---------------------------------------------------------------------------
searchDist = int(searchDist) # convert search distance to integer...
# check all datasets are OK; if not, provide some useful output and terminate
valid_points = arcpy.Exists(points_fC)
arcpy.AddMessage("Points Feature Class: "+points_fC)
valid_polygons = arcpy.Exists(polygons_fC)
arcpy.AddMessage("Polygons Feature Class: "+polygons_fC)
dataCheck = valid_points & valid_polygons
arcpy.AddMessage("Search Distance: "+str(searchDist)+' '+str(type(searchDist)))
if not dataCheck:
if not valid_points:
arcpy.AddError("Points database or feature class, %s, is invalid..." % (points_fC))
if not valid_polygons:
arcpy.AddError("Polygons database or feature class, %s, is invalid..." % (polygons_fC))
else: # Inputs are OK, start calculation...
for polyType in typeList: # add fields to Points file
arcpy.AddField_management(points_fC, "polygon_type%s_Sum" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Sum" % polyType, 0, "PYTHON")
arcpy.AddField_management(points_fC, "polygon_type%s_Count" % polyType, "DOUBLE")
arcpy.CalculateField_management(points_fC, "polygon_type%s_Count" % polyType, 0, "PYTHON")
arcpy.AddMessage(" Added polygon_type%s_Sum and polygon_type%s_Count fields to Points." % (polyType,polyType))
pointsDataDict = {} # dictionary: pointsDataDict[pointID][Type]=[sum of Polygon type weightings, count of Ploygons of type]
for polyType in typeList: # For this specific example
arcpy.AddMessage(" Calculating for type %s..." % polyType)
funcOutput = findPolygons(points_fC, polygons_fC, polyType, searchDist) # send jobs to be processed
if len(pointsDataDict.keys()) == 0: # first output becomes the new dictionary
pointsDataDict.update(funcOutput)
else:
for key in funcOutput: # later outputs should be added per key...
pointsDataDict[key].update(funcOutput[key])
del funcOutput
# ---------------------------------------------------------------------------
## Writing data back to points file
# ---------------------------------------------------------------------------
arcpy.AddMessage(" Values extracted; writing results to Points...")
pointsRowList = arcpy.UpdateCursor(points_fC)
for pointsRow in pointsRowList: # write the values for every point
pointID = pointsRow.getValue("PointID")
for polyType in typeList:
pointsRow.setValue("polygon_type%s_Sum" % polyType, pointsRow.getValue("polygon_type%s_Sum" % polyType) + pointsDataDict[pointID][polyType][0])
pointsRow.setValue("polygon_type%s_Count" % polyType, pointsRow.getValue("polygon_type%s_Count" % polyType) + pointsDataDict[pointID][polyType][1])
pointsRowList.updateRow(pointsRow)
del pointsRow
del pointsRowList
# just make sure any locks are cleared...
del dataCheck, key, pointID, pointsDataDict, points_fC, polygons_fC, polyType, valid_points, valid_polygons, searchDist, typeList
############################################################
## New Module - the most calculation intensive part
############################################################
def findPolygons(points_FC, polygons_FC, Type, search_dist):
funcTempDict = {}
arcpy.MakeFeatureLayer_management(polygons_FC, '%s_%s' % (polygons_FC, Type))
arcpy.MakeFeatureLayer_management(points_FC, '%s_%s' % (points_FC, Type))
pointsRowList = arcpy.SearchCursor('%s_%s' % (points_FC, Type))
for pointRow in pointsRowList: # for every origin
pointID = pointRow.getValue("PointID")
try:
funcTempDict[pointID][Type] = [0,0]
except KeyError: # first time within row
funcTempDict[pointID] = {}
funcTempDict[pointID][Type] = [0,0]
# need to iterate Point here...
arcpy.SelectLayerByAttribute_management('%s_%s' % (points_FC, Type), 'NEW_SELECTION', '"PointID" = \'%s\'' % pointID)
arcpy.SelectLayerByLocation_management('%s_%s' % (polygons_FC, Type), 'INTERSECT', '%s_%s' % (points_FC, Type), search_dist, 'NEW_SELECTION')
arcpy.SelectLayerByAttribute_management('%s_%s' % (polygons_FC, Type), 'SUBSET_SELECTION', '"polyType" = %s' % Type)
polygonsRowList = arcpy.SearchCursor('%s_%s' % (polygons_FC, Type))
for polygonsRow in polygonsRowList:
funcTempDict[pointID][Type][0] += polygonsRow.getValue("polyWeighting")
funcTempDict[pointID][Type][1] += 1
return funcTempDict
if __name__ == '__main__':
# Read the parameter values:
# 1: points feature class (string to location, e.g. c:\GIS\Data\points.gdb\points01)
# 2: polygons feature class (string to location)
# 3: search distance for polygons, integer, e.g 500
pointsFC = arcpy.GetParameterAsText(0) # required
polygonsFC = arcpy.GetParameterAsText(1) # required
search_Distance = arcpy.GetParameterAsText(2) # required
t_start = time.clock()
type_list = [1,2,3,4,5,6] # polygon types to search for...
# run calculation
# if not running from Arc, the input parameters all come out as ''
if '' in [pointsFC, polygonsFC, search_Distance]:
arcpy.AddMessage("Resorting to defaults...")
pointsFC = "c:\\Work\\GIS\\Data\\Points.gdb\\points01"
polygonsFC = "c:\\Work\\GIS\\Data\\Polygons.gdb\\polygons01"
search_Distance = 1000
performCalculation(pointsFC, polygonsFC, search_Distance, type_list)
else:
performCalculation(pointsFC, polygonsFC, search_Distance, type_list)
arcpy.AddMessage(" ... complete in %s seconds." % (time.clock() - t_start))
Here, the most calculation intensive part has been converted into a module, findPolygons(), which takes the Points and Polygons Feature Classes, the Type of polygon it is searching for and the search distance. The module returns the data in a Python dictionary, and it is written back to the origins all at once, avoiding the case of multiple processes attempting to write to the same workspace. The update() dictionary method just adds new keys to the existing dictionary. Dictionaries have a lot of other useful features that can make intermediate data processing much easier, but the actual process is very application-specific.
Using Arcpy with Multiprocessing – Part 1
Posted by StacyR in ArcGIS, Arcpy, Multiprocessing, Python on July 5, 2011
There are a few requirements before you can make an Arcpy process suitable for use with Python’s Multiprocessing library:
- The most calculation intensive (time consuming) part of the code must be able to be made into a Python ‘module’ and parallelised; which will be described in the following posts.
- Once it is made into a module, there must be no issues with data access – each invocation of the module should either write to a different output database (Arc locks the entrie *.gdb in use, not just the feature class being accessed) or pass data back for it to be written only at a later stage.
To explain using Multiprocessing with Python, I will set out a hypothetical example. The objective of the example is to identify the number and type, and accumulate a weight value, of all the Polygons within a certain distance of some Point features.
The Polygon feature class has ‘polyType’ and ‘polyWeight’ attributes. This is just a simplified example that I thought up for explaining Multiprocessing – sorry if there is a better method than what follows for actually doing this!
Method pseudo-code:
# get variables from Arc # check all inputs are valid # for Polygon types: # make feature layer of Polygons # make feature layer of Points # for rows in Points: # get PointID # select [the] Point row corresponding to PointID (only way I could find to make a row selection) # select by location: Polygons within the search distance # for Polygon rows (within the selection): # store the sums of weighting and count to a Python dictionary by PointID and polygonType # for rows in Points: # for Polygon types: # access data from dictionary by PointID and Type # write value to row
