Geoprocessing in Python
The Washington police department is looking to crack down on crime in their downtown precincts and tourist attracting landmarks. They have hired a GIS analyst (moi) to study crime rates in precincts and determine the most crime-ridden landmarks in order to focus police efforts where they're most needed. I used this study to explore geoprocessing in Python using the ArcPy module.
While I learn Python fundamentals, I try to use as many different modules as possible in my scripts to get familiar with their utility. As with all things GIS, there's never just one way to get an answer!
Step 1: Import modules and set workspaces:
This script is going to draw predominantly from ArcPy's geoprocessing module but also from the os and fnmatch modules for some specific file manipulation.
# Import all modules used for this script: import arcpy import os import fnmatch # Set Arcpy workspace environment: arcpy.env.workspace = (r"C:\GEOS456\Assign01")
Step 2: Create a file geodatabase:
I learned early on that asking Python to create a file geodatabase is a useful step in the process, but the code must be commented out. A simple alternative is to use the if exist function.
# Create and define a geodatabase: if arcpy.Exists(r"C:\GEOS456\Assign01\NicePlace.gdb"): pass else: arcpy.CreateFileGDB_management(r"C:\GEOS456\Assign01","NicePlace","CURRENT")
Step 3: Read original data and import shapefiles to the geodatabase:
The original data came in shapefile format and is to be imported to our new geodatabase for geoprocessing and analysis. At the time of writing this script, I had been learning functions within the fnmatch module, so I utilized it here. Another useful alternative is to use arcpy.ListFeatureClasses which just reads features and shapefiles without getting hung up on all the other associated .prj and .shx files.
# Define a list of files using os module: fclist = os.listdir(r"C:\GEOS456\Assign01") # Define our geodatabase's location: gdb = r"C:\GEOS456\Assign01\NicePlace.gdb" # Create a 'for' loop to read all files in the environment and perform # tasks on each: for files in fclist: if fnmatch.fnmatch(files,"*.shp"): arcpy.FeatureClassToGeodatabase_conversion(files, outworkspace)
The fnmatch.fnmatch function will make use of wildcards (*) to look for specific files in a folder, in this case .shp files.
Step 4: Project all feature classes:
Now that my geodatabase is set up, I'll change the workspace to reflect the new location. Another useful function is the arcpy.env.overwriteOutput, which, when set to TRUE, will overwrite the files within allowing the user to run the script as it is written without creating unnecessary duplicate features. I also defined the workspace as a variable for reference within the for loop.
# Set workspace outWorkspace = arcpy.env.workspace = (r"C:\GEOS456\Assign01\NicePlace.gdb") arcpy.env.overwriteOutput = True # Project all features in gdb with specified coordinate system: for infc in arcpy.ListFeatureClasses(): outfc = os.path.join(outWorkspace, infc+"_P") sr = arcpy.SpatialReference(26985) arcpy.Project_management(infc, outfc, sr)
I utilized another os function, this time to create a name for the output feature class. The path.join function will concatenate the path and the feature. Second, the sr variable is referring to the spatial reference defined by a coordinate system's WKID, a unique identifier found in Esri's documentation for referencing stock projections.
At this point, I can test my code out by visualizing the information on a map. No issues so far!
Step 5: Produce crime stats for each precinct:
Next I must summarize the three crime types for each precinct and produce three count tables in ArcMap. Again, there are many ways to go about this, but I chose to do a simple intersect and then use the frequency tool to calculate statistics.
# Define input variables fc = "Precincts_P" fc2 = "Arsons_P" fc3 = "burglaries_P" fc4 = "Assault_P" # Intersect each crime with Precincts and create stat tables: for infc in arcpy.ListFeatureClasses(): if infc == fc2 or infc == fc3 or infc == fc4: outfc = os.path.join(outWorkspace, infc+"_Int_d") stats = os.path.join(outWorkspace, infc+"_Stats") arcpy.Intersect_analysis((fc, infc), outfc) arcpy.Frequency_analysis(outfc, stats, "Precinct")
The outfc and stats variables must be defined within the for loop because they require iterated inputs. The
intersect tool is set up to look for all crimes within precincts, and no other parameters are required in this case. The frequency tool takes the resulting feaatures from the intersect tool and writes them to a stand alone table summarized on the field 'Precinct'. The resulting table for Assaults, for example is shown to the right.
Step 6: Determine the number of Assaults within 250m of any landmark:
The police department has noticed an increase in visitor reports of assaults during their sightseeing adventures. I'll select by location to determine the number of assaults being reported within 250 meters of any downtown landmark.
fc5 = "Landmarks" # Create a layer file from the Assaults feature class: arcpy.MakeFeatureLayer_management(fc4, "assaults_lyr_d") # Use select-by-location to calculate assaults close to landmarks: arcpy.SelectLayerByLocation_management("assaults_lyr_d", "WITHIN_A_DISTANCE", fc5, "250 meters","NEW_SELECTION", "NOT_INVERT") # Use GetCount to 'select' features and extract to new feature class: result = arcpy.GetCount_management("assaults_lyr_d") arcpy.CopyFeatures_management("assaults_lyr_d", "Q2a_Results")
This process is interesting in that it requires a number of functions and therefore is a process in itself. Some would say the easier way would be to use the buffer function, but then I wouldn't learn about selectlayer! I first needed to create a layer file of the input feature class, then create a 'new selection' with searching "within a distance" of "250 meters" from Landmarks. The second part of the snippet takes that selection and copies it to a new feature class. The result is 11 assaults:
Step 7: Which Landmark has the most assaults reported within 250m?
The last and final question the police department wants answered is which landmark is the most prone to assaults? With this information, they can consider modifying current patrols, support targeted outreach programs and restore landmark reputations. The use of undissolved buffers will solve this problem as we already know that 11 assaults have been reported nearby landmarks, but their vicinities overlap.
# Buffer Landmarks without dissolving boundaries buffer = arcpy.Buffer_analysis(fc5, "Assaults_buffer_d", "250 meters") # Intersect Assaults and summarize on the landmark's name field arcpy.Intersect_analysis((fc4, buffer),"Q2_int_d") Q2_Results = arcpy.Frequency_analysis("Q2_int_d", "Q2b_Results", "Landname") # To print the landmarks with the most assaults nearby: # Set a few new variables fc = "Q2b_Results" field = "Frequency" field2 = "Landname" # Define the SQL where clause to be used in search cursor expression = arcpy.AddFieldDelimiters(fc, field) + "= 3" # Use SearchCursor to read the stats table and return the two # Landmarks that have 3 assaults. with arcpy.da.SearchCursor(fc, (field2, field),where_clause=expression) as cursor: for row in cursor: print row + ", " + str(row)
At this point, you may be wondering how did I know to use the number 3 in my SQL statement? I have only begun to scratch the surface of Python integration, search cursors and their where clauses included. Even though this prints out the correct information, I would prefer that it didn't rely on my knowing what the highest number was in the table. I will be returning to this segment to revise it with a more precise way of extracting the highest values in a table array. For now, I can report back and let the police department know that Howard University and Gallaudet University need enhanced perimeter security.