Major bug in ArcGIS Zonal statistics?


Update: The bug has been fixed in the ArcGIS 10.4 release

I am using ArcGIS 10.2.2 to compute zonal statistics for a number of zones. If a zone contains any NoData cell in the value raster, I want that zone's result to be "NoData", exactly as the tool description advertises. The description states:

DATA - Within any particular zone, only cells that have a value in the input Value raster will be used in determining the output value for that zone. NoData cells in the Value raster will be ignored in the statistic calculation.

NODATA - Within any particular zone, if any NoData cells exist in the Value raster, it is deemed that there is insufficient information to perform statistical calculations for all the cells in that zone; therefore, the entire zone will receive the NoData value on the output raster.
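The two documented options can be sketched with NumPy masked arrays. This is a minimal illustration of the behavior the description promises, not Esri's implementation; the function `zonal_mean`, the NoData value -9999 and the toy grid are all hypothetical.

```python
import numpy as np


def zonal_mean(zones, values, nodata, ignore_nodata=True):
    """Hypothetical sketch of the documented DATA / NODATA behavior.

    Returns {zone_id: mean}. A zone maps to None (NoData) when
    ignore_nodata is False and the zone contains any NoData cell,
    or when the zone has no valid cells at all.
    """
    masked = np.ma.masked_equal(values, nodata)  # NoData cells become masked
    is_nodata = np.ma.getmaskarray(masked)       # full boolean mask, never a scalar
    result = {}
    for z in np.unique(zones):
        sel = zones == z  # cells that belong to zone z
        if not ignore_nodata and is_nodata[sel].any():
            result[int(z)] = None  # NODATA option: the whole zone becomes NoData
        elif is_nodata[sel].all():
            result[int(z)] = None  # no valid cells left to average
        else:
            result[int(z)] = float(masked[sel].mean())  # DATA option: masked cells skipped
    return result


# Toy 2x3 grid: zone 1 contains one NoData cell (-9999), zone 2 does not
zones = np.array([[1, 1, 1], [2, 2, 2]])
values = np.array([[4, -9999, 8], [1, 2, 3]])

print(zonal_mean(zones, values, -9999, ignore_nodata=True))   # {1: 6.0, 2: 2.0}
print(zonal_mean(zones, values, -9999, ignore_nodata=False))  # {1: None, 2: 2.0}
```

With ignore_nodata=False, a single NoData cell is enough to turn zone 1 into NoData; with ignore_nodata=True, that cell is simply left out of the mean.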

Please have a look at my setup in this picture:

I am using the NODATA option with a value raster that has one NoData pixel, and therefore expect the resulting zone value (zone 61154) to be 'NoData'. Instead, I get a value of 12.74 (rounded to 13 in the image), which confuses me on two levels: First, I expected 'NoData', and second, the resulting value of 12.74 is mathematically impossible, because the mean cannot be larger than the maximum value in the value raster, which is 10 in this case.

If I use the DATA option, I get a value of about 9.1, which makes sense. We tested this on different datasets, computers, and ArcGIS versions.

What am I missing here?

Edit / additional comment: I just noticed that the 'Count' attribute is also wrong for that zone. There are indeed 421 cells in the zone, but the tool counted only 297. 421 minus 297 is 124, and, oddly enough, that is the position of the NoData pixel if you count the zone's pixels from upper left to lower right. The tool may be getting the cell count wrong (too low), which would explain the inflated mean.
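These numbers are consistent with one simple hypothesis: the cell counter restarts at the NoData pixel while the running sum keeps every valid cell, so only the 421 - 124 = 297 cells after position 124 are counted. The sketch below is a hypothetical toy model of that idea only; the function `buggy_zone_mean` and the toy values are invented for illustration and say nothing about how the tool is actually written.

```python
def buggy_zone_mean(cells, nodata):
    """Hypothetical model: the count restarts at a NoData cell, the sum does not.

    `cells` holds one zone's values in scan order (upper left to lower right).
    Returns (count, mean) as this assumed faulty logic would report them.
    """
    total = 0.0
    count = 0
    for v in cells:
        if v == nodata:
            count = 0  # assumed bug: counter reset, running sum kept
            continue
        total += v
        count += 1
    return count, (total / count if count else None)


# Toy zone: 421 cells, every valid cell = 9, NoData at scan position 124 (1-based)
cells = [9.0] * 421
cells[123] = -9999  # index 123 is position 124 when counting from 1
count, mean = buggy_zone_mean(cells, -9999)
print(count, mean)
```

The toy zone reports a count of 297 and a mean of about 12.73, above its maximum value of 9: the same kind of impossible result seen for zone 61154. Matching numbers alone do not prove this is the real cause.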

Edit: Here is a link to the data I am using.

Edit: Dan Patterson and I did some further debugging here at the ESRI forum.

There is a bug that seems to correspond to what you're experiencing. It is registered as BUG-000084883: "The 'Ignore NoData in calculations' option in Zonal Statistics as Table tool [and Zonal Statistics tool] is not honored when checked off, producing incorrect results."

It occurs with 10.3 and 10.2.2 but not with 10.1. Did you try the tool with 10.1?

It is a bug. Something is terribly wrong with the cell count.

The correct mean (9.0452380952381) times the correct number of non-empty cells (420), divided by 297 (the cell count reported by the tool), gives 12.7912457912458. That is the wrong average reported by the tool.
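The arithmetic above can be checked in a few lines of Python; all numbers are taken directly from this answer.

```python
correct_mean = 9.0452380952381  # mean of the valid cells in zone 61154 (DATA option)
valid_cells = 420               # 421 cells in the zone minus the one NoData cell
reported_count = 297            # cell count reported by the tool

total = correct_mean * valid_cells   # sum of the valid values, about 3799
wrong_mean = total / reported_count  # dividing by the too-low count inflates the mean
print(total, wrong_mean)
```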

Results of my own test on toy-sized grids:

Similar to another answer, move the raster data into NumPy masked arrays to calculate your statistics. Assuming two overlapping rasters with the same shape, this is simple:

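from __future__ import print_function  # print() with sep= also works on ArcGIS's Python 2.7
import arcpy
import numpy as np

# Zone and value rasters as NumPy arrays (both rasters must have the same shape)
zones = arcpy.RasterToNumPyArray("zones")
value = np.ma.masked_equal(arcpy.RasterToNumPyArray("value"),
                           arcpy.Raster("value").noDataValue)  # mask NoData cells
nodata = np.ma.getmaskarray(value)  # full boolean mask, even if no cell is NoData

print("Zone\tCount\tNoData\tMean")
for z in np.unique(zones):
    sel = (zones == z)  # cells belonging to this zone
    # Count = all cells in the zone, NoData = masked cells, Mean skips masked cells
    print(z, sel.sum(), nodata[sel].sum(), value[sel].mean(), sep="\t")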


Zone    Count  NoData  Mean
61131   53     0       8.92452830189
61154   421    1       9.04523809524
61207   1      0       8.0
61317   35     0       7.2
61644   644    0       7.90838509317
61677   12     0       7.41666666667
61789   7      0       9.0
61871   193    0       7.98445595855
187472  349    0       8.5787965616
