Faster Heatmaps

We can create a nice summary of the search results by displaying Solr generated heatmap data on the map.  Solr implements heatmaps like a facet.  To obtain an array of counts the Solr query must include heatmap.facet=true, the map extent,  and some other details.  Using a pyramid of grid levels that were set during document ingest,  Solr returns an array of document counts for an area larger slightly larger than the map.  These documents counts must be turned into heatmap colors.  The colors are assigned using the Jenks classification algorithm.  If the heatmap color scheme as n colors, we request n classifications from Jenks.  Once the classification has been computed, the heatmap layer can be drawn.  As the map zooms in and out, Solr selects the appropriate level in its pyramid of grid levels.  Which level is selected determines the number of cells in the heatmap, the default level can be overridden via query parameters.  You can see some Solr based heatmaps at http://WorldWideGeoWeb.com.

I investigated the performance of client side heatmap rendering code because a high resolution heatmap updated slowly.  My analysis used Google Chrome’s Developer Tools.  Using OpenLayers 2, rendering the heatmap layer takes under 250ms, often closer to half that time is needed.  It varies a little with the number of cells, but not much.  It turns out most of the time is spent computing the Jenks classifications.

For example, when the counts array is 81×49 and contains 743 non-zero values, computing the Jenks classifications takes 700ms.  When the count array is 161×97 with 2033 non-zero values, Jenks classifications takes roughly 9 seconds.  Once those classifications are computed, rendering the time to render the data in OpenLayers increases very slightly.

I do not know the computational complexity of the Jenks algorithm.  According to wikipedia, many other classification algorithms are NP-Hard, perhaps Jenks is as well.  (For NP-Hard problems, as the size of input grows the time to compute the answer grows very quickly.)  To reduce the time required for Jenks classifications I explored running the classification algorithm on only a subset of the counts data returned by Solr.  That is, we request a high resolution heamap from Solr but compute the Jenks classification using only a subset of the heatmap data.  I believe these explorations were successful.  The first images below shows Jenks classification based on all the data followed by an image with classification based on the half of the data (those with an even index).

HeatmapJenksAllPoints

HeatmapJenksEvenPoints

To show a significant non-trivial difference, here’s an image where the classification algorithm only used every tenth point:

HeatmapJenksTenthPoints

Generating the heatmap classification based on just a sampling of Solr’s returned document counts seems to work well.  It has a small impact on the heatmap’s appearance but greatly improves performance.

While using a sample has a small impact on the image, it does affect the boundaries the Jenks algorithm computes.  The following table show the boundaries computed using all the data, the data at even indexes and the data data at odd indexes:

all data: [0, 33, 138, 324, 516, 758, 1000, 1266, 1801]

even indexes: [0, 37, 148, 324, 509, 858, 1165, 1509, 1801]

odd indexed: [0, 22, 97, 310, 516, 686, 928, 1266, 1801]

If you’re interested in reading more about Jenks, check out Tom MacWright’s interesting post.