Incubator 8.3

I spent a lot of time this weekend revisiting our new database design for KBMOD, and see last post. I had a good excuse to finally learn about the spatial database support in PostgreSQL — PostGIS. Through several iterations and redrafts, I sped some of our test queries up by a factor of about 28x (from 21s to 0.7s).

Here were the key insights:

Rather than think of each image as a 2-D box (ra and dec) with an associated timestamp, think of it as a 3-D box (ra, dec, time). This box is still fairly small because each image covers only a few seconds or minutes.
The image 3-D box lives in a single column in a single table, so there are no on-the-fly joins and there is less data read from disk (probably).
It turns out that PostGIS cannot do a hybrid index-lookup on space and time at the same time. So the 2-D box + time-range version of the query takes two steps — first, intersect each trajectory spatially with all images, then filter out the examples where the timestamps do not sync up.

By treating the image as a 3-D box (and the trajectory as a line in 3-D space), we are able to do both these phases at once.
To intersect a line with a box, it suffices to simply check that their bounding boxes overlap. PostGIS is faster if you ask for overlapping bounds rather than run the ST_Intersects code.

I think we’re now close enough with the database that we should put the end-to-end pipeline together:

pose a set of trajectories
use the database to find all the images these trajectories may overlap with
use the database to group these results by image, so that we have a set of (trajectory, image) pairs ordered by image.
read each intersected image off disk once, pulling out the relevant pixels for all trajectories as a group.
co-add the pixels and find new trans-Neptunian objects — aka, Pluto’s buddies!

Looking forward to our next meeting!

Dan's research blog

Incubator 8.3

Comments !

Comments !

blogroll

social