Incubator 8.3

I spent a lot of time this weekend revisiting our new database design for KBMOD, and see last post. I had a good excuse to finally learn about the spatial database support in PostgreSQL — PostGIS. Through several iterations and redrafts, I sped some of our test queries up by a factor of about 28x (from 21s to 0.7s).

Here were the key insights:

  • Rather than think of each image as a 2-D box (ra and dec) with an associated timestamp, think of it as a 3-D box (ra, dec, time). This box is still fairly small because each image covers only a few seconds or minutes.
  • The image 3-D box lives in a single column in a single table, so there are no on-the-fly joins and there is less data read from disk (probably).
  • It turns out that PostGIS cannot do a hybrid index-lookup on space and time at the same time. So the 2-D box + time-range version of the query takes two steps — first, intersect each trajectory spatially with all images, then filter out the examples where the timestamps do not sync up.

    By treating the image as a 3-D box (and the trajectory as a line in 3-D space), we are able to do both these phases at once.

  • To intersect a line with a box, it suffices to simply check that their bounding boxes overlap. PostGIS is faster if you ask for overlapping bounds rather than run the ST_Intersects code.

I think we’re now close enough with the database that we should put the end-to-end pipeline together:

  1. pose a set of trajectories
  2. use the database to find all the images these trajectories may overlap with
  3. use the database to group these results by image, so that we have a set of (trajectory, image) pairs ordered by image.
  4. read each intersected image off disk once, pulling out the relevant pixels for all trajectories as a group.
  5. co-add the pixels and find new trans-Neptunian objects — aka, Pluto’s buddies!

Looking forward to our next meeting!

Comments !

blogroll

social