Today I picked up some of the work that Sandra Anderson did in her summer internship, namely trying to find common citations (transitively) between pairs of papers in Jevin West‘s data sets.
Once again I identified a number of nice optimization opportunities:
- some query rewrites that result in better Myria plans
- some relational algebra optimizations we were leaving on the floor in Raco
- and some simple systems tricks to aggregate database inserts and thus amortize transaction overheads in Myria.
The query rewrites are an especially interesting use case. Sandra wrote correct, fantastic MyriaL programs, but: since I know how the system works at a deep level, I can suggest rewrites that result in much more efficient execution.
These queries that come from smart users and real science use cases are great as fodder for the future automatic query rewriting research I am planning on the side. In designing systems and services to make powerful tools accessible to scientists, the answer we give them when things are slow can’t always be “well, you wrote it wrong”.
Comments !