Tomorrow I will work more on the scaling issues!
We had a great Myria meeting this afternoon. We discussed Andrew Whitaker's user-defined aggregate (UDA) extensions to MyriaL, which provide a very nice way to get scalable, distributed partial aggregation and to implement many complicated aggregations in a single scan rather than through joins. The poster child is arg_max: return the entire row where the value of some field is maximized. Bill Howe has proposed a nice syntax that might help simplify the expression of arg_max-like UDAs.
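To make the "single scan with partial aggregation" idea concrete, here is a minimal Python sketch of how arg_max can be split into a per-worker update step and a merge step. This is not MyriaL, and not Andrew's UDA syntax or Bill's proposed syntax; the function names (update, merge, local_arg_max, distributed_arg_max) and the toy rows are hypothetical, chosen only for illustration. The join-based version is included as a baseline for comparison.

```python
from functools import reduce

def update(state, row, field):
    """Fold one row into a partial state: keep the row with the larger field value."""
    if state is None or row[field] > state[field]:
        return row
    return state

def merge(a, b, field):
    """Combine two partial states produced by different workers."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[field] >= b[field] else b

def local_arg_max(partition, field):
    # One scan over this worker's rows; the partial state is a whole row, no join needed.
    return reduce(lambda s, r: update(s, r, field), partition, None)

def distributed_arg_max(partitions, field):
    # Each (hypothetical) worker computes a partial result; a final step merges them.
    partials = [local_arg_max(p, field) for p in partitions]
    return reduce(lambda a, b: merge(a, b, field), partials, None)

def join_based_arg_max(rows, field):
    # Baseline: compute the max first, then "join" it back against the rows.
    m = max(r[field] for r in rows)
    return [r for r in rows if r[field] == m]

rows = [
    {"id": 1, "score": 0.3},
    {"id": 2, "score": 0.9},
    {"id": 3, "score": 0.5},
    {"id": 4, "score": 0.7},
]
partitions = [rows[:2], rows[2:]]  # toy stand-in for data spread across two workers
print(distributed_arg_max(partitions, "score"))  # {'id': 2, 'score': 0.9}
```

The key design point is that the partial state is the best row seen so far, so workers only ship one row each to the merge step instead of recomputing the maximum and joining it back to the full relation.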
I presented Sandra’s least-common ancestor query to the group, and we discussed optimizations. At the meeting, Magda Balazinska, Bill, and Brandon Myers insisted this should work better if rewritten in incremental form, and Brandon helped me rewrite it afterwards. Next week, I’ll see if the incrementalization actually helps us scale.