Started the day with a fantastic meeting with Sophie Clayton in the Armbrust Lab. Sophie is trying to analyze the entire SeaFlow corpus in Myria. Today she wrote MyriaL queries that analyze ~64K files at once, which she is logging on a GitHub wiki. We ran into some memory pressure joining two datasets of 1.7B rows each (the number of particles measured by SeaFlow), but were able to work around it. The remaining queries were on the order of the number of sample files in size, and all finished with no problems in under 2 minutes.
The other great part about working with Sophie is what I learn from watching a real scientist who is extremely competent, but not a database expert, use the system. I generated at least 7 new issues related to making Myria more usable, and spent the productive part of the rest of the day working on them.
More code review for Myria.