Other articles


  1. Myria updates

    Not much happening this week, what with the holidays. I took advantage of the break to handle some long-overdue code reviews and code improvements to Myria.

    • Jingjing Wang has added resource profiling to Myria. We can now measure the resource consumption of each operator during query execution. (Unfortunately, these data ...

  2. Public cluster + private experiments

    The hot-button issue today is what we do with our public Myria service.

    As part of the grant proposal, we promised that “the project develops and deploys a Web-based query-as-a-service interface to the new middleware. The service will be made available to domain scientists” (p.1). This service has ...

  3. Data and databases

    Over the weekend, both Sophie Clayton and Andy Becker worked independently on their Data Science Incubator projects; I spent some time then and today answering emails :).

    Sophie has been loading underway data (GPS, temperature, salinity, etc. from ships in motion) into SQLShare for cleaning. Every research vessel is its own ...

  4. 2014-09-19 daily

    Today Sophie Clayton and I hacked on Myria for SeaFlow once again. We found a few more opportunities for language and usability improvements, but made little progress because of an issue introduced when fixing other bugs earlier this week.

    In the Myria research meeting, we had both Johannes Gehrke from Microsoft ...

  5. 2014-09-16 daily

    I also did not get much time to do real work today. There were three major activities:

    1. UW Data Science Incubator applications are due Thursday! They have started rolling in, so I have started looking at them and have started a few clarifying discussions with some of the authors. Getting ...

  6. 2014-09-15 daily

    Next week, I’ll see if the incrementalization actually helps us scale.

    Only had a tiny bit of time today; I worked more on the least common ancestor query. Here is what new work contributed to better scaling:

    • Incrementalizing the code (duh) did in fact let me scale it farther ...

  7. 2014-09-11 daily

    Today I spent all day with Sandra Anderson’s citation graph lineage queries. Though I can compute “all-pairs reachability” for the first 10,000 papers in the dataset, I can currently compute “least-common ancestor” for only the first 500 papers. There are some severe algorithmic scalability challenges here that we are ...

  8. 2014-09-10 daily

    In between meetings, I spent most of today continuing yesterday’s work on the citation use case. Further query rewrites and testing exposed an interesting bug in the optimizer due to a mismatch between logical algebra representation and the actual system implementation behavior — the optimizer assumed the system could perform ...

  9. 2014-09-09 daily

    Today I picked up some of the work that Sandra Anderson did in her summer internship, namely trying to find common citations (transitively) between pairs of papers in Jevin West’s data sets.

    Once again I identified a number of nice optimization opportunities:

    • some query rewrites that result in better ...

  10. 2014-09-08 daily

    Today we held the information session for the second installment of our Data Science Incubator projects, which we will hold in the spring. It was fairly well attended; maybe 20 to 25 people came, and many of them indicated that they will be submitting proposals.

    Over the weekend and today I ...

  11. 2014-08-28 daily

    We had our monthly SeaFlow/eScience group meeting. For this grant the oceanographers have been doing lots of new science using tools like SQLShare, Myria, and popcycle, our software for storing, indexing, and analyzing SeaFlow data. We discussed needed improvements to popcycle and to the seaflow-viz web dashboard (see ...

  12. 2014-08-25 daily

    Another fantastic hack session with Sophie today. We analyzed the quality and quantity of data in the existing files, including determining which of the 64K SeaFlow samples are within a reasonable amount (say, 1σ) of the “average” SeaFlow sample according to the calibration beads. Surprisingly/hearteningly, the vast majority of ...
