Going deep and learning to love the haters
Advice for graduate school
Kay Ousterhout UC Berkeley PhD
Depth first
Starting grad school
Draw an outline
Starting a project
Ask questions
Log log log
Running experiments
Love the haters
Presenting Getting feedback Running experiments Starting a project Starting grad school
7.3 How do task constraints affect performance?
7.4 How do scheduler failures impact job response time?
7.6 How does Sparrow compare to Spark’s native, centralized scheduler?
7.7 How well can Sparrow’s distributed fairness enforcement maintain fair shares?
7.8 How much can low priority users hurt response times for high priority users?
5.2 How much do stragglers affect job completion time?
5.3 Are these results inconsistent with past work?
Is this the graph we want in the paper?
Do you agree these are the expected results?
Is there anything else I should graph?
(I used LaTeX. Don’t do this.)
One useful thing for debugging is vmstat -SM 1, which basically runs free -m at regular intervals and prints the output.
Sangjin suggested this: echo 1000 > /proc/sys/vm/vfs_cache_pressure
Another idea in a similar vein is to try the suggestions described here: http://serverfault.com/questions/516074/why-are-applications-in-a-memory-
Also re-launch as ext4 / xfs.
Also consider turning journalling off; this means that for every write, a bunch
the file system, do dumpe2fs /dev/xvdb
To turn journalling off, you can run this command: tune2fs -O^has_journal /dev/xdy
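The vmstat / free -m monitoring in the log above can also be scripted, so the memory samples land directly in your own experiment notes. A minimal Python sketch, assuming a Linux machine where these figures are read from /proc/meminfo; parse_meminfo and sample_memory are hypothetical helper names, not part of vmstat or any library, and the output only roughly mirrors vmstat's free/buff/cache columns:

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict, converting kB values to MiB."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        # Most entries carry a "kB" unit (really KiB); a few, such as
        # HugePages_Total, are plain counts and are left unchanged.
        if len(fields) > 1 and fields[1] == "kB":
            value //= 1024
        stats[key.strip()] = value
    return stats


def sample_memory(path="/proc/meminfo", interval_s=1.0, samples=5, out=print):
    """Report free/buffer/cache memory every interval_s seconds, roughly like `vmstat -SM 1`."""
    for i in range(samples):
        with open(path) as f:
            stats = parse_meminfo(f.read())
        out(f"{time.strftime('%H:%M:%S')} "
            f"free={stats.get('MemFree')}M "
            f"buff={stats.get('Buffers')}M "
            f"cache={stats.get('Cached')}M")
        # Sleep only between samples, not after the last one.
        if i + 1 < samples:
            time.sleep(interval_s)
```

For example, sample_memory(samples=60) prints one minute of samples; passing a different out callable (say, one that appends to your log file) keeps the numbers next to the notes about what you were trying.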
jobs on higher priority users.
Constraints: Our current design does not handle inter-job constraints (e.g. “the tasks for job A must not run on racks with tasks for job B”). Supporting inter-job constraints across frontends is difficult to do without significantly altering Sparrow’s design.
Gang scheduling: Some applications require gang scheduling, a feature not implemented by Sparrow. Gang scheduling is typically implemented using bin-packing algorithms that search for and reserve time slots in which an entire job can run. Because Sparrow queues tasks on several machines, it lacks a central point from which to perform bin-packing. While Sparrow often places all jobs on entirely idle machines, this is not guaranteed, and deadlocks between multiple jobs that require gang scheduling may occur. Sparrow is not alone: many cluster schedulers do not support gang scheduling [8, 9, 16].
Query-level policies: Sparrow’s performance could be
Pick any project, learn as much as possible
Draw graphs, get buy-in
Measure deeper
Fear good results
Save commits + conf, Ctrl+F!
Talk to the haters
Put limitations in the paper
Get feedback early
Avoid paralysis
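For the "Save commits + conf, Ctrl+F!" tip, one way to make every run searchable later is to append the git commit and the configuration to a plain log file before each experiment. A minimal Python sketch, assuming the experiment code lives in a git repository and the configuration is a JSON-serializable dict; log_experiment, search_log, current_commit, and the file and config names below are hypothetical:

```python
import json
import subprocess
import time
from pathlib import Path


def current_commit(repo="."):
    """Return the HEAD commit hash of `repo`, or None if git or the repo is unavailable."""
    try:
        result = subprocess.run(["git", "-C", repo, "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    # Note: HEAD does not capture uncommitted edits, so commit before each run.
    return result.stdout.strip()


def log_experiment(log_path, config, notes="", commit=None):
    """Append one JSON line (time, commit, config, notes) describing a run."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "commit": commit if commit is not None else current_commit(),
        "config": config,
        "notes": notes,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def search_log(log_path, text):
    """The Ctrl+F step: return every logged entry whose line contains `text`."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open() as f:
        return [json.loads(line) for line in f if text in line]
```

For example, calling log_experiment("experiments.jsonl", {"workers": 100, "load": 0.8}, notes="baseline") before a run means that months later search_log("experiments.jsonl", "baseline") recovers the exact commit and settings behind a graph. Keeping one JSON object per line also leaves the file searchable with an ordinary editor's Ctrl+F or grep.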