Thursday, December 11, 2008

Scribbles from OSDI

A few papers where I took notes - mostly first-day papers.

DryadLINQ:

How to write distributed data-parallel programs.

The programmer writes the program as if for a single machine, and the system takes care of distributing the code and running it.

LINQ: Microsoft's Language INtegrated Query.

LINQ is an interface - there are many execution engines. The program is written against a single standard interface.

Dryad: takes care of running the jobs on multiple machines - uses files/TCP/FIFOs for communication.

Very similar to MapReduce.

The input data could be located anywhere - SQL, disk, memory, ...

Generate a distributed execution plan - similar to optimization in programming languages. These optimizations may help reduce network consumption, etc. Runtime code generation is used to optimize on the fly.

A MapReduce program can easily be converted to the DryadLINQ interface - about 3 lines of code (a toy sketch follows below):
call mapper
sort intermediate values
call reducer
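
A toy Python sketch of that three-step structure (DryadLINQ itself is LINQ/C#, so the names here are just stand-ins for the shape of the pipeline):

from itertools import groupby

def map_reduce(records, mapper, reducer):
    # Toy, single-machine analogue of the three DryadLINQ steps:
    # apply the mapper, sort/group intermediate pairs by key, apply the reducer.
    intermediate = [kv for record in records for kv in mapper(record)]   # call mapper
    intermediate.sort(key=lambda kv: kv[0])                              # sort intermediate values
    return [reducer(key, [v for _, v in group])                          # call reducer
            for key, group in groupby(intermediate, key=lambda kv: kv[0])]

# Example: word count.
counts = map_reduce(["a b a", "b c"],
                    mapper=lambda line: [(w, 1) for w in line.split()],
                    reducer=lambda word, ones: (word, sum(ones)))
# [('a', 2), ('b', 2), ('c', 1)]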

DryadLINQ is to be released to the academic community.

The main goal is the complete opposite of the virtual machine idea - combine multiple computers and present the view of a single logical machine.

q. Ramki:
1. What are the syntax and semantics of this language?
There are no parallel expressions; you write the code in relational algebra.
2. What are the limitations - e.g. recursion is hard to capture.

q. Google: Performance and intuition: structuring programs is important, since the same program could run in a second or could take a huge amount of time.
We are still learning about it.


Everest: Scale down peak loads through I/O off-loading
Dushyanth Narayanan, Austin Donnelly, Eno Thereska, Sameh Elnikety, and Antony Rowstron, Microsoft Research Cambridge, United Kingdom

Problem: I/O peaks on servers - peaks are short, unexpected, and high.
Motivation: I/O traces from Exchange servers.

Insert an Everest client on the disk we care about. Monitor usage on each disk and off-load writes during spikes to idle disks.
Off-loads writes during peaks. When the peak subsides, reclaim all the off-loaded writes back to this disk.

Properties of peaks: uncorrelated across disks; write-dominated, with few foreground reads.

Challenges: want to be able to write anywhere, but a read should always return the latest write. State must be consistent - apps would crash otherwise.

Keep metadata to track offloads - cached in memory.

Reclaiming: when the Everest client sees the disk is free, it asks the cluster who holds its off-loaded data.
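
A minimal sketch of the off-load/reclaim bookkeeping as I understood it (all class and method names here are made up, not the Everest API): writes above a load threshold go to the least-loaded idle peer, an in-memory map records where the latest version of each block lives, reads always consult that map, and reclaim copies blocks back once the peak subsides.

class Disk:
    # Stub disk: a dict of blocks plus a fake queue length standing in for load.
    def __init__(self):
        self.blocks, self.pending = {}, 0
    def queue_length(self):
        return self.pending
    def write(self, block, data):
        self.blocks[block] = data
    def read(self, block):
        return self.blocks[block]

class OffloadClient:
    # Toy model of I/O off-loading - just the bookkeeping described in the talk.
    def __init__(self, base, peers, threshold):
        self.base, self.peers, self.threshold = base, peers, threshold
        self.remote = {}                       # block -> peer holding the latest version

    def write(self, block, data):
        if self.base.queue_length() > self.threshold and self.peers:
            peer = min(self.peers, key=lambda p: p.queue_length())
            peer.write(block, data)            # off-load during the peak
            self.remote[block] = peer          # metadata kept in memory
        else:
            self.base.write(block, data)
            self.remote.pop(block, None)

    def read(self, block):
        # Reads must always return the last write, wherever it currently lives.
        return self.remote.get(block, self.base).read(block)

    def reclaim(self):
        # When the peak subsides, pull off-loaded blocks back to the base disk.
        while self.remote and self.base.queue_length() <= self.threshold:
            block, peer = self.remote.popitem()
            self.base.write(block, peer.read(block))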

Evaluated using the Exchange server traces - particularly three observed peaks. A second set of evaluations used an OLTP benchmark.

Off-loading works for short, high peaks; it is not meant to improve 24x7 performance.

q. Is there a synchronization problem that requires data on the server to become persistent?
We wait for the data to become persistent.
q. Can you configure what to offload and what not to? Yes, that can be done at different levels.
q. How much correlation is there across volumes? Haven't measured that yet.
q. If you have JFS, do you do anything special for that? No.
q. Would using bigger write caches help? They would have to be very big - it would amount to the same thing, but with a lot of cache.
q. The underlying system has hardware RAID - does that introduce more failure levels? No special new dependencies are introduced.


Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, University of California, Berkeley

q. Why have global parameters, rather than looking at the local scenario?
q. What do you do when you have a small cluster? Launch all primary tasks - only relaunch in the last run.
q. How would different run times of each map/reduce task affect LATE? You need to normalize by something that says this is a small task and that is a large task.
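
My understanding of the normalization they mean: estimate time-to-completion from a task's own progress rate rather than raw elapsed time, so small and large tasks become comparable. A rough sketch of that idea (not the actual LATE code):

def estimated_time_left(progress, elapsed_seconds):
    # Normalize by progress rate: a task 10% done after 100s and a task
    # 50% done after 500s have the same rate, so neither looks like a straggler.
    if progress <= 0:
        return float("inf")               # no progress yet - can't estimate
    rate = progress / elapsed_seconds     # fraction of the task finished per second
    return (1.0 - progress) / rate

def pick_speculative_task(tasks):
    # Speculatively re-execute the running task with the longest estimated time left.
    return max(tasks, key=lambda t: estimated_time_left(t["progress"], t["elapsed"]))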


Corey:
Many applications spend time in the kernel. Bottlenecks in the OS are due to data structures shared in the kernel. Continuous design changes are required to increase concurrency. Applications don't need to share all the data structures that existing interfaces force them to share.

Main idea: Reduce contention.

Shares: control the kernel data used to resolve application references.
Cost of using the FD table: throughput is expected to scale linearly with the number of cores, but instead it drops from one core to two cores and stays constant after that.

Reason: with an additional core, each operation takes 121 cycles instead of 3; the lock serializes updates to the table.

Use shares - they allow applications to control how cores share the kernel data structures used for lookups. Let the application decide how FD tables are shared.

Benefits: apps can control how cores share internal kernel data structures.

They also implement address ranges for each app and benchmarked performance using MapReduce. Using address ranges, apps can avoid contention.
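
A purely conceptual illustration (Python here just for shape - Corey's shares are a kernel interface, and these class names are invented): a table shared by all cores needs a lock on every update, while a table the application declares private to one core does not, which is exactly the choice shares expose.

import threading

class SharedFdTable:
    # One table shared by all cores: every open takes a global lock,
    # which is what serializes updates and caps throughput in the FD benchmark.
    def __init__(self):
        self.lock = threading.Lock()
        self.fds, self.next_fd = {}, 0
    def open(self, obj):
        with self.lock:
            fd, self.next_fd = self.next_fd, self.next_fd + 1
            self.fds[fd] = obj
            return fd

class PrivateFdTable:
    # Per-core table (what an app asks for when it doesn't need sharing):
    # no lock, so cores never contend on each other's opens.
    def __init__(self):
        self.fds, self.next_fd = {}, 0
    def open(self, obj):
        fd, self.next_fd = self.next_fd, self.next_fd + 1
        self.fds[fd] = obj
        return fd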



CuriOS:
Problem: errors occurring in systems due to hardware/software faults, e.g. bit flips, stuck-at faults, bad pointer usage.

Managing OS errors: currently - let the errors happen and fix them later.
The fundamental problem with restart recovery: state from the previous instance is gone after a crash. Save per-client state in the server address space so that this information survives a crash.
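
A hedged sketch of the recovery idea as I noted it (not CuriOS's actual mechanism - the names are invented): keep the per-client state outside the service instance so a restarted instance can pick it up after a crash.

class CounterService:
    # Toy "OS service": per-client state lives in an external store,
    # so restarting the service after a fault does not lose it.
    def __init__(self, client_state):
        self.client_state = client_state          # survives this instance

    def handle(self, client_id, request):
        state = self.client_state.setdefault(client_id, {"count": 0})
        if request == "crash":
            raise RuntimeError("injected fault")  # stand-in for a bad pointer / bit flip
        state["count"] += 1
        return state["count"]

def run(requests):
    saved = {}                                    # per-client state, kept outside the server
    service = CounterService(saved)
    for client, req in requests:
        try:
            print(client, service.handle(client, req))
        except RuntimeError:
            service = CounterService(saved)       # restart; client state is preserved

run([("a", "tick"), ("a", "crash"), ("a", "tick")])   # prints 1, then 2 after the restart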

Redline: First Class Support for Interactivity in Commodity Operating Systems


Network Imprecision: a new consistency metric for scalable monitoring


How to monitor large networks.
Goals for monitoring: scalability, real-time monitoring, accuracy despite failures.
Half of the reports can differ by at least 30% from the truth. How to safeguard accuracy despite disruption? Instead of giving best-effort results, give reliable results.

Quantify the stability of the system.

Define a lower bound on the number of nodes that are good (their inputs are reflected) and an upper bound on the number of nodes that are bad (possibly double-counted), respectively.

The Nall, Nreachable, and Ndup metrics expose the impact of disruptions.
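
A minimal sketch of how those three numbers could travel with an aggregate result (the definitions follow the talk; the flat node list is my simplification):

def aggregate_with_imprecision(nodes):
    # N_all       - total nodes in the system
    # N_reachable - lower bound on nodes whose current input is reflected in the result
    # N_dup       - upper bound on nodes whose input may be double-counted
    #               (e.g. reported along two paths while the tree reconfigures)
    n_all = len(nodes)
    n_reachable = sum(1 for n in nodes if n["reachable"])
    n_dup = sum(1 for n in nodes if n.get("possibly_duplicated"))
    total = sum(n["value"] for n in nodes if n["reachable"])
    return {"result": total, "N_all": n_all,
            "N_reachable": n_reachable, "N_dup": n_dup}

# A consumer can then judge stability itself - e.g. only trust the result
# when N_reachable is close to N_all and N_dup is 0.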


Chopstix: lightweight, high-resolution monitoring for troubleshooting production systems
Runs with low overhead - collects rich information about the system, so system state can be reconstructed later in time.

Hard bugs - happen once - no clue why they happen...

Good Chopstix GUI that can be used to drill down into the problem.

Better than OProfile in terms of overhead.

q. Scalability of Chopstix - how many metrics can we capture? We haven't worked out the numbers, but it is now under 1% of CPU utilization.


Automating Network Application Dependency Discovery: Experiences, Limitations, and New Solutions
Xu Chen, University of Michigan; Ming Zhang, Microsoft Research; Z. Morley Mao, University of Michigan; Paramvir Bahl, Microsoft Research

Managing enterprise networks is hard. Dependency information is needed because it could help in troubleshooting networks.

eXpose and Sherlock do this kind of work.

Contribution: use the delay distribution.

Use passive sniffing and only parse TCP/IP headers - minimize false dependencies.

Proposed system: Orion - the time delay between dependent services reflects typical processing and network delays.

Ignore transient services; only consider services that occur close in time.
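
A rough reconstruction of the delay-distribution idea (my sketch, not Orion's code): from passively sniffed headers at one host, take the delays between messages of different services seen close together in time and histogram them per service pair; a sharp spike suggests a real dependency, a flat spread suggests coincidence.

from collections import defaultdict

def delay_histograms(events, window=1.0, bucket=0.01):
    # events: (timestamp, service) pairs observed at one host via passive sniffing.
    # For every pair of services seen within `window` seconds of each other,
    # histogram the delay between them, rounded to `bucket`-second bins.
    events = sorted(events)
    hists = defaultdict(lambda: defaultdict(int))
    for i, (t1, s1) in enumerate(events):
        for t2, s2 in events[i + 1:]:
            if t2 - t1 > window:
                break                      # only consider services close in time
            if s1 != s2:
                delay = round((t2 - t1) / bucket) * bucket
                hists[(s1, s2)][delay] += 1
    return hists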

Targets 5 dominant applications in MS datacenters.
Has fewer false positives when compared to Sherlock/eXpose.

False positives come from normal usage patterns - e.g. opening a web page from an email would relate the mail server with the proxy if the user clicks the link immediately.