Paper: "Rethink the Sync"
Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn, University of Michigan
This is an open forum for discussion of papers presented at OSDI 2006. Please add your comments to these postings. We invite comments from anyone who has read the paper or heard the presentation; please note that the papers themselves are not available for free online access until 2007.
6 Comments:
If an application behaves like:
write()
print()
write()
print()
...
Does this in effect revert to standard synchronous speeds?
Not entirely. The print will trigger a commit of the current FS transaction, but it will not block the application. So, the 2nd write and print can proceed while the 1st transaction is committing.
If your program keeps repeating this pattern, a bunch of writes will accumulate in the next FS transaction while the first is committing, and they will be group committed, greatly increasing performance.
The Apache build benchmark in the paper behaves somewhat like this.
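The overlap described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the paper's implementation: one journal commit in flight at a time, a fixed commit time of `commit_ticks` steps (at least 1), and one write() plus one print() per step. `Journal`, `print_output`, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Journal:
    """Toy journal with at most one transaction commit in flight.

    Hypothetical model: a commit takes `commit_ticks` (>= 1) time steps and the
    application issues one write() and one print() per step without blocking.
    """
    commit_ticks: int
    open_txn: List[int] = field(default_factory=list)   # writes not yet committing
    committing: Optional[List[int]] = None               # writes in the in-flight commit
    remaining: int = 0                                   # steps left on that commit
    commit_requested: bool = False                       # buffered output needs open_txn durable
    committed_batches: List[int] = field(default_factory=list)  # size of each commit

    def write(self, item: int) -> None:
        self.open_txn.append(item)

    def print_output(self) -> None:
        # The output depends on every write so far: request a commit of the
        # open transaction, but return immediately instead of blocking.
        self.commit_requested = True
        self._maybe_start_commit()

    def _maybe_start_commit(self) -> None:
        if self.committing is None and self.commit_requested and self.open_txn:
            self.committing, self.open_txn = self.open_txn, []
            self.remaining = self.commit_ticks
            self.commit_requested = False

    def tick(self) -> None:
        if self.committing is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.committed_batches.append(len(self.committing))
                self.committing = None
                # Writes that piled up meanwhile are group committed next.
                self._maybe_start_commit()


def simulate(n_pairs: int, commit_ticks: int) -> List[int]:
    journal = Journal(commit_ticks)
    for i in range(n_pairs):
        journal.write(i)
        journal.print_output()
        journal.tick()
    while journal.committing is not None:  # drain the remaining commits
        journal.tick()
    return journal.committed_batches
```

For example, `simulate(10, 3)` returns `[1, 2, 3, 3, 1]`: ten write/print pairs reach disk in five commits, because writes issued while one commit is in flight are group committed by the next one, and the application never blocks on an individual write.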
Official scribe transcript:
The authors note that asynchronous I/O provides good user-perceptible performance but does not guarantee that data reaches disk reliably and in a timely manner. Synchronous I/O provides data safety guarantees but incurs significant overhead. The authors propose a new model of "externally synchronous" I/O that resolves this tension by approximating the performance of asynchronous I/O while providing the data safety of synchronous I/O.
The main reason synchronous I/O is slow is that applications must block until data has been written safely to disk. The authors argue that only the user, not applications, should be considered "external" to the system. Therefore, under their model, applications need not block on I/O operations, but can continue and perform other work while writes are queued. The system only blocks when some output that depends on a pending write is about to be externalized to the screen, disk, or network: in other words, any event that would convince the user that their I/O operation has completed.
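A minimal sketch of the output rule in the paragraph above, assuming a single process whose output conservatively depends on every earlier write, and commits that complete in write order. `ExternallySyncOutput`, `write`, `output`, and `commit_through` are hypothetical names; the real system tracks dependencies inside the kernel rather than in a user-level object.

```python
from collections import deque
from typing import Deque, List, Tuple


class ExternallySyncOutput:
    """Toy model of externally synchronous output for a single process.

    Simplifying assumption (hypothetical, conservative): every output depends
    on every write issued before it, and commits complete in write order.
    """

    def __init__(self) -> None:
        self.next_write_id = 0
        self.committed_through = -1                      # writes <= this id are durable
        self.buffered: Deque[Tuple[int, str]] = deque()  # (last write it depends on, text)
        self.screen: List[str] = []                      # what the user has seen

    def write(self) -> int:
        # The write is queued and the application keeps running; no blocking.
        wid = self.next_write_id
        self.next_write_id += 1
        return wid

    def output(self, text: str) -> None:
        dep = self.next_write_id - 1
        if dep <= self.committed_through:
            self.screen.append(text)           # nothing pending: externalize now
        else:
            self.buffered.append((dep, text))  # hold until the write is durable

    def commit_through(self, wid: int) -> None:
        self.committed_through = max(self.committed_through, wid)
        # Release outputs in order; dependencies are non-decreasing in the queue.
        while self.buffered and self.buffered[0][0] <= self.committed_through:
            self.screen.append(self.buffered.popleft()[1])
```

Writes return immediately and only the visible output waits: after two writes and two outputs, `screen` stays empty until `commit_through` is called with the first write's id, which releases just the first output.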
External synchrony preserves the same causal ordering of writes as synchronous I/O. The fact that unrelated I/O operations can be batched and overlapped, without violating causal ordering, is what enables the performance wins in their system.
Their implementation leverages their prior work (Speculator, SOSP '05) to track causal dependencies across multiple applications and throughout the kernel. "Commit dependencies" inside the kernel track all the processes and objects that are causally dependent on a given pending write. Commit dependencies are forwarded to applications that become tainted by uncommitted data, to ensure preservation of causal ordering.
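The "commit dependency" bookkeeping described above can be sketched as a map from each uncommitted transaction to the processes and objects that depend on it. This is an illustrative assumption, not Speculator's actual data structure; `CommitDependencies` and its methods are hypothetical, and entities are plain strings such as `"proc:a"` or `"file:y"`.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set


class CommitDependencies:
    """Hypothetical structure: one record per uncommitted transaction listing
    every process or kernel object causally dependent on it."""

    def __init__(self) -> None:
        self.dependents: DefaultDict[int, Set[Hashable]] = defaultdict(set)

    def record_write(self, txn: int, process: Hashable, obj: Hashable) -> None:
        # Both the writer and the modified object now depend on the pending write.
        self.dependents[txn].update({process, obj})

    def propagate(self, src: Hashable, dst: Hashable) -> None:
        # dst observed src's state (read a file, received a message, ...), so it
        # inherits every commit dependency src currently has.
        for entities in self.dependents.values():
            if src in entities:
                entities.add(dst)

    def is_tainted(self, entity: Hashable) -> bool:
        # A tainted process must have its external output buffered.
        return any(entity in entities for entities in self.dependents.values())

    def commit(self, txn: int) -> Set[Hashable]:
        # The transaction is durable: drop its record and return who depended on it.
        return self.dependents.pop(txn, set())
```

For instance, after `record_write(1, "proc:a", "file:y")` and `propagate("file:y", "proc:b")`, process b is tainted even though it never wrote anything, and `commit(1)` clears the taint for all three entities.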
There is a tension between how long output is buffered and users' response-time expectations. They bound this wait with a five-second timeout, and also trigger a commit to disk if the amount of pending external output exceeds a certain threshold.
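The two commit triggers can be expressed as a small policy check. The five-second bound comes from the talk; the output threshold's actual value is not given here, so `MAX_BUFFERED_OUTPUT_BYTES` (and measuring the threshold in bytes at all) is a hypothetical placeholder, as is the function name `should_force_commit`.

```python
from typing import Optional

MAX_OUTPUT_DELAY_S = 5.0  # five-second bound on buffered output, from the talk
# Hypothetical placeholder: the real threshold value and unit are not stated.
MAX_BUFFERED_OUTPUT_BYTES = 64 * 1024


def should_force_commit(now: float,
                        oldest_buffered_at: Optional[float],
                        buffered_bytes: int,
                        max_delay: float = MAX_OUTPUT_DELAY_S,
                        max_bytes: int = MAX_BUFFERED_OUTPUT_BYTES) -> bool:
    """Return True if pending writes should be committed now so that
    buffered output can be released."""
    if oldest_buffered_at is None:  # nothing is waiting to be externalized
        return False
    waited_too_long = now - oldest_buffered_at >= max_delay
    too_much_output = buffered_bytes >= max_bytes
    return waited_too_long or too_much_output
```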
They modified the Linux ext3 file system to support external synchrony and compared their system with native ext3 in asynchronous mode, in synchronous mode, and in synchronous mode with write barriers. Their evaluation shows that ext3, even with synchronous I/O, does not guarantee data durability across crashes or power failures. Ext3 with write barriers provides the same data safety as external synchrony, but at a severe performance cost.
On various file system benchmarks, their system performs close to asynchronous ext3 while providing stronger durability guarantees than synchronously mounted ext3. Synchronous ext3 is also an order of magnitude slower than external synchrony. On the SPECweb99 benchmark, external synchrony adds minimal latency overhead compared to an asynchronous file system.
Questions:
David Anderson, CMU: Is there a corner case where an application would do an asynchronous write, see that it failed, and change its behavior based on that?
Answer: Yes. In our system, by the time an application discovers that a write has failed, it has already moved on, which complicates recovery. We argue, however, that "failure notification" usually means a kernel panic and crash due to hardware failure, so the user has bigger problems in that case.
George Candea, EPFL: In the MySQL benchmark, were the clients and server on the same machine?
A: Yes.
Q: If they were on different boxes, what would be the expectation for performance results?
A: A different benchmark in our paper (SPECweb) was responding to client requests over the network. The improvement would not be as great, but we were more interested in testing the local case.
Q: I'm more interested in the group commit policy.
A: In that case, we still get the benefit of group commit, but we won't see the same benefit of being able to commit multiple transactions from the same client.
Micah Brodsky, MIT: You aren't testing against high-bandwidth sequential I/O, and you are using data logging rather than just metadata logging. Would sequential I/O throughput suffer?
A: Good question. I believe the PostMark benchmark does sequential I/O. You'll still see an improvement because we won't be blocking the application between operations; multiple writes can be grouped and committed as a block.
Q: Any intuition about how it would compare to asynchronous I/O doing large sequential writes?
A: Asynchronous I/O is limited by speed to write to memory. We are similarly limited.
I am wondering whether external synchrony will fail when two processes synchronize through something like a mutex.
Suppose the following case:
   | process a    | process b
2) | acq_mutex(x) |
3) | write(filey) | acq_mutex(x)
4) | rel_mutex(x) | (succeeds here)
5) |              | read(filey)
6) |              | rel_mutex(x)
7) |              | print(z)
The problem is:
a) Will process b fail to see (step 5) the update made by process a?
b) Will the output of print (step 7) appear before the write (step 3) in process a is really committed?
The data written by process a will be in the kernel's in-memory page cache after step 3. In step 5, process b reads the data directly from the page cache. So the answer to your first question is that process b will read what process a wrote.
At step 7, let's assume that the data has not yet been written to disk. Then process b is uncommitted and the output is buffered by the OS.
Both process a and process b will have a commit dependency on the uncommitted data. The dependency is inherited by process b when it gets the mutex (assuming it is a Linux futex); without the futex/mutex, the dependency would be inherited at step 5 when process b reads the data.
After the data is committed, the output from process b will appear on the screen. So the answer to your second question is no: the output of the print at step 7 is not shown until the write from step 3 has committed.
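The answer above can be replayed with a toy model. It assumes the step 3 write belongs to a single uncommitted transaction (id 1) and that futex release/acquire passes dependencies from releaser to acquirer, as described for Linux futexes; `run_mutex_example` and the dependency sets are hypothetical, not the kernel's code.

```python
from typing import Dict, List, Set, Tuple


def run_mutex_example() -> Tuple[str, List[str], List[str]]:
    """Replay the process a / process b example under a toy dependency model.

    Assumptions (hypothetical): the write in step 3 belongs to uncommitted
    transaction 1, dependencies are sets of uncommitted transaction ids, and
    releasing/acquiring the futex passes dependencies from releaser to acquirer.
    """
    page_cache: Dict[str, str] = {}  # reads see data before it reaches disk
    deps: Dict[str, Set[int]] = {"a": set(), "b": set(), "filey": set(), "x": set()}
    screen: List[str] = []                # output the user has seen
    buffered: List[Tuple[str, str]] = []  # (process, text) held by the OS

    def inherit(dst: str, src: str) -> None:
        deps[dst] |= deps[src]  # dst now causally depends on whatever src does

    # Step 2 (a acquires x) creates no dependency.
    # Step 3: a writes filey; the data sits only in the page cache.
    page_cache["filey"] = "new data"
    deps["a"].add(1)
    deps["filey"].add(1)
    # Step 4: a releases the futex and b acquires it, inheriting a's dependency.
    inherit("x", "a")
    inherit("b", "x")
    # Step 5: b reads filey from the page cache (and would inherit here anyway).
    value_read = page_cache["filey"]
    inherit("b", "filey")
    # Step 6: b releases x.
    inherit("x", "b")
    # Step 7: b prints; b depends on uncommitted data, so the output is held.
    if deps["b"]:
        buffered.append(("b", "z"))
    else:
        screen.append("z")
    seen_before_commit = list(screen)

    # Later, transaction 1 commits: drop it everywhere and release clean output.
    for d in deps.values():
        d.discard(1)
    for proc, text in list(buffered):
        if not deps[proc]:
            buffered.remove((proc, text))
            screen.append(text)
    return value_read, seen_before_commit, list(screen)
```

The function returns the data b read (`"new data"`, served from the page cache), the screen contents before the commit (empty), and the screen contents after it (`["z"]`).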