Tuesday, October 31, 2006

Paper: "Rethink the Sync"

Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn, University of Michigan

Abstract

We introduce external synchrony, a new model for local file I/O that provides the reliability and simplicity of synchronous I/O, yet also closely approximates the performance of asynchronous I/O. An external observer cannot distinguish the output of a computer with an externally synchronous file system from the output of a computer with a synchronous file system. No application modification is required to use an externally synchronous file system: in fact, application developers can program to the simpler synchronous I/O abstraction and still receive excellent performance. We have implemented an externally synchronous file system for Linux, called xsyncfs. Xsyncfs provides the same durability and ordering guarantees as those provided by a synchronously mounted ext3 file system. Yet, even for I/O-intensive benchmarks, xsyncfs performance is within 7% of ext3 mounted asynchronously. Compared to ext3 mounted synchronously, xsyncfs is up to two orders of magnitude faster.

6 Comments:

Blogger valankar said...

If an application behaves like:

write()
print()
write()
print()
...

Does this in effect revert to standard synchronous speeds?

9:44 AM  
Blogger Jason Flinn said...

Not entirely. The print will trigger a commit of the current FS transaction, but it will not block the application. So, the 2nd write and print can proceed while the 1st transaction is committing.

If your program keeps repeating this pattern, a bunch of writes will accumulate in the next FS transaction while the first is committing, and they will be group committed, greatly increasing performance.
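
A rough way to see the effect of group commit on this write/print loop is the Python sketch below. It is an illustration only, not xsyncfs code: the function names and the `commit_cost` parameter (how many loop iterations the application completes while one transaction is committing) are assumptions made for the example.

```python
def commits_synchronous(iterations):
    """Every write blocks until it is on disk: one commit per write."""
    return iterations


def commits_group(iterations, commit_cost):
    """Count commits when writes arriving during a commit are batched.

    commit_cost is an assumed figure: how many write/print iterations
    the application completes while one transaction is committing.
    """
    commits = 0
    pending = 0        # writes waiting in the next transaction
    busy_until = 0     # iteration at which the in-flight commit ends
    for i in range(iterations):
        pending += 1   # write() joins the open transaction
        # print() asks for a commit; it starts only if the disk is idle.
        if i >= busy_until:
            commits += 1
            pending = 0
            busy_until = i + commit_cost
    if pending:
        commits += 1   # final batch left in the open transaction
    return commits
```

Under these assumptions, `commits_group(100, 10)` needs 11 commits where `commits_synchronous(100)` needs 100.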

The Apache build benchmark in the paper behaves somewhat like this.

10:10 AM  
Blogger Anthony Nicholson said...

Official scribe transcript:

The authors note that asynchronous I/O provides good user-perceptible performance, but does not provide reliable and timely safety of data on disk. Synchronous I/O provides data safety guarantees but incurs significant overhead. The authors propose a new model of "externally synchronous" I/O that resolves this tension by approximating the performance of asynchronous I/O while providing the data safety of synchronous I/O.

The main reason synchronous I/O is slow is that applications must block until data has been written safely to disk. The authors argue that only the user, not applications, should be considered "external" to the system. Therefore, under their model, applications need not block on I/O operations, but can continue and perform other work while writes are queued. The system only blocks when some output that depends on a pending write is about to be externalized to the screen, disk, or network---in other words, any event that would convince the user that their I/O operation has completed.
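
The idea can be sketched as a toy model in Python. This is not the xsyncfs implementation: the class and method names are hypothetical, and the model assumes a single process whose every output depends on all of its pending writes.

```python
class ExternallySyncFS:
    """Toy model: writes return at once; output waits for durability."""

    def __init__(self):
        self.uncommitted = []   # writes not yet on disk
        self.disk = []          # writes that are durable
        self.held_output = []   # output hidden from the user
        self.screen = []        # output the user can see

    def write(self, data):
        # The application does not block here.
        self.uncommitted.append(data)

    def output(self, text):
        # Output that follows an uncommitted write is held back.
        if self.uncommitted:
            self.held_output.append(text)
        else:
            self.screen.append(text)

    def commit(self):
        # Make pending writes durable, then release held output in order.
        self.disk.extend(self.uncommitted)
        self.uncommitted.clear()
        self.screen.extend(self.held_output)
        self.held_output.clear()
```

In this model, a call to `output` made after `write` stays invisible until `commit` runs, so the user never sees output that claims more than the disk holds.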

External synchrony preserves the same causal ordering of writes as synchronous I/O. The fact that unrelated I/O operations can be batched and overlapped, without violating causal ordering, is what enables the performance wins in their system.

Their implementation leverages their prior work (Speculator, SOSP '05) to track causal dependencies across multiple applications and throughout the kernel. "Commit dependencies" inside the kernel track all the processes and objects that are causally dependent on a given pending write. Commit dependencies are forwarded to applications that become tainted by uncommitted data, to ensure preservation of causal ordering.
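
A minimal sketch of how such dependencies might propagate is given below. It is an assumption-laden illustration, not Speculator or xsyncfs code: the `Process` and `Kernel` classes, their methods, and the use of integer write ids are all hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.deps = set()   # ids of uncommitted writes this process depends on


class Kernel:
    def __init__(self):
        self.next_id = 0
        self.file_deps = {}   # file name -> ids of uncommitted writes it reflects

    def write(self, proc, filename):
        # A write creates a new dependency; the file also inherits
        # everything the writer already depended on.
        wid = self.next_id
        self.next_id += 1
        proc.deps.add(wid)
        self.file_deps.setdefault(filename, set()).update(proc.deps)
        return wid

    def read(self, proc, filename):
        # Reading uncommitted data taints the reader.
        proc.deps |= self.file_deps.get(filename, set())

    def commit(self, wid, procs):
        # Once the write is durable, nothing depends on it any longer.
        for deps in self.file_deps.values():
            deps.discard(wid)
        for p in procs:
            p.deps.discard(wid)

    def may_externalize(self, proc):
        # Output is released only when the process has no pending dependencies.
        return not proc.deps
```

In this sketch, a process that reads a file written by another process cannot externalize output until that write has been committed.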

There is a tension between the length of time that output is buffered and the response time expectations of users. They bound this wait time by a five second timeout, and also trigger commit to disk if the amount of pending external outputs exceeds a certain threshold.
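
The two triggers could be combined roughly as in the Python sketch below. The five-second bound comes from the talk; the function name, its parameters, and the value of `MAX_PENDING_OUTPUTS` are assumptions, since the transcript does not give the threshold.

```python
import time

TIMEOUT_SECONDS = 5.0        # bound stated in the talk
MAX_PENDING_OUTPUTS = 64     # hypothetical; the transcript gives no value


def should_commit(oldest_output_time, pending_outputs, now=None):
    """Return True if buffered output should be forced out by a commit."""
    if pending_outputs == 0:
        return False         # nothing is waiting on the disk
    if now is None:
        now = time.monotonic()
    waited_too_long = now - oldest_output_time >= TIMEOUT_SECONDS
    too_much_buffered = pending_outputs > MAX_PENDING_OUTPUTS
    return waited_too_long or too_much_buffered
```

Passing `now` explicitly keeps the check easy to test; a caller would normally leave it out and let the function read the monotonic clock.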

They modified the Linux ext3 file system to support external synchrony, and compared the performance of their system to native ext3 in both asynchronous and synchronous I/O mode, and ext3 synchronous with write barriers. Their evaluation results show that ext3, even using synchronous I/O, does not guarantee data durability across crashes or power failures. Ext3 with write barriers provides the same data safety as external synchrony but at a severe performance cost.

Their results on various file system benchmarks show performance close to that of asynchronous ext3, while providing data safety guarantees superior to those of synchronously mounted ext3. Ext3 using synchronous I/O is also an order of magnitude slower than external synchrony. Their results on the SPECweb99 benchmark also show that external synchrony adds minimal latency overhead compared to asynchronous file systems.

Questions:

David Anderson, CMU: Is there a corner case where an application would do an asynchronous write, see it failed, and change behavior based on that?
Answer: Yes, it is true that in our system, by the time an application discovers a write has failed, it has already moved on, which complicates recovery. We argue, however, that "failure notification" usually means kernel panic and crash due to hardware failure, so the user has bigger problems in that case.

George Candea, EPFL: In the MySQL benchmark, clients and servers were on the same box?
A: Yes.
Q: If they were on different boxes, what would be the expectation for performance results?
A: A different benchmark in our paper (SPECweb99) was responding to client requests over the network. Improvement would not be as great, but we were more interested in testing the local case.
Q: I'm more interested in the group commit policy.
A: In that case, we're getting the benefit of group commit, but won't see the same benefit of being able to commit multiple transactions from the same client.

Micah Brodsky, MIT: You aren't testing against high-bandwidth, sequential I/O. You're using data logging, rather than just metadata logging. Would sequential I/O throughput suffer?
A: Good question. I believe the PostMark benchmark does sequential I/O? You'll still see improvement because you won't be blocking the app between each operation. Multiple writes can be grouped and committed as a block.
Q: Any intuition about how it would compare to asynchronous I/O doing large sequential writes?
A: Asynchronous I/O is limited by the speed of writing to memory. We are similarly limited.

11:27 AM  
Blogger lianqiao said...

I am wondering if external sync will fail when two processes are synchronizing with something like a mutex.

Suppose the following case:
1) process a...|process b
2) acq_mutex(x)|
3) write(filey)|acq_mutex(x)
4) rel_mutex(x)|(success here)
5) ............|read(filey)
6) ............|rel_mutex(x)
7) ............|print(z)

The problem is:
a) Will process b fail to read (step 5) the update by process a?
b) Will the print (step 7) come out before the write (step 3) in process a has really been committed?

7:06 PM  
Blogger lianqiao said...

I am wondering if external sync will fail when two processes are synchronizing with something like a mutex.

Suppose the following case:
1) process a...|process b
2) acq_mutex(x)|
3) write(filey)|acq_mutex(x)
4) rel_mutex(x)|(success here)
5) ............|read(filey)
6) ............|rel_mutex(x)
7) ............|print(z)

The problem is:
a) Will process b fail to read (step 5) the update by process a?
b) Will the print (step 7) come out before the write (step 3) in process a has really been committed?

7:10 PM  
Blogger Jason Flinn said...

The data written by process a will be in the kernel's in-memory page cache after step 3. In step 5, the data will be read by process b directly from the page cache. So, the answer to your first question is that process b will read what process a wrote.

At step 7, let's assume that the data has not yet been written to disk. Then process b is uncommitted and the output is buffered by the OS.

Both process a and process b will have a commit dependency on the uncommitted data. The dependency is inherited by process b when it gets the mutex (assuming it is a Linux futex) - without the futex/mutex, the dependency would be inherited at step 5 when process b reads the data.

After the data is committed, the output from process b will appear on the screen.

3:21 PM  
