Tuesday, October 31, 2006

Paper: "Stasis: Flexible Transactional Storage"

Stasis: Flexible Transactional Storage
Russell Sears and Eric Brewer, University of California, Berkeley

Abstract

An increasing range of applications requires robust support for atomic, durable and concurrent transactions. Databases provide the default solution, but force applications to interact via SQL and to forfeit control over data layout and access mechanisms. We argue there is a gap between DBMSs and file systems that limits designers of data-oriented applications.

Stasis is a storage framework that incorporates ideas from traditional write-ahead logging algorithms and file systems. It provides applications with flexible control over data structures, data layout, robustness, and performance. Stasis enables the development of unforeseen variants on transactional storage by generalizing write-ahead logging algorithms. Our partial implementation of these ideas already provides specialized (and cleaner) semantics to applications.

We evaluate the performance of a traditional transactional storage system based on Stasis, and show that it performs favorably relative to existing systems. We present examples that make use of custom access methods, modified buffer manager semantics, direct log file manipulation, and LSN-free pages. These examples facilitate sophisticated performance optimizations such as zero-copy I/O. These extensions are composable, easy to implement and significantly improve performance.

3 Comments:

Blogger Anthony Nicholson said...

Official scribe transcript:

The authors note that systems researchers (in particular) often abandon off-the-shelf storage solutions due to their poor performance, and reinvent the wheel on every project that juggles a large amount of data. The goal of their project was to provide a transactional storage framework that provides good performance "off the shelf" but is easily extensible to meet the needs of users without requiring applications be rewritten (because they are too tied into the underlying plumbing of what's going on). This saves users from having to concern themselves with details of logging, recovery, et cetera, unless they really, really want to.

Statis design principles: (1) Provide simple, thin APIs to low-level components. (2) Ensure high-level semantics via local invariants. (3) All module interactions must be explicit---this lets they place policy decisions in replacable modules.

The authors gave an example of implementing a concurrent hash table on top of Statis. They wrap operations with Statis calls so the underlying layers know how to undo the operation that is being wrapped, in case a transaction needs to be rolled-back. This is nice because there is no need to understand the underlying guts of recovery and logging. They showed that implementing common data structures (linked list, hashtables) on top of Statis is not horribly complicated.

The authors discussed a case study: persistent objects. In other words, systems that support transactional updates over a series of objects. At each update, one marshalls an object, writes old and new versions to a log, then writes the new version to a page cache. To optimize this operation, Statis can conserve log bandwidth by only logging diffs. Also, they halve memory usage by defering page cache updates.

Physical operations have a narrow interface: opcodes, redo() and undo() callbacks, readRecord and setRecord, etc. The nice benefit is high-level invariants are enforced by the way the underlying API is written. No need for application developers to concern themselves with that. The authors described details of their low-level modules that implement log updates and deferring writes to the page cache in order to save memory.

They implemented group commit in the log manager and evaluation shows good performance gains. Since the log manager sees all updates that are produced, it could be extended to transparently implement automatic replication, et cetera. Can also ensure temporal ordering of certain writes (like external synchrony). Similar to type-safe disks, could couple type-system of the disk to a type-system of a higher-level application, since these modules are all extensible. Authors are interested in implementing zero-copy I/O for the page file, or replicating the page file on multiple servers.

Their performance, compared to atop Berkeley DB and SQL, shows resonable performance, with the optimizations described above doubling throughput and halving required memory.

Main contributions: flexible (no hard-coded APIs), customizable semantics (no need to rewrite applications when problems arise), and novel recovery strategies.

Project webpage:

http://www.cs.berkeley.edu/~sears/stasis/

Questions: none.

12:00 PM  
Blogger Atul Adya said...

If I had a transactional file system, would I need Stassis? In particular, Stassis provides page-based transactional storage - a transactional file system would give you that as well.

The difference I see is that Stassis lets you control your undo/redo, i.e., lets you do it logically.

Is there any other difference relative to a transaction FS?
Thanks.

3:05 PM  
Blogger Russell Sears said...

Being able to control undo/redo gives you concurrent transactions, and allows to modify file layout or index directories. Also, Stasis lets you customize page replacement policies, and precisely define when changes should be made durable.

I'm not sure how a transactional FS would handle isolation. If it provided a general-purpose lock manager it could unnecessarily rollback work. Custom isolation mechanisms often need recovery callbacks, knowledge of index structures, and so on.

A transactional file system with mmap and mlock would elegantly avoid double buffering for non-concurrent object workloads. It would be interesting to find a narrow API that would add concurrency to the FS.

We think we could implement a transactional filesystem using Stasis (in user-space). However, Stasis can also support some applications that wouldn't work very well if layered on top of a transactional FS.

9:52 AM  

Post a Comment

<< Home