Tuesday, October 31, 2006

Paper: "Type-Safe Disks"

Type-Safe Disks
Gopalan Sivathanu, Swaminathan Sundararaman, and Erez Zadok, Stony Brook University


We present the notion of a type-safe disk (TSD). Unlike a traditional disk system, a TSD is aware of the pointer relationships between disk blocks that are imposed by higher layers such as the file system. A TSD utilizes this knowledge in two key ways. First, it enables active enforcement of invariants on data access based on the pointer relationships, resulting in better security and integrity. Second, it enables semantics-aware optimizations within the disk system. Through case studies, we demonstrate the benefits of TSDs and show that a TSD presents a simple yet effective general interface to build the next generation of storage systems.


Blogger Anthony Nicholson said...

Official scribe transcript...

On-disk data consists of two things: data and pointers. Pointers convey three vital details:

If block A points to B, then they are related, and B can be accessed through A.

Pointers indicate grouping of blocks, and this grouping likely corresponds to higher-level abstractions.

The relative importance of blocks can be inferred, because some blocks are critical in the tree of pointers.

Unfortunately, today's disks are pointer-oblivious. The file system knows all about the semantics of data. The disk knows about the hardware details. But because the interface between OS and disk is so constrained, little info is exchanged between the two. Everything is just reading and writing blocks. The disk doesn't know the high-level reason for a read or write (a semantic gap in the storage stack). For example, it would be nice for a RAID system to prioritize the handling of metadata blocks, because those are more important.

Type-safe disks try to bridge this semantic gap through pointers. The authors propose an extended interface that enables both type-awareness (disks tracking pointers) and type-safety (disks using pointer info to enforce constraints).

In traditional file systems, the super block points to multiple directory blocks, and each directory block points to multiple inodes. The inodes in turn point to the data blocks that make up each file. Thus, the structure of pointers on disk mirrors higher-level logical file system abstractions. Importantly, understanding pointers lets the disk know which blocks of the disk are actually in use and which are idle. Observing the addition and deletion of pointers gives hints about high-level file operations (such as create and delete).

TSD offloads free-space management from the file system to the disk. This allows automatic garbage collection by reclaiming blocks that lack incoming pointers. The disk can now also prevent unauthorized access to dead blocks. A set of "root blocks" is statically allocated (possibly the first n blocks of the disk) for metadata.

Offloading these tasks to the disk lets the file system component of the operating system shrink in size and complexity. The authors added API calls between the file system and the disk, such as allocate block, create pointer, and delete pointer. They have implemented a prototype in Linux as a pseudo-device driver, and ported the ext2 and vfat file systems to support TSD. The porting effort was minimal (approximately 2 person-weeks).
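To make the pointer-tracking idea concrete, here is a minimal Python sketch of a toy in-memory disk. The class name ToyTSD and the method names alloc_block, create_ptr, and delete_ptr are hypothetical stand-ins for the allocate-block, create-pointer, and delete-pointer calls mentioned above; they are not the authors' actual interface. Reclaiming blocks through reference counts is likewise an assumption about one simple way to find blocks that lack incoming pointers.

```python
class ToyTSD:
    """Toy in-memory model of a pointer-aware disk (illustrative only)."""

    def __init__(self, num_blocks, num_roots=1):
        self.num_blocks = num_blocks
        # Assumption: the first num_roots blocks are the statically allocated roots.
        self.roots = set(range(num_roots))
        self.allocated = set(self.roots)
        self.out_ptrs = {b: set() for b in self.roots}  # block -> blocks it points to
        self.in_count = {b: 0 for b in self.roots}      # block -> incoming pointer count

    def alloc_block(self, parent):
        """Allocate the lowest free block and link it from parent in one step."""
        if parent not in self.allocated:
            raise ValueError("parent block is not allocated")
        for b in range(self.num_blocks):
            if b not in self.allocated:
                self.allocated.add(b)
                self.out_ptrs[b] = set()
                self.in_count[b] = 0
                self.create_ptr(parent, b)
                return b
        raise RuntimeError("disk full")

    def create_ptr(self, src, dst):
        """Record a pointer from src to dst (both must be live blocks)."""
        if src not in self.allocated or dst not in self.allocated:
            raise ValueError("both blocks must be allocated")
        if dst not in self.out_ptrs[src]:
            self.out_ptrs[src].add(dst)
            self.in_count[dst] += 1

    def delete_ptr(self, src, dst):
        """Remove a pointer and reclaim dst if nothing else points to it."""
        if dst not in self.out_ptrs.get(src, ()):
            raise ValueError("no such pointer")
        self.out_ptrs[src].remove(dst)
        self.in_count[dst] -= 1
        self._reclaim(dst)

    def _reclaim(self, block):
        """Free a block with no incoming pointers, then cascade to its children.

        Plain reference counting: cycles of blocks are not reclaimed here.
        """
        if block in self.roots or self.in_count[block] > 0:
            return
        children = self.out_ptrs.pop(block)
        del self.in_count[block]
        self.allocated.discard(block)
        for child in children:
            self.in_count[child] -= 1
            self._reclaim(child)

    def read(self, block):
        """Refuse access to dead (unallocated) blocks."""
        if block not in self.allocated:
            raise PermissionError("block is dead (no incoming pointers)")
        return f"<data of block {block}>"


disk = ToyTSD(num_blocks=8)
inode = disk.alloc_block(parent=0)      # root -> inode
data1 = disk.alloc_block(parent=inode)  # inode -> first data block
data2 = disk.alloc_block(parent=inode)  # inode -> second data block
disk.delete_ptr(0, inode)               # "delete the file" by unlinking its inode
```

In this toy model, removing the single pointer from the root to the inode is enough for the disk to reclaim the inode and both data blocks, and later reads of those blocks are refused.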

Their case study is a security application (ACCESS: A capability conscious extended storage system). This is disk-enforced access control. To access data, an application must provide the disk with a valid capability (an encryption key). Thus the maximum amount of data that can be exposed is that which is currently in use or cached at a higher level. Traditionally, if the OS were compromised then all data on disk would be compromised. ACCESS establishes a security perimeter on the disk itself instead.

ACCESS authenticates groups of blocks (at per-file or per-directory granularity). Implicitly, ACCESS allows path-based capabilities, because it understands pointer relationships. All that higher-level software need do is protect the relevant pointer blocks; the disk will then prevent direct access to the data without the capability to access the necessary pointer blocks. They add an API command to set a capability for a given disk block. The disk stores a key for each protected block (read and/or write keys).
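The path-based check can be sketched roughly as follows in Python. Everything here is hypothetical: the class ToyAccessDisk, the methods add_block, set_capability, and read, and the simplification that each block has a single parent and that an unprotected block inherits the requirement of its nearest protected ancestor. It illustrates the idea of protecting only the pointer block, not how ACCESS is actually implemented.

```python
import hmac


class ToyAccessDisk:
    """Toy model of disk-enforced, path-based read capabilities."""

    def __init__(self):
        self.parent = {}     # block -> block that points to it (single parent assumed)
        self.read_keys = {}  # protected block -> read key (bytes)
        self.data = {}       # block -> contents

    def add_block(self, block, parent=None, data=b""):
        self.parent[block] = parent
        self.data[block] = data

    def set_capability(self, block, read_key):
        """Protect a block; everything reachable only through it is covered too."""
        self.read_keys[block] = read_key

    def _guard(self, block):
        """Return the nearest protected block on the path to the root, or None."""
        while block is not None:
            if block in self.read_keys:
                return block
            block = self.parent[block]
        return None

    def read(self, block, key=None):
        guard = self._guard(block)
        if guard is not None:
            expected = self.read_keys[guard]
            # Constant-time comparison of the presented key with the stored one.
            if key is None or not hmac.compare_digest(key, expected):
                raise PermissionError(f"capability for block {guard} required")
        return self.data[block]


disk = ToyAccessDisk()
disk.add_block(0, data=b"root")
disk.add_block(1, parent=0, data=b"inode")
disk.add_block(2, parent=1, data=b"file contents")
disk.set_capability(1, b"secret-key")        # protect only the inode block
result = disk.read(2, key=b"secret-key")     # data block is covered via its inode
```

Only the inode block carries a key here, yet the data block behind it cannot be read without that key, which mirrors the point that the file system protects the pointer blocks and the disk does the rest.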

Thus, the file system decides what to protect and at what level of granularity, but the disk handles the heavy lifting of making it happen. For their prototype, they modified ext2 to support per-file capabilities (protecting the inode blocks). The authors showed how their framework can easily encode arbitrary trust relationships. ACCESS limits data disclosure and is less CPU-intensive than application-level encryption. Key revocation is simple (since only inodes are protected, not all data blocks), key losses do not destroy data (as they do in encryption systems), and the system enables flexible data sharing among users.

Other potential uses of TSDs include:

On-disk secure deletion (when garbage-collecting unused blocks)

Intelligent replication and placement in RAID (because the RAID system understands the file metadata now)

Semantic consistency at disk level. Using pointer info, ensure atomic commit.

Semantic-guided placement and pre-fetching

They evaluated the TSD infrastructure and ACCESS prototype, using the I/O-intensive Postmark benchmark and a CPU-intensive kernel compile. Postmark shows performance comparable to native ext2. Most of the overhead is CPU time: system time increases because of the bookkeeping their system requires. In the kernel compile, performance was also comparable to native ext2. The main overhead there is due to an asynchronous commit thread in their prototype that is affected by CPU timeslicing.

Michael Scott, University of Rochester: TSD is based strongly on the assumption that file systems use pointers to create a hierarchy, like the traditional Unix file system. How will TSD work with a file system that does something completely different, say something very simple like storing file data in a chain of blocks interconnected by pointers? Also, how do you handle file systems that create pointers from a file to the blocks belonging to other files, say based on cylinder group or something?

A: We can support file systems that do not follow the general hierarchical pointer layout, for example extent-based file systems. To support extent-based file systems, all we need to do is get the file system to create pointers from the pointer block for an extent to all blocks belonging to that extent. Basically, the pointers maintained by TSDs can be different from those maintained by the file system in its own metadata. Similarly, we can handle file systems that create pointers from a file to a block belonging to another file, by just not forwarding that creation to the disk-maintained pointer information.
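As a rough illustration of the extent case in this answer, the Python sketch below expands one extent into per-block pointer creations. The helper extent_pointers and the stand-in create_ptr are hypothetical names, and the block numbers are made up for the example.

```python
def extent_pointers(pointer_block, start, length):
    """Yield one (src, dst) pair per block in the extent [start, start + length)."""
    if length < 0:
        raise ValueError("extent length must be non-negative")
    for dst in range(start, start + length):
        yield (pointer_block, dst)


created = []


def create_ptr(src, dst):
    # Hypothetical stand-in for the disk's create-pointer call; it only records the request.
    created.append((src, dst))


# An extent of 4 blocks starting at block 100, described by pointer block 10.
for src, dst in extent_pointers(pointer_block=10, start=100, length=4):
    create_ptr(src, dst)
```

The file system's own metadata can still store the extent compactly as a start and a length; only the pointers reported to the disk are expanded.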

Margo Seltzer, Harvard University: How is this different than semantically smart disks? You said in that case, one has to supply disks with higher-level info. But aren't you encoding that info in the API calls from FS to disk?
A: SSDs have to do a lot of operations at disk level to infer this info, but in our system we just communicate this info through the API to save that work.
Q: So your answer is that they are the same but the implementation cost is different?
A: No. (laughter)

Jay, Microsoft Research (sorry I missed last name!): If you don't trust the OS that can read application memory, where do you store private keys?
A: Good point. The OS just passes keys down from apps to disk.
Q: But if the OS is compromised, it sees the keys as they pass down. (They decided to take the discussion offline.)

Emin Gun Sirer, Cornell: Historically, disks expose a simple interface. You've added more and more to this interface. How did you decide this interface was sufficient? Why not just move the whole FS to the disk?
A: If the FS were inside disk, then how do you run a database that accesses the disk directly? Our argument is there is a need to communicate generic data to disk.
Q: Do you argue this is the minimal interface?
A: Perhaps not the most minimal...

Chad Verbowski, MSR: How do you keep parity blocks properly in a RAID system?
A: Usually an intermediate layer routes calls to different disks. Each TSD has all its metadata blocks replicated in the RAID. The pointers each TSD has correspond only to the data blocks it contains. Because metadata is replicated across all disks, we don't need to worry about parity for it. Data blocks are handled as normal. If you want to use TSDs, all layers of the software MUST be modified (including the software RAID).

4:37 PM  
