Tuesday, October 31, 2006

Paper: "Distributed Directory Service in the Farsite File System"

Distributed Directory Service in the Farsite File System
John R. Douceur and Jon Howell, Microsoft Research


We present the design, implementation, and evaluation of a fully distributed directory service for Farsite, a logically centralized file system that is physically implemented on a loosely coupled network of desktop computers. Prior to this work, the Farsite system included distributed mechanisms for file content but centralized mechanisms for file metadata. Our distributed directory service introduces tree-structured file identifiers that support dynamically partitioning metadata at arbitrary granularity, recursive path leases for scalably maintaining name-space consistency, and a protocol for consistently performing operations on files managed by separate machines. It also mitigates metadata hotspots via file-field leases and the new mechanism of disjunctive leases. We experimentally show that Farsite can dynamically partition file-system metadata while maintaining full file-system semantics.


Blogger Leonid Ryzhyk said...

Talk summary

Farsite is a distributed serverless file system for networks of workstation. The focus of the current talk was on the design and implementation of a distributed metadata service for Farsite. The design goals included support for fully functional directory move operation including moves across partitions, support for atomic moves, support for load balancing, and hotspot mitigation.

The authors observe that the conventional approach to metadata partitioning based on path names precludes load balancing. Instead they suggest implementing partitioning based on hierarchical immutable file identifiers. Since the identifier hierarchy is not tied to the path hierarchy, load balancing and directory (re)naming become completely orthogonal. File identifiers are compactly represented using a variant of the Elias y' coding. Efficient lookup is achieved by storing the file map in a data structure similar to the Lampson prefix table.

In order to implement atomic rename operations, the recursive path leases mechanism is introduced. It enables safe locking of the chain of file identifiers from the file-system root to a file, without putting excessive pressure on the root server. Finally, the Farsite metadata service minimises false sharing by implementing a fine-grained locking scheme, where a client acquires locks for individual file metadata fields required for the requested access mode, rather than the entire file.


Q: Does the repeated load rebalancing have an impact on the efficiency?
A: Yes, one of the design decisions we made was, when we delegate load we can never coalesce it again. In practice it seems to work well.

Q: Why aren't we already all using Farsite-like things on our mostly empty disks? Is there a drawback to this approach?
A: The greatest limitation is that Byzantine fault tolerance depends on assumptions about how many machines may fail, but if they are all running homogeneous software and have a similar vulnerability then they can all fail/misbehave in the same way.

Q: Suppose we adopt a less pure approach and put some stuff inside protected infrastructure. How does that change things?
A: Without having to care about Byzantine fault tolerance, things would be much simpler. However, metadata load is a huge factor, which we would like to distribute among multiple machines.

6:08 PM  

Post a Comment

<< Home