Paper: "Distributed Directory Service in the Farsite File System"
Distributed Directory Service in the Farsite File System
John R. Douceur and Jon Howell, Microsoft Research
This is an open forum for discussion of papers presented at OSDI 2006. Please add your comments to these postings. We invite comments from anyone who has read the paper or heard the presentation; please note that the papers themselves are not available for free online access until 2007.
Distributed Directory Service in the Farsite File System
John R. Douceur and Jon Howell, Microsoft Research
1 Comments:
Talk summary
============
Farsite is a distributed serverless file system for networks of workstation. The focus of the current talk was on the design and implementation of a distributed metadata service for Farsite. The design goals included support for fully functional directory move operation including moves across partitions, support for atomic moves, support for load balancing, and hotspot mitigation.
The authors observe that the conventional approach to metadata partitioning based on path names precludes load balancing. Instead they suggest implementing partitioning based on hierarchical immutable file identifiers. Since the identifier hierarchy is not tied to the path hierarchy, load balancing and directory (re)naming become completely orthogonal. File identifiers are compactly represented using a variant of the Elias y' coding. Efficient lookup is achieved by storing the file map in a data structure similar to the Lampson prefix table.
In order to implement atomic rename operations, the recursive path leases mechanism is introduced. It enables safe locking of the chain of file identifiers from the file-system root to a file, without putting excessive pressure on the root server. Finally, the Farsite metadata service minimises false sharing by implementing a fine-grained locking scheme, where a client acquires locks for individual file metadata fields required for the requested access mode, rather than the entire file.
Q&A
===
Q: Does the repeated load rebalancing have an impact on the efficiency?
A: Yes, one of the design decisions we made was, when we delegate load we can never coalesce it again. In practice it seems to work well.
Q: Why aren't we already all using Farsite-like things on our mostly empty disks? Is there a drawback to this approach?
A: The greatest limitation is that Byzantine fault tolerance depends on assumptions about how many machines may fail, but if they are all running homogeneous software and have a similar vulnerability then they can all fail/misbehave in the same way.
Q: Suppose we adopt a less pure approach and put some stuff inside protected infrastructure. How does that change things?
A: Without having to care about Byzantine fault tolerance, things would be much simpler. However, metadata load is a huge factor, which we would like to distribute among multiple machines.
Post a Comment
<< Home