Tuesday, October 31, 2006

Paper: "HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance"

HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance
James Cowling, Daniel Myers, and Barbara Liskov, MIT CSAIL; Rodrigo Rodrigues, INESC-ID and Instituto Superior T├ęcnico; Liuba Shrira, Brandeis University


There are currently two approaches to providing Byzantine-fault-tolerant state machine replication: a replica-based approach, e.g., BFT, that uses communication between replicas to agree on a proposed ordering of requests, and a quorum-based approach, such as Q/U, in which clients contact replicas directly to optimistically execute operations. Both approaches have shortcomings: the quadratic cost of inter-replica communication is unnecessary when there is no contention, and Q/U requires a large number of replicas and performs poorly under contention.

We present HQ, a hybrid Byzantine-fault-tolerant state machine replication protocol that overcomes these problems. HQ employs a lightweight quorum-based protocol when there is no contention, but uses BFT to resolve contention when it arises. Furthermore, HQ uses only 3f+1 replicas to tolerate f faults, providing optimal resilience to node failures.

We implemented a prototype of HQ, and we compare its performance to BFT and Q/U analytically and experimentally. Additionally, in this work we use a new implementation of BFT designed to scale as the number of faults increases. Our results show that both HQ and our new implementation of BFT scale as f increases; additionally our hybrid approach of using BFT to handle contention works well.


Blogger Anthony Nicholson said...

Official scribe comments:

The authors address the problem of building reliable client-server distributed systems. They note that the current state of the art either requires a small number of replicas (3f+1) but has high communication overhead, or reduces communication complexity by requiring a much larger number of replicas (5f+1). The second case still suffers from degraded performance in cases of write contention, however. They propose a hybrid scheme that combines the best aspects of both schemes to achieve a low number of replicas (3f+1) while bounding communication overhead.

Their system uses a two-phase write protocol. First, a client obtains a timestamp grant from each replica. This grant is essentially a promise to execute the given operation at a given sequence number, assuming agreement from a quorum of replicas. In the second phase, the client forms a certificate from 2f+1 matching grants, and send this certificate to all the replicas, which then complete the write operation. A certificate proves that a quorum of replicas have agreed to a given ordering of operations. Importantly, the existence of a certificate precludes existence of conflicting certificate. Replicas are forbidden to have two outstanding grants in progress, and return the currently outstanding grant to clients while a grant in progress, as proof that it is busy. The authors have deployed both their Hybrid Quorum (HQ) and BFT prototypes on Emulab. Their results show that HQ performs better than BFT up until around 25% contention.

Atul Adya from Microsoft Research noted that their protocol allows clients to commit operations on behalf of other clients, and asked if this was a security hole. James noted that, since certificates are free-standing and cryptographically signed, any client can send a certificate to a replica and it will commit faithfully on behalf of the originating client. Bill Blaskey (also from MSR) noted that commit could be painful because potentially thousands of operations occur per second. James noted that all data lives completely in RAM in their experiments, and a reboot is considered a "failure". Petros Maniatis from Intel Research asked why the authors chose to do a two-phase protocol with 3f+1 replicas, rather than a one-phase protocol with 5f+1 replicas. James noted that they could have done so, but decided that 5f+1 is just too many. The last question asked if colluding replicas would just always tell clients that they had an outstanding grant, to force the slow path of the protocol. James responded that yes, in the worst case this would happen.

8:10 PM  

Post a Comment

<< Home