HDFS RBF Resource Isolation Design.

Foreword

The Hadoop community implemented Router-Based Federation (RBF) in HDFS-10467. It is a substantial improvement over the traditional HDFS federation + ViewFS approach: route mapping is resolved on the back end rather than through a client-side view, and the Router resolves the address and forwards the request. Because the mapping lives on the back end, the mount table is managed by the system administrator. More importantly, changing a mount entry no longer requires pushing an updated mapping file to every client; a simple administrator command updates the RBF mount table.

In RBF, the Router sits between the client and the downstream NameNodes (NNs). That makes the Router a critical component: its response time affects every client request. The Router is not yet robust in this respect, however. If one downstream NN responds slowly, the back pressure reaches the Router, because the Router shares a single thread pool for call execution. Requests to other, healthy NNs are then blocked as well, which is a resource isolation problem. This article briefly describes the community's current design. The main idea is to enforce fairness at the Router Client level, which differs from Hadoop's existing FairCallQueue (FCQ).

RBF resource isolation and FCQ resource isolation

Discussions of resource isolation inevitably raise the issue of fairness: without good isolation, fairness cannot be guaranteed, and this is the problem FCQ (FairCallQueue) addresses. Note that fairness here is not absolute. It is relative fairness, measured against a policy that we configure.

The isolation problems that FCQ and RBF address are nevertheless different. FCQ limits the impact that users have on one another within a single cluster, whereas in RBF the "user" is the NN of each cluster. FCQ also has access to the details of each user request, while RBF has only the most basic NN information.

Let's take a look at the schematic model diagram of FCQ:

FCQ's current fairness policy is fairly sophisticated: it assigns a priority according to request frequency, assigns a weight to each priority level, and then schedules calls with weighted round-robin. Within each round, every priority level is guaranteed some processing.
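
The weighted round-robin step can be illustrated with a minimal Java sketch. This is not Hadoop's FairCallQueue code; the class name `WeightedRoundRobinSketch`, its methods, and the weights are hypothetical, and calls are represented as plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative weighted round-robin over priority queues (hypothetical, not Hadoop code). */
final class WeightedRoundRobinSketch {
    private final List<Queue<String>> queues = new ArrayList<>();
    private final int[] weights;   // calls served from each priority level per turn (assumed >= 1)
    private int current = 0;       // index of the queue currently being served
    private int servedInTurn = 0;  // calls already taken from that queue in this turn

    WeightedRoundRobinSketch(int[] weights) {
        this.weights = weights.clone();
        for (int i = 0; i < weights.length; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    /** Enqueues a call at the given priority level (0 = highest). */
    void offer(int priority, String call) {
        queues.get(priority).add(call);
    }

    /** Returns the next call to process, or null if every queue is empty. */
    String poll() {
        // One extra iteration lets us come back to the starting queue with a fresh quota.
        for (int scanned = 0; scanned <= queues.size(); scanned++) {
            Queue<String> q = queues.get(current);
            if (!q.isEmpty() && servedInTurn < weights[current]) {
                servedInTurn++;
                return q.poll();
            }
            // Quota used up or queue empty: move on to the next priority level.
            current = (current + 1) % queues.size();
            servedInTurn = 0;
        }
        return null;
    }
}
```

With weights {2, 1}, calls a1, a2, a3 queued at priority 0, and b1, b2 queued at priority 1, successive `poll()` calls return a1, a2, b1, a3, b2: the lower-priority queue is served less often but is never starved.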

Compared with FCQ's fine-grained, user-level control, an initial implementation of RBF fairness only needs to control the number of permits granted to each cluster. For example, the Router might be allowed to dispatch 6 requests to cluster A per unit of time but only 4 to cluster B. Capping the number of requests in this way prevents an overloaded downstream NN from slowing down the Router.

The following diagram shows the expected effect of RBF fairness control:

With the RBF fairness manager in place, the handlers are divided proportionally among the namespaces at the logical level. Note that this division is purely logical: RBF does not change how handlers are processed in the NN server.
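
As a rough illustration of such a proportional split, the sketch below divides a handler count among namespaces according to configured weights. The class `HandlerShareSketch`, the method `split`, and the rule for distributing the rounding remainder are assumptions made for this example and are not taken from the Hadoop code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that splits a handler count among namespaces by weight. */
final class HandlerShareSketch {

    static Map<String, Integer> split(int totalHandlers, Map<String, Integer> weights) {
        int weightSum = 0;
        for (int w : weights.values()) {
            weightSum += w;
        }
        if (weightSum <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }

        Map<String, Integer> shares = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // Integer division rounds down, so some handlers may be left over.
            int share = totalHandlers * e.getValue() / weightSum;
            shares.put(e.getKey(), share);
            assigned += share;
        }

        // Assumption: hand out the leftover handlers one by one, in iteration order.
        int leftover = totalHandlers - assigned;
        for (Map.Entry<String, Integer> e : shares.entrySet()) {
            if (leftover == 0) {
                break;
            }
            e.setValue(e.getValue() + 1);
            leftover--;
        }
        return shares;
    }
}
```

For 10 handlers and weights of 6 for cluster A and 4 for cluster B, this yields 6 and 4 logical handlers respectively, matching the example ratio above.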

RBF's principle of fairness control

Let's now look at how RBF fairness control works. Fairness control is performed in the Router service, more specifically at the Router Client level. The Router Client can be thought of as the proxy through which the Router forwards requests to the actual NN.

The community implementation introduces a FairManager object that holds the fairness policy. The current default policy limits requests by keeping one Semaphore per namespace (ns).

At initialization, each Semaphore is created with the number of permits configured for its ns.

When a client request arrives, the fairness policy tries to acquire a permit from the Semaphore of the target ns. If all permits for that ns are in use and none has been released, an exception is thrown. This caps the resources any one ns can consume and thereby isolates namespaces from one another inside the Router. When the request completes, the releasePermit method is called to return the permit.
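
The acquire/release cycle described above can be sketched in a few lines of Java. This is a simplified illustration, not the community's actual class: the name `FairManagerSketch`, the `acquirePermit` and `availablePermits` methods, and the exception types are assumptions; only `releasePermit` is named in the text. The permit counts are passed in as a map in place of the real configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Simplified, hypothetical per-namespace permit manager. */
final class FairManagerSketch {
    // One semaphore per namespace; the map is never modified after construction.
    private final Map<String, Semaphore> permits = new HashMap<>();

    /** permitsPerNs stands in for the per-namespace values read from configuration. */
    FairManagerSketch(Map<String, Integer> permitsPerNs) {
        for (Map.Entry<String, Integer> e : permitsPerNs.entrySet()) {
            permits.put(e.getKey(), new Semaphore(e.getValue()));
        }
    }

    /** Takes a permit for the namespace without blocking, or fails fast. */
    void acquirePermit(String ns) {
        Semaphore s = permits.get(ns);
        if (s == null) {
            throw new IllegalArgumentException("Unknown namespace: " + ns);
        }
        if (!s.tryAcquire()) {
            // Assumed behaviour: reject the call rather than tie up a Router handler.
            throw new IllegalStateException("No permit available for namespace " + ns);
        }
    }

    /** Returns a permit once the forwarded call has completed. */
    void releasePermit(String ns) {
        Semaphore s = permits.get(ns);
        if (s != null) {
            s.release();
        }
    }

    int availablePermits(String ns) {
        return permits.get(ns).availablePermits();
    }
}
```

A caller would wrap each forwarded request in `acquirePermit(ns)` and a `finally` block that calls `releasePermit(ns)`, so that a failed request still returns its permit. Because each namespace has its own Semaphore, exhausting the permits of one ns leaves the others untouched.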

The flow chart of RBF's fairness control principle is as follows:

Compared with FCQ, this early RBF fairness policy is less flexible, because it relies entirely on predefined configuration. Future improvements could include adjusting permit counts dynamically and implementing new policies.
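
One possible way to adjust permit counts at runtime, shown here only as a sketch and not as something the community has implemented, is to subclass `java.util.concurrent.Semaphore` and expose its protected `reducePermits` method. The class name `ResizableSemaphore` and the `resize` method are hypothetical.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical semaphore whose total permit count can be changed at runtime. */
final class ResizableSemaphore extends Semaphore {
    private static final long serialVersionUID = 1L;
    private int limit; // total permits currently configured

    ResizableSemaphore(int permits) {
        super(permits);
        this.limit = permits;
    }

    /** Grows or shrinks the permit pool to newLimit without blocking. */
    synchronized void resize(int newLimit) {
        if (newLimit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        int delta = newLimit - limit;
        if (delta > 0) {
            release(delta);        // add permits to the pool
        } else if (delta < 0) {
            reducePermits(-delta); // shrink; available count may go negative for a while
        }
        limit = newLimit;
    }

    synchronized int limit() {
        return limit;
    }
}
```

When the limit is lowered while permits are still held, the available count can temporarily be negative. New acquisitions then fail until enough in-flight requests have released their permits, so the lower limit takes effect gradually without interrupting running calls.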