CN104168332A - Load balance and node state monitoring method in high performance computing - Google Patents


Info

Publication number
CN104168332A
Authority
CN
China
Legal status
Pending
Application number
CN201410440328.5A
Other languages
Chinese (zh)
Inventor
杨漾
张若曦
刘文彬
苏凯
董召杰
Current Assignee
Information Center of Guangdong Power Grid Co Ltd
Original Assignee
Information Center of Guangdong Power Grid Co Ltd
Application filed by Information Center of Guangdong Power Grid Co Ltd


Abstract

The invention relates to a load balancing and node state monitoring method in high-performance computing. The method reduces the cluster resources it occupies, increases resource utilization, effectively improves server cluster performance, and provides high-quality service to users. Specifically: first, the load weight of every node is computed from the operating state, load, and performance parameters of the server nodes, and a candidate node set for the next task allocation is selected; next, the allocation probability of each node in the candidate set is computed from the load differences, and new requests are assigned to the selected nodes by random probability allocation, which addresses uneven load distribution; finally, a load correction formula is applied to the loads of the nodes that received tasks, improving the balancing effect as well as the reliability and stability of the cluster system.

Description

Load balancing and node state monitoring method in high-performance computing
Technical field
The present invention relates to the field of load balancing in computer networks, and in particular to a load balancing and node state monitoring method in high-performance computing.
Background art
With network applications becoming ubiquitous and Internet services growing sharply, traffic on enterprise networks, campus networks, and wide area networks alike has exceeded earlier estimates. Enterprises depend ever more heavily on their networks, and demand for scalable, reliable distributed systems keeps rising. When an enterprise offers Web services, a rapid increase in visitors requires its Web servers to handle large numbers of concurrent requests; the resulting data traffic and computational load are far beyond what a single device can bear. At the same time, traffic must be divided sensibly among the several network devices that perform the same function, so that no device is overloaded while others leave their processing capacity unused. This problem urgently needs solving, and load balancing mechanisms arose in response. Enterprise requirements on response time, content delivery, service reliability, and timeliness also keep tightening, so a single server supporting an entire website can no longer meet customer needs; a group of servers (a server cluster) is used instead.
Server load balancing (Server Load Balance) builds a server cluster from multiple servers arranged symmetrically: every server has equal standing and can serve external requests on its own. A load balancing technique distributes the external requests arriving at the cluster among the servers according to each server's load state. This significantly speeds up data retrieval, raises the overall processing capacity of the servers, handles massive concurrent access, and improves reliability, availability, and maintainability; the ultimate aim is to shorten server response time and so improve the user experience. Such clustering can approach mainframe performance at minimal cost.
A comparatively sound solution is server cluster load balancing. A server cluster is a set of servers joined into one whole over a local area network (LAN); the cluster serves external clients and appears to them as a single traditional server. A cluster system can make full use of existing resources, strengthen network data processing capacity, improve overall service performance, and make the system more reliable and stable, but solving the cluster's load balancing problem is the key to improving cluster performance.
Load balancing is mainly used to improve the scalability and availability of service programs that run mission-critical tasks on servers. By implementation it is usually divided into hardware and software approaches. Hardware load balancers offer strong processing capacity and load-handling performance but are expensive. Although they integrate several load balancing algorithms, they are inflexible, do not support more optimized balancing or more complex application protocols, and judge traffic only at the network layer, so they cannot effectively track the state of servers and applications. Software implementations can distribute load according to system and application conditions more effectively; they are flexible, cost-effective, and easy to upgrade to the latest balancing techniques. However, the balancing algorithm itself consumes server resources, so its complexity is subject to strict requirements.
In software implementations, the core component of cluster load balancing is the load balancing algorithm, which by design philosophy falls into two main classes: static and dynamic. Static algorithms allocate tasks probabilistically using only fixed statistical information about the cluster and ignore the system's actual operating state, so their balancing effect is poor. Dynamic algorithms assess the system's load state from real-time runtime information collected from the cluster and adjust task allocation accordingly, preventing the system from becoming skewed during long-running operation. Experiments show that dynamic algorithms outperform static ones and still perform acceptably even in extreme cases. Dynamic algorithms generally yield a 30% to 40% performance improvement over static ones, and as research on them advances, dynamic balancing is expected to replace static balancing.
However, dynamic load balancing must solve several problems: obtaining system runtime information more easily, minimizing the resources consumed by exchanging that information, and reducing the impact of the balancing algorithm itself on the cluster. A more optimized load balancing algorithm is therefore needed: one that occupies as few cluster resources as possible, increases resource utilization, effectively improves server cluster performance, and provides high-quality service to users.
Summary of the invention
The primary technical problem solved by the present invention is to provide a load balancing and node state monitoring method in high-performance computing that reduces the cluster resources it occupies, increases resource utilization, effectively improves server cluster performance, and provides high-quality service to users.
To solve the above technical problem, the load balancing and node state monitoring method in high-performance computing provided by the invention uses the load and performance information of the server nodes to distribute cluster requests evenly across the nodes, so that the load each node bears is proportional to its performance. System resources are thus fully used and cluster performance is maximized.
Specifically: the load weight of each node is computed from the node's operating state, load, and performance parameters, and a candidate node set for the next task allocation is selected; the probability that each node in the candidate set receives the task is computed from the load differences and the allocation probability, and the new request is assigned to a node chosen by random probability allocation, which addresses uneven load distribution; finally, a load correction formula is applied to the load of the node that received the task, improving the balancing effect and strengthening the reliability and stability of the cluster system.
The load balancing algorithm based on load weights is as follows:
1. Set a threshold $\varepsilon$.
2. On each new request, check the timer to determine whether the server node states need updating, and update them if so.
3. From each node's operating state, compute its performance $C(S_i)$ and load $L(S_i)$, and from these its load weight $W(S_i)$ and load difference $\Delta L(S_i)$.
4. Choose the candidate allocation node set $J$ from the load weights. First choose the node $S_m$ satisfying $W(S_m) = \min\{W(S_i)\}$, $i = 0, 1, \ldots, n-1$, and add it to $J$. Then add to $J$ every other node $S_i$ that satisfies $W(S_i) < W(S_m) + \varepsilon$, $i = 0, 1, \ldots, n-1$.
5. Compute the load allocation probability $P(S_i)$ of each node in $J$.
6. Select a node from $J$ by random probability allocation and assign the task to it.
7. Correct the load of the selected node for use in the next allocation.
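Step 4 is the easiest step to misread, so here is a minimal Python sketch of it. It assumes the load weights $W(S_i)$ are already available as a list indexed $0..n-1$; the function name `candidate_set` is hypothetical. Because the inequality is strict, $\varepsilon = 0$ leaves only $S_m$ in $J$, so $\varepsilon$ effectively controls how many near-idle nodes compete for the next request.

```python
def candidate_set(weights, eps):
    """Step 4: build the candidate allocation set J from load weights W(S_i).

    weights: list of W(S_i), indexed 0..n-1.
    eps: the threshold epsilon from step 1.
    Returns node indices, with S_m (the minimum-weight node) first.
    """
    if not weights:
        raise ValueError("at least one node is required")
    # S_m: the node with the smallest load weight (first one on ties)
    m = min(range(len(weights)), key=weights.__getitem__)
    J = [m]
    # Every other node whose weight is strictly below W(S_m) + eps joins J
    for i, w in enumerate(weights):
        if i != m and w < weights[m] + eps:
            J.append(i)
    return J


print(candidate_set([0.5, 0.2, 0.25, 0.9], 0.1))  # [1, 2]
print(candidate_set([0.5, 0.2, 0.25, 0.9], 0.0))  # [1]
```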
The performance $C(S_i)$ of node $S_i$ is evaluated from the server node's CPU count $n$, CPU frequency $C(C_i)$, disk I/O speed $C(D_i)$, memory size $C(M_i)$, and network bandwidth $C(N_i)$, using formula (8.1):
$$C(S_i) = k_1\, n\, C(C_i) + k_2\, C(D_i) + k_3\, C(M_i) + k_4\, C(N_i) \qquad (8.1)$$
where $k_1, \ldots, k_4$ are the weighting parameters of the indices and reflect how strongly each index influences server node performance.
The load $L(S_i)$ of node $S_i$ is evaluated from CPU usage $L(C_i)$, memory usage $L(M_i)$, disk I/O utilization $L(D_i)$, and network bandwidth utilization $L(N_i)$, using formula (8.2):
$$L(S_i) = r_1\, L(C_i) + r_2\, L(M_i) + r_3\, L(D_i) + r_4\, L(N_i) \qquad (8.2)$$
where $r_1, \ldots, r_4$ are the weighting parameters of the indices and reflect how strongly each index affects different types of service.
The load weight $W(S_i)$ of a node is defined as the ratio of the node load $L(S_i)$ to the node performance $C(S_i)$, computed by formula (8.3):
$$W(S_i) = \frac{L(S_i)}{C(S_i)} \qquad (8.3)$$
The load difference $\Delta L(S_i)$ of a node is defined as the difference between the maximum load weight $W_{\mathrm{MAX}}$ over all nodes and the node's own load weight, multiplied by the node's performance, computed by formula (8.4):
$$\Delta L(S_i) = \left(W_{\mathrm{MAX}} - W(S_i)\right) C(S_i) \qquad (8.4)$$
The load allocation probability $P(S_i)$ of a node is its load difference $\Delta L(S_i)$ as a fraction of the sum of the load differences of all nodes, computed by formula (8.5):
$$P(S_i) = \frac{\Delta L(S_i)}{\sum_{j=0}^{n-1} \Delta L(S_j)} \qquad (8.5)$$
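The allocation probabilities are computed only for the nodes in set $J$, and a suitable node is chosen from $J$ to take the load; the node allocation probability is computed by formula (8.6):
$$P(S_i) = \frac{\Delta L(S_i)}{\sum_{S_j \in J} \Delta L(S_j)}, \qquad S_i \in J \qquad (8.6)$$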
The incremental load $\delta$ is the amount of load added to a server node when one request of a given service type is assigned to it, computed by formula (8.7):
$$\delta = \frac{L(S)}{N} \qquad (8.7)$$
where $N$ is the number of requests of that service type on the node and $L(S)$ is the load that the $N$ requests impose on the node.
The correction to the server node load is computed by formula (8.8), where $\delta$ is the incremental load value, $C(S)$ is the performance of the node used when computing the incremental load, and $L(S_i)$ and $C(S_i)$ are the load and performance of the node being corrected:
$$L(S_i) \leftarrow L(S_i) + \delta\, \frac{C(S)}{C(S_i)} \qquad (8.8)$$
Compared with the prior art, the technical effects of the present invention are as follows. Building on several existing load balancing algorithms, the load balancing and node state monitoring method in high-performance computing of the present invention adopts a dynamic feedback load balancing algorithm based on load weights. It relies mainly on factors such as each node's processing capacity and actual load, introduces the concepts of load weight and load difference, and uses them to guide task allocation, so that each server node works according to its capacity as far as possible. This exploits the advantages of the cluster system, keeps the system stable, and improves reliability and availability. Using mainly the load and performance information of the server nodes, the method distributes cluster requests as evenly, fairly, and reasonably as possible across the nodes through the computations described, ensuring that the load each node bears is proportional to its performance, so that system resources are fully used and cluster performance is maximized.
Brief description of the drawings
To explain clearly the inventive principles of the present invention and its technical advantages over existing products, possible embodiments are described below with reference to the drawings, by way of non-limiting examples applying those principles. In the drawings:
Fig. 1 is the probability-space distribution diagram of the candidate allocation nodes of the present invention;
Fig. 2 is the dynamic feedback model diagram of the present invention.
Embodiment
The load balancing and node state monitoring method in high-performance computing of the present invention adopts a dynamic feedback load balancing algorithm based on load weights. It relies mainly on factors such as each node's processing capacity and actual load, introduces the concepts of load weight and load difference, and uses them to guide task allocation, so that each server node works according to its capacity as far as possible. This exploits the advantages of the cluster system, keeps the system stable, and improves reliability and availability.
The dynamic feedback load balancing algorithm mainly uses the load and performance information of the server nodes and, through the computations described, distributes cluster requests as evenly, fairly, and reasonably as possible across the nodes, ensuring that the load each node bears is proportional to its performance, so that system resources are fully used and cluster performance is maximized.
The core idea of the dynamic feedback load balancing algorithm is as follows: considering parameters such as the operating state, load, and performance of the server nodes, compute the load weight of each node and select the candidate node set for the next task allocation; compute the probability that each node in the candidate set receives the task using formulas such as the load difference and allocation probability, and assign the new request to a node chosen by random probability allocation, which largely resolves uneven load distribution; finally, apply the load correction formula to the load of the node that received the task, improving the balancing effect as far as possible and strengthening the reliability and stability of the cluster system.
The load balancing algorithm based on load weights is as follows:
1. Set a threshold $\varepsilon$ (the range of values for $\varepsilon$ and the basis for choosing its size need to be specified here).
2. On each new request, check the timer to determine whether the server node states need updating, and update them if so (the conditions or criteria for "needing to update the server node states" need to be specified here).
3. From each node's operating state, compute its performance $C(S_i)$ and load $L(S_i)$, and from these its load weight $W(S_i)$, load difference $\Delta L(S_i)$, and related quantities.
4. Choose the candidate allocation node set $J$ from the load weights. First choose the node $S_m$ satisfying $W(S_m) = \min\{W(S_i)\}$, $i = 0, 1, \ldots, n-1$, and add it to $J$. Then add to $J$ every other node $S_i$ that satisfies $W(S_i) < W(S_m) + \varepsilon$, $i = 0, 1, \ldots, n-1$.
5. Use formula (8.6) to compute the load allocation probability $P(S_i)$ of each node in the candidate set $J$.
6. Select a node from $J$ by random probability allocation and assign the task to it.
7. Use formula (8.8) to correct the load of the selected node for use in the next allocation request.
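Steps 3 through 7 can be strung together for a single request. The Python sketch below is one possible reading of the procedure, not the patent's implementation: it takes the recorded loads and performances as plain lists, uses $W_{\mathrm{MAX}}$ over all nodes as in (8.4) but normalizes probabilities over $J$ as in (8.6), and falls back to a uniform choice when every candidate has zero load difference, a case the text does not address. The name `dispatch` and its parameters are hypothetical, and step 2 (the timer-driven refresh) is left to the caller.

```python
import random


def dispatch(load, perf, eps, delta, c_ref, rng):
    """Assign one request using steps 3-7; returns the chosen node index.

    load, perf: recorded L(S_i) and C(S_i) per node (same order).
    eps: threshold from step 1; delta: incremental load (8.7) measured on
    a node of performance c_ref. load[chosen] is corrected in place (8.8).
    """
    n = len(perf)
    # Step 3: load weights W(S_i) = L(S_i) / C(S_i), formula (8.3)
    w = [load[i] / perf[i] for i in range(n)]
    w_max = max(w)
    # Step 4: S_m plus every node whose weight is below W(S_m) + eps
    m = min(range(n), key=w.__getitem__)
    J = [m] + [i for i in range(n) if i != m and w[i] < w[m] + eps]
    # Step 5: load differences (8.4), probabilities normalized over J (8.6)
    diff = [(w_max - w[i]) * perf[i] for i in J]
    total = sum(diff)
    if total > 0:
        probs = [d / total for d in diff]
    else:  # all candidates equally busy: uniform choice (assumption)
        probs = [1 / len(J)] * len(J)
    # Step 6: random probability allocation; scaling the draw by sum(probs)
    # guarantees it falls inside one node's segment despite rounding
    r = rng.random() * sum(probs)
    acc, chosen = 0.0, J[-1]
    for i, p in zip(J, probs):
        acc += p
        if r < acc:
            chosen = i
            break
    # Step 7: correct the recorded load of the chosen node, formula (8.8)
    load[chosen] += delta * c_ref / perf[chosen]
    return chosen


rng = random.Random(1)
loads = [0.9, 0.2, 0.3]
print(dispatch(loads, [1.0, 1.0, 2.0], eps=0.1, delta=0.05, c_ref=1.0, rng=rng))
```

With these inputs node 0 is too busy to enter $J$, and node 2 (twice the performance of node 1) receives roughly two thirds of the requests.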
The dynamic feedback load balancing algorithm works mainly from two aspects of each node, its load and its performance, and combines them both in measuring node state and in dividing the load; computing node load and performance is therefore especially important. The nodes $S = (S_1, S_2, \ldots, S_n)$ of a cluster system are often heterogeneous: server hardware configurations vary, and in clusters built from older equipment or expanded step by step, processing capacity differs greatly between nodes. Given these hardware differences, nodes cannot be treated identically when their load state is considered; otherwise high-end nodes may stay idle for long periods while low-end nodes become overloaded, degrading system performance or even crashing.
We therefore define a server node's performance to distinguish the processing capacity of different nodes, and treat nodes differently according to their performance when allocating load, so that more capable nodes do more work and the load is balanced. There are several ways to compute server node performance. For example, the same computational task can be run on different servers and each node's performance characterized by its completion time. This method is targeted and accurate, but because the benchmark task is fixed, the resulting performance figures generalize poorly and cannot be adjusted freely. The most common method derives node performance from the server's hardware configuration parameters through a specific calculation (such as a weighted sum); performance computed this way is more general, leaves much room for adjustment, and is easy to update while the system runs.
Because cluster nodes are heterogeneous and different hardware parameters contribute very differently to performance, this method uses a weighted sum of hardware parameters. The performance $C(S_i)$ of node $S_i$ is evaluated mainly from the server node's CPU count $n$, CPU frequency $C(C_i)$, disk I/O speed $C(D_i)$, memory size $C(M_i)$, and network bandwidth $C(N_i)$, using formula (8.1):
$$C(S_i) = k_1\, n\, C(C_i) + k_2\, C(D_i) + k_3\, C(M_i) + k_4\, C(N_i) \qquad (8.1)$$
where $k_1, \ldots, k_4$ are the weighting parameters of the indices and reflect how strongly each index influences server node performance. Server performance also depends on the type of service the server provides, since different services depend on the indices to different degrees: the HTTP service (HTTPService) mainly demands CPU speed and memory, whereas the file transfer service (FTPService) emphasizes hard disk I/O and network throughput.
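A minimal sketch of formula (8.1) with service-dependent weights. The raw indices have different units (GHz, MB/s, GB, Mbit/s), so this sketch assumes each index is divided by the same index on a reference node before weighting; the patent does not say how units are reconciled. The values in `WEIGHTS_K` are illustrative placeholders chosen only to mirror the stated tendencies (HTTP leaning on CPU and memory, FTP on disk I/O and network), not values from the source.

```python
# Hypothetical weight profiles k_1..k_4; only their tendencies come from the text.
WEIGHTS_K = {
    "http": {"cpu": 0.40, "disk": 0.15, "mem": 0.30, "net": 0.15},
    "ftp": {"cpu": 0.15, "disk": 0.35, "mem": 0.15, "net": 0.35},
}


def performance(node, ref, k):
    """C(S_i) as the weighted sum of formula (8.1).

    node, ref: dicts with n_cpu, cpu_ghz, disk_mbps, mem_gb, net_mbps.
    Each index is divided by the reference node's value (an assumption made
    here so that terms with different units can be added).
    """
    cpu = (node["n_cpu"] * node["cpu_ghz"]) / (ref["n_cpu"] * ref["cpu_ghz"])
    return (k["cpu"] * cpu
            + k["disk"] * node["disk_mbps"] / ref["disk_mbps"]
            + k["mem"] * node["mem_gb"] / ref["mem_gb"]
            + k["net"] * node["net_mbps"] / ref["net_mbps"])


ref = {"n_cpu": 8, "cpu_ghz": 2.4, "disk_mbps": 200, "mem_gb": 32, "net_mbps": 1000}
io_box = dict(ref, disk_mbps=400, net_mbps=2000)  # double disk and network
for service, k in WEIGHTS_K.items():
    print(service, round(performance(io_box, ref, k), 2))  # http 1.3, then ftp 1.7
```

The same I/O-heavy machine scores higher under the FTP profile than under the HTTP profile, which is the service dependence described above.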
The indices used to evaluate server node performance are relatively fixed and rarely change once the hardware configuration is set. A node's load state, however, changes constantly and is strongly affected by many factors, making it a key factor in cluster load balancing. The ultimate goal of the cluster system is to adjust the distribution of cluster traffic promptly and accurately according to node load states, achieving overall load balance. A standard for measuring load state must therefore be defined, enough load information collected to compute each node's load state, and the result used to guide the reasonable allocation of cluster tasks.
There are many ways to compute node load state, and they have matured alongside dynamic load balancing algorithms. The earliest approach used the number of processes in the cluster system as the server load index. Modern operating systems, however, are generally multi-user and multi-process, and even a lightly loaded node may run many processes, such as critical system processes, daemons, or other services. Measuring node load by process count is therefore inaccurate.
The currently more widely accepted approach evaluates a node's load state from the utilization of the indices that most affect server node performance, with other specific factors also taken into account in special applications. This method follows that approach and uses indices consistent with those chosen for node performance: the load $L(S_i)$ of node $S_i$ is evaluated mainly from CPU usage $L(C_i)$, memory usage $L(M_i)$, disk I/O utilization $L(D_i)$, and network bandwidth utilization $L(N_i)$, using formula (8.2):
$$L(S_i) = r_1\, L(C_i) + r_2\, L(M_i) + r_3\, L(D_i) + r_4\, L(N_i) \qquad (8.2)$$
As in the performance formula (8.1), $r_1, \ldots, r_4$ are the weighting parameters of the indices and reflect how strongly each index affects different types of service; the weights depend on the service type, and different service types use different weights. These parameters do not necessarily coincide with $k$, and in practice they can be tuned according to runtime conditions to achieve a better balancing effect.
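Formula (8.2) is simpler to apply because utilizations are already dimensionless fractions. The sketch below assumes each utilization has been sampled as a value in $[0, 1]$; the `R_HTTP` profile is a hypothetical example, kept separate from the performance weights $k$ because the text says the two need not coincide.

```python
# Hypothetical load weights r_1..r_4 for an HTTP-type service; they are
# deliberately not the same numbers as any performance weights k.
R_HTTP = {"cpu": 0.5, "mem": 0.3, "disk": 0.1, "net": 0.1}


def node_load(util, r):
    """L(S_i) as the weighted sum of formula (8.2).

    util: utilization of each index as a fraction in [0, 1].
    """
    for key, u in util.items():
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"{key} utilization {u} outside [0, 1]")
    return sum(r[key] * util[key] for key in ("cpu", "mem", "disk", "net"))


print(node_load({"cpu": 0.8, "mem": 0.5, "disk": 0.2, "net": 0.1}, R_HTTP))
```

Retuning $r$ at runtime, as the text suggests, amounts to replacing the profile dictionary between update periods.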
Load weights and load difference:
With methods defined for evaluating server load and performance, the load state and performance of each node can be obtained; how to use this information to guide the distribution of cluster traffic becomes a central problem of the algorithm. The simplest method selects a suitable node for each new task directly from the computed node load and performance. For example, the subset of nodes with lower load is filtered out according to node load states, the probability that a new request is assigned to each of them is computed from node performance, and finally one node is selected by a specific selection method and the task is assigned to it. This method is simple in principle and easy to implement, but performs poorly, because a node's load state is related to its performance: if two nodes with different performance have the same load state, their remaining processing capacities differ, so a server's busy or idle condition cannot be determined from its load state alone.
Therefore, to allocate cluster tasks in a more balanced and reasonable way, the concept of the node load weight is introduced to compute each node's overall load and reflect the busyness of server nodes more accurately. The load weight $W(S_i)$ of a node is defined as the ratio of the node load $L(S_i)$ to the node performance $C(S_i)$, computed by formula (8.3). A larger weight means a busier node with less remaining processing capacity; a smaller weight means an idler node with more remaining capacity. The load weight thus indicates a server's busyness and the size of its remaining processing capacity, and can guide task allocation.
$$W(S_i) = \frac{L(S_i)}{C(S_i)} \qquad (8.3)$$
Load weights reflect node busyness well and give a clear picture of each node's load state, but the weights alone do not quantify how busy a server node is in absolute terms. To represent node busyness more precisely and compute each node's remaining processing capacity on a sound basis, the concept of load difference is introduced. The load difference $\Delta L(S_i)$ of a node is defined as the difference between the maximum load weight $W_{\mathrm{MAX}}$ over all nodes and the node's own load weight, multiplied by the node's performance, computed by formula (8.4):
$$\Delta L(S_i) = \left(W_{\mathrm{MAX}} - W(S_i)\right) C(S_i) \qquad (8.4)$$
By definition, the load difference of the busiest server node is zero. The load difference reflects the remaining processing capacity of each node, so using it when computing the probability of assigning load to a node gives more accurate results.
Allocation probability calculation and random probability allocation:
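The following sketch computes (8.3) and (8.4) and illustrates the point made above: two nodes with the same load but different performance end up with different load weights, and the busiest node gets a load difference of exactly zero. The function names are hypothetical.

```python
def load_weights(load, perf):
    """W(S_i) = L(S_i) / C(S_i), formula (8.3)."""
    return [load_i / perf_i for load_i, perf_i in zip(load, perf)]


def load_differences(load, perf):
    """Delta L(S_i) = (W_MAX - W(S_i)) * C(S_i), formula (8.4)."""
    w = load_weights(load, perf)
    w_max = max(w)  # W_MAX over all nodes
    return [(w_max - w_i) * perf_i for w_i, perf_i in zip(w, perf)]


# Same measured load (0.5) on a node of performance 1 and one of performance 2
print(load_weights([0.5, 0.5], [1.0, 2.0]))      # [0.5, 0.25]
print(load_differences([0.5, 0.5], [1.0, 2.0]))  # [0.0, 0.5]
```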
Probability-based allocation is a relatively fair method. To distribute system load more evenly, the allocation probability of each server node is computed first, and a suitable node is then selected according to these probabilities and an allocation method. To make allocation more reasonable, the load allocation probability $P(S_i)$ of a node is defined as its load difference $\Delta L(S_i)$ as a fraction of the sum of the load differences of all nodes, computed by formula (8.5):
$$P(S_i) = \frac{\Delta L(S_i)}{\sum_{j=0}^{n-1} \Delta L(S_j)} \qquad (8.5)$$
In practice, however, cluster systems are usually large, with many server nodes. To reduce the resources consumed by computing allocation probabilities and to make allocation more sound, probabilities are not computed for all nodes. Instead, a subset of nodes with lower load weights is selected to form the candidate allocation set $J$, allocation probabilities are computed for the nodes in $J$, and a suitable node is chosen from $J$ to take the load. The improved node allocation probability is computed by formula (8.6):
$$P(S_i) = \frac{\Delta L(S_i)}{\sum_{S_j \in J} \Delta L(S_j)}, \qquad S_i \in J \qquad (8.6)$$
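A sketch of (8.5) and (8.6) in one function: passing no candidate list normalizes over all nodes, and passing $J$ normalizes over $J$ only. The uniform fallback when every load difference in the set is zero (all nodes equally busy) is an assumption added here to avoid dividing by zero; the source does not cover this case.

```python
def allocation_probabilities(diff, J=None):
    """Formula (8.5) when J is None, formula (8.6) over candidate indices J.

    diff: load differences Delta L(S_i) for all nodes.
    Returns {node index: P(S_i)}.
    """
    idx = list(range(len(diff))) if J is None else list(J)
    total = sum(diff[i] for i in idx)
    if total == 0:
        # Every node in the set is equally busy: uniform fallback (assumption).
        return {i: 1 / len(idx) for i in idx}
    return {i: diff[i] / total for i in idx}


diff = [0.0, 0.5, 1.5, 0.0]
print(allocation_probabilities(diff))          # {0: 0.0, 1: 0.25, 2: 0.75, 3: 0.0}
print(allocation_probabilities(diff, [1, 2]))  # {1: 0.25, 2: 0.75}
```

Restricting the sum to $J$ means only $|J|$ probabilities are computed per request instead of $n$, which is the resource saving the text describes.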
There are many ways to select the allocation node from the probabilities $P(S_i)$ of the nodes in $J$. The most common is to choose the node with the highest probability, but this can make that node's load rise sharply, after which the node with the next-highest probability is chosen, and so on, which makes the allocation uneven. To keep load allocation from becoming deterministic in this way, the node is selected by random probability allocation.
Random probability allocation, simply put, determines the node that receives a task from the nodes' allocation probabilities and a random number generated by the system. Suppose set $J$ contains $n$ nodes and each node $S_i$ has allocation probability $P(S_i)$, obtained from formula (8.6); the probabilities of the $n$ nodes sum to 1 and form the probability space shown in Fig. 1.
When a new request arrives, the system generates a random number in the interval $[0, 1]$ and assigns the request to the server node whose segment of the candidate nodes' probability space contains that number. This achieves random allocation, keeps task allocation from becoming deterministic, and reduces the likelihood of the system becoming skewed.
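The random probability allocation described above is standard roulette-wheel (inverse-CDF) selection. A minimal Python sketch, assuming the probabilities come from (8.6) as (node, probability) pairs: cumulative sums mark each node's segment of $[0, 1)$, and `bisect` finds the segment containing the random number. The function name `pick_node` is hypothetical. Scaling the draw by the final cumulative sum is a small safeguard for probabilities that sum to slightly less than 1 after rounding.

```python
import bisect
import itertools
import random


def pick_node(probs, rng):
    """Roulette-wheel selection over (node, P(S_i)) pairs.

    Each node owns a segment of [0, 1) whose length is its probability; the
    node whose segment contains the random draw receives the request.
    Zero-probability nodes own empty segments and are never chosen.
    """
    nodes = [node for node, _ in probs]
    bounds = list(itertools.accumulate(p for _, p in probs))
    r = rng.random() * bounds[-1]
    return nodes[bisect.bisect_right(bounds, r)]


rng = random.Random(7)
probs = [(1, 0.25), (2, 0.75)]
picks = [pick_node(probs, rng) for _ in range(10_000)]
print(picks.count(2) / len(picks))  # roughly 0.75
```

Over many draws each node is chosen in proportion to its probability, whereas always picking the maximum would send every request to the same node until the next state update.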
Dynamical feedback and load correction:
The ultimate goal of the cluster system is to allocate tasks reasonably across the server nodes. The ideal approach is to query the state of every node on each allocation and then select a suitable node. Although this is the most accurate, it adds system overhead and lowers cluster performance; during request peaks in particular, querying node states greatly increases response time. The cluster system can therefore only update each node's state information periodically, and within an update period each server node's actual load may deviate from the value recorded on the load distributor at the last update. Within the update period, then, a node's load must be corrected promptly after each new task is assigned, to keep the deviation small; when the update period arrives, the records held by the load balancer are refreshed by querying the nodes. This "query-update-correct" cycle is called the dynamic feedback mechanism.
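The query-update-correct cycle can be sketched as a small record kept on the load distributor. In this Python sketch, which illustrates the mechanism rather than the patent's design, `probe` is a hypothetical callback that queries all nodes and returns their measured loads, and `clock` is injectable so the period logic can be exercised without real nodes; between refreshes the only changes to the record come from corrections.

```python
import time


class LoadTable:
    """Load distributor's record of node loads (query-update-correct sketch)."""

    def __init__(self, probe, period, clock=time.monotonic):
        self.probe = probe    # hypothetical: queries all nodes, returns loads
        self.period = period  # update period, in the clock's units
        self.clock = clock
        self.loads = list(probe())  # initial query
        self.stamp = clock()

    def current(self):
        """Return recorded loads, re-querying only once a period has passed."""
        now = self.clock()
        if now - self.stamp >= self.period:
            self.loads = list(self.probe())  # query + update
            self.stamp = now
        return self.loads

    def correct(self, i, increment):
        """Adjust node i's recorded load between refreshes (correct)."""
        self.loads[i] += increment


# Simulated clock and probe so the example runs without real nodes
t = [0.0]
samples = iter([[0.1, 0.2], [0.4, 0.4]])
table = LoadTable(lambda: next(samples), period=5.0, clock=lambda: t[0])
table.correct(0, 0.05)
t[0] = 3.0
print(table.current())  # still the corrected record, no new query
t[0] = 5.0
print(table.current())  # [0.4, 0.4] after the periodic query
```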
The dynamic feedback model of the load-weight-based dynamic feedback load balancing algorithm is shown in Fig. 2. $F(L, C)$ represents the load regulation module of the cluster system: from node loads and performance it computes node load weights, allocation probabilities, and related values, and supplies them to the load distributor, which processes client requests and completes their allocation. The server cluster consists of many heterogeneous server nodes, which respond to the requests sent by the distributor and periodically report their state information to the load regulation module.
Because the load of each node cannot be queried in real time within the update period, an effective method is needed to correct node load states promptly. For this purpose the concept of incremental load is introduced to predict node load, ensuring that the system does not become skewed within the update period. The incremental load $\delta$ is the amount of load added to a server node when one request of a given service type is assigned to it, computed by formula (8.7):
$$\delta = \frac{L(S)}{N} \qquad (8.7)$$
where $N$ is the number of requests of that service type on the node and $L(S)$ is the load that the $N$ requests impose on the node. To improve system stability, the incremental load can be adjusted dynamically while the system runs. When computing the incremental load, to reduce the influence of other factors, the server node provides only a single type of service, the number of requests must reach a sufficient magnitude, and methods such as averaging repeated measurements are used to ensure that the computed incremental load is valid.
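A sketch of how $\delta$ in (8.7) might be estimated under the measurement conditions just listed: the node serves a single service type, samples with too few requests are discarded, and the remaining ratios are averaged. The `min_requests` cutoff of 100 is a hypothetical default, since the text only says the request count must reach a sufficient magnitude, and the sample values are made up for illustration.

```python
from statistics import mean


def incremental_load(samples, min_requests=100):
    """Estimate delta = L(S) / N, formula (8.7), from repeated measurements.

    samples: (L(S), N) pairs measured on a node that serves only one
    service type; L(S) is the load caused by N requests of that type.
    min_requests: hypothetical cutoff for a sufficient number of requests.
    """
    ratios = [load / n for load, n in samples if n >= min_requests]
    if not ratios:
        raise ValueError("no measurement reached min_requests")
    return mean(ratios)


# Three illustrative measurements; the last has too few requests and is dropped
print(incremental_load([(0.40, 100), (0.66, 150), (0.05, 10)]))
```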
Once the incremental load of a service is known, the following applies within an update interval: the load of a node that receives a new task is increased by the increment, and the load of a node that completes a task is decreased by the increment. The recorded values then reflect each node's real-time load more faithfully. This method is simple and corrects node load quite well. However, it requires monitoring task completion on each node in real time, or having each node report task completion to the load scheduler. Either option increases the burden on service nodes, consumes network bandwidth, and lengthens the response time of the cluster system. To avoid this while still correcting node load information, the load-weight-based dynamic feedback load balancing algorithm does not detect task completion on nodes in real time. Instead, it accounts for completed tasks through the dynamic adjustment of the incremental load.
Within an update interval, if a node completes more tasks than it is newly assigned, its total load decreases. When the update cycle expires and the node's load is refreshed by querying, the new value will be lower than the previously recorded one, so the corrected incremental load value also decreases. In the next update cycle, this compensates for not correcting the node's load when it finished tasks, and achieves the same reduction of server load. The premise of this method is that the requests reaching the cluster system change slowly and do not fluctuate sharply. When the requests do change, the incremental load recomputed after the next node-load update changes accordingly and regulates node load well during the following period. This largely compensates for not monitoring task completion in real time, and achieves the same regulating effect at lower system cost. Accordingly, the load of a server node is corrected with formula (8.8), where δ is the incremental load value, C(S) is the performance of the node used when the incremental load was measured, and L(Si) and C(Si) are the load and performance of the node being corrected.
……(8.8)。
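Formula (8.8) itself is not reproduced in the source, only its symbols. One plausible reading, assumed here rather than taken from the patent, scales the measured increment δ by the ratio of the reference node's performance C(S) to the target node's performance C(Si), so the same request raises a faster node's load less: L(Si) ← L(Si) + δ·C(S)/C(Si). A minimal sketch with a hypothetical function name:

```python
def corrected_load(load_i: float, perf_i: float, delta: float, perf_ref: float) -> float:
    """Assumed form of formula (8.8): L(Si) + delta * C(S) / C(Si).

    delta was measured on a reference node of performance C(S) (perf_ref);
    scaling by C(S)/C(Si) assumes a request costs proportionally less on a
    higher-performance node. This form is an illustration, not the patent's text.
    """
    if perf_i <= 0:
        raise ValueError("node performance must be positive")
    return load_i + delta * perf_ref / perf_i
```

With δ = 0.1 measured on a node of performance 1.0, a node of performance 2.0 and current load 0.4 would be corrected to 0.45, while a node matching the reference would go to 0.5.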
To address the shortcomings of traditional balancing algorithms, the load-weight-based load balancing algorithm combines information such as the performance and load of each server node. It guides task distribution by computing load weights and allocation probabilities, which improves the load-balancing effect, and it uses load correction to keep the system from becoming skewed, which improves system stability.
Obviously, above-described embodiment is only for example of the present invention is clearly described, and is not the restriction to embodiments of the present invention.For those of ordinary skill in the field, can also make other changes in different forms on the basis of the above description.Here without also giving exhaustive to all execution modes.And these belong to apparent variation that spirit of the present invention extended out or variation still among protection scope of the present invention.

Claims (10)

1. A load balancing and node state monitoring method in high-performance computing, characterized by comprising:
A. according to the running state, load, and performance parameters of the server nodes, computing the load weight of each node and selecting the candidate node set for the next task allocation;
B. according to the load differences and allocation probabilities, computing the probability that each node in the candidate node set is allocated a task, and assigning the new request to the selected node by random probabilistic distribution;
C. correcting the load of the node to which the task was allocated using a load correction formula.
2. The load balancing and node state monitoring method in high-performance computing according to claim 1, characterized in that the load-weight-based load balancing algorithm comprises:
1. setting a threshold ε;
2. whenever a new request arrives, checking the timer to determine whether the server node states need to be updated, and updating them if so;
3. computing the performance C(Si) and load L(Si) of each node from its running state, and from these computing the load weight W(Si) and the load difference ΔL(Si);
4. selecting the candidate allocation node set J according to the load weights of the nodes: first select the node Sm that satisfies W(Sm) = min{W(Si)}, i = 0, 1, …, n−1;
then, if any other node Si satisfies W(Si) < W(Sm) + ε, i = 0, 1, …, n−1,
add node Si to set J;
5. computing the load allocation probability P(Si) of each node in the candidate allocation node set J;
6. selecting a suitable node from set J by random probabilistic distribution, and assigning the task to that node;
7. correcting the load of the selected node, for use in the next allocation.
3. The load balancing and node state monitoring method in high-performance computing according to claim 2, characterized in that the performance C(Si) of node Si is evaluated from the following server-node indicators: the number of CPUs n, CPU frequency C(Ci), disk I/O rate C(Di), memory size C(Mi), and network bandwidth C(Ni), and is computed with formula (8.1):
….(8.1)
where k is the weight parameter of each indicator, reflecting how strongly that indicator influences server node performance.
4. The load balancing and node state monitoring method in high-performance computing according to claim 2, characterized in that the load L(Si) of node Si is evaluated from the indicators CPU utilization L(Ci), memory utilization L(Mi), disk I/O utilization L(Di), and network bandwidth utilization L(Ni), and is computed with formula (8.2):
……(8.2)
where r is the weight parameter of each indicator, reflecting how strongly that indicator influences each type of service.
5. The load balancing and node state monitoring method in high-performance computing according to claim 2, characterized in that the load weight W(Si) of a node is defined as the ratio of the node load L(Si) to the node performance C(Si), computed with formula (8.3):
W(Si) = L(Si) / C(Si)    (8.3)
6. The load balancing and node state monitoring method in high-performance computing according to claim 5, characterized in that the load difference ΔL(Si) of a node is defined as the difference between the maximum load weight WMAX over all nodes and the load weight of this node, multiplied by this node's performance, computed with formula (8.4):
ΔL(Si) = (WMAX − W(Si)) × C(Si)    (8.4)
7. The load balancing and node state monitoring method in high-performance computing according to claim 5, characterized in that the load allocation probability P(Si) of a node is the percentage that its load difference ΔL(Si) represents of the sum of the load differences of all nodes, computed with formula (8.5):
P(Si) = ΔL(Si) / Σj ΔL(Sj), j = 0, 1, …, n−1    (8.5)
8. The load balancing and node state monitoring method in high-performance computing according to claim 5, characterized in that the allocation probabilities of the nodes in set J are computed and a suitable node is chosen from them to receive the load; the node allocation probability is computed with formula (8.6):
……(8.6)。
9. The load balancing and node state monitoring method in high-performance computing according to claim 5, characterized in that the incremental load δ is the amount by which a server node's load increases when one request of a given service type is assigned to it, computed with the formula δ = L(S)/N    (8.7),
where N is the number of requests of this service type on the node, and L(S) is the load that the N requests place on the node.
10. The load balancing and node state monitoring method in high-performance computing according to claim 5, characterized in that the load of a server node is corrected with formula (8.8), where δ is the incremental load value, C(S) is the performance of the node used when the incremental load was measured, and L(Si) and C(Si) are the load and performance of this node: ……(8.8).
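Claims 2 to 8 can be strung together into a single allocation routine. The sketch below is a hedged Python illustration, not the patented implementation: the linear forms used for (8.1) and (8.2), the example weight vectors k and r, and all function names are assumptions, because the source does not reproduce those formulas. W(Si), ΔL(Si), and P(Si) follow the textual definitions in claims 5 to 7; normalising ΔL over J gives the same relative probabilities as renormalising the all-node P(Si) over J.

```python
import random
from typing import Dict, List, Sequence, Tuple


def performance(n_cpu: int, cpu_freq: float, disk_io: float, memory: float,
                net_bw: float, k: Sequence[float] = (0.4, 0.2, 0.2, 0.2)) -> float:
    """Assumed form of (8.1): C(Si) = k1*n*C(Ci) + k2*C(Di) + k3*C(Mi) + k4*C(Ni).

    Inputs are assumed pre-normalised to comparable scales; k values are illustrative.
    """
    k1, k2, k3, k4 = k
    return k1 * n_cpu * cpu_freq + k2 * disk_io + k3 * memory + k4 * net_bw


def node_load(cpu: float, mem: float, disk: float, net: float,
              r: Sequence[float] = (0.5, 0.2, 0.2, 0.1)) -> float:
    """Assumed form of (8.2): L(Si) = r1*L(Ci) + r2*L(Mi) + r3*L(Di) + r4*L(Ni)."""
    r1, r2, r3, r4 = r
    return r1 * cpu + r2 * mem + r3 * disk + r4 * net


def weights_and_diffs(load: Dict[str, float], perf: Dict[str, float]
                      ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """W(Si) = L(Si)/C(Si) and dL(Si) = (WMAX - W(Si)) * C(Si)."""
    w = {n: load[n] / perf[n] for n in load}
    w_max = max(w.values())
    return w, {n: (w_max - w[n]) * perf[n] for n in w}


def candidate_set(w: Dict[str, float], epsilon: float) -> List[str]:
    """Step 4: Sm = argmin W, plus every node with W(Si) < W(Sm) + epsilon."""
    w_min = min(w.values())
    return [n for n, wi in w.items() if wi == w_min or wi < w_min + epsilon]


def probabilities(diffs: Dict[str, float], nodes: List[str]) -> Dict[str, float]:
    """Step 5: dL(Si) normalised over the given nodes.

    If every dL is zero (all nodes at WMAX), fall back to a uniform split (assumption).
    """
    total = sum(diffs[n] for n in nodes)
    if total <= 0:
        return {n: 1.0 / len(nodes) for n in nodes}
    return {n: diffs[n] / total for n in nodes}


def choose_node(load: Dict[str, float], perf: Dict[str, float],
                epsilon: float, rng: random.Random) -> str:
    """Steps 3-6: compute W and dL, build J, then draw one node from J by P(Si)."""
    w, diffs = weights_and_diffs(load, perf)
    j = candidate_set(w, epsilon)
    p = probabilities(diffs, j)
    return rng.choices(j, weights=[p[n] for n in j], k=1)[0]
```

With loads {a: 0.2, b: 0.6, c: 0.8} and performances {a: 1, b: 2, c: 2}, the weights are 0.2, 0.3, and 0.4. Node b carries three times a's load but has twice its capacity, so both receive ΔL = 0.2 and an equal share, while c, already at WMAX, receives none. ΔL(Si) can be read as WMAX·C(Si) − L(Si), the extra load Si could absorb before its weight reaches WMAX. Step 7, the load correction of the returned node, would follow before the next request.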

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410440328.5A CN104168332A (en) 2014-09-01 2014-09-01 Load balance and node state monitoring method in high performance computing

Publications (1)

Publication Number Publication Date
CN104168332A true CN104168332A (en) 2014-11-26

Family

ID=51911953

Country Status (1)

Country Link
CN (1) CN104168332A (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104580476A (en) * 2015-01-13 2015-04-29 北京京东尚科信息技术有限公司 Method and device for selecting node in distributed system
CN104618743A (en) * 2014-12-30 2015-05-13 北京国双科技有限公司 Method, device and system for allocating code rate resource
CN105141541A (en) * 2015-09-23 2015-12-09 浪潮(北京)电子信息产业有限公司 Task-based dynamic load balancing scheduling method and device
CN105260245A (en) * 2015-11-04 2016-01-20 浪潮(北京)电子信息产业有限公司 Resource scheduling method and device
CN105389392A (en) * 2015-12-18 2016-03-09 浪潮(北京)电子信息产业有限公司 Metadata load statistical method and system
CN105491138A (en) * 2015-12-15 2016-04-13 国网智能电网研究院 Load rate based graded triggering distributed load scheduling method
CN105763636A (en) * 2016-04-15 2016-07-13 北京思特奇信息技术股份有限公司 Optimal host selection method and system in distributed system
CN106682980A (en) * 2017-01-18 2017-05-17 北京云知科技有限公司 Method for designing probability generator
CN107547650A (en) * 2017-08-29 2018-01-05 中国民航大学 Towards the improved weighted least-connection scheduling algorithm of SWIM systems
CN107707680A (en) * 2017-11-24 2018-02-16 北京永洪商智科技有限公司 A kind of distributed data load-balancing method and system based on node computing capability
CN107707612A (en) * 2017-08-10 2018-02-16 北京奇艺世纪科技有限公司 A kind of appraisal procedure and device of the resource utilization of load balancing cluster
CN107783860A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 The recovery point objectives monitoring method and equipment of a kind of data transfer
CN108449394A (en) * 2018-03-05 2018-08-24 北京华夏电通科技有限公司 A kind of dispatching method of data file, dispatch server and storage medium
CN109343942A (en) * 2018-09-03 2019-02-15 北京邮电大学 Method for scheduling task based on edge calculations network
CN109426646A (en) * 2017-08-30 2019-03-05 英特尔公司 For forming the technology of managed node based on telemetry
CN109542586A (en) * 2018-11-19 2019-03-29 郑州云海信息技术有限公司 A kind of node resource state update method and system
CN109614228A (en) * 2018-11-27 2019-04-12 南京轨道交通系统工程有限公司 Comprehensively monitoring front-end system and working method based on dynamic load leveling mode
CN109711526A (en) * 2018-12-20 2019-05-03 广东工业大学 Server cluster dispatching method based on SVM and ant group algorithm
CN110113399A (en) * 2019-04-24 2019-08-09 华为技术有限公司 Load balancing management method and relevant apparatus
CN110505109A (en) * 2018-05-17 2019-11-26 阿里巴巴集团控股有限公司 The method, apparatus and storage medium of test macro isolation performance
CN110545450A (en) * 2019-09-09 2019-12-06 深圳市网心科技有限公司 Node distribution method, system, electronic equipment and storage medium
CN110928676A (en) * 2019-07-18 2020-03-27 国网浙江省电力有限公司衢州供电公司 Power CPS load distribution method based on performance evaluation
WO2020062277A1 (en) * 2018-09-30 2020-04-02 华为技术有限公司 Management method and apparatus for computing resources in data pre-processing phase of neural network
CN111049919A (en) * 2019-12-19 2020-04-21 上海米哈游天命科技有限公司 User request processing method, device, equipment and storage medium
CN111459677A (en) * 2020-04-01 2020-07-28 北京顺达同行科技有限公司 Request distribution method and device, computer equipment and storage medium
CN111597041A (en) * 2020-04-27 2020-08-28 深圳市金证科技股份有限公司 Calling method and device of distributed system, terminal equipment and server
CN111897816A (en) * 2020-07-16 2020-11-06 中国科学院上海微系统与信息技术研究所 Interactive method for computing information between satellites and generation method of information table applied by interactive method
WO2021052199A1 (en) * 2019-09-18 2021-03-25 中兴通讯股份有限公司 Server load balancing method and apparatus, and cdn node
CN113329067A (en) * 2021-05-21 2021-08-31 广州爱浦路网络技术有限公司 Edge computing node load distribution method, core network, device and storage medium
CN113992691A (en) * 2021-12-24 2022-01-28 苏州浪潮智能科技有限公司 Method, device and equipment for distributing edge computing resources and storage medium
CN114079656A (en) * 2022-01-19 2022-02-22 之江实验室 Probability-based load balancing method and device, electronic equipment and storage medium
CN114584565A (en) * 2020-12-01 2022-06-03 中移(苏州)软件技术有限公司 Application protection method and system, electronic equipment and storage medium
CN115174583A (en) * 2022-06-28 2022-10-11 福州大学 Server load balancing method based on programmable data plane
CN116382892A (en) * 2023-02-08 2023-07-04 深圳市融聚汇信息科技有限公司 Load balancing method and device based on multi-cloud fusion and cloud service

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327072A (en) * 2013-05-22 2013-09-25 中国科学院微电子研究所 Method for cluster load balance and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
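Zhang Yufang (张玉芳) et al.: "Load balancing algorithm based on load weights" (基于负载权值的负载均衡算法), Application Research of Computers (《计算机应用研究》) *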

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141126
