WO1999021098A2 - Determining cluster membership in a distributed computer system - Google Patents
Determining cluster membership in a distributed computer system
- Publication number
- WO1999021098A2 (application PCT/US1998/022161)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nodes
- node
- proposed
- computer system
- proposed membership
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
- G06F11/1425—Reconfiguring to eliminate the error by reconfiguration of node membership
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0866—Checking the configuration
- H04L41/0873—Checking configuration conflicts between network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0811—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2033—Failover techniques switching over of hardware resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2046—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/505—Clust
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
Definitions
- the present invention relates to fault tolerance in distributed computer systems and, in particular, to a particularly robust mechanism for determining which nodes in a failing distributed computer system form a cluster and have access to shared resources.
- the problems associated with providing membership services in a distributed computer system have generated a considerable amount of interest on both academic and industrial fronts.
- the Parallel Database (PDB) system available from Sun Microsystems, Inc. of Palo Alto, California, being a distributed system, has used the cluster membership monitor to provide mechanisms to keep track of the member nodes and to coordinate the reconfiguration of the cluster applications and services when the cluster membership changes.
- PDB Parallel Database
- a further impact of the configuration of external devices is the issue of failure fencing.
- the shared resources (often disks) are fenced against intervention from nodes that are not part of the cluster.
- the issue of fencing was simple due to the fact that only two nodes existed in a cluster and they were connected to all the shared resources. The node that remained in the cluster would reserve all the shared resources and would disallow the non-member node from accessing these resources until that node became part of the cluster. Such a simple operation is not possible for an architecture in which all disks are not connected to all nodes.
- CMM Cluster Membership Monitor
- a node can join a cluster after the node is restarted and after other members of the cluster accepted it as a new member, following a reconfiguration.
- the cluster membership monitor handles communication failures that isolate one or more nodes from those nodes with a majority quorum. Note that the detection of the communication failure, i.e. detecting that the communication graph is not fully connected, is the responsibility of the communication monitor which is not part of the membership monitor. It is assumed that the communication monitor will notify the membership monitor of communication failures and that the membership monitor will handle this via a reconfiguration.
- the CMM does not guarantee the health of the overall system or that the applications are present on any given node.
- the only guarantee made by the CMM is that the system's hardware is up and running and that the operating system is present and functioning.
- failures are considered in the design of the system. There are three kinds of failures that we consider: node failures, communication failures, and device failures. Note that the failures of the client nodes, terminal concentrators, and the administration workstation are not considered to be failures within "our" system.
- Node Failures: A node is considered to have failed when it stops sending its periodic heart-beat messages (SCI or CMM) to other members of the cluster. Furthermore, nodes are considered to behave in a non-malicious fashion: a node that is considered failed by the system will not intentionally send conflicting information to other members of the cluster. It is possible for nodes to fail intermittently, as in the case of a temporary deadlock, or to be viewed as failed by only part of the remaining system, as in the case of a failed adaptor or switch. The cluster membership monitor should be able to handle all these cases and should remove failed nodes from the system within a bounded time.
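The heart-beat rule above can be illustrated with a small sketch. This is not the patent's implementation: the class name `HeartbeatMonitor`, the `timeout` parameter, and the use of explicit timestamps are assumptions chosen for illustration. It only shows the idea that a node is treated as failed once no heart-beat has been seen from it within a bounded time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper: remembers when each peer was last heard from."""

    timeout: float  # assumed bound (seconds) after which a silent node counts as failed
    last_seen: dict[int, float] = field(default_factory=dict)

    def record_heartbeat(self, node_id: int, now: float) -> None:
        # Any message from a peer refreshes its last-seen time.
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> set[int]:
        # A node is considered failed when it has been silent longer than the timeout.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}
```

A node that fails intermittently simply reappears in the live set the next time `record_heartbeat` is called for it.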
- SCI or CMM periodic heart-beat messages
- the private communication medium may fail due to a failure of a switch, a failure of an adaptor card, a failure of the cable, or a failure of various software layers. These failures are masked by the cluster communication monitor (CCM or CIS) so that the cluster membership monitor does not have to deal with the specific failure. In addition, the cluster membership monitor will send its messages through all available links of the medium. Hence, failure of any individual link does not affect the correct operation of the CMM, and the only communication failure affecting the CMM is the total loss of communication with a member node. This is equivalent to a node failure, as there are no physical paths to send a heart-beat message over the private communication medium.
- CCM cluster communication monitor
- n is the number of nodes in the system.
- Device Failures: Devices that affect the operation of the cluster membership monitor are the quorum devices. Traditionally these have been disk controllers on the Sparc Storage Arrays (SSAs); however, in some distributed computer systems, a disk can also be used as a quorum device. Note that the failure of the quorum device is equivalent to the failure of a node and that the CMM in some conventional systems will not use a quorum device unless it is running on a two-node cluster.
- SSA Sparc Storage Arrays
- Some distributed computer systems are specified to have no single point of failure. Therefore, the system must tolerate the failure of a single node as well as consecutive failures of n - 1 nodes of the system. Given the above discussion on communication failures, this specification implies that we cannot tolerate the total loss of the communication medium in such a system. While it may not be possible, or desirable, to tolerate a total loss of the private communication medium, it should be possible to tolerate more than a single failure at any given time. First, let us define what a cluster is and how various failures affect it.
- a cluster is defined as having N nodes, a private communication medium, and a quorum mechanism, where the total failure of the private communication medium is equivalent to the failure of N - 1 nodes and the failure of the quorum mechanism is equivalent to the failure of one node.
- a cluster with N nodes, where N > 3, a private communication medium, and a quorum mechanism, should be able to provide services and access to the data, however partial, in the case of ⌈N/2⌉ - 1 node failures.
- the cluster can tolerate only one of the following failures:
- cluster membership in a distributed computer system is determined by determining with which other nodes each node is in communication and distributing that connectivity information through the nodes of the system. Accordingly, each node can determine an optimized new cluster based upon the connectivity information. Specifically, each node has information regarding with which nodes the node is in communication and similar information for each other node of the system. Therefore, each node has complete information regarding interconnectivity of all nodes which are directly or indirectly connected.
- Each node applies optimization criteria to such connectivity information to determine an optimal new cluster.
- Data representing the optimal new cluster are broadcast by each node.
- the optimal new clusters determined by the various nodes are collected by each node.
- each node has data representing the proposed new cluster which is perceived by each respective node to be optimal.
- Each node uses such data to elect a new cluster from the various proposed new clusters. For example, the new cluster represented by more proposed new clusters than any other is elected as the new cluster. Since each node receives the same proposed new clusters from the potential member nodes of the new cluster, the new cluster membership is reached unanimously. In addition, since each node has more complete information regarding the potential member nodes of the new cluster, the resulting new cluster consistently has a relatively optimal configuration.
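The election step described above can be sketched as follows. The function name `elect_cluster` and the data layout (a mapping from proposing node id to its proposed member set) are assumptions. The tie-break between equally popular proposals is also an assumption added so the sketch is deterministic; the text only states that the most frequently proposed cluster wins.

```python
from __future__ import annotations

from collections import Counter


def elect_cluster(proposals: dict[int, frozenset[int]]) -> frozenset[int]:
    """Pick the proposed cluster that is put forward by the most nodes."""
    counts = Counter(proposals.values())
    best = max(counts.values())
    tied = [p for p, c in counts.items() if c == best]
    # Assumed tie-break (not from the source): prefer the larger set,
    # then the set whose sorted node ids come first.
    return min(tied, key=lambda p: (-len(p), sorted(p)))
```

Because every node evaluates the same collected proposals with the same rule, every node arrives at the same result, which is the property the text relies on.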
- Figure 1 is a block diagram of a distributed computer system in which communications between two nodes and two respective switches have failed.
- Figure 2 is a block diagram of a distributed computer system which includes dual-ported devices.
- the cluster membership can be partitioned into two or more fully-connected subsets of nodes having a majority of the votes, a minority of the votes, or exactly half of the votes.
- the first two cases may be resolved by only allowing a subset having a majority vote to form the next generation of the cluster. In the last case, a tie-breaking mechanism must be employed.
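The three cases can be made concrete with a short sketch. The function name `classify_partition` and the returned labels are illustrative assumptions; the arithmetic is just the comparison of a subset's votes against the total.

```python
def classify_partition(subset_votes: int, total_votes: int) -> str:
    """Classify a fully connected subset by its share of the cluster's votes."""
    if 2 * subset_votes > total_votes:
        return "majority"  # may form the next generation of the cluster
    if 2 * subset_votes == total_votes:
        return "tie"  # needs a tie-breaking mechanism
    return "minority"  # must not form a cluster
```

Comparing `2 * subset_votes` with `total_votes` avoids integer-division rounding when the total is odd.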
- Some cluster membership algorithms take advantage of the restrictions imposed by a two node architecture in resolving these issues. To generalize for architectures involving more than two nodes, the following new issues are resolved by the algorithm according to the present invention.
- this external device is a disk or a controller that resides on the SSA's.
- the choice of a quorum device, particularly for a disk, has some unfavorable properties which adversely affect the overall availability of the cluster.
- the original algorithm considers the explicit simultaneous cluster shutdown of more than half the nodes to be equivalent to a partition excluding those nodes. To avoid the resulting loss of quorum and the complete cluster shutdown, the new algorithm uses the notification of an explicit shutdown by a node to reduce the quorum requirement.
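One way to picture the reduced quorum requirement is sketched below. The source does not give the exact rule, so this assumes that nodes which announced an explicit shutdown are simply removed from the vote total before the majority is computed. The function names are hypothetical.

```python
def quorum_required(configured_nodes: int, voluntarily_left: int = 0) -> int:
    """Majority threshold after discounting nodes that announced a shutdown.

    Assumption: each node carries one vote, and an explicit shutdown
    removes that vote from the total.
    """
    remaining = configured_nodes - voluntarily_left
    return remaining // 2 + 1


def has_quorum(live_nodes: int, configured_nodes: int, voluntarily_left: int = 0) -> bool:
    return live_nodes >= quorum_required(configured_nodes, voluntarily_left)
```

With five configured nodes, two survivors fall short of the usual threshold of three, but retain quorum if the other three are known to have shut down explicitly.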
- the original algorithm does not reach an agreement.
- the algorithm would converge when a set of nodes agree on the same membership proposal, but this condition is never satisfied.
- this condition is suspected (using a timeout)
- some nodes modify their membership proposal to a subset that is maximally connected.
- the membership monitors on different nodes of a cluster exchange messages with each other to notify that they are alive, i.e. exchange heart-beats, and to initiate a cluster reconfiguration. While it is possible to distinguish between these two types of messages, in practice they are the same message and we refer to them as RECONF_msg messages to stress that they cause reconfiguration on the receiving nodes.
- V_i: A vector that contains the connectivity information of node i.
- SD_i: A vector that contains node i's view of nodes that have voluntarily left the cluster as of the time the most recent stable membership was established.
- the membership algorithm assumes that the cluster is made up of equally valuable nodes, i.e. that the cluster is a homogeneous cluster.
- the membership algorithm is based on a set of rules that are stated in the following precedence order and are used to develop the membership algorithm:
- a node must include itself in its proposed set.
- a node will vote for nodes that are already in the cluster over the ones that are trying to join it.
- a node will propose a set that includes itself and has the maximum number of fully connected nodes.
- All nodes agree on a statically defined preference order among nodes, e.g. lower numbered nodes are preferred to higher numbered ones.
- the above set of rules defines a hierarchy of rules with the statically defined preference being at the bottom of such a hierarchy. Note that the above set of rules also defines an optimal membership set, A.
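The rule hierarchy can be sketched as a brute-force search over candidate sets. This is an illustrative reading, not the patent's algorithm: the function names, the boolean connectivity matrix `conn`, and the interpretation of the precedence order as a lexicographic key (more current members first, then larger sets, then lower node ids) are all assumptions. The search is exponential in the number of nodes, so it only suits small clusters.

```python
from __future__ import annotations

from itertools import combinations


def fully_connected(nodes: tuple[int, ...], conn: list[list[int]]) -> bool:
    # Every pair must be able to reach each other in both directions.
    return all(conn[a][b] and conn[b][a] for a, b in combinations(nodes, 2))


def propose_membership(me: int, conn: list[list[int]], current: set[int]) -> frozenset[int]:
    others = [x for x in range(len(conn)) if x != me]
    best_key, best_set = None, frozenset({me})
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            cand = (me, *extra)  # rule 1: a node always includes itself
            if not fully_connected(cand, conn):
                continue
            key = (
                -len(current.intersection(cand)),  # rule 2: favour existing members
                -len(cand),                        # rule 3: favour larger sets
                sorted(cand),                      # rule 4: static preference, lower ids first
            )
            if best_key is None or key < best_key:
                best_key, best_set = key, frozenset(cand)
    return best_set
```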
- Joins: This is when nodes either form a new cluster or join an already existing one.
- First Join: This is done only for the first node of a cluster and is implemented via a new command, pdbadmin startcluster, which signals to the CMM running at that node that it should not expect to hear from other members of the cluster, as there are not any. This command can only be issued once at the beginning of the life of a cluster, where the life of a cluster is defined as the time that spans from the moment that pdbadmin startcluster is issued to the time when the cluster returns to having no members.
- Leaves: This is when nodes that were members of the cluster leave the cluster either voluntarily or involuntarily.
- (a) Voluntary Leaves: The operator issues a pdbadmin stopnode command to a node, or a set of nodes. This will cause the affected nodes to go through the stop sequence, which results in each node sending a message to all other nodes in the cluster informing them that it is leaving the cluster. This information is used by the membership algorithm to optimize certain aspects of the membership.
- the node can send a message, the same one as for a voluntary leave, that informs other members of the cluster that this node will not be part of the cluster.
- the same optimizations that can be performed for the voluntary leave of a node can be implemented here.
- a node may leave the cluster due to a request from an application program with the appropriate privileges.
- the node does not complete its abort sequence and panics the system. This is the most difficult of all failures to deal with and is usually detected by the absence of a heart-beat message from the failed node. This failure is indistinguishable from a network failure in an asynchronous distributed system.
- Each node i will update its connectivity state matrix, C_i, as soon as it hears from a node.
- the matrix C_i is node i's understanding of the overall connectivity of the system. If node i does not hear from a node j within the specified time, or is informed that node j is down or unreachable, it will mark the e_ij element of C_i as zero. Furthermore, it will mark all the elements of the jth row as NULL, which implies that node i does not have any information about the connectivity of node j. For other rows of the matrix, node i will update them by replacing the kth row of its connectivity matrix with the connectivity vector V_k that it receives from node k.
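The two update rules for a node's connectivity matrix can be sketched as below. The function names and the use of Python's `None` for the NULL marker are assumptions; the matrix is a plain list of rows, one row per node.

```python
from __future__ import annotations

from typing import Optional

Matrix = list[list[Optional[int]]]


def mark_unreachable(C: Matrix, i: int, j: int) -> None:
    """Node i lost contact with node j (timeout or explicit notification)."""
    C[i][j] = 0                # i can no longer reach j
    C[j] = [None] * len(C)     # i knows nothing about j's own connectivity


def merge_vector(C: Matrix, k: int, V_k: list[Optional[int]]) -> None:
    """Replace row k with the connectivity vector received from node k."""
    C[k] = list(V_k)
```

Copying the received vector with `list(V_k)` keeps the matrix from aliasing the message buffer.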
- Each node i will initially include the ith row of C_i in its RECONF_msg as the proposed membership set, M_i^prop.
- the set M_i^prop proposed by node i is different from the vector V_i.
- M_i^prop is a proposed set that states a node's vote for other nodes in a binary form, whereas V_i is a state vector which deals with the connectivity of the nodes in the system. Note that M_i^prop will have different elements, each element being a node id and a binary vote value, than V_i when nodes cannot agree on a stable membership and a subset of V_i needs to be proposed as the new membership set.
- Each node keeps the total number of the nodes that are in its current view of the cluster membership, whether agreed or proposed, in a local variable N_i. Note that N_i is subject to the following rules during the execution of the membership algorithm:
- (a) N_i is initialized to the cardinality of M_i^prop.
- (b) N_i is incremented for each node that is trying to join the cluster. (One increment per node, as enforced via the nodeid check which is embedded in the message, done by the receiver thread.)
- (c) N_i is decremented for each node that aborts, as defined in part 2(b)i of Section 4.3, or voluntarily leaves. (Done by the receiver thread.)
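The bookkeeping rules for the node count can be sketched as a small counter. The class and method names are hypothetical, and the de-duplication of leave notifications is an assumption added for symmetry; the source only states the one-increment-per-node check for joiners.

```python
class MemberCount:
    """Tracks the node count in one node's view of the membership (illustrative)."""

    def __init__(self, proposed: set) -> None:
        self.n = len(proposed)  # rule (a): start from the size of the proposed set
        self._joined: set = set()
        self._left: set = set()

    def on_join_request(self, node_id: int) -> None:
        # Rule (b): one increment per joining node, enforced by a node id check.
        if node_id not in self._joined:
            self._joined.add(node_id)
            self.n += 1

    def on_leave_or_abort(self, node_id: int) -> None:
        # Rule (c): decrement for a node that aborts or leaves voluntarily.
        if node_id not in self._left:
            self._left.add(node_id)
            self.n -= 1
```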
- Each node, i, prior to entering the membership algorithm will have a sequence number, seq_num, that is the same for all nodes in the current cluster.
- each node will have its connectivity state matrix, C_i. Note that C_i is an n x n matrix, where n is the maximum number of nodes as defined by the current cluster configuration file, i.e. the current cdb file.
- Each node that is a joiner will have a variable, joining_node, set to TRUE.
- Once a node has become a member of M_i^agreed, it is no longer a joiner, and joining_node is set to FALSE.
- a node that is executing the first join will have a variable, start_cluster, set to TRUE. Nodes that are trying to join the cluster will have their start_cluster variable initialized to FALSE.
- a joiner may not join a partially connected or nonexistent */
- joining node FALSE
- the function membership_proposal() returns a membership proposal based on C_i, including all nodes that are not in the DOWN state. It also excludes
- set M_i^prop is agreed upon by all the members of that set. In order to count the number of
- the function share_quorum_dev() is implemented by using the CCD dynamic file and informs the membership algorithm of the cases, such as a two-node cluster, in which two nodes do share a quorum device.
- the binary function reserve_quorum() returns false if and only if the device is already reserved by another node.
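The stated contract of the reservation function can be modelled with an in-memory stand-in. This is only a sketch of the return-value semantics: the `QuorumDevice` class is hypothetical, and a real quorum device would be reserved through the storage hardware rather than a Python object.

```python
from typing import Optional


class QuorumDevice:
    """In-memory stand-in for a shared quorum device (illustrative only)."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None

    def reserve(self, node_id: int) -> bool:
        # False if and only if a different node already holds the reservation.
        if self.owner is not None and self.owner != node_id:
            return False
        self.owner = node_id
        return True
```

In a two-node tie, whichever node reserves the device first wins, and the other node's attempt returns false.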
- the function wait_for_user_input() is discussed in greater detail below.
- the message would identify, for the appropriate nodes, the sets X or Y that must be shut down or informed to stay up.
- the operator will break the tie by issuing the command pdbadmin stopnode to one set of nodes while issuing the new command pdbadmin continue to the other set.
- the set that receives the stop command will abort, while the other set will stop printing the messages and continue its reconfiguration.
- the operator can issue a clustm reconfigure command, which is a valid option if there was a communication break down and the operator has fixed it. Issuing a clustm reconfigure command will cause a new reconfiguration to take place.
- the command reader thread will not signal the transitions thread that is waiting for one of those commands and will simply ignore the command.
- the print thread meanwhile, will be printing these messages continuously once every few seconds, to inform the operator that some immediate action is required.
- Node receives a message that indicates that a remote node in its current membership set has gone down.
- Node has not received a message from a remote node in its current membership set for node_down_timeout.
- Failure fencing mechanism: Another component of the system that requires modification due to the new architecture is the failure fencing mechanism that is employed in some distributed computer systems.
- the solution provided is generic in the sense that it handles the various array topologies (cascaded, n + 1, cross-connected, and others) as well as different software configurations (CVM with NetDisk, stand-alone VxVM, or others). It also handles the 2-node cross-connected array configuration without treating it as a special case.
- every node has a backup node.
- the backup node becomes the master of the set of NetDisk devices owned by a failed node.
- the backup node becomes the primary owner of the set of disk group resources owned by a failed node.
- N_i denotes a node of a cluster.
- D_i denotes a set of storage devices (composed of one or more SSAs and/or Multipacks).
- CCD maintains this information in its database and can be queried to obtain this information, and also whether the currently reconfiguring node is the backup of the failed node.
- information about the primary ownership of a disk group is maintained in the cdb file in the following format:
- the primary owner of cluster disk groups dg1 and dg2 is node 0 and its backup node is node 1.
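The primary and backup relationship in this example can be sketched as a lookup table. The `Ownership` type, the function `groups_to_import`, and the extra entry `dg3` are hypothetical; the actual cdb file format is not reproduced here. The sketch shows how a reconfiguring node might decide which disk groups it should take over when some nodes have failed.

```python
from __future__ import annotations

from typing import NamedTuple


class Ownership(NamedTuple):
    primary: int  # node id of the primary owner
    backup: int   # node id of the designated backup


def groups_to_import(me: int, failed: set[int], table: dict[str, Ownership]) -> list[str]:
    """Disk groups whose primary owner failed and whose backup is this node."""
    return sorted(g for g, o in table.items() if o.primary in failed and o.backup == me)


# Two disk groups owned by node 0 with node 1 as backup, plus one made-up entry.
TABLE = {
    "dg1": Ownership(primary=0, backup=1),
    "dg2": Ownership(primary=0, backup=1),
    "dg3": Ownership(primary=2, backup=0),
}
```

Here `groups_to_import(1, {0}, TABLE)` selects the two groups backed up by node 1, while node 2 has nothing to import for the same failure.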
- node N_i owns resources whose primary owners are part of the cluster membership and have it undertake the appropriate actions. If the algorithms are implemented correctly, at no point in time should node N_i own resources belonging to node N_j if N_j was already part of the cluster. This is a safe assumption required for correctness and integrity. This algorithm is slightly expensive in terms of reconfiguration times, but in no way does it constitute a bottleneck.
- Resources that need to be highly available are migrated over to a surviving node from a failed one.
- examples of such resources are disk groups in a shared-nothing database environment, disk groups for HA-NFS file systems, and logical IP addresses.
- Arbitrary resources can be designated in the CCD with a master and a backup node.
- the logical IP addresses can be migrated from failed nodes to surviving ones. Note that for a switch-over to take place, the backup node would have to release the resources of the joining node one step before the joining node takes over its resources.
- the node N_2 has a disk group G whose devices are scattered on disks in arrays D and D_3. If N_2 now fails, G cannot be imported in its entirety on either N_1 or N_3, since not all of its disks will be visible on either N_1 or N_3. Such configurations are not supported in some distributed computer systems. If a node owns a disk group and the node fails, it should be possible for the disk group in its entirety to be taken over by one of the surviving nodes. This does not constrain the array topology, but places constraints on how data is scattered across the arrays.
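The takeover constraint can be checked mechanically, as in the sketch below. The function name `takeover_candidates`, the node and array labels, and the visibility map are all assumptions made up for illustration; they do not reproduce the topology of the patent's figures. A disk group can be taken over only by a surviving node that sees every array holding part of the group.

```python
from __future__ import annotations


def takeover_candidates(
    group_arrays: set[str],
    visible: dict[str, set[str]],
    survivors: list[str],
) -> list[str]:
    """Surviving nodes that can see every array the disk group is spread over."""
    return sorted(n for n in survivors if group_arrays <= visible[n])


# Hypothetical visibility map: which arrays each surviving node is connected to.
VISIBLE = {"N1": {"D1", "D2"}, "N3": {"D3", "D1"}}
```

A group spread over arrays that no single survivor sees in full yields an empty candidate list, which is the unsupported configuration the text describes.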
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002306718A CA2306718A1 (en) | 1997-10-21 | 1998-10-20 | Determining cluster membership in a distributed computer system |
EP98953773A EP1025506A2 (en) | 1997-10-21 | 1998-10-20 | Determining cluster membership in a distributed computer system |
JP2000517348A JP2001521222A (en) | 1997-10-21 | 1998-10-20 | Method for determining cluster membership in a distributed computer system |
AU11054/99A AU1105499A (en) | 1997-10-21 | 1998-10-20 | Determining cluster membership in a distributed computer system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/955,885 US5999712A (en) | 1997-10-21 | 1997-10-21 | Determining cluster membership in a distributed computer system |
US08/955,885 | 1997-10-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999021098A2 (en) | 1999-04-29 |
WO1999021098A3 (en) | 1999-07-01 |
Family
ID=25497484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/022161 WO1999021098A2 (en) | 1997-10-21 | 1998-10-20 | Determining cluster membership in a distributed computer system |
Country Status (6)
Country | Link |
---|---|
US (2) | US5999712A (en) |
EP (1) | EP1025506A2 (en) |
JP (1) | JP2001521222A (en) |
AU (1) | AU1105499A (en) |
CA (1) | CA2306718A1 (en) |
WO (1) | WO1999021098A2 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000058825A2 (en) * | 1999-03-26 | 2000-10-05 | Microsoft Corporation | Data distribution in a server cluster |
EP1117041A2 (en) * | 2000-01-10 | 2001-07-18 | Sun Microsystems, Inc. | Method and apparatus for managing failures in clustered computer systems |
EP1117039A2 (en) | 2000-01-10 | 2001-07-18 | Sun Microsystems, Inc. | Controlled take over of services by remaining nodes of clustered computing system |
EP1134658A2 (en) * | 2000-03-14 | 2001-09-19 | Sun Microsystems, Inc. | A system and method for comprehensive availability management in a high-availability computer system |
EP1877901A2 (en) * | 2005-05-06 | 2008-01-16 | Marathon Technologies Corporation | Fault tolerant computer system |
JP2015535970A (en) * | 2012-09-07 | 2015-12-17 | アビジロン コーポレイション | Physical security system having multiple server nodes |
US9959109B2 (en) | 2015-04-10 | 2018-05-01 | Avigilon Corporation | Upgrading a physical security system having multiple server nodes |
US20230229361A1 (en) * | 2020-08-14 | 2023-07-20 | Inspur Suzhou Intelligent Technology Co., Ltd. | Cluster arbitration method and system based on heterogeneous storage, and device and storage medium |
Families Citing this family (195)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5999712A (en) * | 1997-10-21 | 1999-12-07 | Sun Microsystems, Inc. | Determining cluster membership in a distributed computer system |
US6279032B1 (en) * | 1997-11-03 | 2001-08-21 | Microsoft Corporation | Method and system for quorum resource arbitration in a server cluster |
US6092220A (en) * | 1997-11-17 | 2000-07-18 | International Business Machines Corporation | Method and apparatus for ordered reliable multicast with asymmetric safety in a multiprocessing system |
US6421787B1 (en) * | 1998-05-12 | 2002-07-16 | Sun Microsystems, Inc. | Highly available cluster message passing facility |
US6243744B1 (en) * | 1998-05-26 | 2001-06-05 | Compaq Computer Corporation | Computer network cluster generation indicator |
US6311217B1 (en) * | 1998-06-04 | 2001-10-30 | Compaq Computer Corporation | Method and apparatus for improved cluster administration |
US6438582B1 (en) * | 1998-07-21 | 2002-08-20 | International Business Machines Corporation | Method and system for efficiently coordinating commit processing in a parallel or distributed database system |
US6192443B1 (en) * | 1998-07-29 | 2001-02-20 | International Business Machines Corporation | Apparatus for fencing a member of a group of processes in a distributed processing environment |
US6212595B1 (en) * | 1998-07-29 | 2001-04-03 | International Business Machines Corporation | Computer program product for fencing a member of a group of processes in a distributed processing environment |
US6205510B1 (en) * | 1998-07-29 | 2001-03-20 | International Business Machines Corporation | Method for fencing a member of a group of processes in a distributed processing environment |
US6687754B1 (en) * | 1998-08-27 | 2004-02-03 | Intel Corporation | Method of detecting a device in a network |
US6275847B1 (en) * | 1999-01-07 | 2001-08-14 | Iq Net Solutions, Inc. | Distributed processing systems incorporating processing zones which communicate according to both streaming and event-reaction protocols |
US6424990B1 (en) | 1999-01-07 | 2002-07-23 | Jeffrey I. Robinson | Distributed processing systems incorporating a plurality of cells which process information in response to single events |
US6272524B1 (en) * | 1999-01-07 | 2001-08-07 | Iq Netsolutions Inc. | Distributed processing systems incorporating a plurality of cells which process information in response to single events |
US6272526B1 (en) * | 1999-01-07 | 2001-08-07 | Iq Netsolutions, Inc. | Distributed processing systems having self-advertising cells |
US6272527B1 (en) * | 1999-01-07 | 2001-08-07 | Iq Net Solutions, Inc. | Distributed processing systems incorporating nodes having processing cells which execute scripts causing a message to be sent internodally |
US6272525B1 (en) * | 1999-01-07 | 2001-08-07 | Iq Net Solutions, Inc. | Distributed processing systems including processing zones which subscribe and unsubscribe to mailing lists |
US6401120B1 (en) | 1999-03-26 | 2002-06-04 | Microsoft Corporation | Method and system for consistent cluster operational data in a server cluster using a quorum of replicas |
US7774469B2 (en) * | 1999-03-26 | 2010-08-10 | Massa Michael T | Consistent cluster operational data in a server cluster using a quorum of replicas |
US6745241B1 (en) * | 1999-03-31 | 2004-06-01 | International Business Machines Corporation | Method and system for dynamic addition and removal of multiple network names on a single server |
US6968390B1 (en) | 1999-04-15 | 2005-11-22 | International Business Machines Corporation | Method and system for enabling a network function in a context of one or all server names in a multiple server name environment |
US6502203B2 (en) * | 1999-04-16 | 2002-12-31 | Compaq Information Technologies Group, L.P. | Method and apparatus for cluster system operation |
US7020695B1 (en) | 1999-05-28 | 2006-03-28 | Oracle International Corporation | Using a cluster-wide shared repository to provide the latest consistent definition of the cluster (avoiding the partition-in time problem) |
US6532494B1 (en) * | 1999-05-28 | 2003-03-11 | Oracle International Corporation | Closed-loop node membership monitor for network clusters |
US6871222B1 (en) | 1999-05-28 | 2005-03-22 | Oracle International Corporation | Quorumless cluster using disk-based messaging |
US7076783B1 (en) * | 1999-05-28 | 2006-07-11 | Oracle International Corporation | Providing figure of merit vote from application executing on a partitioned cluster |
US6490693B1 (en) * | 1999-08-31 | 2002-12-03 | International Business Machines Corporation | Dynamic reconfiguration of a quorum group of processors in a distributed computing system |
US7464147B1 (en) * | 1999-11-10 | 2008-12-09 | International Business Machines Corporation | Managing a cluster of networked resources and resource groups using rule - base constraints in a scalable clustering environment |
US6745240B1 (en) * | 1999-11-15 | 2004-06-01 | Ncr Corporation | Method and apparatus for configuring massively parallel systems |
US6662219B1 (en) * | 1999-12-15 | 2003-12-09 | Microsoft Corporation | System for determining at subgroup of nodes relative weight to represent cluster by obtaining exclusive possession of quorum resource |
US6658470B1 (en) * | 1999-12-17 | 2003-12-02 | International Business Machines Corporation | Centralized logging of global reliability, availability, and serviceability (GRAS) services data for a distributed environment and backup logging system and method in event of failure |
US6769008B1 (en) * | 2000-01-10 | 2004-07-27 | Sun Microsystems, Inc. | Method and apparatus for dynamically altering configurations of clustered computer systems |
US6920454B1 (en) * | 2000-01-28 | 2005-07-19 | Oracle International Corporation | Techniques for DLM optimization with transferring lock information |
US7246120B2 (en) | 2000-01-28 | 2007-07-17 | Oracle International Corporation | Techniques for achieving higher availability of resources during reconfiguration of a cluster |
US6751616B1 (en) | 2000-01-28 | 2004-06-15 | Oracle International Corp. | Techniques for DLM optimization with re-mapping responsibility for lock management |
US6636982B1 (en) * | 2000-03-03 | 2003-10-21 | International Business Machines Corporation | Apparatus and method for detecting the reset of a node in a cluster computer system |
US20020198996A1 (en) | 2000-03-16 | 2002-12-26 | Padmanabhan Sreenivasan | Flexible failover policies in high availability computing systems |
US6775703B1 (en) * | 2000-05-01 | 2004-08-10 | International Business Machines Corporation | Lease based safety protocol for distributed system with multiple networks |
US6847993B1 (en) | 2000-05-31 | 2005-01-25 | International Business Machines Corporation | Method, system and program products for managing cluster configurations |
US6725261B1 (en) * | 2000-05-31 | 2004-04-20 | International Business Machines Corporation | Method, system and program products for automatically configuring clusters of a computing environment |
US6801937B1 (en) * | 2000-05-31 | 2004-10-05 | International Business Machines Corporation | Method, system and program products for defining nodes to a cluster |
US7325046B1 (en) * | 2000-05-31 | 2008-01-29 | International Business Machines Corporation | Method, system and program products for managing processing groups of a distributed computing environment |
US6807557B1 (en) * | 2000-05-31 | 2004-10-19 | International Business Machines Corporation | Method, system and program products for providing clusters of a computing environment |
US6968359B1 (en) * | 2000-08-14 | 2005-11-22 | International Business Machines Corporation | Merge protocol for clustered computer system |
US7113995B1 (en) * | 2000-10-19 | 2006-09-26 | International Business Machines Corporation | Method and apparatus for reporting unauthorized attempts to access nodes in a network computing system |
US6886038B1 (en) * | 2000-10-24 | 2005-04-26 | Microsoft Corporation | System and method for restricting data transfers and managing software components of distributed computers |
US7606898B1 (en) * | 2000-10-24 | 2009-10-20 | Microsoft Corporation | System and method for distributed management of shared computers |
US7093288B1 (en) | 2000-10-24 | 2006-08-15 | Microsoft Corporation | Using packet filters and network virtualization to restrict network communications |
US6915338B1 (en) * | 2000-10-24 | 2005-07-05 | Microsoft Corporation | System and method providing automatic policy enforcement in a multi-computer service application |
US6907395B1 (en) | 2000-10-24 | 2005-06-14 | Microsoft Corporation | System and method for designing a logical model of a distributed computer system and deploying physical resources according to the logical model |
US7113900B1 (en) | 2000-10-24 | 2006-09-26 | Microsoft Corporation | System and method for logical modeling of distributed computer systems |
US6839752B1 (en) | 2000-10-27 | 2005-01-04 | International Business Machines Corporation | Group data sharing during membership change in clustered computer system |
US7185099B1 (en) | 2000-11-22 | 2007-02-27 | International Business Machines Corporation | Apparatus and method for communicating between computer systems using a sliding send window for ordered messages in a clustered computing environment |
US7769844B2 (en) | 2000-12-07 | 2010-08-03 | International Business Machines Corporation | Peer protocol status query in clustered computer system |
US7035938B2 (en) * | 2000-12-07 | 2006-04-25 | Telcordia Technologies, Inc. | Determination of connection links to configure a virtual private network |
US7502857B2 (en) * | 2000-12-15 | 2009-03-10 | International Business Machines Corporation | Method and system for optimally allocating a network service |
US6785678B2 (en) * | 2000-12-21 | 2004-08-31 | Emc Corporation | Method of improving the availability of a computer clustering system through the use of a network medium link state function |
US7792977B1 (en) * | 2001-02-28 | 2010-09-07 | Oracle International Corporation | Method for fencing shared resources from cluster nodes |
US6952766B2 (en) * | 2001-03-15 | 2005-10-04 | International Business Machines Corporation | Automated node restart in clustered computer system |
US7305450B2 (en) * | 2001-03-29 | 2007-12-04 | Nokia Corporation | Method and apparatus for clustered SSL accelerator |
US6918051B2 (en) * | 2001-04-06 | 2005-07-12 | International Business Machines Corporation | Node shutdown in clustered computer system |
US6675264B2 (en) * | 2001-05-07 | 2004-01-06 | International Business Machines Corporation | Method and apparatus for improving write performance in a cluster-based file system |
US7617292B2 (en) | 2001-06-05 | 2009-11-10 | Silicon Graphics International | Multi-class heterogeneous clients in a clustered filesystem |
US7640582B2 (en) | 2003-04-16 | 2009-12-29 | Silicon Graphics International | Clustered filesystem for mix of trusted and untrusted nodes |
US20040139125A1 (en) * | 2001-06-05 | 2004-07-15 | Roger Strassburg | Snapshot copy of data volume during data access |
US8010558B2 (en) | 2001-06-05 | 2011-08-30 | Silicon Graphics International | Relocation of metadata server with outstanding DMAPI requests |
US7016946B2 (en) * | 2001-07-05 | 2006-03-21 | Sun Microsystems, Inc. | Method and system for establishing a quorum for a geographically distributed cluster of computers |
US6880100B2 (en) * | 2001-07-18 | 2005-04-12 | Smartmatic Corp. | Peer-to-peer fault detection |
US20030028594A1 (en) * | 2001-07-31 | 2003-02-06 | International Business Machines Corporation | Managing intended group membership using domains |
US6925582B2 (en) * | 2001-08-01 | 2005-08-02 | International Business Machines Corporation | Forwarding of diagnostic messages in a group |
US7239606B2 (en) * | 2001-08-08 | 2007-07-03 | Compunetix, Inc. | Scalable configurable network of sparsely interconnected hyper-rings |
US7243374B2 (en) | 2001-08-08 | 2007-07-10 | Microsoft Corporation | Rapid application security threat analysis |
DE10143142A1 (en) * | 2001-09-04 | 2003-01-30 | Bosch Gmbh Robert | Microprocessor-controlled operation of vehicular EEPROM memory, employs two memory areas with data pointers and cyclic validation strategy |
US7231461B2 (en) * | 2001-09-14 | 2007-06-12 | International Business Machines Corporation | Synchronization of group state data when rejoining a member to a primary-backup group in a clustered computer system |
US7277952B2 (en) * | 2001-09-28 | 2007-10-02 | Microsoft Corporation | Distributed system resource protection via arbitration and ownership |
US7350098B2 (en) * | 2001-11-30 | 2008-03-25 | Oracle International Corporation | Detecting events of interest for managing components on a high availability framework |
US20030177213A1 (en) * | 2002-01-18 | 2003-09-18 | Wallace Chris E. | Determining connectivity in communication networks |
US6950855B2 (en) * | 2002-01-18 | 2005-09-27 | International Business Machines Corporation | Master node selection in clustered node configurations |
US7240088B2 (en) * | 2002-01-25 | 2007-07-03 | International Business Machines Corporation | Node self-start in a decentralized cluster |
US7203748B2 (en) * | 2002-02-15 | 2007-04-10 | International Business Machines Corporation | Method for detecting the quick restart of liveness daemons in a distributed multinode data processing system |
US8321543B2 (en) * | 2002-03-04 | 2012-11-27 | International Business Machines Corporation | System and method for determining weak membership in set of computer nodes |
US7631066B1 (en) * | 2002-03-25 | 2009-12-08 | Symantec Operating Corporation | System and method for preventing data corruption in computer system clusters |
US7379970B1 (en) * | 2002-04-05 | 2008-05-27 | Ciphermax, Inc. | Method and system for reduced distributed event handling in a network environment |
US7051102B2 (en) * | 2002-04-29 | 2006-05-23 | Microsoft Corporation | Peer-to-peer name resolution protocol (PNRP) security infrastructure and method |
US7093010B2 (en) * | 2002-05-20 | 2006-08-15 | Telefonaktiebolaget Lm Ericsson (Publ) | Operator-defined consistency checking in a network management system |
US6925541B2 (en) * | 2002-06-12 | 2005-08-02 | Hitachi, Ltd. | Method and apparatus for managing replication volumes |
US7590985B1 (en) | 2002-07-12 | 2009-09-15 | 3Par, Inc. | Cluster inter-process communication transport |
US6965957B1 (en) * | 2002-07-12 | 2005-11-15 | 3Pardata, Inc. | Automatic cluster join protocol |
US20040027155A1 (en) * | 2002-08-08 | 2004-02-12 | Schlansker Michael S. | System and method for self configuration of reconfigurable systems |
JP3800158B2 (en) * | 2002-09-27 | 2006-07-26 | ブラザー工業株式会社 | Data transmission system, terminal device, and program |
US20040181707A1 (en) * | 2003-03-11 | 2004-09-16 | Hitachi, Ltd. | Method and apparatus for seamless management for disaster recovery |
US7447786B2 (en) | 2003-05-09 | 2008-11-04 | Oracle International Corporation | Efficient locking of shared data that is accessed for reads in a cluster database |
US7085897B2 (en) * | 2003-05-12 | 2006-08-01 | International Business Machines Corporation | Memory management for a symmetric multiprocessor computer system |
US7376724B2 (en) * | 2003-05-30 | 2008-05-20 | Oracle International Corporation | Dynamic reconfiguration of nodes in a cluster file system |
US7467168B2 (en) * | 2003-06-18 | 2008-12-16 | International Business Machines Corporation | Method for mirroring data at storage locations |
US7562154B2 (en) * | 2003-06-30 | 2009-07-14 | International Business Machines Corporation | System and method for filtering stale messages resulting from membership changes in a distributed computing environment |
US7739541B1 (en) * | 2003-07-25 | 2010-06-15 | Symantec Operating Corporation | System and method for resolving cluster partitions in out-of-band storage virtualization environments |
US7370101B1 (en) | 2003-12-19 | 2008-05-06 | Sun Microsystems, Inc. | Automated testing of cluster data services |
US7165189B1 (en) * | 2003-12-19 | 2007-01-16 | Sun Microsystems, Inc. | Distributed test framework for clustered systems |
US8005888B2 (en) * | 2003-12-30 | 2011-08-23 | Microsoft Corporation | Conflict fast consensus |
US7379952B2 (en) * | 2004-01-30 | 2008-05-27 | Oracle International Corporation | Techniques for multiple window resource remastering among nodes of a cluster |
JP2005310243A (en) * | 2004-04-20 | 2005-11-04 | Seiko Epson Corp | Memory controller, semiconductor integrated circuit apparatus, semiconductor apparatus, microcomputer, and electronic equipment |
US20050268151A1 (en) * | 2004-04-28 | 2005-12-01 | Nokia, Inc. | System and method for maximizing connectivity during network failures in a cluster system |
US7856502B2 (en) * | 2004-06-18 | 2010-12-21 | Microsoft Corporation | Cheap paxos |
US7334154B2 (en) * | 2004-06-18 | 2008-02-19 | Microsoft Corporation | Efficient changing of replica sets in distributed fault-tolerant computing system |
US20050289228A1 (en) * | 2004-06-25 | 2005-12-29 | Nokia Inc. | System and method for managing a change to a cluster configuration |
US7590737B1 (en) | 2004-07-16 | 2009-09-15 | Symantec Operating Corporation | System and method for customized I/O fencing for preventing data corruption in computer system clusters |
EP1805947A1 (en) * | 2004-09-29 | 2007-07-11 | Telefonaktiebolaget LM Ericsson (publ) | Installing a new view of a cluster membership |
US20080201403A1 (en) * | 2004-09-29 | 2008-08-21 | Telefonaktiebolaget Lm Ericsson (Publ) | Maintaning a View of a Cluster's Membership |
US8185776B1 (en) * | 2004-09-30 | 2012-05-22 | Symantec Operating Corporation | System and method for monitoring an application or service group within a cluster as a resource of another cluster |
US20060074940A1 (en) * | 2004-10-05 | 2006-04-06 | International Business Machines Corporation | Dynamic management of node clusters to enable data sharing |
US8090880B2 (en) | 2006-11-09 | 2012-01-03 | Microsoft Corporation | Data consistency within a federation infrastructure |
US7694167B2 (en) * | 2004-10-22 | 2010-04-06 | Microsoft Corporation | Maintaining routing consistency within a rendezvous federation |
US7730220B2 (en) * | 2004-10-22 | 2010-06-01 | Microsoft Corporation | Broadcasting communication within a rendezvous federation |
US8392515B2 (en) * | 2004-10-22 | 2013-03-05 | Microsoft Corporation | Subfederation creation and maintenance in a federation infrastructure |
US20110082928A1 (en) * | 2004-10-22 | 2011-04-07 | Microsoft Corporation | Maintaining consistency within a federation infrastructure |
US20060090003A1 (en) * | 2004-10-22 | 2006-04-27 | Microsoft Corporation | Rendezvousing resource requests with corresponding resources |
US8095601B2 (en) | 2004-10-22 | 2012-01-10 | Microsoft Corporation | Inter-proximity communication within a rendezvous federation |
US8549180B2 (en) | 2004-10-22 | 2013-10-01 | Microsoft Corporation | Optimizing access to federation infrastructure-based resources |
US7958262B2 (en) * | 2004-10-22 | 2011-06-07 | Microsoft Corporation | Allocating and reclaiming resources within a rendezvous federation |
US8095600B2 (en) * | 2004-10-22 | 2012-01-10 | Microsoft Corporation | Inter-proximity communication within a rendezvous federation |
US8014321B2 (en) * | 2004-10-22 | 2011-09-06 | Microsoft Corporation | Rendezvousing resource requests with corresponding resources |
US20060200469A1 (en) * | 2005-03-02 | 2006-09-07 | Lakshminarayanan Chidambaran | Global session identifiers in a multi-node system |
US7209990B2 (en) * | 2005-04-05 | 2007-04-24 | Oracle International Corporation | Maintain fairness of resource allocation in a multi-node environment |
US20060242453A1 (en) * | 2005-04-25 | 2006-10-26 | Dell Products L.P. | System and method for managing hung cluster nodes |
US20070022314A1 (en) * | 2005-07-22 | 2007-01-25 | Pranoop Erasani | Architecture and method for configuring a simplified cluster over a network with fencing and quorum |
US8812501B2 (en) * | 2005-08-08 | 2014-08-19 | Hewlett-Packard Development Company, L.P. | Method or apparatus for selecting a cluster in a group of nodes |
US7941537B2 (en) * | 2005-10-03 | 2011-05-10 | Genband Us Llc | System, method, and computer-readable medium for resource migration in a distributed telecommunication system |
US7941309B2 (en) | 2005-11-02 | 2011-05-10 | Microsoft Corporation | Modeling IT operations/policies |
US7953890B1 (en) * | 2006-01-27 | 2011-05-31 | Symantec Operating Corporation | System and method for switching to a new coordinator resource |
US7979460B2 (en) | 2006-02-15 | 2011-07-12 | Sony Computer Entertainment America Inc. | Systems and methods for server management |
GB0622553D0 (en) * | 2006-11-11 | 2006-12-20 | Ibm | A method, apparatus or software for managing partitioning in a cluster of nodes |
US7613947B1 (en) * | 2006-11-30 | 2009-11-03 | Netapp, Inc. | System and method for storage takeover |
JP4505763B2 (en) * | 2007-01-31 | 2010-07-21 | ヒューレット−パッカード デベロップメント カンパニー エル.ピー. | Managing node clusters |
US8209417B2 (en) * | 2007-03-08 | 2012-06-26 | Oracle International Corporation | Dynamic resource profiles for clusterware-managed resources |
US7890555B2 (en) * | 2007-07-10 | 2011-02-15 | International Business Machines Corporation | File system mounting in a clustered file system |
US8275866B2 (en) * | 2007-11-13 | 2012-09-25 | At&T Intellectual Property I, L.P. | Assigning telecommunications nodes to community of interest clusters |
US8706858B2 (en) * | 2008-04-17 | 2014-04-22 | Alcatel Lucent | Method and apparatus for controlling flow of management tasks to management system databases |
US20090265450A1 (en) * | 2008-04-17 | 2009-10-22 | Darren Helmer | Method and apparatus for managing computing resources of management systems |
US8892689B1 (en) * | 2008-04-30 | 2014-11-18 | Netapp, Inc. | Method and apparatus for a storage server to automatically discover and join a network storage cluster |
US8498647B2 (en) * | 2008-08-28 | 2013-07-30 | Qualcomm Incorporated | Distributed downlink coordinated multi-point (CoMP) framework |
WO2010044096A2 (en) * | 2008-09-12 | 2010-04-22 | Computational Research Laboratories Limited | Cluster computing |
US8443062B2 (en) * | 2008-10-23 | 2013-05-14 | Microsoft Corporation | Quorum based transactionally consistent membership management in distributed storage systems |
US8675511B2 (en) * | 2008-12-10 | 2014-03-18 | Qualcomm Incorporated | List elimination for distributed downlink coordinated multi-point (CoMP) framework |
KR101042908B1 (en) * | 2009-02-12 | 2011-06-21 | 엔에이치엔(주) | Method, system, and computer-readable recording medium for determining major group under split-brain network problem |
US8903917B2 (en) * | 2009-06-03 | 2014-12-02 | Novell, Inc. | System and method for implementing a cluster token registry for business continuity |
CN101925183A (en) * | 2009-06-15 | 2010-12-22 | 中兴通讯股份有限公司 | Coordinated multi-point transmission based data transmission method and device |
US8437282B2 (en) * | 2009-06-21 | 2013-05-07 | Clearone Communications Hong Kong Limited | System and method of multi-endpoint data conferencing |
US8108712B1 (en) * | 2009-10-30 | 2012-01-31 | Hewlett-Packard Development Company, L.P. | Method and apparatus for removing a computer from a computer cluster observing failure |
US8910176B2 (en) * | 2010-01-15 | 2014-12-09 | International Business Machines Corporation | System for distributed task dispatch in multi-application environment based on consensus for load balancing using task partitioning and dynamic grouping of server instance |
US8560365B2 (en) | 2010-06-08 | 2013-10-15 | International Business Machines Corporation | Probabilistic optimization of resource discovery, reservation and assignment |
US9646271B2 (en) | 2010-08-06 | 2017-05-09 | International Business Machines Corporation | Generating candidate inclusion/exclusion cohorts for a multiply constrained group |
US8370350B2 (en) | 2010-09-03 | 2013-02-05 | International Business Machines Corporation | User accessibility to resources enabled through adaptive technology |
US8968197B2 (en) | 2010-09-03 | 2015-03-03 | International Business Machines Corporation | Directing a user to a medical resource |
US8726274B2 (en) * | 2010-09-10 | 2014-05-13 | International Business Machines Corporation | Registration and initialization of cluster-aware virtual input/output server nodes |
US9292577B2 (en) | 2010-09-17 | 2016-03-22 | International Business Machines Corporation | User accessibility to data analytics |
US8429182B2 (en) | 2010-10-13 | 2013-04-23 | International Business Machines Corporation | Populating a task directed community in a complex heterogeneous environment based on non-linear attributes of a paradigmatic cohort member |
US9443211B2 (en) | 2010-10-13 | 2016-09-13 | International Business Machines Corporation | Describing a paradigmatic member of a task directed community in a complex heterogeneous environment based on non-linear attributes |
US8949828B2 (en) | 2011-01-11 | 2015-02-03 | International Business Machines Corporation | Single point, scalable data synchronization for management of a virtual input/output server cluster |
US9081839B2 (en) | 2011-01-28 | 2015-07-14 | Oracle International Corporation | Push replication for use with a distributed data grid |
US9164806B2 (en) | 2011-01-28 | 2015-10-20 | Oracle International Corporation | Processing pattern framework for dispatching and executing tasks in a distributed computing grid |
US9262229B2 (en) * | 2011-01-28 | 2016-02-16 | Oracle International Corporation | System and method for supporting service level quorum in a data grid cluster |
US9201685B2 (en) | 2011-01-28 | 2015-12-01 | Oracle International Corporation | Transactional cache versioning and storage in a distributed data grid |
US9063852B2 (en) | 2011-01-28 | 2015-06-23 | Oracle International Corporation | System and method for use with a data grid cluster to support death detection |
WO2013054425A1 (en) * | 2011-10-13 | 2013-04-18 | 富士通株式会社 | Node device and communication method |
GB2496840A (en) | 2011-11-15 | 2013-05-29 | Ibm | Controlling access to a shared storage system |
US10706021B2 (en) | 2012-01-17 | 2020-07-07 | Oracle International Corporation | System and method for supporting persistence partition discovery in a distributed data grid |
US20140075170A1 (en) * | 2012-09-12 | 2014-03-13 | International Business Machines Corporation | Automated firmware voting to enable multi-enclosure federated systems |
US9619668B2 (en) * | 2013-09-16 | 2017-04-11 | Axis Ab | Managing application data in distributed control systems |
US9667496B2 (en) * | 2013-12-24 | 2017-05-30 | International Business Machines Corporation | Configuration updates across peer storage systems |
CN103995750B (en) * | 2014-06-04 | 2017-03-22 | 重庆大学 | Asymmetric distributed constrained optimization method for multi-Agent system |
US9742692B2 (en) * | 2014-06-23 | 2017-08-22 | Microsoft Technology Licensing, Llc | Acquiring resource lease using multiple lease servers |
US10664495B2 (en) | 2014-09-25 | 2020-05-26 | Oracle International Corporation | System and method for supporting data grid snapshot and federation |
WO2016106682A1 (en) * | 2014-12-31 | 2016-07-07 | 华为技术有限公司 | Post-cluster brain split quorum processing method and quorum storage device and system |
US10585599B2 (en) | 2015-07-01 | 2020-03-10 | Oracle International Corporation | System and method for distributed persistent store archival and retrieval in a distributed computing environment |
US10798146B2 (en) | 2015-07-01 | 2020-10-06 | Oracle International Corporation | System and method for universal timeout in a distributed computing environment |
US10860378B2 (en) | 2015-07-01 | 2020-12-08 | Oracle International Corporation | System and method for association aware executor service in a distributed computing environment |
US11163498B2 (en) | 2015-07-01 | 2021-11-02 | Oracle International Corporation | System and method for rare copy-on-write in a distributed computing environment |
US10341252B2 (en) * | 2015-09-30 | 2019-07-02 | Veritas Technologies Llc | Partition arbitration optimization |
WO2017149354A1 (en) * | 2016-03-01 | 2017-09-08 | Telefonaktiebolaget Lm Ericsson (Publ) | Neighbor monitoring in a hyperscaled environment |
US10049011B2 (en) * | 2016-05-03 | 2018-08-14 | International Business Machines Corporation | Continuing operation of a quorum based system after failures |
US10521344B1 (en) * | 2017-03-10 | 2019-12-31 | Pure Storage, Inc. | Servicing input/output (‘I/O’) operations directed to a dataset that is synchronized across a plurality of storage systems |
US11550820B2 (en) | 2017-04-28 | 2023-01-10 | Oracle International Corporation | System and method for partition-scoped snapshot creation in a distributed data computing environment |
US10459810B2 (en) | 2017-07-06 | 2019-10-29 | Oracle International Corporation | Technique for higher availability in a multi-node system using replicated lock information to determine a set of data blocks for recovery |
US10769019B2 (en) | 2017-07-19 | 2020-09-08 | Oracle International Corporation | System and method for data recovery in a distributed data computing environment implementing active persistence |
US10721095B2 (en) | 2017-09-26 | 2020-07-21 | Oracle International Corporation | Virtual interface system and method for multi-tenant cloud networking |
US10862965B2 (en) | 2017-10-01 | 2020-12-08 | Oracle International Corporation | System and method for topics implementation in a distributed data computing environment |
US10671494B1 (en) * | 2017-11-01 | 2020-06-02 | Pure Storage, Inc. | Consistent selection of replicated datasets during storage system recovery |
US10476744B2 (en) * | 2017-12-15 | 2019-11-12 | Nicira, Inc. | Coordinator in cluster membership management protocol |
US10572293B2 (en) | 2017-12-15 | 2020-02-25 | Nicira, Inc. | Node in cluster membership management protocol |
JP6714037B2 (en) * | 2018-04-26 | 2020-06-24 | 株式会社日立製作所 | Storage system and cluster configuration control method |
US11429441B2 (en) | 2019-11-18 | 2022-08-30 | Bank Of America Corporation | Workflow simulator |
US11106509B2 (en) | 2019-11-18 | 2021-08-31 | Bank Of America Corporation | Cluster tuner |
US11392423B2 (en) * | 2019-12-13 | 2022-07-19 | Vmware, Inc. | Method for running a quorum-based system by dynamically managing the quorum |
US11372553B1 (en) * | 2020-12-31 | 2022-06-28 | Seagate Technology Llc | System and method to increase data center availability using rack-to-rack storage link cable |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5659777A (en) * | 1992-09-25 | 1997-08-19 | Hitachi, Ltd. | Method for intraprocessor communication |
US5799305A (en) * | 1995-11-02 | 1998-08-25 | Informix Software, Inc. | Method of commitment in a distributed database transaction |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4925311A (en) * | 1986-02-10 | 1990-05-15 | Teradata Corporation | Dynamically partitionable parallel processors |
JPH02125544A (en) * | 1988-11-04 | 1990-05-14 | Mitsubishi Electric Corp | Multiple address communication method |
IL99923A0 (en) * | 1991-10-31 | 1992-08-18 | Ibm Israel | Method of operating a computer in a network |
JP2615408B2 (en) * | 1993-09-20 | 1997-05-28 | 工業技術院長 | Parallel computer system |
EP0678993A1 (en) * | 1994-03-30 | 1995-10-25 | International Business Machines Corporation | Method and apparatus for controlling the configuration definitions in a data processing system with a plurality of processors |
US5666486A (en) * | 1995-06-23 | 1997-09-09 | Data General Corporation | Multiprocessor cluster membership manager framework |
US5918017A (en) * | 1996-08-23 | 1999-06-29 | International Business Machines Corp. | System and method for providing dynamically alterable computer clusters for message routing |
US6151688A (en) * | 1997-02-21 | 2000-11-21 | Novell, Inc. | Resource management in a clustered computer system |
JP3790602B2 (en) * | 1997-04-25 | 2006-06-28 | 富士ゼロックス株式会社 | Information sharing device |
US6108699A (en) * | 1997-06-27 | 2000-08-22 | Sun Microsystems, Inc. | System and method for modifying membership in a clustered distributed computer system and updating system configuration |
US5999712A (en) * | 1997-10-21 | 1999-12-07 | Sun Microsystems, Inc. | Determining cluster membership in a distributed computer system |

Date | Office | Application number | Publication | Status |
---|---|---|---|---|
1997-10-21 | US | US08/955,885 | US5999712A | Expired - Lifetime (not active) |
1998-10-20 | AU | AU11054/99A | AU1105499A | Abandoned (not active) |
1998-10-20 | JP | JP2000517348A | JP2001521222A | Pending (active) |
1998-10-20 | CA | CA002306718A | CA2306718A1 | Abandoned (not active) |
1998-10-20 | WO | PCT/US1998/022161 | WO1999021098A2 | Application Discontinuation (not active) |
1998-10-20 | EP | EP98953773A | EP1025506A2 | Withdrawn (not active) |
1999-03-16 | US | US09/268,793 | US6449641B1 | Expired - Lifetime (not active) |
Non-Patent Citations (2)
Title |
---|
CHANDRA T.D. et al., "On the Impossibility of Group Membership", 1996, ACM 0-89791-800-2/96/05, pages 322-330, XP000829553 * |
FISCHER M.J. et al., "Impossibility of Distributed Consensus with One Faulty Process", Journal of the Association for Computing Machinery, vol. 32, no. 2, April 1985, Connecticut, pages 374-382, XP000829554 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000058825A3 (en) * | 1999-03-26 | 2001-01-04 | Microsoft Corp | Data distribution in a server cluster |
WO2000058825A2 (en) * | 1999-03-26 | 2000-10-05 | Microsoft Corporation | Data distribution in a server cluster |
EP1117039A3 (en) * | 2000-01-10 | 2005-09-07 | Sun Microsystems, Inc. | Controlled take over of services by remaining nodes of clustered computing system |
EP1117041A2 (en) * | 2000-01-10 | 2001-07-18 | Sun Microsystems, Inc. | Method and apparatus for managing failures in clustered computer systems |
EP1117039A2 (en) | 2000-01-10 | 2001-07-18 | Sun Microsystems, Inc. | Controlled take over of services by remaining nodes of clustered computing system |
EP1117041A3 (en) * | 2000-01-10 | 2006-02-22 | Sun Microsystems, Inc. | Method and apparatus for managing failures in clustered computer systems |
EP1134658A2 (en) * | 2000-03-14 | 2001-09-19 | Sun Microsystems, Inc. | A system and method for comprehensive availability management in a high-availability computer system |
US6691244B1 (en) | 2000-03-14 | 2004-02-10 | Sun Microsystems, Inc. | System and method for comprehensive availability management in a high-availability computer system |
EP1134658A3 (en) * | 2000-03-14 | 2002-06-19 | Sun Microsystems, Inc. | A system and method for comprehensive availability management in a high-availability computer system |
EP1877901A2 (en) * | 2005-05-06 | 2008-01-16 | Marathon Technologies Corporation | Fault tolerant computer system |
EP1877901A4 (en) * | 2005-05-06 | 2014-05-07 | Stratus Technologies Bermuda Ltd | Fault tolerant computer system |
JP2015535970A (en) * | 2012-09-07 | 2015-12-17 | アビジロン コーポレイション | Physical security system having multiple server nodes |
US10454997B2 (en) | 2012-09-07 | 2019-10-22 | Avigilon Corporation | Distributed physical security system |
CN110598444A (en) * | 2012-09-07 | 2019-12-20 | 威智伦公司 | Physical security system with multiple server nodes |
US9959109B2 (en) | 2015-04-10 | 2018-05-01 | Avigilon Corporation | Upgrading a physical security system having multiple server nodes |
US10474449B2 (en) | 2015-04-10 | 2019-11-12 | Avigilon Corporation | Upgrading a physical security system having multiple server nodes |
US20230229361A1 (en) * | 2020-08-14 | 2023-07-20 | Inspur Suzhou Intelligent Technology Co., Ltd. | Cluster arbitration method and system based on heterogeneous storage, and device and storage medium |
US11762601B2 (en) * | 2020-08-14 | 2023-09-19 | Inspur Suzhou Intelligent Technology Co., Ltd. | Method for arbitrating heterogeneous storage-based cluster, and system, computer device and non-transitory computer-readable medium thereof |
Also Published As
Publication number | Publication date |
---|---|
US6449641B1 (en) | 2002-09-10 |
US5999712A (en) | 1999-12-07 |
EP1025506A2 (en) | 2000-08-09 |
CA2306718A1 (en) | 1999-04-29 |
AU1105499A (en) | 1999-05-10 |
WO1999021098A3 (en) | 1999-07-01 |
JP2001521222A (en) | 2001-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5999712A (en) | Determining cluster membership in a distributed computer system | |
EP1402363B1 (en) | Method for ensuring operation during node failures and network partitions in a clustered message passing server | |
US7016946B2 (en) | Method and system for establishing a quorum for a geographically distributed cluster of computers | |
US8621263B2 (en) | Automated node fencing integrated within a quorum service of a cluster infrastructure | |
US5991518A (en) | Method and apparatus for split-brain avoidance in a multi-processor system | |
US6363495B1 (en) | Method and apparatus for partition resolution in clustered computer systems | |
US8140623B2 (en) | Non-blocking commit protocol systems and methods | |
US8850018B2 (en) | Consistent cluster operational data in a server cluster using a quorum of replicas | |
US6889253B2 (en) | Cluster resource action in clustered computer system incorporation prepare operation | |
US6279032B1 (en) | Method and system for quorum resource arbitration in a server cluster | |
US7451359B1 (en) | Heartbeat mechanism for cluster systems | |
US7975006B2 (en) | Method and device for managing cluster membership by use of storage area network fabric | |
US8812501B2 (en) | Method or apparatus for selecting a cluster in a group of nodes | |
JPH11167558A (en) | Membership for distributed computer system of low reliability | |
US20110044209A1 (en) | Systems and methods for providing a quiescing protocol | |
WO1998033121A9 (en) | Method and apparatus for split-brain avoidance in a multi-processor system | |
US7120821B1 (en) | Method to revive and reconstitute majority node set clusters | |
WO2004051474A2 (en) | Clustering system and method having interconnect | |
US7143316B2 (en) | Fault diagnosis in a network | |
EP1704472A1 (en) | Configuring a shared resource | |
JP3446652B2 (en) | Hierarchical network management system | |
Black et al. | Determining the Last Membership of a Process Group after a Total Failure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
AK | Designated states | Kind code of ref document: A3 Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
ENP | Entry into the national phase | Ref document number: 2306718 Country of ref document: CA Ref country code: CA Ref document number: 2306718 Kind code of ref document: A Format of ref document f/p: F |
ENP | Entry into the national phase | Ref country code: JP Ref document number: 2000 517348 Kind code of ref document: A Format of ref document f/p: F |
NENP | Non-entry into the national phase | Ref country code: KR |
WWE | Wipo information: entry into national phase | Ref document number: 1998953773 Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 1998953773 Country of ref document: EP |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
WWW | Wipo information: withdrawn in national office | Ref document number: 1998953773 Country of ref document: EP |