CN103389715A - High-performance distributed data center monitoring framework - Google Patents

High-performance distributed data center monitoring framework

Info

Publication number
CN103389715A
Authority
CN
China
Prior art keywords
monitoring
framework
alarm
data center
scheduling process
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103181767A
Other languages
Chinese (zh)
Other versions
CN103389715B (en)
Inventor
王恩东
张东
刘正伟
陆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yingxin Computer Technology Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201310318176.7A
Publication of CN103389715A
Application granted
Publication of CN103389715B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses a high-performance distributed data center monitoring framework. The framework comprises a monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, an alarm engine, and a monitoring data processing center. The monitoring framework is designed in a distributed manner: the processing steps involved are separated, refined, and modularized into six modules that each complete the work of one stage, with a single monitoring core engine retained to schedule the modules. This reduces the resources consumed while the framework runs and spreads that consumption evenly across the modules, so that high-performance data center monitoring is achieved and the monitored scale can grow to tens of thousands or even hundreds of thousands of nodes.

Description

A high-performance distributed data center monitoring framework
Technical field
The present invention relates to the fields of distributed monitoring and data center monitoring, and specifically to a high-performance, distributed data center monitoring framework capable of monitoring at large scale.
Background art
At present, data centers keep growing in scale and the demand for high-performance data center monitoring keeps increasing. Traditional monitoring frameworks, however, are bloated and have an inefficient monitoring core; data collection, processing, and analysis all run inefficiently, and an insurmountable performance bottleneck prevents them from monitoring a large-scale data center. In practice, as data centers are built ever larger and the requirements placed on monitoring rise, the traditional framework cannot meet users' needs and its performance bottleneck becomes severe. Such a bloated framework, which integrates all processing in one place, is very inefficient and consumes substantial resources when monitoring device resources in bulk, and can only monitor a data center on the scale of 2,000 nodes.
Summary of the invention
The technical problem to be solved by the invention is to provide a data center monitoring framework that monitors large-scale data centers more efficiently and more accurately.
The technical solution adopted by the invention is a high-performance distributed data center monitoring framework whose architecture comprises: a monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, an alarm engine, and a monitoring data processing center, wherein:
The monitoring core engine is the core of the framework. It drives and schedules every module, reads all configuration required for monitoring, and automatically divides that configuration according to the number of scheduling processes before distributing it to the monitoring scheduling processes;
The monitoring scheduling process is mainly responsible for driving and scheduling the active monitoring pollers or passive monitoring receivers, according to the monitoring configuration distributed by the monitoring core engine, to collect or receive monitoring data. Multiple scheduling processes can be started automatically according to the size of the configuration, so that each one works efficiently;
The active monitoring poller actively collects monitoring data;
The passive monitoring receiver passively receives monitoring data;
The alarm engine is mainly responsible for watching for alarm notifications or event-handling actions concerning the monitored device resources and, depending on what it observes, sends email alarms, sends SMS alarms, or handles the resulting events;
The monitoring data processing center collects and records the monitoring data produced, writing it to logs, a database, or an RRD database (RRD stands for Round Robin Database, which stores a fixed number of periodic records, each holding the value at a given point in time, for example a temperature recorded once per day), and processes and analyzes the data to obtain fault trends, historical monitoring-status curves, availability analysis reports, and the like.
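The round-robin storage idea mentioned for the RRD database can be illustrated with a minimal sketch. It is not the storage format of any particular RRD implementation; the class name `RoundRobinSeries` and its methods are hypothetical and only show how a fixed number of slots is reused cyclically, using the daily temperature example from the text.

```python
from collections import deque


class RoundRobinSeries:
    """Hypothetical fixed-size series: once full, a new sample replaces the oldest."""

    def __init__(self, slots):
        # A deque with maxlen discards the oldest entry when a new one is appended.
        self.samples = deque(maxlen=slots)

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))

    def latest(self):
        # Most recent (timestamp, value) pair, or None if nothing was recorded.
        return self.samples[-1] if self.samples else None


# Keep only the last 3 daily temperature readings.
temps = RoundRobinSeries(slots=3)
for day, celsius in enumerate([21.5, 22.0, 20.8, 23.1]):
    temps.record(day, celsius)
```

After the fourth reading is recorded, the reading for day 0 has been overwritten, so storage stays constant no matter how long monitoring runs.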
For a data center of a given node scale, monitoring scheduling processes are enabled according to the monitoring capacity of each scheduling process. The monitoring core engine automatically divides the monitoring configuration according to the number of scheduling processes and distributes the divided configuration to each monitoring scheduling process. The monitoring core engine then drives the monitoring scheduling processes to start working; once running, each scheduling process drives the active monitoring pollers or passive monitoring receivers to collect or receive monitoring data. After the monitoring data has been collected, the scheduling process sends it to the monitoring data processing center, where it is processed, analyzed, and recorded. The alarm engine, as the alarm core of the whole framework, works in a self-driven manner: it watches for alarm notifications or event-handling actions of the monitored nodes and acts on what it observes by sending alarm emails or SMS messages or by handling the resulting events.
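The division of the monitoring configuration among scheduling processes could look roughly like the sketch below. The patent does not specify the division rule, so the round-robin assignment, the function name `partition_config`, and the `node-N` target names are all assumptions made for illustration.

```python
def partition_config(targets, scheduler_count):
    """Split monitored targets into scheduler_count near-equal shares.

    Round-robin assignment is an assumption; the source only says the
    configuration is divided according to the number of scheduling processes.
    """
    if scheduler_count < 1:
        raise ValueError("at least one scheduling process is required")
    shares = [[] for _ in range(scheduler_count)]
    for index, target in enumerate(targets):
        shares[index % scheduler_count].append(target)
    return shares


# Hypothetical target names; 25 nodes divided among 4 scheduling processes.
nodes = [f"node-{i}" for i in range(25)]
shares = partition_config(nodes, 4)
share_sizes = [len(share) for share in shares]
```

With this rule no share differs from another by more than one target, which is one simple way to keep the load on each scheduling process even.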
Architecturally, the monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, alarm engine, and monitoring data processing center are all modularized; that is, the whole data center monitoring framework is deployed in a distributed manner across different servers, making full use of each server's own resources to form a high-performance monitoring system able to monitor a data center of hundreds of thousands of nodes.
Each of the six modules in the framework is provided with a standby module in a deployed monitoring system, to ensure the system's fault tolerance and stability.
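How a standby module takes over is not described in the source. The following sketch shows one possible reading, in which each module has a primary and a standby instance and the first healthy one is treated as active; the class `ModuleSlot`, its methods, and the instance names are hypothetical.

```python
class ModuleSlot:
    """Hypothetical primary/standby pair for one module of the framework."""

    def __init__(self, primary, standby):
        self.order = (primary, standby)           # primary is preferred
        self.healthy = {primary: True, standby: True}

    def report(self, instance, healthy):
        # Record the latest health-check result for one instance.
        if instance not in self.healthy:
            raise KeyError(f"unknown instance: {instance}")
        self.healthy[instance] = healthy

    def active(self):
        # The first healthy instance in preference order does the work.
        for instance in self.order:
            if self.healthy[instance]:
                return instance
        return None  # both instances are down


slot = ModuleSlot("alarm-engine-a", "alarm-engine-b")
before_failure = slot.active()
slot.report("alarm-engine-a", False)   # primary fails its health check
after_failure = slot.active()
```

In this reading the standby is idle until the primary is reported unhealthy, and the primary resumes as soon as it is reported healthy again.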
The active monitoring poller is designed to scale out automatically: the number of active monitoring pollers is adjusted automatically according to the monitoring tasks handed down by the monitoring scheduling processes, keeping the load on each poller moderate so that monitoring data is collected actively, efficiently, and accurately.
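The automatic adjustment of the poller count can be sketched as a simple sizing rule. The source gives no formula, so the per-poller capacity parameter, the minimum of one poller, and the function name `pollers_needed` are assumptions.

```python
def pollers_needed(pending_tasks, tasks_per_poller, minimum=1):
    """Return how many active pollers keep each one at or below its capacity.

    tasks_per_poller is an assumed capacity figure, not a value from the source.
    """
    if tasks_per_poller < 1:
        raise ValueError("tasks_per_poller must be positive")
    # Integer ceiling division avoids floating-point rounding.
    required = (pending_tasks + tasks_per_poller - 1) // tasks_per_poller
    return max(minimum, required)


# Re-evaluate the pool size as the task volume from the schedulers changes.
sizes = [pollers_needed(tasks, 20_000) for tasks in (0, 20_000, 20_001, 100_000)]
```

A controller could call this each time the scheduling processes hand down a new batch of tasks and start or stop pollers to match the returned number.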
Beneficial effects of the invention: the invention breaks away from the traditional bloated monitoring framework that integrates all processing and therefore, when monitoring device resources in bulk, suffers from very low efficiency, heavy resource consumption, and a performance bottleneck (for example, being able to monitor only a data center on the scale of 2,000 nodes). By designing the monitoring framework in a distributed manner, the processing steps involved are separated, refined, and modularized into six modules that each complete the work of one stage, while a single monitoring core engine is retained to schedule the modules. This reduces the resources consumed while the whole framework runs and spreads the consumption evenly across the modules. As a result, high-performance data center monitoring is achieved and the monitorable scale is extended to tens of thousands or hundreds of thousands of nodes.
Description of drawings
Figure 1 is a schematic diagram of the traditional monitoring architecture;
Figure 2 is a schematic diagram of the distributed high-performance data center monitoring architecture;
Figure 3 is a schematic diagram of a distributed data center monitoring system on the scale of 100,000 nodes.
Detailed description of the embodiment
The invention is described in detail below with a specific example, with reference to the accompanying drawings:
Taking the construction of the 100,000-node monitoring system shown in Figure 3 as an example, the specific implementation of the high-performance distributed data center monitoring framework is set out as follows.
For a data center of 100,000 nodes, and with each monitoring scheduling process able to monitor about 10,000 nodes, 10 monitoring scheduling processes and 1 standby monitoring scheduling process need to be enabled when monitoring is deployed. The monitoring core engine automatically divides the monitoring configuration into 10 parts according to the number of scheduling processes and distributes the parts to the scheduling processes. The monitoring core engine then drives the scheduling processes to start working. Once running, each scheduling process drives the active monitoring pollers or passive monitoring receivers, according to the active or passive monitoring mode set in its configuration, to collect or receive monitoring data. Based on the node scale, 5 active monitoring pollers can be started to serve the collection tasks issued by the 10 scheduling processes above them; that is, each active monitoring poller serves the tasks issued by two scheduling processes. After the monitoring data has been collected, the scheduling process sends it to the monitoring data processing center, where it is processed, analyzed, and recorded. The alarm engine, as the alarm core of the whole framework, works in a self-driven manner: it watches for alarm notifications or event-handling actions of the monitored nodes and acts on what it observes by sending alarm emails or SMS messages or by handling the resulting events. As shown in the figure, each of the six modules in the framework is provided with a standby module in the deployed monitoring system, to ensure the system's fault tolerance and stability.
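The sizing arithmetic of this example can be reproduced with a short sketch. The figures of about 10,000 nodes per scheduling process, two scheduling processes per active poller, and one standby scheduling process come from the example above; the function name `size_deployment` and the dictionary keys are hypothetical.

```python
def size_deployment(node_count, nodes_per_scheduler=10_000,
                    schedulers_per_poller=2, standby_schedulers=1):
    """Compute process counts for a deployment, following the worked example."""
    # Ceiling division: a partially filled scheduler still needs a process.
    active_schedulers = -(-node_count // nodes_per_scheduler)
    active_pollers = -(-active_schedulers // schedulers_per_poller)
    return {
        "active_schedulers": active_schedulers,
        "standby_schedulers": standby_schedulers,
        "config_parts": active_schedulers,   # one configuration part per scheduler
        "active_pollers": active_pollers,
    }


plan = size_deployment(100_000)
```

For 100,000 nodes this yields 10 active scheduling processes, 1 standby, 10 configuration parts, and 5 active pollers, matching the numbers stated in the example.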

Claims (4)

1. A high-performance distributed data center monitoring framework, characterized in that the architecture of the framework comprises: a monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, an alarm engine, and a monitoring data processing center, wherein:
The monitoring core engine is the core of the framework; it drives and schedules every module, reads all configuration required for monitoring, and automatically divides that configuration according to the number of scheduling processes before distributing it to the monitoring scheduling processes;
The monitoring scheduling process is mainly responsible for driving and scheduling the active monitoring pollers or passive monitoring receivers, according to the monitoring configuration distributed by the monitoring core engine, to collect or receive monitoring data, and multiple monitoring scheduling processes can be started automatically according to the size of the configuration;
The active monitoring poller actively collects monitoring data;
The passive monitoring receiver passively receives monitoring data;
The alarm engine is mainly responsible for watching for alarm notifications or event-handling actions concerning the monitored device resources and, depending on what it observes, sends email alarms, sends SMS alarms, or handles the resulting events;
The monitoring data processing center collects and records the monitoring data produced, writing it to logs, a database, or an RRD database, and processes and analyzes the data to obtain fault trends, historical monitoring-status curves, and availability analysis reports;
For a data center of a given node scale, monitoring scheduling processes are enabled according to the monitoring capacity of each scheduling process; the monitoring core engine automatically divides the monitoring configuration according to the number of scheduling processes and distributes the divided configuration to each monitoring scheduling process; the monitoring core engine then drives the monitoring scheduling processes to start working, and once running, each scheduling process drives the active monitoring pollers or passive monitoring receivers to collect or receive monitoring data; after the monitoring data has been collected, the scheduling process sends it to the monitoring data processing center, where it is processed, analyzed, and recorded; the alarm engine, as the alarm core of the whole framework, works in a self-driven manner, watching for alarm notifications or event-handling actions of the monitored nodes and acting on what it observes by sending alarm emails or SMS messages or by handling the resulting events.
2. The high-performance distributed data center monitoring framework according to claim 1, characterized in that the monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, alarm engine, and monitoring data processing center are all modularized, that is, the whole data center monitoring framework is deployed in a distributed manner on different servers.
3. The high-performance distributed data center monitoring framework according to claim 2, characterized in that each of the six modules, namely the monitoring core engine, monitoring scheduling processes, active monitoring pollers, passive monitoring receivers, alarm engine, and monitoring data processing center, is provided with a standby module in a deployed monitoring system.
4. The high-performance distributed data center monitoring framework according to claim 1, 2 or 3, characterized in that the active monitoring poller is designed to scale out automatically, the number of active monitoring pollers being adjusted automatically according to the monitoring tasks distributed by the monitoring scheduling processes.
CN201310318176.7A 2013-07-26 2013-07-26 A kind of high performance distributive data center monitoring framework Active CN103389715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310318176.7A CN103389715B (en) 2013-07-26 2013-07-26 A kind of high performance distributive data center monitoring framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310318176.7A CN103389715B (en) 2013-07-26 2013-07-26 A kind of high performance distributive data center monitoring framework

Publications (2)

Publication Number Publication Date
CN103389715A true CN103389715A (en) 2013-11-13
CN103389715B CN103389715B (en) 2016-03-23

Family

ID=49534014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310318176.7A Active CN103389715B (en) 2013-07-26 2013-07-26 A kind of high performance distributive data center monitoring framework

Country Status (1)

Country Link
CN (1) CN103389715B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020002443A1 (en) * 1998-10-10 2002-01-03 Ronald M. Ames Multi-level architecture for monitoring and controlling a functional system
US20080082181A1 (en) * 2006-09-29 2008-04-03 Fisher-Rosemount Systems, Inc. Statistical signatures used with multivariate analysis for steady-state detection in a process
CN101232515A (en) * 2008-02-25 2008-07-30 浪潮电子信息产业股份有限公司 Distributed type colony management control system based on LDAP
CN102591282A (en) * 2012-02-14 2012-07-18 浙江鼎丰实业有限公司 Distributed data collection and transmission system
CN102608970A (en) * 2012-03-05 2012-07-25 浪潮通信信息系统有限公司 Distributed data acquisition method based on centralized management and automatic scheduling
CN102970183A (en) * 2012-11-22 2013-03-13 浪潮(北京)电子信息产业有限公司 Cloud monitoring system and data reflow method thereof

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618644A (en) * 2013-11-26 2014-03-05 曙光信息产业股份有限公司 Distributed monitoring system based on hadoop cluster and method thereof
CN105094698A (en) * 2015-07-08 2015-11-25 浪潮(北京)电子信息产业有限公司 Method for predicting disc capacity based on historical monitoring data
CN105094698B (en) * 2015-07-08 2018-09-11 浪潮(北京)电子信息产业有限公司 A kind of disk size prediction technique based on Historical Monitoring data
CN106027306A (en) * 2016-05-26 2016-10-12 浪潮(北京)电子信息产业有限公司 Resource monitoring method and device
CN106202324A (en) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 The data processing method of a kind of real-time calculating platform and device
CN106202324B (en) * 2016-06-30 2020-10-30 北京奇虎科技有限公司 Data processing method and device for real-time computing platform
CN106354616B (en) * 2016-08-18 2019-05-03 北京并行科技股份有限公司 Monitor the method, apparatus and high performance computing system of application execution performance
CN106354616A (en) * 2016-08-18 2017-01-25 北京并行科技股份有限公司 Method and device for monitoring application execution performance and high-performance computing system
CN106100938A (en) * 2016-08-19 2016-11-09 浪潮(北京)电子信息产业有限公司 The monitoring of a kind of distributed cluster system and alarm method and system
CN106407078A (en) * 2016-09-26 2017-02-15 中国工商银行股份有限公司 An information interaction-based client performance monitoring device and method
CN106407078B (en) * 2016-09-26 2019-06-25 中国工商银行股份有限公司 Client performance monitoring device and method based on information exchange
CN108234150A (en) * 2016-12-09 2018-06-29 中兴通讯股份有限公司 For the data acquisition and processing (DAP) method and system of data center's monitoring system
WO2018199817A1 (en) * 2017-04-24 2018-11-01 Telefonaktiebolaget Lm Ericsson (Publ) Message queue performance monitoring
US10853153B2 (en) 2017-04-24 2020-12-01 Telefonaktiebolaget Lm Ericsson (Publ) Message queue performance monitoring
CN107508731A (en) * 2017-10-10 2017-12-22 郑州云海信息技术有限公司 A kind of large-scale data center monitoring method and system
CN108259270A (en) * 2018-01-11 2018-07-06 郑州云海信息技术有限公司 A kind of data center's system for unified management design method
CN108563550A (en) * 2018-04-23 2018-09-21 上海达梦数据库有限公司 A kind of monitoring method of distributed system, device, server and storage medium
CN108809701A (en) * 2018-05-23 2018-11-13 郑州云海信息技术有限公司 A kind of data center's wisdom data platform and its implementation

Also Published As

Publication number Publication date
CN103389715B (en) 2016-03-23

Similar Documents

Publication Publication Date Title
CN103389715B (en) A kind of high performance distributive data center monitoring framework
CN106651633B (en) Power utilization information acquisition system based on big data technology and acquisition method thereof
CN104407964B (en) A kind of centralized monitoring system and method based on data center
CN108845878A (en) The big data processing method and processing device calculated based on serverless backup
CN109873499B (en) Intelligent power distribution station management terminal
CN105608223A (en) Hbase database entering method and system for kafka
CN107302466A (en) A kind of power & environment supervision system big data analysis platform and method
CN101673100B (en) Acquisition method and system of parameters of technique process
CN105320757A (en) Business intelligent analysis method for quickly processing data
CN105094698A (en) Method for predicting disc capacity based on historical monitoring data
CN106097161A (en) Water affairs management system and data processing method thereof
CN105430030A (en) OSG-based parallel extendable application server
CN102355696A (en) Large scale Internet of things gateway system and realization method thereof
CN112462724A (en) Data monitoring system based on industrial internet
CN107480027A (en) A kind of distributed deep learning operational system
CN103973516A (en) Method and device for achieving monitoring function in data processing system
CN105373620A (en) Mass battery data exception detection method and system for large-scale battery energy storage power stations
US10331484B2 (en) Distributed data platform resource allocator
CN104391990A (en) Multi-task type collecting and harvesting method based on vertical industry
CN114598586B (en) Multi-cloud scene computing power gridding method and system
CN102571424A (en) Processing method, device and system for engineering event
CN113487170A (en) Full link monitoring system with layered technical architecture
CN202231739U (en) Large-scale internet of things gateway system
CN109302723A (en) A kind of multinode real-time radio pyroelectric monitor control system Internet-based and control method
CN205158617U (en) Equipment inspection maintains and data acquisition system based on RFID radio frequency technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190715

Address after: 250100 North 3-storey North District, No. 1036 Tidal Road, Tidal Science Park S05 Building, Jinan High-tech Zone, Shandong Province

Patentee after: Shandong Yingxin Computer Technology Co., Ltd.

Address before: 250014 Shandong Province, Ji'nan City hi tech Development Zone, Nga Road No. 1036

Patentee before: Langchao Electronic Information Industry Co., Ltd.
