Summary: A data center needs a well-regulated power environment, prompt identification and correction of hazards and faults, lower management costs and energy consumption, and higher operational efficiency if it is to run safely, efficiently, stably, and in an environmentally friendly way. Meeting these goals requires a more intuitive and efficient automated monitoring system for data center management. Based on the technical architecture of an integrated real-time business monitoring system, this article designs a unified, centralized monitoring platform. The platform integrates and processes the critical information needed to monitor and manage the entire data center, provides a unified display interface with information sharing, coordination, and interconnection functions, reduces the workload of management personnel, and realizes integrated "monitoring, management, and control".
Keywords: Data Center; Server Room; Environmental Control; Monitoring; Visualization
0 Introduction
With the development and widespread use of information technology, the server rooms of the Sichuan Provincial Meteorological Detection Data Center, which house systems such as Tianqing and Tianjing as well as resource pools, core networks, and security equipment, are becoming increasingly centralized and refined. The volume of meteorological data is growing rapidly, and the number and scale of computer systems and communication devices increase daily. The center's server rooms, computer systems, and communication networks have become the core of business management for the various units. To ensure their safe and stable operation, the supporting power, environmental, fire protection, and security systems must operate reliably at all times. Traditional data center management often fails to resolve faults promptly and lacks systematic records of when incidents occurred and who was responsible. It also lacks comprehensive analysis of past faults, so problems are never fully resolved.
Based on the technical architecture of the provincial comprehensive business real-time monitoring system, this article investigates the data center's urgent need for comprehensive monitoring of power and environmental conditions and designs a server room monitoring system. The system provides centralized monitoring and control of the equipment in the server room, primarily power system monitoring, environmental system monitoring, and video monitoring. It records events in real time through web pages and raises timely alerts for fault events, which improves the efficiency of operation and maintenance personnel. Faults can be eliminated quickly, and faults that have occurred can be analyzed comprehensively, giving reliable and systematic management of the server room.
1 Feature Design
1.1 Physical Architecture
The system is composed of communication stations, regional monitoring centers, central monitoring centers, user terminals, and upper management platforms, featuring clear hierarchy, simple structure, and logical clarity. It employs distributed deployment, with one monitoring server deployed in each data center.
The front-end data collector in each data center records local monitoring data, operational records, and alarm dispatch records, and supports sensor control. The physical structure of the system is shown in Figure 1.
Figure 1: Data Center Physical Structure Monitoring
1.1.1 Communication Bureau (Station): Comprising an environmental monitoring host, sensors, monitoring modules, and related components, it collects data on the on-site environment, power equipment, fire safety, and security. On-site sensors are connected to the interfaces of the environmental monitoring host via twisted pair cables. Intelligent devices pass their collected data to the environmental monitoring host through an intelligent monitoring module.
1.1.2 Regional Monitoring Center: Comprised of regional monitoring servers (which can integrate services from the monitoring center server and allocate access permissions), it oversees and operates the communication stations within the region, storing the data uploaded by these stations. Additionally, it forwards data from all communication stations within the region to the superior monitoring center.
1.1.3 Monitoring Center: Comprising the main monitoring server, backup monitoring server, and intelligent cloud center, it stores the data uploaded by communication stations (sites). The system employs a B/S architecture, serving as the data backend and supporting access via various means such as front-end PC devices and client apps. It also provides interfaces for data transmission to third parties, facilitating integration.
1.1.4 User Endpoints: Users manage, query, control, operate, and maintain the system through client apps, PC browsers, and other means. The endpoints support statistical report generation, data analysis and mining, alert management, operation and maintenance management, permission management, and system configuration management.
1.1.5 Upper Management Platform: Transmits environmental monitoring data to third-party and regulatory platforms via VPN, public internet, and other methods, displaying the data on a centralized large-screen dashboard. The system boasts excellent scalability, allowing for the addition or removal of monitoring objects and data centers without altering the original system design. It simply requires adding the corresponding sensors and monitoring modules based on the existing design, seamlessly integrating new monitoring content and objects into the existing system.
1.2 Technical Architecture
The server-side will utilize the Linux operating system, while the client-side will support Windows XP, Windows 7, and Windows 10 operating systems.
1.2.1 Programming Languages: Java and Python will be utilized for system development, with the frontend operations portal and management interface developed using J2EE technology architecture and Java programming language. Backend functionalities such as data collection and storage management will be developed using probes, Python scripts, web crawlers, and Spring JPA, to meet the diverse data source requirements of the system.
1.2.2 Data Storage Service: The data storage service program is planned to be developed using programming languages such as Java, with data collection primarily through interfaces for read/write operations. The configuration information is intended to reuse the Tianjing system as the data storage platform.
1.2.3 Data Communication Services: The data communication service program is planned to be developed in programming languages such as Java. It offers multiple interfaces, including FTP, HTTP, and Socket, for data exchange with various data sources.
2018, Jasper Report, iReport, FCKeditor, HXGIS, MySQL, Restful, etc.
The statistical analysis program uses an API call interface based on WebService technology and achieves high-concurrency response through the load balancing strategy of the application server. The technical architecture is shown in Figure 2.
Figure 2: Data Center Monitoring Technology Architecture
1.3 Platform Architecture
The system is based on the National Unified Meteorological Comprehensive Business Real-time Monitoring System (Tianjing) database, integrating existing provincial bureau system information. Adhering to the Tianjing database entry interface specifications, the monitoring data resources of the data center system are recorded in the Tianjing database, and then the data resources already stored are accessed through the data retrieval interface. The architecture of the data center power environment monitoring platform includes five parts: data support layer, data storage layer, technical support layer, application layer, and presentation layer, as shown in Figure 3.
Figure 3: Data Center Monitoring Platform Architecture
2 System Features
The system's main functional modules are power monitoring, environmental monitoring, system logs, alarm management, data statistics, and permission management. The system's home page is shown in Figure 4.
Figure 4: System Home Page Display
2.1 Data Center Power Monitoring
This module covers power supply monitoring, distribution switch status monitoring, and UPS monitoring. Power supply monitoring is achieved by installing an electric power meter in the distribution cabinet, connecting the meter to the incoming line of the main power supply and to the connection line of the current transformer, and then connecting the meter to the sensor interface of the environmental monitoring host using twisted pair cables. This enables the monitoring of parameters such as voltage U, current I, frequency, power factor, active power, and reactive power. Distribution switch status monitoring is implemented by installing a distribution switch module in the distribution cabinet, with the single switch module connected in parallel.
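The electrical quantities listed above are related to one another, which can be useful for sanity-checking meter readings. The following is a minimal sketch, assuming a single-phase circuit with sinusoidal voltage and current; the function name `power_components` is hypothetical, and in practice the power meter reports these values directly.

```python
import math


def power_components(voltage_v: float, current_a: float, power_factor: float):
    """Return (apparent VA, active W, reactive var) for a single-phase circuit.

    Assumes sinusoidal waveforms, so S = U * I, P = S * pf,
    and Q = S * sqrt(1 - pf^2).
    """
    if not 0.0 <= power_factor <= 1.0:
        raise ValueError("power factor must be between 0 and 1")
    apparent = voltage_v * current_a
    active = apparent * power_factor
    reactive = apparent * math.sqrt(1.0 - power_factor ** 2)
    return apparent, active, reactive


if __name__ == "__main__":
    s, p, q = power_components(220.0, 10.0, 0.8)
    print(f"S={s:.1f} VA, P={p:.1f} W, Q={q:.1f} var")
```

A reading whose reported active power differs markedly from `U * I * pf` may indicate a wiring or configuration problem with the meter or current transformer.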
2.2 Data Center Environment Monitoring
This module covers temperature and humidity monitoring, leak detection, and video surveillance. Temperature and humidity monitoring is achieved by placing temperature and humidity sensors in the areas to be monitored, transmitting the collected signals to the environmental monitoring host, and displaying real-time temperature and humidity values for each location on a dynamic electronic map on the web page. Leak detection is implemented by laying leak detection wires in areas prone to leaks, connecting the wires to a leak detection controller, and then connecting the controller's output signal to the environmental monitoring host. When the leak detection wires detect a leak, the system immediately raises an alarm. Video surveillance monitors the server room in real time and can be viewed on the web page; the system supports multi-screen video browsing, video playback, and video control management.
2.3 System Logs
The system log records operations, performance, access, and alerts for both the system and the host for issue tracing. The system log includes: operation logs, access logs, and alert dispatch logs. The host log includes: access logs, operation logs, and event logs.
2.4 Alert Management
A monitored value that goes above its upper limit or below its lower limit is treated as an alarm event. To avoid raising alarms repeatedly when an environmental reading fluctuates around a threshold, the system triggers an alarm event only after the value has remained outside the limits for a set period of time. This delay is configurable. Alarm notifications display alarm information and alarm feedback in chronological order, so that critical alarms and their feedback can be reviewed in sequence on the main page of the alarm dashboard. When an alarm event at a specific time is reviewed, the alarm information links automatically to the configuration information of the fault source and provides the relevant management information about the faulty resource, which improves the efficiency of fault handling.
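The delayed alarm rule described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the class name `DelayedThresholdAlarm`, its fields, and the use of timestamps in seconds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayedThresholdAlarm:
    """Raise an alarm only after a value stays out of range for delay_s seconds."""

    lower: float
    upper: float
    delay_s: float  # configurable warning delay (hypothetical parameter name)
    active: bool = False
    _out_of_range_since: Optional[float] = None

    def update(self, value: float, timestamp_s: float) -> bool:
        """Feed one reading; return True while the alarm is active."""
        if self.lower <= value <= self.upper:
            # Back within limits: clear the alarm and reset the timer.
            self._out_of_range_since = None
            self.active = False
            return False
        if self._out_of_range_since is None:
            # First out-of-range reading starts the timer.
            self._out_of_range_since = timestamp_s
        if timestamp_s - self._out_of_range_since >= self.delay_s:
            self.active = True
        return self.active


if __name__ == "__main__":
    alarm = DelayedThresholdAlarm(lower=18.0, upper=27.0, delay_s=60.0)
    for t, temp in [(0, 30.0), (30, 30.0), (40, 26.0), (50, 30.0), (110, 30.0)]:
        print(t, temp, alarm.update(temp, t))
```

In this sketch a reading that returns within the limits resets the timer, so brief excursions around a threshold never accumulate into an alarm. Whether the real system resets or accumulates the time is not stated in the text.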
2.5 Data Query
Users select a start time, an end time, and a sensor to query the data recorded at that monitoring point during the period. The query results include the data collection time, description, type, and value. Historical data can be analyzed for trends and comparisons, and alarm data can be summarized to count alarm occurrences and offline instances. Users can also choose a start and end time and a specific monitoring object to obtain hourly, daily, and monthly statistics.
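The hourly, daily, and monthly statistics described above amount to grouping readings into time buckets within the selected period. The following is a minimal sketch under stated assumptions: readings are held in memory as `(datetime, value)` pairs, and the names `summarize` and `BUCKET_FORMATS` are hypothetical. The actual system retrieves its data through the Tianjing data retrieval interface, which is not modeled here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Bucket keys are built with strftime; one format per reporting period.
BUCKET_FORMATS = {
    "hourly": "%Y-%m-%d %H:00",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
}


def summarize(readings, start, end, period):
    """Group (timestamp, value) readings in [start, end] by the given period."""
    fmt = BUCKET_FORMATS[period]
    buckets = defaultdict(list)
    for timestamp, value in readings:
        if start <= timestamp <= end:
            buckets[timestamp.strftime(fmt)].append(value)
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "count": len(values),
        }
        for key, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    data = [
        (datetime(2024, 1, 1, 10, 5), 20.0),
        (datetime(2024, 1, 1, 10, 35), 22.0),
        (datetime(2024, 1, 1, 11, 0), 24.0),
        (datetime(2024, 1, 2, 9, 0), 30.0),
    ]
    window = (datetime(2024, 1, 1), datetime(2024, 1, 1, 23, 59, 59))
    for bucket, stats in summarize(data, *window, "hourly").items():
        print(bucket, stats)
```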
3 Ankore Environmental Monitoring System: Equipment Selection Introduction
3.1 Software Introduction
Through the data center's environmental and physical monitoring system, we have achieved real-time monitoring of access control, water leakage, smoke, video, environmental conditions, high and low voltage power distribution, and equipment operation status. This system also provides real-time alarms to ensure the normal operation of the data center, prevent equipment failures due to uncontrolled operating environments, ensure the safety of maintenance personnel, extend the lifespan of equipment, and reduce costs associated with extensive management of the power distribution room. Additionally, the system enables environmental monitoring and energy efficiency analysis of energy consumption, assisting users in optimizing energy usage efficiency.
3.2 System Features
(1) Display the total energy consumption of the current data center, IT energy consumption, air conditioning energy consumption, and other energy consumption, and calculate the real-time PUE value of the data center. Present this information visually through a dashboard.
(2) View the main circuit diagram of the medium- and low-voltage distribution system in the data center, with the current telemetry data, remote signaling data, and status of the distribution system shown in a single view. Monitor in real time the voltage, current, and other electrical parameters of each distribution cabinet, as well as the environmental conditions of the substation, such as temperature and humidity, smoke detection, water immersion, and access control.
(3) Real-time temperature monitoring of electrical contacts is implemented by installing wireless temperature sensors at critical locations such as circuit breaker contacts, terminals, busbars, and cable connections, facilitating early detection of abnormal temperatures that could lead to accidents.
(4) Monitor various parameters of transformers, including load factor, frequency, power factor, and three-phase unbalance, and display historical curve charts with real-time data changes.
(5) Online monitoring of electrical power quality detects current and voltage harmonic distortion rates, records transient events such as voltage sags, surges, and interruptions, and displays ITIC tolerance curves.
(6) The system collects data from the UPS input/output terminals and bypass, including three-phase voltage, current, active power, power factor, and frequency. It also monitors UPS temperature, battery voltage, and remaining runtime under the current load.
(7) Display individual battery voltage, internal resistance, and temperature, and predict the remaining time when the battery is under load. Each battery's data can be set for abnormal alarm, enabling timely detection of lead-acid battery anomalies.
(8) Display electrical parameters of the incoming and outgoing circuitry within the precision distribution cabinet, including current, voltage, power, energy, and switch status. The system allows for data alarm settings and classification, with data sourced from the measurement module of the precision distribution cabinet.
(9) Display the electrical parameters of the intelligent mini busbar's terminal box and junction box, including current, voltage, switch status, and plug-in point temperature. The system allows alarm settings and alarm classification for this data.
(10) Visualize data center energy distribution and equipment placement through a floor plan, displaying energy consumption data for each device. Clicking on a device within the floor plan accesses a specific device monitoring interface.
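The real-time PUE value mentioned in feature (1) is the ratio of total facility power to IT equipment power. The following is a minimal sketch, assuming the three consumption categories listed in feature (1) are available as instantaneous power readings in kilowatts; the function name `compute_pue` and its parameter names are hypothetical.

```python
def compute_pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    """Return Power Usage Effectiveness: total facility power / IT power.

    Total power is taken as IT load plus air conditioning plus all other
    consumption, matching the categories shown on the dashboard.
    """
    if it_kw <= 0:
        raise ValueError("IT power must be positive to compute PUE")
    if cooling_kw < 0 or other_kw < 0:
        raise ValueError("power readings must not be negative")
    total_kw = it_kw + cooling_kw + other_kw
    return total_kw / it_kw


if __name__ == "__main__":
    print(f"PUE = {compute_pue(100.0, 40.0, 10.0):.2f}")
```

A PUE of 1.0 would mean all power reaches the IT equipment, so values closer to 1.0 indicate a more efficient facility.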
4 Closing Remarks
Starting from the top-level design, this research has established a comprehensive monitoring and management platform for data centers that provides integrated monitoring and management of the infrastructure within the server room. The system offers comprehensive monitoring and management functions, system compatibility, and system scalability. A standardized, informatized, automated, intelligent, and visual infrastructure monitoring and management system for data centers has been designed and constructed.
The system provides highly stable and reliable monitoring information for the operation of the various systems and equipment within the server room. It reduces the workload of management staff, enables quick and efficient troubleshooting, and supports comprehensive data analysis of any faults that have occurred.







