The storage space and cost for Smart Grid datasets have been growing exponentially due to the high data rate of sensor readings from Automated Metering Infrastructure (AMI) and Phasor Measurement Units (PMUs). This paper focuses on Phasor Data Concentrators (PDCs), which aggregate data from PMUs. PMUs measure real-time voltage, current and frequency parameters across the electrical grid. A typical PDC can process data from anywhere between ten and forty PMUs. The paper addresses the security and data compression challenges simultaneously. To that end, an optimal compression method, ER1c, is investigated for efficient storage of IRIG and C37.118 timestamped PDC data sets. We expect that our approach can reduce the storage cost requirements of commercially available PDCs (SEL 3373, GE Multilin P30) by 80%. For example, the storage space needed for 2 years of PDC data can be reduced to the space currently needed for only 10 days. In addition, our approach in combination with AES 256 encryption can protect PDC data to a larger degree, as per National Institute of Standards and Technology (NIST) standards.
The number of PMU deployments is growing exponentially in North America, and hence the amount of data to be stored, even for a short period, is large [
This optimal reduction in the file or storage size of PDC data can help reduce storage cost, secure and organize the data efficiently, and speed up SQL queries and retrieval through faster download times. The parameters (compression rate, data retrieval time) are considered as benchmark performance metrics at the super PDC level.
This paper specifically uses two-level compression for PDC data with AES 256-bit encryption (see
For example, a file with an initial size of 1200 KB that is reduced to 300 KB through compression has a compression ratio of 4. The higher the ratio, the better the compression method and the more efficient the storage. PMU data are measurements of voltage, current, frequency and the phase angles of voltage and current, marked with a timestamp. The timestamp follows the rules of the IEEE C37.118 or IRIG 200-04 standards. The IRIG-B timestamp format is a standard for encoding timestamp information for PMU data. The specific standard followed for the data is IRIG STANDARD 200-04 [
C37.118 standard, because of the index position marker, which uses some bits too. This IRIG-based PMU timestamp allows a better implementation for human-readable applications, because the time corresponds to real time.
All PMU measurements are encoded in ASCII characters. Most of the parameters in PMU measurements (f, v, and i) and some sections of the time-stamp (date, month, or day) remain the same and are repeated redundantly, so the encoded values do not have to change frequently. For example, the frequency f should be at 60 Hz, the voltage v at 300 kV and the current i at 500 A. Only when an anomaly or change occurs in the measurements do the values that deviate from these nominal values need to be encoded. Any such sudden change in the data pattern can easily be tracked in our approach. Due to the limited encoding fields (FRACSEC, SOC) of the time-stamp, the proposed compression technique is able to reduce the original file sizes of the PDC data. The results of our compression are discussed in the next section. The PMUs considered for this investigation have a data rate of 30 samples per second, so the number of samples remains constant for every second. These data rate patterns are not detected by the classical compression methods (Huffman coding, dictionary coders or prediction by partial matching [
Frame 1: 05-Dec-2015 17:41:36.666, 59.10 Hz.
Frame 2: 05-Dec-2015 17:41:36.700, 60.01 Hz.
The focus is restricted to redundant time-stamp information in the PMU data. Classical compression methods such as dictionary coders could compress all fields, such as date, hour, minute, second and millisecond. However, the redundant data fields, such as the date, day, hour, minute and second, stay unchanged for a certain duration and can be compressed further. This repeated information is otherwise encoded again line by line, which is unnecessary and wastes CPU time and storage. To avoid this problem, our ER1c program captures the initial time-stamp information only once, with its date, day, hour, minute, and second. Repeated information is not encoded or compressed again, which saves storage cost. The only varying, non-repeated data field is the millisecond (ms) information, which is encoded continuously. See the pseudocode shown in Example 1.
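The idea of storing the full time-stamp once and then only the varying millisecond part can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' ER1c program: the function names `compress_timestamps` and `decompress_timestamps` are hypothetical, the time-stamp layout is taken from the two sample frames above, and the varying field is stored as a millisecond offset from the first frame so that roll-over into the next second is handled.

```python
from datetime import datetime, timedelta

# Assumed time-stamp layout, e.g. "05-Dec-2015 17:41:36.666".
# Note: %b (month abbreviation) is locale-dependent; an English/C locale is assumed.
FMT = "%d-%b-%Y %H:%M:%S.%f"


def compress_timestamps(stamps):
    """Keep the first full time-stamp once, then only ms offsets from it.

    Hypothetical helper for illustration; not the ER1c implementation.
    """
    if not stamps:
        return None, []
    base = datetime.strptime(stamps[0], FMT)
    offsets = []
    for s in stamps:
        delta = datetime.strptime(s, FMT) - base
        # Integer number of milliseconds since the first frame.
        offsets.append(delta // timedelta(milliseconds=1))
    return stamps[0], offsets


def decompress_timestamps(base_stamp, offsets):
    """Rebuild the full time-stamp strings from the base and the ms offsets."""
    base = datetime.strptime(base_stamp, FMT)
    restored = []
    for ms in offsets:
        t = base + timedelta(milliseconds=ms)
        # strftime emits microseconds (6 digits); trim to milliseconds.
        restored.append(t.strftime(FMT)[:-3])
    return restored
```

With the two frames shown earlier, the date, hour, minute and second are stored once and the second frame reduces to the single integer 34 (milliseconds after the first frame). The sketch assumes time-stamps with exactly three fractional digits, which is what makes the round trip lossless.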
The ER1c program can also check whether the duration between consecutive measurements is consistent. In other words, it checks the number of samples per second for data validation, depending on the PMU type. We assumed that the PMU used in the PDC data set has a data rate of 30 samples per second. ER1c is a simple and optimal program to detect time-stamp errors. For example, if a second time-stamp (frame 2) following a first time-stamp (frame 1) does not respect the expected duration between measurements, the ER1c program will catch this duration or sampling error.
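A check of this kind might look like the following sketch. It is an assumption-laden illustration rather than the ER1c logic itself: the function name `find_sampling_errors`, the representation of frames as millisecond offsets, and the 1 ms tolerance (chosen to absorb rounding of the 33.33 ms interval to whole milliseconds) are all hypothetical choices.

```python
def find_sampling_errors(offsets_ms, rate_hz=30, tolerance_ms=1.0):
    """Return indices of frames whose gap to the previous frame is unexpected.

    offsets_ms   : frame times in milliseconds, in arrival order
    rate_hz      : nominal PMU reporting rate (30 samples per second assumed)
    tolerance_ms : allowed deviation, assumed here to cover ms rounding
    """
    expected_gap = 1000.0 / rate_hz  # about 33.33 ms at 30 samples per second
    errors = []
    for i in range(1, len(offsets_ms)):
        gap = offsets_ms[i] - offsets_ms[i - 1]
        if abs(gap - expected_gap) > tolerance_ms:
            errors.append(i)
    return errors
```

For the two sample frames (36.666 s and 36.700 s) the gap is 34 ms, which falls within the tolerance. A dropped frame would show up as a gap of roughly 66 ms and be flagged.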
The decompression process is shown in
Example 1. Pseudocode for handling redundant time-stamps.
Compression Type | CRf | CRC | CRv | CRph |
---|---|---|---|---|
7z | 17.15 | 20.04 | 19.88 | 19.88 |
rar | 10.11 | 10.72 | 10.65 | 11.79 |
zip | 10.16 | 6.25 | 7.13 | 5.83 |
zipx | 18.66 | 11.72 | 12.84 | 11.43 |
uha | 11.92 | 20.83 | 22.22 | 20.88 |
ER1c | 30.69 | 11.40 | 10.97 | 11.69 |
ER1c + 7z | 160.08 | 81.74 | 96.97 | 86.66 |
ER1c + rar | 173.76 | 80.26 | 93.43 | 85.18 |
ER1c + zip | 167.89 | 78.73 | 94.98 | 84.61 |
ER1c + zipx | 184.49 | 81.33 | 100.89 | 90.43 |
ER1c + uha | 180.63 | 83.43 | 109.48 | 90.93 |

Compression Type | CRf | CRC | CRv | CRph |
---|---|---|---|---|
7z | 18.05 | 24.45 | 23.87 | 24.94 |
rar | 9.99 | 13.53 | 12.18 | 15.02 |
zip | 10.21 | 6.27 | 7.14 | 5.84 |
zipx | 19.12 | 13.30 | 13.89 | 13.62 |
uha | 12.02 | 26.18 | 27.03 | 29.24 |
ER1c | 29.11 | 11.38 | 10.95 | 11.65 |
ER1c + 7z | 271.50 | 192.74 | 223.13 | 199.50 |
ER1c + rar | 270.49 | 181.61 | 197.09 | 190.01 |
ER1c + zip | 241.18 | 176.51 | 193.00 | 182.05 |
ER1c + zipx | 274.62 | 211.95 | 243.84 | 229.28 |
ER1c + uha | 298.62 | 206.44 | 250.60 | 220.22 |
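To relate the compression ratios in the tables above to storage savings, the following small sketch converts a ratio into the fraction of storage saved. The helper names `compression_ratio` and `storage_saving` are hypothetical, and the relation used (saving = 1 - 1/ratio) follows directly from defining the ratio as original size divided by compressed size, as in the 1200 KB to 300 KB example.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed size (same units for both)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


def storage_saving(ratio):
    """Fraction of storage saved for a given compression ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


# Example from the text: 1200 KB reduced to 300 KB.
ratio = compression_ratio(1200, 300)
saving = storage_saving(ratio)
```

Under this relation, a ratio of 4 corresponds to a 75% saving and a ratio of 5 to an 80% saving. A ratio of 78.73 (ER1c + zip, CRC column of the first table) corresponds to a saving of about 98.7%.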
An optimal compression method for streaming time-stamped data sets for PDCs is presented. The proposed approach is suitable at the PDC level for efficient data storage, retrieval and post-event analysis. The preliminary results indicate that ER1c in combination with existing compression techniques can yield a better compression ratio. We expect that our approach can reduce the storage cost requirements of commercially available PDCs by 80%. For example, the storage capacity needed for 2 years of PDC data can be reduced to the capacity currently needed for only 10 days. In addition, our approach in combination with AES 256 encryption can protect PDC data with greater confidence and thus increase the security of the growing big data sets in the smart grid network.
This work is made possible through UND’s RD & C (21418-4010-02294).
Olivo, E., Campion, M. and Ranganathan, P. (2016) Data Compression for Next Generation Phasor Data Concentrators (PDCs) in a Smart Grid. Journal of Information Security, 7, 291-296. http://dx.doi.org/10.4236/jis.2016.75024