We implemented a generalized Internet of Things (IoT) infrastructure intended for use in various areas such as the Smart Grid. This IoT infrastructure has two methods for storing sensor data. Both share the same features: a double overlay structure, virtualization of sensors, and composite services federated through publisher/subscriber. Both are implemented by synthesizing these elemental architectures. The two methods therefore share most of their architectural elements, but differ in how they compose and use them. In the actual implementations, however, we observed non-negligible differences in achieved performance, caused by operational items beyond these architectural elements. In this paper, we present our analysis of the factors behind these differences, based on the measured performance. In particular, we clarify that the negative side effects of naively combining independent elemental micro solutions can be amplified when maximizing the level of loose coupling is the highest-priority design and operational policy. In principle, such combinations should be evaluated and verified during the basic design phase. However, the variation in how the elements are synthesized tends to be a blind spot when multiple independent architectural elements are adopted in common. As a practical suggestion from this case, when carrying out a new synthesis of multiple architectures it is important to strike a balance among the architectural elements, or among the solutions based on them, and there is a clear need to establish a methodology for architectural synthesis, including verification.
We implemented a generalized IoT infrastructure containing functionalities such as message routing, message mediation, and data storage and analysis for Big Data. The main purpose of developing this IoT infrastructure was to use it for multiple evaluative experiments in various areas such as the Smart Grid. Therefore, the primary requirement was that it should easily aggregate the various data used or generated by numerous applications without trouble. The following items are the basic architectural requirements:
1) Realizing the maximum level of loose coupling and service orientation for various sensors, devices and utilities whenever they need to be connected. [BAR-1]
2) Applying an overlay structure so that various sensor networks can be connected. [BAR-2]
3) Realizing a widely distributed computational environment in order to handle huge amounts and various types of data. [BAR-3]
4) Implementing message routing, data mediation for transforming among various message types, and services that provide the abstract information models and metadata to support the above three items. [BAR-4]
In particular, [BAR-1], [BAR-2] and [BAR-4] are basic architectural requirements for realizing the flexibility needed to execute numerous experiments. The maximum level of loose coupling in [BAR-1] refers not only to architectural aspects but also to operational items, such as procedures to exclude any obstacles and constraints in practical operation. Furthermore, the overlay structure mentioned in [BAR-2] is quite general, and is also required in the other platforms explained in a later section.
In order to fulfill the above set of requirements, various elemental functionalities, including several experimental mechanisms, are applied. The first is, for instance, to virtualize sensors and to abstract the structures embedding these sensors. They are then mapped into a generalized information model [
However, there are two actual implementations for storing data, which differ in how they configure the above three common elemental functionalities in order to respond to various demands arising from the variety of sensors. In other words, the double overlay structure, the virtualization of sensors, and the composite services federated through publisher/subscriber are adopted in common, but they are composed and used differently. In the actual implementations, we observed non-negligible differences in achieved performance, caused by operational items beyond the features of these architectural elements. Thus, we analyzed the factors behind these differences in performance. In particular, we clarified that the negative side effects of naively combining independent elemental micro solutions can be amplified when maximizing the level of loose coupling is the highest-priority design and operational policy. Generally, such combinations should be evaluated and verified during the basic design phase. However, the variation in how the elements are synthesized tends to be a blind spot when multiple independent architectural elements are adopted in common. As a practical suggestion from this case, when carrying out a new synthesis of multiple architectures it is important to strike a balance among the architectural elements, or among the solutions based on them, and there is a clear need to establish a methodology for architectural synthesis, including verification.
The remainder of this paper is organized as follows. We briefly explain the related works in Section 2, and provide an overview of the basic architecture in Section 3. We describe our two equivalent system models in Section 4. An equivalent model is essential from the viewpoint of traffic conditions; in performance measurement and evaluation, it is generally needed to identify the scope of measurement within the system configuration. In Section 5, we present the results of the performance measurements, in particular the average throughput and the average staying period in the system. In Section 6, we analyze the results by showing the steps of the procedures and evaluating them, and explain the operational factors that cause the degraded performance. We conclude in the last section.
As this research can be regarded as an integration of multiple disciplines, related works should be identified across many fields. Furthermore, it is difficult to evaluate this study solely by its novelty against the various approaches in existing works, because the main points of the evaluation derive from a comparison between the implementations of our two internal methods. Thus, after touching on the general trends, we limit our explanation to storage for sensor data, which is our major area.
As for the general trends, there is a representative study comparing IoT platform architectures [
As for storage for sensor data, the area can be roughly divided into the following two generations, although it is still evolving. The first generation developed persistence functionality for storing stream data, and corresponds to studies such as [
Our study may take the same stance as web service benchmarks such as [
internal domains and the Internet with a firewall, and that of the messaging router. This corresponds to the previous “IoT Integration MiddleWare” in [
Set of “Storage” plus several services “IEEE1888 Registry” on the right of the
1) RawData(x) (x: 1~n): This manages the sets of raw data from the sensors. These data, sent from the Communication Agents that wrap various sensors and sensor networks, are regarded as events with timestamps, and are either stored in the storages or processed as CEP events. When stored, the data are only inserted, never updated or deleted, because of their temporal nature. Accordingly, these storages need vast capacity. It is practically impossible to provide this capacity by local implementation alone, and using storage services may be all but mandatory. The management systems are not always RDBMSs; in many cases NoSQL or Cloud storage services applying secret sharing are also available. Thus, a Data Access Component (DAC) is implemented to fulfill the above requirements, in particular as the abstract functionality that maintains the connection resources to the data persistence. For instance, when locally managed storages are used, a serialized object of the connection to these storages is created on demand through the Java Naming and Directory Interface (JNDI). This is essentially different from Connection Pooling, which aims to improve performance; the on-demand approach sacrifices performance first. On the other hand, when access is through Web services such as stateless RESTful ones, the above instantiation process can be skipped. Furthermore, because the data grow over time owing to their temporal nature and are stored across multiple widely deployed storages, part of the routing information for queries is managed in the Repository CORE. When a query is requested, this routing information is used.
2) Extracted Fact Data: This manages data on the features of the structures embedding sensors, extracted by aggregating the timestamped raw data that are produced by the sensors and stored in the above RawData(x). Extracting these fact data requires frequent access to the Repository CORE. As previously mentioned, the Repository CORE manages the information model of the abstract sensors and the Ontology on them. Currently, it is implemented on an RDBMS, with two specialized processors called “Extract Processes” allocated inside the Repository CORE.
3) MetaData: This corresponds to metadata such as the information model of the abstract sensors and the Ontology. It is managed inside the Repository CORE. Owing to the complexity of the model itself, these data are implemented using an RDBMS.
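The difference between creating a connection on demand and reusing one can be illustrated with a minimal sketch. It is only an analogy under stated assumptions: SQLite from the Python standard library stands in for a locally managed storage, opening a connection stands in for the JNDI-based instantiation, and the names `OnDemandDAC`, `ReusedConnectionDAC`, `raw_data` and `run_demo` are hypothetical and do not come from the actual implementation.

```python
import os
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE raw_data (sensor_id TEXT, ts REAL, value REAL)"
INSERT = "INSERT INTO raw_data VALUES (?, ?, ?)"


class OnDemandDAC:
    """Hypothetical DAC that creates a connection resource for every demand."""

    def __init__(self, path):
        self.path = path
        self.connections_opened = 0

    def insert(self, row):
        # Stand-in for instantiating the connection object at the demanded time.
        conn = sqlite3.connect(self.path)
        self.connections_opened += 1
        try:
            conn.execute(INSERT, row)
            conn.commit()
        finally:
            conn.close()


class ReusedConnectionDAC:
    """Hypothetical DAC that keeps one connection, as a pool of size one would."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.connections_opened = 1

    def insert(self, row):
        self.conn.execute(INSERT, row)
        self.conn.commit()

    def close(self):
        self.conn.close()


def run_demo(n=50):
    """Insert n rows with each strategy and report how many connections were opened."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute(SCHEMA)
        setup.commit()
        setup.close()

        on_demand = OnDemandDAC(path)
        for i in range(n):
            on_demand.insert(("s1", float(i), 1.0))

        reused = ReusedConnectionDAC(path)
        for i in range(n):
            reused.insert(("s2", float(i), 2.0))
        reused.close()

        check = sqlite3.connect(path)
        stored = check.execute("SELECT COUNT(*) FROM raw_data").fetchone()[0]
        check.close()
        return on_demand.connections_opened, reused.connections_opened, stored
    finally:
        os.remove(path)
```

In this sketch the on-demand variant opens one connection per stored row, whereas the reusing variant opens a single connection for all rows; the cost of each opening is what the on-demand policy accepts in exchange for looser coupling.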
by the Green University of Tokyo project [
IEEE1888GW as a subscriber depicted in
sensors and Ontology on them by doing the normal procedures in step.2. While cached results from an earlier query are held, the normal procedures in step.2 are omitted together with a part of step.1, as long as there are no new demands.
FIAP Storage works as a temporary data store. The adapter is invoked as a single thread following the “Singleton” pattern, in order to keep the architecture simple and free of the complexity caused by concurrency control. The part of scope #2 that includes this adapter is executed periodically. The invocation interval is specified by a parameter called “Wait_Period”, whose value is given explicitly in the following section. A predefined ontology expressed in XML form is preloaded into the adapter before execution. The caching in method #1 and this predefined ontology are similar in that both are kept in memory to avoid needless messaging. However, the predefined ontology has an obvious weakness with regard to the maximum level of loose coupling: its contents cannot be changed dynamically while running, whereas the cache can respond flexibly to demands.
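The behavior of such an adapter can be sketched as follows. This is a simplified illustration, not the actual adapter: the XML layout of the ontology, the in-memory lists standing in for the FIAP Storage and RawData(x), and the names `Adapter`, `transfer_once` and `run` are all assumptions made for the example; only the ideas of a single instance, a preloaded ontology and a periodic invocation with a wait period come from the description above.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical ontology layout; the real XML schema is not given in the text.
ONTOLOGY_XML = """
<ontology>
  <sensor id="s1" type="temperature"/>
  <sensor id="s2" type="power"/>
</ontology>
"""


class Adapter:
    """Single-instance adapter that moves data from a temporary store in batches."""

    _instance = None

    @classmethod
    def instance(cls, ontology_xml):
        # Singleton: the ontology is parsed only when the one instance is created.
        if cls._instance is None:
            cls._instance = cls(ontology_xml)
        return cls._instance

    def __init__(self, ontology_xml):
        root = ET.fromstring(ontology_xml.strip())
        # Preloaded once; it cannot change while the adapter is running.
        self.sensor_types = {s.get("id"): s.get("type") for s in root.iter("sensor")}

    def transfer_once(self, temp_store, raw_store):
        """Move everything currently in temp_store into raw_store as one batch."""
        batch = list(temp_store)
        temp_store.clear()
        for sensor_id, ts, value in batch:
            # A sensor missing from the preloaded ontology raises KeyError here.
            sensor_type = self.sensor_types[sensor_id]
            raw_store.append((sensor_id, sensor_type, ts, value))
        return len(batch)

    def run(self, temp_store, raw_store, cycles, wait_period, sleep=time.sleep):
        """Invoke the transfer periodically, waiting wait_period seconds each cycle."""
        moved = 0
        for _ in range(cycles):
            moved += self.transfer_once(temp_store, raw_store)
            sleep(wait_period)
        return moved
```

Because the sensor types are resolved from memory, no metadata query is issued during a transfer; the price is that a sensor not listed in the preloaded ontology cannot be handled without reloading it, which is the weakness with regard to loose coupling noted above.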
In this section we present the results but omit the detailed specification of the machine environment, because space constraints prevent us from listing the multiple machines involved. In any case, we evaluated and measured both methods using almost the same or an equivalent machine environment. Our major concern here is how performance is influenced by variations in the combination of common system components. Therefore, a relative comparison between the methods is sufficient for our evaluation, even without a detailed list of the machine environment.
as the number of queries decreases due to the caching effect. However, once connection resources are consumed beyond the constraint, a request for temporary suppression is invoked and the state changes to waiting for the resources to be released. Thus, the average throughput temporarily drops. Once the resources have been released and sufficient time has passed, the temporary pause ends and the average throughput regains its original level. However, we can observe different behavior when the procedure for temporary suppression is applied to the Repository CORE instead of the storages shown in
However, once the pausing state due to the suppression ends, the defined procedures are carried out in one stretch, and the average throughput recovers the lost performance. The caching then works literally as intended. The average throughput maintains its original level because there are almost no queries to the Repository CORE, even under a shortage of connection resources at the Repository CORE.
Based on the current results, the measured rate is 100 to 150 rows per second in the usual case. We therefore estimate that the actual performance of method #2 might reach 2K to 3K rows per second if the server CPU can be used at an average usage rate of 60%. This means that method #2 could have a performance capability more than 100 times that of method #1 shown in
As mentioned previously, the double overlay structure, the virtualization of sensors, and the composite services federated through publisher/subscriber are adopted as common elements, but they are composed and used differently. In actual performance, there are significant gaps between the two methods, due to other factors such as the operational conditions rather than merely the ways of combining the above common elements.
are mainly identified as “seeking the data on the cache at IEEE1888GW” and “adding and verifying the XML signature” to determine whether the sensor nodes are permitted. Furthermore, “executing the flow control”, which is not explicitly described, is also included. Conversely, method #2 performs the procedure of storing data into storage twice, in scope #1 and in scope #2. When the metadata in the cache is used instead of querying in method #1, the number of procedures common to both methods increases relatively. Accordingly, it is probably difficult to regard this minimal procedural overhead as a crucial factor in worsening both of the previous metrics.
Of course, there are other unidentified items only in comparison with
Method #1 includes several negative factors. The first factor is the increased frequency of metadata queries. In this method, the raw data from multiple sensor nodes are gathered, mapped from their CSV forms into XML format at the Communication Agent with synchronization, and then sent to IEEE1888GW. Accordingly, IEEE1888GW receives a set of raw data from multiple sensor nodes each time. However, IEEE1888GW is implemented on the assumption that there is no preliminary information about these sensor nodes, for instance the sensor type or upload frequency. Thus, the query to extract this metadata is carried out every time. Moreover, each query is executed per individual sensor, because priority is given to the loose coupling requirement that every required access be initialized and invoked at the time it is demanded. This policy increases the query frequency. The second factor is the creation of a connection resource through JNDI on each access, adopted to push loose coupling further. The negative side effects of loose coupling are not limited to these, however. The constraint of querying individually for each sensor can dominate the subsequent procedure for maintaining consistency, and so becomes another constraint on the unit of execution in data storing. As a result, the third factor is that storing the data from the sensor nodes may be serialized, even though IEEE1888GW receives multiple data at a time. In particular, when the “Singleton” pattern mentioned previously is applied there, the negative influence on performance could be greater. On the other hand, if simple multi-threading is applied without sufficient verification of its implementation, an isolation issue could arise at the service level, because there is no transactional management at that level.
The combination of these three factors can lead to performance degradation.
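How the first factor scales can be illustrated with a small counting sketch. It is a toy model under assumptions, not the IEEE1888GW implementation: the `Repository` class, the `process_uploads` function and the upload layout are hypothetical, and the only point shown is how many metadata queries reach the repository when each sensor is queried individually on every upload, compared with when query results are cached.

```python
class Repository:
    """Toy stand-in for the metadata repository that counts incoming queries."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.query_count = 0

    def query(self, sensor_id):
        self.query_count += 1
        return self.metadata[sensor_id]


def process_uploads(uploads, repository, cache=None):
    """Resolve metadata for every sensor in every upload.

    uploads is a list of batches, each batch a list of sensor ids.
    With cache=None every sensor is queried individually each time;
    with a dict, a sensor is queried only on its first appearance.
    """
    for batch in uploads:
        for sensor_id in batch:
            if cache is not None and sensor_id in cache:
                continue  # metadata already held in memory
            meta = repository.query(sensor_id)
            if cache is not None:
                cache[sensor_id] = meta
    return repository.query_count


def demo(num_sensors=10, num_uploads=100):
    sensors = [f"s{i}" for i in range(num_sensors)]
    metadata = {s: {"type": "temperature"} for s in sensors}
    uploads = [list(sensors) for _ in range(num_uploads)]
    without_cache = process_uploads(uploads, Repository(metadata))
    with_cache = process_uploads(uploads, Repository(metadata), cache={})
    return without_cache, with_cache
```

Under these assumptions the number of queries without caching grows as the product of the number of sensors and the number of uploads, while with caching it is bounded by the number of distinct sensors.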
On the other hand, this issue of the negative effects of independent elemental micro solutions being amplified does not arise in method #2. As mentioned previously, in method #2 the predefined ontology implemented in XML form is preloaded into the adapter before it runs. This rests on the following assumption: sacrificing the priority of the maximum level of loose coupling is acceptable because every sensor node is identifiable before running. Furthermore, there is no need to synchronize the occurrences of raw data across multiple sensor nodes before storing them in the FIAP storage in scope #1 on the front side. The subsequent data transfer into RawData(x) in scope #2 is also less affected. This is because the data instances in the FIAP storage of scope #1 correspond completely to those in RawData(x) of scope #2, which leaves no room for specialized procedures that map them according to data semantics. Such procedures generally tend to have negative side effects on the operational conditions. Additionally, as a generalized batch program, it can commit a huge number of processed instances at one time. Consequently, the amplification of the negative effects of the independent elemental micro solutions no longer takes place.
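The contrast between storing instances one by one and committing many instances at a time can be sketched as follows. This is an illustration under assumptions: an in-memory SQLite database from the Python standard library stands in for RawData(x), and the table layout and the function names `store_serially`, `store_as_batch` and `demo` are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE raw_data (sensor_id TEXT, ts REAL, value REAL)"
INSERT = "INSERT INTO raw_data VALUES (?, ?, ?)"


def store_serially(conn, rows):
    """Store each instance as its own unit of execution, one commit per row."""
    commits = 0
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()
        commits += 1
    return commits


def store_as_batch(conn, rows):
    """Store all instances in one unit of execution with a single commit."""
    conn.executemany(INSERT, rows)
    conn.commit()
    return 1


def demo(n=1000):
    rows = [(f"s{i % 10}", float(i), 1.0) for i in range(n)]
    results = {}
    for name, store in (("serial", store_serially), ("batch", store_as_batch)):
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        commits = store(conn, rows)
        stored = conn.execute("SELECT COUNT(*) FROM raw_data").fetchone()[0]
        conn.close()
        results[name] = (commits, stored)
    return results
```

Both variants store the same rows; they differ only in the number of commits, which is where a batch program gains when no per-sensor constraint forces a small unit of execution.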
Accordingly, certain points naturally require attention in the design phase, for instance performance estimation and tuning in the design, and striking a delicate balance and trade-off among the several solutions when multiple elements are applied. It is further desirable to establish these as a concrete methodology for synthesizing multiple architectural elements. However, method #1 should not be regarded negatively. In actual operation with data received from a huge number of sensor nodes, uploaded data may arrive irregularly and, for the backward processes, may have to be treated at any time as coming from sensor nodes that are practically unidentifiable in advance, even though these nodes would be identifiable in principle. Therefore, solutions that realize the maximum level of loose coupling, as seen in method #1, are definitely required. As one of our conclusions, methods #1 and #2 should both be selectable, based on the features of the individual case. For method #1, for instance, shared-nothing partitioning by individual sensor node and scale-out seem reasonable, since method #1 is clearly difficult to tune beyond a reasonable level.
We presented an outline of our IoT infrastructure, which has two implemented methods for storing sensor data; these methods share most of their architectural elements but differ in how they compose and use them. We then analyzed the factors causing the differences in the performance achieved by the actual implementations. Furthermore, we pointed out that these differences derive from the policy: whether the maximum level of loose coupling was fully pursued, or was relaxed as a sacrifice to maintain performance. In particular, we described the negative side effect whereby combining the independent elemental micro solutions adopted for the maximum level of loose coupling amplifies their negative effects. In principle, such combinations should be evaluated and verified during the basic design phase. However, the variation in how the elements are synthesized tends to be a blind spot when multiple independent architectural elements are adopted in common. As a practical suggestion from this case, when carrying out a new synthesis of multiple architectures it is important to strike a balance among the architectural elements, or among the solutions based on them, and there is a clear need to establish a methodology for architectural synthesis, including verification. It is obvious that identifying the advantages and disadvantages of various architectural syntheses depends to some extent on the use case. However, with the above methodology for architectural synthesis, including verification, the differences in measured performance shown in this paper might be more avoidable.
Kikuchi, S., Watanabe, S., Kenmotsu, T., Yoshino, D., Nakamura, A. and Hayashi, T. (2017) Analysis of Impactful Factors on Performance in Combining Architectural Elements of IoT. Advances in Internet of Things, 7, 121-138. https://doi.org/10.4236/ait.2017.74009