This work proposes the adoption of an Autonomic Computing System (ACS) in the cloud environment. ACS was first introduced by IBM to create systems capable of automatic self-configuration, self-healing, self-optimization and self-protection. Such systems detect the errors that cause failures, and then recover and reconfigure themselves. The concept has been widely adopted by software that offers restore and recovery functionality, such as operating systems (e.g. Windows Server 2012). This paper proposes a cloud ACS (CACS) for the cloud computing environment that monitors, diagnoses, checks and heals cloud applications automatically and immediately, with an almost unnoticeable recovery time. To evaluate CACS, an application has been developed and applied to real-time cloud applications. The results of the different experimental scenarios demonstrate the ability of the proposed system to heal cloud applications. CACS is also compared with the Windows Server 2012 operating system in terms of healing ability, speed, cost, methodology and other properties. CACS outperformed it in almost all of these properties.
The wide and fast spread of the Internet has motivated a large number of companies to adopt cloud solutions and offer their services and business online, i.e. through the World Wide Web (WWW). This spread calls for more research on how such a volume of cloud applications can manage and heal themselves. ACS was first proposed by IBM in 2007 [
In a cloud environment, an application on the hosting cloud server can face many problems, including the deletion, replacement or modification of a component. The risk of encountering one of these three problems is very high. For instance, an attacker may replace an application component with another that functions in the same way as the original but contains minor changes; such a component could, for example, allow the attacker to steal credit card information, which causes a serious problem for both customers and owners. Most owners of online cloud applications do not test whether a component has been changed, owing to the complex architecture of this kind of application and the lack of knowledge at the owner level. This paper proposes a solution to this problem and many others by applying CACS, which provides self-healing, self-monitoring, self-diagnosis and self-recovery to keep the cloud application in good health.
Software systems exhibit many anomalous conditions among their components. To handle such situations, the software architecture has been split into two layers: a functional layer and a healing layer. This type of software system provides software with many capabilities [
Some authors introduce an architecture for hybrid software models that combine “endogenous” and “exogenous” approaches [
This research mainly focuses on techniques for “self-healing” cloud applications from functional failures: automatically detecting failures, diagnosing faults, and healing the applications so that they behave and run as they did before the failure occurred.
To evaluate the CACS mechanism, we perform black box testing on the software under test, treating the whole set of cloud application files as one component. The main goal is to ensure that this component runs well and has not been accidentally or intentionally changed when compared with the original files. CACS ensures that all software components remain the same, without modification by any external cause, authorized or unauthorized, that no component has been omitted or deleted from the server, and that the application’s directory does not contain any injected or added files.
Our research proposes a system that comprises an analyzer for the content of the released application components, a mechanism for monitoring the application, a mechanism for diagnosing and detecting failures, and a healing mechanism that returns the software to its healthy state.
The major contribution of this paper is the definition of an automatic mechanism for cloud application fault recovery regardless of the cause of the fault. In summary, the research defines a mechanism for an external self-healing software application that monitors, diagnoses and detects failures automatically and efficiently. CACS was developed and implemented as a framework that manages cloud application files regardless of their programming language. We also provide experimental results that demonstrate the efficiency of the proposed system.
The rest of this paper is organized as follows: Section two presents related work on ACS and the mechanisms used for recovering different applications; Section three presents a full description of the proposed system; Section four describes the evaluation experiments and discusses their results; and finally Section five provides the conclusion and future work.
A framework for runtime monitoring and recovery of cloud service conversations is proposed by Simmonds et al. [
Athanasopoulos et al. [
CACS is an automatic external healing system that monitors cloud files and keeps them unchanged around the clock (24/7). The proposed system applies the black box testing concept to verify the stability of the cloud application files. Hence there is no need to examine either the internal code or the flow of its internal functions; instead, CACS considers the cloud application files as one component and tests its characteristics, such as the existence of the component, file size, hash key, creator and correct location path. The system monitors, diagnoses and recovers the cloud application files immediately when an external or internal effect causes an unexpected change. To achieve this, the proposed system was designed with three main phases that run as a life cycle to guarantee the continuous availability of the cloud files.
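The characteristics listed above can be gathered without looking inside the application code. The following is a minimal sketch, not the authors' implementation: the `FileRecord` layout and the `fingerprint` function are hypothetical names, SHA-256 is assumed for the hash key, and the numeric owner id stands in for the "creator" attribute.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """Externally observable characteristics of one file (hypothetical layout)."""
    rel_path: str   # location relative to the cloudsite root
    size: int       # file size in bytes
    sha256: str     # content hash; SHA-256 is an assumption
    owner_uid: int  # stand-in for the "creator" attribute (0 on Windows)


def fingerprint(root: Path, path: Path) -> FileRecord:
    """Collect the black box characteristics of a single file."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    info = path.stat()
    return FileRecord(
        rel_path=path.relative_to(root).as_posix(),
        size=info.st_size,
        sha256=digest.hexdigest(),
        owner_uid=info.st_uid,
    )
```

Because two records compare equal only when path, size, hash and owner all match, a replaced component that behaves like the original would still be expected to differ in its hash.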
The pre-healing phase is the initial phase that prepares the system for the healing phase. It starts by initializing the system and proceeds through building the CACS database. After that, the cloud application files (the cloudsite) are identified and backup copies are created. The pre-healing phase also includes running the implemented system settings for the first time to select the specific cloud application folder to monitor and to set the required initial parameters.
This phase also analyses the cloudsite files by gathering information such as the file size, date of creation, manufacturer of the file and the hash key. The output of this analysis of the released application is the input of the self-healing phase. As shown in
Moreover, this phase includes building a database that stores all the information about the cloud application that results from the analysis step, i.e. the information necessary for the diagnosis process. The database provides a fast and organized way to look up cloud application components at any time, whether for review or for diagnosis. The phase also creates a copy of the original cloudsite components to be reused later in the healing phase. This copy is compressed and stored in a separate directory specified by CACS, not in the published cloud directory. CACS is programming-language independent; this enables it to analyse cloud application files written in any language, such as PHP, ASP or HTML.
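The pre-healing steps described above (analyse the files, store the results in a database, keep a compressed copy outside the published directory) can be sketched as follows. This is an illustration under stated assumptions rather than the CACS implementation: the function name `prepare`, the `baseline` table layout, SQLite as the database and ZIP as the compression format are all hypothetical choices.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def prepare(cloudsite: Path, backup_base: Path, db_path: Path) -> int:
    """Record a baseline for every cloudsite file and archive the cloudsite.

    Returns the number of files recorded. `backup_base` must lie outside
    `cloudsite`; the archive is written to `backup_base` + ".zip".
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS baseline ("
        "rel_path TEXT PRIMARY KEY, size INTEGER, sha256 TEXT, created REAL)"
    )
    rows = []
    for path in sorted(p for p in cloudsite.rglob("*") if p.is_file()):
        info = path.stat()
        rows.append((
            path.relative_to(cloudsite).as_posix(),
            info.st_size,
            # Whole-file read keeps the sketch short; chunk it for large files.
            hashlib.sha256(path.read_bytes()).hexdigest(),
            # st_ctime is the creation time only on Windows; elsewhere it is
            # the last metadata change, so treat it as an approximation.
            info.st_ctime,
        ))
    conn.executemany("INSERT OR REPLACE INTO baseline VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    # Compressed copy of the original components, kept outside the cloudsite.
    shutil.make_archive(str(backup_base), "zip", root_dir=cloudsite)
    return len(rows)
```

Nothing in the sketch inspects file contents beyond hashing them, which is why the same code would apply to PHP, ASP or HTML files alike.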
This phase consists of four basic processes; it starts with monitoring and ends with fixing.
The first step in phase 2 is monitoring, in which the system observes the cloudsite components (files) around the clock (24/7). This includes tracking all the cloud application components, as well as the cloudsite folder itself, for any change, including the deletion, replacement or modification of a component and the addition of any new component to the cloud directory folder.
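One simple way to realise continuous monitoring is to poll the directory and compare successive snapshots. The sketch below assumes polling (the paper does not state how CACS watches the folder); `take_snapshot`, `monitor`, the `interval` and the `max_polls` parameter are hypothetical names introduced for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# rel_path -> (size in bytes, modification time in nanoseconds)
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: Path) -> Snapshot:
    """Cheap view of the folder: which files exist, their size and mtime."""
    snap: Snapshot = {}
    for p in root.rglob("*"):
        if p.is_file():
            info = p.stat()
            snap[p.relative_to(root).as_posix()] = (info.st_size, info.st_mtime_ns)
    return snap


def monitor(root: Path,
            on_change: Callable[[Snapshot, Snapshot], None],
            interval: float = 1.0,
            max_polls: Optional[int] = None) -> None:
    """Poll `root` and call `on_change(previous, current)` when it differs.

    With max_polls=None the loop runs indefinitely (the 24/7 case).
    """
    previous = take_snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = take_snapshot(root)
        if current != previous:
            on_change(previous, current)
            previous = current
        polls += 1
```

Size and modification time are only a trigger here; a change that preserves both would need the hash comparison of the next step to be caught. A production monitor would also have to tolerate files disappearing between the directory listing and the `stat` call.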
The second step in phase 2 is comparing. This step conducts a deep comparison between the monitored components and the status record stored in the database by the analysis step. As mentioned previously, the database contains full details of all the cloud application components that are required for the diagnosis. The result of this step is the input of the diagnosis step.
The third step in phase 2 is diagnosis. In this step a decision is made whether or not to take action, that is, whether the system needs to be healed or is in good health. The step reports whether the system is infected, in a faulty state or in good health and, when healing is needed, suggests a solution.
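The comparison and diagnosis steps can be illustrated with plain set operations over two maps from file path to hash key, one taken from the database and one from the live directory. The names `Diagnosis` and `diagnose` are hypothetical, and the three-way classification is an assumption about how the comparison result might be organized.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Diagnosis:
    """Outcome of comparing the live cloudsite against the stored baseline."""
    deleted: List[str] = field(default_factory=list)   # in baseline, now missing
    modified: List[str] = field(default_factory=list)  # present, hash differs
    added: List[str] = field(default_factory=list)     # not in baseline

    @property
    def healthy(self) -> bool:
        """True when no difference was found, so no healing is needed."""
        return not (self.deleted or self.modified or self.added)


def diagnose(baseline: Dict[str, str], current: Dict[str, str]) -> Diagnosis:
    """Classify differences between two {rel_path: hash} maps."""
    common = baseline.keys() & current.keys()
    return Diagnosis(
        deleted=sorted(baseline.keys() - current.keys()),
        modified=sorted(p for p in common if baseline[p] != current[p]),
        added=sorted(current.keys() - baseline.keys()),
    )
```

Under this classification, a moved file shows up as one deletion plus one addition, and a replaced or edited file shows up as a modification.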
The fourth step in phase 2 is fixing. It restores the original component of the system, replacing or compensating for the affected component, in order to keep the system in good health. In this step the solution suggested by the diagnosis step is applied: when a fault is detected, the latest saved copy of the monitored application is restored to the cloudsite directory. The restoration is triggered by detecting a change. This process takes only a few seconds before the application restarts online and becomes available again to users. The changes, along with the healing event, are stored in the database for further analysis, and the whole process is automated.
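A minimal sketch of the fixing step, assuming the saved copy is a ZIP archive of the cloudsite: the published directory is removed and the archive is unpacked in its place. The function name `heal` and the archive format are assumptions, not details taken from the CACS implementation.

```python
import shutil
from pathlib import Path


def heal(cloudsite: Path, backup_zip: Path) -> None:
    """Replace the cloudsite directory with the contents of the saved copy."""
    if cloudsite.exists():
        # Removing the whole directory also discards injected or added files.
        shutil.rmtree(cloudsite)
    cloudsite.mkdir(parents=True)
    shutil.unpack_archive(str(backup_zip), str(cloudsite), "zip")
```

In this sketch the application is unavailable between the removal and the end of unpacking; unpacking into a temporary directory and renaming it into place would be one way to shorten that window.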
To take a closer look at CACS self-healing, we present a detailed flowchart of the mechanism in
After the healing process ends, the post healing phase starts.
The first step in phase 3 is storing the change in the database. This process records all the information about what was done in the healing process, including the date and time of healing and the component that was restored. Storing this information gives the administrator a clear summary of the history of the application since its release.
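Recording a healing event can be sketched as a single insert into a log table. The `healing_log` table, its columns and the `record_healing` function are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3
from datetime import datetime, timezone


def record_healing(conn: sqlite3.Connection, component: str, problem: str) -> int:
    """Append one healing event and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS healing_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "healed_at TEXT NOT NULL, "   # date and time of healing (UTC, ISO 8601)
        "component TEXT NOT NULL, "   # the component that was restored
        "problem TEXT NOT NULL)"      # e.g. deleted, modified, added
    )
    cur = conn.execute(
        "INSERT INTO healing_log (healed_at, component, problem) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), component, problem),
    )
    conn.commit()
    return cur.lastrowid
```

Querying this table ordered by `healed_at` would give the administrator the history of the application since its release.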
The second step in phase 3 is storing the affected component and analysing the reasons. If the healing process was triggered by a change in the component itself for any of the reasons mentioned, then keeping the affected file gives an indication of what caused the fault. This helps the application developers to avoid such situations and to develop mechanisms for updating the software or the server so that it can withstand such cases. For example, if the change was due to illegal access to the server, then a certain policy could be put into effect, whereas if the change was due to a virus, then the server should respond by removing the virus itself.
The third step in phase 3 is updating all cloud application components. The analysis process is an important step in maintaining a healthy cloud application in the future, both for the reasons mentioned previously and because its results can be used to enhance and update the cloud application itself. When the application is distributed to many servers, the updated component can be distributed to the other servers as a precaution against their being infected in the same way.
CACS healing involves the following cases:
・ Deletion of a component that causes the system to fail to run
・ Change of a component by an external factor, either human or non-human
・ Replacement of an original component
・ Addition of an external component to the software folder
CACS dynamically modifies the cloudsite to correct the failure. The changes that have been made are stored so that the system administrator can analyse them later. If the same fault recurs frequently, this may indicate the need to analyse the stored information about the recovery processes applied to the affected components.
Analysing the changes, together with the monitoring and diagnosis results, provides a good indication of the reasons for the system failure. It also gives a brief overview of the main causes and their indicators. Once the reason is identified, the system administrator can find an appropriate solution that handles the problem for good.
A research method or tool has more chances to be transferred to practitioners if its usefulness is investigated through empirical user studies [
・ What advantages do we gain from using CACS to heal cloud applications that are affected by different performance scenarios?
・ What is the healing time of CACS compared with other healing approaches?
There are three sources that might affect the cloudsite components:
1) external non-human factors
・ virus
・ worm
2) software
・ defect in the components
・ conflict with other software
・ operating system related
3) external human factors
・ attacker
・ spy
・ fraud
To evaluate the proposed approach, we need to assess the effectiveness of the proposed system and its ability to recover from different causes of failure. For this reason, four experimental scenarios were tested: deleting a file, moving a file, replacing a file and editing a file. We initialized the implemented automatic cloud application monitoring system and selected the cloud application directory to be monitored. CACS then analyses the cloudsite directory and builds the database; see
In
CACS responds to this case by restoring the deleted file from the original copy prepared in the initialization stage. CACS records the problem in the database, including the time, date, type of problem and the name of the file that was deleted and restored; see
CACS responds to this case by deleting the full directory of the cloud application and restoring the original copy of the cloudsite that has been prepared in the initialization stage. CACS records the problem in the database including the time, date, and type of problem and the name of the file that was replaced and recovered; see
CACS responds to this case by deleting the full directory of the cloud application and restoring the original copy of the cloudsite that has been prepared in the initialization stage. CACS records the problem in the database including the time, date, and type of problem and the name of the file that was recovered; see
In this experiment, we added cloudsite files of size 10 g and then we deleted 5 m of the file as shown in
CACS heals by recovering only the cloudsite files (components) rather than performing a full system restore, whereas Windows system restoration works by restoring all files in Windows Server 2012. The Windows restore took about 2400 seconds, while CACS took only 5 seconds; this clearly makes CACS the better choice.
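As a worked check of the two healing times reported for this experiment, the ratio between them can be written out explicitly. The term "speed-up" is introduced here only for illustration.

```latex
\[
\text{speed-up}
  = \frac{t_{\text{Windows}}}{t_{\text{CACS}}}
  = \frac{40 \times 60\ \text{s}}{5\ \text{s}}
  = \frac{2400\ \text{s}}{5\ \text{s}}
  = 480
\]
```

In this single experiment, healing with CACS was therefore about 480 times faster than the full Windows restore.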
In
Method | Size on server | Size of cloudsite files | Time for healing |
---|---|---|---|
CACS | 12 g | 10 m | 5 seconds |
Windows Server 2012 | 12 g | 10 m | 2400 seconds (40 × 60) |

Criteria | Microsoft Windows [System Restore] | Proposed system (CACS) | Antivirus | Firewall | Spyware | Reinstall the cloud server |
---|---|---|---|---|---|---|
Recover error resulting from deleting software component | Yes | Yes | No | No | No | Yes |
Heal a replaced component that has the same functionality | No | Yes | No | No | No | Yes |
Heal at run time | No | Yes | Yes | No | Yes | No |
Generate reports of the diagnosis of the problem and the healing process | Yes | Yes | Yes | Yes | Yes | No |
Store the affected component for future analysis | No | Yes | Yes | No | Yes | No |
Methodology of repairing | Operating system dependent | Automatically compares, analyses, diagnoses and heals the cloud application files; restores each file from the backup to its original state as released by the manufacturer | Only files changed by virus signature or worms | No repairing | Only files changed by virus signature or worms | Install fresh new operating system |
State of the healing | To a specified restore point | To the manufacturer state either the original or with updates | No healing | No healing | No healing | No healing |
Level of recovery | Full restore | Per file | Per file | No recovery | Per file | Full |
Speed of recovery | Relatively slow (at least 10 min) | Fast (less than 1 min; recovers only the affected file) | No recovery | No recovery | No recovery | Relatively slow (at least 20 min) |
As can be inferred from
Integrating self-healing approaches into cloudsites brings a very effective improvement in cloudsite performance. Many companies have tried different methods and approaches that aim to reduce the cost and time needed to rerun cloudsites after failures, and have tried to build software systems that can heal themselves. This research presents CACS, an approach for self-healing cloudsites. CACS monitors the software around the clock (24/7) and continually captures information about the specific cloudsite components being monitored. Our experimental results show the efficiency of CACS in detecting failures and errors and in healing them.
As future work, we hope that this work may inspire biological software engineering processes that aim to improve the self-learning of the proposed approach and to generalize the concept to self-learning and self-adaptation.
Al-Sayyed, R.M.H., Fakhouri, H.N., Murad, S.F. and Fakhouri, S.N. (2017) CACS: Cloud Environment Autonomic Computing System. Journal of Software Engineering and Applications, 10, 273-287. https://doi.org/10.4236/jsea.2017.103016