Journal of Software Engineering and Applications, 2013, 6, 53-58 http://dx.doi.org/10.4236/jsea.2013.62009 Published Online February 2013 (http://www.scirp.org/journal/jsea) 53 Stochastic Modeling of Database Backup Policy for a Computer System Syouji Nakamura1, Xufeng Zhao2, Toshio Nakagawa2 1Department of Human Life and Information, Kinjo Gakuin University, Nagoya, Japan; 2Department of Business Administration, Aichi Institute of Technology, Toyota, Japan. Email: snakam@kinjo-u.ac.jp, g09184gg@aitech.ac.jp, toshi-nakagawa@aitech.ac.jp Received January 3rd, 2013; revised February 1st, 2013; accepted February 9th, 2013 ABSTRACT As the computer system has developed much in this highly information-oriented society, database security has become a very important problem and its backup strategies need to be made more efficiently and safety. The image copy method has been used as the most simple and dependable recovery mechanism for media failure. However, this method spends high overhead costs for massive data transmission and much processing time in the normal operation of the da- tabase. To cover such weak points, incremental and full backup methods are adopted before updated trucks reach a pre- determined level. Moreover, when the number of full backup files exceeded a predetermined level, we stop incremental and full backups and switch it to the image copy. This paper applies cumulative damage model to backup of files in a database system, by putting damage shock by update, failure shock by database failure and damage by dumped files, and considers the tradeoff among overhead costs of image copy and incremental, full backup methods, and discusses analytically an optimal policy for the image copy backup interval. Finally, numerical examples are given in the case of Poisson process and exponential distributions. Keywords: Database Management System; Image Copy; Full Backup; Incremental Backup; Cumulative Process 1. Introduction In this highly information-oriented society, database se- curity [1] in computer systems has become a very im- portant problem. Some recovery techniques in a database management system have to be prepared previously for emergency of troubles, using backup, check pointing, re- organization, fault-tolerant technologies [2,3]. The back- up service and its protection techniques have been stud- ied and programed by many researchers and engineers, e.g., the recent work Peer-to-Peer (P2P) backup system [4-6], backup and protection schemes for a distributed system [7,8]. When we refer to the backup techniques for the data- base systems, image copy [9] is the most simple and de- pendable method to ensure the safety of data and is al- ways to take the backup copies of all files in other places and to take out them if files in the original secondary media are broken. However, this method takes many hours, storages and costs when files become large be- cause it stores all files. Frequent image copy backup seems not be so reasonable so that it is often restricted to a weekly or monthly schedule, although the increasing speed and capacity of backup media could make over- night backup to be a more realistic proposition. To make the backup copies efficiently, we might dump only files that have changed since the last backup, which is called an incremented backup [9]. This would lessen signifi- cantly both time and size of backup. Further, the recovery techniques for database failures and the backup schemes for broken hard disks were studied [2,10,11]. However, there are only a few studies that focus to the scheduling problems of backup and recovery methods, although many related techniques are available. That is, we propose that backup schedule is a stochastic decision making process from the viewpoints of management and can be analyzed by the theory of stochastic processes. Especially, with regarding to backup modeling, there have been very few research papers that studied analyti- cally optimal policies for a database system. Most prob- lems were concerned with several ways to introduce backup methods in techniques. Optimal full backup poli- cies and incremental backup schedules have been studied [12-14]. Even so, as referred as in [9], image copy back- up is necessary to any database system for its whole se- curity strategy, although it is restricted to be performed weekly or monthly until now. Such a strict periodic im- age copy backup policy is unreasonable and could be optimized from the following two points: 1) this backup technique has its superiority that it could copy all files Copyright © 2013 SciRes. JSEA
Stochastic Modeling of Database Backup Policy for a Computer System 54 without any compression and make backup simpler and reliable; 2) its performance needs to be combined with incremental and full backup schedules, i.e., when the total cost for incremental and full backups exceeds some level, it is reasonable to do such an image copy backup to renewal the database. Thus, we propose the following backup policy which ensures the safety of data and saves hours: The image copy is carried out at scheduled times, and between these backups, the incremental backup or full backup at each files, which takes all copies of newly updated files since the image copy, is done. That is, the image copy with large overhead is done at long interval and the incre- mental and full backups with small overhead are done at short interval. In this paper, we apply the cumulative damage model [15,16] to the backup of files in a database system, by putting damage shock by update, failure shock by data- base failure and damage by dumped files. These models, which play an important role in reliability theory, are considered as a sequence of shocks that occur randomly in time and give some amount of damage to a unit. The damage is accumulated to the current damage level, weakens the unit gradually, and makes it failure when the total damage exceeds a failure level [15]. As applications of cumulative damage processes in computer science, such models have been applied successfully to backup policies for a database system [12-14] and garbage col- lection policies in memory management [17,18]. The following sections are organized as: Section 2 in- troduces working schemes of incremental, full and image copy backups, by taking a database system with n files as an example. Section 3 formulates the stochastic backup model, which combines the incremental, full and image copy backup, i.e., the incremental backup is done when the number of updated trucks does not exceed a certain threshold value to an individual file; the full backup is done when the number of updated trucks exceeds a cer- tain threshold value to an individual file; and the image copy backup is done at a planned time and when the number of full backup files exceeds a certain threshold value in a database. Then, we introduce costs suffered for the overheads of three backups and obtain the expected cost rate between the image copies. Section 4 discusses an optimal interval of the image copy that minimizes the expected cost in Section 3, and in Section 4, we compute the model and its policies as a numerical example in the case of Poisson process and exponential distribution. Finally, in Section 5, concluding remarks and further studies are given. 2. Backup Schedule We consider a database system with n files that are com- posed of the same size of trucks. In this database system, we consider an incremental backup that takes all copies of newly updated trucks since the previous image copy and its overhead is increasing with the number of newly updated trucks. For example, if all updated trucks are included in the previous updated ones, then the number of transferred data is the same as the previous one. How- ever, if the updated trucks have some different ones from the previous ones, then the number of transferred data is increasing by their differences. Taking out copies of pre- vious backups can make the recovery of a database easily and rapidly, when some errors have occurred in storage media. For example, we consider a database with 6 files: In Figure 1, when the updated track exceeds a threshold value Z at each file, e.g., (a), (e) and (f), we dofull backup to these files. Moreover, files (b)-(d) which do not exceed the threshold value Z execute the incremental backup. In Figure 2, when the number of full backup files in the database exceeds a threshold value, we do the image copy backup for this database. In this figure, when the number of 5/6 files in the database is targeted, e.g., files (a)-(c), (e) and (f) are needed to be done by full backup, we stop the full and incremental backups and switch it to the operation of image copy backup. It is well known that when the number of updated trucks exceeds a threshold level Q in a file, the overhead of incremental backup is larger than that of full backup [10]. The value of Q/m is about 60% [10] in a database system, where m is the total trucks in a file. Thus, if the number of updated trucks exceeds a level Z (0 < Z ≤ Q), we should make the full backup instead of the incre- mental backup. Moreover, when the number of updated files exceeds a threshold value in a database, the over- head of full backup is larger than that of image copy backup [9]. Figure 3 shows in the incremental backup method that a usual backup storage volume is small, however, it is necessary to preserve all backed up files. Therefore, the Figure 1. Execution of full and incremental backups in a database. Copyright © 2013 SciRes. JSEA
Stochastic Modeling of Database Backup Policy for a Computer System 55 Figure 2. Execution of image copy backup in a database. Figure 3. Storage volume every time of incremental, full and image copy backups. accumulation of incremental backup files becomes large storage volume. In the full backup method, the storage volume is equal to the size of the object files. Moreover, in the image copy backup method, the storage volume is the same size of database. From above discussions, it becomes an important problem in actual backup schemes when to create an im- age copy. We want to lessen the number of the image copy with large overhead, however, the overheads of the incremental and full backup increase adaptively with the number of newly updated trucks and files. From this point of view and we should decide the image copy in- terval, by comparing the overheads among three backups. 3. Expected Cost We formulate the following stochastic backup models: The image copy backup is done at a planned time T and database initialization is made at such a scheduled time. The incremental backup is done when the volume of up- dated trucks does not exceed a certain threshold value Z to an individual file. The full backup is done when the volume of updated trucks exceeds a certain threshold value Z to an individual file. It is assumed that a database system with n files is up- dated according to a nonhomogeneous Poisson process with an intensity function t and a mean-value func- tion ,Rt i.e., [15]. Then, the prob- 0 d t Rtu u ability that j-th update occurs exactly during (0, t] is e0,1,2, ! j Rt j Rt Ht j j , (1) where 0.Rt Further, let W denote an amount of trucks of each file, which they are updated or are new created since the last backup at the j-th update. It is assumed that each W has an identical probability distribution Pr1, 2,. j Gx Wj j x Then, the total amount of updated trucks 1 j ii W up to j-th update where 00Z has a distribution Pr1, 2,, j j ZxGxj (2) and 01Gx for 0 for where is the j-fold convolution of 0,x0,x j G x Gx with itself. Then, the probability that the total amount of updated trucks exceeds exactly a threshold level K at the j-th up- date is [15]. Let 1j GK j GK t be the total amount of updated trucks at time t. Then, the distri- bution of t is 0 Pr . j j j tx HtGx (3) Suppose that when the total amount of updated trucks exceeds a threshold level Z at time we want to do the full backup for this file. When the total amount of updated trucks does not exceeds Z, we do the incremental backup, and this probability is 1, 2,,jTj 1 0 Φ. jj j TGZHT (4) Oppositely, when the total amount of updated trucks exceeds Z, we do the full backup, and this probability is 1 1 00 0 d 1. T jj j j jj j TGZGZHtt GZHT t (5) Suppose that there exist n files in the database. How- ever, it would be useless to do separately backup policies for each file. It is assumed that if the total amount of up- dated trucks of ,1,, jKK n files exceeds a threshold level Z at time T, then the full backup is done for such j files. In this case, the probability that the image copy backup is done for this database is Copyright © 2013 SciRes. JSEA
Stochastic Modeling of Database Backup Policy for a Computer System 56 11 1 11 0 ΦΦΦ !ΦΦdΦ. !1! nnj j njK TnK K n n TTT j nuu nK K u (6) If the total amount of updated trucks of files exceeds a threshold level Z at time T, then the full backup is done for such j files. In this case, the probability that full backup is done for the database is 0,1, ,1jj K 1 11 0 ΦΦΦ. Knj j nj n TT j T (7) Next, we introduce the following costs: 1 Full and incremental backup costs per unit of time when the number of files whose updated trucks ex- ceeds Z is less than K files. :c 2 Image copy backup cost per unit of time when the number of files whose updated trucks exceeds Z is more than K files with . :c :c21 3Cost of database backup and initialization at time T. The expected cost rate is cc 123 00 123 00 dd dd . TT nn n TT nn cTTttcT ttc CT T cttcttc T (8) The first item of right-hand side of (8) is the full and incremental backup cost and the second item is the image copy backup cost. Note that 0C. If ,T then the policy corresponds to the image copy only at the total amount of updated trucks exceeds the threshold level Z, and the expected cost rate is 2 lim , Tn c CCT (9) where 0 1 11 00 Φd Φd, nn Knj j j tt ntt j t (10) which is the mean time until the total updated trucks ex- ceeds K. 4. Optimal Policy We obtain an optimal time 0TT CT which minimizes the expected cost in (8). Letting 0,CT 3 210 d, nn Tt t cc Tc (11) i.e., 3 21 d T n c tt cc , (12) whose left-hand side is strictly increasing from 0 to . n Thus, if 321nccc , then there exists a finite and unique 0TT that satisfies (12) and the resul- ting cost rate is 121 . n CTcc cT (13) 5. Numerical Example Suppose that e ! j t j t Ht j and 1e . GZ Then, we have e0,1,2, ! i jZ ij Z GZ j i . Thus, (4) is rewritten as 1 0 1 00 Φee !! 1e e !! ij , T jij ji j TZ ji ZT tij TZ ji (14) i.e., 1 1 00 Φe !! e, i j T ji TZ Tji Z (15) where 1 0 1. i Because 11 11 00 0 e 1! !! ee, !! jj i j TZ ji jj TZ j TT TT Z jj i TZ jj e (16) from (6), 1 111 1 1 00 1 00 0 ! !1! !ee !1!!! 1e e !! ee. !! nK K n ji j tZ ji nK ji j tZ ji jj tZ j n ttt nKK tZ n nK Kji tZ ji tZ jj t (17) Therefore, (12) becomes Copyright © 2013 SciRes. JSEA
Stochastic Modeling of Database Backup Policy for a Computer System 57 0 1 1 00 0 1 00 3 021 ! d!1! ee !! 1e e !! eed !! T n K ji Tj tZ ji nK ji j tZ ji jj tZ j n tt nKK tZ tji tZ ji tZ c t jj c . c (18) We obtain the optimal value which satisfies (18), and the resulting expected cost rate is T 12111. n nj jK n CTcc cTT j (19) Tables 1 and 2 indicate the cost rates 321 ccc for optimal times T when and 6.12K10.20n . From Tables 1 and 2, when the value of 321 increases, the interval of becomes long. This shows that we should lengthen the image copy backup interval, if the value of 3 c is large compared to the value of From the tables, when ccc T Z 21 .cc5.10, the value of 321 ccc to the value of becomes about twice. When T 10, 0.05n5. 6,0,ZK and 3219.875,ccc becomes 25. When the unit of T is a day, we should execute the image copy backup T Table 1. Optimal T* when µZ = 5.0. 321 ccc 6K, 10n12K, 20n T 0.05 0.08 0.05 0.08 15 0.115 4.9726 0.001 1.150 20 1.654 32.613 0.128 23.581 25 9.875 93.454 2.653 104.720 30 33.862 166.324 19.410 202.318 Table 2. Optimal T* when µZ = 10.0.. 321 ccc 6K, 10n12K, 20n T 0.05 0.08 0.05 0.08 15 0.240 10.457 0.003 2.487 20 3.473 68.794 0.278 32.346 25 20.773 197.659 5.735 223.785 30 71.369 352.425 41.837 428.991 every 25 days. In addition, when 10.0, 12ZK , 20, 0.05n and 32141.837cc in Table 2, c T becomes 30. When the unit of T is a day, we should execute the image copy backup every 30 days. In the real world, we can request the value of 321 ccc com- paratively easily. Then, when 32141ccc.837, 10.0, 12ZK , 20n and 0.05, it is neces- sary to do the image copy back up every month. 6. Conclusions We have considered the problem when to make the in- cremental, full and image copy backups, under the as- sumptions that the overhead of backups depends on the total amount of newly updated files in a database. We have obtained the expected cost rate until the image copy backup, and have discussed the optimal interval of the image copy backup that minimizes it. It has been shown that the optimal interval is given by a finite and unique solution of an equation. As a further problem, it would be necessary to con- sider the model where the backup of only newly updated files is made, and to compare it with the model studied in this paper. 7. Acknowledgements This work is partially supported by the Grant-in-Aid for Scientific Research (C) of Japan Society for the Promo- tion of Science under Grant No. 22500897 and No. 24530371. REFERENCES [1] T. Ward, “Security of Backup Data,” Information Systems Security, Vol. 15, No. 1, 2006, pp. 31-34. doi:10.1201/1086.1065898X/45926.15.1.20060301/9268 3.6 [2] T. Haerder and J. Muckstadt, “Optimal Policy for Batch Operations: Backup, Checkpointing, Reorganization, and Updating,” ACM Transactionson Database Systems, Vol. 2, No. 3, 1977, pp. 209-222. doi:10.1145/320557.320558 [3] I. Lazaridis, et al., “Fault Tolerant Evaluation of Con- tinuous Selection Queries over Sensor Data,” Interna- tional Journal of Distribute d Sensor Networks, Vol. 5, No. 4, 2009, pp. 338-360. doi:10.1080/15501320701585600 [4] H. Jarraya and M. Laurent, “A Secure Peer-to-Peer Back- up Service Keeping Great Autonomy While under the Supervision of a Provider,” Computers & Security, Vol. 29, No. 2, 2010, pp. 180-195. doi:10.1016/j.cose.2009.10.003 [5] M. Gramaglia, M. Urueña and I. Martinez-Yelmo, “Off- Line Incentive Mechanism for Long-term P2P Backup Storage,” Computer Communications, Vol. 35, No. 12, 2012, pp. 1516-1526. doi:10.1016/j.comcom.2012.04.017 [6] M. Fesci-Sayita, E. TurhanTunalib and A. Murat Tekalpc, “Resilient Peer-to-Peer Streaming of Scalable Video over Copyright © 2013 SciRes. JSEA
Stochastic Modeling of Database Backup Policy for a Computer System Copyright © 2013 SciRes. JSEA 58 Hierarchical Multicast Trees with Backup Parent Pools,” Signal Processing: Image Communication,Vol. 27, No. 2, 2012, pp. 113-125. doi:10.1016/j.image.2011.11.004 [7] G. Bella, C. Pistagna and S. Riccobene, “Distributed Backup through Information Dispersal,” Electronic Notes in Theoretical Computer Science, Vol. 142, No. 3, 2006, pp. 63-77. doi:10.1016/j.entcs.2004.11.046 [8] J. Ma, et al., “A Novel Adaptive Current Protection Sch- eme for Distribution Systems with Distributed Genera- tion,” International Journal of Electrical Power & En- ergy Systems, Vol. 43, No. 1, 2012, pp. 1460-1466. [9] http://www.backup4all.com/ [10] K. Suzuki and K. Nakajima, “Storage Management Soft- ware,” FUJITSU, Vol.48, No. 4, 1997, pp. 389-397. [11] G. Lohman and A. Reuter, “Principles of Transaction Oriented Database Recovery,” ACM Computing Surveys, Vol. 15, No. 4, 1983, pp. 287-317. doi:10.1145/289.291 [12] C. Qian, S. Nakamura and T. Nakagawa, “Cumulative Damage Model with Two Kinds of Shocks and Its Appli- cation to the Backup Policy,” Journal of the Operations Research Society of Japan, Vol. 42, No. 4, 1999, pp. 501-511. doi:10.1016/S0453-4514(00)87116-1 [13] S. Nakamura, C. Qian, S. Fukumoto and T. Nakagawa, “Optimal Backup Policy for a Database System with In- cremental and Full Backups,” Mathematical and Com- puter Modelling, Vol. 38, No. 11-13, 2003, pp. 1373- 1379. doi:10.1016/S0895-7177(03)90140-3 [14] C. Qian, Y. Huang, X. Zhao and T. Nakagawa, “Optimal Backup Interval for a Database System with Full and Pe- riodic Increment Backup,” Journal of Computers, Vol. 5, No. 4, 2010, pp. 557-564. doi:10.4304/jcp.5.4.557-564 [15] T. Nakagawa, “Shock and Damage Models in Reliability Theory,” Springer, London, 2007. [16] S. Nakamura and T. Nakagawa, “Stochastic Reliability Modeling, Optimization and Applications,” World Scien- tific, Singapore, 2010. [17] X. Zhao, S. Nakamura and T. Nakagawa, “Two Genera- tional Garbage Collection Models with Major Collection Time,” IEICE Transactions on Fundamentals of Elec- tronics Communications and Computer Sciences, Vol. E94-A, No. 7, 2011, pp.1558-1566. [18] X. Zhao, S. Nakamura and T. Nakagawa, “Optimal Te- nuring and Major Collection Times for a Generation Gar- bage Collector,” Asia-Pacific Journal of Operational Re- search, Vol. 29, No. 3, 2012, p. 1240018 (17 pages).
|