Journal of Information Security, 2010, 1, 1-10. doi:10.4236/jis.2010.11001. Published Online July 2010 (http://www.SciRP.org/journal/jis). Copyright © 2010 SciRes.

Micro-Architecture Support for Integrity Measurement on Dynamic Instruction Trace

Hui Lin (1), Gyungho Lee (2)
(1) ECE Department, University of Illinois at Chicago, Chicago, USA
(2) College of Information and Communications, Korea University, Seoul, Korea
E-mail: hlin33@uic.edu, ghlee@korea.ac.kr
Received July 12, 2010; revised July 16, 2010; accepted July 20, 2010

Abstract

Trusted computing allows a remote system's trustworthiness to be attested on the basis of a software stack whose integrity has been measured. However, an attacker can corrupt the system as well as the measurement operation itself. As a result, nearly all integrity measurement mechanisms suffer from the fact that what is measured may not be the same as what is executed. To address this problem, we propose a novel integrity measurement called dynamic instruction trace measurement (DiT). For DiT, the processor's instruction cache is modified so that instructions are stored back to memory. DiT is therefore designed as an assistant to existing integrity measurement, extending it to cover the dynamic instruction trace. We have simulated DiT in a full-fledged system emulator with the level-1 cache modified, and it successfully updates its records at the moment attestation is required. The overhead in terms of circuit area, power consumption, and access time is less than 3% for most criteria, and the system introduces less than 2% performance overhead on average.

Keywords: Integrity Measurement, Remote Attestation, Software Vulnerability, Trusted Computing

1. Introduction

Nowadays, computers on different platforms interact with each other over the Internet. Although this provides convenience and increased functionality, it becomes necessary to securely identify the software stack running on a remote system. Effective remote attestation mechanisms have therefore drawn a lot of research interest. The Trusted Computing Group (TCG) first standardized the procedure for launching a remote attestation [1]. As defined, the protocol consists of three stages: integrity measurement, integrity logging, and integrity reporting [2]. The function of integrity measurement is to derive a measure that is an effective representation of a given platform's status. In order to narrow down the range of such measures, the Trusted Computing Base (TCB) is defined as the hardware components and/or software modules whose integrity decides the status of the whole platform. Consequently, integrity measurement can simply be based on measures taken from the TCB, which reduces the performance overhead of measurement and attestation. Integrity logging is the process of storing the aforementioned integrity measures in protected storage. This step is not mandatory, but it is highly recommended because it avoids repeatedly recalculating integrity measures. The last step, integrity reporting, attests the system based on the stored or calculated integrity measures.

Computer systems emphasize different security goals in different contexts. While system integrity is more important in one situation, another may be more concerned with data privacy. Integrity measurement is strongly tied to the security policy applied to a specific computer system and consequently results in different attestation mechanisms. TCG's specification describes an integrity measurement taken during the system's boot process; this mechanism is called "trusted boot".
At the very beginning, a hardware signature, stored in a security-related hardware component, is used as the root of trust. Hardware vendors currently provide such functionality through the Trusted Platform Module (TPM). As each entity is loaded into memory, an integrity measure of its binary is calculated, and these measures eventually form a chain of trust. Unlike secure booting, the system merely takes measurements and leaves it to the remote party to determine the system's trustworthiness. TCG's attestation based on such a trusted boot is also called binary attestation [2].

Other integrity measurements still follow TCG's "measure-before-load" principle. Property attestation and semantic attestation both try to extract high-level properties or semantic information from binary measurements, so that validating whether a security policy holds or is violated can be done more efficiently and effectively on such a measured property a priori. IBM's Integrity Measurement Architecture (IMA), which builds on TCG's trusted boot, extends the approach to the application software stack. IMA has been a security module provided by the Linux kernel since version 2.6.30 [3].

A good integrity measurement should derive a reliable measure that represents the status of the computer system. From the resulting measure, a challenger (the remote entity interested in attesting the system) should be able to tell the system's up-to-date security-related capabilities, such as whether memory has ever been corrupted by an attacker, whether programs can be properly executed in isolation, or whether cryptographic keys are securely stored. On the other hand, the measurement procedure should be transparent to the local user and introduce little performance overhead.

Current integrity measurements face the problem of gathering sufficient history about what has been done to the computing device. When each entity is loaded into memory, a measurement of its binary code is recorded. However, there is a "metric gap" at the moment measurement results are requested: the system status may differ from what was recorded in the measure. Furthermore, measurements are made directly on the program's executable code residing in main memory, so there is also a "behaviour gap" between the instructions executed in the processor and the executable code in memory.

The integrity measure of executable code in memory can be a good representation of the system state. However, with the variety of attacks arriving from the Internet, it is becoming less sufficient for a remote challenger. For programs that run for a long time, such as server programs, a static measurement taken prior to execution may have little relation to the system status at the current moment. A more accurate measurement, one that includes program behaviour, is needed to tell the challenger the full history of bad behaviour; this allows a better decision on the trustworthiness of the system.

However, as more information is included, the overhead of measuring a program's state increases. As a result, some measurements target specific data, such as processor control data, function pointers in memory, network traffic, or intrusion-detection events. Measurement is often restricted so that only a limited amount of information is used; consequently, validating the system against a certain security policy introduces little performance overhead.
These policy-driven attestation or validation schemes are largely based on limited information specific to the intended attack scenarios. The problem is that, although they are efficient in their proposed situations, the portability of such measurements is very low. In a different situation, the attestation may require substantial modification, which in turn exerts a large performance penalty.

In order to provide an integrity measurement that stays up to date as the system evolves, we propose a new dynamic instruction trace measurement (DiT) that includes the dynamic, instruction-level behaviour of the processor in the metric, with the help of a simple micro-architecture modification. However, an instruction-level trace can vary from run to run, with some parts of the program executed more frequently than others. Directly recording processor behaviour causes a lot of performance overhead without increasing accuracy. Instead of applying the measurement inside the processor, we still perform the operations on memory. As a result, most of the function interfaces provided before, such as the ones proposed by TCG or IBM's IMA, can be maintained.

The cache is an evolutionary design that bridges memory and processor to reduce access delay. In this paper, we modify the structure of the instruction cache to one similar to the data cache, so that instructions can also be written back to memory. As the program continues its execution, the code region in its address space no longer stores the code loaded before execution but records the instructions that have actually been executed. We improve integrity measurement for trusted computing in the following aspects:

1) Extending the measurement scope. When a security-sensitive program is loaded and starts execution, DiT writes instructions back into memory. Consequently, the binary code located in its address space records the instructions that are actually executed.

2) Facilitating attestation for different security policies. DiT only replaces the static measurement with a dynamic one. As a result, it changes little in the high-level interface and provides a more general solution for diverse scenarios.

3) Writing back instructions does not require the involvement of the operating system. Thus, DiT builds a connection between what has been seen inside the processor and what resides in memory. This procedure does not require trusting the operating system, which in some cases can be corrupted by attackers.

The paper is structured as follows. Section 2 presents background on trusted computing and integrity measurement. In Section 3, we present DiT's design in detail. To avoid potential hazards from attacks, we propose several hardware-level recommendations in Section 4. The experimental results and analysis are given in Section 5. Finally, related work and conclusions are presented in Section 6 and Section 7.

2. Background

2.1. Trusted Computing

Trusted computing deals with computer systems in a hazardous environment. Though there is no ubiquitous definition of trust, this paper uses the one from the Trusted Computing Group (TCG) specification: trust is the expectation that a device will behave in a particular manner for a specific purpose [2].

The Trusted Computing Base (TCB) is specified as any hardware and/or software components within the platform of interest whose safety can affect the status of the whole system. The assumption is that if the TCB is safe, the system can be trusted.
However, the TCB's components vary from system to system. In some situations the TCB works with an integrity validation mechanism, so run-time critical data values are included in the TCB. In other situations, the execution of security-sensitive programs, such as encryption/decryption operations, is essential to the system's proper function, so the architectural components that guarantee the privacy of such application programs are chosen for the TCB. TCG has summarized diverse application scenarios and concludes that the TCB should provide the following two characteristics:

1) Isolated execution, or protected execution. The computing platform should be able to equip security-related application programs with an isolated environment, so that no other legacy program can access or corrupt the information they rely on. To achieve this property, many researchers adopt virtualization or hardware extensions to the legacy computer architecture [4].

2) Remote attestation. Each computing platform should provide mechanisms to: (1) securely measure the TCB's safety state; (2) protect the measurement log stored locally; and (3) transmit the measure to a remote challenger.

2.2. TCG's Binary Attestation

TCG defines a binary attestation to provide a trusted boot. Whenever an entity is loaded into memory, starting from the moment the machine is physically turned on, the TPM applies a cryptographic hash function, say Hash, to its executable code to produce a measurement result, say M. The binary measurement for each entity is logged separately. Additionally, each measurement is also stored in one of the Platform Configuration Registers (PCRs) in the TPM by cryptographically extending the PCR's current value PCR_t, i.e., the new PCR value is PCR_{t+1} = Hash(PCR_t | M), where | denotes concatenation. When a verifier requests attestation, the TPM sends the measurement log (kept on the local hard disk) and the corresponding PCR value to the verifier, who recalculates the hash result from the measurement log. Comparing the newly computed hash result with the PCR value tells whether untrusted behaviour within the environment has ever modified the PCR value, the measurement log, or the executable code itself.

Using binary attestation facilitates verification in two main ways: 1) a measurement in this format hides many different high-level implementations and reduces the complexity of calculating the measurement log and PCR value; 2) it cleanly separates measurement from verification. Attestation does not try to prevent the system from illegal behaviour that might compromise it; it only records the history of loaded code, securely sends it to the verifier, and leaves the trustworthiness decision to the verifier.
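As an illustration of the extend operation above, the following C sketch computes PCR_{t+1} = Hash(PCR_t | M) with SHA-1, the 160-bit hash used for TPM 1.2 PCRs. It is only a software model of the operation under our own assumptions; a real TPM performs the extend inside the chip and exposes it through its command interface.

    #include <openssl/sha.h>   /* SHA1(): 160-bit digest, matching a TPM 1.2 PCR */
    #include <string.h>

    #define PCR_LEN SHA_DIGEST_LENGTH   /* 20 bytes */

    /* Software model of the extend operation:
     * PCR_{t+1} = Hash(PCR_t | M), where M is the measurement
     * of the entity that has just been loaded. */
    static void pcr_extend(unsigned char pcr[PCR_LEN],
                           const unsigned char measurement[PCR_LEN])
    {
        unsigned char buf[2 * PCR_LEN];

        memcpy(buf, pcr, PCR_LEN);                   /* PCR_t */
        memcpy(buf + PCR_LEN, measurement, PCR_LEN); /* M     */
        SHA1(buf, sizeof(buf), pcr);                 /* new value overwrites PCR_t */
    }

Because the resulting chain of hashes is order-dependent, a verifier who replays the measurement log through the same operation reproduces the PCR value only if no log entry has been altered, inserted, or dropped.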
2.3. Integrity Measurement on the Application Program

Starting from the root of trust provided by TCG, the Integrity Measurement Architecture (IMA) from IBM takes the first step of extending measurement from the boot process to application-level programs. IMA is provided as a software module in the Linux kernel since version 2.6.30. It provides measurements of the current system's software stack. The project provides integrity measurement but does not propose any detailed attestation mechanism; the measurements provide evidence of whether the system has been corrupted by certain rootkit attacks.

IMA measures each individual component before it is loaded. With the help of the extend operation, trusted boot forced execution to follow a single legal order. At the application level, however, programs can execute different threads in parallel, and program order is no longer related to the trusted condition. IMA therefore groups measures together instead of applying the extend operation one by one.

Because IMA follows TCG's "measure-before-loading" principle, it inevitably inherits the shortcomings of binary attestation, such as its inability to reveal hardware attacks or software attacks that occur after the program is loaded and executing.

3. Architecture Extension to Measure Instruction-Level Behaviour

3.1. Design of Integrity Measurement at the Application Level

DiT is based on IBM's IMA, which provides comprehensive measurement over the software stack. In IMA, all executable code and chosen structured data are included in the measurement log. Any data loaded by the operating system, dynamic loaders, or applications with identifiable integrity semantics are hashed. Measurement can be made automatically at the moment code or data is loaded into main memory. As programs continue their execution, the kernel is able to measure its own changes. Similarly, every user-level process can measure its own security-sensitive inputs, such as its configuration files or scripts. The resulting 160-bit hash value becomes an unambiguous identity for the software module; a challenger can distinguish different file types, versions, and extensions by this unique fingerprint.

As the system evolves, IMA collects the hash results into a measurement list that is stored locally. The integrity of this list is of great importance, so IMA uses the TPM to prevent modifications to it. A Platform Configuration Register, whose value can only be changed by physically rebooting the system or by the TPM extend operation, provides protected storage, and the extend operation is applied to each value stored in the measurement list. Since it is impossible to restrict application-level software to a small number of loading orders, the order of the values in the list is not used to validate the trustworthiness of the system.

3.2. Writing Back the Instructions

Although IMA provides measurement of all loaded software, it still follows TCG's "measure-before-loading" mechanism. As a result, the "metric gap" and the "behaviour gap" can largely degrade the efficacy of the measurement log.

The "metric gap" occurs when the measurement does not represent the up-to-date state of the system. An application program can run for a long time, as server programs do, so a long period may have passed since the measurement was made. During this interval the memory may be corrupted: an attacker who gains root privilege can modify the loaded executable code. It is possible to detect such modification when the code is executed again; this is the basic assumption made in earlier tamper-resistance designs [5]. When the executable code is hashed again, the resulting measure will differ. However, attestation is made asynchronously to the system's operation, so it may occur before the executable code is hashed again. As a result, the measurements may give the challenger misinformation about what is running at that moment.

Figure 1 compares three measurement mechanisms: DiT as proposed in this paper, IMA, and Aegis, a typical secure-processor design for a tamper-evident and tamper-resistant environment [5]. When IMA measures executable code, it compares the result with values calculated earlier.
In Aegis, if a software's execution relies on another program, the measurement of that program is calculated again and compared with the previously calculated value. In both of these cases, the challenger may still receive measurements from which the system appears trusted even though the memory has already been corrupted.

The "metric gap" can be resolved by measuring the executable code at the moment attestation is requested, which is also reflected in Figure 1.

Figure 1. The "metric gap" in the designs of DiT, IMA, and Aegis across the program execution procedure (load, execute, done). (1) marks a possible attack that corrupts memory; (2) marks the event that remote attestation is required; (3) marks the event that the program is used by another application.

However, the "behaviour gap" introduces an even more severe problem. It describes the fact that the static code in memory differs from the instructions executed in the processor, yet it is the instructions executed in the processor that ultimately corrupt the system. In other words, the executing instructions truly represent the trustworthiness of the system. What makes things worse is that many attacks no longer rely on modifying the program's executable code to launch malicious behaviour. For example, buffer overflow attacks have diverse implementations; one of them inserts code directly on the stack, which makes detection possible only for a very short period of time. The challenger should also be able to learn of such deleterious execution, since the system remains vulnerable to such attacks in the future.

No matter how an attack exploits a software vulnerability, it ultimately needs to execute its code in the processor. Researchers have therefore also proposed recording behaviour in the processor. To reduce performance overhead, they only analyse the behaviour of critical instructions, such as indirect branches, or of critical data. Measuring those data may work for a certain security policy but lacks portability and extensibility to future, unknown attacks.

Measuring all instructions is a challenge. Instructions are fetched from memory, but the dynamic execution flow varies from situation to situation. It is impossible to represent the safety of such execution with a limited number of unique states, and collecting all possible states is computationally infeasible.

DiT does not directly measure every instruction executed in the processor. It maintains a large part of the original measurement interfaces, which measure code in memory. What DiT does is extend the architecture's pipeline to build a connection between processor and memory (Figure 2): it stores instructions back to their original locations after they are fetched into the pipeline.
The purpose is to resolve the "behaviour gap" between processor and memory. The intention is not to record all possible run-time execution paths but to store the instructions that were truly executed into the measurement log.

Figure 2. Structure to measure the dynamic instruction trace: the processor pipeline (IF, ID, EX, WB, CM) writes instructions back through the instruction cache to main memory.

With such a modification, what to measure and when to measure have to be carefully designed. A program's address space consists of a data region, a code region, and a stack that records the program's execution context. In IMA, all executable code and the portion of related data that is dynamically loaded by the operating system are measured (Figure 3). DiT covers all code regions, data regions, and the stack, as long as some instructions are written back to them.

Figure 3. The behaviour gap occurs due to attacks or dynamically generated code. (1) Data with integrity semantics is loaded by the operating system; (2) executable code is fetched from memory; (3) malicious code is fetched from the stack or another illegal location; (4) code is generated and executed dynamically.

Because of attacks, instructions can come from locations other than the code region. This not only makes DiT expand the measurement range to include memory regions such as the stack, but also requires it to add several temporal points at which to take the measurement. Consider again the buffer overflow example: stack contents vary as the program enters different contexts, so malicious code hidden there may soon be overwritten by unrelated information, such as parameters passed by a following function call. As a result, the malicious code should be measured in time, before it is eliminated by legitimate data.

Choosing the proper temporal points is a trade-off between detection ability and performance overhead. The performance overhead of the original integrity measurement mechanisms is amortized, because hash calculation happens only at program-loading frequency. Many former anomaly-detection approaches show that successful corruption usually results in changes in instruction-level behaviour, such as cache misses, prediction misses, and so on [6]. Furthermore, the hash operation over memory code is easily performed in parallel with the program's normal operation. In the current work, one unavoidable measurement is added: DiT launches a measurement at the moment the attestation request is made, which at least resolves the metric gap between the measure and the system state.

3.3. Introducing Randomization through the Use of the Cache

Most personal computers have two levels of cache. Instructions and data are separated in the level-1 cache, while the level-2 cache is usually a unified cache that stores them together. DiT includes the cache in the procedure of writing instructions back to memory, which "reverses" the procedure by which instructions are fetched from it. In order to make write-back work, the instruction cache should be appended with a few state bits, just as the data cache has. By giving the instruction cache the structure of a data cache, the processor does not need to perform an explicit "write back" action; it only needs to set the corresponding status bit and leave the work to the cache and the memory-management unit. Whenever a cache miss occurs, the instruction cache first stores the values in the cache entry back to memory and then reads the new instructions, instead of overwriting the entry directly.

It is usually hard to predict cache misses, so this randomizes the time at which instructions are written back. This adds another level of protection that prevents an attacker from learning when the measurement will happen and hiding malicious code accordingly. Moreover, the operation does not need the involvement of the operating system: even when the OS is not trusted, for example when the kernel is corrupted, the write-back operation is always executed properly.
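The following C sketch is a functional model of the modified fetch path described above, under our own simplifying assumptions (a direct-mapped 128 KB cache as in Table 1, a single added status bit, and a flat array standing in for main memory). It is not the RTL of a real level-1 cache; it only shows where the added bit changes the miss path so that the victim line is stored back before it is replaced.

    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 64
    #define NUM_LINES 2048                 /* 128 KB direct-mapped, as in Table 1 */

    struct icache_line {
        uint64_t tag;
        int      valid;
        int      write_back_pending;       /* the extra status bit DiT adds */
        uint8_t  data[LINE_SIZE];
    };

    static struct icache_line icache[NUM_LINES];
    static uint8_t main_memory[1 << 24];   /* simplified 16 MB simulated memory;
                                              addresses are assumed to fall inside it */

    /* On a miss, the victim line is first stored back to memory, so the code
     * region comes to hold the instructions that were actually executed; only
     * then is the new line brought in. */
    const uint8_t *icache_fetch(uint64_t addr)
    {
        uint64_t index = (addr / LINE_SIZE) % NUM_LINES;
        uint64_t tag   = addr / ((uint64_t)LINE_SIZE * NUM_LINES);
        struct icache_line *l = &icache[index];

        if (!l->valid || l->tag != tag) {                       /* cache miss */
            if (l->valid && l->write_back_pending) {
                uint64_t victim = (l->tag * NUM_LINES + index) * LINE_SIZE;
                memcpy(&main_memory[victim], l->data, LINE_SIZE);
            }
            memcpy(l->data, &main_memory[addr - addr % LINE_SIZE], LINE_SIZE);
            l->tag   = tag;
            l->valid = 1;
        }
        l->write_back_pending = 1;          /* line has been fetched and executed */
        return &l->data[addr % LINE_SIZE];
    }

Because the write-back happens on the unpredictable miss path, the point in time at which executed instructions reappear in memory is effectively randomized, as argued above.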
Current micro-architecture design further helps our scheme write back instructions. Since the level-2 cache is unified, only the level-1 instruction cache requires modification, and the modification is restricted to a small number of status bits added to each cache entry. As a result, the overhead in chip area, power consumption, and cache access time (also called cache hit latency) is reasonable. Furthermore, instruction streams usually exhibit much better locality of reference than data, which results in far fewer cache misses. Consequently, the performance effect of writing back instructions can also be expected to stay small.

4. Further Micro-Architecture Recommendations

With the proposed design, DiT is able to measure a large part of a program's execution. However, it may still miss some situations, due both to current operating system design and to diverse attack mechanisms. In this section, we propose several additional hardware recommendations to resolve those issues.

4.1. Adding Measurement Points

With the aid of DiT, the measurement is recalculated as the program executes. There is still a possible hazard: after malicious code has been stored back, the attacker may replace the correct code in memory with malicious code injected earlier, in order to avoid being measured properly (in the same way he or she inserted the malicious code in the first place). Adding more measurement points is therefore necessary to provide another level of protection for DiT itself.

A cache miss or a branch-prediction miss indicates a change in instruction-level behaviour, and either can be used as a point at which to recalculate the measurement. To further reduce the overhead, we propose to take the measurement at the moment a potential attack is about to happen. However, current studies of software vulnerabilities show that reliably detecting attack potential is itself a difficult problem. As a program runs, its address space records its execution state in the stack and/or heap, while its code space remains stable. Operating system design provides good protection when it switches between code spaces, as in the design of context switching; attackers, however, successfully inject new code space or exploit existing code space to avoid the reliable operation provided by the operating system.

We can therefore take a measurement when instructions are written back to a memory location that is outside the code region (not merely outside the address space) of the currently running programs. As each program loads its code, we record the physical addresses of its code region in a table stored in the memory-management unit. Comparing the address of a written-back instruction against the physical address range of each code region indicates which program the instruction belongs to. If it does not belong to any legal program, we can raise an exception; once this exception is seen, the measurement itself is no longer even necessary, since an attempt to evade measurement has already been detected. With this architectural recommendation, DiT can validate that every instruction executed in the processor comes from an executable code space that was properly loaded into memory beforehand. Consequently, DiT can prevent injected-code attacks while taking measurements.
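A minimal software model of this check might look as follows. The table layout and helper names are our own illustration of the idea, not an existing MMU interface; in the proposed hardware the table would sit next to the memory-management unit and the check would run on the write-back path.

    #include <stdint.h>
    #include <stdbool.h>

    /* One entry per loaded program: the physical address range its code
     * region occupied at load time. */
    struct code_region {
        uint64_t phys_start;
        uint64_t phys_end;      /* exclusive */
    };

    #define MAX_REGIONS 64
    static struct code_region legal_regions[MAX_REGIONS];
    static int num_regions;

    /* Record a program's code region when it is loaded. */
    bool register_code_region(uint64_t start, uint64_t end)
    {
        if (num_regions >= MAX_REGIONS)
            return false;
        legal_regions[num_regions].phys_start = start;
        legal_regions[num_regions].phys_end   = end;
        num_regions++;
        return true;
    }

    /* Check a written-back instruction address.  If it lies outside every
     * registered code region (e.g., in a stack page), an exception should
     * be raised and a measurement taken immediately. */
    bool writeback_is_legal(uint64_t phys_addr)
    {
        for (int i = 0; i < num_regions; i++) {
            if (phys_addr >= legal_regions[i].phys_start &&
                phys_addr <  legal_regions[i].phys_end)
                return true;
        }
        return false;
    }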
4.2. Measuring Run-Time Generated Code

Unlike a compiler, which generates executable code ahead of time, an interpreter produces machine instructions on the fly. In our proposed design, the integrity measurement can only measure the binary code of the interpreter itself; the dynamic code the interpreter feeds to the processor is not recorded (Figure 3). At the same time, increasingly popular attacks adopt exactly this mechanism: attacks such as SQL injection and cross-site scripting dominate current web applications. This presents a big challenge for providing accurate measurements to a remote challenger, because the malicious behaviour is extracted from user input and executed one instruction after another; measuring executable code from memory becomes impossible.

When instructions are generated by the interpreter, DiT finds that there is no source memory location to which such dynamic instructions can be written back. Our proposed method is to "deceive" the interpreter into treating the dynamically executed code as if it were dynamically loaded, so that the predefined measurement procedure can be followed. This is achieved by creating a new memory region linked to the address space of the interpreter's process. Current operating systems, such as the Linux kernel, provide safe interfaces for dynamically adding or removing memory regions from a process's address space, so it is easy to attach such a secondary code region to the interpreter's address space.

This is equivalent to adding a container to store dynamically executed code. The measurement, however, cannot be taken at "load" time, since the container is empty at that moment. Only at the end of execution, when all executed code has been written back, can a proper measure be taken over the full container.
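On Linux, the user-level half of this idea can be sketched with an anonymous mapping. The code below uses only standard POSIX calls (mmap/munmap); the function name and the container size are our own illustration, not a DiT or kernel interface.

    #include <sys/mman.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Create an empty "container" region attached to the interpreter's
     * address space.  DiT would later fill it with the instructions the
     * interpreter actually executed, and the measurement is taken over the
     * full container only after execution finishes. */
    void *create_dynamic_code_container(size_t size)
    {
        void *region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) {
            perror("mmap");
            return NULL;
        }
        return region;
    }

    int main(void)
    {
        size_t container_size = 1 << 20;   /* 1 MB, an arbitrary example size */
        void *container = create_dynamic_code_container(container_size);

        if (container != NULL) {
            /* ... interpreter runs; executed instructions are written back here ... */
            munmap(container, container_size);
        }
        return 0;
    }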
5. Experiment and Result Analysis

In order to analyse the applicability of DiT, two sets of experiments were conducted. The first simulates the measurement mechanism, in particular the situation in which a program's code is hashed upon an asynchronous attestation request. The second measures the hardware and performance overhead caused by the modification of the level-1 instruction cache.

5.1. Implementing the Measurement

Unlike IBM's IMA, which implements all integrity measurements within the Linux kernel, we implement the measurement at the hardware level. DiT is integrated into Bochs, a full-fledged open-source x86 PC emulator, which is used to emulate the entire system, from the x86 architecture to a virtually instrumented monitor.

Through our experiments, we found that writing instructions back to memory causes some instability in the emulated system. DiT therefore focuses on a chosen target program and stores only its on-the-fly instructions into memory. As mentioned before, the TCB provides an isolated execution environment for security-related programs; by writing back instructions only for the program of interest, we believe DiT more practically simulates the TCB's execution model.

We installed Gentoo Linux with kernel version 2.6.29 in the emulator. To track process information, the kernel is modified so that the hardware emulator becomes aware of software context switches. Since version 2.6.x the kernel uses late binding for the context switch, so both the exec() and sched() functions are modified. Consequently, process identity, such as the process ID and process name, is updated in a global variable as soon as a process is created and loaded into memory.

Besides the operating system modification, we also implemented several virtual debugging monitors. One of the most critical interfaces DiT inserts is the one that halts the execution of the current program in the emulated operating system and hashes the code region in the address space of the currently active process. This efficiently emulates the situation in which a measurement is taken when an attestation request arrives from a remote challenger.
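The effect of that interface can be approximated in plain C: once the guest is halted, the measurement is simply a hash over the bytes of the active process's code region, which by then holds the written-back instruction trace. The sketch below uses OpenSSL's SHA-1 to match the 160-bit measurements discussed earlier; the function names are ours, not a Bochs or IMA interface.

    #include <openssl/sha.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hash the code region of the currently active process.  In the emulator
     * this runs while the guest is halted, so the bytes being hashed are
     * exactly the written-back instructions. */
    void measure_code_region(const unsigned char *code_start, size_t code_len,
                             unsigned char digest[SHA_DIGEST_LENGTH])
    {
        SHA1(code_start, code_len, digest);
    }

    /* Print the 160-bit measurement in the usual hex form. */
    void print_measurement(const unsigned char digest[SHA_DIGEST_LENGTH])
    {
        for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        printf("\n");
    }

In the full system this digest would then be appended to the measurement list and extended into a PCR exactly as described in Section 2.2.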
5.2. Performance Overhead

In order to make the instruction cache able to write back, several extra status bits are required for each cache entry, similar to the structure of the data cache. Since the level-2 cache is a unified cache in most micro-architecture designs, only the level-1 instruction cache needs modification. To make a comprehensive analysis of this change, the area, power consumption, and access time are estimated with CACTI 5.0 [7]. The parameters of the unmodified cache are the same as those in Table 1, which are also used in SimpleScalar for the performance experiments. Five extra bits are added to each entry of the instruction cache to implement the write-back mechanism. With the simulation results given in Table 2, the largest power overhead is less than 10%, and the overhead in the other criteria is negligible. In particular, the modification has little effect on the access time of the level-1 cache.

Table 1. Architecture parameters.

  Fetch/dispatch/issue width   4
  Instruction window           128 entries
  Register update unit size    128 entries
  Load/store queue             64 entries
  I-cache                      128 KB, 1-way set-associative, 1-cycle hit time
  D-cache                      128 KB, 1-way set-associative, 1-cycle hit time
  L2 cache                     Unified, 1 MB, 4-way set-associative, 6-cycle hit time
  Memory                       100-cycle access time, 2 memory ports
  Function units               4 Int ALUs, 1 Int MUL/DIV, 4 FP adders, 1 FP MUL/DIV

Table 2. Area, power, and access-time overhead of the modified L1 cache.

  90 nm technology node:
    Area (mm^2)       normal 2.59811765   modified 2.66909173   overhead 2.73%
    Power (W)         normal 5.23044172   modified 5.23787143   overhead 0.142%
    Access time (ns)  normal 1.40756434   modified 1.40756434   overhead 0.00%
  32 nm technology node:
    Area (mm^2)       normal 0.36714162   modified 0.36929974   overhead 0.588%
    Power (W)         normal 3.54005779   modified 3.87976541   overhead 9.59%
    Access time (ns)  normal 0.43442463   modified 0.43875809   overhead 0.998%

We tested the SPEC2000 benchmarks on SimpleScalar, which models an out-of-order superscalar processor [8]. Reference inputs are adopted, and we skip the number of instructions specified by SimPoint [9]. Writing back instructions is not supported in SimpleScalar, so we modified the source code of sim-outorder (the out-of-order simulator) so that right after each read access to the level-1 instruction cache, a write access to the same cache entry is launched. The parameters used to run SimpleScalar are given in Table 1.

We collected the number of level-2 cache accesses and cache misses for each program in SPEC2000. The number of level-2 cache accesses varies across programs: in eon, perlbmk, and vortex, the modified level-1 cache increases level-2 cache accesses by more than 50%, while for the other benchmarks the change is not that pronounced. Figure 4 shows only the programs, out of all 26, whose level-2 cache accesses increase by more than 0.01%.

Figure 4. Normalized level-2 cache accesses and cache misses for the selected benchmarks (eon, gap, perlbmk, crafty, vortex, mesa, parser, sixtrack, twolf, gzip, vpr, gcc, bzip2).

Although there are large increases in level-2 cache accesses, this does not simply increase the corresponding cache misses: the cache misses due to the modification of the level-1 cache increase by less than 1%. This is probably because the level-2 cache retains good locality of reference for instructions. As a result, the performance overhead for all benchmark programs is negligible, as shown in Figure 5; the largest performance overhead measured in IPC is less than 5%.

Figure 5. Comparison of IPC with the normal level-1 cache and the modified level-1 cache for the same benchmarks.

6. Related Work

6.1. Tamper Resistance Design

Execute Only Memory (XOM) includes the whole memory space in the trusted computing base, since most adversaries launch attacks that corrupt memory [10]. In order to guarantee both the integrity and the privacy of data in memory, encryption components are added to the legacy architecture: data transmitted from processor to memory is encrypted and, conversely, decrypted for execution in the processor.

Aegis [5] follows the same assumption that memory cannot be trusted. It hashes the executable code when a program is loaded into memory for execution; at that moment, any other code and data the program relies on is checked to guarantee that the program starts in a trusted environment. For the case in which the operating system cannot be trusted, Aegis introduces a security-related module and hardware components into the legacy processor. Tamper-resistance designs make no assumptions about how memory is corrupted and are thus able to detect simple hardware attacks.

Tamper-resistance design is similar to our approach in the way it measures untrusted code. However, it assumes that modification of static code will be detected the moment the software is used again. As mentioned before, attestation can be made before the next use of a software module, so directly adopting the tamper-resistance approach reintroduces the "metric gap". Such designs are also unable to measure a program's runtime behaviour.

6.2. Integrity Measurement

TCG first standardized the procedure for remote attestation; in addition, it recommends an integrity measurement method that is efficient during system boot. This binary attestation can only record which programs are running on the platform and uses the identities and loading order of the programs to characterize the system state after booting.

IBM's IMA, the Integrity Measurement Architecture, inserts measurement interfaces into the Linux kernel. As each program is loaded into memory, its executable code is hashed; when a program further loads other code or security-critical data structures, a measurement is made as the program transfers its control flow. However, software vulnerabilities exploited by attackers during each individual program's execution can still spoil the measurement.

Based on the observation that modifications made in kernel space are usually permanent, Loscocco et al. propose measuring dynamic data structures that are critical to kernel control flow [11]. Such dynamic data structures are called contextual information and are used to represent the state of the whole computing system, but this method is not efficient for user-space operations.

6.3. Property-Driven Remote Attestation

Binary measurement has the advantages of easy calculation and application independence. Since hash calculation is irreversible, directly exploiting such metrics presents a big challenge and a large performance overhead.
As a result, different attestation schemes adopting different metrics have been proposed.

With a specific security policy set for the attested system, property attestation and semantic attestation [12-14] propose deriving high-level information about the system instead of the raw software stack. The extracted metrics can be checked directly against the security policy. The measurement methods may be implemented differently, but the measurement is determined by the security policy; when the security policy changes, it is inflexible to change the measurement implementation accordingly. Because these schemes indirectly pull part of the validation into the attested platform, the attested platform's performance overhead increases and the validation procedure is also placed in the hazardous environment. DiT, in contrast, is designed as an application-independent measurement that separates validation from measurement, just as binary attestation does.

Some other research also considers a program's run-time behaviour as a validation metric, although with many limitations. Alam et al. propose a behaviour attestation method [15]; however, behaviour there is defined as the quality of service the system can provide, connection latency, and so on. Consequently, that attestation implementation is designed for web services only and lacks the portability to be applied to other application programs.

7. Conclusions

Ever since TCG standardized the procedure for launching a remote attestation, how to exchange trust measures efficiently between computer systems on diverse platforms has been a popular open research issue. Locally, the attestation mechanism derives an integrity measure based on the software stack, on which the trust decision is made. TCG introduced a binary attestation during system boot, and many integrity measurement implementations have been proposed following the "measure-before-loading" principle. Those measurements do not take into account what happens after each program begins its execution. As a result, software vulnerabilities that can corrupt both the system status and the measurement operation introduce the "behaviour gap" and the "metric gap" between the program's runtime behaviour and the resulting measurement.

DiT, the dynamic instruction trace integrity measurement, is proposed as an assistant to current integrity measurement methods. By changing the structure of the instruction cache, instructions are stored back into memory when a cache miss occurs. As a result, the code region in a program's address space actually contains the dynamic instruction trace executed in the processor. By applying integrity measurement on top of this change, DiT successfully includes the most up-to-date system state at the moment attestation is required.

We have experimented with this attestation mechanism in Bochs, a full-fledged emulator, with a current version of the Linux kernel installed, and have successfully simulated the procedure of measuring a program's code (or trace) at the time attestation is made. To further analyse the change made in the level-1 instruction cache, CACTI was used to check the area, power consumption, and access-time overhead, and the SPEC2000 benchmarks were run on the modified SimpleScalar to analyse the performance overhead. Since we limit our small modification to the level-1 instruction cache, the overhead in terms of circuit area, power consumption, and access time is reasonable, and the performance overhead is marginal.
8. Acknowledgements

This work was supported by the IT R&D Program of MKE/KEIT (2010-KI002090, Development of Technology Base for Trustworthy Computing).

9. References

[1] Trusted Computing Group. http://www.trustedcomputinggroup.org
[2] TCG Specification Architecture Overview, Specification Revision 1.4, Trusted Computing Group (TCG), 2007.
[3] IBM Integrity Measurement Architecture (IMA). http://domino.research.ibm.com/comm/research_people.nsf/pages/sailer.ima.html
[4] J. M. McCune, B. Parno, A. Perrig, M. K. Reiter and A. Seshadri, "How Low Can You Go? Recommendations for Hardware-Supported Minimal TCB Code Execution," Proceedings of ASPLOS'08, Seattle, Vol. 43, No. 3, 2008, pp. 14-25.
[5] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk and S. Devadas, "AEGIS: Architecture for Tamper-Evident and Tamper-Resistant Processing," Proceedings of ICS'03, San Francisco, 2003, pp. 160-171.
[6] Y. X. Shi and G. H. Lee, "Augmenting Branch Predictor to Secure Program Execution," Proceedings of DSN'07.
[7] CACTI. http://www.hpl.hp.com/research/cacti/
[8] T. Austin and D. Burger, "The SimpleScalar Tool Set," University of Wisconsin CS Department, Technical Report No. 1342, June 1997.
[9] T. Sherwood, E. Perelman, G. Hamerly and B. Calder, "Automatically Characterizing Large Scale Program Behavior," Proceedings of the 10th ASPLOS, California, Vol. 37, No. 10, 2002, pp. 45-57.
[10] D. Lie, C. Thekkath, M. Mitchell, P. Lincoln, et al., "Architectural Support for Copy and Tamper Resistant Software," SIGPLAN Notices, Vol. 35, No. 11, 2000, pp. 178-179.
[11] P. Loscocco, P. Wilson, A. Pendergrass and C. McDonell, "Linux Kernel Integrity Measurement Using Contextual Inspection," STC'07: Proceedings of the 2007 ACM Workshop on Scalable Trusted Computing, Virginia, 2007.
[12] L. Chen, R. Landfermann, H. Lohr and C. Stuble, "A Protocol for Property-Based Attestation," Proceedings of STC'06, ACM Press, Virginia, 2006, pp. 7-16.
[13] A. Sadeghi and C. Stuble, "Property-Based Attestation for Computing Platforms: Caring about Properties, not Mechanisms," Proceedings of NSPW'04, New York, 2004, pp. 67-77.
[14] V. Haldar, D. Chandra and M. Franz, "Semantic Remote Attestation: A Virtual Machine Directed Approach to Trusted Computing," Proceedings of VM'04, San Jose, 2004, p. 3.
[15] M. Alam, X. W. Zhang, M. Nauman and T. Ali, "Behavioral Attestation for Web Services (BA4WS)," Proceedings of the 2008 ACM Workshop on Secure Web Services, 2008.