Journal of Data Analysis and Information Processing
Vol.04 No.03(2016), Article ID:69365,14 pages

Risk Analysis Technique on Inconsistent Interview Big Data Based on Rough Set Approach

Riasat Azim1, Abm Munibur Rahman2, Shawon Barua3, Israt Jahan4

1School of Computer Science & Engineering, Wuhan University of Technology, Wuhan, China

2School of Management, Wuhan University of Technology, Wuhan, China

3Infolytx Inc., Dhaka, Bangladesh

4East West University, Dhaka, Bangladesh

Copyright © 2016 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

Received 9 May 2016; accepted 30 July 2016; published 2 August 2016


Rough set theory is a relatively new area of soft computing for handling uncertain big data efficiently. It also provides a powerful way to calculate the importance degree of vague and uncertain big data to support decision making. Risk assessment is very important for safe and reliable investment. Risk management involves assessing the risk sources and designing strategies and procedures to mitigate those risks to an acceptable level. In this paper, we emphasize the classification of different types of risk factors and find a simple and effective way to calculate the risk exposure. The study uses the rough set method to classify and judge the safety attributes related to investment policy. The method, which is based on intelligent knowledge acquisition, provides an innovative way to perform risk analysis. With this approach, we are able to calculate the significance of each factor and the relative risk exposure from the original data without assigning weights subjectively.


Rough Set Theory, Big Data, Risk Analysis, Data Mining, Variable Weight, Significance of Attribute, Core Attribute, Attribute Reduction

1. Introduction

Rough set theory was proposed in 1982 by Zdzislaw Pawlak and is still under active development. Its methodology is concerned with the classification and analysis of imprecise, uncertain or incomplete information and knowledge, and it is considered one of the first non-statistical approaches to data analysis (Pawlak, 1982) [1] . The theory has found applications in many domains, such as decision support, engineering, environment, banking, medicine and others [2] .

Over the years, rough set theory has become a valuable tool for resolving various problems, such as: representation of uncertain or imprecise knowledge; knowledge analysis; evaluation of the quality and availability of information; identification and evaluation of data dependencies; reasoning based on uncertain data; and reduction of information data.

In this paper, we describe the different risk factors of investment and present a big data approach that highlights the significant risk factors so that investment can proceed more smoothly. The key point of this paper is that rough set theory lets us calculate the importance degree of risk factors at different levels from inconsistent and incomplete data.

2. Data Preprocessing

2.1. Understanding Data

There are three main types of investment risk, shown in Figure 1. They are

1) Strategic Risk

2) Operational Risk

3) Financial Risk

We can also divide each macro-level risk into micro-level risks. Here we show the hierarchy of financial risks.

Financial risk is an umbrella term for multiple types of risk associated with financing [3] , including financial transactions such as company loans at risk of default. The term risk is often used to imply downside risk, meaning the uncertainty of a return and the potential for financial loss [4] . Figure 2 shows the hierarchy of financial risk as an example of a parent-level risk.

Types of Financial Risk:

1) Prices

• Interest rates

• Currencies

• Stock market

• Energy market risk

• Non energy market risk

2) Complex financial products

3) Liquidity risk

4) Customer credit

Figure 1. Types of business organizational risk.

Figure 2. Risk hierarchy of financial risk [5] .

In the same way, we can divide the other high-level risks (Strategic and Operational Risk) into hierarchies [6] .

2.2. Data Collection & Representation

We process the collected data so that it fits our context. The first phase of data processing is to encode the data in a simple and recognizable way. In Table 1, we encode financial risk as FR and its children as FRn; for example, Prices is encoded as FR1 because it is an immediate child of financial risk. Prices in turn has 5 children, which we encode as FR1n; for example, Interest rates is FR11 and Currencies is FR12.

The second phase is to arrange the data in matrix format. We have already encoded the risk factors as attributes. Now we arrange the attribute scores given by each person of managerial rank as one row of the matrix. Figure 3 represents the risk matrix of financial risks.
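The two preprocessing phases can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `encode` and `risk_matrix`, the use of `None` for an unanswered item, and the sample scores are assumptions made here and are not taken from the interview data.

```python
# Illustrative sketch only: helper names and sample scores are hypothetical.

# Financial risk hierarchy as listed in Section 2.1 (parent -> children).
FINANCIAL_RISK = {
    "Prices": ["Interest rates", "Currencies", "Stock market",
               "Energy market risk", "Non energy market risk"],
    "Complex financial products": [],
    "Liquidity risk": [],
    "Customer credit": [],
}

def encode(hierarchy, prefix="FR"):
    """Phase 1: map each risk name to a code such as FR1 or FR11."""
    codes = {}
    for i, (child, grandchildren) in enumerate(hierarchy.items(), start=1):
        codes[child] = f"{prefix}{i}"
        for j, name in enumerate(grandchildren, start=1):
            codes[name] = f"{prefix}{i}{j}"
    return codes

def risk_matrix(responses, attributes):
    """Phase 2: one row per respondent, one column per attribute.
    A missing answer is stored as None (an assumption of this sketch)."""
    return [[resp.get(attr) for attr in attributes] for resp in responses]

codes = encode(FINANCIAL_RISK)
attributes = ["FR11", "FR12", "FR13"]
# Hypothetical scores on the 1-5 scale from two respondents.
responses = [{"FR11": 4, "FR12": 2, "FR13": 5}, {"FR11": 3, "FR13": 1}]
matrix = risk_matrix(responses, attributes)
```

Keeping the codes in a dictionary makes it easy to translate between the readable risk names and the column labels of the matrix.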

3. USACE & Hierarchical Holographic Model Based Investment Risk Analysis

3.1. Basic Concepts

3.1.1. USACE Model

The U.S. Army Corps of Engineers (USACE) has been managing risk for a long time, beginning well before risk analysis grew into prominence.

Risk management components can be found in a number of USACE programs. In the 1980s, USACE grappled with the problem of modernizing its approach to the major rehabilitation of existing projects [7] [8] . Efforts to objectively assess the reliability of the existing structures gave rise to the use of risk-based analytical techniques and analyses that supported decision-making.

3.1.2. Hierarchical Holographic Modeling (HHM)

Haimes (1981) started the research in the field of HHM. HHM addresses the issues related to hierarchical institutional, managerial, organizational or functional decision-making structures [9] . Kaplan et al. (2001) suggested that HHM can be regarded as a general method for identifying the set of risk scenarios [10] . HHM is particularly useful in modeling large-scale, complex, and hierarchical systems. The HHM methodology recognizes that most organizational as well as technology-based systems are hierarchical in structure, and thus the risk management of such systems must be driven by and responsive to this hierarchical structure.

Table 1. Encoded risk attributes of financial risk [5] [11] .

Figure 3. Risk matrix of financial risk.

Haimes et al. (2002) suggested that the nature and capability of HHM is to identify a comprehensive and large set of risk scenarios [12] . To deal with this large set, a systematic process that filters and ranks the identified scenarios is needed so that risk mitigation activities can be prioritized. In addition, Kaplan et al. (2001) [10] suggested that HHM could be viewed as one of the methods of the Theory of Scenario Structuring (TSS), which is the part of quantitative risk assessment (QRA) that is useful in identifying the set of risk scenarios.

3.2. Technical Approach

Figure 4 shows the risk assessment model we use to assess the risks. The model consists of 6 sub-elements: communicate and consult, establish decision context, identify risk, analyze risk, evaluate risk, and risk management decision.

Every investment involves some degree of risk. Risk is quantifiable both in absolute and in relative terms. A solid understanding of risk in its different forms can help investors to better understand the opportunities, trade-offs and costs involved with different investment approaches.

We can apply the USACE model to analyze investment risk [10] [13] [14] .

3.2.1. Establish Decision Context

All but the simplest investments expose investors to multiple financial risks that can result from a range of events and scenarios. Risk can involve the collapse of a specific company, industry sector or currency.

In this decision context, prospective investors can analyze the risk of an investment and improve their decision-making ability. The next phases are also very important for mitigating investment risk.

3.2.2. Identify Risks

There are three main types of investment risk. They are

• Strategic Risk

• Operational Risk

• Financial Risk.

Here we discuss only Financial Risk as an example.

1) Prices

• Interest rates

• Currencies

• Stock market

• Energy market risk

• Non energy market risk

2) Complex financial products

3) Liquidity risk

4) Customer credit

Figure 4. Proposed model for risk assessment.

3.2.3. Analyze Risk

To calculate the importance degree and the ordered list of the risk attributes, we follow the steps summarized in Table 2 and described below.

U = {1, 2, 3, 4, 5, 6, …} represents the study objects, i.e. the set of company managers' evaluations of the investment risk factors, and A represents all risk evaluation indicators, as explained in Section 2.1 (Understanding Data). Based on the interview data collected from the companies, a score of 1 to 5 is given to each indicator, with 5 being the highest risk exposure level.

Table 3 shows the Likert scale used for scoring the risk attributes. For example, if a certain factor is very important, such as country risk, a score of 5 is given to represent a very high risk level. Conversely, if an indicator is relatively reliable and safe, for instance non energy source, a score of 1 is given to represent very low risk exposure in this aspect. In addition, the outcome of the investment is represented by D = {outcome}, where Y stands for loss and N means no loss.

SIM(A) denotes the binary similarity relation between objects that are indiscernible with regard to the indicators' values. The similarity relation can be defined as

$$SIM(A) = \{(x, y) \in U \times U \mid \forall a \in A:\ a(x) = a(y) \ \text{or}\ a(x) = * \ \text{or}\ a(y) = *\}$$

where (x, y) stands for a pair of study objects. That is, two study objects x and y are in the similarity relation if the value of each attribute for object x, i.e. a(x), is the same as the value of the corresponding attribute for object y, i.e. a(y). Any attribute value that is missing, i.e. a(x) = * or a(y) = *, is treated as matching, since * can represent any number.

Table 2. Basics steps of risk analysis.

Table 3. Description of exposure of the risk [15] .

S_A(x) represents the maximal set of objects that are possibly indiscernible from x by A:

$$S_A(x) = \{y \in U \mid (x, y) \in SIM(A)\}$$
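As a minimal sketch of the similarity relation with missing values, the following Python code treats `*` as a wildcard and collects, for one object, all objects that are possibly indiscernible from it. The function names and the three-row table are hypothetical and only illustrate the definition.

```python
MISSING = "*"  # marker for a missing attribute value, as in the text

def similar(x, y):
    """True if rows x and y agree on every attribute, where a missing
    value on either side matches anything."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))

def similarity_class(table, i):
    """Indices of all objects possibly indiscernible from object i."""
    return {j for j, row in enumerate(table) if similar(table[i], row)}

# Hypothetical scores of three respondents on three indicators.
table = [
    [3, 2, 5],
    [3, "*", 5],   # second indicator not answered
    [1, 2, 5],
]
classes = [similarity_class(table, i) for i in range(len(table))]
```

Here the first two respondents fall into the same class because the unanswered item cannot distinguish them, while the third respondent differs on the first indicator.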


1) Determine all reducts

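A reduct is a minimal set of indicators from A that preserves the original classification defined by A. This can be determined by establishing the Boolean discernibility matrix [1] [2] [16] , whose entry E(x, y) for any pair (x, y) of objects is the set of indicators that discern x from y:

$$E(x, y) = \{a \in A \mid (x, y) \notin SIM(\{a\})\}$$

Δ is the discernibility function for the information table:

$$\Delta = \bigwedge_{(x, y) \in U \times U,\ E(x, y) \neq \emptyset} \ \bigvee E(x, y)$$

Δ(x) is the discernibility function for object x in the information table:

$$\Delta(x) = \bigwedge_{y \in U,\ E(x, y) \neq \emptyset} \ \bigvee E(x, y)$$

Table 4 shows the discernibility matrix defined above.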
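A discernibility matrix of this kind can be built as in the following sketch. It assumes that a missing value (`*`) never discerns two objects, in line with the similarity relation, and that the matrix is built over the information table without the decision attribute. The function name and the sample table are hypothetical.

```python
from itertools import combinations

MISSING = "*"  # a missing value never discerns two objects

def discernibility_matrix(table, attributes):
    """For every pair of objects (i, j) with i < j, return the set of
    attributes whose known values differ between the two objects."""
    entries = {}
    for i, j in combinations(range(len(table)), 2):
        entries[(i, j)] = {
            a for a, u, v in zip(attributes, table[i], table[j])
            if u != v and u != MISSING and v != MISSING
        }
    return entries

attributes = ["FR11", "FR12", "FR13"]
# Hypothetical scores of three respondents.
table = [[3, 2, 5], [3, "*", 4], [1, 2, 5]]
matrix = discernibility_matrix(table, attributes)
```

Only the upper triangle is stored because the matrix is symmetric and the diagonal entries are empty.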

2) Calculate the importance degree of each risk indicator

Then the importance degree f(a) of each indicator a can be calculated from the entries of the discernibility matrix [1] [2] [17] [18] . Here Card(E_ij) is the number of indicators in a matrix entry E_ij in which a is present.

Table 4. Discernibility matrix of financial risks.

Figure 5. Importance degrees of financial risks.

Thereafter, the importance degree can be normalized for easier comparison, as shown in Figure 5, by the following equation:

$$\omega(a) = \frac{f(a)}{\sum_{b \in A} f(b)}$$
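The importance calculation can be illustrated with a short sketch. The exact importance equation is not reproduced here, so the code assumes one commonly used form in which every matrix entry containing an indicator a contributes 1/Card(E_ij) to its importance degree; this form, the function names, and the sample entries are assumptions. The normalization divides each importance degree by the sum over all indicators.

```python
def importance(entries, attributes):
    """Assumed form: f(a) is the sum of 1 / Card(E) over all matrix
    entries E that contain attribute a (small entries weigh more)."""
    f = {a: 0.0 for a in attributes}
    for entry in entries:
        for a in entry:
            f[a] += 1.0 / len(entry)
    return f

def normalize(f):
    """Normalized importance: f(a) divided by the sum of all f values."""
    total = sum(f.values())
    return {a: (v / total if total else 0.0) for a, v in f.items()}

# Hypothetical discernibility-matrix entries (one set per object pair).
entries = [{"FR13"}, {"FR11"}, {"FR11", "FR13"}, set()]
f = importance(entries, ["FR11", "FR12", "FR13"])
omega = normalize(f)
```

An indicator that alone separates two objects receives a full unit of importance from that pair, while an indicator that shares the entry with others receives only a fraction.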


3) Integration with Attribute Weight and Expert Opinion

Attribute values are collected from existing investors. All values are on a Likert scale of 1 to 5. Because the attribute value is qualitative, it is important to integrate it with the distribution of importance degrees. The basic rule of integration is to multiply the importance degree by the average attribute value, as shown in Figure 6.
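The integration rule can be sketched as follows. The function name, the sample weights and the sample scores are hypothetical, and skipping unanswered items when averaging is an assumption of this sketch rather than something stated in the study.

```python
from statistics import mean

def risk_assessment(omega, scores):
    """Multiply each attribute's normalized importance by the average of
    its observed scores; missing scores (None) are skipped."""
    result = {}
    for attr, weight in omega.items():
        observed = [s for s in scores[attr] if s is not None]
        result[attr] = weight * mean(observed) if observed else 0.0
    return result

# Hypothetical normalized importance degrees and 1-5 scores.
omega = {"FR11": 0.5, "FR12": 0.25, "FR13": 0.25}
scores = {"FR11": [4, 2, None], "FR12": [5, 5, 5], "FR13": [1, 2, 3]}
assessment = risk_assessment(omega, scores)
```

With these inputs an attribute with a modest weight but consistently high scores (FR12) can end up close to a heavily weighted attribute with moderate scores (FR11).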


3.2.4. Evaluate Risks

Using HHM, we can divide the risks hierarchically: the organizational risk at the top level, then the macro-level risks, and then the micro-level risks.

With the hierarchical separation of risks described in Figure 7, we can calculate the significance of micro-level risk, macro-level risk and investment risk for an organization, and can also give clearer guidance for the risk management decision.

1) Calculation of Parents Risk Assessment

After calculating all micro-level risks, we can combine them using the HHM model. The basic rule of the HHM model is that the sum of the child-level risks represents the parent-level risk:

$$R_{parent} = \sum_{i=1}^{n} R_{child_i}$$

So if we want to calculate the assessment value of operational risk, we sum the assessment values of its child risks in the same way.

After this phase we have all risk assessment values. We can then sort the list in ascending or descending order to evaluate the risk priority.

3.2.5. Risk Management Decision

Risk management provides the mechanism to make intelligent decisions with risk reduction as a key input driver. Risk management provides a disciplined environment for proactive decision making in order to:

• Proactively identify risks

• Prioritize risks

• Implement strategies for dealing with risks

• Assure and measure effectiveness of implemented strategies

Figure 6. Risk assessments of financial risks.

Figure 7. Risk assessment hierarchies of financial risks.

4. Result Analysis

Traditional risk assessments that include asset valuation do not always capture the essence and uncertainty of the underlying risks. Based on those attributes, we analyze the risk attributes. Because the information is sensitive and was collected through questionnaire surveys, the analysis is not elaborate or deep; however, it is consistent with other literature and shows the relative importance of the risk attributes.

4.1. Calculate the Importance Degree of the Risk Indicators

To calculate the importance degree of the risk indicators, we follow the RST methodology described above, building the decision matrix of the factors and using a weighted average to calculate the risk values. Figure 6 shows the importance degree values (ω) of the risk indicators for financial risk analysis. The normalized values (ω) allow easier comparison and can be related to other risk justifications. From Figure 5, we can also read off the risk indicator values. In the financial policy group (FR1), energy risk (FR14 = 0.088) has the highest importance degree among the risk indicators because of insufficient energy supply. In the same group, the monetary system (FR15) may affect overseas companies through financial risk uncertainties. Among the remaining financial attributes, the weighted average shows that liquidity risk (FR31 = 0.103) is the significant risk value for company solvency, and financial regulation (FR21 = 0.071) for the internal policy of the industry. Overall, global financial uncertainty (FR51 = 0.068) also has a notable importance degree in the risk analysis.

Figure 8 plots the risk indicator values f, where FR31 and FR52 have the highest exposure, followed by FR14, FR41, and FR15.

4.2. Risk Analysis and Comparative Risk Ranking Analysis

After calculating the normalized risk values (ω), we calculated the final risk analysis value based on the experts' knowledge. This supports the risk analysis and yields reliable importance degrees for the risk attributes, given that any company's performance is sensitive to financial risk.

Figure 8. Importance degrees of the risk indicators.

From Figure 9, we can identify the risk attributes with the highest degree: liquidity risk (FR31 = 0.39) has the highest value, which indicates the greatest importance in terms of experts' knowledge (Q). Next, interest rate uncertainty (FR11 = 0.324) is the most important in the financial policy segment. Among the other risk uncertainties, FR14 = 0.25 and FR41 = 0.22 have significant importance degrees.

Table 5 shows the comparative analysis of the risk attribute results, which contains the normalized values (ω) and the averages of the experts' opinions (Q).

Most of the risk attributes have the same rank under the two types of risk values. However, interest rate uncertainty (FR11) has the highest rank according to the experts' knowledge but is less significant for companies in the financial policy segment. Similarly, energy risk has the top degree of significance in the company managers' view but only the second in the experts' eyes. It is normal that the experts' calculation may differ from the company managers', because of the gap in knowledge and practice between the two views, as shown in Figure 10.

4.3. Risk Analysis Using HHM Method of Financial Risk

The major advantage of the HHM framework for risk assessment and management is its ability to identify risk scenarios that result from and propagate through the multiple overlapping hierarchies in real-life systems. In the planning, design, or operational modes, the ability to model and quantify the risks contributed by each subsystem facilitates understanding, quantifying, and evaluating the risks of the whole system. In particular, the ability to model the intricate relations among the various subsystems and the ability to account for all relevant and important elements of risk and uncertainty renders the modeling process more representative and encompassing.

Using the HHM summation rule, in which a parent risk is the sum of its child risks, we can calculate the parent risk.

4.3.1. Importance Degree

Summing the micro-level importance degrees gives the following parent-level financial risks.

Financial Policy 8.96 + 7.09 + 6 + 9.70 + 9.47 = 41.22

Internal Policy 7.76 + 7.63 = 15.39

Company solvency 11.34 + 6.75 = 18.09

FR4 9.51 + 7.82 = 17.33

FR5 7.51 + 10.47 = 17.98

Figure 9. Financial risks analysis result.

Table 5. Risks attributes ranking for financial risks.
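The sums above can be reproduced and ranked with a short sketch. The child values are the importance degrees listed above; the function name `parent_risk`, the rounding to two decimals, and the descending sort are choices made for this illustration.

```python
# Child-level importance degrees as listed in Section 4.3.1.
children = {
    "Financial Policy": [8.96, 7.09, 6, 9.70, 9.47],
    "Internal Policy": [7.76, 7.63],
    "Company solvency": [11.34, 6.75],
    "FR4": [9.51, 7.82],
    "FR5": [7.51, 10.47],
}

def parent_risk(child_values):
    """HHM summation rule: the parent risk is the sum of its child risks.
    Rounded to two decimals to match the precision of the inputs."""
    return round(sum(child_values), 2)

parents = {name: parent_risk(values) for name, values in children.items()}

# Descending order puts the parent risk with the largest value first.
ranking = sorted(parents, key=parents.get, reverse=True)
```

Note that a parent with many children, such as Financial Policy with five, accumulates a larger sum than parents with two children, which should be kept in mind when comparing parent-level values.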

Figure 11 shows that business environment risk has the highest risk exposure. Regulatory environment, brand and communication, and strategic information then have almost the same level of risk exposure. Organization behavior design has the lowest risk exposure on the basis of the importance degree of micro-level risk.

4.3.2. Normalize Value

Financial Policy 0.19 + 0.08 + 0.16 + 0.22 + 0.19 = 0.84

Internal Policy 0.40 + 0.10 = 0.50

Company solvency 0.14 + 0.19 = 0.33

FR4 0.25 + 0.07 = 0.32

FR5 0.23 + 0.32 = 0.55

Figure 10. Comparative images between the risk indicators and risk analysis ranking.

Figure 11. Distribution of importance degree.

Figure 12 shows that business environment risk has the highest risk exposure. Regulatory environment, brand and communication, and strategic information then have almost the same level of risk exposure. Organization behavior design has the lowest risk exposure on the basis of the distribution of micro-level risk.

4.4. Comparative Risk Analysis

According to SAP Risk Management, the risk score calculation method depends on whether probability is enabled in the Maintain Analysis Profile Customizing activity [19] .

• If the probability is enabled, the risk score = probability × impact.

• If the probability is disabled, the risk score = sum of all impact values.
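The two scoring rules can be written as one small function. This is a plain illustration of the rules as stated above and not SAP code or an SAP API: the function name, the use of `None` to mean that probability is disabled, and the choice to take the total of the impact values as the impact when probability is enabled are all assumptions.

```python
def risk_score(impact_values, probability=None):
    """Score a risk from its impact values.

    probability=None models the 'probability disabled' case: the score is
    the sum of all impact values. Otherwise the score is probability times
    impact, where the impact is taken as the total of the impact values
    (an assumption of this sketch)."""
    total_impact = sum(impact_values)
    if probability is None:
        return total_impact
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * total_impact

# Hypothetical impact values on a 1-5 scale.
score_without_probability = risk_score([3, 4])
score_with_probability = risk_score([3, 4], probability=0.5)
```

Under these assumptions the probability acts as a scaling factor, so enabling it can only lower or preserve the score obtained from the impact values alone.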

Variable weight theory has been used to handle the differing characteristics of risk in engineering projects and to improve the accuracy of risk evaluation [20] . If we compare these two methods with the results produced by our rough set theory approach, we find similarities among all three.

From the graphs shown in Figure 13, we can see that in all three approaches certain risks receive the highest exposure.

5. Conclusions

We have identified a set of key internal and external uncertainties, which are eventually highlighted as “risk determinants” based on their occurrence and consequential effects on the business performance. This paper presents the identified risk determinants and describes a methodology to identify them.

The merits of RST in handling incomplete and uncertain information, and its capability of minimizing subjective analysis, have been exploited in this study. After identifying the uncertainties and categorizing them into major risk types, we set up the data table and entered it into the RST software to initialize the information table. Then we determined the similarity relation and set up the discernibility function and discernibility matrix for the information table. To find the significant risk attributes, the weighted average function is used to calculate the most significant risk evaluation indicators. Thereafter, the importance degree can be normalized for easier comparison. We can find the most important attributes in each risk type. For example, in the business environment risk type (SR1), the competitive environment is the most important factor and the economic environment is the second most important, with normalized risk values of SR18 = 0.042 and SR13 = 0.042 respectively. Companies also emphasize the business partner (SR17) and industry moves (SR19).

Figure 12. Distribution of normalize value.

Figure 13. Comparisons with score & variable weight approaches.

Cite this paper

Riasat Azim,Abm Munibur Rahman,Shawon Barua,Israt Jahan, (2016) Risk Analysis Technique on Inconsistent Interview Big Data Based on Rough Set Approach. Journal of Data Analysis and Information Processing,04,101-114. doi: 10.4236/jdaip.2016.43009


1. Pawlak, Z. (1982) Rough Sets. International Journal of Computer and Information Science, 11, 341-356.

2. Pawlak, Z. (1982) Rough Sets. International Journal of Computer and Information Science, 11, 341-356.

3. Blinkowitz, B.S. and Wartenberg, D. (2001) Disparity in Quantitative Risk Assessment: A Review of Input Distribution. Risk Analysis, 21, 75-89.

4. Shi, H.W., Li, W.Q. and Meng, W.Q. (2008) A New Approach to Construction Project Risk Assessment Based on Rough Set and Information Entropy. 2008 International Conference on Information Management, Innovation Management and Industrial Engineering, 1, 187-190.

5. Islam and Tedford (2012) Risk Determinants of Small and Medium-Sized Manufacturing Enterprises (SMEs): An Exploratory Study in New Zealand. Journal of Industrial Engineering International, 8, 12.

6. Types of Risk Management, World Finance.

7. Adapted from ISO 31000: 2009 Risk Management-Principles and Guidelines.

8. National Research Council, National Academy of Sciences, Scientific Review of the Proposed Risk Assessment Bulletin from the Office of Management and Budget (2007) [Hereinafter 2007 NAS Report on the Proposed Risk Assessment Bulletin]. 6-7.

9. Haimes, Y.Y. (1981) Hierarchical Holographic Modeling. IEEE Transactions on Systems, Man, and Cybernetics, 11, 606-617.

10. Kaplan, S., Haimes, Y.Y. and Garrick, B.J. (2001) Fitting Hierarchical Holographic Modeling (HHM) into the Theory of Scenario Structuring and Refinement to the Quantitative Definition of Risk. Risk Analysis, 21, 807-819.

11. Business Risk, Classification.

12. Haimes, Y.Y., Kaplan, S. and Lambert, J.H. (2002) Risk Filtering, Ranking, and Management Framework Using Hierarchical Holographic Modeling (HHM). Risk Analysis, 22, 383-397.

13. Lund, J.R. (2008) A Risk Analysis of Risk Analysis. Journal of Contemporary Water Research and Education, 53-60.

14. Chapter 2: Qualitative Methods for Analyzing Risk. 2.0 Qualitative Methods for Analyzing Risk.

15. Boone Jr., H.N. and Boone, D.A. (2012) Analyzing Likert Data. Journal of Extension, 50.

16. Tiwari, K.S. and Kothari, A.G. (2013) Attribute Reduction Algorithm for Inconsistent Information System Using Rough Set Theory. Ciit 2013.

17. Chen, P. and Yuan, T. (2011) Information Security Risk Warning Method Research That Based on Rough Set Theory. 2011 International Conference on Electrical and Control Engineering (ICECE), 3039-3042.

18. Bai, L., Zhang, Y.B. and Zhao, Y.L. (2009) Applying Rough Set Theory into Risk Identification. Second International Conference on Future Information Technology and Management Engineering (FITME'09), 481-485.

19. SAP Risk Management, Risk Analysis Using Scoring.

20. Huang, Y.S. and Tian, C.F. (2008) Research on Risk Assessment in Engineering Project Based on Route Analysis and Hierarchical Variable Weight Fuzzy Evaluation. The 2008 International Conference on Risk Management & Engineering Management, NCEPU-China, 478-481.