With social development, we are entering a world shaped by information technology, in which e-business makes daily life richer and more diversified. E-business brings not only convenience but also large amounts of business data. How to better store, manage and use these data has become a major research topic in e-business. With the rapid growth of data volume, relational database systems can no longer meet current requirements. Focusing on a visualized analysis model for Hadoop business data, this paper examines business data in terms of the visualization platform, the database and the analysis model. The analysis aims to improve offline data analysis and data visualization on the Hive database, and to provide references and suggestions for visualized analysis models of Hadoop business data.
With the development of society, people live in a world full of information. The number of information carriers keeps growing; smart mobile devices and TV commerce websites are common examples. These carriers generate and deliver large amounts of business data. Some of these data are useful: if people recognize their value in time and use them rationally, they can help analyze trends correctly and make sound decisions. Data visualization extracts valuable information from a large body of data and presents it in charts and figures; in other words, it is a visual form of presenting information. In business intelligence, decision-makers must summarize and analyze previously obtained data and try to innovate and improve on that basis in order to gain a competitive advantage. However, extracting valuable information from a large amount of data is complicated; doing it manually wastes human resources and lowers extraction efficiency. Data visualization delivers business data to people through charts or figures, so that they can obtain valid information more conveniently. Data visualization can therefore greatly help the analysis of business data [
The servers used to build traditional e-business systems were expensive, and a relational database system served as the business database. Driven by cloud computing and the internet, business data is growing exponentially; traditional database systems cannot handle this growth and fail to satisfy basic requirements for data analysis and processing. Hadoop emerged against this background as a way to process large amounts of business data. The Hadoop platform has two core components: HDFS (a distributed file system) and MapReduce (a parallel computation framework). HDFS is highly fault-tolerant and is designed to run on low-cost hardware. It provides high-throughput access to data, which makes it well suited to applications with large data sets. In addition, HDFS relaxes some POSIX requirements to enable streaming access to file system data. Many Chinese enterprises now use HDFS. In view of the current situation in China, this paper studies the Hadoop platform with the aim of realizing business data visualization, and analyzes how to solve existing problems and make the approach feasible [
Hadoop is a basic framework for distributed systems developed by the Apache Software Foundation. It lets users develop distributed programs without knowing the underlying details of distribution. Once a distributed program is developed, high-speed computation and storage can be achieved by using the power of the cluster [
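To make the MapReduce model mentioned above more concrete, the following sketch imitates its map, shuffle and reduce phases in a single Python process. It does not use Hadoop itself; the record layout (product category, sale amount) and all function names are illustrative assumptions rather than part of the original study.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sales record: (product category, sale amount).
Record = tuple[str, float]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, float]]:
    """Emit one (key, value) pair per input record."""
    for category, amount in records:
        yield category, amount


def shuffle_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Collapse the values of each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_sales_by_category(records: Iterable[Record]) -> dict[str, float]:
    """Run the three phases in sequence on an in-memory data set."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = [("tea", 2.0), ("rice", 5.0), ("tea", 3.0)]
    print(total_sales_by_category(sample))
```

On a real cluster the map and reduce functions would run in parallel on different nodes, and the shuffle step would move data between them over the network; the sketch only shows the data flow.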
Today, several forms of data visualization are widely used in business activities, including matrix graphs, teaching coordinate diagrams and cloud charts. Each form has its own advantages and disadvantages, and its value is realized only when it is chosen appropriately. The choice depends mainly on whether the visualization helps us observe the data better. A visualized graph normally consists of several basic parts, including the primary area, graphics primitives and the legend. The primary area, the most important part, is the main canvas on which the graph is drawn; it is square or rectangular. Organizing a graph differently changes what it conveys to the user. Take bar charts and maps as examples. A bar chart is based on rectangular coordinates, with the vertical axis representing metrics and the horizontal axis representing types. A map, by contrast, is based on a geographic map, with metrics shown in different colors. It follows that data must be organized and converted differently for each form of visualization. For example, if a bar chart is used to show the sales volume and sales amount of a commodity, the key elements can be expressed abstractly and used to form a visualized analysis model. Then, building on data visualization technology and Hadoop, a visualized analysis model for Hadoop business data can be established more conveniently [
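As a rough illustration of expressing the key elements of a bar chart abstractly, the sketch below organizes sales rows into a small chart model: a category axis (types), a metric axis, and one data series per legend entry. The class name, field names and input row layout are assumptions made for this example; the paper does not specify a concrete data structure.

```python
from dataclasses import dataclass, field


@dataclass
class BarChartModel:
    """Abstract description of a bar chart, independent of any drawing library."""
    title: str
    category_axis: str                      # horizontal axis: types
    metric_axis: str                        # vertical axis: metrics
    categories: list[str] = field(default_factory=list)
    # Legend entry -> one value per category (the bars to draw).
    series: dict[str, list[float]] = field(default_factory=dict)


def build_sales_bar_chart(rows: list[dict]) -> BarChartModel:
    """Aggregate hypothetical sales rows per commodity into a chart model."""
    totals: dict[str, dict[str, float]] = {}
    for row in rows:
        entry = totals.setdefault(row["commodity"], {"volume": 0.0, "amount": 0.0})
        entry["volume"] += row["volume"]
        entry["amount"] += row["amount"]

    categories = list(totals)  # dicts preserve first-seen order
    return BarChartModel(
        title="Sales by commodity",
        category_axis="Commodity",
        metric_axis="Value",
        categories=categories,
        series={
            "Sales volume": [totals[c]["volume"] for c in categories],
            "Sales amount": [totals[c]["amount"] for c in categories],
        },
    )
```

A map-based visualization would need a different model (regions and color scales instead of axes), which is the point made above: the same data must be organized differently for each chart type.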
First, the study is based on the Hadoop cloud computing platform. On this foundation, a visualized analysis model for Hadoop business data can be designed and established. Business data is stored in a relational database, so it must first be transferred to HDFS. After the transfer, a Hive data warehouse can be constructed. Once the data is in Hive, it is analyzed and the analysis results are stored in the HBase database. The final results are then combined with existing visualization models to form a visualized analysis model for Hadoop business data.
In the visualized analysis model for Hadoop business data, data integration is the process of extracting valuable business data from the enterprise database and storing it in Hadoop HDFS. Data storage normally includes two stages: a full import of the original data and incremental import. In this experiment, Sqoop was used for the data import. See
After the business data has been fully imported via Sqoop, its format must be converted to satisfy the quality requirements for business data. In the visualized analysis model for Hadoop business data, there are two ways to convert business data: field combination and field split. Field combination was studied first; it was found that the business data items are entirely independent of one another, without any connection [
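The two Sqoop import stages described in this section (a full import followed by incremental imports) might be invoked roughly as follows. The sketch only builds the command lines; it does not run Sqoop. The JDBC URL, user name, table, target directory and check column are hypothetical placeholders, and the paper does not state which options were actually used.

```python
import shlex


def sqoop_full_import(jdbc_url: str, username: str, table: str,
                      target_dir: str, mappers: int = 4) -> list[str]:
    """Command line for a one-off full import of a relational table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                      # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_incremental_import(jdbc_url: str, username: str, table: str,
                             target_dir: str, check_column: str,
                             last_value: int) -> list[str]:
    """Command line that imports only rows whose check column exceeds last_value."""
    return sqoop_full_import(jdbc_url, username, table, target_dir) + [
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


if __name__ == "__main__":
    # All connection details below are made-up examples.
    url = "jdbc:mysql://db.example.com/shop"
    full = sqoop_full_import(url, "etl", "orders", "/data/shop/orders")
    incr = sqoop_incremental_import(url, "etl", "orders", "/data/shop/orders",
                                    check_column="order_id", last_value=1000)
    print(shlex.join(full))
    print(shlex.join(incr))
```

The `append` mode assumes new rows receive ever-increasing values in the check column; tables whose existing rows are updated in place would need Sqoop's `lastmodified` mode instead.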
After business data is imported into the Hive database, the administrator designs the themes for visualized analysis according to enterprise demands. Theme design uses valid data to determine the basic type and storage structure of the business data, from which a theme type with visual meaning is finally formed. Key design is important because data visualization places high demands on response speed. A Category 1 key follows the pattern “analysis theme + time of formation”. The analysis theme distinguishes statistical analysis results, while the time of formation identifies when a result was produced.
According to
Compared with the Category 1 key, the Category 2 key is more complicated and more detailed. Its design follows the pattern “analysis theme + member property + time of formation”. Category 1 and Category 2 keys share the same column family, but their columns differ. The attribute values of the query result are stored in each column.
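The two key patterns can be sketched as simple string builders. The separator character, the date format and the function names are assumptions chosen for illustration; the paper specifies only which components each key contains and in what order.

```python
from datetime import date

SEPARATOR = "|"          # assumed delimiter between key components
DATE_FORMAT = "%Y%m%d"   # assumed encoding of the time of formation


def category1_key(theme: str, formed_on: date) -> str:
    """Category 1 key: analysis theme + time of formation."""
    return SEPARATOR.join([theme, formed_on.strftime(DATE_FORMAT)])


def category2_key(theme: str, member_property: str, formed_on: date) -> str:
    """Category 2 key: analysis theme + member property + time of formation."""
    return SEPARATOR.join(
        [theme, member_property, formed_on.strftime(DATE_FORMAT)]
    )


if __name__ == "__main__":
    day = date(2018, 7, 1)
    print(category1_key("sales", day))              # sales|20180701
    print(category2_key("sales", "store_12", day))  # sales|store_12|20180701
```

Because both key types begin with the analysis theme, all results for one theme sort next to each other in a store that orders rows by key, as HBase does, so they can be fetched with a single prefix scan.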
The feasibility of the visualized analysis model for Hadoop business data can be verified experimentally. The experiment requires establishing a Hadoop cloud computing platform through the computer network center and specifying the Hadoop cluster, hardware configuration, system version, software version and relevant parameters [
A purchase-sell-stock management platform used by small and micro enterprises served to verify the visualized analysis model for Hadoop business data. See
Key | Column family |
---|---|
Analysis theme + time of formation | Member property + analysis model |
Analysis theme + member property + time of formation | Attribute value |
distributed system, data on the Hadoop cloud computing platform is distributed across the servers. The decentralized data must therefore be integrated onto the platform by efficient means, after which a model can be established for all the integrated data [
After the experiment, its results must be analyzed, because the Hadoop cloud computing platform used the HBase database to store the analysis results and to build the visualization model [
Conclusions can be drawn from the experiment and the table. The first query must set up the connection between the client and the cluster, which makes it slow; its duration is strongly affected by network and cluster status. Once the connection is established, query time becomes more stable, in the range of tens to hundreds of milliseconds. The experiment therefore indicates that HBase can satisfy the requirements.
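The measurement pattern described here, a slow first query followed by stable warm queries, can be reproduced with a small timing harness. The sketch below is not the code used in the experiment: `FakeClient` is a hypothetical stand-in that simulates connection setup with a short sleep, and no real HBase client is involved.

```python
import time
from statistics import mean
from typing import Callable


def time_queries(run_query: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Time repeated calls and report the first (cold) call separately."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to separate cold and warm runs")
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return {
        "first": latencies[0],
        "warm_mean": mean(latencies[1:]),
        "overall_mean": mean(latencies),
    }


class FakeClient:
    """Hypothetical client: the first query pays a one-off connection cost."""

    def __init__(self) -> None:
        self.connected = False

    def query(self) -> str:
        if not self.connected:
            time.sleep(0.05)   # simulated connection setup
            self.connected = True
        time.sleep(0.001)      # simulated lookup
        return "row"


if __name__ == "__main__":
    stats = time_queries(FakeClient().query, repeats=5)
    for name, seconds in stats.items():
        print(f"{name}: {seconds:.4f} s")
```

Reporting the first call separately from the warm mean keeps the one-off connection cost from dominating the average, which matters when judging steady-state response time.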
From the analysis of Hadoop technologies, data visualization technology and the experiments, we conclude that the visualized analysis model for Hadoop business data is feasible and can play a positive role in practical applications [
Based on the analysis above, this paper has studied the visualized analysis model for Hadoop business data in detail. The model serves to analyze the particular features of business data and to study data visualization technology.
HBase query analysis

Data volume (size) | 1st time | 2nd time | 3rd time | 4th time | 5th time | Average |
---|---|---|---|---|---|---|
5,000 (170 KB) | 2.2800 s | 0.1030 s | 0.1560 s | 0.0690 s | 0.0290 s | 0.109 s |
50,000 (1.8 MB) | 0.0990 s | 0.1320 s | 0.1700 s | 0.2840 s | 0.0460 s | 0.134 s |
500,000 (18.5 MB) | 0.7380 s | 0.1120 s | 0.4420 s | 0.1030 s | 0.1220 s | 0.225 s |
5,000,000 (194.6 MB) | 0.7190 s | 0.2270 s | 0.0580 s | 0.2620 s | 0.1620 s | 0.217 s |
50,000,000 (1.8 GB) | 0.9740 s | 0.0370 s | 0.1120 s | 0.0270 s | 0.0210 s | 0.059 s |
100,000,000 (3.8 GB) | 11.720 s | 0.3220 s | 0.2700 s | 0.1930 s | 0.1830 s | 0.262 s |
Business data is first transferred from the relational database to the Hadoop platform, where it is processed by statistical analysis. For the whole process to run smoothly, researchers must pay close attention to data analysis and to the establishment and storage of the analysis model. The feasibility and application value of the Hadoop business data visualization analysis model can be realized and verified only when all these aspects are fully considered.
The authors declare no conflicts of interest regarding the publication of this paper.
Wang, Z.X. (2018) Visualized Analysis Model for Hadoop Business Data. Journal of Computer and Communications, 6, 14-21. https://doi.org/10.4236/jcc.2018.67002