Step 1: Check for evidence of differences. As a first step, an Analysis of Variance (ANOVA) test at the 95% confidence level is performed on the dataset. The test indicates that differences among the samples in the dataset do exist and are worth studying further. To clarify, a “difference” here means a statistically significant difference between groups in the response variable (Y). If the ANOVA test had found no differences in our data, there would be little use in analyzing the data further.
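As a minimal sketch of Step 1, the one-way ANOVA F statistic can be computed from the group samples as shown below. The function name `one_way_anova_f` and the sample values are illustrative assumptions, not the study's data. Deciding significance at the 95% level additionally requires comparing F against a critical value of the F distribution for the returned degrees of freedom, taken from a table or a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of response values (Y), one list per group.
    Group sizes may differ. Raises ZeroDivisionError if there is no
    within-group variation at all.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three groups (made-up numbers, for illustration only).
sample_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(sample_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

If the computed F exceeds the critical value for the given degrees of freedom, at least one group mean differs from the others, which is the condition for proceeding to the next step.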
Step 2: Identify specific differences. Our next step is to use a multiple comparison test to uncover the specifics of these differences within the dataset. A number of multiple comparison methods are available, varying in power and error rate. Towards the more conservative (lower error) end of the spectrum is Tukey’s Honestly Significant Difference (Tukey’s HSD). This procedure is simpler than some other methods in that it uses a single critical difference. Since the dataset has varying group sizes, we instead use a slight variant of the method called the Tukey-Kramer method. Groups are, for example, pairs of category and aspect such as FlyFF:Mastery. Since we have four categories and six aspects, and noting the omission of the Frattesi:Mastery group, there are 23 groups. The Tukey-Kramer method is summarized with the formula below. For two groups, their difference is significant only if it is greater than or equal to the critical difference given on the right-hand side of Equation (1). We apply this procedure at the 95% level of confidence.
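For reference, the standard textbook form of the Tukey-Kramer criterion is given below. The notation is an assumption of this sketch: $\bar{y}_i$ and $n_i$ are the mean and size of group $i$, $k$ is the number of groups, $N$ is the total number of observations, $MS_W$ is the within-group mean square from the ANOVA, and $q_{\alpha;\,k,\,N-k}$ is the critical value of the studentized range distribution at significance level $\alpha$ (here $\alpha = 0.05$).

```latex
\left| \bar{y}_i - \bar{y}_j \right| \;\geq\; q_{\alpha;\,k,\,N-k}\,
\sqrt{\frac{MS_W}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
```

When all groups have the same size $n$, the right-hand side reduces to the single critical difference of Tukey's HSD, $q_{\alpha;\,k,\,N-k}\sqrt{MS_W / n}$.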
A note on multiple comparison methods: if we consider an ordered list of means such that A > B > C > D, then we do not need to make every possible comparison. The difference |A – C| is smaller than |A – D|, so that if |A – D| is insignificant, then |A – C| will also be insignificant.
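The pairwise check of Step 2 can be sketched as follows, assuming the standard Tukey-Kramer critical difference. The function names are hypothetical, and `q_crit` (the studentized range critical value) and `ms_within` (the within-group mean square from the ANOVA) must be supplied by the caller; the numbers in the usage lines are made up. Note that with unequal group sizes the critical difference changes from pair to pair, so the shortcut for ordered means described above is safest when group sizes are equal or close to equal.

```python
from math import sqrt

def tukey_kramer_critical_difference(q_crit, ms_within, n_i, n_j):
    """Critical difference for two groups of sizes n_i and n_j."""
    return q_crit * sqrt((ms_within / 2.0) * (1.0 / n_i + 1.0 / n_j))

def is_significant(mean_i, mean_j, q_crit, ms_within, n_i, n_j):
    """True if the two group means differ by at least the critical difference."""
    critical = tukey_kramer_critical_difference(q_crit, ms_within, n_i, n_j)
    return abs(mean_i - mean_j) >= critical

# Hypothetical inputs: larger groups give a smaller critical difference.
small_groups = tukey_kramer_critical_difference(3.0, 2.0, 2, 2)
large_groups = tukey_kramer_critical_difference(3.0, 2.0, 8, 8)
print(small_groups, large_groups)
```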
Step 3: Visualize differences. We provide visualizations of the Tukey-Kramer results by constructing simple vertex-and-edge graphs that we call JDK Diagrams (“Just Don’t Kare”), where each vertex represents one of the Aspects of Replayability. An edge lies between two vertices if and only if the difference between them (as given by the Tukey-Kramer result) is insignificant; that is, we just do not care about their difference. Each category of games yields a different JDK Diagram. We show each of these in Figure 3 as undirected graphs, where each vertex is labeled via our Schemico acronym.
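A minimal sketch of how the edge set of a JDK Diagram could be built from the pairwise results is shown below. The function name `jdk_edges`, the representation of an undirected edge as a `frozenset` of two vertex labels, and the list of significant pairs are all assumptions made for illustration; they are not results from the study.

```python
from itertools import combinations

def jdk_edges(vertices, significant_pairs):
    """Return the edge set of a JDK Diagram.

    An undirected edge {u, v} is included if and only if the pair is NOT
    listed in `significant_pairs`, i.e. the difference is insignificant.
    """
    significant = {frozenset(pair) for pair in significant_pairs}
    return {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        if frozenset((u, v)) not in significant
    }

# Hypothetical example with three of the aspects and one significant pair.
aspects = ["Social", "Mastery", "Impact"]
edges = jdk_edges(aspects, [("Social", "Mastery")])
for edge in sorted(tuple(sorted(e)) for e in edges):
    print(edge)
```

Storing each edge as an unordered pair means that ("Social", "Mastery") and ("Mastery", "Social") are treated as the same edge, which matches the undirected graphs described in the text.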
Step 4: Quantify differences. JDK Diagrams give us a way to visualize the results of the Tukey-Kramer method, but they also give us a way to reason about the existence of ecological effects between categories of gaming. Based on how similar two JDK Diagrams are, we can infer whether ecological effects exist.
To compare graphs and discover how similar they are, we use a simple, non-standard metric of our own design. Two graphs A and B are called similar if the expression $\frac{c(A,B)}{d(A,B)} \geq t$ is true, where $c(A,B)$ is defined as the number of common edges of A and B, $d(A,B)$ is defined as the number of distinct edges of A and B, and $t$ is some threshold value between 0 and 1. The value $\frac{c(A,B)}{d(A,B)}$ is called the similarity level of the two graphs. We provide the results of this metric applied to each JDK Diagram in Table 1.
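The similarity metric can be sketched in a few lines. This sketch reads "common edges" as the intersection of the two edge sets and "distinct edges" as their union, and it treats two graphs as similar when the ratio is at least the threshold; both readings are assumptions. The function names, the `edge` helper, and the example graphs are hypothetical, and the case of two graphs without any edges is not covered by the text, so it is treated as similarity 0 here.

```python
def edge(u, v):
    """Undirected edge as an unordered pair of vertex labels."""
    return frozenset((u, v))

def similarity_level(edges_a, edges_b):
    """Number of common edges divided by number of distinct edges."""
    common = edges_a & edges_b
    distinct = edges_a | edges_b
    if not distinct:
        return 0.0  # assumption: undefined in the text, treated as 0
    return len(common) / len(distinct)

def are_similar(edges_a, edges_b, threshold):
    """True if the similarity level reaches the threshold (between 0 and 1)."""
    return similarity_level(edges_a, edges_b) >= threshold

# Hypothetical diagrams: 2 common edges out of 5 distinct edges.
graph_a = {edge("S", "C"), edge("S", "M"), edge("C", "M")}
graph_b = {edge("S", "C"), edge("C", "M"), edge("M", "I"), edge("I", "E")}
print(similarity_level(graph_a, graph_b))
print(are_similar(graph_a, graph_b, 0.4))
```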
Figure 3. JDK Diagrams for each category. Highlighted in dark red are edges in common between the All-BG and SOC categories.
Table 1. Similarity levels between JDK Diagrams. Percentages are shown above the diagonal, and fractions below it.
Step 5: Analyze differences. Finally, given a simple table of numbers to look at, we can analyze the gaming data very easily. The most similar pairs of JDK Diagrams are All-BG versus Frattesi and All-BG versus SOC. If we set the threshold level at 40%, then ecological effects exist for these two pairs of categories. The existence of ecological effects here depends entirely on how high or low the threshold value is. Since 40% is a low value, it may be difficult to accept that there are ecological effects, but compared to the other numbers in Table 1, 40% is fairly high. We will need to investigate the categories of games further to learn why the number is so low, but we can already try to reason about it. In Figure 4, we propose what these differences are and how they affect replay in board games, and we discuss them as follows.
Board games can differ along two dimensions: number of players and intensity of play. In terms of intensity, there are very competitive games, which must be based on strategy, and less competitive games, which are based more on luck. When there are many players (>2), the Social aspect is more important. When the game is more competitive, the Mastery aspect is more important.
6. Proposing a Design Methodology
In the previous section, we attempted to show that ecological effects exist between some categories, such as SOC and All-BG. We believe there may be other ecological effects for other neighboring categories of games, such as FlyFF versus All-MMORPG. Here we propose a game design methodology (see Figure 5) intended to improve the success of a game’s release; the methodology depends greatly on the existence of ecological effects.
1) List the Game’s Features. Beginning with a very basic idea of the game, plan ahead by outlining the key features it contains. We define a feature as anything that the player can do or experience within the game.
2) Estimate Scores for the Game’s Features. For each feature, estimate how important it is to each of the six Aspects of Replayability, assigning a value between 0 and 1, where 0 is “strongly disagree that the feature is important” and 1 is “strongly agree”.
3) Average the Feature Scores. For each aspect, compute the average of the scores over all features. It may be necessary to weight some features based on how often they are experienced during the game.
4) Determine the Game’s Category. Estimate which category of gaming this game falls in, e.g. Competitive Board Game.
5) Look Up Scores for that Category. From learned results and previously analyzed datasets, look up the scores for that category and discover player opinions regarding Replayability.
6) Adjust the Game. Compare the estimated average scores with the scores looked up in step five. This is where fine-tuning of the game takes place. It can be done by tweaking features of the game and then re-estimating and re-averaging their scores until they match the scores looked up in step five.
Figure 4. Expanding classes of board games into four categories based on effects to Aspects of Replayability.
Figure 5. Game design methodology.
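The scoring part of the six steps above (steps 2, 3, and 6) can be sketched as follows. Everything in this sketch is an assumption made for illustration: the function names, the use of a weighted mean, treating an unscored aspect as 0, and all feature names, weights, and category scores, none of which come from the study's data.

```python
ASPECTS = ("Social", "Completion", "Experience", "Challenge", "Mastery", "Impact")

def average_feature_scores(feature_scores, weights=None):
    """Weighted average score per aspect over all features (steps 2 and 3).

    `feature_scores` maps feature name -> {aspect: score in [0, 1]}.
    `weights` maps feature name -> weight; unlisted features get weight 1.
    Aspects missing from a feature are treated as 0 (an assumption).
    """
    if not feature_scores:
        raise ValueError("at least one feature is required")
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in feature_scores)
    averages = {}
    for aspect in ASPECTS:
        weighted_sum = sum(
            weights.get(name, 1.0) * scores.get(aspect, 0.0)
            for name, scores in feature_scores.items()
        )
        averages[aspect] = weighted_sum / total_weight
    return averages

def score_gaps(estimated, category_scores):
    """Per-aspect gap (step 6); positive means the design under-serves the aspect."""
    return {a: category_scores[a] - estimated[a] for a in ASPECTS}

# Hypothetical features, weights, and category scores.
features = {
    "guilds": {"Social": 1.0, "Mastery": 0.5},
    "boss raids": {"Social": 0.5, "Challenge": 1.0, "Mastery": 0.5},
}
estimated = average_feature_scores(features, weights={"guilds": 3.0})
category = {"Social": 0.75, "Completion": 0.5, "Experience": 0.5,
            "Challenge": 0.5, "Mastery": 0.5, "Impact": 0.25}
for aspect, gap in score_gaps(estimated, category).items():
    print(f"{aspect}: {gap:+.3f}")
```

Aspects with a large positive gap are candidates for new or strengthened features, after which the scores are re-estimated and the comparison is repeated.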
A game can be tuned so that more of its features contribute to certain Aspects of Replayability. Below we discuss this in more detail. This section also serves to further detail each of the Aspects of Replayability.
1) Designing for Social: Games should cater to players’ need to socialize with other gamers. Interaction between players must be emphasized. Features in the game should promote social play, such as teaming up and player versus player (PvP) combat.
2) Designing for Completion: Games should lure the player into wanting to complete either the game or parts of the game. Typically, compelling stories or attractive game play are what drive a player. Games which have many features waiting at the end often keep a player in the game until they have experienced them all.
3) Designing for Experience: It is not easy to provide guidelines for designing experience. There are no simple tips for writing a good book or movie, and games built for experience are no different. This is why we say this aspect is perhaps the most nebulous of all. To design for experience, the game must be unique and memorable. Superb stories, rich virtual cultures and vast worlds are often the most memorable qualities of games.
4) Designing for Challenge: The concept of challenge is in providing features which lead gamers into feeling that only the elite can accomplish them. Often, we see this level of challenge made available in post-content material. As a premise of playability, a game must neither be too difficult nor too easy. Designing for Challenge is a task of fitting the game into the appropriate Flow Zone [17,18], where appropriate flow is much higher for very challenging aspects in the post-content.
5) Designing for Mastery: Games must provide features such as leveling up, upgrading equipment, and “training” so that players can become as skilled as possible in the game. Competitive games boast a drive towards mastery. In successful competitive games, the heights of mastery must be incredibly difficult to attain.
6) Designing for Impact: Randomness is often a key element within games. However, when the game is too random, the player may feel that the events in the game are out of their control. When designing for impact, provide games which make the user feel completely in control of their fate, in that different actions almost always change the outcome of the game. Multiple endings and nonlinearity are two examples of good impact features.
In this paper we have provided a viewpoint on studying games from a software engineering perspective and have contributed the following.
1) A method of studying games.
2) Gaming datasets.
3) A method of analyzing our gaming datasets.
4) A methodology of using learned knowledge via our methods to develop games.
We have proposed a generalized method of studying games by collecting data, classifying the data and analyzing it. We have also proposed a way to analyze the data (JDK Diagrams report a Tukey-Kramer analysis where linked nodes show ranges that are statistically indistinguishable). Using JDK Diagrams, we have analyzed data from board games and MMORPGs, and from a prior study by Frattesi et al. In essence, we are trying to show a method of applying techniques of software engineering to the game development process.
From this paper, we urge a little research caution. In our view, researchers need to exercise more care before they prematurely generalize their results. In analyzing our data, we have shown that there are no ecological effects between instances of games (SOC/FlyFF) and All-G. Although it would be a very good result if such an ecological effect existed (we could apply what works for All-G, and it should work for any game), we instead turn to showing the existence of ecological effects between categories and instances of games. We have shown that there is a higher level of similarity for some categories, such as between SOC and All-BG. Since we believe SOC is a member of All-BG, this result is useful if we accept that ecological effects exist here. But since the level of similarity is still quite low, we may need to further deepen the categorization of such games.
We believe that it is possible to use our results to guide the development of games. Such a methodology would help designers to fine-tune their design plans and create a more enjoyable, more successful game, by changing or adding features so that the game better matches player expectations in terms of the Aspects of Replayability.
We are interested in what other categories of games exist and in further categorizing others such that data for different games in the same category are as similar as possible. Genres of games such as Shooter and Action may not be the best classification of categories, as we wish to find ecological effects for games and their categories. In the future, we propose work on the following areas:
1) Discover or refine categories of games.
2) Collect more data.
3) Learn how to collect data faster and better.
4) Further refine how we analyze gaming data.
5) Further refine our design methodology.
1 Short for “Just Don’t Kare”, named after the first author.