complexity measure, i.e. the size (number of operations and number of state variables), modularity and interconnectedness, and applied those in logic control. Frey and Litz introduced complexity metrics for Petri nets using, among others, an adapted McCabe metric. Venkatesh et al. proposed counting the number of elements required to represent a certain program in order to measure its complexity, as did Lee and Hsu, who converted the programs in question into Boolean expressions by using if-then transformations and afterwards rated the programs’ complexity by comparing the calculated values.
For object-oriented descriptions of the automation task, Chidamber and Kemerer developed a set of metrics for OO design, e.g. weighted methods per class (WMC), a measure of class complexity that is used in this paper (see column control complexity in Table 1): to calculate the WMC of a program, the cyclomatic complexity of each method is summed up over all classes. When classical IEC 61131-3 code is used to describe an automation task, the number of FBDs and their instances is a familiar measure.
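The WMC calculation described above can be sketched as follows. This is a minimal illustration, assuming that the cyclomatic complexity of each method has already been determined; the class names, method names and complexity values are hypothetical and not taken from the experiments.

```python
from typing import Dict

# Hypothetical program: class name -> {method name -> cyclomatic complexity}.
# All names and values are illustrative assumptions, not data from the paper.
Program = Dict[str, Dict[str, int]]


def wmc_per_class(program: Program) -> Dict[str, int]:
    """Sum the cyclomatic complexities of the methods of each class."""
    return {cls: sum(methods.values()) for cls, methods in program.items()}


def wmc_of_program(program: Program) -> int:
    """Sum the per-class WMC values over all classes of the program."""
    return sum(wmc_per_class(program).values())


example: Program = {
    "Cylinder": {"extend": 2, "retract": 2, "check_sensors": 4},
    "Storage": {"push_out": 3, "is_empty": 1},
}

if __name__ == "__main__":
    print(wmc_per_class(example))
    print(wmc_of_program(example))
```

Keeping the per-class values separate from the program total makes it possible to see which class contributes most to the control complexity.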
Besides those metrics, which rely on the description of the automation task to derive its complexity, the characteristics of the task can also be taken into account, such as the type of control loop, i.e. logic control, closed loop control (with synchronization) or a technical process requiring both, called hybrid in the following. Furthermore, requirements on automation systems that have to be fulfilled are introduced: real time requirements, communication requirements between different controllers, and networked automation systems (NAS) as a class of systems with real time and communication requirements, because the code and functionality are distributed onto different automation devices.
Besides the regular machine control function, diagnosis, exception handling, visualization and other functionality need to be developed. “In fact, industry folklore suggests that approximately 90% of the overall control logic is used for exception handling”. During the last years, the authors’ team analyzed real PLC code from several MS companies and found that 6% - 10% of the lines of code deal with diagnosis and safety. As a consequence, the mode of operation (EN 13128: auto, hand, manual etc.) as well as diagnosis including error handling are typical automation tasks which provide another possible classification.
2) Engineering Task
The category engineering task describes the task to be solved by the human in the experiment, e.g. model (UML, SysML) or program (IEC code) to control the automation task. Gemino and Wand distinguish two types of human tasks in automation and control:
· design and creation versus
· understanding and analysis (being typical for maintenance tasks in MS).
For both types of tasks, a distinction must be made between modeling and programming (dashed lines in Figure 1). Moreover, maturity and functionality of tool support have to be considered as influencing factors. As tool classification, three maturity levels and four different functionality types are proposed. Because this paper focuses on the comparison of different notations in MDE, the modeling notation as such and its measures are discussed in more detail.
The notation needs to fulfill the requirements given by the automation task, the life cycle model and the engineering task, allowing structure and behavior to be modeled. Different modeling notations also possess different complexities. Calculation schemes are proposed by e.g. Recker et al. 2009 and Rossi and Brinkkemper. The expressiveness of this measure is limited to the complexity of the pure notation, calculated from the number of its elements.
With O being the number of object types, R describing the relationship types and P the property types of a method, C(M) is the resulting complexity of a heterogeneous modeling language.
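The formula itself is not reproduced in this text. The following is a minimal sketch, assuming the Euclidean form in which this metric is commonly stated, C(M) = sqrt(O^2 + R^2 + P^2); this form is an assumption and is not confirmed by the text above. The element counts are the totals stated in the four footnotes to Table 2; which notation belongs to which footnote is not recoverable here, so the entries are keyed by footnote number only.

```python
import math


def notation_complexity(o: int, r: int, p: int) -> float:
    """Assumed form of C(M): Euclidean norm of the counts of object types (O),
    relationship types (R) and property types (P)."""
    return math.sqrt(o ** 2 + r ** 2 + p ** 2)


# (O, R, P) totals as stated in footnotes 1 to 4 of Table 2.
footnote_counts = {
    1: (5, 9, 8),
    2: (8, 2, 9),
    3: (7, 8, 7),
    4: (8, 7, 7),
}

if __name__ == "__main__":
    for number, (o, r, p) in footnote_counts.items():
        print(f"footnote {number}: C(M) = {notation_complexity(o, r, p):.2f}")
```

Under this assumption the four notations fall within a narrow band of values, which would be in line with the statement that the differences between the typical MS notations are very small.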
Schalles compared UML activity diagrams for behavioral modeling and UML class diagrams for structural modeling, taking into account the high complexity differences (Table 2).
To allow the comparison of the notations’ complexity, the complexity of the notations discussed later is calculated, too. The complexity of the UML class diagram (Table 2) is nearly double that of IEC 61131-3, SysML-AT and CFC, showing that the difference between the typical MS notations is very small.
Subjects’ qualification and experience are essential for the outcome of the experiment. Sierla et al. and Hajarnarvis et al. included industrial experts. Sierla introduced teamwork with clearly separated tasks, similar to a real project team in industry.
Table 1. Overview of the experiments.
Legend: N = notation; Prot = prototype; p&p = pick and place; L = lecture; Q = only qualitative results; M = method; HB = handbook; (p&p) = part of p&p; E = exercise; er = error rate; ß = version; A = analogue; I/O = input/output; Rep = repetition; CG = code-gen IEC 61131-3; P = pattern; D = digital; ~ = only pencil & paper; C = characteristics; ME/PE = main/pre experiment; HLE = hybrid learning environment.
Table 2. Complexity values for heterogeneous modeling languages.
1 The values listed in the table are gained by counting the following elements: O (function, function block, variable (read, write), constant) = 5; R (direct connection 1 to 1, direct connection 1 to n, set, reset, rising edge, falling edge, negated connection, branch instruction, return) = 9; P (FB name, FB type, FC type, input name, output name, variable name, variable type, network number) = 8.
2 O (Block, ConstraintBlock, attribute (In, Out, Local), PortIn, PortOut, Property) = 8; R (BindingConnection, FunctionalDependency; direct connection 1 to 1) = 2; P (BlockName, BlockEntityName, ConstraintName, AttributeName, AttributeDataType, PropertyName, PropertyDataType, PortName, PortDataType) = 9.
3 O (Function, Function Block, Input, Output, Comment, Composition, Selector) = 7; R (direct connection 1 to 1, direct connection 1 to n, Set, Reset, negated connection, branch instruction, connection marker based, return) = 8; P (FB name, FB type, FC type, input name, output name, jump mark name, execution sequence number) = 7.
4 O (Input Ports, Output Ports, Protocol Ports, Knots, Block, Functions, FBs, requirements) = 8; R (1 to 1 concrete, 1 to n concrete, n to 1 concrete, 1 to 1 discrete, 1 to n discrete, n to 1 discrete, enclosing) = 7; P (FB name, Function name, Requirement type, Requirement name, Knot name, Knot type, Blockname) = 7.
Many papers on programmers’ competencies in modeling and informatics systems application are related to competence models. Usually, competencies in this context are understood as abilities, skills, and knowledge, a perspective which is still prominent in most Anglo-American research on competencies. An example is provided by Curtis, who proposed that programming results depend on individual personal factors and mental abilities. His model covers intellectual aptitudes, the knowledge base, cognitive styles, the motivational structure, personality characteristics, and behavioral characteristics. Although Curtis did not empirically test his model, the factors show at least face validity.
Other approaches for gaining insights into the competencies required for different programming approaches or skills analyze interviews with experts in the domain in question or evaluate programmers’ behavior when performing certain tasks, e.g. programming tasks or debugging tasks.
In prior experiments regarding process operators we tried to measure workload using a secondary task (e.g. communication and documentation) as proposed, among others, by Wickens, but could not detect significant effects.
Because statistical significance requires a minimum of 15 subjects per notation or approach, it is obviously impossible to conduct experiments with such a high number of experienced application engineers under the same conditions.
In ergonomics it is usual to conduct usability experiments in engineering design with students of mechatronics or mechanical engineering because they are future application engineers. Maintenance tasks are performed in Germany mostly by skilled workers, therefore apprentices and technicians are appropriate subjects.
Kim suggests that the complexity of the design task in the training period and the test period should be increased stepwise.
As Kim and Lerch, Ruocco and others point out, repetition is essential for learning object orientation. Ruocco opted for a stepped approach when teaching UML throughout a computer science program. He found that the application of UML during a database course and the incorporation of use case diagrams, sequence diagrams and activity diagrams led to a richer and deeper exposure to UML.
In longitudinal studies, too, teaching beginners or freshmen in computer science or object orientation mostly goes along with repeated training. For training purposes, the pedagogic methods of repetition and fade-out, i.e. decreasing support by the trainer from training step to training step, seem to be suitable (see [63] for more details). In case of IEC 61131-3 or UML, prior knowledge (see expertise in section subjects) acts as a disturbing factor if it is not equally distributed over the subjects. Therefore training is necessary to align prior knowledge before conducting the experimental task. The time between training and experiment should be nearly the same for all subjects to avoid time-dependent differences in the results.
3.2. Overview of Affected Variables (Usability Requirements)
In order to perform usability experiments, it is necessary to clarify how usability can be measured (affected variables) and which metrics can be used to make quantifiable statements about the advantageousness of the object of research (given by affecting variables).
The standard ISO 9241-11 includes studies regarding use efficiency and the satisfaction of the users, suggesting the measurement of a product’s usability in its context of use. According to this standard concerning usability requirements, the main affected variables are (1) effectiveness, (2) efficiency and (3) user acceptance (cp. Figure 2).
Effectiveness, i.e. the quality of the result, depends on the completeness and correctness of an engineered solution. Efficiency is the effectiveness in relation to the effort to engineer a solution. Both measures analyze the models developed by the subjects during the experiment in comparison with a so-called master model.
User satisfaction is the degree to which users are free of interference and their attitude towards using a product. Furthermore, the standard ISO 9241-110 contains dialogue principles for human-computer interaction as attributes of usability requirements. Those principles are suitability for the task, for learning and for individualization, conformity with user expectations, self-descriptiveness as well as controllability and error tolerance.
3.2.1. Effectiveness-Quality of the Resulting Model, Program
Figure 2. Usability requirements, i.e. variables in experiments.
In ISO 9241-11:1998 it is proposed to determine the effectiveness by linking the grade of completeness with the grade of correctness. Bevan (1995) defined effectiveness in a different way as a product of quantity and quality:
(With N being the number of nodes, E the number of edges and R the number of errors with index task being the model of the subject and goal being the model of the experts taken as the correct solution in the master model).
Schalles compared only UML structure and behavior diagrams for business process modeling on an abstract level. Applying his approach to MS would fall short. With different notations, the modeled solutions differ, too, regarding the number of nodes and edges of the correct master model (see Appendix B for an example). As Strömman already realized, different subjects often create different correct solutions, which should be evaluated as equally good. Moreover, Schalles rates nodes in a class diagram (abstract representation) and nodes and edges in an activity diagram (low level, object related) equally. In MS, the correctness of structural model elements (e.g. classes or function blocks) should be measured differently from the correctness of low level behavioral model elements (e.g. correct steps or transitions from one state to another) due to the different degree of difficulty and ease of change in case of an error. Because IEC 61131-3 FBD is a language without nodes and edges, Schalles’ approach is not feasible.
In accordance with Annett’s proposal of a Hierarchical Task Analysis (HTA), the proposed evaluation scheme follows a top-down approach and counts all detailed elements modeled by the subject, allowing similar scores e.g. for a combination of class diagrams and created objects in comparison to structure elements in FBD.
In order to assess the effectiveness of notations the grade of task completion was used instead of measuring the grade of completeness and the grade of correctness separately before multiplying them. A task is completed, if its solution is logically and syntactically correct. As only correct task solutions, i.e. model elements are counted, it is not necessary to take additional errors into account. This results in the following term:
(With N being the number of tasks).
This approach has three main advantages:
First, it can be applied equally for all kinds of notations as long as the given task can be completed with it and there is no restriction as in  , that only models with fewer nodes and edges than the master model can be evaluated.
Second, the task analysis can be used to select relevant tasks for evaluation and reduce the number of tasks to review, which then can be checked for logical and syntactical correctness, resulting in highly accurate data on task completion.
Third, no negative points for errors have to be used to calculate correctness, as this could manipulate results in an undesired way, e.g. errors lead to negative overall efficiency or errors in one part of the model nullify correct solution of others.
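The term for the grade of task completion is missing from this text. The following is a minimal sketch of one possible reading, assuming that effectiveness is the share of the N evaluated tasks whose solution is logically and syntactically correct; the function name, the percentage scaling and the example data are assumptions.

```python
from typing import Sequence


def task_completion_rate(completed: Sequence[bool]) -> float:
    """Share of completed tasks in percent.

    Each entry states whether one evaluated task was solved logically and
    syntactically correctly. Errors are not subtracted: only correct task
    solutions are counted, so the value cannot become negative.
    """
    if not completed:
        raise ValueError("at least one task is required")
    return 100.0 * sum(completed) / len(completed)


if __name__ == "__main__":
    # Hypothetical subject: three of four evaluated tasks completed.
    print(task_completion_rate([True, False, True, True]))
```

Because the result depends only on the list of evaluated tasks, the same function applies to any notation in which the tasks can be solved.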
A fully automated analysis of the student’s model compared to the master model is nearly impossible. The difficulty in rating the results of an experiment is comparable to a fair grading of exams by distributing points for correct solutions, but more sophisticated. As a consequence, the development of the correction guidelines for the manual evaluation is required. Points are given by two evaluators independently with a necessary interrater reliability of at least 65%.
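The text does not state which reliability statistic is meant. The following is a minimal sketch, assuming simple percent agreement between the two evaluators over the individually scored model elements; the scores are hypothetical.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Percentage of scored elements on which both evaluators gave the same points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both evaluators must score the same, non-empty set of elements")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical points (1 = element correct, 0 = element not correct).
scores_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
scores_b = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

agreement = percent_agreement(scores_a, scores_b)
meets_threshold = agreement >= 65.0  # threshold named in the text

if __name__ == "__main__":
    print(agreement, meets_threshold)
```

Percent agreement does not correct for chance agreement; a chance-corrected coefficient would be a stricter alternative if the correction guidelines require it.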
Regarding industrial application the time needed to engineer an automation task correctly is one of the most important measures, defined as efficiency in usability evaluation. The efficiency of a notation can be calculated through a combination of effectiveness and time required for execution of a task (ISO 9241-11:1998).
According to Schalles, efficiency is defined as:
(With effectiveness F and time T).
If time is a freely selectable variable, this calculation basically provides a good comparison between notations in terms of effect per time. For the experimental design time may be fixed and restricted to a calculated amount of time with GOMS  or pre-experiments similar to an exam or left open for subject’s decision, delivering the modeling results when they feel ready. Fixed timing implies that the effectiveness measure already includes a statement on efficiency.
Nevertheless, time should be recorded to allow analysis of modeling performance over time. Automatic storage of the results in short time intervals, e.g. every 5 min or 7 min, allows the analysis of effectiveness per time in a more specific way.
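The efficiency definition is missing from this text. The following is a minimal sketch, assuming that efficiency is effectiveness F divided by time T, as the description "effect per time" suggests, and applying it to periodically stored results; the snapshot values are hypothetical.

```python
from typing import List, Tuple


def efficiency(effectiveness: float, minutes: float) -> float:
    """Assumed definition: effectiveness F per unit of time T."""
    if minutes <= 0:
        raise ValueError("time must be positive")
    return effectiveness / minutes


def interval_rates(snapshots: List[Tuple[float, float]]) -> List[float]:
    """Points gained per minute within each storage interval.

    snapshots: (elapsed minutes, cumulative points) in chronological order.
    """
    rates = []
    previous_time, previous_points = 0.0, 0.0
    for elapsed, points in snapshots:
        rates.append((points - previous_points) / (elapsed - previous_time))
        previous_time, previous_points = elapsed, points
    return rates


# Hypothetical results stored every 5 min.
snapshots = [(5, 4.0), (10, 9.0), (15, 12.0), (20, 13.0)]

if __name__ == "__main__":
    print(efficiency(snapshots[-1][1], snapshots[-1][0]))
    print(interval_rates(snapshots))
```

The overall value hides how performance develops; the per-interval rates show, for example, whether a subject slows down towards the end of the session.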
3.2.3. User Acceptance-Subjective Aspects
Another possibly affected variable is the subjects’ acceptance of the used notation (or tool) and the automation task when executing the task. Here, usability questionnaires based on the standard DIN EN ISO 9241 are best practice.
Moreover, aspects as the subjects’ mental workload, control belief or motivation can be elicited (see Section 5, Section 4.7 for details).
For later analysis the affecting and affected variables and their relations will be evaluated to provide results for the comparison of the notation.
4. Selected Usability Studies
In the following section, usability studies (4.1) and usability experiments (4.2 - 4.8) with a focus on different automation tasks are introduced and classified according to Section 3 (Table 1). The engineering task is classified as a structural and/or behavioral modeling task. The automation task’s complexity is given by number and type of I/O, as well as weighted methods per class and number of variables in case of a classical PLC programming approach using IEC 61131-3 FBD. The five related experiments are more closely related to case studies and to industrial application.
The seven experiments by the authors’ team (4.2 - 4.8) highlight automation tasks of differing complexity and different automation system characteristics as given in Table 1.
4.1. Related Experiments
4.1.1. Experiment O.1 and O.2: Measuring Size and Complexity, Estimation of Development Time
Lucas and Tilbury, and Lucas, demonstrate how task analysis could be usefully applied for the preliminary assessment of the effectiveness and perhaps even the efficiency of logic control design methodologies. Lucas calculated the time to create a simple logic design program on the basis of low level user operations, e.g. keystrokes, mouse clicks and mental operations, for IEC 61131-3 Ladder Logic Diagrams (LL 405 min), Petri Nets (PN 1100 min) and modular Finite State machine logic (mFSM 1500 min), showing the significant difference caused by the notation itself. To derive the necessary steps and the strategies used, i.e. copy & paste or manual copy, they observed engineers during the design process and recorded the time needed. Moreover, Lucas and Tilbury provide a way of comparing the complexity of control logic models or code of a simple lab scale MS created with the above mentioned notations plus SIPN by analyzing existing programs. They introduce quantitative measurements of the complexity of a piece of code: size (i.e. number of operations and state variables), modularity (number of modules) and connectedness. Additionally, they introduce four typical scenarios for the accessibility of data from a programmer’s point of view, i.e. 1) single output debugging (specific questions regarding specific unexpected behavior in the machine), 2) system manipulation (how the user can manipulate the machine to achieve a desired state), 3) desired system behavior (desired behavior of the machine when examining only the schematics and the logic) and 4) unexpected system behavior (system’s response to unexpected events). Because all these questions refer to already existing code, they can be categorized as maintenance tasks.
The four notations evaluated are compared regarding the four scenarios, showing that Ladder Logic is still most appropriate for the first two but hard for Scenarios 3 and 4, whereas Petri net, SIPN and mFSM are rated moderate or easy in Scenario 3 but lower in Scenario 1. LL yields a small but very interconnected program, and mFSM the most modular, although largest, program.
4.1.2. Experiment O.3 and O.4: Reusability Strategies
Strömman et al. compared IEC 61499 with IEC 61131-3 in logic control design to foster reuse. Professionals and researchers acted as subjects, programming a lifter application during a workshop. The resulting solutions differ completely and show different types of approaches, e.g. reuse of existing ST code copied into an IEC 61499 frame, reuse of a design pattern, i.e. a state diagram, a mechatronic approach and a classical IEC 61131 function block approach. The authors conclude that guidelines for using IEC 61499 are required, as well as an environment that fosters collaboration and exchange of information. The results were gained by model comparison and written feedback. Beforehand, interviews were conducted to reveal the relevance of the study. Design approaches are context-dependent, i.e. they depend on the background of the designers, the existence of legacy software, business goals etc.
Based on this experience, in experiment O.4 Sierla et al. organized a course on IEC 61499 in 2005 to enable twenty practitioners and researchers to propose and negotiate design alternatives in a team context, with recorded interviews. In a second course in 2006, professionals (3 subjects), researchers (3 subjects) and a standardization worker worked in a team representing the different social groups in a project; the impact of team organization, knowledge integration, and software development method was evaluated by an interview after the course. The benefit of a modular structure was recognized, as well as the risk of combining continuous control loops with sequential batch control logic. The necessity of shared guidelines, design patterns and tool support was highlighted in more detail, especially for the batch control systems section.
4.1.3. Experiment O.5: Change of Sequence
Hajarnarvis et al. compared 63 subjects applying four different methodologies, i.e. contact logic, step logic, SFC and EC, to change the sequence of a given simple program. The participants had to change the sequence of a simple task with three motors and one valve. The authors identified different main problems, e.g. insufficient modifications for all but EC and an incorrect algorithm for SFC and EC. The results are separated according to the participants’ background, i.e. maintenance, planner, programmers and Rockwell personnel compared to the untrained.
4.2. Experiment E1-Pure UML 1.4 and PLC Programming-Exploratory Study
The series of experiments E1 explored the influence of group work compared to individuals, the influence of prior experience in PLC programming and modeling, and differently qualified subjects, i.e. bachelor students of electrical and information engineering, students integrated into companies (StiP) and technicians, modeling and programming a pick & place unit (see Figure 3).
As affected variables, the number of steps realized and their correctness were evaluated in comparison to a master model. The notations compared are UML and ICL, with a control group using only the S7 PLC programming languages IL, LL and FBD.
The results regarding the quality of the model are disappointing, with an error rate of 43.84%. The impact of the qualification level on the number of realized steps and errors is significant (see Table 1, E1 results). The influence of prior knowledge, which in this experiment is based only on subjective rating in a questionnaire, is evident, too: prior knowledge halves the errors. Subjects rate the applicability of both UML and ICL for modeling structural aspects as very poor and for behavior as fair. Comparing groups (2 subjects) with individuals, groups reach a higher number of modeled steps (23.44 compared to 15.04 for individuals; p = 0.01), but unfortunately the error rate is not significantly reduced. The experimental results, e.g. the identified errors in the developed models (see [9], Figure B1), are used as input for the further development of UML for MS (E2, E3, E4, E5). The poor models and the high error rate reveal insufficient training and experimental design, but also the weakness of pure UML 1.4 as modeling notation. Subjects asked for a reduced number of diagrams with a clear procedure for UML modeling and for a tool to support modeling with integrated code generation, because paper and pencil is not accepted.
4.3. Experiment E2-Deployment Using Pattern and UML-PA
Based on the results of E1, a domain specific language UML-PA was developed with a reduced number of diagrams and domain specific stereotypes. The research question was to prove the benefit of such a domain specific language under architectural aspects, i.e. regarding the deployment of control loops and the related sensors and actuators connected via a field bus. The subjects should identify the correct patterns and connect them to model the system from sensors to actuators, including its deployment and communication relations. For this reason UML-PA provides ports to model communication interfaces in a so-called instance structure diagram.
The modeling approach using UML-PA and its instance structure diagram is compared with UML 2.0 diagrams, i.e. class diagram, component diagram, composite structure diagram and deployment diagram. As automation task, a simplified real continuous hydraulic press was chosen with 30 control loops to be switched between distance control and pressure control in case of overpressure. Each valve is equipped with a distance sensor to measure the valve opening and each control loop with a pressure transmitter. As additional input, the press operator sets the set values of the pressure in the cylinder connected to the valve. The controller’s output is the set value of the valve position and, to the HMI, the valve opening.
Figure 3. Pick & place unit (E1, E3, E4, E5).
UML participants checked their results after 1.78 changes and took the results as guidance to find an appropriate solution; UML-PA subjects checked their solution after 3.9 changes. The subjects properly analyzed the task and selected the given pattern, establishing the required communication more efficiently with UML-PA than with UML 2.0 (see Table 1, E2), which is plausible given the additional effort, i.e. diagram changes, needed in UML 2.0. This idea is included in the SysML-AT approach discussed in E7. The identified breaks and the time needed to understand the relation between different diagrams need to be optimized to improve MDE (see E5).
Subjects criticized the restricted tool. The restricted tool support encouraged students to follow a trial and error strategy, which is unacceptable in a real industrial application.
4.4. Experiment E3-Error Handling Using plcUML SC vs. IEC 61131-3-SFC
Fulfilling the requirements from E1, i.e. a reduced number of diagrams with tool support and code generation, Witsch and Vogel-Heuser developed a prototypical plcUML editor implementing UML class diagram and state chart in a real IEC 61131-3 run time development environment with integrated code generation in CoDeSys 3.x. The plcUML diagrams are integrated, similar to SFC, as an additional language that is transformed internally into an ST language derivative. Yang et al. applied orthogonal regions in UML state charts to model primary system functions and corresponding traversal features and concurrent behavior. Witsch et al. introduce composite states as groups of states, allowing error behavior to be modeled for those grouped states. Evaluation with experts showed the strength of the composite states for error handling as well as for the mode of operation, the focus of experiment E3.
The experiment validates that using state charts is more efficient than using classical SFC in IEC 61131 for cyclically checking sensor states for inconsistencies and timing errors in a single moving cylinder, i.e. a cylinder component of the pick & place unit (cylinder in Figure 3). The mean number of steps programmed per minute using state charts with composite states was 1.98 points/min compared to 1.41 points/min for classical SFC in IEC 61131-3, given the same number of points to be reached for both solutions. The modeling speed of the SC group was significantly higher than that of the SFC group, even if the SC subjects didn’t use composite states.
The benefit of composite states is evident for error handling (see Figure 4, left): in SC, the error handling for all states can be handled by one exception transition out of a composite state instead of multiple transitions, whereas in SFC error handling activities follow after each activity. If an error in the exception handling algorithm is identified or an additional condition needs to be included, the modifications can be covered in one path in SC, compared to multiple paths in SFC (cf. Figure 4, right).
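The structural difference described above can be illustrated with a small sketch that only counts the error transitions each structure requires. It is not the plcUML or SFC implementation; the step names and the data representation are hypothetical.

```python
from typing import List, Tuple

# (source, trigger, target); a simplified, hypothetical transition representation.
Transition = Tuple[str, str, str]


def flat_error_transitions(steps: List[str], error_state: str = "Error") -> List[Transition]:
    """Flat, SFC-like structure: every step needs its own transition to error handling."""
    return [(step, "error", error_state) for step in steps]


def composite_error_transitions(composite: str, error_state: str = "Error") -> List[Transition]:
    """State chart with a composite state: one exception transition covers all substates."""
    return [(composite, "error", error_state)]


# Hypothetical steps of a single moving cylinder.
steps = ["Extend", "WaitExtended", "Retract", "WaitRetracted"]

flat = flat_error_transitions(steps)
composite = composite_error_transitions("CylinderOperation")

if __name__ == "__main__":
    print(len(flat), "error transitions in the flat structure")
    print(len(composite), "error transition with a composite state")
```

A change to the error condition touches every entry of the flat list but only the single entry of the composite variant, which mirrors the one-path versus multiple-path modification effort.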
Figure 4. Comparison of subjects’ best solution in SC group (left) and SFC group (right).
Subjects using composite states estimate their programming experience higher than those who didn’t use composite states. Many subjects criticized the absence of an automatic placement of elements in the tool, a side effect in the plcUML condition. In this experiment only exception handling was evaluated with a prototypical tool. A more general design is discussed in Experiment E5 using plcUML with a more mature tool version.
4.5. Experiment E4-Sequence of Structure and Behavior Modeling in Workflow Using UML with Elaborated Training Concepts
The research question to be answered is whether subjects can be successfully forced to model structure by asking them to model structure before behavior, or whether behavior first is a good strategy for engineers to achieve proper model quality. Therefore a training concept as well as a subsequent experiment has been developed together with researchers from instruction theory.
In a pre-experiment for E4 the main focus was to reveal whether the order of modeling is important for the quality of the model. The assumption was that students start with behavior modeling because it is easier for them, and then run short of time before finishing the structural model.
The pre-experiment was conducted without tool support, only with paper and pencil, after a training realized by a lecture and exercise in very large classes (bachelor students, 2nd semester mechanical engineering). The subjects were split into two groups: one group was told to start with structure modeling, the other with behavior modeling.
It showed that 35% of the subjects had problems creating suitable classes from similar objects of a plant, including their attributes and methods.
Examples of typical errors in the class diagram were (error rate in %):
· Objects were listed in addition to the classes, which inherit from the classes (23%).
· Classes were used in which objects of the class occur as attributes (7%).
· Single objects were modeled without classes (5%)  .
Overall, no significant differences concerning the modeling order could be found, but significant differences with respect to the trainer, as the two groups were trained by different teachers.
In order to eliminate that confounding effect, one trainer trained both groups in the main experiment. In that study, which has not yet been published, a larger sample (102 subjects) has been tested using the same procedure and task as described for the pre-experiment above.
Here, the average participant reached 19.97 out of 46 points, i.e. lacked 26.03 points (SD = 9.1819). Regarding the performance measures, the “behavior first” group scored remarkably higher than the “structure first” group: while the participants in the “behavior first” group achieved 23.4 points on average (SD = 10.326; SE = 1.825), the mean value of the “structure first” group was only 18.4 out of 46 points (SD = 8.220; SE = 0.982).
In contrast, the structural modeling performance of the two groups was comparable (T = −0.972, df = 100, p = 0.33): participants in the “structure first” group achieved 12.26 points on average (SD = 5.029; SE = 0.601), while in the “behavior first” group they reached 11.14 points (SD = 6.067; SE = 1.072).
In the “structure first” group, the subjects reached only a mean of 6.14 out of 24 points in behavior modeling (SD = 5.083; SE = 0.607); in the “behavior first” group, however, the average behavior modeling performance was 12.25 points (SD = 6.112; SE = 1.080). This difference is highly significant (T = −5.278, df = 100, p < 0.001).
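The reported test statistic for behavior modeling can be checked approximately from the summary values. This is a minimal sketch under two assumptions that the text does not state: the group sizes are inferred from SE = SD / sqrt(n), and a pooled-variance two-sample t-test is assumed because the reported df = 100 equals 102 − 2.

```python
import math
from typing import Tuple


def group_size_from_se(sd: float, se: float) -> int:
    """Infer n from SE = SD / sqrt(n); an assumption, n is not stated in the text."""
    return round((sd / se) ** 2)


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_difference = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_difference, df


# Behavior modeling scores as reported: mean, SD, SE per group.
n_structure_first = group_size_from_se(5.083, 0.607)
n_behavior_first = group_size_from_se(6.112, 1.080)

t_value, df = pooled_t(6.14, 5.083, n_structure_first,
                       12.25, 6.112, n_behavior_first)

if __name__ == "__main__":
    print(n_structure_first, n_behavior_first, df)
    # Expected to come out close to the reported T; small deviations stem
    # from the rounding of the published means and standard deviations.
    print(round(t_value, 3))
```

The inferred group sizes add up to the 102 subjects of the main experiment, which supports the assumptions made for this check.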
As a result for the next experiments we learned that the class room training was not suitable enough and that forcing students to follow a specific modeling order is not helpful to improve structural models.
4.6. Experiment E5-plcUML vs. IEC 61131-3 FBD with Apprentices Optimizing Training, Design and Analysis of Results-Exploratory Study
This experiment was intended to demonstrate the superiority of UML compared to FBD in a design task, using a sophisticated training with repetitive application of the notation, the β-version of a UML tool (called plcUML), a complex open loop control task and apprentices as subjects. To allow further analysis of the relation between modeling results and subjects’ abilities and, in a next step, the development of an individual training fitted to individual abilities, selected abilities as well as user acceptance were collected.
As control task, a sub-part of the pick & place unit with multiple reuse (only open loop control, weak real time requirements without communication requirements) had to be modeled, i.e. three storage elements with one storage cylinder pushing the work pieces out of the storage, and five different terminals with a terminal cylinder each, pushing the work pieces into the terminal. Because in industry skilled workers very often conduct maintenance tasks and even easy design modifications, 1st and 2nd year apprentices from a vocational school in Munich (89 subjects) acted as subjects. Selected results of this experiment have already been reported in  .
A hybrid learning environment (HLE), allowing switching between computer-based and conventional instructional designs, was developed and implemented. During training the groups repeatedly exercised programming and modeling tasks with increasing complexity (named fade out).
Several affecting variables related to abilities were obtained, i.e. grades in mathematics, German, automation, and mechatronics as well as cognitive capabilities, motivation levels, challenge, and workload (the single instruments are described in  ). As performance variable, the programming/modeling achievement was evaluated. To obtain this value, the developed models/programs were stored (every 5 min) and analyzed manually by two evaluators, who compared them to a master model. The subjects’ performance was measured as the number of correctly modeled or programmed elements and compared with respect to structure, e.g. classes or FBDs, on the one hand and behavior, i.e. state charts and FBDs, on the other (for details see Appendix A). Unfortunately, the results were disappointing, because an overall significant benefit of plcUML compared to FBD could not be detected. Nevertheless, interesting results could be found, e.g.
· OO modeling and FBD programming show different relations to variables like cognitive abilities, experience, workload, and knowledge: the students’ performance in the plcUML/CD + SC groups seems to be less related to previous knowledge and cognitive abilities than students’ performance in the 61131/FBD groups  .
· Subjects needed different times for structural modeling using UML/CD vs. FBD (see master model Figure 5). On average, subjects needed 6.22 minutes more for UML class creation than for creating the FB structure. This difference narrowly misses significance (ANOVA, F(1, 81) = 3.60, p = 0.06), cf.  .
On the basis of the unexpected results, further analyses of the models, the modeling process and the relation between model and results, as well as of the subjective results, were conducted. The analysis of the main errors, especially the errors in the structural model, i.e. the classes built, showed:
· 42 subjects out of 44 built classes (including superfluous ones) as part of the structural model;
· 23 out of 42 used these classes in their behavioral model;
· 31 out of 42 modeled a second cylinder class separating storage cylinder and terminal cylinder, but 14 of those 31 subjects built the second class identically apart from the name. This indicates that they understood the class concept but used another type of abstraction, one more related to the mechanical structure, in which a terminal cylinder and a storage cylinder are different, instead of the software view, in which both cylinders are identical.
The analysis of tool and training effects, gathered from the subjective ratings in the questionnaire (Figure 6), showed that 27 subjects of the plcUML group asked for additional training. From the observation of subjects and the analysis of the time needed, the authors expected the abstraction needed to build classes and the relationship between CD and SC to be the main challenge, because in the UML groups long thinking breaks occurred before modeling classes. In the questionnaire, only 5 subjects mentioned that the development of classes is difficult, which is surprising given the above-mentioned errors in building classes and the thinking breaks.
Regarding tool aspects (item 1.2 positive and item 2.5 negative in Figure 6) the plcUML tool seems to have some more problems.
Further subjective results gained from the questionnaires were: 1) Frustration levels were significantly higher in the UML group compared to the FBD group (p = 0.02); 2) The clearness of FBD was rated significantly higher than that of UML (p = 0.017); 3) Behavior programming was rated significantly easier with FBD than with UML (p = 0.012); and 4) subjective quality estimation and factual quality match far better with UML than with FBD (p = 0.025).
Figure 5. plcUML class diagram master model integrating storage and terminal cylinder and its IEC representation.
Figure 6. Subjective statements after the experiment.
Because of the observed thinking breaks, we analyzed the modeling progress over time (points over time) in a random sample of only three subjects per group (with similar model quality). We found differences between the plcUML and the IEC group: in the plcUML group, a longer period of time passes until points referring to the master model are gathered, followed by a clear ramp in points, compared to the steadier increase of points in the FBD group (Figure 7).
For further experiments a detailed analysis of the modeling progress is needed. Therefore, the cycle time of storing data needs to be reduced and a more efficient approach to analyzing subjects’ modeling process over time needs to be developed. Details of the analysis and rating of subjects’ models compared to the master model are given in Appendix A.
Subjects debugged at different times, some at the beginning and others at the end of the experiment with a nearly complete model. The analysis of debugging is necessary to find errors and will be a focus of future work.
The design of the experiment, including training and data analysis, was appropriate and delivered detailed relations between abilities and model quality, but it still revealed shortcomings of plcUML as a notation for apprentices in design tasks. Our assumption is that the abstraction necessary to build classes is too demanding for this group of subjects. These results fit the notational complexity of class diagrams reported by Schalles. Therefore, technicians and engineers, as subjects with a higher level of knowledge and experience in PLC programming, will be included in further experiments. Additionally, different levels of task complexity will be tested.
4.7. Experiment E6-Maintenance Task in Early Phases of Notation Development with SysML-AT vs. Continuous Function Chart (CFC)
The research question of experiment E6 is how to evaluate three notations qualitatively, with a very short period of time for training and experiment, in an early phase of the development of a notation. E6 evaluates different modeling and programming notations (see also  ), i.e. the Parametric Diagram (PD) of SysML-AT  vs. Continuous Function Chart (CFC) and IEC 61131-3 Structured Text (ST), regarding a maintenance task, i.e. the understandability (analysis and interpretation according to  ) of model contents. The experiment was based on three different simple models of physical laws (about 4 - 5 sub-blocks and 7 - 8 variables), with each model described in every considered notation. Because the evaluation should take place in an early design phase and the time needed should be very short, tool support was not applicable. Bachelor students of mechanical engineering worked without a tool after a very short training, working through all three notations and all three models. The sequence of the notations was permuted for each subject (see Figure 8) to eliminate learning effects.
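Permuting the notation sequence per subject is a form of counterbalancing. The following minimal sketch shows one way such an assignment could be generated; the concrete scheme used in E6 is the one shown in Figure 8 and is not reproduced here. The cyclic assignment over all six orderings, the subject identifiers and the number of twelve subjects are assumptions for illustration.

```python
from itertools import permutations

NOTATIONS = ("PD", "CFC", "ST")


def counterbalanced_orders(subject_ids, notations=NOTATIONS):
    """Assign each subject one ordering, cycling through all permutations."""
    orders = list(permutations(notations))  # 3! = 6 possible sequences
    return {sid: orders[i % len(orders)] for i, sid in enumerate(subject_ids)}


# Hypothetical subject identifiers; with 12 subjects every sequence occurs twice.
subjects = [f"S{i:02d}" for i in range(1, 13)]
assignment = counterbalanced_orders(subjects)

for sid, order in assignment.items():
    print(sid, " -> ".join(order))
```

When the number of subjects is a multiple of six, every notation appears equally often in the first, second and third position, so learning effects are spread evenly over the notations.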
As a software maintenance scenario, the subjects had to correctly interpret the models’ contents, consisting of components (sub-blocks and variables) and data flows, in order to answer questions regarding the model contents. The mean share of correctly answered questions was highest for the PD (68.25%), with a positive offset of 3.97 percentage points to ST (64.28%) and 4.76 percentage points to CFC (63.49%) (see Table 1, E6). The experiment shows that even a short training with a short time for the experiment and a small number of subjects delivers qualitative results. In accordance with the results, in questionnaires that tested the subjective cognitive demand, the subjects rated the PD as the most understandable notation. Furthermore, all of the subjects answered that they experienced a learning effect regardless of the different notations they used.
Figure 7. Modeling progress over time for 3 subjects of each group.
Figure 8. Experimental design of E6-maintenance task.
The results of a focus group that was conducted for additionally evaluating the SysML-AT  also indicated that the developed modeling approach is well suited for automation software modeling.
4.8. Experiment E7-Conceptual Engineering of Structural Aspects of Distributed Networked Automation Systems (NAS)
The research question in this experiment is whether additional support in structural modeling of NAS, realized with characteristics and patterns, is beneficial in the conceptual design or whether the resulting complexity outweighs the benefit. In addition, the instruments regarding user acceptance are more elaborate and should give more detailed answers in relation to the quality of the models.
As the results of E5 and E6 show, plcUML and SysML-AT have a positive influence on the programming of a PLC. Following Sierla  and the difficulties identified for the engineering of distributed systems, E7 evaluates a SysML-AT based notation and workflow vs. CFC for a high-level design of NAS in MS (see also   ). The evaluated approach focuses on the overall design of NAS, integrating the notation SysML-AT as the successor of plcUML. The SysML-AT based concept contains a workflow procedure referring to a life cycle model (following the requirements from E1) including communication and real time requirements for a hybrid control task (experiment E7 a). Additionally, characteristics (E7 b) and characteristics plus patterns (E7 c) are compared to the pure notation and workflow procedure. Conditions b and c yield only qualitative measures because of the small group size.
The approach covers the modeling of automation hardware and software as well as of functional and non-functional requirements. From the described requirements, the functions that need to be implemented can be derived and captured within the same model. Hardware elements like sensors, actuators and nodes and their interfaces and properties are considered within the modeling approach as well. This enables the integration and linking of hardware and software models  . The notation is based on the SysML Block and Requirements diagram using ports to represent software and hardware interfaces (refer also to  ). The duration of the experiment was not restricted, but taken as a measure (mean given in Table 1, E7).
Characteristics as well as patterns supported subjects in solving the task, i.e. the design of the automation concept of a coking plant including belt synchronization without implementation. The main task of the experiment was to conceptually design a closed loop for speed synchronization which included three belts. This comprised all necessary functions, interfaces and relations to the sensors and actuators. The internal behavior and control algorithms were not required, i.e. only the structural part of the model needed to be designed. Characteristics detail the requirements as well as the later design solution including element relations. During the design, the comparison of requirement characteristics and solution characteristics helps to decide whether the solution fits the requirements (cp. Figure 9).
Additionally patterns, divided into functional and deployment patterns, help to find a solution. Functional patterns include proposals and support the engineer in the development of the functional model. Deployment patterns indicate distribution alternatives of functions and support the engineer in the development of the deployment model  .
The models that subjects stored after finishing the given task were analyzed in comparison to a master model. The results show major differences between the subjects’ solutions and the master model, i.e. experts’ best practice regarding module structure. Similar to Sierla  , different possible solutions were detected, i.e. most subjects chose a function-oriented modeling approach instead of a mechatronic approach taking modularity, reuse and architectural aspects of NAS into account. The experiment intended that students follow a mechatronic approach; therefore, the master model was built realizing the mechatronic paradigm. In further experiments either the mechatronic approach needs to be integrated into the training or subjects’ mental models need to be collected beforehand.
Nevertheless, the experiments show a significant benefit of SysML-AT compared to CFC (see Table 1, E7 a to c, column results). Regarding the notation with life cycle model (NM), subjects produced significantly better models than with CFC (a mean of 123.1 out of a maximum of 182 points, with the best subject gaining 144 points; see Appendix C). Using characteristics additionally (NMC), subjects improved their models again. For those experiments, however, only qualitative results are available due to the limited number of five subjects. Regarding user acceptance measures, subjects stated less mental demand when using patterns (see Table 3), i.e. NMCP has the lowest mental demand with 10.75 of a maximum of 20 points (the higher the value, the higher the mental demand). The motivational factor “fear of failure” was most pronounced in the group with patterns and characteristics (NMCP). Furthermore, this group showed high external control beliefs, meaning that subjects strongly related the outcome of the results to external circumstances, and high fatalistic externality, meaning that success is assessed as depending on fate, fortune, and chance; nevertheless, subjects perceived low mental demand during task performance. In addition, regarding usability aspects, suitability for the task was rated best by the group with characteristics (NMC). Suitability for individualization was rated significantly lower for patterns than for both other conditions.
Based on UML-PA and E2 as well as the experiences gained and rules derived from E5 (including task development, training and tool development) and on experimental design in general (see 6), the experimental design of E7 was developed appropriately, also evaluating the derived rules (see 6). E7 evaluated the benefit of NM for NAS and hybrid control with real time and communication requirements. The relations to abilities identified in E5 could even be further developed with a more advanced questionnaire. The results reveal more relations to human factors, e.g. mental workload, and usability measures. For further engineering support, the challenge is to find a compromise between the support provided by characteristics and patterns and the complexity of the approach.
Figure 9. Characteristics meta-model  .
Table 3. Results of human factors and usability measurement in E7.
4.9. Summary of the Experiments
All experiments focus on the design phase except E6, on a centralized single PLC as control hardware except E7, and on students as subjects except E1 and E5.
· E1 was the first experiment exploring the method of usability evaluation in logic design engineering with a single closed loop controller; it compared pure UML 1.4 and PLC in a first attempt without the support of an engineering tool and with a large unstructured task.
· E2 focused on a hybrid automation task including communication, with the aim of supporting deployment by simple UML-PA patterns compared to classical UML 2.0, with restricted tool support and a narrow engineering task.
· E3 focused on error handling, comparing plcUML State Chart to Sequential Function Chart (SFC) in IEC using a very simple automation sub-task and a short classical training.
· E4 focused on SC vs. IEC 61131-3 SFC and on the sequence of structure and behavior modeling in the workflow, using UML 2.0 with a didactically more elaborate but classically conducted training concept in smaller sub-groups, with the goal of increasing the quality of the structure model.
· E5 was, similar to E1, an exploratory experiment further developing the method of usability engineering experiments. It used a real software engineering tool with embedded UML, the so-called plcUML, compared to IEC 61131-3 FBD, with apprentices as subjects, optimized repetitive training and exercise, an elaborate training environment and a smaller automation task with reusable sub-process, and it also included human factors and prior knowledge.
· E6 focused on a maintenance task in the early phases of notation development with SysML-AT vs. Continuous Function Chart (CFC) to show the benefit of easy and quick sub-experiments in the development process of the notation.
· E7 addressed the conceptual engineering of structural aspects of distributed networked automation systems (NAS), including a procedure for life cycle support and characteristics for pattern selection and reuse, with a detailed analysis of user acceptance including motivation.
In every single description of an experiment, the research questions as well as the most important aspects of the experimental design, results and lessons learned regarding usability aspects are discussed, together with results for the further development of MDE, i.e. notation, procedure and tool. Most experiments are based on prior experiments, and a notational development resulting from a prior experiment is tested in one of the following experiments.
5. Selected Results for Future Usability Experiments
The following section summarizes the best practice rules gathered to the best of our knowledge. First, the criteria for the selection and configuration of the affecting variables (see Figure 1) are discussed, e.g. the task, the training and the selection of a group of subjects. Afterwards, the criteria for selecting the affected variables are discussed (see Figure 2).
5.1. Configuration of Affecting Variables
5.1.1. Task Development
As affecting variables (see Figure 1), the type of the engineering task (maintenance or design) and the automation task complexity and characteristics are key issues in relation to the complexity of the new notation or approach to be evaluated and to the time available for training and the experiment itself.
1) Automation Task
To classify or rank the automation task complexity compared to other experiments and to estimate the time needed for training as well as for the task itself in the experiment, the authors introduced some measures, i.e. the number of I/O, the number and type of control loops and, depending on the notation used, the WMC and number of states for OO design and the number of FBDs and variables for classical PLC programming using IEC 61131-3. In addition, the task characteristics, i.e. real time and communication requirements and the task type, as well as the inclusion of exception handling (E3) and mode of operation are relevant. In the experiments introduced above, the WMC ranges from 3 in E2, a strongly restricted experiment using patterns, to 43 in E5 and 45 in E1 in a more industry-related scenario.
It is obvious that a complete engineering task consists of a lot of decision points with different ways to a correct solution. These variation possibilities need to be covered by an evaluation scheme.
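As described earlier in the paper, the WMC of a program is obtained by summing the cyclomatic complexity of each method over all classes. The following minimal sketch illustrates this summation; the class names, method names and complexity values are hypothetical and are not taken from any of the experiments.

```python
def wmc_per_class(program):
    """Weighted methods per class: sum of the cyclomatic complexities of its methods."""
    return {cls: sum(methods.values()) for cls, methods in program.items()}


def program_wmc(program):
    """WMC of the whole program: the per-class values summed over all classes."""
    return sum(wmc_per_class(program).values())


# Hypothetical example: cyclomatic complexity per method, grouped by class.
example_program = {
    "Cylinder": {"extend": 2, "retract": 2, "reset": 1},
    "Storage": {"separate": 3, "is_empty": 1},
}

print(wmc_per_class(example_program))
print(program_wmc(example_program))
```

The cyclomatic complexity values themselves would have to come from a separate analysis of each method; this sketch only shows the aggregation step.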
2) Engineering Task
Starting with HTA or GOMS, the required steps to fulfill the task are found. The quality of the HTA depends on the skills and experience (also industrial) of the experts conducting the HTA. Interviews with industrial experts are helpful to find appropriate subtasks as well as typical module libraries available to be provided in the experimental setup.
Modeling mostly consists of structural and behavioral aspects. In most of the experiments described above, structure and behavior were an issue (Table 1, column engineering task). All experiments except E6 dealt with design and model creation (E2 model configuration). E6 highlights maintenance tasks and showed that, compared to design tasks, tool support may be neglected for easy tasks and training may be very short.
For both modeling and training, the designer has to decide whether to provide a life cycle model or even a method and a tool. For more complex engineering tasks a tool is a prerequisite to gain subjects’ acceptance (not reached in E1, E4 and E6) and motivation. On the other hand, a prototypical tool (E3) leads to results that may be induced by the tool and not by the notation to be evaluated. Sophisticated tools need additional time for training. The prototype plcUML or SysML-AT, therefore, needs to be carefully tested by novices and persons belonging to the qualification group of future users before conducting the experiment, to ensure that as many defects of the tool as possible are detected prior to the experiment. Otherwise frustration will rise and may act as a disturbing factor in the experiment (E7 c).
5.1.2. Development of Training
As discussed above, an appropriate training is a prerequisite for meaningful results, but hard to achieve in the first experiments (E5, not E7 c). A hybrid learning environment is advantageous to reduce disturbances by individual trainers as in the pre-experiment of E4. Furthermore, process simulation offers high benefits for testing and debugging the software. For more complex notations and procedures, e.g. OO and UML, repetitive training with fade out is beneficial (E5). A training period of 1.5 days for OO with apprentices as subjects and 0.5 days for E7 a) with students as subjects was appropriate. With a very simple task or a strictly focused hypothesis and a restrictive tool, a significantly shorter duration can be reached (E2 and E6).
5.1.3. Selection of Subjects
Except for E1, we decided on individual subjects to allow the identification of reasons and dependencies related to individual abilities. This precludes examining the benefits of group work as found in Sierla  . In the field of MS, engineering students are a typical group of subjects for design tasks, and technicians and apprentices for maintenance tasks and simple design modifications at the customer site. At least 15 subjects per cell are necessary to gain quantitative results. Different skills and abilities, e.g. mathematics, are often related to results and act as disturbance factors.
Pre-tests are recommended to adjust the distribution of subjects to groups regarding expertise and abilities. Different tests are available (E5) or adaptable (e.g. on general intelligence  or on previous knowledge). Missing or insufficient motivation may also be a disturbing factor, as realized in experiment E7c. Also, mental workload, i.e. the cognitive demand perceived during modeling tasks, is a critical factor for the probability of errors and, therefore, should be at an intermediate level (E5 and E7). When analyzing specific aspects of a notation in more detail after the main experiment, group sizes from 6 to 8 are regularly implemented to get qualitative results. In E6 the sequence of the notations was permuted for each subject to eliminate learning effects instead of using one notation for one group, which in case of E6 would have multiplied the necessary number of subjects by three.
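One simple way to use pre-test results for distributing subjects to groups is a rank-based alternating ("snake") assignment, sketched below. This is only one possible technique and is not claimed to be the procedure used in the experiments; the subject identifiers and scores are hypothetical.

```python
def balanced_groups(pretest_scores, n_groups=2):
    """Distribute subjects to groups so that pre-test levels are roughly balanced.

    Subjects are ranked by score and dealt out in alternating direction
    (A, B, B, A, A, B, ...), which avoids one group always receiving the
    stronger subject of each pair.
    """
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for idx, subject in enumerate(ranked):
        block, pos = divmod(idx, n_groups)
        target = pos if block % 2 == 0 else n_groups - 1 - pos
        groups[target].append(subject)
    return groups


# Hypothetical pre-test scores (e.g. previous knowledge in PLC programming).
scores = {"S1": 90, "S2": 85, "S3": 80, "S4": 70,
          "S5": 65, "S6": 60, "S7": 50, "S8": 40}

group_a, group_b = balanced_groups(scores)
print(group_a, sum(scores[s] for s in group_a) / len(group_a))
print(group_b, sum(scores[s] for s in group_b) / len(group_b))
```

With several pre-test variables (e.g. grades and cognitive abilities), a combined score or a stratified random assignment would be needed instead; the sketch covers the single-variable case only.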
5.2. Measuring Affected Variables/Usability Requirements
To analyze and evaluate the results gained, master models developed by the designer of the experiment together with other experts are recommended.
5.2.1. Data Collection-Organizational and Technical Challenges
For the data analysis, observation and recording of subjects’ results are most important. The easiest way to observe subjects is to take a video, but the manual analysis of the video is time consuming. In engineering tasks using an engineering tool, the most often implemented strategy is to store the model cyclically at a selected interval (every 2 or 5 minutes; 5 min in E5 and E7) or whenever a new input is typed into the model (E2). The cyclical storing strategy has the disadvantage of losing information in between storing intervals, similar to the sampling of an analogue value. Storing the model with every input of the subject has the disadvantage of large amounts of data, which need to be analyzed later. The required storing strategy may not be integrated in real tools, e.g. when using a β-version of an industrial tool (as in E5). The challenge is to implement storing strategies in the prototype or to get access to a market leading tool in case it should be used for evaluation. The CoDeSys implementation was easy to realize for the authors’ team due to the developer’s knowledge gathered with the plcUML-Plugin. In addition to model analysis, human observers are advantageous, especially in case of pre-experiments, to include additional information gathered by observation. Unfortunately, this is expensive, because the observers need to be trained, the observation needs to be documented in a standardized form, and approximately 1 observer is required for 2 - 4 subjects. In E5 long thinking breaks in the OO groups before building classes were found and included in the further analysis. The analysis of the results gained, i.e. model consolidation over time, seems to be useful, but depends on the availability of data and the ease of analysis.
In psychology, thinking aloud is an often implemented method, which is often neither accepted by nor applicable to engineering students (E1). Another issue is to gain information on why subjects make mistakes or choose a specific solution. To a certain degree this information may be gained by individual interviews or online questionnaires directly after the experiment (E4). In E4 subjects were asked to analyze their solutions compared with the master solution and give reasons for their mistakes, e.g. lack of time, translation problems, distraction etc. The method is promising, but hard to realize with large groups of subjects because of possible interviewer effects with regard to the questions asked.
Usability evaluation concerning the affected variables, i.e. effectiveness, efficiency and user acceptance, was realized with different methods. To assess effectiveness, completeness and correctness are measured by counting the number of correct steps compared to the master model, e.g. the number of steps in the behavior model (e.g. a state chart), the number of variables in the FBD structure model, and the number of classes and objects in a class diagram (for the evaluation scheme of E5 see Appendix A).
The difficulty in rating the results of an experiment is comparable to grading exams by distributing points for correct solutions, but more sophisticated. Points are given by two evaluators independently with a necessary interrater reliability of at least 65% (E5, E7 see Appendix).
To evaluate efficiency, time stamps need to be included in the stored data and analyzed or, as mentioned in 1), the cyclically stored data are used to analyze efficiency over time. In most experiments, efficiency is the effectiveness reached in the period of time subjects were given for the experiment. In most experiments time was limited for organizational reasons, except in E7, where time was taken as a variable: when subjects felt ready, they submitted their solution and the time needed was stored.
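The trade-off between cyclic and input-triggered storing can be illustrated with a small simulation. The sketch below assumes a hypothetical log of timestamped edits (in seconds) and a storing period of 5 minutes; the event labels and the function name are illustrative and do not correspond to the logging of any real tool.

```python
from bisect import bisect_right


def cyclic_snapshots(events, period_s, duration_s):
    """Return (tick, last_edit) for every storing tick of a cyclic strategy.

    `events` is a time-sorted list of (timestamp_s, edit_label). Each snapshot
    only reflects the most recent edit at or before the tick; everything that
    happened in between two ticks is not stored individually.
    """
    times = [t for t, _ in events]
    snapshots = []
    for tick in range(period_s, duration_s + 1, period_s):
        i = bisect_right(times, tick)
        snapshots.append((tick, events[i - 1][1] if i else None))
    return snapshots


# Hypothetical edit log of one subject during a 15 minute session.
events = [
    (40, "add class Cylinder"),
    (130, "add attribute"),
    (250, "delete attribute"),
    (420, "add state chart"),
    (880, "rename class"),
]

snaps = cyclic_snapshots(events, period_s=300, duration_s=900)
captured = {edit for _, edit in snaps if edit is not None}
lost = [edit for _, edit in events if edit not in captured]

print("input-triggered storing keeps", len(events), "versions")
print("cyclic storing keeps", len(snaps), "versions; intermediate edits lost:", lost)
```

In this example the attribute that was added and deleted again within the first interval never appears as a separately stored version, which is the kind of information loss described above.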
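The text does not state which reliability coefficient underlies the 65% threshold. As a minimal sketch, the following assumes simple percent agreement between the two evaluators over the rated elements; other coefficients (e.g. chance-corrected ones) would give different values. The ratings shown are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share (in %) of rated elements on which both evaluators gave the same points."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both evaluators must rate the same, non-empty set of elements")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical points (1 = element correct, 0 = incorrect) for ten model elements.
evaluator_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
evaluator_2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

agreement = percent_agreement(evaluator_1, evaluator_2)
print(f"agreement: {agreement:.1f}% (threshold: 65%)")
```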
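The relation between effectiveness and efficiency can be sketched as follows, assuming that each stored model version carries a time stamp and that points are obtained by matching model elements against the master model. The set-based matching is a strong simplification of the evaluation schemes in the appendices, and all element names and time stamps are hypothetical.

```python
def points(model_elements, master_elements):
    """Effectiveness: number of elements that also occur in the master model."""
    return len(set(model_elements) & set(master_elements))


def progress_over_time(snapshots, master_elements):
    """Points per stored version, i.e. modeling progress over time."""
    return [(minute, points(elements, master_elements)) for minute, elements in snapshots]


def efficiency(snapshots, master_elements):
    """Points of the final version divided by the time needed (points per minute)."""
    minute, elements = snapshots[-1]
    return points(elements, master_elements) / minute


# Hypothetical master model and stored versions (minute, elements in the model).
master = {"Cylinder", "Storage", "Terminal", "extend", "retract"}
snapshots = [
    (5, {"Cylinder"}),
    (10, {"Cylinder", "Valve", "extend"}),
    (20, {"Cylinder", "Storage", "extend", "retract"}),
]

print(progress_over_time(snapshots, master))
print(efficiency(snapshots, master))
```

With a fixed experiment duration the denominator is the same for all subjects, so efficiency reduces to effectiveness, which matches the statement above for most experiments.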
5.2.4. User Acceptance
For the evaluation of user acceptance, questionnaires based inter alia on EN ISO 9241 and on recognized tests such as RSME  and NASA-TLX  were implemented in all of the experiments described above and further developed from one experiment to the next to analyze subjective values regarding modeling as such, the notation evaluated and/or the tool used, e.g. E1 and E5. Furthermore, an extended evaluation of attributes for usability requirements examined by the EN ISO 9241-110 questionnaire (E7) was additionally used to collect users’ assessment of the applicability of patterns and characteristics. Results revealed suitability for the task and for individualization as appropriate indicators of difference. Questionnaires regarding the notation and tool may also reveal weaknesses of training and notation (E5 class concept).
6. Selected Results for the Development of Future Notations for Model Based Software Engineering
In MS, hybrid control tasks as well as real time and communication requirements of different complexity need to be engineered during design and maintained during operation, covering structure and behavior in MS models.
From the results of E1, we realized that pure UML 1.4 with its five diagrams used in E1 is confusing and not appropriate, especially for structure models. Additionally, embedded tool support in PLC development environments and a procedure were requested by subjects. Forcing students to follow a specific modeling order, e.g. behavior or structure first (E4), does not help to improve structural models. The introduction of plcUML, embedding class diagrams and state charts into an IEC 61131-3 tool and enlarged with composite states for error handling, showed a benefit, but tool aspects such as placement were criticized (E3). In E5 a more general, but simple logic design task with reuse revealed weaknesses of plcUML in design tasks for apprentices using a β-version of the tool. The challenge for apprentices was the necessary abstraction when building classes. Weaknesses in training and tool were criticized (Figure 6). The tool has been further developed and was integrated in CoDeSys by industry in June 2013; it is now used in different industrial companies and in research. Experiments focusing on maintenance tasks, evaluated in E6 with students of mechanical engineering, indicated that the SysML-AT PD has advantages compared to CFC and ST (qualitatively).
All these evaluations concentrated on the automation software of one centralized PLC. Regarding deployment and NAS, two experiments were conducted, i.e. E2 and E7, including communication and real time requirements. In E2 a domain specific UML, the UML-PA with a reduced number of diagrams, was beneficial in the deployment of software to hardware devices like PLCs, using patterns with a very simple conceptual control task. The restricted tool was criticized, but the reduced number of diagrams was advantageous compared to UML 2.0. plcUML consists of the Class Diagram for modeling software structure as well as the Activity Diagram and State Chart for modeling discrete software behavior, using Activity Diagrams in the early phases of the software lifecycle for specification issues and the State Chart for detailed modeling of behavior. A further development of plcUML, namely SysML-AT, added the SysML Parametric Diagram, which models constraints as mathematical equations to describe physical laws, to the diagrams of plcUML. Although advantages of both notations were noticed, the results from E6 (focus group) indicate that an MDE approach for MS has to consider and support requirements analysis and architectural design as well as a supporting method. Especially for NAS the architectural design is even more important. Such a method was developed and positively evaluated in experiment E7 to be most appropriate for all typical requirements of automation in MS. Current work develops an approach that contains the developed methodology for NAS and requirements modeling, followed by software modeling and generation based on plcUML and SysML-AT.
7. Conclusion and Outlook
MDE approaches should increase efficiency and quality in the design and maintenance of software for MS. The article showed results of usability experiments using pure UML, domain specific UML versions, i.e. UML-PA and UML E, as well as domain specific SysML-AT for maintenance purposes and NAS. Summarizing the most important technical issues, pure UML 1.4 or 2.0 is not appropriate, whereas plcUML with a reduced number of diagrams and a supporting modeling process, integrated in an IEC 61131-3 environment to support roundtrip engineering, is. For error handling, plcUML SC with composite states are beneficial compared to IEC 61131-3 FBD. Structural modeling using pure UML or even plcUML is still a challenge for many subjects, as is the creation of classes in the sense of abstraction used in computer science. Abstraction in automation and mechatronics differs from that in computer science, i.e. it is more closely related to physics, also in distributed system applications. Complexity of notation (class diagram and E7) relates to difficulties in applying the notation in an experiment with time restrictions (2 days). For NAS, the applicability of the notation was evaluated positively and quantitatively; for characteristics and patterns, further experiments and longer training time are needed. Ongoing research is analyzing human mistakes in detail, trying to find their reasons by interviewing subjects after the experiment.
Regarding real industrial software engineering tasks in MS, all these experiments lack experienced subjects, i.e. application engineers, and the start-up phase with debugging. Real applications and some application engineers are included in  . The classical debugging phase to find faults has not been explicitly analyzed up to now, even though Myers  provides an interesting approach to classify runtime faults and the underlying software errors. Debugging in E5 was limited to simulation and restricted by the given time. At the moment, we are conducting interviews after another experiment focusing on reuse of modules with apprentices to analyze faults categorized according to Myers’ classification.
Regarding usability aspects, the presented experiments confirmed that the relevant affecting and affected variables (Figure 1 and Figure 2) have to be taken into account when designing an experiment.
To increase efficiency and quality of software in the development process of an industrial company in machine and plant manufacturing, model based approaches using notations such as UML and SysML are applicable and were shown to be beneficial, in part quantitatively. The prerequisite for a real benefit is the availability of integrated tool support in the IEC 61131-3 environment, especially for maintenance, to guarantee consistency of model and implemented code. Nevertheless, it will not be easy to introduce and implement MDE using UML and SysML in an industrial company. Training and rules for application are necessary, as well as a workflow to integrate existing legacy software developed over years. To integrate legacy software, the existing software first needs to be analyzed, and modularity concepts need to be developed as a prerequisite for MDE. Variability analysis from software engineering should be implemented to maintain and evolve models and code synchronously.
Further research is also needed regarding the integration of more advanced controllers into the usability evaluation, e.g. modeled in Matlab/Simulink.
The author gratefully acknowledges the support of the German Research Foundation (DFG) for the projects DisPA (Vo 937/2-1), KREAagentuse (VO 937/8-1) and FAVA (VO 937/13-1) and the support and fruitful discussions with Christoph Legat, Daniel Schütz, Kerstin Duschl, and Martin Obermeier.
Appendix A. Example of Subject’s UML Model (Evaluation Scheme E1)
One example of a modular UML model (Figure A1) is given. The results show the subject’s difficulty in identifying reusable parts of the plant. He decided to use the state chart (Figure A1, left) and the class diagram (Figure A1, right). He tried to build a modular state chart and formed a class “single out and transport to stamp”. This shows an inadequate understanding of classes and state charts and their application. Unfortunately, none of the subjects realized a correct class model.
Appendix B. (Evaluation Scheme E5 and WMC Calculation)
B.1. Evaluation Scheme E5
In the following, the measurement from E5 for plcUML and FBD model quality is shown. First, the evaluated model elements are discussed for structure and behavior modeling. Then typical subjects’ solutions for both notations are shown and the points given are depicted.
The model quality regarding the grade of task completion (correct model elements) for both notations was evaluated for structure and behavior through manual code/model inspection. As a measure for structure in plcUML models, the number of correct attributes with a correct data type and the correct access modifier in the class diagram was counted (cf. Figure B1). As methods were not required to solve the given task, they were not included in the measurement.
Additionally, the created object instances in the main program were counted for the structure model, i.e. each correct instantiation of a cylinder object was counted (cf. Figure B2).
Figure A1. Modular UML behavior model of one subject (left: hand written model; right: translated model).
Figure B1. Structure model quality measurement plcUML: Class diagram.
Figure B2. Structure model quality measurement plcUML: Object instantiation.
This results in a maximum of 20 points available for the structure model in plcUML. The plcUML behavior model quality was measured by identifying correct method calls, sequences of variable comparisons and states (cf. Figure B3). If the subsequent state after a logically correct variable comparison included a logically correct method call an additional point was given.
In Figure B4(a) and Figure B4(b), a complete example measurement for one student’s model in UML is shown. Missing points are depicted as Xs. The quality of the structure model (Figure B4(a)) is 13/20 or 65%, and the quality of the behavior model is 24/67 or 35.82% (Figure B4(b)). The overall model quality is 37/87 or 42.53%.
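The arithmetic behind these percentages can be reproduced with a short sketch. The function name `quality` and the tuple layout are illustrative assumptions, not part of the evaluation scheme; only the point values are taken from the text. Note that the overall quality pools the achieved and maximum points of both models instead of averaging the two percentages.

```python
def quality(points: int, max_points: int) -> float:
    """Return achieved points as a percentage of the maximum, rounded to 2 decimals."""
    return round(100 * points / max_points, 2)


# (achieved points, maximum points) as reported for the example in Figure B4
structure = (13, 20)
behavior = (24, 67)

# Overall quality pools the points of both models: 13 + 24 out of 20 + 67
overall = (structure[0] + behavior[0], structure[1] + behavior[1])

print(quality(*structure))  # 65.0
print(quality(*behavior))   # 35.82
print(quality(*overall))    # 42.53
```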
Figure B3. Behavior model quality measurement plcUML.
Figure B4. (a) plcUML structure model quality: 13/20 points; (b) plcUML behavior model quality: 24/67 points. Overall model quality: 13 + 24 = 37 points; relative overall modeling performance: 37/87 = 42.53%.
Similar to the plcUML model quality measurement, the FBD program quality was evaluated. For the structure quality, every necessary in- and output for the FBs was counted, cf. Figure B5. This results in a maximum of 32 points for FBD model structure quality.
The FBD behavior model quality was measured by identifying correct FB or FC calls, sequences of variable comparisons and the connection of these elements, cf. Figure B6. If the subsequent call after a logically correct variable comparison included a logically correct FB or FC call an additional point was given.
B.2. WMC Calculation
Figure B5. Structure model quality measurement FBD.
Figure B6. Behavior model quality measurement FBD.
Figure B7. WMC calculation example.
Figure C1. Most subjects’ functional oriented model (left) and master model: mechatronic oriented model (right).
WMC is defined as the sum of Ci, where Ci is the cyclomatic complexity of the i-th method, calculated as the number of conditions in the method plus 1, cf. McCabe 1976  . In Figure B7, an example of the WMC calculation is given. In this case, only the methods auto and manual of the example class are relevant. The auto method contains several conditions and therefore has a cyclomatic complexity corresponding to the number of included conditions plus 1, as defined by McCabe, resulting in Cauto = 10. The manual method does not contain any conditions, resulting in a cyclomatic complexity Cmanual of 1. Finally, these two complexity values sum up to an overall WMC of 11 for the example class.
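The WMC calculation of B.2 can be sketched in a few lines. This is a minimal illustration, not the tooling used in the study: the function names and the dictionary layout are assumptions, and the count of 9 conditions for the auto method is inferred from the stated Cauto = 10 (conditions plus 1), since the text does not state the count itself.

```python
from typing import Mapping


def cyclomatic_complexity(num_conditions: int) -> int:
    """Cyclomatic complexity of one method: number of conditions plus 1."""
    return num_conditions + 1


def wmc(conditions_per_method: Mapping[str, int]) -> int:
    """Weighted methods per class: sum of the cyclomatic complexities of all methods."""
    return sum(cyclomatic_complexity(n) for n in conditions_per_method.values())


# Example class with the two relevant methods.
# Assumption: 9 conditions in "auto", inferred from Cauto = 10; "manual" has none.
example_class = {"auto": 9, "manual": 0}

print(cyclomatic_complexity(example_class["auto"]))    # 10
print(cyclomatic_complexity(example_class["manual"]))  # 1
print(wmc(example_class))                              # 11
```

A method without any condition still contributes 1, so the WMC of a class is never lower than its number of methods.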
Appendix C. Master Model and Subjects’ Solution (Evaluation Scheme E7)
Most subjects in E7 chose a functional oriented model deployment (Figure C1, left), i.e. deploying different functions (speed control, temperature control and sorting) on different PLCs, which is an inappropriate solution from an architectural perspective considering mechatronic modularity and reuse. NAS experts chose a different approach (Figure C1, right) to support reuse of existing modules. A comparison and evaluation of these different models was difficult, as “how to build a module” was not included in the training because the MDE approach should be intuitively applicable.
The best subject gained 144 out of 182 points building a mechatronic oriented model, but neglected requirements and other details.