Modern manufacturing systems are expected to undertake multiple tasks and to be flexible enough for extensive customization, and these trends make production systems increasingly complicated. The advantage of a complex production system is its capability to handle more intensive goods production and to adapt to various parameters under different conditions. The disadvantage, on the other hand, is that control difficulty rises dramatically as complexity increases. Moreover, classical methods struggle to control a complex system, and searching for an appropriate control policy becomes more complicated. The development of machine learning technology offers more possibilities for solving this problem. In this paper, a hybrid machine learning algorithm integrating a genetic algorithm and a reinforcement learning algorithm is proposed to address the accuracy of the control policy and the system optimization issue in the simulation of a complex manufacturing system. The objective of this paper is to reduce the makespan and the tardiness with respect to due dates in the manufacturing system. Three use cases, based on different product recipes, are employed to validate the algorithm, and the results demonstrate the applicability of the hybrid algorithm. In addition, some further results help in finding solutions for complex system optimization and for transforming the structure of manufacturing systems.
Over the past few years, industrial manufacturing has faced extensive changes. With the shift from a local economy towards a globalized and fully competitive economy, markets require highly qualified and customized products at lower costs and with shorter life cycles [
Among all these concepts, production processes tend to be characterized by modularity, decentralization, autonomy, scalability, reusability, adaptability, and reconfigurability. Moreover, a critical issue in production is the system control approach. For a complex system, traditional control methods struggle to accommodate more input variables, more state parameters, and more flexible production requirements.
A common practice is job shop scheduling, also called the job shop problem (JSP), whose objective is to minimize the makespan given n jobs and m identical machines in a factory. It is known to be NP-hard [
A poor encoding method generates an enormous number of invalid solutions and enlarges the search space, so a suitable encoding is a precondition for a GA. Binary encoding and Gray encoding are suitable for a series of real parameters, e.g., the five De Jong test functions [
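To illustrate why Gray encoding suits real parameters, the sketch below quantizes a real value in [lo, hi] into 2^bits levels and stores the level index as a binary-reflected Gray code, so neighbouring levels differ in exactly one bit. The uniform quantization and the function names `real_to_gray` and `gray_to_real` are assumptions for illustration, not the encoding used in the cited works.

```python
def real_to_gray(x: float, lo: float, hi: float, bits: int) -> str:
    """Quantize x in [lo, hi] to an integer level and return its Gray code."""
    if not lo <= x <= hi:
        raise ValueError("x must lie in [lo, hi]")
    levels = (1 << bits) - 1
    k = round((x - lo) / (hi - lo) * levels)  # integer level 0..levels
    g = k ^ (k >> 1)                          # binary-reflected Gray code
    return format(g, f"0{bits}b")


def gray_to_real(code: str, lo: float, hi: float) -> float:
    """Invert real_to_gray: Gray code -> binary level -> real value."""
    g = int(code, 2)
    k = 0
    while g:          # XOR of all right shifts recovers the binary level
        k ^= g
        g >>= 1
    levels = (1 << len(code)) - 1
    return lo + k / levels * (hi - lo)
```

For example, on the De Jong F1 domain [-5.12, 5.12] with 10 bits, `real_to_gray(5.12, -5.12, 5.12, 10)` yields `"1000000000"`, and decoding returns the value up to half a quantization step.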
The other part of the hybrid algorithm is reinforcement learning (RL), another sound approach based on studies of a system’s structure. RL algorithms are a family of learning policies that let programs improve their performance by receiving rewards or punishments from the environment [
In this paper, a hybrid machine learning algorithm is presented to improve a flexible manufacturing system. An approach based on both system analyses and the results of a messy GA is proposed; it provides a way to decide on a better reward function for the RL algorithm. All this work is executed in the CoDeSys simulation environment under IEC 61131-3. Based on the experience gained, an idea for optimizing the structure of a flexible system is discussed.
The rest of the paper is organized as follows. Section 2 introduces the flexible manufacturing system and formulates the problem. The control policy is discussed in Section 3. Simulation results are presented and discussed in Section 4. Some additional beneficial results are then presented in Section 5. Section 6 concludes the paper and proposes avenues for future work.
Nowadays, flexible manufacturing systems are becoming increasingly complex and flexible for mass customization. Such a system is designed with more branches between workstations, which gives products more transport options and gives the system flexibility for manufacturing different kinds of goods. A demonstrator of such a system in the laboratory of the Chair of Automation and Information Systems, Technical University of Munich, is shown in
Some fundamental properties of the manufacturing system can be summarized as follows:
・ Two workstations, each with a filling machine holding two kinds of pellets, are mounted on the left side of the system, so the system can provide different recipes. It is therefore a flexible manufacturing system that can produce various types of products.
・ Ten conveyor switches connect the workstations and the inventory, and each of them leads in different directions. Consequently, there are several routes through the production process for one kind of product.
・ The inventory for raw products and the inventory for final products are located next to each other and share one robot.
There are five materials in the experiment: red pellets, green pellets, blue pellets, black pellets, and water. The racks of red and green pellets are mounted on workstation 1, and the racks of blue and black pellets are installed on workstation 2. Water is a dependent variable, so only the pellets are considered. It follows that there are at most $2^4 - 1 = 15$ products. Here, we use
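The count $2^4 - 1 = 15$ is the number of non-empty subsets of the four pellet types. The sketch below enumerates them and sorts each recipe by the workstations it needs, assuming (as the use-case definitions that follow and their sub-scenario counts suggest) that use case 3 covers recipes needing both workstations. The names `RACKS`, `all_recipes`, and `use_case` are illustrative only.

```python
from itertools import combinations

# Pellet racks per workstation, as stated in the text (water is ignored).
RACKS = {1: ("red", "green"), 2: ("blue", "black")}
PELLETS = [p for ws in sorted(RACKS) for p in RACKS[ws]]


def all_recipes():
    """Every non-empty subset of the four pellet types: 2**4 - 1 recipes."""
    return [frozenset(c) for r in range(1, len(PELLETS) + 1)
            for c in combinations(PELLETS, r)]


def use_case(recipe):
    """1 = workstation 1 only, 2 = workstation 2 only, 3 = both."""
    needs = {ws for ws, pellets in RACKS.items() if recipe & set(pellets)}
    return 3 if needs == {1, 2} else needs.pop()


if __name__ == "__main__":
    recipes = all_recipes()
    counts = {1: 0, 2: 0, 3: 0}
    for rec in recipes:
        counts[use_case(rec)] += 1
    print(len(recipes), counts)  # 15 {1: 3, 2: 3, 3: 9}
```

The split 3 + 3 + 9 = 15 matches the number of sub-scenarios given for the three use cases.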
Based on the positions of the raw materials on the workstations, we examine the recipe of each product and define sub use cases.
If the product recipes are as follows:
This means a raw product must pass workstation 1, but not workstation 2. This is defined as use case 1. Based on the above recipes, use case 1 consists of three sub-scenarios.
If the product recipes are as follows:
This means a raw product must pass workstation 2, but not workstation 1. This is defined as use case 2. Based on the above recipes, use case 2 consists of three sub-scenarios.
If the product recipes are as follows:
This means a raw product must pass both workstation 1 and workstation 2. This is defined as use case 3. According to the above recipes, use case 3 consists of nine sub-scenarios.
The use cases above require some further discussion.
1)
2)
3) If there are more than two products, that means
In addition, the following two scenarios have to be considered:
1) When a raw product is fed in at the entrance of the system, it has several routes to a workstation, and after the filling process it has several possible routes to the exit. Different route plans result in different makespans, and a route may obstruct the following product and generate a “traffic jam.”
2) When a final product appears at the exit of the manufacturing system, the robot has two options: to feed a raw product into the system first or to palletize the completed product. Different choices result in different tardiness times and completion times.
These two scenarios concern the behavior of the feeding/palletizing robot, but a single decision of the robot has a global influence on the manufacturing system. They are therefore viewed as part of the three use cases defined above. Owing to the random characteristics of the process, e.g., the different completion times of different routes, the whole manufacturing process is a stochastic process. Thus, the goal of this paper is to find out how to plan reasonable routes for the global process and to determine the potential relationship between feeding interval and completion time by applying a hybrid machine learning algorithm.
After illustrating and formulating the target system, the next step is to find a reasonable control method to implement.
Before introducing the algorithm implementation, we give an overview of a representative complex manufacturing system. The essential elements of the system are the robot, the workstations, conveyor belts (CB), and conveyor switches (CS). Conveyor belts transport products to the targeted workstation, and conveyor switches change routes and make it possible for a product to bypass an unnecessary workstation. Under normal circumstances, a conveyor switch cannot work independently but must cooperate with conveyor belts. It connects three belts, so the system can decide not only which product is delivered first but also how to coordinate the process when several belts are bringing products to the conveyor switch simultaneously. The fundamental research unit, consisting of one conveyor switch and three conveyor belts, is therefore defined as a Trident node. In other words, transport tasks in the flexible manufacturing system are undertaken by Trident nodes. The decomposition of the fabrication system into Trident nodes is shown in
One important target of the system is to decide a better transport order and to coordinate the behaviors of the conveyor belts, so the key component of a node is its conveyor switch. Since one node consists of four elements, each conveyor switch is numbered 4n. On the global level, a product travels from the feeding part to the palletizing part. According to the system structure in the figure, the belt at the left end of a conveyor switch is numbered 4n + 3. In addition, a conveyor belt shared by adjacent nodes has two identifiers.
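The numbering scheme above can be captured in a small data structure. This is a minimal sketch, assuming that n is the node index, that the switch gets 4n and the left-end belt 4n + 3 as stated, and that the two remaining belts get 4n + 1 and 4n + 2 (an assumption; the text does not say). `TridentNode` and `element_owner` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TridentNode:
    """One conveyor switch plus the three conveyor belts it connects."""
    n: int  # node index (assumed to be the node number)

    @property
    def switch_id(self) -> int:
        return 4 * self.n  # conveyor switch: 4n

    @property
    def belt_ids(self) -> tuple[int, int, int]:
        # 4n + 3 is the left-end belt per the text; assigning 4n + 1 and
        # 4n + 2 to the other two belts is an assumption.
        return (4 * self.n + 1, 4 * self.n + 2, 4 * self.n + 3)

    @property
    def left_belt_id(self) -> int:
        return 4 * self.n + 3


def element_owner(element_id: int) -> tuple[int, str]:
    """Map an element identifier back to (node index, element kind)."""
    n, offset = divmod(element_id, 4)
    return n, "switch" if offset == 0 else f"belt {offset}"
```

A belt shared by two adjacent nodes would appear under one identifier from each node, so a full model would also need an alias table mapping both identifiers to the same physical belt.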
The control system first tries to find the most time-efficient option for a single product and then for the whole manufacturing process. At the same time, the algorithm recognizes the influence of each option and each node. The flowchart of the hybrid algorithm is displayed in
Based on the above analyses, a messy GA forms the basis of this hybrid algorithm. A standard GA implementation comprises encoding, genetic operations, and selection.
In this paper, each solution is encoded as the sequence of node numbers that a product passes. Because a raw product may pass a different number of nodes, the chromosome is a variable-length, or “messy”, digit string. Such an encoding was first proposed by D. Goldberg [
A “tour” is defined as a product traveling from the feeding entrance to the palletizing exit. A “legal tour” is a path that passes the filling machine required by the recipe; conversely, an “illegal tour” does not pass it. For example, 1243690 is a legal tour for the recipe “red pellets”, and 1236780 is an illegal tour for the recipe “black pellets”. In the coded number, “0” represents node 10 to avoid a conflict with node 1.
Since an encoding number gives the order in which nodes are passed, the gene length is naturally no less than 6; that is, a product must pass through at least six nodes from the feeding point to the palletizing point via a filling station. The gene length can, however, become relatively large because a product may go around cycles, e.g., “3243” or “345783”. A very long string consumes more time and can therefore be regarded as non-optimal. On the other hand, because of the constraints of the crossover and mutation operators, the gene length limit cannot be set too short either. In this paper, the gene length is limited to at most 16.
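The encoding rules above (digit string, “0” for node 10, length limit 16, legality defined by the filling station) can be checked mechanically. The sketch below is a minimal version, assuming that every tour starts at node 1 and ends at node 10, as the example tours do, and that the caller supplies the set of filling nodes the recipe requires; node 3 for workstation 1 follows the crossover example later in this section, while the node for workstation 2 is not given and is left as a parameter. `decode` and `is_legal` are hypothetical names.

```python
MAX_GENE_LENGTH = 16  # upper bound on gene length used in the text


def decode(gene: str) -> list[int]:
    """Turn a gene string into node numbers; '0' stands for node 10."""
    return [10 if ch == "0" else int(ch) for ch in gene]


def is_legal(gene: str, filling_nodes: set[int]) -> bool:
    """Legal if the tour runs from node 1 to node 10, respects the length
    limit, and passes every filling node the recipe requires."""
    tour = decode(gene)
    return (
        bool(tour)
        and len(tour) <= MAX_GENE_LENGTH
        and tour[0] == 1
        and tour[-1] == 10
        and set(filling_nodes) <= set(tour)
    )
```

For a red-pellet recipe with its filling machine assumed at node 3, `is_legal("1243690", {3})` holds, while a tour that skips node 3 is rejected.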
During crossover, the algorithm first searches for nodes common to both parents, excluding node 10, node 1, the node involved in the filling process, and the nodes adjacent to node 10 and node 1. For instance, the following two parents are legal tours for use case 1.
Parent 1:
Parent 2:
This means that the two parental solutions share nodes 1, 2, 3, 4, 6, 8, and 10. Node 3 is involved in the filling process for use case 1, and node 8 is the common node adjacent to node 1. Thus nodes 2, 4, and 6 are potential crossover points. The system chooses one of them randomly and builds up offspring. If a child represents an illegal tour, the system selects one of the remaining crossover points to generate new offspring. Suppose node 6 is the crossover point; in this case, we get the following offspring:
Offspring 1’:
Offspring 2:
The route of offspring 1’ does not include the filling station, so it is an illegal tour. Using node 4 as the crossover point instead, the system keeps offspring 2 and generates a new random child:
Offspring 1:
The crossover rate is set to 1 in this paper.
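The crossover procedure described above can be sketched as a single-point crossover at a shared node: because both parents pass that node, swapping their tails yields children that are still connected paths. This is a minimal sketch, assuming that a repeated node is cut at its first occurrence (the text does not say how messy chromosomes with repeats are handled) and that the excluded nodes and the legality test are supplied by the caller. The function names are hypothetical.

```python
import random


def candidate_points(p1, p2, excluded):
    """Nodes common to both parents, minus the excluded ones (node 1,
    node 10, the filling node and the nodes adjacent to 1 and 10)."""
    return sorted((set(p1) & set(p2)) - set(excluded))


def crossover_at(p1, p2, node):
    """Swap the tails of both parents at the first occurrence of `node`.
    Both parents pass `node`, so each child is still a connected path."""
    i, j = p1.index(node), p2.index(node)
    return p1[:i] + p2[j:], p2[:j] + p1[i:]


def crossover(p1, p2, excluded, is_legal, rng=random):
    """Try crossover points in random order and keep distinct legal
    children until two are found (fewer if the points run out)."""
    children = []
    points = candidate_points(p1, p2, excluded)
    rng.shuffle(points)
    for node in points:
        for child in crossover_at(p1, p2, node):
            if is_legal(child) and child not in children:
                children.append(child)
            if len(children) == 2:
                return children
    return children
```

With parents `[1, 2, 4, 3, 6, 9, 10]` and `[1, 2, 3, 4, 6, 8, 10]`, cutting at node 4 yields one child without node 3 (illegal if node 3 is the filling node), which the loop discards before trying another point, mirroring the example in the text.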
Mutation is necessary to help the algorithm escape a premature search space. Inspired by the cycles in the system mentioned in Section 3.2.2, we use them to implement the mutation operator. Cycles increase the gene length but do not always increase time consumption. A cycle is therefore inserted as a modification that helps the system escape a pseudo-optimal solution space. One of nodes 3, 4, 5, and 7 is selected as the point at which a cycle is added. Suppose the parent is as follows:
Parent:
One possible mutation might be,
Offspring:
In this instance, node 4 is randomly chosen as the mutation point in the parent’s chromosome. The mutation rate depends on the parents: if the offspring chromosomes are identical to the two parents, the mutation rate is set to 1; otherwise, if the offspring differ from the parents, the mutation rate is set to 0.05 at the global level.
In the mutation process, an extra cycle is added. Similarly, a cycle in a parent could also be removed; however, that would make intergenerational transmission messy. To avoid that kind of confusion, non-repeat chromosome routes (routes that pass no node twice) are given higher priority. This does not mean that non-repeat routes are elite routes, which is another reason why the mutation process is necessary. Fifteen initial parents are chosen from the whole population to generate and iterate the later populations:
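The cycle-insertion mutation and the two-level mutation rate can be sketched as follows. This is an illustration under stated assumptions: a cycle replaces the first visit of the mutation node, the cycle table is supplied by the caller (the text names “3243” and “345783”; cycles for nodes 4, 5, and 7 are not given), and “offspring identical to two parents” is read as crossover having reproduced the parent pair. All names are hypothetical.

```python
import random

MUTATION_NODES = (3, 4, 5, 7)  # nodes at which a cycle may be inserted


def insert_cycle(tour, node, cycle):
    """Replace the first visit of `node` by `cycle`, which starts and ends
    at `node`; e.g. cycle [3, 2, 4, 3] turns ...,3,... into ...,3,2,4,3,..."""
    if cycle[0] != node or cycle[-1] != node:
        raise ValueError("cycle must start and end at the mutation node")
    i = tour.index(node)
    return tour[:i] + cycle + tour[i + 1:]


def mutation_rate(children, parents):
    """1.0 if crossover merely reproduced the parents, else 0.05."""
    return 1.0 if sorted(children) == sorted(parents) else 0.05


def mutate(tour, cycles, rate, rng=random, max_len=16):
    """With probability `rate`, insert a cycle at a randomly chosen
    eligible node; `cycles` maps node -> cycle (hypothetical table)."""
    if rng.random() >= rate:
        return tour
    options = [n for n in MUTATION_NODES if n in tour and n in cycles]
    if not options:
        return tour
    node = rng.choice(options)
    child = insert_cycle(tour, node, cycles[node])
    return child if len(child) <= max_len else tour  # keep the length limit
```

Rejecting a mutated child that exceeds 16 genes is a design choice of this sketch so that mutation never violates the gene-length limit.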
・ If the non-repeat chromosome population in a use case is more than 15, then choose 15 of them randomly.
・ If the non-repeat chromosome population in a use case is less than 10, the system accepts all of them first and then fills the vacancies with randomly chosen repeating chromosomes.
Based on the analysis in the previous section, repeating-chromosome offspring can be generated from non-repeat parents. This is essential because the optimal solution may exist among repeating-chromosome routes.
Because the control units of real plants are PLCs, the simulation is executed in CoDeSys V3.5 SP5 Patch 3. CoDeSys is a development environment based on the IEC 61131-3 standard. The physical parameters are as follows:
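The selection rules above can be sketched as a function that prefers non-repeat routes and tops up with randomly chosen repeating routes. Note an assumption: the text states a threshold of 10 for topping up, which leaves the 10 to 15 case unspecified; this sketch assumes the aim is always 15 parents and fills vacancies whenever fewer than 15 non-repeat routes exist. `initial_parents` is a hypothetical name.

```python
import random

POP_SIZE = 15  # number of initial parents per use case


def is_non_repeat(tour):
    """A non-repeat chromosome visits every node at most once."""
    return len(set(tour)) == len(tour)


def initial_parents(tours, rng=random, size=POP_SIZE):
    """Prefer non-repeat routes; top up with random repeating routes."""
    non_repeat = [t for t in tours if is_non_repeat(t)]
    repeating = [t for t in tours if not is_non_repeat(t)]
    if len(non_repeat) >= size:
        return rng.sample(non_repeat, size)
    # Assumed rule: fill every vacancy up to `size` if repeats are available.
    vacancies = min(size - len(non_repeat), len(repeating))
    return non_repeat + rng.sample(repeating, vacancies)
```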
・ Speed of the conveyor belt: $v_b$ = 300 mm/s;
・ Robot transport time: $t_{rt}$ = 2 s;
・ Moving speed of the robot: $v_{rm}$ = 300 mm/s;
・ Filling time at each workstation: $t_f$ = 2 s.
Another fundamental property should be pointed out: a product can pass a workstation several times, but in the simulation it can receive pellets at a workstation only once. The behaviors of the system in this paper are simulated with the proposed hybrid algorithm. Because the process is stochastic, each node in the system can decide the transport sequence and direction, and the robot can decide whether to feed a raw product first or to palletize a finished product. The goal of the simulation is to recreate the decision and learning process of the system and to find out whether the hybrid algorithm fits the circumstances. Production efficiency is determined by the robot transport time, the robot’s travel time along its slideway, and the time spent within the manufacturing system. The makespan of the production system, in turn, is determined by the route selected in the stochastic process. Owing to the characteristics of the system, there is no intervention process in use case 1 and use case 2, so the targets of these two use cases are to find the optimal routes. The simulation results are displayed in
From
those four nodes appear in one solution, it does not affect the performance of a solution. The remaining nodes are non-essential; nonetheless, they affect performance, and their influence may be beneficial or harmful. In fact, a raw product needs only red pellets or green pellets in use case 2, which means it saves time to avoid nodes 5, 7, and 8 in the route plan. Thus nodes 5, 7, and 8 are defined as negative nodes or negative chromosomes. Finally, the remaining nodes 2, 3, 6, and 9 cannot be judged at first sight, so they are defined as neutral nodes or chromosomes.
There are also other facts to consider. For example, there are several routes for a product to go from node 8 to node 10: one goes directly from node 8 to node 10, and another goes from node 8 via nodes 7, 6, and 9 to node 10. Adjacent nodes are therefore not independent.
Based on the experience gained above, the reward function can be set as follows:
Here, $P_n$ represents the potential positive nodes, $H_n$ the harmful nodes, and $N_n$ the neutral nodes, and $Z(\cdot)$ counts the number of times a route passes the nodes of a set.
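The counting function $Z(\cdot)$ is simple to state in code. The sketch below only computes the pass counts; how the counts for $P_n$, $H_n$, and $N_n$ are weighted is determined by the reward equation, which is not reproduced here, so no weighting is attempted. The harmful and neutral sets come from the route analysis above; the positive set is not fully listed in the surviving text and is therefore omitted.

```python
from collections import Counter


def Z(tour, nodes):
    """Number of times `tour` passes any node in `nodes` (revisits count)."""
    visits = Counter(tour)
    return sum(visits[n] for n in nodes)


# Node classes from the route analysis: negative (harmful) and neutral.
HARMFUL = {5, 7, 8}
NEUTRAL = {2, 3, 6, 9}

tour = [1, 2, 4, 3, 2, 4, 3, 6, 9, 10]
print(Z(tour, HARMFUL), Z(tour, NEUTRAL))  # 0 6
```

Counting revisits rather than distinct nodes matters for messy chromosomes: a route with an inserted cycle passes some nodes twice, and each pass can cost time.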
There are 63 raw products waiting in line in total based on the physical structure displayed on the right side in
With the reward function determined, the reinforcement learning algorithm is briefly introduced. A typical decision process starts at time $t_0$ with the initial state $s_0$. At any time, the possible actions depend on the current state, which is denoted as
where
Once
The Q-learning algorithm estimates the optimal action-value function $Q^*$ from the interaction between actuators and environment; each update of Q depends on both the previous state and the selected action. The foundation of the algorithm is a simple value iteration update proposed in [
Here,
The system transporters fulfill the following conditions, under which it has been proved that Q-learning converges to $Q^*$ and thus yields an optimal control policy:
・ Explicit, distinct values of the Q-function are stored and updated for each state-action pair.
・ The learning rates satisfy $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$, i.e., the sum of α is infinite, whereas the sum of its squares is finite.
・ The controller keeps trying all actions in all states with nonzero probability.
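The three convergence conditions map directly onto a tabular implementation of the standard (Watkins) Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The sketch below is generic, not the paper’s PLC implementation: a table stores one value per state-action pair, the step size $\alpha = 1/n$ per pair satisfies the two sum conditions, and ε-greedy selection keeps every action’s probability nonzero. State and action labels and the class name `QLearner` are hypothetical.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with per-pair step size 1/n and epsilon-greedy
    exploration (every action keeps a nonzero selection probability)."""

    def __init__(self, actions, gamma=0.9, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.q = defaultdict(float)     # explicit value per (state, action)
        self.visits = defaultdict(int)  # visit counts for the step size

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next, terminal=False):
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]  # sum alpha = inf, sum alpha^2 < inf
        target = reward
        if not terminal:
            target += self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += alpha * (target - self.q[(s, a)])
        return self.q[(s, a)]
```

In the setting of this paper, actions would correspond to routing choices at a node or to the robot’s feed/palletize decision, and rewards would come from the reward function defined earlier.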
The completion time of 50 episodes is shown in
Over the 50 simulated trials, the completion time tends to decrease: it shrinks quickly at first and then more slowly. The curve is not smooth because the tour times take discrete values. Furthermore, the curve fluctuates considerably because of the system structure and the relationship between nodes and routes. At the beginning, the control system traverses as many initial populations as possible. A local decision has a broad and lasting influence: one decision affects the tours of several following products and thus the total completion time. Once the option for a previous product is selected, the tour range of the following product shrinks. A sharp drop in time occurs when the algorithm recognizes a crucial node, while the subsequent rise is always small because the algorithm tries to improve the system performance in the neighborhood. After 30 trials, the curve flattens, which indicates that the system has completed the training process.
Another part of the system is the feeding and palletizing robot. Although it is defined as a subsidiary device, it also influences the performance of the system. For example, if it continues to feed raw products into the system when a finished product appears on the exit conveyor belt, palletizing is delayed, and vice versa. The mean tardiness time is employed as an indicator of that influence. The tardiness time is defined as the delay from the appearance of a raw product at the head of the scheduling list to its first appearance on the entrance conveyor belt. The average tardiness time is displayed in
Test 23 has a significantly shorter completion time than test 3. Synthesizing the results from
The comparison of test 3 and test 23 shows that, if the average tardiness time is viewed as a sequence, the limit of the sequence is determined by the physical structure of the system rather than by the number of products.
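The tardiness definition above translates into a short computation over two timestamps per product. This minimal sketch assumes that both timestamps are recorded in the same time unit (e.g., seconds) and listed in the same product order; `mean_tardiness` is a hypothetical name.

```python
def mean_tardiness(head_times, entrance_times):
    """Mean delay between a raw product reaching the head of the scheduling
    list and its first appearance on the entrance conveyor belt."""
    if len(head_times) != len(entrance_times) or not head_times:
        raise ValueError("need one pair of timestamps per product")
    delays = [e - h for h, e in zip(head_times, entrance_times)]
    if any(d < 0 for d in delays):
        raise ValueError("a product cannot enter before reaching the head")
    return sum(delays) / len(delays)
```

For instance, products that reach the head of the list at 0, 2, and 4 s and enter the system at 1, 5, and 4 s have delays of 1, 3, and 0 s, so the mean tardiness is 4/3 s.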
Based on the analysis and simulation results for the different nodes in the system, the topological structure is found to have a significant influence on system performance. A flexible manufacturing system whose node topology is designed more reasonably can be controlled and improved more conveniently by a corresponding machine learning algorithm. In return, an intelligent machine learning algorithm can enhance the design of a flexible manufacturing system.
If there is only one kind of product, the best position for the single workstation is evidently at node 9. In that case, the complexity of the system is reduced to that of a simple manufacturing system. Even though workstation 1 is fixed on the conveyor belt between node 2 and node 4, the system behavior (e.g., completion time, average tardiness time) would change completely.
In this paper, the model of a multi-branch complex manufacturing system is elaborated and formulated. A hybrid algorithm integrating a messy GA and an RL algorithm is then proposed to control the fabrication process based on system structural analyses and recipe specifications. The messy GA is employed to represent route options, and the RL algorithm is used to evaluate and improve system performance. The hybrid control algorithm is then implemented and simulated in an IEC 61131-3 environment. The simulation results show that the algorithm can significantly reduce time consumption.
In addition, the results can benefit system design at the level of topological structure. The positions of workstations and nodes can be better planned to reduce the time spent on route planning or to avoid conflicts between two or more products that need to pass through one node.
Further research should focus on two directions. First, more complicated production scenarios need to be simulated to advance the control policy for complex manufacturing systems; for instance, some products pass only workstation 1 while others pass both workstations in one production session. Moreover, the production order is uncertain and needs to be scheduled by the control system. Second, it would be beneficial to find the optimal position of a workstation without changing the structure of the system.
Li, H. (2017) Improve the Performance of a Complex FMS with a Hybrid Machine Learning Algorithm. Journal of Software Engineering and Applications, 10, 257-272. https://doi.org/10.4236/jsea.2017.103015