This paper presents a de novo computational design method driven by deep reinforcement learning to achieve reliable predictions and optimum properties for periodic microstructures. With recent developments in 3-D printing, microstructures with complex geometries and multiple material phases can be fabricated to achieve targeted mechanical performance. These material property enhancements are promising for improving the mechanical, thermal, and dynamic performance of engineering systems ranging from energy harvesting applications to spacecraft components. The study investigates a novel and efficient computational framework that integrates deep reinforcement learning algorithms into finite element-based material simulations to quantitatively model and design 3-D printed periodic microstructures. These algorithms focus on improving the mechanical and thermal performance of engineering components by optimizing the microstructural architecture to meet different design requirements. Additionally, the machine learning solutions produced results equivalent to those of the physics-based simulations while significantly reducing the computation time. The outcomes of the project show promise for automating the design and manufacturing of microstructures, which would enable their fabrication in large quantities using 3-D printing technology.

Advancements in 3-D printing technology enabled the automation of the design and manufacturing of high-performance engineering materials. For example, computational models and machine learning (ML) techniques have an increasingly critical role in the design of 3-D printed microstructures as testing all possible microstructural combinations with a purely experimental approach remains infeasible. Accordingly, the present work focuses on the development and integration of ML methods into a physics-based simulation environment to enable and accelerate the design of 3-D printed multi-phase microstructures for different engineering applications. The current 3-D printing technology can produce structures that are made of multiple phases by utilizing different base materials. The multi-phase structures can provide complementary properties by balancing the desired features of the base materials. Our goal in this study is to find the optimum material properties of 3-D printed microstructures to improve the performance of engineering components under mechanical and thermo-mechanical loads and dynamic effects. The design optimization is first performed using a finite element model. Although finite element methods produce high-fidelity solutions, the computation times required for design studies may be excessive. Therefore, this study investigates the compatibility of ML and design optimization to produce computationally efficient results while maintaining a high level of accuracy.

The ML paradigm has attracted a lot of interest from materials modeling and design communities [

3-D printed multi-phase microstructures are assumed to exhibit orthotropic material properties. Because orthotropic properties differ along three mutually orthogonal directions, the compliance matrix in the global frame is defined by 9 independent constants. For assembly and solution purposes, the compliance matrix is inverted to obtain the stiffness matrix that is used in the local and global finite element systems. The stress-strain equation for a three-dimensional orthotropic material is given as follows:

where the stiffness coefficients (C_{ij} values) are calculated from the orthotropic material properties (Young’s modulus values: E_{xx}, E_{yy}; Poisson’s ratios: ν_{xy}, ν_{yz}). This relationship is then used to solve an orthotropic plate problem that describes a periodic multi-phase microstructure. Carbon Epoxy and Boron Epoxy are used as base materials of the microstructure; therefore, the design is expected to demonstrate properties that range within the property values of the base materials. Carbon Epoxy and Boron Epoxy material properties can be found in Reference [

Property | Value in carbon epoxy | Value in boron epoxy |
---|---|---|
E_{xx} | 159 GPa | 224.6 GPa |
E_{yy} | 14 GPa | 12.7 GPa |
E_{zz} | 14 GPa | 12.7 GPa |
G_{xy} | 4.8 GPa | 4.4 GPa |
G_{zx} | 4.8 GPa | 4.4 GPa |
G_{yz} | 4.3 GPa | 2.4 GPa |
ν_{xy} | 0.32 | 0.35 |
ν_{zx} | 0.14 | 0.01 |
ν_{yz} | 0.14 | 0.01 |
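The inversion of the compliance matrix into the stiffness matrix can be illustrated with a minimal sketch in plain Python. It is not the authors' finite element code. The function names, the Voigt ordering [xx, yy, zz, yz, zx, xy], the sign and index convention of the off-diagonal compliance terms, and the assignment of the three Poisson's ratios in the order ν_xy, ν_zx, ν_yz are all assumptions made for illustration; the numerical inputs are the carbon epoxy values listed above, in GPa.

```python
# Sketch: 9 orthotropic constants -> compliance matrix -> stiffness matrix.
# Assumed Voigt order: [xx, yy, zz, yz, zx, xy]. Units follow the inputs (GPa here).

def inv3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-30:
        raise ValueError("compliance block is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def orthotropic_stiffness(E, G, nu):
    """E = (Ex, Ey, Ez), G = (Gyz, Gzx, Gxy), nu = (nu_xy, nu_zx, nu_yz)."""
    Ex, Ey, Ez = E
    nu_xy, nu_zx, nu_yz = nu
    # Assumed convention for the symmetric off-diagonal compliance terms.
    s12 = -nu_xy / Ex
    s13 = -nu_zx / Ez
    s23 = -nu_yz / Ey
    normal_block = [
        [1.0 / Ex, s12, s13],
        [s12, 1.0 / Ey, s23],
        [s13, s23, 1.0 / Ez],
    ]
    C = [[0.0] * 6 for _ in range(6)]
    Cn = inv3(normal_block)  # normal stresses and strains couple in a 3x3 block
    for r in range(3):
        for c in range(3):
            C[r][c] = Cn[r][c]
    for k, shear_modulus in enumerate(G):
        C[3 + k][3 + k] = shear_modulus  # shear terms decouple for orthotropy
    return C

# Carbon epoxy values from the table above (GPa), Poisson order assumed.
carbon = orthotropic_stiffness(E=(159.0, 14.0, 14.0),
                               G=(4.3, 4.8, 4.8),
                               nu=(0.32, 0.14, 0.14))
```

Because the shear terms decouple, only the 3 × 3 normal block needs a true inversion; the three shear stiffness entries are simply the shear moduli.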

The first problem aims to minimize the maximum von Mises stress (VMS) experienced by the microstructure under tensile loads (

The objective function is defined as the minimization of the maximum VMS value and

The second problem aims to minimize the maximum VMS value experienced by the microstructure under thermo-mechanical loads. The microstructure is modeled as a plate that represents a low-Earth orbit satellite component. The change of temperature is assumed to be 293 degrees according to the data presented in Reference [

BCs | E_{x} (MPa) | E_{y} (MPa) | E_{z} (MPa) | G_{xy} (MPa) | G_{zx} (MPa) | G_{yz} (MPa) | ν_{xy} | ν_{zx} | ν_{yz} | α_{xx} (1/°C) | α_{yy} (1/°C) | VMS (Mechanical) | VMS (Mechanical + Thermal) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FFRR | 159000 | 12705 | 12700 | 4477 | 4400 | 2400 | 0.32 | 0.07 | 0.08 | 4E–7 | 2.8E–5 | 1138.14 | 1136.00 |
RFRR | 159000 | 12705 | 12700 | 4477 | 4400 | 2400 | 0.32 | 0.07 | 0.08 | 4E–7 | 2.8E–5 | 1138.14 | 1135.85 |
RFFR | 159000 | 12723 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 4E–7 | 2.8E–5 | 1114.33 | 1112.07 |
FRRR | 159000 | 12705 | 12700 | 4477 | 4400 | 2400 | 0.32 | 0.01 | 0.14 | 4E–7 | 2.8E–5 | 1138.14 | 1135.86 |
RRRR | 159000 | 12724 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.04 | 0.12 | 4E–7 | 2.8E–5 | 1120.65 | 1118.39 |
RRFR | 159000 | 12723 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 4E–7 | 2.8E–5 | 1114.33 | 1112.07 |
FRRF | 159000 | 12700 | 12700 | 4401 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 4E–7 | 2.8E–5 | 1142.56 | 1140.27 |
RRRF | 159000 | 12724 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 4E–7 | 2.8E–5 | 1120.64 | 1118.39 |
RRFF | 159000 | 12723 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 4E–7 | 2.8E–5 | 1114.33 | 1112.07 |
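As a reminder of what the objective in the first two problems measures, the following sketch computes the von Mises stress from the six components of a 3-D stress state and takes the maximum over a set of evaluation points. The function names and the list-of-tuples layout are hypothetical; in the study these stresses would come from the finite element solution, which is not reproduced here.

```python
import math

def von_mises(sxx, syy, szz, syz=0.0, szx=0.0, sxy=0.0):
    """Von Mises equivalent stress for a general 3-D stress state."""
    normal_part = (sxx - syy) ** 2 + (syy - szz) ** 2 + (szz - sxx) ** 2
    shear_part = syz ** 2 + szx ** 2 + sxy ** 2
    return math.sqrt(0.5 * normal_part + 3.0 * shear_part)

def max_vms(stress_states):
    """Maximum VMS over an iterable of (sxx, syy, szz, syz, szx, sxy) tuples."""
    return max(von_mises(*state) for state in stress_states)

# Illustrative stress states in MPa (made-up numbers, not results from the paper).
example_states = [
    (100.0, 0.0, 0.0, 0.0, 0.0, 0.0),   # uniaxial tension
    (0.0, 0.0, 0.0, 0.0, 0.0, 10.0),    # pure shear
    (50.0, 50.0, 50.0, 0.0, 0.0, 0.0),  # hydrostatic state, VMS is zero
]
objective_value = max_vms(example_states)
```

A min-max design problem then searches for the material properties that make `objective_value` as small as possible.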

where

The last application in this study is the optimization of the periodic microstructure to enhance the natural frequency of a plate that is assumed to be a nano-satellite component. Particularly, the natural frequency of a 2-Unit (with dimensions of 20 cm × 10 cm × 10 cm) CubeSat structure is considered. Natural frequency is an important performance metric as undesired resonance can lead to the failure of satellite components. The most important vibration implications may occur during the rocket launch. To account for the effects of the vibrations, the natural frequency of a 2-Unit CubeSat is optimized using the finite element scheme. The example CubeSat is chosen as a sample QB50 satellite of the European Union. Accordingly, the first natural frequency of a QB50 satellite should be more than 200 Hz for safety purposes [

where

In this equation,

BCs | E_{x} (MPa) | E_{y} (MPa) | E_{z} (MPa) | G_{xy} (MPa) | G_{zx} (MPa) | G_{yz} (MPa) | ν_{xy} | ν_{zx} | ν_{yz} | Density (g/cm^{3}) | Frequency (Hz) |
---|---|---|---|---|---|---|---|---|---|---|---|
FFRR | 160163 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 1.6 | 244.0264 |
RFRR | 160094 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 1.6 | 243.9832 |
RFFR | 159103 | 12818 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 1.6 | 242.5780 |
FRRR | 160310 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.14 | 1.6 | 244.1196 |
RRRR | 224600 | 14000 | 12700 | 4800 | 4400 | 3316 | 0.32 | 0.09 | 0.14 | 1.6 | 281.9488 |
RRFR | 160255 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 1.6 | 244.0849 |
FRRF | 160163 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 1.6 | 244.0264 |
RRRF | 160258 | 14000 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.01 | 0.01 | 1.6 | 244.0867 |
RRFF | 159105 | 12810 | 12700 | 4800 | 4400 | 2400 | 0.32 | 0.14 | 0.14 | 1.6 | 242.5747 |

To enhance the computational efficiency of the design solutions, concepts of ML (

The Deep RL algorithm in this work is based on the Advantage Actor-Critic (A2C) method. An actor-critic system receives information from its external environment and selects an action based on that information. After an action is performed, the environment returns feedback, called the reward in RL, to the actor. The critic takes the state of the environment and the output of the actor as its inputs and evaluates how good the actor’s action is under the current environment. The critic output is similar to the reward that comes directly from the environment, since both evaluate the actor output based on the current environment. After receiving the reward and the critic’s evaluation, the actor adjusts itself and learns which action provides the maximum reward under different environments. After sufficient training (200 iterations were needed to train the Deep RL programs in this study), the actor has a very high probability of selecting the best action under a specific environment. The workflow of a Deep RL framework is shown in

In this study, the material properties used in the finite element simulations define the environment for Deep RL. Deep RL starts at a random combination of material properties within the given range. The algorithm searches for the optimum combination of material properties by selecting actions that either increase or decrease one of the material properties. After each action, the Deep RL program receives either a positive or a negative reward and adjusts itself based on that reward so that it chooses a better action the next time it encounters a similar environment. The program repeats this cycle: it selects an action, receives a reward, adjusts itself based on the reward, and selects an action again. The more iterations the program runs, the more accurate the selected actions become. When the advantage reaches zero, the RL stops adjusting itself, and the combination of material properties, which is also the environment, has converged to the optimum properties.
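The interaction loop described above can be sketched as a small environment class. This is an illustration under stated assumptions, not the authors' implementation: the class name `PropertyEnv`, the step size of 5% of each range, the ±1 reward values, and the toy quadratic objective are all hypothetical, and the objective merely stands in for a finite element evaluation.

```python
import random

class PropertyEnv:
    """Toy environment: the state is the vector of material properties."""

    def __init__(self, lower, upper, objective, step_fraction=0.05, seed=0):
        self.lower, self.upper = list(lower), list(upper)
        self.objective = objective          # stand-in for a finite element run
        self.step_fraction = step_fraction  # assumed step size, not from the paper
        rng = random.Random(seed)
        # Start from a random combination of properties inside the bounds.
        self.state = [rng.uniform(lo, hi) for lo, hi in zip(self.lower, self.upper)]
        self.value = objective(self.state)

    def step(self, action):
        """Action 2*i increases property i, action 2*i + 1 decreases it."""
        index, parity = divmod(action, 2)
        sign = 1.0 if parity == 0 else -1.0
        span = self.upper[index] - self.lower[index]
        moved = self.state[index] + sign * self.step_fraction * span
        # Clamp so the design never leaves the base-material range.
        self.state[index] = min(self.upper[index], max(self.lower[index], moved))
        new_value = self.objective(self.state)
        # Positive reward only if the objective (to be minimized) improved.
        reward = 1.0 if new_value < self.value else -1.0
        self.value = new_value
        return list(self.state), reward

# Toy objective with its minimum at the lower bounds.
env = PropertyEnv([0.0, 0.0], [1.0, 1.0], lambda x: sum(v * v for v in x))
state, reward = env.step(1)  # try decreasing the first property
```

An agent would call `step` repeatedly and use the returned rewards to update its policy; the update itself is outside the scope of this sketch.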

The A2C structure consists of two convolutional neural networks: one serves as the actor and the other as the critic. The actor network has one input layer, two hidden layers, and one output layer. Its inputs are the material properties, which also define the environment, and its outputs are the probabilities of selecting the different actions. The critic network consists of one input layer, one hidden layer, and one output layer. It takes the actor output and the current environment (material properties) as its input and outputs a Q-Value that evaluates the quality of the actor output [

For each action, the Deep RL receives a reward. The two convolutional neural networks can adjust themselves based on one reward from the environment, or they can adjust after selecting several actions and receiving several rewards from the environment. In this project, the one-step Monte Carlo return is chosen. In

The goal for the adjustment of two convolution neural networks is to minimize the loss function (

In this study, the Deep RL is initialized with randomized values for properties within the ranges of base material properties. Next, the actor takes the current material properties as input and produces an output that determines whether to increase or decrease each of the material properties. Backpropagation of convolutional neural networks only takes place after receiving the reward. The design problems discussed in Section 2 are then solved using the Deep RL framework. The changes in the objective function values are shown in

In problem-1, the randomly generated initial data point provided a VMS value around 1121.5 MPa. The Deep RL successfully learned and determined the optimum choice for the material properties. Although the Deep RL aims to learn the data to make the best decision, the output of the actor is the probability of increasing or decreasing one material property, which means that there is still a small chance for the program to select a wrong action. After 200 iterations, the minimum VMS was found as 1116.507 MPa. This result is the same as the finite element-based optimization solution.

In problem-2, the final result produced by the Deep RL for VMS is 1114.1 MPa, while the finite element-based optimization solution was found as 1118.4 MPa. The Deep RL provided a better optimum solution in this problem due to the limitations arising from the gradient-based optimization algorithm (SQP) used with the finite element solution, such as the likelihood of producing local minimum solutions rather than the global solution. The gradient-based optimization is significantly dependent on the selection of the initial design point. For instance, with the change of the initial guess, the optimization result of the finite element solution also converged to 1114.1 MPa in this problem.

In problem-3, the input environment consisted of 10 variables and the output used 20 probabilities that involve increasing or decreasing material properties.
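To make the 10-variable, 20-probability layout concrete, the sketch below converts raw actor outputs into probabilities, samples an action, and decodes it into a variable index and a direction. The softmax output layer, the even/odd encoding of increase/decrease, and all function names are assumptions for illustration; the source only states that the actor outputs one probability per increase or decrease action.

```python
import math
import random

def softmax(logits):
    """Turn raw network outputs into probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_action(action, n_variables=10):
    """Map an action index to (variable index, +1 for increase / -1 for decrease)."""
    if not 0 <= action < 2 * n_variables:
        raise ValueError("action index out of range")
    variable, parity = divmod(action, 2)
    return variable, (1 if parity == 0 else -1)

def sample_action(probabilities, rng):
    """Draw one action index according to the actor probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

rng = random.Random(0)
logits = [0.0] * 20          # an untrained actor: all actions equally likely
probs = softmax(logits)
action = sample_action(probs, rng)
variable, direction = decode_action(action)
```

Because the action is sampled rather than chosen greedily, a trained actor can still occasionally pick a suboptimal action, which matches the behavior noted for problem-1.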

In this study, a value-based approach called Q-Learning is chosen as the second RL strategy. The Q-Learning algorithm takes two inputs from the environment, the state and the action, and returns the expected future reward of that state-action pair. The driving equation behind this algorithm is the Bellman Equation [

The Bellman equation yields a “Q-Value,” which is a measure of the quality of performing an action at a given state. These values are stored in a Q table that is continually updated until the learning has stopped or the values have converged. The rows in the Q table are the states and the columns are the actions. The highest values in any row define the best action at that particular state.
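The Q table update can be sketched as follows. The equation used in the study is not reproduced in the text, so this sketch assumes the standard tabular Q-Learning form of the Bellman update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', ·) − Q(s, a)]; the learning rate `alpha` and discount factor `gamma` values are placeholders, not values reported by the authors.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update and return the new Q-Value."""
    best_next = max(q_table[next_state])           # value of the best next action
    target = reward + gamma * best_next            # Bellman target
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

def best_action(q_table, state):
    """The column with the highest Q-Value in a row is the best action."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)

# Rows are states and columns are actions (6 states x 18 actions as an example).
q_table = [[0.0] * 18 for _ in range(6)]
q_update(q_table, state=0, action=3, reward=1.0, next_state=1)
```

After the single update above, `best_action(q_table, 0)` points at action 3 because it is the only entry in that row with a positive value.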

The approach used to determine which action is chosen at each episode in which a Q-Value is calculated is called Epsilon-Greedy. This strategy balances exploration (choosing actions at random) and exploitation (choosing actions based on their Q-Value). The rate of exploration is determined by the exploration rate of

The Q-Value RL model consists of a separate function that maps each action to the appropriate next state and objective function. The objective function given by the action is then used in a reward structure to assign the correct reward value for the action that was chosen. The next state was determined based on the action that was taken. Both the next state value and the reward value were then used to compute the Q-Value for the current state-action pair and added to the Q table. The process was repeated for 1000 iterations to allow the table to converge.
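The Epsilon-Greedy selection mentioned above can be written in a few lines. This is a generic sketch: the function name is hypothetical, and the exploration rate is left as a parameter because its value is not given here.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                   # exploration
    return max(range(len(q_row)), key=q_row.__getitem__)   # exploitation

rng = random.Random(1)
q_row = [0.0, 2.0, 1.0]                      # Q-Values of one state (one table row)
greedy_choice = epsilon_greedy(q_row, 0.0, rng)   # epsilon = 0: always exploit
random_choice = epsilon_greedy(q_row, 1.0, rng)   # epsilon = 1: always explore
```

With an intermediate `epsilon`, most iterations reuse the current best action while a fraction still samples other actions, which keeps rarely visited entries of the Q table from staying at their initial values.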

This code laid a foundational structure for the design problems involving an orthotropic material. For the first problem of finding the optimum material properties, vectors that divide the range of each of the 9 material properties into equal increments were created. To limit the computational cost, the increment size was set to 20% of each range. The Q table was set up so that each material property increment is a state, and a total of 18 actions represent increasing or decreasing each of the 9 material properties. In this case, there are 6 total states; each odd action increases a material property by one increment, and each even action decreases it by one increment. To keep track of where each material property lies within its range, an index vector was created.

To map each action to the correct next state and to calculate the resulting VMS, a new function was created. This function takes in the state and action pair as well as the index vector to properly assess each state and change the index vector according to the action that was called. One important note is that this function starts the index vector according to the state that is selected. This means that if the third state is called, then all index vector entries will be the value of three. Then, based on the action value associated with a material property, the corresponding index vector will change. Since there are edge cases, the first and last states will not allow for decreasing (first state) or increasing (last state) the index vector. Each new state is the next successive state until the sixth and final state, wherein the next state is randomly chosen. Finally, the VMS is calculated based on the new set of index vector variables.
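The discretization and the state-action mapping described in the last two paragraphs can be sketched as below. The names `make_grid` and `model`, the normalized 0 to 1 grids, and the use of `sum` as a placeholder objective are assumptions; in the study the objective would be the VMS from the finite element model or its surrogate. States and actions are numbered from 1 to follow the odd/even convention in the text.

```python
import random

N_STATES = 6  # levels at 0 %, 20 %, ..., 100 % of each property range

def make_grid(lower, upper, n_levels=N_STATES):
    """Equally spaced levels across one property range."""
    step = (upper - lower) / (n_levels - 1)
    return [lower + k * step for k in range(n_levels)]

def model(state, action, grids, objective, rng):
    """Map a (state, action) pair to the next state and the objective value.

    state: 1..N_STATES, action: 1..2*len(grids).
    Odd actions increase a property by one level, even actions decrease it.
    """
    index = [state] * len(grids)          # every property starts at the state's level
    prop = (action - 1) // 2              # which property the action touches
    if action % 2 == 1:
        if state < N_STATES:              # last state cannot increase
            index[prop] += 1
    elif state > 1:                       # first state cannot decrease
        index[prop] -= 1
    values = [grid[i - 1] for grid, i in zip(grids, index)]
    # Successive states are visited in order; after the last one, pick at random.
    next_state = state + 1 if state < N_STATES else rng.randint(1, N_STATES)
    return next_state, objective(values), index

rng = random.Random(0)
grids = [make_grid(0.0, 1.0) for _ in range(9)]   # 9 normalized property ranges
ex_grid = make_grid(159000.0, 224600.0)           # E_x range from the base materials
next_state, value, index = model(3, 1, grids, sum, rng)
```

With 20% increments, the second level of the E_x grid is 172120 MPa, a value that also appears among the reported Q-Value RL results.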

The VMS returned by the function is used in the reward structure that feeds the Bellman equation, so it directly affects the quality of the resulting Q table. The goal of this problem was to minimize the maximum VMS, and the reward structure had to reflect that. Two reward structures were used to see the effect of the reward structure on Q table quality and convergence. The first, called the simple reward structure, compares the new VMS value with the old VMS value: if the new VMS is greater than the old, a negative reward is given; if the new VMS is less than the previous value, a positive reward is given. The second reward structure uses the relative error between the previous and the newly calculated VMS to decide how rewards are given. If the relative error between the new and previous VMS values was less than or equal to 10 percent, a positive reward was given. If the error was greater than 10 percent, a negative reward was given. The relative error equation is given below for reference.
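The two reward structures can be sketched as follows. The reward magnitude of 1, the zero reward for an unchanged VMS, and the definition of relative error as |new − old| / |old| are assumptions, since the original equation and reward values are not available in the text.

```python
def simple_reward(new_vms, old_vms, magnitude=1.0):
    """Positive reward if the VMS decreased, negative if it increased."""
    if new_vms < old_vms:
        return magnitude
    if new_vms > old_vms:
        return -magnitude
    return 0.0  # assumption: an unchanged VMS earns no reward

def relative_error(new_vms, old_vms):
    """Assumed definition: change relative to the previous VMS value."""
    return abs(new_vms - old_vms) / abs(old_vms)

def relative_error_reward(new_vms, old_vms, tolerance=0.10, magnitude=1.0):
    """Positive reward if the relative error is within the 10 percent tolerance."""
    if relative_error(new_vms, old_vms) <= tolerance:
        return magnitude
    return -magnitude

# Illustrative VMS values in MPa taken from the range reported in the tables.
r_simple = simple_reward(1114.33, 1120.65)
r_relative = relative_error_reward(1114.33, 1120.65)
```

Note that the relative error structure rewards small changes in either direction, so it does not by itself distinguish an improvement from a slight deterioration.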

The two reward structures were compared against each other to see their effects on Q table quality and convergence. In terms of quality, the simple reward structure gave smaller Q-Values than the relative error structure, which made distinguishing the best action for each state somewhat harder. The convergence of both structures was very similar in that they took roughly the same amount of time to complete the desired number of iterations. Overall, they performed similarly and gave the same Q table trends.

The same Q-Value RL code was used to complete the second and third optimization problems for the orthotropic material. Minor adjustments were made to accommodate the new quantities, but each code is built on the same foundation of Epsilon-Greedy and a “model” function to map the states to actions of the different material properties.

The main purpose of the Q-Value RL framework is to serve as a verification tool and supplementary code that confirms results achieved through the finite element and Deep RL solutions. Rather than finding one minimized or maximized property, the Q-Value RL code serves to narrow down the range of each property value to find the optimum values to minimize VMS and maximize natural frequency. The next section discusses the results obtained by the RL Code for each of the three problems involving the orthotropic material microstructure.

The Q-Value RL framework was run for the three design problems (problem-1, problem-2, and problem-3). The optimum design parameters obtained by Q-Value RL using the RRRR boundary conditions are shown in

Material property | Trial 1 | Trial 2 | Trial 3 |
---|---|---|---|
E_{x} (MPa) | 159000–161624 | 159000 | 159000–161624 |
E_{y} (MPa) | 13740 | 13948–14000 | 13740 |
E_{z} (MPa) | 12700 | 12804–12856 | None |
G_{xy} (MPa) | 4720 | 4720 | 4720–4736 |
G_{yz} (MPa) | 2400–2476 | 2400 | 2400 |
G_{zx} (MPa) | 4416 | 4400–4416 | 4400–4416 |
ν_{xy} | 0.3060–0.3088 | 0.3060 | 0.3060–0.3088 |
ν_{zx} | 0.01 | 0.0256–0.0308 | 0.0152–0.0204 |
ν_{yz} | 0.0152–0.0204 | 0.01 | 0.01 |

When comparing the results of the RL code and the optimization solution for problem-1, the results of the RL code are very similar to the finite element based optimization solution, except

For problem-2, the material properties do not match the results of the finite element-based optimization solution as closely. Some material properties were close, but others varied slightly. This discrepancy is believed to stem from the separation of the y- and z-direction values, which may have caused fluctuations in the VMS and thereby distorted the Q-Values.

Material property | Trial 1 | Trial 2 | Trial 3 |
---|---|---|---|
E_{x} (MPa) | 161624–164248 | 172120 | 164248–166872 |
E_{y} (MPa) | 13792–13844 | 13740 | 13740 |
E_{z} (MPa) | 12752–12804 | 12700 | 12752–12804 |
G_{xy} (MPa) | 4720–4736 | 4720–4736 | 4720 |
G_{yz} (MPa) | 2400 | 2552–2628 | 2476–2552 |
G_{zx} (MPa) | 4400 | 4400 | 4400 |
ν_{xy} | 0.3060–0.3088 | 0.3060–0.3088 | 0.3088–0.3116 |
ν_{zx} | 0.01 | 0.01 | 0.01 |
ν_{yz} | 0.01 | 0.01 | 0.0256–0.0308 |
α_{xx} (1/°C) | 4 × 10^{−7} | 4 × 10^{−7} to 1.22 × 10^{−6} | 4 × 10^{−7} |
α_{yy} (1/°C) | 2.4 × 10^{−5} to 2.5 × 10^{−5} | 2.3 × 10^{−5} to 2.4 × 10^{−5} | 2.3 × 10^{−5} |
α_{zz} (1/°C) | 2.5 × 10^{−5} to 2.6 × 10^{−5} | 2.3 × 10^{−5} | 2.3 × 10^{−5} |

Material property | Trial 1 | Trial 2 | Trial 3 |
---|---|---|---|
E_{x} (MPa) | 159000–161624 | 159000 | 159000 |
E_{y} (MPa) | 13740 | 13740–13792 | 13740 |
E_{z} (MPa) | 12752–12804 | 12700 | 12700–12752 |
G_{xy} (MPa) | 4720 | 4720 | 4720 |
G_{yz} (MPa) | 2476–2552 | 2400 | 2476–2552 |
G_{zx} (MPa) | 4400 | 4416–4432 | 4400 |
ν_{xy} | 0.3088–0.3116 | 0.32 | 0.3060–0.3088 |
ν_{zx} | 0.01 | 0.01–0.0152 | 0.0152–0.0204 |
ν_{yz} | 0.0152–0.0204 | 0.01 | 0.01–0.0152 |
ρ (g/cm^{3}) | 1.60–1.88 | 1.60 | 1.60 |

For problem-3, the results from the three trials show that the Q-Value RL code was able to successfully match with the optimization solution with minor discrepancies. This is mainly due to the large number of iterations showing a clear trend in the Q table for this problem.

Though the Q-Value RL code functions properly, there are still a few disadvantages of using this Q-Value RL approach to find the optimum material properties for minimizing VMS and maximizing natural frequency. The first is that the computation time using the finite element simulations is excessive. Therefore, an ANN surrogate model is utilized for this study to obtain the results with the Q-Value RL framework. Another issue is that the framework does not provide a specific value for the optimum material properties. It is only capable of showing tendencies for each material property within their respective ranges. It also does not show the minimum VMS, or maximum natural frequency, that is computed using optimum properties. Overall, it is a good validation technique for the development of more complex calculation methods and shows that rudimentary RL concepts can be applied to finite element-based design practices.

Due to the iterative nature of optimization solutions, design studies can require significant computational times, especially for cases that involve multi-variable, non-linear problems like the finite element modeling optimization problems in this study. Therefore, this paper investigates and compares the benefit of incorporating ANNs with different network structures to estimate optimum microstructure designs. Instead of solely using the functions generated from finite element modeling, a neural network is also implemented with the SQP algorithm to find the desired material property combinations. Then, the differences between the estimated neural network and actual optimum values are compared against each other to analyze the integrity of using artificial networks to reduce the computational times for designing multi-phase microstructures.

Before creating the neural network, 1000 sample data points are collected using the finite element model. These input data points are chosen randomly within the given material property ranges referenced in

Additionally, the actual finite element results are tested against single-layer neural networks containing 10 and 20 nodes. Similar trials are repeated with double-layer networks containing 5 and 10 nodes. All neural network training is done using the Levenberg–Marquardt algorithm to fit the input and output data over 1000 epochs. Once the neural networks are trained, the network function is passed to an optimizer to determine the desired material properties, which may require either the maximum or the minimum of the design function depending on the problem statement. For the three problems investigated in this study, the SQP algorithm was able to handle both cases. Although the SQP optimizer produced results close to the actual values, it alternated between different answers depending on the initial guess. This behavior suggested that the relationship between the material inputs and outputs contains several local minima, and the optimizer would get stuck in one of them if given an uninformed initial point. Similar behavior was also noted for the deep reinforcement learning algorithm discussed in Section 3.1. Thus, a GlobalSearch object was utilized to find a global minimum consistently in the optimization step of the problem.
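The sensitivity of a gradient-based optimizer to its starting point, and the benefit of trying several starting points, can be demonstrated on a one-dimensional toy function with two minima. This sketch is not the GlobalSearch algorithm and does not use SQP; it uses plain projected gradient descent from a fixed set of starting points, and the test function, step size, and iteration count are arbitrary choices for illustration.

```python
def gradient_descent(grad, x0, lower, upper, step=0.01, iterations=2000):
    """Projected gradient descent on a bounded 1-D interval."""
    x = x0
    for _ in range(iterations):
        x = min(upper, max(lower, x - step * grad(x)))
    return x

def multi_start(f, grad, starts, lower, upper):
    """Run a local search from every start and keep the best local minimum."""
    candidates = [gradient_descent(grad, x0, lower, upper) for x0 in starts]
    return min(candidates, key=f)

def f(x):
    """Toy objective with a local minimum near 1.13 and a global one near -1.30."""
    return x ** 4 - 3.0 * x ** 2 + x

def grad(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

local_only = gradient_descent(grad, 2.0, -2.0, 2.0)            # single start
best = multi_start(f, grad, [-2.0, -1.0, 0.0, 1.0, 2.0], -2.0, 2.0)
```

Starting from 2.0 alone, the search settles in the shallower minimum, whereas the multi-start run returns the deeper one; this mirrors the behavior reported for the SQP optimizer with different initial guesses.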

Overall, the artificial networks were highly successful in predicting the correct output values for all three problems. In fact, after a certain number of data points in the training set, the neural networks consistently produced values within the target data. Furthermore, after running the network function into an optimizer, many of the cases had results within 1% of the expected value found from traditional methods. The best neural network architecture investigated in the study contained one layer with 20 nodes since it provided accurate results without overfitting the training data and was relatively quick to run compared to the two-layer network.

The ANN framework was run for the three optimization design problems discussed in the previous sections (Mechanical Performance, Thermo-mechanical Performance, and Natural Frequency). For problem-1, the results from optimization converged to the expected value for training sets containing around 400 data points. Problem-2 converged at around 300 data points. Finally, problem-3 converged at around 400 data points. These values suggest that to have a satisfactory artificial neural network, the training data set must be at least 400 data points. The average error and convergence of the objective functions as a function of the number of training data points is visualized in

After comparing the Q-Learning, Actor-Critic, and ANN solutions with the finite element analysis calculations, the ML methods consistently predicted the optimum results accurately. In particular, the ANN required much less calculation time than the finite element methods, since it only needed to be trained once before generating a vector function that estimates the relationship between the input and output variables. After collecting 1000 sample points for training, the framework can create and optimize 20 individual networks for 3 trials in under 10 min on a basic computational platform that utilizes a single processor. Compared to the computational times needed to run the finite element-based simulations, this new method is much faster while maintaining high accuracy in the results.

The summary of the research outcomes and potential extensions for future work are outlined below:

- A finite element analysis library that can analyze the deformation behavior and natural frequencies of periodic microstructures under coupled mechanical and thermal loads is developed.
- A Deep RL framework is developed and used to generate results that are compatible with the high-fidelity finite element solutions.
- The Q-Value RL is investigated as a way to narrow down the potential solutions for optimum designs while eliminating a large number of microstructural degrees of freedom.
- An ANN framework is developed to improve the computation times of the proposed ML strategies, including the concept of Deep RL.
- Many different settings of the three proposed ML frameworks are investigated to better understand their nature and applicability to different design problems.

The results are demonstrated for multi-phase microstructure design for various objectives, as specified in the Mechanical, Thermo-Mechanical, and Natural Frequency problems. Although the number of design variables was limited in the application problems, all different frameworks proposed and developed in this work are still applicable to the microstructure optimization problems with larger design spaces. Therefore, the future work for this topic will include an increased focus on the extension of the presented ML-driven design strategy to optimize multi-phase materials in larger length-scales by utilizing solution spaces that involve millions of variables.