Cognitive optimal-setting control of AIoT industrial applications with deep reinforcement learning
Published in IEEE Transactions on Industrial Informatics, 2020
Abstract
This paper addresses the challenge of overfitting in deep reinforcement learning (DRL) for industrial mechanical control, a problem that often leads to unstable control and increased production risk. We propose an expected advantage learning (EAL) method that moderates the maximum action value in expectation-based DRL. Our approach introduces a tanh softmax policy, which replaces the traditional sigmoid function with a tanh function as the activation input to the softmax operator. This modification enables the proposed method to effectively reduce value overfitting during the cognitive computing phase.
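The tanh softmax policy mentioned above can be illustrated with a minimal NumPy sketch. This is only an illustration of the general idea (squashing action values with tanh before applying softmax, which bounds the logits and damps an overestimated maximum value); the function and variable names are our own, and the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x):
    # Standard softmax over raw action values (Q-values).
    z = x - np.max(x)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def tanh_softmax(q_values):
    # Tanh softmax policy (sketch): squash each action value into
    # (-1, 1) with tanh before the softmax, bounding the logits so
    # a single overestimated Q-value cannot dominate the policy.
    return softmax(np.tanh(q_values))

# One inflated Q-value dominates the plain softmax, while the
# tanh-squashed version yields a flatter action distribution.
q = np.array([1.0, 2.0, 8.0])
p_plain = softmax(q)
p_tanh = tanh_softmax(q)
```

Because tanh saturates near 1, large Q-values are compressed before the exponentiation in softmax, so the resulting action probability distribution is less peaked and less sensitive to value overestimation.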
In our experiments, we evaluated the EAL method against the traditional Deep Q-Network (DQN) and the advantage learning (AL) algorithm on four metrics: total score, total steps, average score, and highest score per episode. Compared with the AL algorithm, the proposed EAL method increased the total score by 6% over the same number of training episodes. This indicates that the action probability distribution generated by EAL offers superior performance for optimal-setting control in industrial applications, outperforming conventional softmax strategies.
