Open Access

ARTICLE

Research on Maneuver Decision-Making of Multi-Agent Adversarial Game in a Random Interference Environment

Shiguang Hu1,2, Le Ru1,2,*, Bo Lu1,2, Zhenhua Wang3, Xiaolin Zhao1,2, Wenfei Wang1,2, Hailong Xi1,2

1 Equipment Management and UAV Engineering College, Air Force Engineering University, Xi’an, 710051, China
2 National Key Laboratory of Unmanned Aerial Vehicle Technology, Xi’an, 710051, China
3 College of Information Technology, Nanjing Police University, Nanjing, 210023, China

* Corresponding Author: Le Ru. Email: email

(This article belongs to the Special Issue: Advanced Data Science Technology for Intelligent Decision Systems)

Computers, Materials & Continua 2024, 81(1), 1879-1903. https://doi.org/10.32604/cmc.2024.056110

Abstract

The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates the issue of strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants’ manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agent’s strategies. Especially in a highly uncertain environment, even small information asymmetry or miscalculation may have a significant impact on decision-making. This also confirms the feasibility and effectiveness of the method proposed in the paper, which can better explain the behavioral decision-making process of the agent in the interaction process. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of the game system model.

Keywords


1  Introduction

Multi-agent systems play a crucial role in modern complex systems, such as autonomous driving of unmanned vehicles, energy management in smart grids, financial market trading strategies, and coordinated confrontation of drone swarms. In these domains, multiple agents need to coordinate and make decisions to achieve optimal overall benefits. The coordinated confrontation of drone swarms in the aerial domain, as a typical application, involves not only various maneuvers and tactical strategies of the agents but also the interactive behaviors of both sides in a constantly changing dynamic environment. Each agent must make real-time decisions in a complex and dynamic environment to ensure maximum overall effectiveness. Ensuring that agents make optimal decisions in dynamic environments has gradually become a hot topic in current academic research.

In the aerial confrontation scenario, the strategy interaction and behavioral decision-making of agents constitute a highly complex and rapidly evolving macro-process [1]. Influenced by the dynamic changes of the situational environment, the asymmetry of the game players' perceived information, random emergent situations, other external interference, and many other factors, the game evolution process of agents is highly uncertain [2–4]. These uncertain emergencies are difficult to predict effectively; they directly affect the strategy formulation and action execution of the game players and have a significant impact on the final outcome of the game. The level of decision-making and the formulation of operational strategies can largely determine the outcome of the game and even affect the development of the macro confrontation scenario [5]. Therefore, conducting in-depth research on the behavioral strategy evolution mechanisms of aerial game agents in environments with random disturbances, exploring the behavioral learning processes of intelligent agents, and identifying and predicting potential adversarial mechanisms and behavior patterns can lead to a better understanding and prediction of various random emergent situations in real-world aerial confrontation scenarios. This research can reveal how tactics and strategies of intelligent agents undergo natural selection and evolution under different environments and conditions. Consequently, it can aid in the development of more efficient and intelligent decision support systems, enabling strategies to adapt to continuously changing situational environments. This, in turn, enhances the adaptability and flexibility of autonomous decision-making systems, providing more accurate decision recommendations and theoretical foundations for command-and-control personnel to formulate more effective counter-strategies. Such advancements hold significant theoretical value and practical significance for the improvement and development of multi-agent decision-making systems in the real world.

Until now, numerous scholars have conducted research and achieved corresponding results in solving decision-making problems in aerial simulation confrontation scenarios. Specific approaches include differential games, matrix games, deep reinforcement learning, heuristic algorithms [6,7], and others. In terms of differential games, the relevant literature describes the strategies and behavioral state changes of both adversarial parties in continuous time by establishing differential models of dynamic systems. In the 1960s, Isaacs [8] first studied the pursuit-evasion maneuvering decision problem based on differential games from a mathematical perspective and proposed an analytical method for optimal strategies. Although the results were not entirely satisfactory due to model simplifications and the limitations of the mathematical methods, they provided inspiration for later studies. Garcia et al. [9] investigated the active-defense cooperative differential game problem involving three game participants, focusing on the target differential game problem under active control. Park et al. [10] proposed a method based on differential game theory to study within-visual-range air combat for Unmanned Combat Aerial Vehicles (UCAVs); the algorithm employs a hierarchical decision-making structure and uses differential game theory to compute the scoring function matrix, thereby solving for the optimal maneuvering strategies in dynamic and challenging combat situations. These studies primarily analyze the aerial combat process in continuous time and space, enabling precise simulation of dynamic changes during air battles. However, the modeling process is complex, the results are strongly model-dependent, the solution difficulty is high, real-time performance is low, and multi-dimensional, multi-variable problems are hard to address.

In terms of matrix games, related research employs a discretization approach to describe the strategy space and payoffs of the opposing sides. Austin et al. [11] were the first to utilize matrix game methods to study one-on-one low-altitude air combat maneuvering decisions in a hilly battlefield environment. Li et al. [4] proposed a constrained-strategy matrix game approach for generating intelligent decisions for multiple UCAVs in complex air combat environments using air combat situational information and time-sensitive information. Li et al. [12] proposed a dimensionality-reduction-based matrix game solving algorithm for solving large-scale matrix games in a timely manner. The modeling process of matrix games is relatively straightforward, allowing for an intuitive demonstration of the strategy choices and payoff situations of both players. However, the approach relies heavily on prior knowledge, lacks flexibility in application, and struggles to handle continuous state and action spaces effectively. Additionally, it cannot respond in real time to complex dynamic adversarial environments, making large-scale scenario problems difficult to solve.

In terms of deep reinforcement learning, relevant research primarily utilizes deep neural networks and reinforcement learning algorithms, which are capable of addressing adversarial problems in high-dimensional and complex environments. Cao et al. [13] addressed the difficulty UCAVs face in quickly and accurately perceiving situational information and autonomously making maneuvering decisions in modern air warfare affected by complex factors, and proposed a UCAV maneuvering decision-making algorithm combining deep reinforcement learning and game theory. Liles et al. [14] developed a Markov Decision Process model of the stochastic dynamic allocation problem for the efficient and intelligent multi-program allocation decisions faced by air force combat managers, and used approximate dynamic programming techniques to learn high-quality solutions to the Air Battle Management Problem (ABMP). Léchevin et al. [15] proposed a hierarchical decision-making and information system designed to provide coordinated aircraft path planning and deception warfare missions in real time. Zhang et al. [16] constructed a beyond-visual-range air combat training environment and proposed a heuristic Q-network method that incorporates expert experience; this method aims to improve the efficiency of reinforcement learning algorithms in exploring strategy spaces, achieving self-learning of air combat maneuver strategies. Li et al. [17] proposed a UCAV autonomous maneuvering decision-making method based on game theory that considers the partially observable state of the adversary. Poropudas et al. [18] presented a new game-theoretic approach for validating discrete-event air combat simulation models and for simulation-based optimization. Sun et al. [19] proposed a novel Multi-Agent Hierarchical Policy Gradient (MAHPG) algorithm that learns various strategies and outperforms expert cognition through adversarial self-play learning. Deep reinforcement learning exhibits strong adaptability, allowing strategies to be improved continuously during training, and it can handle problems with continuous time and continuous state spaces. However, the training process requires substantial amounts of data and computational resources. The results of model training are highly dependent on the quality and diversity of the training data, which may lead to overfitting or underfitting. Furthermore, the "black box" nature of the model limits its credibility and interpretability, making it challenging to explain and understand how specific strategies are formed.

Additionally, other studies have analyzed the dynamic evolution process of aerial adversarial games. Hu et al. [20] studied the evolutionary mechanism of Unmanned Aerial Vehicle (UAV) swarm cooperative behavioral strategies in communication-constrained environments. Gao et al. [21] proposed an evolutionary game theory-based target assignment method for multi-UAV networks in 3D scenarios. Sheng et al. [22] proposed a posture evolution game model for the UAV cluster dynamic attack-defense problem. However, these studies did not take into account the interference of random environmental factors during confrontation.

To a certain extent, these methods provide effective solutions for multi-agent game decision-making, but they also have limitations. In particular, existing research results are mainly based on the premise that players are completely rational and focus on solving the optimal strategy at a particular moment or in a particular situation, i.e., the Nash equilibrium. They fail to give full consideration to the dynamic changes during the confrontation, neglect the stochastic complexity of the environment and the uncertainty caused by information interference, lack analysis of the specific factors influencing the dynamic game process, and cannot study the whole process of the players' coping-strategy selection from a global perspective. In contrast, stochastic evolutionary game theory takes into account the bounded rationality and dynamic learning processes of agents, as well as the impact of random disturbances. This allows the participants' strategy selection processes to be simulated over time, reflecting the dynamic changes in the maneuvering behaviors of agents in aerial confrontation. Therefore, this paper investigates the strategy interactions and decision-making behaviors of boundedly rational game agents in simulated adversarial scenarios under the influence of random disturbances, based on stochastic evolutionary game theory. Specifically, the main research content and contributions of this paper are as follows:

(1) Considering the risks that random disturbances in complex situational environments may pose to the autonomous decision-making of game players, as well as the impact of participants’ maneuvering behaviors on the state changes of game players in simulated adversarial scenarios, a nonlinear mathematical model is established to describe the strategy decision-making process of participants in this context.

(2) Through simulation experiments, the interactive relationships of strategy selection among game players, the stability of strategy evolution, and the dynamic decision-making processes were studied and validated. The study delves into the mechanisms by which certain factors influence agent behavior choices in specific environments. The results indicate that environmental disturbance factors and parameters related to agents’ maneuvering actions have varying impacts on the selection and evolution speed of agents’ strategies.

(3) This research explores the decision-making behaviors of multiple agents in aerial combat games, providing a new perspective for the study of agent strategy selection mechanisms. It offers a reliable quantitative analysis foundation for the observed results and fills a research gap in this field. The findings not only help decision-makers understand the dynamic changes in strategies under different factors and environmental conditions but also provide feasible analytical approaches and theoretical references to enhance the interpretability of multi-agent interactive decision-making and game system models.

2  Maneuvers Discussion

In the multi-agent simulation confrontation scenario, the behavioral decision-making of the agent is a highly complex game process involving the interaction of multiple variables and factors, and the state and behavior of the agent evolve with the escalation of the conflict between both game players and the change of the situation. Among them, the maneuvers executed by the agent are crucial to the implementation of their maneuvering strategies, and the quality of the execution of these maneuvers not only directly affects the effectiveness of the strategies but also has a significant impact on the game state of the entire simulation confrontation.

Maneuvers of an agent refer to the flight techniques and maneuvering strategies adopted by the agent in conflict to gain an advantage, avoid risk, or achieve a specific purpose. Currently, there are two common ways of classifying maneuvers: basic maneuvers based on the operational mode [23], and typical maneuvers based on tactical maneuver theory. Scholars at the National Advisory Committee for Aeronautics classified the most commonly used maneuvers into seven basic maneuvers based on the operational mode [24,25]: uniform straight-line flight, maximum-acceleration flight, maximum-deceleration flight, maximum-overload left turn, maximum-overload right turn, maximum-overload climb, and maximum-overload dive. Complex maneuvers can be generated by combining and sequencing this group of basic maneuvers. From the point of view of tactical theory and demonstrated effect, the typical maneuvers include [26,27]: High-G Turn, Roll, Barrel Roll, Split-S, Scissors, Immelmann Turn, Pugachev's Cobra, Falling Leaves, Loop, High Yo-Yo, Low Yo-Yo, and others.

However, these two types of maneuvers are still essentially two-dimensional or even multi-dimensional combinations of typical basic maneuvers such as straight flight, turning, and pitching, and both classification schemes have limitations. The classification based on the operational mode focuses mainly on the physical operation level of the agent; it oversimplifies the complexity of actual confrontation, ignores the application and effect of these maneuvers in specific situations, and does not fully consider how differences in the performance of the game players affect the use of complex maneuvers. The classification based on tactical maneuver theory is highly dependent on the specific environment, equipment performance, and the posture of both sides; it is difficult to cover comprehensively all possible changes and the opponent's reactions, and there are discrepancies and lags between daily training and practical application, so the latest tactical dynamics cannot be reflected promptly. In general, the two classifications have their respective focuses but share some common problems. In rapidly developing and highly technological modern confrontation, neither can fully cover all possible maneuvers; moreover, maneuvers are often interrelated and affect each other, so a single classification can hardly reflect the interaction and synergy between different maneuvers. To overcome these problems, a more integrated classification is needed, one that takes into account both the practicality and flexibility of operation and the theoretical and practical applicability of the maneuvers, while being updated continuously to accommodate technological developments and the accumulation of practical experience.

Therefore, based on these two classifications and drawing on the maneuver designs in the literature [28,29], this paper combines the strategic intent during interactive confrontation and divides the typical maneuvers commonly used by players in actual confrontation, according to the characteristics and expected goals of different maneuvers, into three categories: offensive maneuvers, defensive maneuvers, and offensive-defensive maneuvers. Offensive maneuvers have the primary goal of maximizing strike effectiveness, focusing on using the performance advantages and firepower systems of the game players to execute precision strikes against incoming targets; they mainly include straight-flight turns, target tracking, and fire attacks. Defensive maneuvers have the priority goal of ensuring the game players' safety and survival, focusing on evading the opponent's attack and protecting the game players from damage; this strategy requires a high degree of maneuverability and the ability to adapt quickly to the environment and mainly includes straight-flight turns, danger warnings, and circling evasions. Offensive-defensive maneuvers can be used both as offensive strategies to enhance the players' effectiveness in attack and as defensive strategies to provide effective defensive flight maneuvers when necessary; this strategy has a balanced performance in terms of speed, maneuverability, concealment, and reaction speed, and its representative maneuver is the loop (somersault) maneuver. The specific divisions are shown in Fig. 1.


Figure 1: Maneuvers division

3  Deterministic Evolutionary Game Model

In the multi-agent simulation confrontation scenario, both game players must choose offensive or defensive strategies flexibly on the basis of the ever-changing environment to ensure that the benefit of attacking the target and their own safety and survival are maximized. This confrontation is a dynamic decision-making process in which the two game players repeatedly exchange roles between attack and defense as the situation evolves. When choosing a strategy, the game players must weigh the potential payoffs and possible costs while assuming a certain risk, and strive to obtain the maximum return so that every maneuver and decision lays the foundation for the ultimate victory. Evolutionary game theory provides a framework for explaining strategy interactions and behavioral decisions in multi-agent systems, and it helps us understand how individuals or groups form and adjust their strategies during long-term interactions by modeling natural selection and learning processes [30].

3.1 Modeling

Assume that both game players are boundedly rational groups with autonomous intelligent decision-making abilities, that both sides remain within each other's effective firepower range during the confrontation, and that their strategy choices follow a certain dynamic evolutionary law. Specifically, in the decision-making process, agents are capable of perceiving environmental changes and opponent behaviors in real time, independently making judgments, and autonomously adjusting strategies. However, constrained by incomplete information acquisition, limited computational capabilities, and finite response times, agents cannot achieve fully rational decision-making. Consequently, collective decision-making often results in locally optimal or near-optimal outcomes under specific conditions. The Multi-agent Interactive Confrontation Game Evolution Model (MAICGEM) can be expressed as a four-tuple, i.e., MAICGEM $=\{U,P,G,S\}$. The details are as follows:

(1) $U=(U_{Agent\_r},U_{Agent\_b})$ represents the two game players involved in the confrontation.

(2) $P=(P_r,P_b)$ denotes the belief space of strategies involved in the game, where agent_r adopts offensive and defensive maneuvering strategies with choice probabilities $(x, 1-x)$, and agent_b adopts offensive and defensive maneuvering strategies with choice probabilities $(y, 1-y)$, $0\le x, y\le 1$.

(3) $G=(G_r,G_b)$ denotes the payoff space of the players involved in the game, where $G_r$ denotes the payoff function of agent_r under different maneuvering strategies and $G_b$ denotes the payoff function of agent_b under different maneuvering strategies; the two game players obtain different payoffs according to different combinations of strategies.

(4) $S=(S_i^A,S_i^D)$ denotes the space of strategy combinations involved in the game, where $i=r,b$; $S_r=(S_r^A,S_r^D)$ represents the offensive and defensive maneuver strategies adopted by agent_r, and $S_b=(S_b^A,S_b^D)$ represents the offensive and defensive maneuver strategies adopted by agent_b.

The benefits available to the two players differ under different tactical maneuvering strategies, mainly: the basic flight gain of straight-flight turning maneuvers $G_i^{B\_F}$, the angular posture gain of target tracking maneuvers $G_i^{C\_P}$, the fire attack gain under full-process probability conditions for fire attack maneuvers $G_i^{Missile}$, the threat determination gain of hazard warning actions $G_i^{T\_J}$, the maneuvering defense gain of circling evasive maneuvers $G_i^{M\_D}$, the balanced gain in offensive-defensive transitions of somersault (loop) maneuvers $G_i^{O\_D\_W}$, and the flight fuel consumption cost $V_i^{Fuel}$ and missile expenditure $V_i^{Miss}$. The specific forms are given below:

$$\begin{cases}
G_i^{C\_P}=\left(1-\dfrac{\theta_i+\vartheta_i}{2\pi}\right)V_i^{tar}\\[4pt]
G_i^{Missile}=\prod\limits_{j=1}^{4}P_{ij}\,V_i^{tar}\\[4pt]
G_i^{T\_J}=\dfrac{T_i}{10}\,e^{\frac{\theta_i}{2\pi}}\,V_i^{tar}\\[4pt]
G_i^{M\_D}=\prod\limits_{j=1}^{4}\left(1-P_{ij}\right)V_i^{tar}\\[4pt]
G_i^{O\_D\_W}=\epsilon_1 G_i^{O}+\epsilon_2 G_i^{D}
\end{cases}\tag{1}$$

$$\begin{cases}
G_i^{O}=G_i^{B\_F}+G_i^{C\_P}+G_i^{Missile}\\
G_i^{D}=G_i^{B\_F}+G_i^{T\_J}+G_i^{M\_D}
\end{cases}\tag{2}$$

where:

$\theta_i$ is the target azimuth and $\vartheta_i$ is the target entry angle, $\theta_i\in[0,\pi]$, $\vartheta_i\in[0,\pi]$; the relationship between the two is shown in Fig. 2;


Figure 2: Schematic diagram of the simulation confrontation

$V_i^{tar}$ is the player's own inherent value;

$P_{i1}$ is the probability that the player detects a target;

$P_{i2}$ is the penetration survival probability of an air-to-air missile;

$P_{i3}$ is the probability of hitting the target after an air-to-air missile penetrates the defenses;

$P_{i4}$ is the destruction probability of an air-to-air missile that hits the target;

$T_i$ is the time remaining to evade interceptor missiles;

$G_i^{O}$ represents the offensive strategy gain and $G_i^{D}$ the defensive strategy gain;

$\epsilon_1$ and $\epsilon_2$ are the weighting coefficients of the offensive and defensive action gains, respectively, in the balanced gain of offensive-defensive switching, with $\epsilon_1+\epsilon_2=1$.
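To make the payoff structure concrete, the following sketch evaluates Eqs. (1) and (2) for one player. It is an illustrative reading of the model, not the authors' code: the function name and argument names are hypothetical, the probability terms $G_i^{Missile}$ and $G_i^{M\_D}$ are interpreted as products over the four fire-strike probabilities, and the exponent in $G_i^{T\_J}$ follows the reconstruction of Eq. (1) above.

```python
import math

def maneuver_gains(theta, vartheta, V_tar, P, T, eps1=0.5, eps2=0.5, G_BF=10.0):
    """Illustrative evaluation of the per-strategy gains in Eqs. (1)-(2).

    theta, vartheta : target azimuth and entry angle, both in [0, pi]
    V_tar           : inherent value of the player
    P               : the four fire-strike probabilities (P_i1, ..., P_i4)
    T               : remaining time to evade interceptor missiles
    eps1, eps2      : offensive/defensive weighting coefficients, eps1 + eps2 = 1
    G_BF            : basic flight gain of the straight-flight turning maneuver
    """
    G_CP = (1.0 - (theta + vartheta) / (2.0 * math.pi)) * V_tar       # angular posture gain
    G_missile = math.prod(P) * V_tar                                  # full-process fire attack gain
    G_TJ = (T / 10.0) * math.exp(theta / (2.0 * math.pi)) * V_tar     # threat determination gain
    G_MD = math.prod(1.0 - p for p in P) * V_tar                      # maneuvering defense gain
    G_O = G_BF + G_CP + G_missile                                     # offensive strategy gain, Eq. (2)
    G_D = G_BF + G_TJ + G_MD                                          # defensive strategy gain, Eq. (2)
    G_ODW = eps1 * G_O + eps2 * G_D                                   # offensive-defensive balance gain
    return {"G_CP": G_CP, "G_missile": G_missile, "G_TJ": G_TJ,
            "G_MD": G_MD, "G_O": G_O, "G_D": G_D, "G_ODW": G_ODW}
```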

In an evolutionary game, the two game players no longer take the Nash equilibrium as the final strategy choice; instead, through continuous competition, learning, and imitation, participants with low returns learn from high-return players to optimize and improve their own strategies, gradually converging to an Evolutionarily Stable Strategy (ESS). The attack-defense game tree is shown in Fig. 3.


Figure 3: Confrontation game tree

3.2 Payoff Function

From the game tree, there are four strategy combinations for the antagonists: $(S_b^A,S_r^A)$, $(S_b^A,S_r^D)$, $(S_b^D,S_r^A)$, and $(S_b^D,S_r^D)$. Because the maneuvering strategies of the two parties are asymmetric, the benefits that the two players can obtain differ under the different strategy combinations. Combined with the previous assumptions and the analysis of the main strategies of both game players, the payoff matrix is established as shown in Table 1.

Table 1: Payoff matrix of both game players under different strategy combinations

| Strategy combination | Payoff of agent_r | Payoff of agent_b |
|---|---|---|
| $(S_r^A,S_b^A)$ | $G_r^{B\_F}+G_r^{C\_P}+G_r^{Missile}+G_r^{O\_D\_W}-V_r^{Fuel}-V_r^{Miss}$ | $G_b^{B\_F}+G_b^{C\_P}+G_b^{Missile}+G_b^{O\_D\_W}-V_b^{Fuel}-V_b^{Miss}$ |
| $(S_r^A,S_b^D)$ | $G_r^{B\_F}+G_r^{C\_P}+G_r^{Missile}-V_r^{Fuel}-V_r^{Miss}$ | $G_b^{B\_F}+G_b^{T\_J}+G_b^{M\_D}-V_b^{Fuel}$ |
| $(S_r^D,S_b^A)$ | $G_r^{B\_F}+G_r^{T\_J}+G_r^{M\_D}-V_r^{Fuel}$ | $G_b^{B\_F}+G_b^{C\_P}+G_b^{Missile}-V_b^{Fuel}-V_b^{Miss}$ |
| $(S_r^D,S_b^D)$ | $G_r^{B\_F}+G_r^{T\_J}+G_r^{M\_D}-V_r^{Fuel}$ | $G_b^{B\_F}+G_b^{T\_J}+G_b^{M\_D}-V_b^{Fuel}$ |

From Table 1, the expected gain of agent_r choosing the offensive maneuvering strategy $\mathcal{T}_{11}$, the expected gain of the defensive maneuvering strategy $\mathcal{T}_{12}$, and the average expected return $\overline{\mathcal{T}}_1$ are, respectively:

$$\begin{cases}
\mathcal{T}_{11}=y\left(G_r^{B\_F}+G_r^{C\_P}+G_r^{Missile}+G_r^{O\_D\_W}-V_r^{Fuel}-V_r^{Miss}\right)+(1-y)\left(G_r^{B\_F}+G_r^{C\_P}+G_r^{Missile}-V_r^{Fuel}-V_r^{Miss}\right)\\
\mathcal{T}_{12}=y\left(G_r^{B\_F}+G_r^{T\_J}+G_r^{M\_D}-V_r^{Fuel}\right)+(1-y)\left(G_r^{B\_F}+G_r^{T\_J}+G_r^{M\_D}-V_r^{Fuel}\right)\\
\overline{\mathcal{T}}_1=x\mathcal{T}_{11}+(1-x)\mathcal{T}_{12}
\end{cases}\tag{3}$$

Further, the replicator dynamic equation for agent_r's strategy choice can be obtained as:

$$F(x)=\frac{dx}{dt}=x(1-x)\left[yG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right]\tag{4}$$

Similarly, the expected gain of agent_b choosing the offensive maneuvering strategy $\mathcal{T}_{21}$, the expected gain of the defensive maneuvering strategy $\mathcal{T}_{22}$, and the average expected return $\overline{\mathcal{T}}_2$ are, respectively:

$$\begin{cases}
\mathcal{T}_{21}=x\left(G_b^{B\_F}+G_b^{C\_P}+G_b^{Missile}+G_b^{O\_D\_W}-V_b^{Fuel}-V_b^{Miss}\right)+(1-x)\left(G_b^{B\_F}+G_b^{C\_P}+G_b^{Missile}-V_b^{Fuel}-V_b^{Miss}\right)\\
\mathcal{T}_{22}=x\left(G_b^{B\_F}+G_b^{T\_J}+G_b^{M\_D}-V_b^{Fuel}\right)+(1-x)\left(G_b^{B\_F}+G_b^{T\_J}+G_b^{M\_D}-V_b^{Fuel}\right)\\
\overline{\mathcal{T}}_2=y\mathcal{T}_{21}+(1-y)\mathcal{T}_{22}
\end{cases}\tag{5}$$

Further, the replicator dynamic equation for agent_b's strategy choice can be obtained as:

$$F(y)=\frac{dy}{dt}=y(1-y)\left[xG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right]\tag{6}$$
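As a quick illustration of how Eqs. (4) and (6) drive the deterministic strategy dynamics, the following minimal sketch advances the strategy probabilities by one forward-Euler step; the function name, the gain dictionaries, and the step size are illustrative assumptions building on the `maneuver_gains` sketch above.

```python
def replicator_step(x, y, gr, gb, V_r_miss, V_b_miss, h=0.001):
    """One forward-Euler step of the deterministic replicator dynamics, Eqs. (4) and (6).

    gr, gb : gain dictionaries for agent_r and agent_b (see maneuver_gains above)
    """
    fx = x * (1 - x) * (y * gr["G_ODW"] + gr["G_CP"] - gr["G_MD"]
                        + gr["G_missile"] - gr["G_TJ"] - V_r_miss)
    fy = y * (1 - y) * (x * gb["G_ODW"] + gb["G_CP"] - gb["G_MD"]
                        + gb["G_missile"] - gb["G_TJ"] - V_b_miss)
    return x + h * fx, y + h * fy
```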

4  Stochastic Evolutionary Game Model

4.1 Stochastic Evolutionary Game

Although evolutionary game theory overcomes the classical game-theoretic assumption that decision-makers are completely rational, it still treats strategic behavior within a deterministic system [31], whereas multi-agent simulation confrontation is a macro-complex system whose evolving dynamical mechanism contains uncertainty and randomness. In actual confrontation, changes in the environment and interference from other external factors are to some degree random, so game decision-making is also affected by the dynamic changes of the complex environment. Traditional evolutionary game theory does not address the dynamic strategy evolution of game players in a randomly disturbed environment and therefore cannot describe the uncertainty and random dynamics of the complex environment. Gaussian white noise is a nonlinear random noise whose amplitude obeys a Gaussian distribution and whose power spectral density is uniform, which makes it well suited to describing a random external environment [32,33]. Therefore, this paper introduces Gaussian white noise combined with stochastic differential equations [34] to describe the various random interference factors in complex adversarial environments [35]. Considering $x,y\in[0,1]$, the quantities $1-x$ and $1-y$ are also non-negative, $1-x,1-y\in[0,1]$, so dropping these factors has no effect on the evolutionary outcome of the game system [36]. Combining the above factors, the replicator dynamic Formulas (4) and (6) are simplified and extended to obtain the following equations:

$$dx(t)=x(t)\left[yG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right]dt+\delta_1 x(t)\,d\omega(t)\tag{7}$$

$$dy(t)=y(t)\left[xG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right]dt+\delta_2 y(t)\,d\omega(t)\tag{8}$$

where $\omega(t)$ is standard one-dimensional Brownian motion, obeying the normal distribution $N(0,t)$; this motion is random and disordered and can well portray the influence of the various random interference factors in complex adversarial environments. $d\omega(t)$ denotes Gaussian white noise, used to represent these random interference factors, and its increment $\Delta\omega(t)=\omega(t+h)-\omega(t)$ obeys the normal distribution $N(0,h)$ for $t>0$ and step size $h>0$; $\delta_1$ and $\delta_2$ are positive constants indicating the intensity coefficients of the random environmental disturbances.

Accordingly, Formulas (7) and (8) are both one-dimensional Itô stochastic differential equations, representing the stochastic replicator dynamic equations of the two game players perturbed by the various random disturbances of the complex environment. The Itô stochastic differential equation is a formulation of stochastic calculus that describes the fluctuating variation of a stochastic process [37,38]:

$$dX(t)=F(t,X(t))dt+G(t,X(t))d\omega(t)\tag{9}$$

where $t\in[t_0,T]$, $X(t_0)=X_0$, $X_0\in\mathbb{R}$.

Since the Itô stochastic differential equation is nonlinear, it cannot in general be solved analytically. It is therefore necessary to perform a stochastic Taylor expansion, in which the interval $[t_0,T]$ is partitioned uniformly with step size $h=(T-t_0)/N$, where $N$ is the number of samples, $t_n=t_0+nh$ is the $n$th sampling time point, and $n=1,2,\ldots,N$. The stochastic Taylor expansion of Formula (9) is then [39]:

$$X(t_{n+1})=X(t_n)+\Gamma_0F(X(t_n))+\Gamma_1G(X(t_n))+\Gamma_{11}K_1G(X(t_n))+\Gamma_{00}K_0F(X(t_n))+R\tag{10}$$

where $\Gamma_0=h$, $\Gamma_1=\Delta\omega_n$, $\Gamma_{11}=\left[(\Delta\omega_n)^2-h\right]/2$, $\Gamma_{00}=h^2/2$, $K_1=\partial G(X)/\partial X$, $K_0=\partial F(X)/\partial X+\frac{1}{2}\,\partial^2G(X)/\partial X^2$, and $R$ is the remainder term of the expansion.

The stochastic Taylor expansion is the basis for the numerical solution of stochastic differential equations. In practice, discrete stochastic processes are generally used to approximate the continuous solutions of stochastic differential equations. There are two main solution methods, the Euler method and the Milstein method. The Euler method is more direct, while the Milstein method uses the Itô formula to add a second-order term and thus achieves higher estimation accuracy. On this basis, the stochastic differential equations are solved numerically by the Milstein method, truncating the stochastic Taylor expansion to the following scheme [39]:

$$X(t_{n+1})=X(t_n)+hF(X(t_n))+\Delta\omega_nG(X(t_n))+\frac{1}{2}\left[(\Delta\omega_n)^2-h\right]G(X(t_n))G'(X(t_n))\tag{11}$$

Applying Formula (11), Formulas (7) and (8) are expanded and solved to obtain:

$$x_{n+1}=x_n+x_n\left[y_nG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right]h+\Delta\omega_n\delta_1x_n+\frac{1}{2}\left[(\Delta\omega_n)^2-h\right]\delta_1^2x_n\tag{12}$$

$$y_{n+1}=y_n+y_n\left[x_nG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right]h+\Delta\omega_n\delta_2y_n+\frac{1}{2}\left[(\Delta\omega_n)^2-h\right]\delta_2^2y_n\tag{13}$$
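The Milstein iteration in Eqs. (12)-(13) can be simulated directly. The sketch below is one possible implementation, not the authors' code: the function and variable names are hypothetical, the Gaussian increments are drawn with NumPy, and the clamping of the states to $[0,1]$ is an added numerical safeguard that keeps the choice probabilities in their admissible range.

```python
import numpy as np

def milstein_simulation(x0, y0, gr, gb, V_r_miss, V_b_miss,
                        delta1=0.5, delta2=0.5, h=0.001, n_steps=10000, seed=None):
    """Milstein discretization of the stochastic replicator Eqs. (7)-(8), cf. Eqs. (12)-(13).

    The diffusion term is delta * state, so its derivative is delta and the
    Milstein correction becomes 0.5 * [(dW)^2 - h] * delta^2 * state.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, y0
    for n in range(n_steps):
        dW1, dW2 = rng.normal(0.0, np.sqrt(h), size=2)   # increments ~ N(0, h)
        drift_x = x[n] * (y[n] * gr["G_ODW"] + gr["G_CP"] - gr["G_MD"]
                          + gr["G_missile"] - gr["G_TJ"] - V_r_miss)
        drift_y = y[n] * (x[n] * gb["G_ODW"] + gb["G_CP"] - gb["G_MD"]
                          + gb["G_missile"] - gb["G_TJ"] - V_b_miss)
        x[n + 1] = x[n] + drift_x * h + delta1 * x[n] * dW1 \
            + 0.5 * (dW1 ** 2 - h) * delta1 ** 2 * x[n]
        y[n + 1] = y[n] + drift_y * h + delta2 * y[n] * dW2 \
            + 0.5 * (dW2 ** 2 - h) * delta2 ** 2 * y[n]
        x[n + 1] = min(max(x[n + 1], 0.0), 1.0)          # keep choice probability in [0, 1]
        y[n + 1] = min(max(y[n + 1], 0.0), 1.0)
    return x, y
```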

4.2 System Stability Criteria

Aiming at the possible equilibrium solutions in the confrontation game system, the stability analysis of both game players is carried out according to the stability discrimination theorem of stochastic differential equations [40]. Let there exist a continuously differentiable smooth function V(t,x) and positive constants c1,c2 such that [41]:

$$c_1|x|^p\le V(t,x)\le c_2|x|^p,\quad t\ge 0\tag{14}$$

(1) If there exists a positive constant $\lambda$ such that $LV(t,x)\le-\lambda V(t,x)$ for all $t\ge0$, then the zero solution of the Itô stochastic differential Formula (9) is exponentially stable in the $p$-th moment, and $E|x(t,x_0)|^p<\frac{c_2}{c_1}|x_0|^p e^{-\lambda t}$, $t\ge0$.

(2) If there exists a positive constant $\lambda$ such that $LV(t,x)\ge\lambda V(t,x)$ for all $t\ge0$, then the zero solution of the Itô stochastic differential Formula (9) is exponentially unstable in the $p$-th moment, and $E|x(t,x_0)|^p\ge\frac{c_2}{c_1}|x_0|^p e^{\lambda t}$, $t\ge0$.

Among them, $LV(t,x)=V_t(t,x)+V_x(t,x)F(t,x)+\frac{1}{2}G^2(t,x)V_{xx}(t,x)$.

For Formulas (7) and (8), let $V(t,x)=x$, $V(t,y)=y$, $0\le x,y\le1$, $c_1=c_2=1$, $p=1$, $\lambda=1$; then $LV(t,x)=F(t,x)$, and:

$$LV(t,x)=\left[yG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right]x\tag{15}$$

$$LV(t,y)=\left[xG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right]y\tag{16}$$

For the zero-solution $p$-th moment exponents associated with Formulas (15) and (16) to be stable, the following conditions need to be satisfied:

$$yG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\le-1\tag{17}$$

$$xG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\le-1\tag{18}$$

For the zero-solution $p$-th moment exponents associated with Formulas (15) and (16) to be unstable, the following conditions need to be satisfied:

$$yG_r^{O\_D\_W}+G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\ge1\tag{19}$$

$$xG_b^{O\_D\_W}+G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\ge1\tag{20}$$

Simplifying Formulas (17) and (18), respectively, the conditions under which Formulas (15) and (16) satisfy zero-solution $p$-th moment exponential stability can be obtained:

(a)   When $G_r^{O\_D\_W}>0$, $y\le\left[-1-\left(G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right)\right]/G_r^{O\_D\_W}$, and $-1-\left(G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right)-G_r^{O\_D\_W}\le0$;

(b)   When $G_r^{O\_D\_W}<0$, $y\ge\left[-1-\left(G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right)\right]/G_r^{O\_D\_W}$, and $-1-\left(G_r^{C\_P}-G_r^{M\_D}+G_r^{Missile}-G_r^{T\_J}-V_r^{Miss}\right)-G_r^{O\_D\_W}\ge0$;

(c)   When $G_b^{O\_D\_W}>0$, $x\le\left[-1-\left(G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right)\right]/G_b^{O\_D\_W}$, and $-1-\left(G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right)-G_b^{O\_D\_W}\le0$;

(d)   When $G_b^{O\_D\_W}<0$, $x\ge\left[-1-\left(G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right)\right]/G_b^{O\_D\_W}$, and $-1-\left(G_b^{C\_P}-G_b^{M\_D}+G_b^{Missile}-G_b^{T\_J}-V_b^{Miss}\right)-G_b^{O\_D\_W}\ge0$.

In summary, the stochastic evolutionary game system consisting of the two parties involved in the game satisfies the condition of exponential stability of the zero-solution expected moments when $(a\cup b)\cap(c\cup d)$ holds. Similarly, the zero-solution expected-moment instability conditions satisfying Formulas (15) and (16) can be obtained.
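A small helper, sketched below under the same reconstruction of conditions (17)-(20), makes the criteria easy to evaluate numerically for a given strategy profile; the function name and the dictionary layout follow the earlier sketches and are illustrative only.

```python
def zero_solution_moment_conditions(gr, gb, V_r_miss, V_b_miss, x, y):
    """Evaluate the stability (17)-(18) and instability (19)-(20) conditions at (x, y)."""
    lhs_r = (y * gr["G_ODW"] + gr["G_CP"] - gr["G_MD"]
             + gr["G_missile"] - gr["G_TJ"] - V_r_miss)
    lhs_b = (x * gb["G_ODW"] + gb["G_CP"] - gb["G_MD"]
             + gb["G_missile"] - gb["G_TJ"] - V_b_miss)
    return {"agent_r_stable": lhs_r <= -1.0, "agent_b_stable": lhs_b <= -1.0,
            "agent_r_unstable": lhs_r >= 1.0, "agent_b_unstable": lhs_b >= 1.0}
```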

5  Numerical Simulation Analysis

In order to confirm the correctness of the previous theoretical derivation and to verify the validity of the model, the decision-making evolution process of the two game players is simulated and analyzed here. Before performing the numerical simulation, the parameters must be initialized according to the system stability constraints, so the following values are taken in line with the constraints of the previous section. The initial selection probability of both game players is set to 0.5, the intensity coefficients of the random environmental disturbances are taken as $\delta_i=0.5,1,2$, the simulation step size is $h=0.001$, and the remaining parameters are set to values that satisfy the stability constraints of Formulas (17) and (18). Let the two sides of the confrontation game start from an equivalent initial state; the specific values of the relevant parameters are then: $G_r^{B\_F}=10$, $\theta_r=0$, $\vartheta_r=\pi$, $V_r^{tar}=100$, $P_{r1}=P_{r2}=P_{r3}=P_{r4}=0.8$, $T_r=15$, $V_r^{Fuel}=1$, $V_r^{Miss}=5$; $G_b^{B\_F}=10$, $\theta_b=0$, $\vartheta_b=\pi$, $V_b^{tar}=100$, $P_{b1}=P_{b2}=P_{b3}=P_{b4}=0.8$, $T_b=15$, $V_b^{Fuel}=1$, $V_b^{Miss}=5$; $\epsilon_1=\epsilon_2=0.5$.
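Using the helper sketches above, the baseline setup of this section can be reproduced roughly as follows; the loop over the three interference intensities mirrors the experiment of Fig. 4, while the number of steps and the random seed are illustrative choices (the fuel cost cancels out of the replicator bracket and is therefore not passed explicitly).

```python
import math

# State-equivalent baseline parameters from Section 5 (illustrative reproduction).
baseline = dict(theta=0.0, vartheta=math.pi, V_tar=100.0,
                P=(0.8, 0.8, 0.8, 0.8), T=15.0, G_BF=10.0)
gr = maneuver_gains(**baseline)        # agent_r gains
gb = maneuver_gains(**baseline)        # agent_b gains (equivalent initial state)

paths = {}
for delta in (0.5, 1.0, 2.0):          # interference intensity coefficients delta_i
    paths[delta] = milstein_simulation(0.5, 0.5, gr, gb,
                                       V_r_miss=5.0, V_b_miss=5.0,
                                       delta1=delta, delta2=delta,
                                       h=0.001, n_steps=5000, seed=0)
```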

5.1 Interference Intensity Factors

Fig. 4 shows the dynamic evolution of the game players' strategies under different random environmental interference intensities. As can be seen from the figure, the magnitude of the interference intensity has a large impact on the evolutionary results of the system. When the interference intensity is small, the evolution results show varying degrees of volatility but can still maintain overall stability and consistency. This stability enables the system to evolve around one or more dominant strategies; although fluctuations occur in the process, they usually do not change the overall direction of evolution, and the results show a progressively stable state. As the interference intensity continues to increase, the internal stability of the system is destroyed, and the system may enter a more dynamic and unstable state. In this case, the strategy selection of the game players becomes more difficult to predict and control, and the strategy evolution shows persistent fluctuations, decision wandering, and even reversals. This phenomenon reveals the important influence of random environmental uncertainty on strategy selection and the challenge of maintaining or adjusting strategies in complex and dynamic environments.


Figure 4: Dynamic evolution process of game player’s strategy under uncertainty conditions (a) x evolutionary path under different interference intensities (b) y evolutionary path under different interference intensities

Moreover, the evolutionary trends and rates exhibited by the curves at different time points and under varying interference intensities differ, with some time segments even displaying reversal phenomena. These reversals are manifestations of the impact of random environmental interference on agents’ decision-making process. The degree of curve fluctuation varies, with greater random interference leading to larger fluctuations and a higher probability of reversals. Furthermore, the occurrence of random environmental interference at any given moment is stochastic, with random variables at two different time points being not only uncorrelated but also statistically independent. This induces uncertainty and randomness in the system’s evolutionary process both temporally and spatially, thereby increasing the overall unpredictability of the system’s evolution. Consequently, strategy evolution curves exhibit diverse fluctuations or reversals at different time points. Over extended time periods, despite the influence of random environmental interference, these fluctuations typically do not alter the overall evolutionary direction of the system. The agents’ strategy evolution ultimately converges to a certain strategic equilibrium state. Furthermore, the system’s evolutionary process exhibits a “threshold effect” characteristic of nonlinear dynamics, with a nonlinear relationship between interference intensity and system response. As a result, the system’s response characteristics differ under various interference levels. Smaller interference may induce minor fluctuations in the system, particularly near certain thresholds where significant changes in system behavior may occur, such as accelerated response speeds, increased amplitudes, or state transitions. Conversely, larger interference may exceed the system’s threshold range, triggering more complex nonlinear behaviors and leading to system state transformations.

5.2 System Sensitivity Analysis

The strategy selection of the game players is affected by the combined influence of many factors. Based on the previous analysis of the strategy stability of both game players, the key parameters are selected and assigned different values to further explore how the values of these key factors shape the stochastic evolution trajectories of the game players' strategies. The parameter selection is based on the initial assignment, and the random environmental interference intensity coefficient is uniformly set to $\delta_i=0.5$; the reasonable range of each parameter is analyzed within the constraints in combination with the stochastic differential equations. Since the stochastic differential equations of the two game players are similar in mathematical form, it suffices to analyze the evolution of one side, taking the agent_r player as an example; the factor analysis for the agent_b player is analogous. A sweep of this kind can be organized as shown in the sketch below.
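The following snippet illustrates the one-at-a-time sensitivity procedure for the target azimuth $\theta_r$, reusing the hypothetical helpers sketched earlier; the swept values match those used in the next subsection, and agent_b is held at the baseline parameters.

```python
import math

# Hold agent_b at the baseline while sweeping agent_r's target azimuth.
gb = maneuver_gains(theta=0.0, vartheta=math.pi, V_tar=100.0,
                    P=(0.8, 0.8, 0.8, 0.8), T=15.0, G_BF=10.0)

sweep_results = {}
for theta_r in (0.0, math.pi / 6, math.pi / 3, math.pi / 2, math.pi):
    gr = maneuver_gains(theta=theta_r, vartheta=math.pi, V_tar=100.0,
                        P=(0.8, 0.8, 0.8, 0.8), T=15.0, G_BF=10.0)
    sweep_results[theta_r] = milstein_simulation(0.5, 0.5, gr, gb,
                                                 V_r_miss=5.0, V_b_miss=5.0,
                                                 delta1=0.5, delta2=0.5,
                                                 h=0.001, n_steps=5000, seed=0)
```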

(1) Target azimuth θr

Keeping the initial values of the other parameters unchanged, only the value of $\theta_r$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Let $\theta_r$ take the values $0,\pi/6,\pi/3,\pi/2,\pi$, respectively; the simulation results are shown in Fig. 5, where the dashed lines indicate the deterministic evolution curves and the solid lines the stochastic evolution curves. As can be seen from the figure, in the absence of environmental uncertainty, both game players initially choose the offensive tactical strategy; changing the target azimuth does not fundamentally alter the strategy selection of either player, and increasing the azimuth angle only accelerates the rate of strategy evolution. When disturbed by environmental uncertainty, the strategy evolution of agent_r at small azimuth angles is strongly affected and may even lead to the opposite strategy outcome, while the strategy choice of agent_b does not change greatly. At large azimuth angles, the rate of strategy evolution of both game players accelerates significantly.


Figure 5: θr impact on the evolution of behavioral decisions on game players (a) θr impact on the evolution of behavioral decision-making in agent_r (b) θr impact on the evolution of behavioral decision-making in agent_b

The reason for this phenomenon is that the target azimuth angle directly affects the relative positions of the two players, the possible engagement distance, and the acquisition and processing of information; a change in azimuth therefore directly affects how the two sides perceive and judge the battlefield environment, which in turn affects their initial strategy choices and adjustments. A small azimuth means that the two players face each other more directly. In this layout both players are within the optimal working envelope of the airborne electronic sensing system and the air-to-air missiles, and there is more time and space to respond to changes in the target's strategy. In addition, because information is updated frequently and in large volumes and strategies are adjusted rapidly and dynamically, decision-making requires more time to assess the effects and possible consequences of different strategies, so the rate of strategy evolution is relatively low. Moreover, players in this layout perceive environmental uncertainty and random interference more sharply, which increases the complexity and unpredictability of strategy implementation; their strategy selection fluctuates more, their adjustments become more conservative or hesitant, and wavering may even produce strategy outcomes opposite to those expected, further reducing the rate of strategy evolution. Under a large azimuth angle, the initial layout of the two players is more lateral or back-to-back. Although the urgency of direct conflict is reduced, the two players need to adjust their strategies more quickly to adapt to the other player's possible flanking or rearward actions and to maintain their own positional advantage, which leads to a faster rate of strategy evolution. Environmental uncertainty affects the perception and judgment of the two players even more significantly in this layout, which further accelerates their strategy evolution.

(2) Target entry angle ϑr

Keeping the initial values of the other parameters unchanged, only the value of $\vartheta_r$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Let $\vartheta_r$ take the values $0,\pi/6,\pi/3,\pi/2,\pi$, respectively; the simulation results are shown in Fig. 6. As can be seen from the figure, in the absence of environmental uncertainty, both game players initially choose the offensive tactical strategy; changing the target entry angle does not fundamentally alter the strategy selection of either player and only affects the rate of strategy evolution. When disturbed by environmental uncertainty, the random interference accelerates the strategy evolution rate of both game players at small entry angles and reduces it at large entry angles.


Figure 6: ϑr impact on the evolution of behavioral decisions on game players (a) ϑr impact on the evolution of behavioral decision-making in agent_r (b) ϑr impact on the evolution of behavioral decision-making in agent_b

This phenomenon arises because different target entry angles affect the perception and judgment of both players, which in turn affects the timeliness of perception, decision-making, and action. A small target entry angle means that the two game players are in a tail-chase situation, and the smaller the entry angle, the more easily the target can escape through maneuvers. The game players then need to make quick decisions to strike or intercept the target; under random environmental factors, the disturbances may exacerbate the strategic uncertainty of both players at small entry angles, increasing the urgency and frequency of decision-making and demanding faster information processing, so the players must adjust their strategies more quickly to respond to the opponent's maneuvers and tactics and to the constant changes of the battlefield environment. A large target entry angle means that the two players are in a head-on posture, in which the missile attack zone is larger and the angular posture more favorable, so there is more time and space to respond to changes in the environment and in the target's strategy.

(3) Inherent value of the game player Vrtar

Keeping the initial values of the other parameters unchanged, only the value of $V_r^{tar}$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Let $V_r^{tar}$ take the values $50,100,150,200,500$, respectively; the simulation results are shown in Fig. 7. As can be seen from the figure, in the absence of environmental uncertainty, when the inherent value of agent_r is small, it tends to choose offensive tactical strategies, while agent_b, whose inherent value is then higher than that of agent_r, tends to choose defensive tactical strategies. As agent_r's own value increases, it becomes more conservative in its tactical choices and begins to evolve toward defensive strategies, while agent_b becomes more inclined to choose offensive tactics. When disturbed by environmental uncertainty, agent_r's strategy evolution fluctuates strongly as its own inherent value changes, which seriously affects its strategy choice, whereas agent_b's strategy evolution is relatively less affected by environmental factors, with only the rate of strategy evolution changing.


Figure 7: Vrtar impact on the evolution of behavioral decisions on game players (a) Vrtar impact on the evolution of behavioral decision-making in agent_r (b) Vrtar impact on the evolution of behavioral decision-making in agent_b

The reason for this phenomenon is that when the inherent value of an agent is small, even if the agent is shot down and destroyed by the other side, its own loss is relatively small, so it is more aggressive and adventurous in its choice of strategy in order to maximize benefits or quickly change the battlefield situation. As the agent's value increases, the cost of losing it also increases, so it becomes more conservative and hesitant in its tactics, paying attention to cost-benefit trade-offs and risk management to reduce and avoid potential losses. In addition, high-value agents are exposed to greater risk in uncertain environments; their benefit analysis becomes more complex and their consideration of risk more delicate, so they are more prudent in decision-making, which increases the volatility of their tactical choices. This is one of the key reasons why countries around the world are now more willing to adopt large numbers of low-cost unmanned aerial vehicles to replace manned aircraft.

(4) Basic probability of fire-strike process Pij

Keeping the initial values of the other parameters unchanged, only the value of $P_{ij}$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Taking the weapon damage probability $P_{r4}$ as a reference, let $P_{r4}$ take the values $0,0.25,0.5,0.75,1$, respectively; the simulation results are shown in Fig. 8. As can be seen from the figure, in the absence of environmental uncertainty, both game players tend to choose offensive tactical strategies, and this tendency strengthens as the weapon damage probability gradually increases. When disturbed by environmental uncertainty, the random interference has little influence on the strategy evolution of agent_r, causing only a certain degree of disturbance that affects the rate of strategy evolution, whereas it has a strong influence on agent_b, producing larger fluctuations in the strategy evolution results and even reversals of the strategy outcome.


Figure 8: Pr4 impact on the evolution of behavioral decisions on game players (a) Pr4 impact on the evolution of behavioral decision-making in agent_r (b) Pr4 impact on the evolution of behavioral decision-making in agent_b

The reason for this phenomenon is that a lower weapon damage probability means that agent_r has difficulty destroying targets effectively when attacking, so it may be safer to choose a conservative, defensive strategy; agent_b's tactical choices, in turn, depend on predicting and reacting to agent_r's strategy, and agent_b may not choose an offensive tactical strategy as expected because of tactical flexibility, risk aversion, or other strategic considerations. A higher weapon damage probability increases agent_r's success rate when attacking, making attack a more attractive option, and agent_r begins to turn to more offensive tactics in order to gain a greater advantage in the confrontation; agent_b responds to this threat by adjusting its tactics to be more offensive, trying to pre-empt or at least maintain equilibrium so as not to be overpowered. In the face of environmental uncertainty, agent_r's strategy evolution is less affected, probably because it has already adopted a more conservative strategy and is less sensitive to environmental changes. Agent_b, on the other hand, may be more sensitive to environmental changes because its weapon damage probability is higher than agent_r's; the uncertainty of the environment causes agent_b to misjudge the opponent's strength at certain moments or to overreact in its strategy adjustment, leading to fluctuations or even reversals of the strategy evolution results. This also reflects that tactical decision-making on the battlefield is the result of a combination of factors and must be analyzed comprehensively in light of the specific battlefield environment, strategic objectives, tactical considerations, and the psychology and behavior of both players; especially in highly uncertain environments, even minor information asymmetries or miscalculations may have a significant impact on decision-making.

(5) Remaining time to evade missiles Tr

Keeping the initial values of the other parameters unchanged, only the value of $T_r$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Let $T_r$ take the values $1,7,15,20,30$, respectively; the simulation results are shown in Fig. 9. As can be seen from the figure, in the absence of environmental uncertainty, when the remaining time for agent_r to evade the interceptor missiles is short, both game players evolve toward offensive tactical strategies; as the remaining evasion time increases, both game players evolve toward defensive tactical strategies. When disturbed by environmental uncertainty, the random disturbances have a small impact on the strategy evolution of agent_r, affecting only its rate of strategy evolution, while they have a significant impact on agent_b, affecting not only the rate of strategy evolution but also producing larger fluctuations in the evolution results and even reversals of the strategy outcome.


Figure 9: Tr impact on the evolution of behavioral decisions on game players (a) Tr impact on the evolution of behavioral decision-making in agent_r (b) Tr impact on the evolution of behavioral decision-making in agent_b

The reason for this phenomenon is that a shorter remaining time to evade the interceptor missiles means that agent_r faces an immediate threat to its survival and the time pressure is high. This urgency prompts agent_r to adopt a more risky offensive strategy, striving to break the deadlock quickly or at least to distract agent_b in order to gain a chance of escape, and it may lead both agent_r and agent_b to adopt offensive tactical strategies so as to resolve the engagement quickly or gain an advantage. Moreover, the game players must make quick decisions within a short time window, and offense is often the preferred strategy when time is critical. As the remaining time increases, both game players have more time to assess the battlefield situation and formulate their strategies, which may lead both to adopt a more cautious defensive tactical strategy; the longer time frame allows more opportunities for tactical adjustment and risk reassessment. In the face of uncertain environmental factors, agent_r's strategy evolution is less affected, possibly because it has already formulated strategies based on the remaining time and these strategies are less sensitive to environmental changes. Agent_b, by contrast, may show higher sensitivity to environmental uncertainty and must continuously adjust its strategy to adapt to changes in agent_r's behavior; it is difficult for agent_b to judge agent_r's true intentions and capabilities accurately, which increases the complexity of strategy evolution and leads to over- or under-adjustment of strategies, causing fluctuations in the rate of strategy evolution and reversals of the results.

(6) Missile consumption costs VrMiss

Keeping the initial values of the other parameters unchanged, only the value of $V_r^{Miss}$ is changed to analyze the impact of different values of this parameter on the evolutionary results of the game. Let $V_r^{Miss}$ take the values $1,2,5,10,30$, respectively; the simulation results are shown in Fig. 10. As can be seen from the figure, in the absence of environmental uncertainty, when agent_r's missile consumption cost is low, both game players evolve toward offensive tactical strategies; as the missile consumption cost increases, both game players evolve toward defensive tactical strategies. When disturbed by environmental uncertainty, the random disturbances have a small impact on the strategy evolution of agent_r, affecting only its rate of strategy evolution, while they have a strong impact on agent_b, affecting not only the rate of strategy evolution but also causing reversals of the strategy evolution results.


Figure 10: VrMiss impact on the evolution of behavioral decisions on game players (a) VrMiss impact on the evolution of behavioral decision-making in agent_r (b) VrMiss impact on the evolution of behavioral decision-making in agent_b

The reason for this phenomenon parallels the cost-benefit trade-off discussed for the players' inherent value: when the missile consumption cost is low, the expected gain from firing clearly outweighs the expenditure, so attacking is economical and both game players drift toward offensive tactics; as the cost rises, each attack consumes a larger share of the achievable payoff, attacking becomes less worthwhile, and both players shift toward more conservative, defensive strategies. Under environmental uncertainty, agent_r, which bears this cost directly, has already factored the expenditure into its strategy and is therefore relatively insensitive to the disturbances, whereas agent_b must infer agent_r's willingness to expend missiles from noisy observations of its behavior; misjudging this willingness at certain moments leads agent_b to over- or under-adjust its own strategy, producing fluctuations and even reversals in its strategy evolution results.

5.3 Discussion of Results

It is clear from the preceding analysis that the evolution of the system is subject to interference from random factors in the external environment: the greater the intensity of the random environmental interference, the greater the fluctuation in the players' decisions during strategy evolution, and the harder it is for either player to predict environmental changes accurately. Introducing random disturbances increases the uncertainty of the system, so that the players' decision-making is no longer based purely on rational analysis and expected returns but is affected by more uncontrollable factors. This randomness leads to diversity and unpredictability in the system's evolutionary path.
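One common way to formalize this effect, in line with the stochastic evolutionary game formulations cited in this paper (e.g., [35,41]), is to append a Gaussian white-noise term to each player's replicator equation. A generic, illustrative form for the agent_r player's offensive-strategy probability x(t) is

dx(t) = x(t)[1 - x(t)] ΔU_r(x(t), y(t)) dt + σ x(t)[1 - x(t)] dW(t),

where ΔU_r denotes the expected payoff difference between the offensive and defensive strategies (its exact expression depends on the payoff matrix defined earlier in the paper), W(t) is standard Brownian motion, and σ is the interference intensity. The multiplicative factor x(1 - x) keeps the state within [0, 1], while a larger σ amplifies the fluctuations of the evolutionary trajectory and, near the boundary of a basin of attraction, can flip the stable outcome, which is consistent with the fluctuation and reversal behavior observed in the simulations above.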

In addition, factors such as the target azimuth, the target entry angle, the inherent value of the game players, the basic probability of the fire-strike process, the remaining time to evade missiles, and the missile consumption cost all affect the rate and outcome of the evolution to different degrees. Specifically, the target azimuth, the target entry angle, and the inherent value of the players have a relatively minor impact on the strategic evolution outcomes of both parties but significantly influence the rate of evolution. Conversely, the basic probability of the fire-strike process, the remaining time to evade missiles, and the missile consumption cost have a relatively minor impact on the strategy evolution outcome of the player that owns the parameter but significantly affect the other player. Therefore, in actual operations, decision-makers need to combine the specific environment, strategic objectives, tactical considerations, the psychology and behavior of both players, and other factors into a comprehensive analysis. They must understand how the various parameters regulate different aspects of the players' decision-making processes and take into account the changes that a multitude of factors induce in the evolution of both players' decisions. Especially in a highly uncertain environment, even small information asymmetries or miscalculations may have a significant impact on decision-making.

The method proposed in this paper is compared with methods from other studies in the literature, and the results are shown in Table 2. In the table, the method type refers to the theoretical approach adopted in the corresponding study. Behavioral rationality indicates whether the study treats the agents involved in adversarial decision-making as fully rational or boundedly rational, which affects how well the model matches real-world problems and, consequently, the algorithm's practicality and generalizability. Information requirement denotes whether the adversarial decision-making process assumes that agents can obtain complete environmental situational information: if agents can access all information in the adversarial environment, the condition is one of complete information; otherwise, it is one of incomplete information. The game process refers to the phased and state-based character of the air attack-defense confrontation, indicating whether the game is dynamic or static; dynamic processes feature rapid state changes and more complex solution procedures. Equilibrium solution indicates whether the algorithm's equilibrium-solving process is detailed and specific; incomplete-information dynamic games are harder to solve for equilibrium than complete-information or static games, so the level of detail in equilibrium solving directly affects the practicality and interpretability of the algorithm. Random environmental interference indicates whether the model considers the impact of random environmental factors on agent decision-making, which determines whether the model can more accurately describe the interference factors present in real scenarios.

Table 2: Comparison of the proposed method with methods from the literature

6  Conclusion

To address the maneuver decision-making problem of agents interacting in a random interference environment, this paper considers the specific impact of maneuvering actions on the game state, analyzes the risks that random interference in a complex environment may pose to the players' maneuver decisions, establishes a stochastic evolutionary mathematical model of multi-agent behavior strategies, and investigates the strategy selection interactions, evolutionary stability, and dynamic decision-making process of each player in a random interference environment. The results show that the parameters related to the agents' maneuvering actions affect the selection and evolution speed of the agents' strategies in different ways, and that environmental interference also affects the different maneuver parameters differently. These findings confirm the feasibility and effectiveness of the proposed method, which can better explain the agents' behavioral decision-making during interaction.

This study quantifies the decision-making processes of agents through mathematical methods, providing a foundational model and analytical framework for subsequent research. It not only aids decision-makers in understanding the dynamics of strategies under various factors and environmental conditions but also offers feasible analytical approaches and theoretical references for improving decision-making in multi-agent interactions and enhancing the interpretability of game system models. Compared to other methods such as differential games, matrix games, deep reinforcement learning, and genetic algorithms, the proposed method in this paper takes into account the agents’ bounded rationality, dynamic learning processes, and the impact of random disturbances, making it more aligned with real-world scenarios. It can simulate the strategy selection process of participants over time, reflect the dynamic changes in agents’ maneuvering behaviors during aerial combat, and describe the various tactics in air warfare. This is beneficial for predicting the evolution of tactics and identifying optimal strategy combinations, thereby assisting decision-makers in better understanding the potential behaviors and responses of both friendly and adversary forces, ultimately enhancing the scientific rigor of air combat training and strategy development.

However, the paper primarily analyzes the impact of maneuver actions on the behavioral strategies of agents in environments with stochastic disturbances. It offers a solution that is interpretable for specific problems, but its universality is somewhat limited. We are committed to expanding the research scope in future studies. In subsequent research, we will conduct more refined experimental designs and employ more complex mathematical models to further analyze the specific impacts of different factors in various contexts. We also aim to provide more detailed mechanistic explanations to uncover the specific mechanisms and principles underlying these complex relationships, thereby enhancing the specificity and operability of the research findings. In addition, we plan to combine the self-learning capability of deep reinforcement learning with the strategy analysis capability of game theory to provide a more comprehensive solution to the air combat problem.

Acknowledgement: We would like to thank everyone who provided comments on this work.

Funding Statement: The authors received no specific funding for this study.

Author Contributions: Study conception and design: Shiguang Hu, Le Ru, Bo Lu, Zhenhua Wang; data collection: Shiguang Hu, Xiaolin Zhao, Wenfei Wang, Hailong Xi; analysis and interpretation of results: Shiguang Hu, Le Ru, Xiaolin Zhao, Bo Lu; draft manuscript preparation: Shiguang Hu, Le Ru. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: The authors confirm that the data supporting the findings of this study are available within the article or its supplementary materials.

Ethics Approval: Not applicable.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. F. Jiang, M. Xu, Y. Li, H. Cui, and R. Wang, “Short-range air combat maneuver decision of UAV swarm based on multi-agent Transformer introducing virtual objects,” Eng. Appl. Artif. Intell., vol. 123, 2023, Art. no. 106358. doi: 10.1016/j.engappai.2023.106358.

2. Z. Ren, D. Zhang, S. Tang, W. Xiong, and S. Yang, “Cooperative maneuver decision making for multi-UAV air combat based on incomplete information dynamic game,” Def. Technol., vol. 27, no. 3, pp. 308–317, 2023. doi: 10.1016/j.dt.2022.10.008.

3. J. Xu, Z. Deng, and Q. Song, “Multi-UAV counter-game model based on uncertain information,” Appl. Math. Comput., vol. 366, no. 3, 2020, Art. no. 124684. doi: 10.1016/j.amc.2019.124684.

4. S. Li, M. Chen, Y. Wang, and Q. Wu, “Air combat decision-making of multiple UCAVs based on constraint strategy games,” Def. Technol., vol. 18, no. 3, pp. 368–383, 2022. doi: 10.1016/j.dt.2021.01.005.

5. Z. Chen et al., “Tripartite evolutionary game in the process of network attack and defense,” Telecommun. Syst., vol. 86, no. 2, pp. 351–361, May 2024. doi: 10.1007/s11235-024-01130-9.

6. Y. Huang, A. Luo, T. Chen, M. Zhang, B. Ren and Y. Song, “When architecture meets RL plus EA: A hybrid intelligent optimization approach for selecting combat system-of-systems architecture,” Adv. Eng. Inform., vol. 58, Oct. 2023, Art. no. 102209. doi: 10.1016/j.aei.2023.102209.

7. Y. Wang, X. Zhang, R. Zhou, S. Tang, H. Zhou and W. Ding, “Research on UCAV maneuvering decision method based on heuristic reinforcement learning,” Comput. Intell. Neurosci., vol. 2022, pp. 1–13, Mar. 2022. doi: 10.1155/2022/1477078.

8. R. Isaacs, “A mathematical theory with applications to warfare and pursuit, control and optimization,” in Differential Games. New York, NY, USA: Wiley, 1965.

9. E. Garcia, D. Casbeer, and M. Pachter, “Active target defence differential game: Fast defender case,” IET Control Theory Appl., vol. 11, no. 17, pp. 2985–2993, Sep. 2017. doi: 10.1049/iet-cta.2017.0302.

10. H. Park, B. Lee, and D. Yoo, “Differential game based air combat maneuver generation using scoring function matrix,” Int. J. Aeronaut. Space, vol. 17, no. 2, pp. 204–213, Jun. 2016. doi: 10.5139/IJASS.2016.17.2.204.

11. F. Austin, G. Carbone, H. Hinz, M. Lewis, and M. Falco, “Game theory for automated maneuvering during air-to-air combat,” J. Guid. Control Dyn., vol. 13, no. 6, pp. 1143–1149, Dec. 1990. doi: 10.2514/3.20590.

12. S. Li, M. Chen, and Q. Wu, “A fast algorithm to solve large-scale matrix games based on dimensionality reduction and its application in multiple unmanned combat air vehicles attack-defense decision-making,” Inform. Sci., vol. 594, no. 4, pp. 305–321, 2022. doi: 10.1016/j.ins.2022.02.025.

13. Y. Cao, Y. Kou, and A. Xu, “Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory,” Int. J. Aerosp. Eng., vol. 2023, pp. 1–20, Jan. 2023. doi: 10.1155/2023/3657814.

14. J. Liles, M. Robbins, and B. Lunday, “Improving defensive air battle management by solving a stochastic dynamic assignment problem via approximate dynamic programming,” Eur. J. Oper. Res., vol. 305, no. 3, pp. 1435–1449, 2023. doi: 10.1016/j.ejor.2022.06.031.

15. N. Léchevin and C. Rabbath, “A hierarchical decision and information system for multi-aircraft combat missions,” Proc. Inst. Mech. Eng. Part G J. Aerosp., vol. 224, no. 2, pp. 133–148, Feb. 2010. doi: 10.1243/09544100JAERO563.

16. X. Zhang, G. Liu, C. Yang, and J. Wu, “Research on air confrontation maneuver decision-making method based on reinforcement learning,” Electron., vol. 7, no. 11, Nov. 2018, Art. no. 279. doi: 10.3390/electronics7110279.

17. S. Li, Q. Wu, and M. Chen, “Autonomous maneuver decision-making of UCAV with incomplete information in human-computer gaming,” Drones, vol. 7, no. 3, Feb. 2023, Art. no. 157. doi: 10.3390/drones7030157.

18. J. Poropudas and K. Virtanen, “Game-theoretic validation and analysis of air combat simulation models,” IEEE Trans. Syst. Man Cyber.-Part A: Syst. Hum., vol. 40, no. 5, pp. 1057–1070, Sep. 2010. doi: 10.1109/TSMCA.2010.2044997.

19. Z. Sun et al., “Multi-agent hierarchical policy gradient for Air Combat Tactics emergence via self-play,” Eng. Appl. Artif. Intell., vol. 98, 2021, Art. no. 104112. doi: 10.1016/j.engappai.2020.104112.

20. S. Hu, L. Ru, and W. Wang, “Evolutionary game analysis of behaviour strategy for UAV swarm in communication-constrained environments,” IET Control Theory Appl., vol. 18, no. 3, pp. 350–363, Nov. 2023. doi: 10.1049/cth2.12602.

21. Y. Gao, L. Zhang, and Q. Wang, “An evolutionary game-theoretic approach to unmanned aerial vehicle network target assignment in three-dimensional scenarios,” Mathematics, vol. 11, no. 19, Oct. 2023, Art. no. 4196. doi: 10.3390/math11194196.

22. L. Sheng, M. Shi, and Y. Qi, “Dynamic offense and defense of UAV swarm based on situation evolution game,” (in Chinese), Syst. Eng. Electron., vol. 8, pp. 2332–2342, Jun. 2023. doi: 10.12305/j.issn.1001-506X.2023.08.06.

23. C. Liu, S. Sun, and B. Xu, “Optimizing evasive maneuvering of planes using a flight quality driven model,” Sci. China Inform. Sci., vol. 67, no. 3, Feb. 2024. doi: 10.1007/s11432-023-3848-6.

24. H. Huang, H. Cui, H. Zhou, and J. Li, “Research on autonomous maneuvering decision of UAV in close air combat,” in Proc. 3rd 2023 Int. Conf. Autonom. Unmanned Syst., Singapore, Springer, 2023, vol. 1171. doi: 10.1007/978-981-97-1083-6_14.

25. Z. Li, J. Zhu, M. Kuang, J. Zhang, and J. Ren, “Hierarchical decision algorithm for air combat with hybrid action based on reinforcement learning,” (in Chinese), Acta Aeronaut. et Astronaut. Sinica, Apr. 2024. doi: 10.7527/S1000-6893.2024.30053.

26. D. Lee, C. Kim, and S. Lee, “Implementation of tactical maneuvers with maneuver libraries,” Chin. J. Aeronaut., vol. 33, no. 1, pp. 255–270, 2020. doi: 10.1016/j.cja.2019.07.007.

27. H. Duan, Y. Lei, J. Xia, Y. Deng, and Y. Shi, “Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization,” IEEE Trans. Aerosp. Electron. Syst., vol. 59, no. 3, pp. 3156–3170, Jun. 2023. doi: 10.1109/TAES.2022.3221691.

28. J. Zhu, M. Kuang, and X. Han, “Mastering air combat game with deep reinforcement learning,” Def. Technol., vol. 34, no. 4, pp. 295–312, 2024. doi: 10.1016/j.dt.2023.08.019.

29. W. Kong, D. Zhou, and Y. Zhao, “Hierarchical reinforcement learning from competitive self-play for dual-aircraft formation air combat,” J. Comput. Des. Eng., vol. 10, no. 2, pp. 830–859, Mar. 2023. doi: 10.1093/jcde/qwad020.

30. S. Deng and Y. Yuan, “A data intrusion tolerance model based on an improved evolutionary game theory for the energy internet,” Comput. Mater. Contin., vol. 79, no. 3, pp. 3679–3697, Jan. 2024. doi: 10.32604/cmc.2024.052008.

31. X. Richter and J. Lehtonen, “Half a century of evolutionary games: A synthesis of theory, application and future directions,” Phil. Trans. R. Soc. B, vol. 378, no. 1876, May 2023, Art. no. 489. doi: 10.1098/rstb.2021.0492.

32. E. L. N. Ekassi, O. Foupouapouognigni, and M. S. Siewe, “Nonlinear dynamics and Gaussian white noise excitation effects in a model of flow-induced oscillations of circular cylinder,” Chaos Solitons Fract., vol. 178, 2024, Art. no. 114374. doi: 10.1016/j.chaos.2023.114374.

33. A. V. Balakrishnan and R. R. Mazumdar, “On powers of Gaussian white noise,” IEEE Trans. Inf. Theory, vol. 57, no. 11, pp. 7629–7634, 2011. doi: 10.1109/TIT.2011.2158062.

34. D. Cheng, F. He, H. Qi, and T. Xu, “Modeling, analysis and control of networked evolutionary games,” IEEE Trans. Automat. Control, vol. 60, no. 9, pp. 2402–2415, Sep. 2015. doi: 10.1109/TAC.2015.2404471.

35. G. Jumarie, “Approximate solution for some stochastic differential equations involving both Gaussian and Poissonian white noises,” Appl. Math. Lett., vol. 16, no. 8, pp. 1171–1177, 2003. doi: 10.1016/S0893-9659(03)90113-3.

36. H. Zhang, Y. Mi, X. Liu, Y. Zhang, J. Wang and J. Tan, “A differential game approach for real-time security defense decision in scale-free networks,” Comput. Netw., vol. 224, 2023, Art. no. 109635. doi: 10.1016/j.comnet.2023.109635.

37. J. Armstrong and A. Ionescu, “Itô stochastic differentials,” Stoch. Proc. Appl., vol. 171, Apr. 2024, Art. no. 104317. doi: 10.1016/j.spa.2024.104317.

38. B. Moghaddam, A. Lopes, J. Machado, and Z. Mostaghim, “Computational scheme for solving nonlinear fractional stochastic differential equations with delay,” Stoch. Anal. Appl., vol. 37, no. 6, pp. 893–908, Nov. 2019. doi: 10.1080/07362994.2019.1621182.

39. K. Kang, L. Bai, and J. Zhang, “A tripartite stochastic evolutionary game model of complex technological products in a transnational supply chain,” Comput. Ind. Eng., vol. 186, Dec. 2023, Art. no. 109690. doi: 10.1016/j.cie.2023.109690.

40. J. Yang, X. Liu, and X. Liu, “Stability of stochastic functional differential systems with semi-Markovian switching and Levy noise by functional Itô’s formula and its applications,” J. Franklin Inst., vol. 357, no. 7, pp. 4458–4485, May 2020. doi: 10.1016/j.jfranklin.2020.03.012.

41. Y. Xu, B. Hu, and R. Qian, “Analysis on stability of strategic alliances based on stochastic evolutionary game including simulations,” (in Chinese), Syst. Eng. Theory Pract., vol. 31, pp. 920–926, May 15, 2011. doi: 10.12011/1000-6788(2011)5-920.


Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.