Predator-Prey¶
Multi-agent RL environment with two competing flocks/teams of agents:
Predator agents attempt to capture prey agents
Prey agents attempt to evade predator agents
See floxs.predator_prey.PredatorPrey for details of the environment API.
Dynamics¶
All agent states consist of their position and velocity (represented in polar coordinates as a heading and speed). Each step, all agent positions are updated from their current velocities, and the new rewards and observations are then generated.
The space is wrapped at the edges (i.e. it forms a torus).
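The position update described above can be sketched for a single agent as follows. This is a minimal illustration, not the library's implementation: the function name step_position, the env_size parameter, the use of radians for the heading, and a unit time step are all assumptions made for the example.

```python
import math


def step_position(position, heading, speed, env_size=1.0):
    """Move one agent along its heading and wrap the result onto the torus.

    Hypothetical helper: `env_size` (side length of the square space) and a
    heading measured in radians are assumptions, not part of the floxs API.
    """
    x, y = position
    # Convert the polar velocity (heading, speed) into a Cartesian displacement.
    dx = speed * math.cos(heading)
    dy = speed * math.sin(heading)
    # The modulo maps coordinates that leave one edge back in at the opposite edge.
    return ((x + dx) % env_size, (y + dy) % env_size)


# An agent near the right-hand edge moving right re-enters on the left.
new_x, new_y = step_position((0.95, 0.5), heading=0.0, speed=0.1)
```

Wrapping with a modulo keeps the update branch-free, which is convenient if the same logic is later vectorised over a whole flock.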
Actions¶
Each agent can individually update its velocity each step. Each agent's action is an array of two continuous values in the range [-1, 1], where the values represent [rotation, acceleration]. The action values are then scaled by the maximum rotation and acceleration parameters for each agent type. In total, the actions for the two flocks are given by arrays of shape [n-predators, 2] and [n-prey, 2], representing the velocity update for each individual agent.
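As a rough sketch of how a normalised action might be turned into a velocity update, the example below scales each component by a per-type maximum. The names apply_action and apply_flock_actions, the parameters max_rotation, max_acceleration, min_speed and max_speed, the use of radians, and the clamping of the speed are all assumptions for illustration; the actual environment may differ in units and limits.

```python
import math


def apply_action(heading, speed, action, max_rotation, max_acceleration,
                 min_speed, max_speed):
    """Update one agent's heading and speed from a [rotation, acceleration] action."""
    rotation, acceleration = action
    # Keep both action components inside the documented range [-1, 1].
    rotation = max(-1.0, min(1.0, rotation))
    acceleration = max(-1.0, min(1.0, acceleration))
    # Scale by the per-type maxima (assumed here to be radians and speed units per step).
    new_heading = (heading + rotation * max_rotation) % (2.0 * math.pi)
    new_speed = speed + acceleration * max_acceleration
    # Assumption: speeds are clamped to a fixed [min_speed, max_speed] interval.
    new_speed = max(min_speed, min(max_speed, new_speed))
    return new_heading, new_speed


def apply_flock_actions(headings, speeds, actions, **limits):
    """Apply an [n_agents, 2] list of actions, one row per agent."""
    updated = [
        apply_action(h, s, a, **limits)
        for h, s, a in zip(headings, speeds, actions)
    ]
    new_headings = [h for h, _ in updated]
    new_speeds = [s for _, s in updated]
    return new_headings, new_speeds
```

Each agent type would call apply_flock_actions with its own limits, so that predators and prey can turn and accelerate at different rates.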
Rewards¶
Agents are individually rewarded on their proximity to other agents.
Predator agents are positively rewarded for coming within capture range of a prey agent
Prey agents are penalised if within range of a predator agent, with penalties accumulated over all predators in range
By default rewards are independent of distance, i.e. they are fixed binary rewards applied whenever a predator and a prey agent are in range.
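The default scheme can be illustrated with a small standalone function. This is a sketch under stated assumptions and does not use the library's reward interface: torus_distance, binary_rewards, capture_range, predator_reward, prey_penalty and env_size are hypothetical names, a single range is used for both agent types, and predator rewards are assumed to accumulate over prey in the same way that prey penalties accumulate over predators.

```python
import math


def torus_distance(a, b, env_size=1.0):
    """Shortest distance between two points on a wrapped square space."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On a torus the shorter path may cross an edge.
    dx = min(dx, env_size - dx)
    dy = min(dy, env_size - dy)
    return math.hypot(dx, dy)


def binary_rewards(predators, prey, capture_range,
                   predator_reward=1.0, prey_penalty=1.0, env_size=1.0):
    """Fixed, distance-independent rewards for every predator/prey pair in range."""
    predator_rewards = [0.0] * len(predators)
    prey_rewards = [0.0] * len(prey)
    for i, predator_pos in enumerate(predators):
        for j, prey_pos in enumerate(prey):
            if torus_distance(predator_pos, prey_pos, env_size) < capture_range:
                # Assumption: a predator is rewarded once per prey in range.
                predator_rewards[i] += predator_reward
                # Penalties accumulate over all predators in range of this prey.
                prey_rewards[j] -= prey_penalty
    return predator_rewards, prey_rewards
```

Because the reward is the same anywhere inside the capture range, only the in-range test depends on distance; a distance-scaled variant would replace the constant increments with a function of the measured distance.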
Rewards can be customised by implementing the floxs.predator_prey.rewards.RewardFn interface.
Observations¶
By default each agent individually observes its local neighbourhood of the environment as a segmented view. Each agent's view is a 2d array of shape [2, n_vision], where each row represents a view of one agent type (i.e. predator or prey). The view cone of each agent is divided into segments, with values representing the distance to the closest neighbour along a ray cast from the agent. If no agent lies within range, the value defaults to -1.
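A simplified version of such a view is sketched below. It is illustrative only: rather than casting rays against agent bodies, it bins each neighbour by the bearing of its centre and keeps the closest distance per segment. The names segmented_view and observe, the view_angle (total cone width in radians), view_range and env_size parameters, the [predators, prey] row order, and the use of raw rather than normalised distances are all assumptions.

```python
import math


def segmented_view(position, heading, neighbours, n_vision,
                   view_angle, view_range, env_size=1.0):
    """Closest neighbour distance in each of n_vision segments, or -1 if empty."""
    view = [-1.0] * n_vision
    x, y = position
    half_size = env_size / 2.0
    for nx, ny in neighbours:
        # Shortest displacement to the neighbour on the wrapped space.
        dx = (nx - x + half_size) % env_size - half_size
        dy = (ny - y + half_size) % env_size - half_size
        dist = math.hypot(dx, dy)
        # Skip the agent itself and anything beyond the vision range.
        if dist == 0.0 or dist > view_range:
            continue
        # Bearing of the neighbour relative to the heading, in [-pi, pi).
        bearing = (math.atan2(dy, dx) - heading + math.pi) % (2.0 * math.pi) - math.pi
        if abs(bearing) > view_angle / 2.0:
            continue
        # Map the bearing onto a segment index across the view cone.
        fraction = (bearing + view_angle / 2.0) / view_angle
        segment = min(int(fraction * n_vision), n_vision - 1)
        if view[segment] < 0.0 or dist < view[segment]:
            view[segment] = dist
    return view


def observe(position, heading, predators, prey, **vision):
    """Stack one segmented view per agent type into a [2, n_vision] observation."""
    return [
        segmented_view(position, heading, predators, **vision),
        segmented_view(position, heading, prey, **vision),
    ]
```

Keeping one row per agent type lets a policy distinguish threats from targets without any extra labelling of the observed distances.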
Observations can be customised by extending the default floxs.predator_prey.observations.ObservationFn observation class.