Integrating learning and task planning for robots with Keras, including simulation, real robot, and multiple dataset support.
This document specifically describes the design of the CoSTAR Task Plan (CTP) python library.
We have a modular architecture that describes a problem (a “world”), with its associated constraints (“conditions”), reward function, features, etc. Worlds are associated with Actors that behave according to different Policies. The key problem CTP tries to solve is to explore the space of possible Policies that may lead to successful results.
Most of the high-level types are implemented in `costar_task_plan.abstract`. In general, we try to enforce typing as much as possible in python; this makes it easier to abstract out different functionality, among other things. For example, most Reward functions inherit from `AbstractReward`, and most conditions from `AbstractCondition`; in truth, these are just classes that provide a `__call__()` operator.
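As a rough illustration of this callable-class pattern, here is a minimal condition sketch. The class names, the `State` container, and the call signature are assumptions made for the example, not the exact CTP API:

```python
from collections import namedtuple

# Hypothetical state container, just for this example.
State = namedtuple("State", ["x", "v"])


class AbstractCondition(object):
    """Conditions are just callables evaluated against the world and a state."""

    def __call__(self, world, state):
        raise NotImplementedError()


class SpeedLimitCondition(AbstractCondition):
    """Example condition: holds as long as the actor stays under a speed limit."""

    def __init__(self, v_max):
        self.v_max = v_max

    def __call__(self, world, state):
        return abs(state.v) <= self.v_max


# Conditions then read like predicates over the current state.
print(SpeedLimitCondition(v_max=2.0)(None, State(x=0.0, v=1.5)))  # True
```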
All environments extend the `AbstractWorld` class. An `AbstractWorld` contains actors, states, and references to all of the associated conditions and other supporting objects.
The most important functions are the `tick()` and `fork()` functions associated with the World. The `tick()` function updates a particular world trace; the `fork()` function creates a new world trace.
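A minimal sketch may help make that interface concrete. The attribute and method names below follow the text (actors, dynamics, conditions, `tick()`, `fork()`), but the constructor arguments and signatures are assumptions, not the real `AbstractWorld` API:

```python
import copy


class SketchWorld(object):
    """Simplified sketch of the world interface described above.

    Not the actual CTP implementation; just an illustration of the shape
    of the class.
    """

    def __init__(self, actors, dynamics, conditions):
        self.actors = actors          # actors present in the scene
        self.dynamics = dynamics      # one dynamics model T per actor
        self.conditions = conditions  # constraints that must hold on a trace

    def tick(self, states, actions):
        """Advance an existing world trace by one step."""
        return [T(s, a) for T, s, a in zip(self.dynamics, states, actions)]

    def fork(self, states):
        """Create a new, independent trace starting from the current states."""
        return copy.deepcopy(states)
```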
Pretty much everything here is modular. An example of why this is very important is the `LateralDynamics` class: we can handle different speeds fairly simply, and make our learning problem more reasonable, by thresholding velocities around zero to make sure the car actually stops.
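The thresholding trick amounts to a couple of lines. The cutoff value, the function name, and where it sits inside `LateralDynamics` are made up for illustration:

```python
def threshold_velocity(v, eps=0.05):
    """Snap near-zero velocities to exactly zero.

    Illustrative only: the cutoff (eps) and where this is applied inside
    LateralDynamics are assumptions for the example.
    """
    return 0.0 if abs(v) < eps else v


# Small residual commands no longer keep the car creeping forward, so
# "stopped" becomes a state the learner can actually reach.
assert threshold_velocity(0.01) == 0.0
assert threshold_velocity(0.5) == 0.5
```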
Goal: explore the set of possible combinations of policies to find optimal paths that satisfy the constraints given by the world.
The relevant calls (sketched below):

- `world.fork()` and `world.tick()`
- `select()` and `rollout()`
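Put together, a planner in this style might look roughly like the loop below. The "select" and "rollout" steps are marked in comments; the function name, scoring rule, and condition signature are placeholders, not the real CTP planner:

```python
def plan(world, states0, policies, horizon=20):
    """Sketch of a search over candidate policies using fork()/tick().

    Assumptions: each policy maps a state to an action, and each entry in
    world.conditions is a callable returning True while the trace is valid.
    """
    best_trace, best_len = None, -1
    for policy in policies:                   # "select": here, exhaustive
        states = world.fork(states0)          # independent trace per candidate
        trace = [states]
        for _ in range(horizon):              # "rollout": simulate forward
            actions = [policy(s) for s in states]
            states = world.tick(states, actions)
            if not all(cond(world, states) for cond in world.conditions):
                break                         # a constraint was violated
            trace.append(states)
        if len(trace) > best_len:             # e.g. prefer longer valid traces
            best_trace, best_len = trace, len(trace)
    return best_trace
```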
Goal: find policies that satisfy the constraints expressed by the list of conditions belonging to the current world.
In general, Trainers correspond to Agents in Keras-RL: they implement various learning algorithms, but are a little broader in scope.
Adversaries are new: these govern how we choose different worlds. Their goal is to learn policies that can handle outliers.
The world state updates as:

```python
ws1 = world.tick(ws0, action)
```

Tick loop:

```python
for actor, T, state, action in zip(self.actors, self.dynamics, ws.states, ws.actions):
    # apply each actor's dynamics model T to its current state and action
    new_state = T(state, action)
```
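For concreteness, here is a toy end-to-end version of that update with dummy dynamics. The `WorldState` container and the integrator model are invented for the example; they are not CTP classes:

```python
from collections import namedtuple

# Hypothetical trace container; the real CTP world-state type may differ.
WorldState = namedtuple("WorldState", ["states", "actions"])

def integrator(state, action, dt=0.1):
    """Trivial dynamics model T: move the state by action * dt."""
    return state + action * dt

dynamics = [integrator, integrator]                 # one model per actor
ws0 = WorldState(states=[0.0, 1.0], actions=[1.0, -1.0])

new_states = [T(s, a) for T, s, a in zip(dynamics, ws0.states, ws0.actions)]
ws1 = WorldState(states=new_states, actions=ws0.actions)
print(ws1.states)   # [0.1, 0.9]
```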
We can mostly get rid of actors (except as a convenience class to store some data).
Most other information should get removed.