costar_plan

Integrating learning and task planning for robots with Keras, including simulation, real robot, and multiple dataset support.


Design Philosophy

This document describes the design of the CoSTAR Task Plan (CTP) Python library.

We have a modular architecture that describes a problem (a “world”) with its associated constraints (“conditions”), reward function, features, etc. Worlds are associated with Actors that behave according to different Policies. The key problem CTP tries to solve is exploring the space of possible Policies that may lead to successful results.

Most of the high-level types are implemented in costar_task_plan.abstract. In general, we try to enforce typing as much as possible in Python; this makes it easier to abstract out different functionality, among other things. For example, most reward functions inherit from AbstractReward, and most conditions from AbstractCondition; in practice, these are just classes that provide a __call__() operator.
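A minimal sketch of this pattern is shown below; the derived class and all signatures are illustrative, not the exact ones in costar_task_plan.abstract:

class AbstractCondition(object):
    '''Base class for conditions: returns True while the world is still valid.'''

    def __call__(self, world, state, actor=None, prev_state=None):
        raise NotImplementedError('conditions must define __call__()')


class InBoundsCondition(AbstractCondition):
    '''Hypothetical condition: holds as long as a scalar state stays within bounds.'''

    def __init__(self, low, high):
        self.low, self.high = low, high

    def __call__(self, world, state, actor=None, prev_state=None):
        return self.low <= state <= self.high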

Environment Design

All environments extend the AbstractWorld class. An AbstractWorld contains actors and states, plus references to all associated conditions, rewards, and features.

The most important functions are the tick() and fork() functions associated with the World. The tick() function updates a particular world trace; the fork() function will create a new world trace.
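A rough usage sketch, under the assumption that tick() takes the action to apply and fork() takes no arguments (the real signatures may differ):

def expand(world, actions):
    '''Illustrative only: advance one fork of the world per candidate action.'''
    children = []
    for action in actions:
        child = world.fork()  # new world trace, independent of the original
        child.tick(action)    # advance only that trace one step
        children.append(child)
    return children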

Nearly everything here is modular. The LateralDynamics class is one example of why this matters: by thresholding velocities around zero so that the car actually stops, we can handle different speeds more simply and keep the learning problem reasonable.
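For instance, a dynamics helper along these lines (the name, signature, and threshold value are placeholders, not the actual LateralDynamics code) would snap small commanded speeds to zero:

def threshold_velocity(v, eps=0.05):
    '''Treat near-zero commanded speeds as a full stop.

    This keeps the learning problem well conditioned: the policy does not
    have to output exactly zero velocity for the car to come to rest.
    '''
    return 0.0 if abs(v) < eps else v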

Overview of Environment Classes

Tree Search Design

Goal: explore the set of possible combinations of policies to find optimal paths that satisfy the constraints given by the world.
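In rough pseudocode, this amounts to a search over branches of the world trace. The sketch below is a generic depth-limited search, not the actual CTP planner; methods such as done(), reward(), and available_policies() are assumptions, and policies are treated as callables mapping a world to an action.

def plan(world, depth):
    '''Return (reward, policy sequence) for the best branch up to the given depth.'''
    if depth == 0 or world.done():
        return world.reward(), []
    best_reward, best_plan = float('-inf'), []
    for policy in world.available_policies():
        child = world.fork()       # branch the world trace
        child.tick(policy(child))  # act under the candidate policy for one step
        reward, rest = plan(child, depth - 1)
        if reward > best_reward:
            best_reward, best_plan = reward, [policy] + rest
    return best_reward, best_plan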

Overview of Tree Search Classes

Learning Design

Goal: find policies that satisfy the constraints expressed by the list of conditions belonging to the current world.

In general, Trainers correspond to Agents in Keras-RL: they implement the various learning algorithms, but are a little broader in scope.
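A minimal sketch of what a trainer interface could look like (hypothetical names; the real Trainers wrap Keras models and specific algorithms):

class AbstractTrainer(object):
    '''Hypothetical trainer skeleton, playing roughly the role of a Keras-RL Agent.'''

    def __init__(self, env, model):
        self.env = env      # environment (world) to collect experience from
        self.model = model  # Keras model being trained

    def train(self, num_iter):
        for _ in range(num_iter):
            batch = self.collect_experience()
            self.update(batch)

    def collect_experience(self):
        raise NotImplementedError('algorithm-specific rollout collection')

    def update(self, batch):
        raise NotImplementedError('algorithm-specific model update')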

Adversaries are new: they govern how we choose which worlds to train in, with the goal of learning policies that can handle outliers.
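As an illustration only (none of these names exist in the library), an adversary could be as simple as a sampler over world constructors, later biased toward cases the current policy handles badly:

import random

class UniformAdversary(object):
    '''Hypothetical adversary: decides which world to train on next.'''

    def __init__(self, world_constructors):
        self.world_constructors = world_constructors

    def sample(self):
        # A smarter adversary would weight hard or out-of-distribution worlds
        # more heavily; this baseline just draws uniformly at random.
        make_world = random.choice(self.world_constructors)
        return make_world()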

Overview of Learning Classes

Overview of OpenAI Gym Environments

Redesign

World state:

Updates as:

ws1 = world.tick(ws0, action)

Tick loop:

# Apply each actor's dynamics model T to its current state and action.
for actor, T, state, action in zip(self.actors, self.dynamics, ws.states, ws.actions):
    new_state = T(state, action)
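Putting this together, the redesigned tick() would build a new world state functionally instead of mutating the old one. The sketch below is an assumption about what that could look like; WorldState and the per-actor actions argument are illustrative, not existing names:

from collections import namedtuple

# Hypothetical immutable snapshot: one state and one action per actor.
WorldState = namedtuple('WorldState', ['states', 'actions'])

class World(object):
    def __init__(self, dynamics):
        self.dynamics = dynamics  # one dynamics model T per actor

    def tick(self, ws, actions):
        '''Return the successor WorldState; ws itself is never modified.'''
        new_states = [T(s, a) for T, s, a in zip(self.dynamics, ws.states, actions)]
        return WorldState(states=new_states, actions=actions)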

We can mostly get rid of actors (except as a convenience class to store some data).

Most other information should be removed.