Training Pipeline — From Scene Description to Locomotion
MH-FLOCKE training starts with a natural language scene description and produces a creature that can navigate that scene. The full pipeline handles knowledge acquisition, terrain generation, sensor setup, and the training loop with curriculum advancement.
Quick Start
# Basic flat walking (10k steps, ~45min CPU / ~10min GPU)
python scripts/train_v032.py \
    --creature-name go2 \
    --scene "walk on flat meadow" \
    --steps 10000 \
    --skip-morph-check --no-terrain --auto-reset 500 --seed 1

# Ball interaction (50k steps, ~4h CPU / ~1h GPU)
python scripts/train_v032.py \
    --creature-name go2 \
    --scene "dog plays with ball on grass" \
    --steps 50000 \
    --skip-morph-check --no-terrain --auto-reset 500 --seed 42
GPU is auto-detected — if CUDA is available, all SNN computations run on GPU. MuJoCo physics always runs on CPU.
Pipeline Stages
- Scene Parsing — Natural language → task type, environment, difficulty (e.g., “hilly grassland” → locomotion, hills, 0.5 difficulty)
- Knowledge Acquisition — Generate or load behaviors appropriate for the scene (walk_hills, balance_slope, etc.)
- World Setup — Load Go2 model, inject terrain heightfield, inject ball if scene requires it, setup scent sources
- Training Loop — Full sense-think-act cycle every timestep, with R-STDP learning, cerebellar corrections, and behavior planning
- FLOG Recording — Binary training log written every 10 steps (creature frames) and every 1000 steps (stats frames)
- Checkpointing — SNN weights, cerebellum state, CPG phases saved periodically for resume
Reward Computation
The training loop computes a multi-component reward signal:
reward = forward_velocity_reward
+ upright_bonus (0.1 if upright > 0.7)
+ ball_approach_reward (if ball scene)
+ heading_reward (if ball scene)
+ contact_bonus (5.0 if ball_dist < 0.3m)
This reward feeds into the cognitive brain's combined reward computation, which adds curiosity, empowerment, drive modulation, and an emotion factor.
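The way the task-level terms combine can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's `compute_reward`: the names `BallTerms` and `combine_reward` are hypothetical, the forward-velocity, approach, and heading terms are taken as already-computed inputs because their formulas are not given here, and the contact bonus is assumed to apply only when a ball is present.

```python
from dataclasses import dataclass
from typing import Optional

# Constants taken from the reward description above.
UPRIGHT_THRESHOLD = 0.7
UPRIGHT_BONUS = 0.1
CONTACT_DISTANCE_M = 0.3
CONTACT_BONUS = 5.0


@dataclass
class BallTerms:
    """Hypothetical container for the ball-scene terms (precomputed elsewhere)."""
    approach_reward: float
    heading_reward: float
    ball_dist: float  # distance to the ball in metres


def combine_reward(forward_velocity_reward: float,
                   upright: float,
                   ball: Optional[BallTerms] = None) -> float:
    """Sum the reward components; ball terms are added only for ball scenes."""
    reward = forward_velocity_reward
    if upright > UPRIGHT_THRESHOLD:
        reward += UPRIGHT_BONUS
    if ball is not None:
        reward += ball.approach_reward + ball.heading_reward
        # Assumption: the contact bonus only makes sense when a ball exists.
        if ball.ball_dist < CONTACT_DISTANCE_M:
            reward += CONTACT_BONUS
    return reward
```

With these numbers the contact bonus dominates every other term, so a single step within 0.3m of the ball outweighs many steps of ordinary walking reward.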
Curriculum (Ball Scenes)
Ball scenes use a 5-stage curriculum that gradually increases difficulty:
Stage 0: ball at 1.5m, 0° offset (straight ahead)
Stage 1: ball at 1.5m, 17° offset
Stage 2: ball at 2.0m, 17° offset
Stage 3: ball at 2.5m, 26° offset
Stage 4: ball at 3.0m, 34° offset
Advancement: when the running minimum ball distance drops below 0.5m, the next stage unlocks.
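The advancement rule can be sketched as a small state machine. This is an illustrative sketch, not the trainer's implementation: the class `BallCurriculum` and its methods are hypothetical names, and it assumes the running minimum is reset when a stage unlocks (otherwise one close approach would unlock every remaining stage at once).

```python
import math
from dataclasses import dataclass
from typing import Tuple

# (ball distance in metres, angular offset in degrees) per stage, from the list above.
STAGES = [(1.5, 0.0), (1.5, 17.0), (2.0, 17.0), (2.5, 26.0), (3.0, 34.0)]
ADVANCE_THRESHOLD_M = 0.5


@dataclass
class BallCurriculum:
    stage: int = 0
    min_ball_dist: float = math.inf  # running minimum within the current stage

    def ball_placement(self) -> Tuple[float, float]:
        """Return (distance_m, offset_deg) for the current stage."""
        return STAGES[self.stage]

    def update(self, ball_dist: float) -> bool:
        """Track the running minimum; return True if a new stage was unlocked."""
        self.min_ball_dist = min(self.min_ball_dist, ball_dist)
        is_final = self.stage == len(STAGES) - 1
        if self.min_ball_dist < ADVANCE_THRESHOLD_M and not is_final:
            self.stage += 1
            # Assumption: the minimum restarts for the newly unlocked stage.
            self.min_ball_dist = math.inf
            return True
        return False
```

Calling `update` once per step with the current ball distance is enough; the final stage never advances, so the method keeps returning `False` there.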
Output
Each run produces:
- training_log.bin: FLOG binary with all physics and stats data
- snn_state.pt: SNN weights and network state
- checkpoint.pt: Full resume state (SNN + cerebellum + CPG + gate)
- brain.pt: Accumulated brain (cognitive modules, episodic memory, concept graph)
API Reference
train_v032.py — CLI Arguments
--creature-name      str   'go2' or creature profile name
--scene              str   Natural language scene description
--steps              int   Total training steps (default 50000)
--seed               int   Random seed
--resume             str   Path to checkpoint.pt (must re-pass --scene and --creature-name)
--skip-morph-check         Skip morphology validation
--no-terrain               Flat ground (no heightfield)
--auto-reset         int   Reset after N steps without progress
--device             str   'cuda' or 'cpu' (auto-detected if omitted)
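For readers who want to script around these flags, the argument list above maps naturally onto `argparse`. The following is a sketch that mirrors the documented flags; it is not taken from `train_v032.py`, and the function names `build_parser` and `resume_args_complete` are hypothetical. Defaults other than `--steps` are not documented, so they are left as `None` here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI flags (illustrative only)."""
    p = argparse.ArgumentParser(prog="train_v032.py")
    p.add_argument("--creature-name", type=str, help="'go2' or creature profile name")
    p.add_argument("--scene", type=str, help="Natural language scene description")
    p.add_argument("--steps", type=int, default=50000, help="Total training steps")
    p.add_argument("--seed", type=int, help="Random seed")
    p.add_argument("--resume", type=str, help="Path to checkpoint.pt")
    p.add_argument("--skip-morph-check", action="store_true",
                   help="Skip morphology validation")
    p.add_argument("--no-terrain", action="store_true",
                   help="Flat ground (no heightfield)")
    p.add_argument("--auto-reset", type=int,
                   help="Reset after N steps without progress")
    p.add_argument("--device", choices=["cuda", "cpu"], default=None,
                   help="Auto-detected if omitted")
    return p


def resume_args_complete(args: argparse.Namespace) -> bool:
    """When resuming, --scene and --creature-name must be passed again."""
    if args.resume is None:
        return True
    return args.scene is not None and args.creature_name is not None
```

A wrapper script could call `resume_args_complete` before launching a run, so that a missing `--scene` is caught early rather than after the checkpoint has been loaded.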
Key Functions
compute_reward(creature, sensor_data, prev_data, ball_info) → float
Multi-component reward: forward velocity + upright bonus (0.1) + ball approach + heading + contact bonus (5.0 at <0.3m).
save_checkpoint(path, creature, cerebellum, cpg, gate, step)
Saves: SNN state, cerebellum state_dict, CPG phases, competence gate, training step. Used for --resume.
FLOG Recording
Creature frames: every 10 steps (pos, vel, ball_pos, heading, speed, step)
Stats frames: every 1000 steps (distance, falls, PE, reward, actor, cpg, behavior, ...)
Event frames: on milestones (curriculum advance, falls, records)
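The recording cadence can be expressed as a small scheduling helper. This is a sketch of the schedule only, not of the FLOG binary format: `frames_due` is a hypothetical name, and the use of a simple modulo on the step counter (including whether step 0 is recorded) is an assumption.

```python
from typing import List

CREATURE_FRAME_INTERVAL = 10   # steps between creature frames
STATS_FRAME_INTERVAL = 1000    # steps between stats frames


def frames_due(step: int, milestone: bool = False) -> List[str]:
    """Return the frame types to write at this step, in write order."""
    due = []
    if step % CREATURE_FRAME_INTERVAL == 0:
        due.append("creature")
    if step % STATS_FRAME_INTERVAL == 0:
        due.append("stats")
    if milestone:
        # Event frames are tied to milestones, not to the step counter.
        due.append("event")
    return due
```

Under this schedule, a 10,000-step run counted from step 1 writes 1,000 creature frames and 10 stats frames, plus one event frame per milestone.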
Curriculum Stages (ball scenes)
Stage 0: ball 1.5m, 0° offset    Advance: min_ball_dist < 0.5m
Stage 1: ball 1.5m, 17° offset   Advance: min_ball_dist < 0.5m
Stage 2: ball 2.0m, 17° offset   Advance: min_ball_dist < 0.5m
Stage 3: ball 2.5m, 26° offset   Advance: min_ball_dist < 0.5m
Stage 4: ball 3.0m, 34° offset   Final stage