MH-FLOCKE

Training Pipeline — From Scene Description to Locomotion

MH-FLOCKE training starts with a natural language scene description and produces a creature that can navigate that scene. The full pipeline handles knowledge acquisition, terrain generation, sensor setup, and the training loop with curriculum advancement.

Quick Start

# Basic flat walking (10k steps, ~45min CPU / ~10min GPU)
python scripts/train_v032.py \
  --creature-name go2 \
  --scene "walk on flat meadow" \
  --steps 10000 \
  --skip-morph-check --no-terrain --auto-reset 500 --seed 1

# Ball interaction (50k steps, ~4h CPU / ~1h GPU)
python scripts/train_v032.py \
  --creature-name go2 \
  --scene "dog plays with ball on grass" \
  --steps 50000 \
  --skip-morph-check --no-terrain --auto-reset 500 --seed 42

GPU is auto-detected — if CUDA is available, all SNN computations run on GPU. MuJoCo physics always runs on CPU.
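The auto-detection described above can be sketched as follows. The function name `select_device` is illustrative, not part of the actual CLI; it mirrors the documented behavior that `--device` overrides detection and that SNN tensors (not MuJoCo) are what move to the GPU.

```python
from typing import Optional

import torch

def select_device(requested: Optional[str] = None) -> torch.device:
    """Pick the SNN compute device: honor an explicit --device, else auto-detect CUDA.

    MuJoCo physics stays on CPU regardless; only SNN tensors move to the GPU.
    """
    if requested is not None:
        return torch.device(requested)
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```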

Pipeline Stages

  1. Scene Parsing — Natural language → task type, environment, difficulty (e.g., “hilly grassland” → locomotion, hills, 0.5 difficulty)
  2. Knowledge Acquisition — Generate or load behaviors appropriate for the scene (walk_hills, balance_slope, etc.)
  3. World Setup — Load the Go2 model, inject the terrain heightfield, inject a ball if the scene requires it, and set up scent sources
  4. Training Loop — Full sense-think-act cycle every timestep, with R-STDP learning, cerebellar corrections, and behavior planning
  5. FLOG Recording — Binary training log written every 10 steps (creature frames) and every 1000 steps (stats frames)
  6. Checkpointing — SNN weights, cerebellum state, CPG phases saved periodically for resume

Reward Computation

The training loop computes a multi-component reward signal:

reward = forward_velocity_reward
       + upright_bonus (0.1 if upright > 0.7)
       + ball_approach_reward (if ball scene)
       + heading_reward (if ball scene)
       + contact_bonus (5.0 if ball_dist < 0.3m)

This reward feeds into the cognitive brain's combined reward computation, which adds curiosity, empowerment, drive modulation, and an emotion factor.
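A minimal sketch of the task-level reward above, assuming the approach term is the per-step decrease in ball distance (that weighting is an assumption; only the 0.1 upright bonus, the 0.7 upright threshold, and the 5.0 contact bonus at < 0.3 m are stated in the docs):

```python
from typing import Optional

def compute_reward(forward_vel: float,
                   upright: float,
                   ball_dist: Optional[float] = None,
                   prev_ball_dist: Optional[float] = None,
                   heading_align: float = 0.0) -> float:
    """Sketch of the multi-component task reward described above."""
    r = forward_vel                              # forward_velocity_reward
    if upright > 0.7:
        r += 0.1                                 # upright_bonus
    if ball_dist is not None:                    # ball-scene terms only
        if prev_ball_dist is not None:
            r += prev_ball_dist - ball_dist      # ball_approach_reward (assumed form)
        r += heading_align                       # heading_reward (assumed form)
        if ball_dist < 0.3:
            r += 5.0                             # contact_bonus at < 0.3 m
    return r
```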

Curriculum (Ball Scenes)

Ball scenes use a 5-stage curriculum that gradually increases difficulty:

Stage 0: ball at 1.5m, 0° offset (straight ahead)
Stage 1: ball at 1.5m, 17° offset
Stage 2: ball at 2.0m, 17° offset
Stage 3: ball at 2.5m, 26° offset
Stage 4: ball at 3.0m, 34° offset

Advancement: when the running minimum ball distance drops below 0.5m, the next stage unlocks.
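The stage table and advancement rule can be sketched as follows; the `Curriculum` class name and the reset of the running minimum on advancement are illustrative assumptions.

```python
STAGES = [  # (ball distance in m, angular offset in degrees), from the table above
    (1.5, 0), (1.5, 17), (2.0, 17), (2.5, 26), (3.0, 34),
]
ADVANCE_THRESHOLD = 0.5  # running min ball distance (m) that unlocks the next stage

class Curriculum:
    """Minimal sketch of the 5-stage ball curriculum."""
    def __init__(self) -> None:
        self.stage = 0
        self.min_ball_dist = float("inf")

    def update(self, ball_dist: float) -> bool:
        """Track the running minimum; advance when it drops below 0.5 m."""
        self.min_ball_dist = min(self.min_ball_dist, ball_dist)
        if self.min_ball_dist < ADVANCE_THRESHOLD and self.stage < len(STAGES) - 1:
            self.stage += 1
            self.min_ball_dist = float("inf")  # assumed: reset tracker per stage
            return True
        return False
```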

Output

Each run produces:

  • training_log.bin — FLOG binary with all physics and stats data
  • snn_state.pt — SNN weights and network state
  • checkpoint.pt — Full resume state (SNN + cerebellum + CPG + gate)
  • brain.pt — Accumulated brain (cognitive modules, episodic memory, concept graph)


API Reference

train_v032.py — CLI Arguments

--creature-name   str    'go2' or creature profile name
--scene           str    Natural language scene description
--steps           int    Total training steps (default 50000)
--seed            int    Random seed
--resume          str    Path to checkpoint.pt (must re-pass --scene and --creature-name)
--skip-morph-check       Skip morphology validation
--no-terrain             Flat ground (no heightfield)
--auto-reset      int    Reset after N steps without progress
--device          str    'cuda' or 'cpu' (auto-detected if omitted)

Key Functions

compute_reward(creature, sensor_data, prev_data, ball_info) → float

Multi-component reward: forward velocity + upright bonus (0.1) + ball approach + heading + contact bonus (5.0 at <0.3m).

save_checkpoint(path, creature, cerebellum, cpg, gate, step)

Saves: SNN state, cerebellum state_dict, CPG phases, competence gate, training step. Used for --resume.
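A sketch of what `save_checkpoint` plausibly writes, assuming a single torch-serialized dict; the key names below are assumptions, but the five components match the list above.

```python
import torch

def save_checkpoint(path, snn_state, cerebellum_state, cpg_phases, gate_state, step):
    """Bundle the five resume components into one torch-serialized dict (keys assumed)."""
    torch.save({
        "snn": snn_state,             # SNN weights / network state
        "cerebellum": cerebellum_state,  # cerebellum state_dict
        "cpg_phases": cpg_phases,     # CPG oscillator phases
        "gate": gate_state,           # competence gate
        "step": step,                 # training step, for --resume
    }, path)

def load_checkpoint(path):
    """Counterpart used by --resume (illustrative)."""
    return torch.load(path, map_location="cpu")
```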

FLOG Recording

Creature frames:  every 10 steps (pos, vel, ball_pos, heading, speed, step)
Stats frames:     every 1000 steps (distance, falls, PE, reward, actor, cpg, behavior, ...)
Event frames:     on milestones (curriculum advance, falls, records)
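The recording cadence for the periodic frame types can be expressed as a small helper (hypothetical; the real FLOG writer also emits event frames on milestones, which are not step-periodic):

```python
CREATURE_EVERY = 10   # creature frames: every 10 steps
STATS_EVERY = 1000    # stats frames: every 1000 steps

def frames_due(step: int) -> list:
    """Which periodic FLOG frame types are written at a given training step."""
    due = []
    if step % CREATURE_EVERY == 0:
        due.append("creature")
    if step % STATS_EVERY == 0:
        due.append("stats")
    return due
```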

Curriculum Stages (ball scenes)

Stage 0: ball 1.5m, 0° offset     Advance: min_ball_dist < 0.5m
Stage 1: ball 1.5m, 17° offset    Advance: min_ball_dist < 0.5m
Stage 2: ball 2.0m, 17° offset    Advance: min_ball_dist < 0.5m
Stage 3: ball 2.5m, 26° offset    Advance: min_ball_dist < 0.5m
Stage 4: ball 3.0m, 34° offset    Final stage