The 15-Step Cognitive Cycle
MH-FLOCKE processes sensory input and produces motor output through a biologically grounded 15-step cognitive cycle, executed every simulation timestep. This architecture replaces the typical observe-act loop of reinforcement learning with a layered system inspired by vertebrate neuroscience.
The Cycle
1. SENSE — Raw sensor values from MuJoCo (position, velocity, orientation, joint angles, visual target)
2. BODY SCHEMA — Efference copy check: did the body move as predicted? Anomaly detection.
3. WORLD MODEL — Spiking predictive model: predict the next sensor state, compute prediction error (PE)
4. EMOTIONS — Valence-arousal from body signals (falls → fear, progress → satisfaction)
5. MEMORY — Retrieve similar past episodes via sensorimotor pattern matching
6. DRIVES — Compute the dominant drive (survival, exploration, comfort, social)
7. GWT — Global Workspace competition: sensory vs motor vs predictive vs error vs memory
8. METACOGNITION — Self-monitoring: confidence, consciousness level, learning progress
9. CONSISTENCY — Integrity check: are predictions, emotions, and memory aligned?
10. REWARD — Combined signal: external reward + curiosity + empowerment + drive modulation + emotion factor
11. LEARNING — R-STDP weight updates modulated by reward and prediction error (Free Energy Principle)
12. SYNAPTOGENESIS — SNN spike patterns consolidated into a concept graph
13. HEBBIAN — Co-activation strengthening (neurons that fire together wire together)
14. DREAM — Periodic offline replay for memory consolidation
15. NEUROMOD — Adjust DA/5-HT/NE/ACh levels from emotion + metacognition + consistency
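The steps above can be sketched as an ordered pipeline over a shared state dict. This is an illustrative stand-in, not the project's actual code: the stage functions and state keys below are hypothetical, with two toy stages shown for SENSE and REWARD.

```python
from typing import Callable, Dict, List

State = Dict[str, float]

def make_cycle(steps: List[Callable[[State], None]]) -> Callable[[State], State]:
    """Return a function that runs every cognitive stage, in order, on a
    shared state dict — one full cycle per simulation timestep."""
    def run(state: State) -> State:
        for step in steps:
            step(state)
        return state
    return run

# Toy stages standing in for SENSE and REWARD (hypothetical logic):
def sense(state: State) -> None:
    state["sensor"] = 0.5  # stand-in for a MuJoCo sensor read

def reward(state: State) -> None:
    # REWARD stage: external reward + curiosity (empowerment, drives, and
    # the emotion factor omitted in this sketch)
    state["combined_reward"] = state.get("external", 0.0) + state.get("curiosity", 0.0)

cycle = make_cycle([sense, reward])
out = cycle({"external": 1.0, "curiosity": 0.2})
```

In the real architecture each stage would be a module object rather than a bare function, but the control flow is the same: a fixed, ordered sweep over all stages per timestep.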
Why Not Reinforcement Learning?
Standard deep RL (PPO, SAC, TD3) optimizes a scalar reward signal through gradient descent on a deep neural network. This works, but produces a black box with no biological grounding. MH-FLOCKE takes a different approach:
- Spiking Neural Network — Izhikevich neurons with biologically plausible dynamics, not differentiable activations
- R-STDP — Reward-modulated spike-timing-dependent plasticity instead of backpropagation
- Free Energy Principle — Prediction error minimization as the primary learning signal (Friston 2010), not reward maximization
- Cerebellar Forward Model — Marr-Albus-Ito architecture for motor correction, not an actor-critic network
- Central Pattern Generator — Innate rhythmic locomotion that the SNN learns to modulate, rather than gaits learned from scratch
The result: a system where every component has a biological equivalent, and the learning dynamics can be explained in neuroscience terms.
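For illustration, the Izhikevich (2003) neuron model underlying the SNN can be simulated in a few lines. The parameters below are the standard regular-spiking values from the paper, not values taken from MH-FLOCKE.

```python
def izhikevich(I, steps=1000, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Simulate one Izhikevich neuron with constant input current I.
    Returns the number of spikes emitted over `steps` milliseconds."""
    v, u = -65.0, b * -65.0              # membrane potential, recovery variable
    spikes = 0
    for _ in range(steps):
        # Two 0.5 ms half-steps for v, as in the 2003 paper's reference code
        for _ in range(2):
            v += 0.5 * dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                    # spike threshold
            v, u = c, u + d              # reset after spike
            spikes += 1
    return spikes
```

With these dynamics there is no differentiable activation to backpropagate through, which is why the architecture relies on R-STDP rather than gradient descent.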
System Diagram
The data flow follows the vertebrate nervous system hierarchy:
Scene Description ("dog plays with ball on grass")
→ TaskParser → Knowledge Acquisition → Terrain + Ball Injection
→ MuJoCo World (physics)
Sense-Think-Act Loop:
MuJoCo Sensors → Population Coding (8 neurons/channel)
→ SNN (4000 granule cells (GrC) + 200 Golgi cells (GoC) + Purkinje cells (PkC) + deep cerebellar nuclei (DCN))
→ R-STDP (reward + prediction error)
→ Cerebellar corrections (Marr-Albus-Ito)
→ Motor Decoding (push/pull population voting)
→ CPG Baseline + SNN corrections + Reflexes
→ PD Controller → MuJoCo Actuators
Cognitive Layer (parallel):
→ World Model (predictive coding)
→ Emotions (valence/arousal)
→ Drives (survival/exploration)
→ GWT Competition (attention)
→ Behavior Planner (walk/trot/chase/sniff/play)
→ Neuromodulators (DA/5-HT/NE/ACh)
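The population-coding stage in the diagram (8 neurons per channel with Gaussian tuning curves) can be sketched as follows; the value range and tuning width below are illustrative assumptions, not the project's actual constants.

```python
import math

def population_encode(x, n=8, lo=-1.0, hi=1.0, sigma=None):
    """Encode scalar sensor value x as activations of n neurons whose
    Gaussian tuning curves have preferred values evenly spaced over [lo, hi]."""
    if sigma is None:
        sigma = (hi - lo) / (n - 1)      # assumed width: one inter-center gap
    centers = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]

acts = population_encode(0.0)            # 8 activations, peaking mid-range
```

Each activation would then be converted to an input current or spike probability for the corresponding sensor neuron; a value near a neuron's preferred stimulus drives that neuron hardest.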
Module Map
The system consists of 6 packages with approximately 45 active modules:
- src/body/ — Physics, morphology, terrain, sensors (5 modules)
- src/brain/ — SNN, learning, cognition, motor control (25+ modules)
- src/behavior/ — Autonomous behavior system (5 modules)
- src/bridge/ — Task parsing, scene understanding (2 modules)
- src/llm/ — Optional LLM adapter for task understanding (1 module)
- src/utils/ — Configuration (1 module)
Key References
- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience
- Izhikevich, E.M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks
- Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology
- Kagan, B.J. et al. (2022). In vitro neurons learn and exhibit sentience when embodied in a simulated game-world. Neuron (DishBrain)
- Grillner, S. (2003). The motor infrastructure: from ion channels to neuronal networks. Nature Reviews Neuroscience
- Baars, B.J. (1988). A Cognitive Theory of Consciousness. Cambridge University Press
API Reference
CognitiveBrain(snn, n_sensor_channels, n_motors, config, plasticity_genome)
Orchestrates all 15 cognitive modules. Instantiated by MuJoCoCreatureBuilder.
process(sensor_values, snn_input, output_spikes, controls, external_reward, is_fallen, extra_sensor_data) → dict
Full 15-step cognitive cycle. Called by MuJoCoCreature.step() after the SNN step. Returns dict with combined_reward, prediction_error, gwt_winner, emotion, drives, metacognition, consistency, consciousness_level, pci, knowledge_graph, and more.
get_state() → dict
Complete cognitive state for dashboard/logging. All module states aggregated.
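The documented contract can be illustrated with a minimal stand-in class. The class name, simplified signature, and all values below are hypothetical placeholders, not MH-FLOCKE's implementation; only the shape of the returned dict follows the description above.

```python
class CognitiveBrainStub:
    """Toy stand-in mirroring the documented CognitiveBrain interface:
    process(...) returns a dict of cognitive-cycle outputs and get_state()
    aggregates the last result. All values are placeholders."""

    def __init__(self):
        self._last = {}

    def process(self, sensor_values, external_reward=0.0, is_fallen=False):
        # Placeholder logic: the real 15-step cycle is replaced by fixed outputs
        result = {
            "combined_reward": external_reward,   # REWARD stage would add curiosity etc.
            "prediction_error": 0.0,              # from the WORLD MODEL stage
            "gwt_winner": "sensory",              # Global Workspace competition
            "emotion": {"valence": 0.0, "arousal": 0.0},
        }
        self._last = result
        return result

    def get_state(self):
        return dict(self._last)
```

A caller in the style of MuJoCoCreature.step() would invoke process() once per timestep after the SNN step and read combined_reward back out for learning.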
MuJoCoCreature(genome, snn, world, body_name, creature_name)
Sense-Think-Act bridge between SNN and MuJoCo physics.
step(reward_signal, extra_sensor_data) → dict
Full cycle: get_sensor_input() → think() → apply_motor_output() → world.step() → brain.process(). Returns step info.
get_sensor_input() → Tensor
MuJoCo sensors → population-coded SNN input (8 neurons per channel, Gaussian tuning curves).
think(sensor_input) → Tensor
6 SNN substeps, accumulate output spikes for motor decoding.
apply_motor_output(output_spikes)
Population voting (push/pull pairs) → CPG blend → reflex add → spinal segments → PD controller → MuJoCo actuators.
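The push/pull population vote can be sketched like this; the spike-count normalization chosen here is an assumption for illustration, not necessarily the project's scheme.

```python
def decode_motor(push_spikes, pull_spikes):
    """Decode one motor command in [-1, 1] from the spike counts of an
    antagonist (push/pull) population pair."""
    total = push_spikes + pull_spikes
    if total == 0:
        return 0.0                           # no votes: neutral command
    return (push_spikes - pull_spikes) / total

def decode_all(push_counts, pull_counts):
    """One command per actuator from paired per-population spike counts."""
    return [decode_motor(p, q) for p, q in zip(push_counts, pull_counts)]
```

The decoded commands would then be blended with the CPG baseline and reflexes before reaching the PD controller, as described above.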
MuJoCoCreatureBuilder
build(genome, world, n_hidden_neurons, device, creature_name, xml_path) → MuJoCoCreature
Factory: creates creature with SNN (cerebellar populations), CognitiveBrain, and all wiring.