MH-FLOCKE

Architecture Overview

The 15-Step Cognitive Cycle

MH-FLOCKE processes sensory input and produces motor output through a biologically grounded 15-step cognitive cycle, executed every simulation timestep. This architecture replaces the typical observe-act loop of reinforcement learning with a layered system inspired by vertebrate neuroscience.

The Cycle

  1. SENSE — Raw sensor values from MuJoCo (position, velocity, orientation, joint angles, visual target)
  2. BODY SCHEMA — Efference copy check: did the body move as predicted? Anomaly detection.
  3. WORLD MODEL — Spiking predictive model: predict next sensor state, compute prediction error (PE)
  4. EMOTIONS — Valence-arousal from body signals (falls → fear, progress → satisfaction)
  5. MEMORY — Retrieve similar past episodes via sensorimotor pattern matching
  6. DRIVES — Compute dominant drive (survival, exploration, comfort, social)
  7. GWT — Global Workspace competition: sensory vs motor vs predictive vs error vs memory
  8. METACOGNITION — Self-monitoring: confidence, consciousness level, learning progress
  9. CONSISTENCY — Integrity check: are predictions, emotions, and memory aligned?
  10. REWARD — Combined signal: external reward + curiosity + empowerment + drive modulation + emotion factor
  11. LEARNING — R-STDP weight updates modulated by reward and prediction error (Free Energy Principle)
  12. SYNAPTOGENESIS — SNN spike patterns consolidated into concept graph
  13. HEBBIAN — Co-activation strengthening (neurons that fire together wire together)
  14. DREAM — Periodic offline replay for memory consolidation
  15. NEUROMOD — Adjust DA/5-HT/NE/ACh levels from emotion + metacognition + consistency
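The ordering above can be sketched as a plain driver loop. Everything here is illustrative (the step names mirror the list, but the handler scheme is not the actual MH-FLOCKE API):

```python
# Illustrative skeleton of the 15-step cycle: each step is a handler that
# reads and mutates a shared state dict. Names follow the list above; the
# handler-dict design is an assumption for this sketch only.

STEPS = [
    "sense", "body_schema", "world_model", "emotions", "memory",
    "drives", "gwt", "metacognition", "consistency", "reward",
    "learning", "synaptogenesis", "hebbian", "dream", "neuromod",
]

def run_cycle(sensors, handlers, state):
    """One simulation timestep: execute all 15 steps in fixed order."""
    state["sensors"] = sensors
    for name in STEPS:
        handlers[name](state)
    return state

# Trivial demo: handlers that just record their execution order.
trace = []
handlers = {name: (lambda s, n=name: trace.append(n)) for name in STEPS}
run_cycle({"pos": 0.0}, handlers, {})
print(trace[:3])  # ['sense', 'body_schema', 'world_model']
```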

Why Not Reinforcement Learning?

Standard deep RL (PPO, SAC, TD3) optimizes a scalar reward signal through gradient descent on a deep neural network. This works, but produces a black box with no biological grounding. MH-FLOCKE takes a different approach:

  • Spiking Neural Network — Izhikevich neurons with biologically plausible dynamics, not differentiable activations
  • R-STDP — Reward-modulated spike-timing-dependent plasticity instead of backpropagation
  • Free Energy Principle — Prediction error minimization as the primary learning signal (Friston 2010), not reward maximization
  • Cerebellar Forward Model — Marr-Albus-Ito architecture for motor correction, not an actor-critic network
  • Central Pattern Generator — Innate rhythmic locomotion that the SNN learns to modulate, rather than learned from scratch

The result: a system where every component has a biological equivalent, and the learning dynamics can be explained in neuroscience terms.
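The R-STDP idea can be made concrete with a minimal trace-based sketch. All parameters, the trace formulation, and the weight bounds are illustrative, not MH-FLOCKE's actual implementation:

```python
import numpy as np

def rstdp_step(w, elig, x_pre, x_post, pre, post, reward,
               a_plus=0.01, a_minus=0.012,
               tau_pre=0.02, tau_post=0.02, tau_e=0.5, dt=0.001, lr=1.0):
    """One R-STDP update (sketch): spike pairings are written into an
    eligibility trace; reward gates whether the trace becomes a weight
    change. w has shape (n_post, n_pre)."""
    # Decay activity traces, then add the current spikes.
    x_pre  = x_pre  * np.exp(-dt / tau_pre)  + pre
    x_post = x_post * np.exp(-dt / tau_post) + post
    # Pairing: a post spike potentiates synapses with recent pre activity;
    # a pre spike depresses synapses with recent post activity.
    stdp = a_plus * np.outer(post, x_pre) - a_minus * np.outer(x_post, pre)
    # Eligibility trace remembers the pairing; reward converts it to Δw.
    elig = elig * np.exp(-dt / tau_e) + stdp
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)
    return w, elig, x_pre, x_post

w    = np.zeros((1, 2))            # 1 post neuron, 2 pre neurons
elig = np.zeros_like(w)
x_pre, x_post = np.zeros(2), np.zeros(1)

# Pre neuron 0 fires, then the post neuron fires under reward: the causal
# pairing on synapse (0, 0) is potentiated; synapse (0, 1) stays at zero.
w, elig, x_pre, x_post = rstdp_step(w, elig, x_pre, x_post,
                                    np.array([1., 0.]), np.array([0.]), 0.0)
w, elig, x_pre, x_post = rstdp_step(w, elig, x_pre, x_post,
                                    np.array([0., 0.]), np.array([1.]), 1.0)
print(w)  # only the paired synapse has grown
```

The key difference from backpropagation: the pairing rule is local to each synapse, and the reward signal arrives later as a global scalar, which is what the eligibility trace bridges.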

System Diagram

The data flow follows the vertebrate nervous system hierarchy:

Scene Description ("dog plays with ball on grass")
  → TaskParser → Knowledge Acquisition → Terrain + Ball Injection
  → MuJoCo World (physics)
  
Sense-Think-Act Loop:
  MuJoCo Sensors → Population Coding (8 neurons/channel)
    → SNN (4000 GrC + 200 GoC + PkC + DCN)
      → R-STDP (reward + prediction error)
      → Cerebellar corrections (Marr-Albus-Ito)
    → Motor Decoding (push/pull population voting)
    → CPG Baseline + SNN corrections + Reflexes
    → PD Controller → MuJoCo Actuators
  
Cognitive Layer (parallel):
  → World Model (predictive coding)
  → Emotions (valence/arousal)
  → Drives (survival/exploration)
  → GWT Competition (attention)
  → Behavior Planner (walk/trot/chase/sniff/play)
  → Neuromodulators (DA/5-HT/NE/ACh)
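The population-coding stage at the sensor interface (8 neurons per channel with Gaussian tuning curves) can be sketched as follows; the sensor range and tuning width are assumptions for the example:

```python
import numpy as np

def encode_channel(value, n_neurons=8, lo=-1.0, hi=1.0, sigma=None):
    """Encode one scalar sensor value as Gaussian tuning-curve activations:
    each of the n_neurons prefers a different point in the sensor range,
    so the value is represented by a hill of activity across the group."""
    centers = np.linspace(lo, hi, n_neurons)
    if sigma is None:
        # Width chosen so neighbouring tuning curves overlap (assumption).
        sigma = (hi - lo) / (n_neurons - 1)
    return np.exp(-0.5 * ((value - centers) / sigma) ** 2)

rates = encode_channel(0.3)
print(rates.round(2))  # activity peaks at the neurons tuned near 0.3
```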

Module Map

The system consists of 6 packages with approximately 45 active modules:

  • src/body/ — Physics, morphology, terrain, sensors (5 modules)
  • src/brain/ — SNN, learning, cognition, motor control (25+ modules)
  • src/behavior/ — Autonomous behavior system (5 modules)
  • src/bridge/ — Task parsing, scene understanding (2 modules)
  • src/llm/ — Optional LLM adapter for task understanding (1 module)
  • src/utils/ — Configuration (1 module)

Key References

  • Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience
  • Izhikevich, E.M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks
  • Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology
  • Kagan, B.J. et al. (2022). In vitro neurons learn and exhibit sentience when embodied in a simulated game-world. Neuron (DishBrain)
  • Grillner, S. (2003). The motor infrastructure: from ion channels to neuronal networks. Nature Reviews Neuroscience
  • Baars, B.J. (1988). A Cognitive Theory of Consciousness. Cambridge University Press

API Reference

CognitiveBrain(snn, n_sensor_channels, n_motors, config, plasticity_genome)

Orchestrates all 15 cognitive modules. Instantiated by MuJoCoCreatureBuilder.

process(sensor_values, snn_input, output_spikes, controls, external_reward, is_fallen, extra_sensor_data) → dict

Full 15-step cognitive cycle. Called by MuJoCoCreature.step() after the SNN step. Returns dict with combined_reward, prediction_error, gwt_winner, emotion, drives, metacognition, consistency, consciousness_level, pci, knowledge_graph, and more.

get_state() → dict

Complete cognitive state for dashboard/logging. All module states aggregated.

MuJoCoCreature(genome, snn, world, body_name, creature_name)

Sense-Think-Act bridge between SNN and MuJoCo physics.

step(reward_signal, extra_sensor_data) → dict

Full cycle: get_sensor_input() → think() → apply_motor_output() → world.step() → brain.process(). Returns step info.

get_sensor_input() → Tensor

MuJoCo sensors → population-coded SNN input (8 neurons per channel, Gaussian tuning curves).

think(sensor_input) → Tensor

6 SNN substeps, accumulate output spikes for motor decoding.
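Since the SNN uses Izhikevich (2003) neurons, the substep-and-accumulate pattern can be sketched like this. The dynamics follow the published model (regular-spiking parameters); the 6-substep loop mirrors the description above, but this is not the project's code:

```python
import numpy as np

def izhikevich_substeps(v, u, I, n_substeps=6, dt=1.0,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Run n_substeps of Izhikevich dynamics and accumulate spikes.
    v: membrane potential, u: recovery variable, I: input current."""
    spikes = np.zeros_like(v)
    for _ in range(n_substeps):
        # Izhikevich (2003) update; spike threshold at v = 30 mV.
        v = v + dt * (0.04 * v**2 + 5 * v + 140 - u + I)
        u = u + dt * a * (b * v - u)
        fired = v >= 30.0
        spikes += fired                    # accumulate for motor decoding
        v = np.where(fired, c, v)          # reset potential
        u = np.where(fired, u + d, u)      # recovery kick
    return v, u, spikes

v = np.full(4, -65.0)                      # 4 neurons at rest
u = 0.2 * v
v, u, spikes = izhikevich_substeps(v, u, I=np.full(4, 10.0))
print(spikes)  # each neuron fires within the substep window
```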

apply_motor_output(output_spikes)

Population voting (push/pull pairs) → CPG blend → reflex add → spinal segments → PD controller → MuJoCo actuators.
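The decode-then-control stages can be illustrated with a hypothetical push/pull vote feeding a PD controller. The normalization scheme and the gains are made up for the example:

```python
import numpy as np

def decode_motor(push_spikes, pull_spikes, gain=1.0):
    """Push/pull population voting (sketch): the net spike-count
    difference sets the signed motor command, normalized by total
    activity so the command stays bounded."""
    total = push_spikes + pull_spikes
    return gain * (push_spikes - pull_spikes) / np.maximum(total, 1)

def pd_torque(q_target, q, qdot, kp=5.0, kd=0.5):
    """PD controller: drive joint angle q toward q_target, damped by
    joint velocity qdot. Gains are illustrative."""
    return kp * (q_target - q) - kd * qdot

# 12 push spikes vs 4 pull spikes -> positive command of (12-4)/16 = 0.5.
cmd = decode_motor(np.array([12.0]), np.array([4.0]))
tau = pd_torque(q_target=cmd, q=np.array([0.0]), qdot=np.array([0.0]))
print(cmd, tau)
```

In the real pipeline the decoded command is blended with the CPG baseline and reflexes before the PD stage; this sketch shows only the two endpoints.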

MuJoCoCreatureBuilder

build(genome, world, n_hidden_neurons, device, creature_name, xml_path) → MuJoCoCreature

Factory: creates creature with SNN (cerebellar populations), CognitiveBrain, and all wiring.