MH-FLOCKE

The Dog Follows the Light — Phototaxis with Spiking Neurons

A 560-Neuron Spiking Network Steers a Quadruped Toward Light

For the first time, MH-FLOCKE’s robot dog actively navigates toward a light source in simulation — driven by hardwired reflexes and learned neural adaptations. No external reward signal. No reinforcement learning. Just body signals.

What You See in the Video

A Freenove Robot Dog (€100, Raspberry Pi, 12 servos) walks across a flat surface in MuJoCo simulation. A light source (yellow dot on the mini-map) is placed ahead and to the side. The dog detects the light gradient and steers toward it using a VOR (Vestibulo-Ocular Response) reflex — a hardwired brainstem circuit that turns the body toward a visual target.

The mini-map in the bottom-left corner shows the robot’s trail (green line) and the light waypoint (yellow glow). You can see the dog arcing toward the light rather than walking straight. This arc is not a software limitation — it reflects the physical turning radius of the Freenove’s CPG gait, which produces roughly 12% steering asymmetry between left and right legs.

Hardwired vs. Learned: The Biological Design

MH-FLOCKE follows the same principle as biological motor development. A newborn puppy doesn’t learn to walk from scratch — its spinal cord has CPG circuits that produce rhythmic leg movements from birth. The cerebellum calibrates these movements through experience. The brainstem provides reflexes like the VOR. Learning refines what reflexes provide.

Hardwired components (present from “birth”):

  • CPG (Central Pattern Generator) — Mathematical oscillator producing rhythmic gait. The SNN does not generate the gait pattern; CPG provides the baseline.
  • VOR (Vestibulo-Ocular Response) — Reflexive steering toward the light target. Hardwired, like in a real animal’s superior colliculus.
  • Run-and-Tumble — A bacterial-inspired navigation state machine (Berg & Brown, 1972). Alternates between running straight and turning (tumbling) when the gradient changes.
  • Spinal reflexes — Righting reflex, cross-extension reflex, terrain compensation.
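Here is a minimal sketch of a run-and-tumble loop in the spirit of Berg & Brown. It rests on three assumptions: a point agent, a hypothetical `intensity` light field, and the simplest possible rule, which is to tumble to a random heading whenever the light reading drops. It shows the state machine only. It is not the MH-FLOCKE module, which steers a legged body through the VOR and the CPG.

```python
import math
import random


def intensity(x, y, source=(5.0, 2.0)):
    """Hypothetical light field: brightness falls off with squared distance."""
    d2 = (x - source[0]) ** 2 + (y - source[1]) ** 2
    return 1.0 / (1.0 + d2)


def run_and_tumble(steps=2000, step_len=0.02, seed=0, source=(5.0, 2.0)):
    """Point agent: RUN straight while the light gets brighter, TUMBLE when it dims."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    last = intensity(x, y, source)
    tumbles = 0
    for _ in range(steps):
        # RUN: advance along the current heading
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
        now = intensity(x, y, source)
        # TUMBLE: the gradient got worse, so pick a fresh random heading
        if now < last:
            heading = rng.uniform(-math.pi, math.pi)
            tumbles += 1
        last = now
    return x, y, tumbles


x, y, tumbles = run_and_tumble()
print(f"final position ({x:.2f}, {y:.2f}) after {tumbles} tumbles")
```

Because runs continue only while the reading improves, the agent climbs the gradient on average even though each tumble heading is random.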

Learned through training (emerges from experience):

  • SNN weights (R-STDP) — Reward-modulated spike-timing-dependent plasticity adapts the synaptic weights of the 560-neuron network based on intrinsic reward (vestibular comfort, prediction error, curiosity).
  • Cerebellar correction (Marr-Albus-Ito) — The cerebellum learns forward-model corrections. Correction magnitude grows from 0.0006 to 0.034 over training — the strongest cerebellar signal ever measured in MH-FLOCKE.
  • CPG-to-SNN handoff — The CPG starts at 90% control and fades to ~45% as the SNN proves it can maintain stable locomotion. The SNN earns control through competence, not through a timer.
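One way to sketch a competence-gated handoff is as a CPG weight that drops only while the SNN is both competent and stable, and climbs back when stability is lost. The 90% starting point and the roughly 45% end point come from the text. The competence threshold, step size and recovery rule are hypothetical placeholders, not MH-FLOCKE's actual schedule.

```python
from dataclasses import dataclass


@dataclass
class CpgHandoff:
    cpg_weight: float = 0.90        # CPG share of control at "birth" (from the text)
    min_weight: float = 0.45        # approximate end point reported above
    step: float = 0.001             # hypothetical per-update decrement
    competence_gate: float = 0.5    # hypothetical competence threshold

    def update(self, competence: float, stable: bool) -> float:
        """Hand control to the SNN only while it earns it; no timer involved."""
        if not stable:
            # Instability returns control to the innate rhythm (hypothetical rule)
            self.cpg_weight = min(0.90, self.cpg_weight + 10 * self.step)
        elif competence > self.competence_gate:
            self.cpg_weight = max(self.min_weight, self.cpg_weight - self.step)
        return self.cpg_weight


handoff = CpgHandoff()
for _ in range(1000):
    handoff.update(competence=0.8, stable=True)
print(round(handoff.cpg_weight, 3))
```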

The Numbers

  • 33,000 steps, 9.7 minutes training time (57 sps on CPU)
  • 0 falls, perfect upright streak
  • 2 light targets reached (sf:2) through active VOR-guided steering
  • VOR signal up to +0.54 — strong, sustained steering toward the light
  • 4 Run-and-Tumble events — the navigation state machine triggered naturally
  • Cerebellar correction: 0.008 — real Marr-Albus-Ito learning

Why the Dog Arcs Around the Light

You’ll notice the dog doesn’t walk straight to the light — it takes a wide arc. This is not a bug. The Freenove’s CPG produces approximately 12% amplitude asymmetry between left and right legs when steering. This gives the robot a turning radius of roughly 5 meters. The VOR reflex fires correctly, and the CPG responds — but the body can only turn as fast as the legs allow.

This is exactly what happens with real quadrupeds. A horse can’t make the same tight turns as a cat. The steering intention is there; the biomechanics set the limit.

Performance Breakthrough: 6× Speedup

This session also resolved a critical performance bug. Step-time was growing from 20ms to 800ms over 100k steps — making long training runs impossible. The root cause: an O(N²) clustering operation in the Synaptogenesis module that processed 5,000 accumulated experience patterns without clearing the buffer.

The fix (buffer.clear() after consolidation + max_size reduction) brought step-time back to a stable 18ms across 100k steps. Training speed went from 7 sps to 54 sps, a more than 6× improvement that makes all future development viable.
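The pair count shows why clearing the buffer matters. An all-pairs clustering pass over N patterns makes N(N-1)/2 comparisons. For 5,000 patterns that is about 12.5 million comparisons per pass, against 124,750 for 500.

The sketch below assumes a hypothetical `PatternBuffer` class, a simple distance-threshold grouping and a `max_size` of 500. The text does not show the real Synaptogenesis module, its clustering method, or its new `max_size` value.

```python
from collections import deque
import math


class PatternBuffer:
    """Hypothetical experience buffer: bounded, and cleared after each consolidation."""

    def __init__(self, max_size=500):
        self.patterns = deque(maxlen=max_size)  # oldest patterns fall off automatically

    def add(self, pattern):
        self.patterns.append(tuple(pattern))

    def consolidate(self, tol=0.1):
        """All-pairs O(N^2) pass: count similar pairs, then clear the buffer."""
        items = list(self.patterns)
        comparisons = similar = 0
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                comparisons += 1
                if math.dist(items[i], items[j]) < tol:
                    similar += 1
        self.patterns.clear()  # consolidated patterns no longer pile up
        return comparisons, similar


buf = PatternBuffer(max_size=500)
for k in range(5000):
    buf.add((k % 50 / 50.0, 0.0))
comparisons, similar = buf.consolidate()
print(comparisons, len(buf.patterns))
```

Either change alone caps the cost. Together they keep each consolidation pass small and stop the buffer from growing between passes.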

What’s Next

  • Hardware phototaxis — The same VOR steering with a real camera (cv2) on the Freenove, following a flashlight on the floor.
  • Autonomous loop — Instead of pre-placed waypoints, the dog chooses its own targets based on curiosity, exploration drive, and episodic memory. All the modules exist; they need a conductor.
  • Paper 2 — Sim-to-real transfer + phototaxis results for Frontiers in Neurorobotics or CoRL workshop.

MH-FLOCKE is an open-source project by Marc Hesse — independent researcher, Potsdam, Germany. Named after Flocke, my late dog.

Code: github.com/MarcHesse/mhflocke (Apache 2.0)
Paper: aiXiv Preprint

The Mogli Oscillator: When Your Robot Dog Gets a Real Spine

Today we replaced the mathematical heart of MH-FLOCKE’s locomotion system. The old Central Pattern Generator — a set of sine and cosine functions that produced smooth, predictable gait patterns — is gone. In its place: 24 Izhikevich spiking neurons that fight each other to make a robot dog walk.

What Changed

Every vertebrate animal has a spinal CPG — a neural circuit in the spinal cord that produces rhythmic locomotion patterns without input from the brain. Thomas Graham Brown discovered this in 1911: take two neurons, connect them with mutual inhibition, add intrinsic adaptation, and you get alternating rhythm. Flexor fires, inhibits extensor. Flexor fatigues, extensor takes over. Repeat forever.

The Mogli Oscillator implements exactly this. Each of the robot’s 12 joints gets a pair of Izhikevich neurons — one flexor, one extensor — that alternate through adaptation-driven switching. Four legs × 3 joints = 24 neurons total.
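The flexor/extensor principle (mutual inhibition plus adaptation) can be shown with a rate-based Matsuoka half-center oscillator. This is a deliberate swap: Matsuoka units stand in for the Izhikevich neurons the Mogli Oscillator uses, because the alternation is easier to see in a few lines. The parameter values are illustrative and not taken from MH-FLOCKE. They were chosen to satisfy Matsuoka's oscillation condition, 1 + tau_r/tau_a < w < 1 + beta.

```python
import numpy as np


def half_center(t_end=20.0, dt=0.005, tau_r=0.1, tau_a=0.5, beta=2.5, w=2.0, s=1.0):
    """Two mutually inhibiting units with self-adaptation (Euler integration)."""
    n = int(t_end / dt)
    x = np.array([0.1, 0.0])   # activation state; small asymmetry breaks the tie
    v = np.zeros(2)            # adaptation ("fatigue") state
    out = np.zeros((n, 2))
    for k in range(n):
        y = np.maximum(x, 0.0)              # rectified output (firing rate)
        inhibition = w * y[::-1]            # flexor inhibits extensor and vice versa
        dx = (-x - beta * v - inhibition + s) / tau_r
        dv = (-v + y) / tau_a               # active unit slowly fatigues
        x = x + dt * dx
        v = v + dt * dv
        out[k] = y
    return out


out = half_center()
dominant = out[:, 0] > out[:, 1]
switches = int(np.count_nonzero(dominant[1:] != dominant[:-1]))
print("flexor/extensor switches:", switches)
```

The active unit adapts until the inhibition it sends no longer holds the other unit down. The other unit then takes over, and the cycle repeats.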

The legs are coupled through interneurons: left-right pairs inhibit each other (alternation), diagonal pairs excite each other (walk gait synchronization). The coupling topology determines the gait — and because the coupling weights are stored in a learnable matrix, they can adapt through R-STDP.
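The coupling topology can be stored as a 4×4 weight matrix. The layout below is one hypothetical arrangement. The weight values (-1.0 and +0.5) and the `agreement` score are illustrative assumptions, not the repository's matrix.

The score sums W_ij × cos(φ_i − φ_j) over leg pairs. A phase pattern that fits the coupling therefore scores higher.

```python
import numpy as np

LEGS = ["LF", "RF", "LH", "RH"]  # left-front, right-front, left-hind, right-hind


def coupling_matrix(inhibit=-1.0, excite=0.5):
    """Hypothetical inter-leg weights: left-right pairs inhibit, diagonal pairs excite."""
    idx = {leg: i for i, leg in enumerate(LEGS)}
    W = np.zeros((4, 4))
    for a, b in [("LF", "RF"), ("LH", "RH")]:
        W[idx[a], idx[b]] = W[idx[b], idx[a]] = inhibit
    for a, b in [("LF", "RH"), ("RF", "LH")]:
        W[idx[a], idx[b]] = W[idx[b], idx[a]] = excite
    return W


def agreement(W, phases):
    """Sum of W_ij * cos(phase_i - phase_j) over pairs i < j."""
    p = np.asarray(phases, dtype=float)
    diff = p[:, None] - p[None, :]
    return float(np.sum(np.triu(W * np.cos(diff), k=1)))


W = coupling_matrix()
diagonal_sync = [0.0, np.pi, np.pi, 0.0]   # LF+RH together, RF+LH together
pace = [0.0, np.pi, 0.0, np.pi]            # left legs together, right legs together
all_in_phase = [0.0, 0.0, 0.0, 0.0]
for name, ph in [("diagonal", diagonal_sync), ("pace", pace), ("in phase", all_in_phase)]:
    print(name, agreement(W, ph))
```

With this matrix the diagonal-synchronized pattern scores highest. A learning rule that edits individual entries of `W` would shift which pattern the legs settle into.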

Why It Matters

The old sine/cosine CPG was a placeholder. It worked — the Freenove robot walked, the Go2 ran 45 meters — but it had fundamental limitations:

No per-leg independence. All legs followed the same global clock. If one leg needed to step differently — obstacle, slope, missing leg — the CPG couldn’t accommodate it.

No real turning. Steering was a hack: offset the abduction angle. The Mogli Oscillator turns by giving one side more tonic drive — longer steps on the outside, shorter on the inside. Exactly how animals turn.

No gait transitions. Walk, trot, gallop require different phase relationships between legs. The old CPG had fixed phase offsets. The Mogli Oscillator’s coupling weights can shift to produce any gait.

No fault tolerance. If a leg breaks, the old CPG keeps sending signals to all four legs. The Mogli Oscillator can reorganize: the remaining three oscillators find a new stable pattern through R-STDP. Like a real dog learning to walk on three legs.
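Turning by tonic drive amounts to splitting one drive signal unevenly between the left and right oscillators, so the outside legs take longer steps. Here is a minimal sketch. It assumes a hypothetical `tonic_drive` helper, a convention where positive `turn` means turning right, and an illustrative 30% maximum asymmetry.

```python
def tonic_drive(base, turn, asymmetry=0.3):
    """Split tonic drive between sides; turn > 0 turns right, so the left (outside) side gets more.

    `asymmetry` is a hypothetical maximum fractional difference, not a value from MH-FLOCKE.
    """
    turn = max(-1.0, min(1.0, turn))   # clamp the steering command to [-1, 1]
    left = base * (1.0 + asymmetry * turn)
    right = base * (1.0 - asymmetry * turn)
    return left, right


print(tonic_drive(1.0, 0.0))   # straight: equal drive
print(tonic_drive(1.0, 0.5))   # gentle right turn
```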

Adaptive Gain: The Cautious Newborn

The output gain is not hardcoded — it develops. The oscillator starts with a gain of 3.0 (tiny, cautious movements) and ramps to 8.0 over the first 2000 training steps. This mirrors biological development: serotonin (5-HT) from the brainstem’s raphe nuclei gradually increases motor neuron excitability over the first postnatal days. A puppy’s first steps are small and wobbly — not because it’s broken, but because the gain hasn’t matured yet.
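The developing gain can be written as a schedule from 3.0 to 8.0 over the first 2,000 steps. The text gives only the endpoints and the duration, so the linear shape below is an assumption.

```python
def motor_gain(step, start=3.0, end=8.0, ramp_steps=2000):
    """Output gain that matures over training (linear ramp is an assumption)."""
    progress = min(max(step / ramp_steps, 0.0), 1.0)
    return start + (end - start) * progress


for s in (0, 500, 1000, 2000, 10000):
    print(s, motor_gain(s))
```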

CPG Autonomy: The Decerebrate Cat

An interesting finding during development: when the behavior planner switched to “look around” or “alert” behaviors (which reduce movement amplitude), the Mogli Oscillator stalled completely. The robot stopped walking.

This is biologically wrong. Decerebrate cats — cats with the brain disconnected from the spinal cord — still walk on a treadmill. The spinal CPG is autonomous. The cortex modulates locomotion but cannot silence it. Only the basal ganglia can actively suppress locomotion, and we haven’t implemented that yet.

The fix: a CPG autonomy floor of 70%. The behavior planner can slow down the robot, but the spinal rhythm continues. This is not a hack — it’s an anatomical property of the spinal cord.
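In code, the 70% floor is a clamp applied after the behavior planner's amplitude request. This minimal sketch assumes the planner expresses behaviors such as "look around" or "alert" as a hypothetical `planner_scale` between 0 and 1.

```python
def spinal_amplitude(planner_scale, floor=0.70):
    """The planner can slow the rhythm but never push it below the autonomy floor."""
    return max(floor, min(1.0, planner_scale))


print(spinal_amplitude(0.2))   # "look around" request is held at the floor
print(spinal_amplitude(0.85))  # mild slowdown passes through unchanged
```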

First Results

The Mogli Oscillator walks forward, maintains perfect balance (0 falls in 50,000 steps), and the SNN actor takes over faster than with the old CPG (65% actor competence vs 11%). The walk gait emerges from the coupling topology:

  • Left↔Right correlation: -0.78 (alternating)
  • Diagonal correlation: +0.73 (synchronized)
  • Exactly the pattern of a walking dog.
| Metric | Mogli Oscillator | Mathematical CPG |
|---|---|---|
| Distance (50k steps) | 1.21 m | 8.2 m |
| Falls | 0 | 0 |
| Actor Competence | 0.649 | 0.108 |
| CPG Weight | 58% | 85% |
| Upright Streak | 50,000 | 50,000 |

Distance is lower because the system prioritizes stability during learning — exactly what a biological newborn does. The actor learns 6× faster because the Mogli Oscillator provides richer, more variable input signals than the predictable sine wave.
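The Left↔Right and diagonal correlations above are Pearson correlations between leg signals. This sketch shows how such numbers can be computed from a hypothetical dictionary of per-leg joint-angle traces. The synthetic sine traces in the example are illustrative and are not MH-FLOCKE data.

```python
import numpy as np


def gait_correlations(traces):
    """Mean Pearson r for left-right pairs and for diagonal pairs of leg traces."""
    def r(a, b):
        return float(np.corrcoef(traces[a], traces[b])[0, 1])

    left_right = (r("LF", "RF") + r("LH", "RH")) / 2.0
    diagonal = (r("LF", "RH") + r("RF", "LH")) / 2.0
    return left_right, diagonal


t = np.linspace(0.0, 10.0, 2000)
rng = np.random.default_rng(0)
phase = {"LF": 0.0, "RH": 0.0, "RF": np.pi, "LH": np.pi}  # diagonal pairs in phase
traces = {leg: np.sin(2 * np.pi * t + p) + 0.3 * rng.standard_normal(t.size)
          for leg, p in phase.items()}
lr, diag = gait_correlations(traces)
print(f"left-right r = {lr:.2f}, diagonal r = {diag:.2f}")
```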

Nothing Is Faked

Everything you see in the video is the real output of neuronal dynamics. No sine functions. No hardcoded trajectories. The 24 Izhikevich neurons fire, inhibit each other, and produce a walking pattern through the same mechanism that Thomas Graham Brown described 115 years ago. The wackiness you see is real — it’s not a smooth sine wave anymore, it’s spike-driven motor commands. Like a real animal.

What’s Next

R-STDP coupling learning — the robot learns to adjust its own gait pattern through reward-modulated plasticity. The weight matrix is prepared, the eligibility traces exist. Activation is next.

Limb loss compensation — simulate a broken leg in MuJoCo, watch the remaining oscillators reorganize. A robot dog that loses a leg and teaches itself to walk again — like a real animal. No retraining needed.

Hardware deployment — the Mogli Oscillator runs at 28ms/step in simulation. The Freenove Pi runs at 25ms/step. Same code, same architecture. The sim-to-real gap is bridged by on-device R-STDP learning.

The Mogli Oscillator is named after our current test pilot — a dog who doesn’t know he inspired a neural architecture.


Code: github.com/MarcHesse/mhflocke (Apache 2.0) — enable with --neural-cpg flag
Video: YouTube @mhflocke
Paper 1 — Architecture & Ablation: doi.org/10.5281/zenodo.19336894
Paper 2 — Sim-to-Real Transfer: doi.org/10.5281/zenodo.19481146

v0.4.3: The Wall — When a Spiking Neural Network Learns to Stop

A dog doesn’t need to crash into a wall twice. The first bump tells the whiskers something is wrong, the brainstem slams the brakes, and the cerebellum remembers: next time, slow down earlier.

Today’s update teaches the Freenove robot dog the same lesson — using the same biological architecture.

The Experiment

The setup is simple: a small quadruped robot, 232 spiking neurons, a wall 80cm ahead. No pre-programming, no path planner, no reward shaping beyond “hitting the wall is bad.” The question: can a biologically grounded spiking neural network learn active behavior change from a clear binary signal?

The answer, after 9 bug fixes and 8 training runs: yes.

What We Found (and What Broke Along the Way)

The first seven runs produced corrections of exactly 0.0000. The cerebellum was learning — PF→PkC weights were growing — but the corrections never reached the motors. Three fundamental architecture bugs were hiding in the pipeline:

Bug 1: The DCN was deaf. Deep Cerebellar Nuclei compute motor corrections as the difference between “push” and “pull” populations. But the DCN was reading Purkinje cell spike activity — an exponential moving average that was essentially zero because PkC rarely fire discrete spikes. The fix: read the graded compartment state (apical voltage + dendritic calcium), not just spikes. This is more biologically accurate — PkC→DCN synapses show graded GABAergic release proportional to membrane potential.

Bug 2: Symmetric climbing fibers. When the robot hit the wall, the Inferior Olive sent identical error signals to both push and pull Purkinje cells. Identical calcium → identical DCN inhibition → push minus pull = zero → corrections = zero. Always. The fix: asymmetric CF — push PkC get strong CF (0.9), pull PkC get weak CF (0.1). Biology does the same thing: when an animal hits an obstacle, the correction is to reduce forward drive and increase braking.
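Bugs 1 and 2 both come down to the push-minus-pull subtraction. The toy model below uses hypothetical functions and values:

- equal weighting of apical voltage and calcium;
- a tonic DCN rate of 1.0;
- climbing-fiber input added directly to dendritic calcium.

It shows why each bug pinned the correction at zero and how the fixes free it.

```python
def pkc_output(apical_v, calcium, spike_ema, graded=True):
    """PkC -> DCN inhibition: graded compartment state (fix) or spike EMA (bug 1)."""
    if graded:
        return 0.5 * apical_v + 0.5 * calcium
    return spike_ema


def dcn_correction(push_inhibition, pull_inhibition, tonic=1.0):
    """Each DCN population fires tonically minus its PkC inhibition; output = push - pull."""
    return (tonic - push_inhibition) - (tonic - pull_inhibition)


def collision_correction(cf_push, cf_pull, graded=True, apical_v=0.2, spike_ema=0.0):
    # Climbing-fiber error raises dendritic calcium in the PkC it targets
    push = pkc_output(apical_v, cf_push, spike_ema, graded)
    pull = pkc_output(apical_v, cf_pull, spike_ema, graded)
    return dcn_correction(push, pull)


print(collision_correction(0.9, 0.1, graded=False))  # bug 1: spike readout sees ~0 on both sides
print(collision_correction(0.5, 0.5))                # bug 2: symmetric CF cancels out
print(collision_correction(0.9, 0.1))                # both fixes: non-zero correction
```

The negative result means less push-side output, which in this toy model corresponds to reducing forward drive.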

Bug 3: Weighted blending killed corrections. The CPG-SNN blend was cpg × weight + correction × (1 - weight). With CPG at 90%, corrections were multiplied by 0.1, a 10× attenuation. The fix: additive blending, cpg × weight + correction. The cerebellum modulates the CPG via the reticulospinal tract; it doesn't compete with it.
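Here is the arithmetic behind Bug 3, written out with hypothetical helper names and a hypothetical CPG command of 0.5. With the CPG weight at 0.9, the weighted form scales any correction by 1 − 0.9 = 0.1. The additive form passes it through unchanged.

```python
def weighted_blend(cpg_cmd, correction, cpg_weight):
    """Old form: the correction competes with the CPG for the same share."""
    return cpg_cmd * cpg_weight + correction * (1.0 - cpg_weight)


def additive_blend(cpg_cmd, correction, cpg_weight):
    """Fixed form: the correction rides on top of the weighted CPG command."""
    return cpg_cmd * cpg_weight + correction


cpg_cmd, correction = 0.5, 0.012   # 0.012 is the top of Run 8's correction range
old = weighted_blend(cpg_cmd, correction, 0.9) - weighted_blend(cpg_cmd, 0.0, 0.9)
new = additive_blend(cpg_cmd, correction, 0.9) - additive_blend(cpg_cmd, 0.0, 0.9)
print(f"correction reaching the motor: old {old:.4f}, new {new:.4f}")
```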

The Breakthrough: Run 8

After fixing all three bugs, Run 8 showed something we’d never seen before:

  • Corrections alive: 0.001–0.012 (was 0.0000 in all previous runs)
  • 6 wall collisions in 20,000 steps (episodic learning working)
  • Actor competence: 0.000 → 0.299 (first non-zero ever)
  • CPG weight: 90% → 55% (first handoff ever)

The robot walks toward the wall. The ultrasonic sensor (Channel 18) fires. DA drops from 0.22 to 0.05. The obstacle climbing fiber activates asymmetrically. The cerebellum learns. After each collision, the robot is reset to the start — like a puppy’s owner picking it up after it bumps into furniture.

The Architecture

The obstacle avoidance system adds three new components to MH-FLOCKE’s biological stack:

Ultrasonic Sensor (Channel 18): Simulated HC-SR04 rangefinder in MuJoCo, real HC-SR04 on the Raspberry Pi. Same encoding in both: nonlinear proximity mapping (√ function for urgency). The sensor channel is identical between simulation and hardware, enabling direct brain transfer.

Obstacle Climbing Fiber: Three zones with asymmetric error signals — COLLISION (<10cm, strong CF on push PkC), DANGER (<30cm, graded asymmetric CF), WARNING (<80cm, hip yaw CF for turning).

Trigeminal Brake: A hardwired reflex that reduces CPG amplitude near obstacles. Without this, the CPG at 90% overpowers any cerebellar correction. The brake creates space for learning.
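The three components can be sketched together. The zone boundaries (10, 30 and 80 cm) come from the text. The text names a √ shape but not the formula, so three things here are assumptions:

- the exact √ mapping;
- treating 80 cm as the sensor's working range;
- the brake strength.

```python
import math

WORKING_RANGE_CM = 80.0   # assumption: beyond the WARNING zone the channel reads 0


def proximity_urgency(dist_cm, max_cm=WORKING_RANGE_CM):
    """Encoding sketch: 0 at max range, 1 at contact, sqrt-shaped in between."""
    d = min(max(dist_cm, 0.0), max_cm)
    return math.sqrt(1.0 - d / max_cm)


def obstacle_zone(dist_cm):
    """Zone that selects which climbing-fiber pattern fires."""
    if dist_cm < 10.0:
        return "COLLISION"   # strong CF on push PkC
    if dist_cm < 30.0:
        return "DANGER"      # graded asymmetric CF
    if dist_cm < 80.0:
        return "WARNING"     # hip-yaw CF for turning
    return None


def trigeminal_brake(dist_cm, max_brake=0.5):
    """Hardwired reflex: scale CPG amplitude down as urgency rises (max_brake is hypothetical)."""
    return 1.0 - max_brake * proximity_urgency(dist_cm)


for d in (100, 60, 25, 5):
    print(d, obstacle_zone(d), round(proximity_urgency(d), 3), round(trigeminal_brake(d), 3))
```

With the √ shape, a wall 60 cm away already produces an urgency of 0.5. The channel therefore reacts clearly as soon as an obstacle enters range.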

What This Means

This is the first time in MH-FLOCKE’s history that the SNN has produced non-zero motor corrections that actually changed the robot’s behavior. The CPG handoff — from 90% to 55% — means the spiking neural network is taking over motor control from the innate rhythm generator.

The graded DCN fix alone improves every training run, not just obstacle scenes. Any user cloning the repo now gets a cerebellum that actually works.

Try It

git clone https://github.com/MarcHesse/mhflocke
cd mhflocke

# Obstacle avoidance (Freenove, 20k steps)
python scripts/train_v032.py --creature-name freenove \
  --scene "walk toward wall" --steps 20000 \
  --no-terrain --no-sensory --no-vision --hardware-sensors \
  --auto-reset 500 --fresh

# Normal walking (Go2, flat)
python scripts/train_v032.py --creature-name go2 \
  --scene "walk on flat meadow" --steps 50000 --no-terrain

What’s Next

The 50k run with episodic wall training, the same experiment on the Go2 (4,500 neurons), and eventually on the real Freenove hardware with the HC-SR04 sensor. The wall is just the beginning — the architecture now supports any binary sensory signal → cerebellar correction loop.

Named after Flocke — my dog who never needed 9 bug fixes to avoid a wall.

Why We Replaced Reward Shaping with Free Energy

Architecture Deep Dive · March 22, 2026 · MH-FLOCKE Level 15 v0.4.x


Every reinforcement learning tutorial starts the same way: define a reward function. Want the robot to walk? Reward forward velocity. Want it to reach a target? Reward proximity. Want it to stay upright? Penalize falling.

It works. PPO, SAC, and TD3 can solve locomotion tasks in hours. But there’s a problem that becomes obvious the moment you try to build something that actually behaves like an animal: reward functions are lies we tell the optimizer.

MH-FLOCKE doesn’t use reward shaping. It uses Free Energy — a framework from computational neuroscience that turns prediction errors into action. Here’s why that matters, and what it took to make it work.

The Problem with Rewards

A reward function encodes what the designer wants, not what the agent understands. When you write reward = forward_velocity * 0.5 - torque_penalty * 0.01 + alive_bonus * 1.0, you’re injecting your knowledge of physics, biomechanics, and task structure into a scalar signal. The agent never learns why moving forward is good. It learns that a particular number goes up when certain joint angles coincide with certain body velocities.

This creates three specific problems:

Reward hacking. The agent finds ways to maximize the number that have nothing to do with the intended behavior. A walking robot that discovers it can get alive_bonus by vibrating in place. A ball-chasing agent that orbits the ball at exactly the distance where reward is maximized without ever touching it.

Brittle transfer. Change the terrain, the body, or the task even slightly, and the carefully tuned reward weights collapse. A reward function tuned for flat ground produces bizarre gaits on slopes because the relative importance of balance vs. speed shifts — but the weights don’t.

No intrinsic motivation. Turn off the reward, and the agent stops. It has no reason to explore, no curiosity, no drive. In biological systems, animals explore even without external reward because the nervous system is fundamentally organized around reducing prediction error — not maximizing an external signal.

Free Energy: Prediction Error as the Universal Currency

The Free Energy Principle, formulated by Karl Friston, proposes that biological systems minimize the difference between what they predict and what they observe. This isn’t a reward — it’s an error signal. The organism builds a generative model of its world and acts to make that model’s predictions come true.

In MH-FLOCKE, this translates to a concrete mechanism. The system maintains predictions about its sensory states — joint angles, body orientation, distance to objects. When reality deviates from prediction, that deviation becomes the prediction error (PE). The system then has two options: update its model (perception) or act to change the world (action).

The key insight: you don’t need to tell the system what’s good. You need to tell it what to expect. If the system expects to be near the ball, being far from the ball creates prediction error. The system will act to reduce that error — not because it’s been rewarded for approaching, but because the discrepancy between expectation and reality is aversive at a fundamental computational level.

Implementation: Task-Specific Prediction Error

The abstract principle needed concrete engineering. Here’s how Free Energy works in MH-FLOCKE’s code.

The brain computes a Task-Specific Prediction Error (TPE) every simulation tick:

TPE = (ball_distance - expected_distance) / normalization_factor

This TPE feeds into three systems simultaneously:

1. The SNN learning rule. R-STDP modulates synaptic plasticity based on a combination of reward and prediction error: modulation = 0.1 × reward + 0.9 × (−PE). When the dog approaches the ball, PE decreases, the negative of that decrease is positive, and synapses that contributed to the approach get strengthened. The 90/10 split means prediction error dominates — the system learns primarily from its own internal error signal, not from external reward.

2. The Vision Boost. When TPE exceeds a threshold, the last 16 input neurons — carrying environmental sensory information — get amplified proportional to the error magnitude. This is biological attention: unexpected stimuli become more salient. The dog literally pays more attention to the ball when its predictions about ball distance are wrong.

3. Neuromodulation. TPE drives dopamine release in the simulated neuromodulatory system. High positive PE (far from expected position) triggers exploration via norepinephrine. Decreasing PE triggers dopamine, reinforcing the current behavioral strategy. This creates a natural explore-exploit balance without epsilon-greedy or entropy regularization.
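Item 1 describes a three-factor learning rule. An eligibility trace tags synapses whose pre- and postsynaptic neurons were recently active together. The modulation signal 0.1 × reward + 0.9 × (−PE) then decides whether those tagged synapses strengthen or weaken.

The sketch below simplifies STDP's timing window to coincident spikes. Its trace decay, learning rate and weight bounds are hypothetical.

```python
import numpy as np


def modulation(reward, pe, w_reward=0.1, w_pe=0.9):
    """Third factor: mostly negative prediction error, a little external reward."""
    return w_reward * reward + w_pe * (-pe)


def update_eligibility(trace, pre_spikes, post_spikes, decay=0.95):
    """Decay old tags, then tag synapses whose pre and post neurons both fired."""
    return decay * trace + np.outer(post_spikes, pre_spikes)


def rstdp_step(weights, trace, reward, pe, lr=0.01, w_max=1.0):
    """Apply the modulated update only where the eligibility trace is non-zero."""
    return np.clip(weights + lr * modulation(reward, pe) * trace, 0.0, w_max)


w = np.full((2, 3), 0.5)                     # 3 presynaptic -> 2 postsynaptic neurons
trace = update_eligibility(np.zeros((2, 3)),
                           pre_spikes=np.array([1, 0, 1]),
                           post_spikes=np.array([1, 0]))
w_closer = rstdp_step(w, trace, reward=0.0, pe=-0.5)   # PE below zero: tagged synapses grow
w_farther = rstdp_step(w, trace, reward=0.0, pe=0.5)   # PE above zero: tagged synapses shrink
print(w_closer[0], w_farther[0])
```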

What We Lost (and What We Gained)

Free Energy is not free. Compared to PPO with a well-tuned reward function, here’s what changed:

Lost: Speed of convergence. PPO can solve ball-approach in 50k steps with a dense reward. MH-FLOCKE needs 100k steps with the curriculum. The prediction error gradient is weaker than a hand-designed reward — the signal-to-noise ratio is lower because the system has to discover the relevance of its own error signals.

Lost: Simplicity. A reward function is 5 lines of code. The Free Energy implementation spans the SNN controller, the vision boost module, the neuromodulatory system, and the R-STDP learning rule. It’s distributed across the architecture, not centralized in one function.

Gained: Robustness. The 10-seed ablation study showed that MH-FLOCKE’s variance across seeds is dramatically lower than PPO. When it works, it works consistently — because the learning signal comes from internal prediction dynamics, not from the accident of which random seed produces a favorable initial exploration trajectory.

Gained: Emergent behavior. The dog developed behavioral sequences — sniff → walk → trot → chase → alert — that were never programmed and never rewarded. They emerged because the prediction error landscape naturally creates behavioral attractors. When the ball is far, prediction error is high, driving fast locomotion. When close, PE drops, and the gait naturally slows. The transitions aren’t state-machine logic — they’re the dynamics of a system minimizing its own surprise.

Gained: Transfer potential. The same Free Energy architecture that drives ball approach also drives obstacle avoidance, terrain adaptation, and righting after falls. Change the prediction (expect flat ground → encounter a slope), and the system adapts — not because we wrote a slope-reward, but because the prediction error automatically captures the relevant discrepancy.

The Honest Result

Our ablation study produced one genuinely negative finding: motivational drives (hunger, curiosity, social) don’t significantly improve locomotion quality. Configuration B (SNN + Cerebellum, no drives) performs identically to Configuration C (SNN + Cerebellum + drives). The drives affect navigation — which direction the dog goes — but not how well it walks.

This is a real limitation. Free Energy as implemented in MH-FLOCKE is primarily a navigation framework, not a locomotion framework. The actual walking comes from CPGs and the cerebellar forward model. Free Energy tells the dog where to go, not how to move its legs.

In biological systems, these aren’t separate — the motivation to move and the mechanics of movement are deeply intertwined through spinal-cortical loops. MH-FLOCKE’s current architecture treats them as modular, which is both its engineering strength and its biological weakness.

What’s Next

The next step is closing the loop: letting Free Energy modulate not just navigation but gait selection. When prediction error is high (ball is far, terrain is rough), the system should shift to a more cautious gait. When PE is low (ball is close, ground is flat), it should accelerate. The CPG already supports multiple gaits — the missing piece is using prediction error to select between them.

But the core insight stands: you don’t need to tell a system what’s good. You need to give it the ability to predict, and the drive to minimize the gap between prediction and reality. Everything else — approach, avoidance, exploration, caution — emerges from the dynamics of a system that hates being surprised.


MH-FLOCKE is an independent research project by Marc Hesse in Potsdam, Germany. Read the full technical details in our research paper or watch the latest results on YouTube.

Ball Contact — What 4 Changes Made It Work

Dev Log #1 · March 22, 2026 · MH-FLOCKE Level 15 v0.4.x


For weeks, the dog walked beautifully but ignored the ball completely. It would stroll past it, around it, occasionally bump into it by accident — but never pursue it. The spiking neural network was learning to walk. It just had no reason to care about a red sphere sitting on the grass.

Then, in a single 100k-step training run, everything changed. The Go2 quadruped turned toward the ball, approached it deliberately, and made contact — 294 frames of sustained ball interaction, with a minimum distance of 0.8 centimeters.

No reward shaping. No hardcoded “go to ball” command. Four architectural changes made a biologically grounded system do something that PPO with dense rewards still struggles with.

Here’s what happened.

The Problem: Walking Without Purpose

MH-FLOCKE’s brain runs a 15-step cognitive cycle every simulation tick. Spiking neurons fire. The cerebellum predicts motor outcomes. Central pattern generators produce rhythmic gaits. Neuromodulators shift between exploration and exploitation.

But all of this was happening in a closed loop. The SNN received sensory input that included ball distance and angle — the information was there. The network just had no gradient to follow. Ball distance was one of 80+ input dimensions, buried in proprioceptive noise. The R-STDP learning rule couldn’t distinguish “getting closer to the ball” from random fluctuation.

The system needed a way to feel that the ball matters.

Change 1: Task-Specific Prediction Error

Instead of using a generic reward signal, I introduced a task-specific prediction error (TPE) that directly encodes “how far am I from where I should be”:

TPE = (ball_dist - 3.0) / 3.0

When the dog is 3 meters from the ball, TPE is 0 — neutral. Closer than 3 meters, TPE goes negative — the world is better than expected. Further away, TPE grows positive — something is wrong.

This is not a reward. It’s a prediction error in the Free Energy sense: the system expects to be near the ball (because that’s where interesting things happen), and any deviation from that expectation creates a signal to act.

The critical difference from reward shaping: TPE doesn’t tell the dog what to do. It tells the dog how surprised it should be.

Change 2: Vision Boost

The TPE signal alone wasn’t enough. The SNN has 80+ input neurons, and the ball-related inputs (distance, angle) were getting drowned out by proprioceptive signals — joint angles, velocities, IMU readings. The network couldn’t hear the ball over the noise of its own body.

The fix: when TPE exceeds a threshold (0.05), the last 16 input neurons — the ones carrying sensory/environmental information — get amplified by TPE × 0.5. Higher prediction error means louder sensory input.
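Changes 1 and 2 together fit in a few lines. These values come from the text:

- the 3.0 m reference and normalization;
- the 0.05 threshold;
- the 0.5 factor;
- the last 16 inputs.

Two things are assumptions. The sketch reads "amplified by TPE × 0.5" as a multiplicative factor of 1 + 0.5 × TPE, and the 80-input vector in the example is illustrative.

```python
import numpy as np


def task_prediction_error(ball_dist, expected=3.0, norm=3.0):
    """TPE = (ball_dist - 3.0) / 3.0: zero at 3 m, negative closer, positive farther."""
    return (ball_dist - expected) / norm


def vision_boost(inputs, tpe, threshold=0.05, gain=0.5, n_sensory=16):
    """Amplify the last n_sensory inputs when prediction error is high (factor is an assumption)."""
    out = np.asarray(inputs, dtype=float).copy()
    if tpe > threshold:
        out[-n_sensory:] *= 1.0 + gain * tpe
    return out


inputs = np.ones(80)
tpe = task_prediction_error(6.0)        # ball 6 m away -> TPE = 1.0
boosted = vision_boost(inputs, tpe)
print(tpe, boosted[:2], boosted[-2:])
```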

This mirrors how biological attention works: when something is unexpected, sensory cortex activity increases. The salience of the stimulus goes up proportional to how wrong your predictions are.

The effect was immediate. The SNN started responding to ball distance changes within the first 10k steps.

Change 3: R-STDP Sign Fix

This was the most embarrassing bug. The R-STDP learning rule combines reward and prediction error:

combined = 0.1 × reward + 0.9 × (−PE)

The minus sign on PE is critical. When the dog approaches the ball, PE decreases (less surprise), so −PE increases. Approaching therefore pushes the combined signal up, which reinforces the synapses that were active during that movement.

The original code had the sign flipped. Approaching the ball was punishing the very synapses that caused the approach. The SNN was literally learning to avoid the ball.

One minus sign. Weeks of debugging.

Change 4: Ball Curriculum

Even with correct gradients, dropping a ball 3 meters away at a random angle is too hard for a system that just learned to walk. The solution: a 5-stage curriculum.

Stage 1 starts the ball at 1.5 meters, directly ahead (0° angle). The dog barely has to turn — just walk forward. When ball_dist_min drops below 0.5 meters, the curriculum advances.

Each stage increases distance and angle: (1.5m, 0°) → (2.0m, 17°) → (2.5m, 23°) → (2.7m, 28°) → (3.0m, 34°).
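The curriculum logic can be sketched as a list of stages plus an advance rule. The stage values and the 0.5 m advance criterion come from the text. Three things are assumptions:

- the `BallCurriculum` class;
- resetting the running minimum at each new stage;
- placing the ball by polar coordinates around the robot's start.

```python
import math

STAGES = [(1.5, 0.0), (2.0, 17.0), (2.5, 23.0), (2.7, 28.0), (3.0, 34.0)]  # (m, degrees)


class BallCurriculum:
    def __init__(self, advance_below=0.5):
        self.stage = 0
        self.advance_below = advance_below
        self.ball_dist_min = float("inf")

    def observe(self, ball_dist):
        """Track the closest approach; advance once it drops below the threshold."""
        self.ball_dist_min = min(self.ball_dist_min, ball_dist)
        if self.ball_dist_min < self.advance_below and self.stage < len(STAGES) - 1:
            self.stage += 1
            self.ball_dist_min = float("inf")  # assumption: each stage starts fresh
        return self.stage

    def ball_position(self):
        """Ball placement for the current stage, relative to the robot's start."""
        dist, angle_deg = STAGES[self.stage]
        a = math.radians(angle_deg)
        return dist * math.cos(a), dist * math.sin(a)


c = BallCurriculum()
print(c.stage, c.ball_position())
c.observe(0.4)                      # close approach: advance to stage 2 of 5
print(c.stage, c.ball_position())
```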

In the 100k-step run, the dog advanced through two stages. It mastered straight-ahead approach, then learned to turn slightly before approaching. The curriculum let the SNN build on what it already knew.

Results

The numbers from the run:

  • 0.8 cm minimum ball distance — the dog essentially touched it
  • 294 contact frames — sustained interaction, not a single bump
  • 0 falls in 100k steps — stable locomotion throughout
  • 47 ball contact episodes across 5 curriculum stages
  • CPG at 40% — the dog was trotting, not sprinting

The 10-seed ablation study confirmed this wasn’t a fluke. Configuration B (SNN + Cerebellum) outperforms the PPO baseline by 3.5× on ball approach metrics, with significantly lower variance.

What This Means

This is not a robot dog playing fetch. It’s a proof of concept for something deeper: a biologically grounded system that develops goal-directed behavior through prediction error minimization, not through reward engineering.

The dog doesn’t get a treat for touching the ball. It touches the ball because touching the ball reduces prediction error. The ball is interesting because the system expects it to be interesting — and the Free Energy framework turns that expectation into action.

Four changes. One minus sign. A robot dog that learned to care about a ball.


MH-FLOCKE is an independent research project by Marc Hesse in Potsdam, Germany. The system runs on a Unitree Go2 quadruped in MuJoCo simulation, using spiking neural networks, a cerebellar forward model, and central pattern generators.

Watch the full run: YouTube Video #3 · Read the paper: aiXiv