The Dog Follows the Light — Phototaxis with Spiking Neurons
A 560-Neuron Spiking Network Steers a Quadruped Toward Light
For the first time, MH-FLOCKE’s robot dog actively navigates toward a light source in simulation — driven by hardwired reflexes and learned neural adaptations. No external reward signal. No reinforcement learning. Just body signals.
What You See in the Video
A Freenove Robot Dog (€100, Raspberry Pi, 12 servos) walks across a flat surface in MuJoCo simulation. A light source (yellow dot on the mini-map) is placed ahead and to the side. The dog detects the light gradient and steers toward it using a VOR (vestibulo-ocular reflex): a hardwired brainstem circuit that turns the body toward a visual target.
The mini-map in the bottom-left corner shows the robot’s trail (green line) and the light waypoint (yellow glow). You can see the dog arcing toward the light rather than walking straight. This arc is not a software limitation — it reflects the physical turning radius of the Freenove’s CPG gait, which produces roughly 12% steering asymmetry between left and right legs.
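To make the steering signal concrete, here is a minimal Python sketch of a bearing-based steering command. It is not MH-FLOCKE's actual VOR code: the function name, the gain parameter, the sign convention (positive means turn left), and the normalisation to [-1, 1] are all assumptions for illustration.

```python
import math


def steering_toward_light(robot_xy, robot_yaw, light_xy, gain=1.0):
    """Return a steering command in [-1, 1]; positive means turn left.

    Hypothetical helper: robot_yaw is the heading in radians, measured
    counter-clockwise from the world x-axis.
    """
    dx = light_xy[0] - robot_xy[0]
    dy = light_xy[1] - robot_xy[1]
    bearing = math.atan2(dy, dx)  # world-frame direction to the light

    # Wrap the heading error into [-pi, pi] so the robot turns the short way.
    diff = bearing - robot_yaw
    error = math.atan2(math.sin(diff), math.cos(diff))

    command = gain * error / math.pi  # scale to [-1, 1] at gain 1
    return max(-1.0, min(1.0, command))
```

A light directly ahead gives a command of zero, and a light 90° to the left gives 0.5 at the default gain. How quickly the body actually follows that command is a separate question, set by the gait rather than by the reflex.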
Hardwired vs. Learned: The Biological Design
MH-FLOCKE follows the same principle as biological motor development. A newborn puppy doesn’t learn to walk from scratch — its spinal cord has CPG circuits that produce rhythmic leg movements from birth. The cerebellum calibrates these movements through experience. The brainstem provides reflexes like the VOR. Learning refines what reflexes provide.
Hardwired components (present from “birth”):
CPG (Central Pattern Generator): Mathematical oscillator producing rhythmic gait. The SNN does not generate the gait pattern; the CPG provides the baseline.
VOR (Vestibulo-Ocular Reflex): Reflexive steering toward the light target. Hardwired, like in a real animal’s superior colliculus.
Run-and-Tumble: A bacterial-inspired navigation state machine (Berg & Brown, 1972). Alternates between running straight and turning (tumbling) when the gradient changes.
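The Run-and-Tumble idea can be sketched as a two-state machine. This is an illustration under stated assumptions, not the project's implementation: the class name, the rule "tumble when light intensity drops compared with the previous step", and the fixed tumble duration are all hypothetical.

```python
class RunAndTumble:
    """Toy run-and-tumble navigator driven by a scalar light intensity."""

    def __init__(self, tumble_steps=20):
        self.tumble_steps = tumble_steps  # assumed fixed tumble duration
        self.state = "run"
        self.remaining = 0
        self.last_intensity = None
        self.tumble_count = 0

    def step(self, intensity):
        """Feed one intensity reading and return the state for this step."""
        getting_worse = (
            self.last_intensity is not None and intensity < self.last_intensity
        )
        self.last_intensity = intensity

        if self.state == "tumble":
            # Count down the tumble, then resume running.
            self.remaining -= 1
            if self.remaining <= 0:
                self.state = "run"
        elif getting_worse:
            # The gradient turned against us: start a tumble.
            self.state = "tumble"
            self.remaining = self.tumble_steps
            self.tumble_count += 1
        return self.state
```

In a real controller the "tumble" state would map to a turn command and "run" to straight walking; the sketch only tracks the state transitions.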
Learned through training (emerges from experience):
SNN weights (R-STDP): Reward-modulated spike-timing-dependent plasticity adapts the synaptic weights of the 560-neuron network based on intrinsic reward (vestibular comfort, prediction error, curiosity).
Cerebellar correction (Marr-Albus-Ito): The cerebellum learns forward-model corrections. Correction magnitude grows from 0.0006 to 0.034 over training, the strongest cerebellar signal ever measured in MH-FLOCKE.
CPG-to-SNN handoff: The CPG starts at 90% control and fades to ~45% as the SNN proves it can maintain stable locomotion. The SNN earns control through competence, not through a timer.
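One way to picture a competence-gated handoff is a blend weight that only moves when a stability check passes. The sketch below is a guess at the general shape, not MH-FLOCKE's code: the function names, the step size, and the boolean stability flag are hypothetical, and only the 90% and 45% bounds come from the text above.

```python
def update_cpg_weight(weight, stable, step=0.001, floor=0.45, ceiling=0.90):
    """Shift control toward the SNN while locomotion is stable, back otherwise."""
    weight += -step if stable else step
    return max(floor, min(ceiling, weight))


def blend(cpg_cmd, snn_cmd, weight):
    """Mix per-joint commands: weight goes to the CPG, the rest to the SNN."""
    return [weight * c + (1.0 - weight) * s for c, s in zip(cpg_cmd, snn_cmd)]
```

Because the weight changes only in response to the stability flag, an unstable SNN never gains control no matter how many steps have elapsed, which is the difference from a timer-based schedule.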
The Numbers
33,000 steps, 9.7 minutes training time (57 sps on CPU)
0 falls, perfect upright streak
2 light targets reached (sf:2) through active VOR-guided steering
VOR signal up to +0.54 — strong, sustained steering toward the light
4 Run-and-Tumble events — the navigation state machine triggered naturally
Cerebellar correction: 0.008 — real Marr-Albus-Ito learning
Why the Dog Arcs Around the Light
You’ll notice the dog doesn’t walk straight to the light — it takes a wide arc. This is not a bug. The Freenove’s CPG produces approximately 12% amplitude asymmetry between left and right legs when steering. This gives the robot a turning radius of roughly 5 meters. The VOR reflex fires correctly, and the CPG responds — but the body can only turn as fast as the legs allow.
This is exactly what happens with real quadrupeds. A horse can’t make the same tight turns as a cat. The steering intention is there; the biomechanics set the limit.
Performance Breakthrough: 6× Speedup
This session also resolved a critical performance bug. Step-time was growing from 20ms to 800ms over 100k steps — making long training runs impossible. The root cause: an O(N²) clustering operation in the Synaptogenesis module that processed 5,000 accumulated experience patterns without clearing the buffer.
The fix (buffer.clear() after consolidation + max_size reduction) brought step-time back to a stable 18ms across 100k steps. Training speed went from 7 sps to 54 sps — a 6× improvement that makes all future development viable.
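The shape of that fix can be sketched in a few lines of Python. This is an illustration, not the Synaptogenesis module itself: the class, the max_size value, and the stand-in clustering function are hypothetical.

```python
from collections import deque


class ExperienceBuffer:
    """Bounded pattern buffer that is emptied after each consolidation."""

    def __init__(self, max_size=100):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.patterns = deque(maxlen=max_size)

    def add(self, pattern):
        self.patterns.append(pattern)

    def consolidate(self, cluster_fn):
        """Run the clustering pass, then clear so the next pass starts small."""
        result = cluster_fn(list(self.patterns))
        self.patterns.clear()
        return result


def pairwise_comparisons(patterns):
    """Stand-in for an O(N^2) clustering pass: counts the pairs it would compare."""
    n = len(patterns)
    return n * (n - 1) // 2
```

The quadratic cost is why the buffer size matters so much: 5,000 patterns imply about 12.5 million pairwise comparisons per pass, while 100 patterns imply 4,950.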
What’s Next
Hardware phototaxis — The same VOR steering with a real camera (cv2) on the Freenove, following a flashlight on the floor.
Autonomous loop — Instead of pre-placed waypoints, the dog chooses its own targets based on curiosity, exploration drive, and episodic memory. All the modules exist; they need a conductor.
Paper 2 — Sim-to-real transfer + phototaxis results for Frontiers in Neurorobotics or CoRL workshop.
MH-FLOCKE is an open-source project by Marc Hesse — independent researcher, Potsdam, Germany. Named after Flocke, my late dog.
Dev Log #1 · March 22, 2026 · MH-FLOCKE Level 15 v0.4.x
For weeks, the dog walked beautifully but ignored the ball completely. It would stroll past it, around it, occasionally bump into it by accident — but never pursue it. The spiking neural network was learning to walk. It just had no reason to care about a red sphere sitting on the grass.
Then, in a single 100k-step training run, everything changed. The Go2 quadruped turned toward the ball, approached it deliberately, and made contact — 294 frames of sustained ball interaction, with a minimum distance of 0.8 centimeters.
No reward shaping. No hardcoded “go to ball” command. Four architectural changes made a biologically grounded system do something that PPO with dense rewards still struggles with.
Here’s what happened.
The Problem: Walking Without Purpose
MH-FLOCKE’s brain runs a 15-step cognitive cycle every simulation tick. Spiking neurons fire. The cerebellum predicts motor outcomes. Central pattern generators produce rhythmic gaits. Neuromodulators shift between exploration and exploitation.
But all of this was happening in a closed loop. The SNN received sensory input that included ball distance and angle — the information was there. The network just had no gradient to follow. Ball distance was one of 80+ input dimensions, buried in proprioceptive noise. The R-STDP learning rule couldn’t distinguish “getting closer to the ball” from random fluctuation.
The system needed a way to feel that the ball matters.
Change 1: Task-Specific Prediction Error
Instead of using a generic reward signal, I introduced a task-specific prediction error (TPE) that directly encodes “how far am I from where I should be”:
TPE = (ball_dist - 3.0) / 3.0
When the dog is 3 meters from the ball, TPE is 0 — neutral. Closer than 3 meters, TPE goes negative — the world is better than expected. Further away, TPE grows positive — something is wrong.
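Written as a Python function, the formula looks like this. The function and parameter names are mine; the 3.0-meter reference distance is the one from the formula above.

```python
def task_prediction_error(ball_dist, expected_dist=3.0):
    """TPE = (ball_dist - expected_dist) / expected_dist.

    Zero at the reference distance, negative when closer, positive when farther.
    """
    return (ball_dist - expected_dist) / expected_dist
```

For example, a ball at 1.5 meters gives a TPE of -0.5, and a ball at 6 meters gives +1.0.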
This is not a reward. It’s a prediction error in the Free Energy sense: the system expects to be near the ball (because that’s where interesting things happen), and any deviation from that expectation creates a signal to act.
The critical difference from reward shaping: TPE doesn’t tell the dog what to do. It tells the dog how surprised it should be.
Change 2: Vision Boost
The TPE signal alone wasn’t enough. The SNN has 80+ input neurons, and the ball-related inputs (distance, angle) were getting drowned out by proprioceptive signals — joint angles, velocities, IMU readings. The network couldn’t hear the ball over the noise of its own body.
The fix: when TPE exceeds a threshold (0.05), the last 16 input neurons — the ones carrying sensory/environmental information — get amplified by TPE × 0.5. Higher prediction error means louder sensory input.
This mirrors how biological attention works: when something is unexpected, sensory cortex activity increases. The salience of the stimulus goes up proportional to how wrong your predictions are.
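A rough Python sketch of the gating described above follows. One detail is an assumption: "amplified by TPE × 0.5" is read here as a multiplicative gain of 1 + 0.5 × TPE, so the sensory inputs are never scaled below their original value. The function and parameter names are hypothetical.

```python
def boost_vision(inputs, tpe, threshold=0.05, n_sensory=16, scale=0.5):
    """Amplify the last n_sensory inputs when TPE exceeds the threshold.

    Assumption: gain = 1 + scale * tpe. Inputs before the sensory tail
    (proprioception in the post's layout) are passed through unchanged.
    """
    if tpe <= threshold:
        return list(inputs)
    gain = 1.0 + scale * tpe
    head = list(inputs[:-n_sensory])
    tail = [x * gain for x in inputs[-n_sensory:]]
    return head + tail
```

With 80 inputs and a TPE of 0.4, the first 64 values are untouched and the last 16 are scaled by 1.2 under this reading.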
The effect was immediate. The SNN started responding to ball distance changes within the first 10k steps.
Change 3: R-STDP Sign Fix
This was the most embarrassing bug. The R-STDP learning rule combines reward and prediction error:
combined = 0.1 × reward + 0.9 × (−PE)
The minus sign on PE is critical. When the dog approaches the ball, PE decreases (less surprise). As PE falls, −PE rises, so approaching the ball raises the combined signal and reinforces the synapses that were active during that movement.
The original code had the sign flipped. Approaching the ball was punishing the very synapses that caused the approach. The SNN was literally learning to avoid the ball.
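The effect of the sign is easy to check numerically. The sketch below uses the 0.1 and 0.9 weights from the formula above; the weight update with an eligibility trace is the usual textbook form of R-STDP and is an assumption here, as are the function names and the learning rate.

```python
def combined_signal(reward, pe, w_reward=0.1, w_pe=0.9):
    """combined = 0.1 * reward + 0.9 * (-PE), with the corrected sign."""
    return w_reward * reward + w_pe * (-pe)


def rstdp_update(weight, eligibility, reward, pe, lr=0.01):
    """Assumed update rule: dw = lr * combined * eligibility trace."""
    return weight + lr * combined_signal(reward, pe) * eligibility
```

With zero reward and a PE of -0.5 (the dog is closer than expected), the combined signal is +0.45, so recently active synapses are strengthened. With the sign flipped, the same approach would have produced -0.45 and weakened them.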
One minus sign. Weeks of debugging.
Change 4: Ball Curriculum
Even with correct gradients, dropping a ball 3 meters away at a random angle is too hard for a system that just learned to walk. The solution: a 5-stage curriculum.
Stage 1 starts the ball at 1.5 meters, directly ahead (0° angle). The dog barely has to turn — just walk forward. When ball_dist_min drops below 0.5 meters, the curriculum advances.
Each stage increases distance and angle: (1.5m, 0°) → (2.0m, 17°) → (2.5m, 23°) → (2.7m, 28°) → (3.0m, 34°).
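The stage table and the advancement rule can be sketched as a small Python class. The stage values and the 0.5-meter threshold are taken from the text; applying that same threshold to every stage, the class and method names, and the robot-frame convention (x forward, y to the left) are assumptions.

```python
import math

# (distance in meters, angle in degrees) for each stage, as listed above.
STAGES = [(1.5, 0.0), (2.0, 17.0), (2.5, 23.0), (2.7, 28.0), (3.0, 34.0)]


class BallCurriculum:
    """Advance to the next stage once the ball has been approached closely."""

    def __init__(self, stages=STAGES, advance_below=0.5):
        self.stages = list(stages)
        self.advance_below = advance_below
        self.index = 0

    def placement(self):
        """Ball position (x, y) in the robot's starting frame for this stage."""
        dist, angle_deg = self.stages[self.index]
        angle = math.radians(angle_deg)
        return dist * math.cos(angle), dist * math.sin(angle)

    def report(self, ball_dist_min):
        """Record an episode's closest approach and return the current stage index."""
        if ball_dist_min < self.advance_below and self.index < len(self.stages) - 1:
            self.index += 1
        return self.index
```

Calling report(0.8) leaves the curriculum on the first stage, while report(0.4) moves it to the (2.0 m, 17°) stage.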
In the 100k-step run, the dog advanced through two stages. It mastered straight-ahead approach, then learned to turn slightly before approaching. The curriculum let the SNN build on what it already knew.
Results
The numbers from the run:
0.8 cm minimum ball distance — the dog essentially touched it
294 contact frames — sustained interaction, not a single bump
0 falls in 100k steps — stable locomotion throughout
47 ball contact episodes across 5 curriculum stages
CPG at 40% — the dog was trotting, not sprinting
The 10-seed ablation study confirmed this wasn’t a fluke. Configuration B (SNN + Cerebellum) outperforms the PPO baseline by 3.5× on ball approach metrics, with significantly lower variance.
What This Means
This is not a robot dog playing fetch. It’s a proof of concept for something deeper: a biologically grounded system that develops goal-directed behavior through prediction error minimization, not through reward engineering.
The dog doesn’t get a treat for touching the ball. It touches the ball because touching the ball reduces prediction error. The ball is interesting because the system expects it to be interesting — and the Free Energy framework turns that expectation into action.
Four changes. One minus sign. A robot dog that learned to care about a ball.
MH-FLOCKE is an independent research project by Marc Hesse in Potsdam, Germany. The system runs on a Unitree Go2 quadruped in MuJoCo simulation, using spiking neural networks, a cerebellar forward model, and central pattern generators.