MH-FLOCKE

The Dog Follows the Light — Phototaxis with Spiking Neurons

A 560-Neuron Spiking Network Steers a Quadruped Toward Light

For the first time, MH-FLOCKE’s robot dog actively navigates toward a light source in simulation — driven by hardwired reflexes and learned neural adaptations. No external reward signal. No reinforcement learning. Just body signals.

What You See in the Video

A Freenove Robot Dog (€100, Raspberry Pi, 12 servos) walks across a flat surface in a MuJoCo simulation. A light source (yellow dot on the mini-map) is placed ahead and to the side. The dog detects the light gradient and steers toward it using its VOR (Vestibulo-Ocular Reflex) circuit, a hardwired brainstem-style reflex that turns the body toward a visual target.

The mini-map in the bottom-left corner shows the robot’s trail (green line) and the light waypoint (yellow glow). You can see the dog arcing toward the light rather than walking straight. This arc is not a software limitation — it reflects the physical turning radius of the Freenove’s CPG gait, which produces roughly 12% steering asymmetry between left and right legs.

Hardwired vs. Learned: The Biological Design

MH-FLOCKE follows the same principle as biological motor development. A newborn puppy doesn’t learn to walk from scratch — its spinal cord has CPG circuits that produce rhythmic leg movements from birth. The cerebellum calibrates these movements through experience. The brainstem provides reflexes like the VOR. Learning refines what reflexes provide.

Hardwired components (present from “birth”):

  • CPG (Central Pattern Generator): A mathematical oscillator that produces the rhythmic gait. The SNN does not generate the gait pattern; the CPG provides the baseline.
  • VOR (Vestibulo-Ocular Reflex): Reflexive steering toward the light target. Hardwired, as in a real animal’s superior colliculus.
  • Run-and-Tumble: A bacteria-inspired navigation state machine (Berg & Brown, 1972). It alternates between running straight and turning (tumbling) when the gradient changes.
  • Spinal reflexes: Righting reflex, cross-extension reflex, terrain compensation.
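To make the split between reflex and gait concrete, here is a minimal sketch of a phase-oscillator CPG whose stride amplitude is skewed by a steering signal. It is not the project's code: the leg names, the trot phase offsets, the gait frequency, and the sign convention for steer are all assumptions for illustration. Only the 12% asymmetry figure comes from the article.

```python
import math

# Hypothetical trot phase offsets (fraction of a cycle) per leg:
# diagonal pairs move together. Leg names are illustrative.
TROT_PHASE = {"FL": 0.0, "RR": 0.0, "FR": 0.5, "RL": 0.5}
MAX_ASYMMETRY = 0.12  # ~12% left/right amplitude difference quoted in the article


def cpg_targets(t, steer, freq_hz=1.5, amplitude=1.0):
    """Return a hip swing target per leg at time t (seconds).

    steer is in [-1, 1]; positive values shorten the left-side stride
    so the body turns left (the sign convention is an assumption).
    """
    steer = max(-1.0, min(1.0, steer))
    targets = {}
    for leg, offset in TROT_PHASE.items():
        phase = 2.0 * math.pi * (freq_hz * t + offset)
        side = -1.0 if leg.endswith("L") else 1.0  # left legs shrink when steer > 0
        scale = 1.0 + side * 0.5 * MAX_ASYMMETRY * steer
        targets[leg] = amplitude * scale * math.sin(phase)
    return targets
```

At full steering the left legs swing at 94% and the right legs at 106% of the nominal amplitude, which is the 12% difference between the two sides. A reflex such as the VOR would only have to supply the steer value; the rhythm itself never changes.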

Learned through training (emerges from experience):

  • SNN weights (R-STDP): Reward-modulated spike-timing-dependent plasticity adapts the connections among the 560 neurons based on intrinsic reward (vestibular comfort, prediction error, curiosity).
  • Cerebellar correction (Marr-Albus-Ito): The cerebellum learns forward-model corrections. Correction magnitude grows from 0.0006 to 0.034 over training, the strongest cerebellar signal ever measured in MH-FLOCKE.
  • CPG-to-SNN handoff: The CPG starts at 90% control and fades to ~45% as the SNN proves it can maintain stable locomotion. The SNN earns control through competence, not through a timer.
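One way a competence-gated handoff could work is sketched below. The competence metric (a moving average of "stayed upright"), its update rate, and the linear mapping are assumptions, not the project's actual rule. The 90% and 45% endpoints are the figures from the list above.

```python
CPG_START = 0.90  # initial CPG share, from the article
CPG_FLOOR = 0.45  # approximate final CPG share, from the article


def update_competence(competence, upright, rate=0.01):
    """Exponential moving average of 'stayed upright this step' (assumed metric)."""
    target = 1.0 if upright else 0.0
    return competence + rate * (target - competence)


def cpg_share(competence):
    """Map competence in [0, 1] linearly onto the CPG share of the motor command."""
    c = max(0.0, min(1.0, competence))
    return CPG_START - (CPG_START - CPG_FLOOR) * c


def blend(cpg_cmd, snn_cmd, competence):
    """Mix the two motor commands joint by joint."""
    w = cpg_share(competence)
    return [w * c + (1.0 - w) * s for c, s in zip(cpg_cmd, snn_cmd)]
```

Because the share depends on measured stability rather than on the step count, a fall lowers competence and hands control back to the CPG. A timer-based schedule could not do that.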

The Numbers

  • 33,000 steps, 9.7 minutes training time (57 sps on CPU)
  • 0 falls, perfect upright streak
  • 2 light targets reached (sf:2) through active VOR-guided steering
  • VOR signal up to +0.54 — strong, sustained steering toward the light
  • 4 Run-and-Tumble events — the navigation state machine triggered naturally
  • Cerebellar correction: 0.008 — real Marr-Albus-Ito learning

Why the Dog Arcs Around the Light

You’ll notice the dog doesn’t walk straight to the light — it takes a wide arc. This is not a bug. The Freenove’s CPG produces approximately 12% amplitude asymmetry between left and right legs when steering. This gives the robot a turning radius of roughly 5 meters. The VOR reflex fires correctly, and the CPG responds — but the body can only turn as fast as the legs allow.

This is exactly what happens with real quadrupeds. A horse can’t make the same tight turns as a cat. The steering intention is there; the biomechanics set the limit.

Performance Breakthrough: 6× Speedup

This session also resolved a critical performance bug. Step-time was growing from 20ms to 800ms over 100k steps — making long training runs impossible. The root cause: an O(N²) clustering operation in the Synaptogenesis module that processed 5,000 accumulated experience patterns without clearing the buffer.

The fix (buffer.clear() after consolidation + max_size reduction) brought step-time back to a stable 18ms across 100k steps. Training speed went from 7 sps to 54 sps — a 6× improvement that makes all future development viable.
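The pattern behind the fix can be sketched as follows. The class and parameter names are hypothetical and the clustering pass is replaced by a stand-in that only counts pairwise comparisons, so this is an illustration of the idea rather than the Synaptogenesis module itself.

```python
from collections import deque


class ExperienceBuffer:
    """Bounded pattern buffer; names are illustrative, not the project's API."""

    def __init__(self, max_size=500, consolidate_every=200):
        self.patterns = deque(maxlen=max_size)  # oldest entries drop automatically
        self.consolidate_every = consolidate_every
        self.pairs_compared = 0

    def add(self, pattern):
        self.patterns.append(pattern)
        if len(self.patterns) >= self.consolidate_every:
            self.consolidate()

    def consolidate(self):
        # Stand-in for the O(N^2) clustering pass: count pairwise comparisons.
        n = len(self.patterns)
        self.pairs_compared += n * (n - 1) // 2
        # The fix described above: empty the buffer so N cannot keep growing.
        self.patterns.clear()
```

With the clear in place, each consolidation sees at most consolidate_every patterns, so its cost stays constant however long the run lasts. Without it, every pass would reprocess everything accumulated so far.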

What’s Next

  • Hardware phototaxis — The same VOR steering with a real camera (cv2) on the Freenove, following a flashlight on the floor.
  • Autonomous loop — Instead of pre-placed waypoints, the dog chooses its own targets based on curiosity, exploration drive, and episodic memory. All the modules exist; they need a conductor.
  • Paper 2 — Sim-to-real transfer + phototaxis results for Frontiers in Neurorobotics or CoRL workshop.

MH-FLOCKE is an open-source project by Marc Hesse — independent researcher, Potsdam, Germany. Named after Flocke, my late dog.

Code: github.com/MarcHesse/mhflocke (Apache 2.0)
Paper: aiXiv Preprint

The Dog Finds Its Target — No Reward Required

The Freenove robot dog can now navigate to targets on its own. No external reward, no reward shaping, no supervision. It sniffs, turns, runs, and finds what it’s looking for — using a biological navigation pattern that bacteria figured out billions of years ago.

What happened

The Freenove robot dog — a 100-euro kit with a Raspberry Pi and 12 servos — is controlled by a network of 560 spiking Izhikevich neurons. Not conventional neural networks, but biologically realistic neurons that fire like real brain cells.

In simulation, we place scent sources on the ground. The dog can sense the scent intensity and direction. It has to figure out how to get there by itself.

The result after 33,000 steps: 5.43 meters walked, 4 scent sources found, zero falls. That’s 61% further than the baseline without scent, which just walked straight ahead blindly.

Why this is harder than it sounds

The obvious approach — continuously steer toward the smell — doesn’t work. We tried it. The dog spiraled in circles. Every step, it corrected its heading. Every correction shifted the scent angle. The feedback loop turned into a death spiral.

This is a classic engineering mistake. Real animals don’t navigate like PID controllers.

How real animals navigate

Bacteria solved this problem billions of years ago. The mechanism was described by Berg and Brown in 1972: Run-and-Tumble.

The principle is simple. Sniff — measure the gradient. Tumble — a brief turning impulse toward the source. Run — walk straight, no corrections. Then sniff again. If the scent got stronger during the run, extend the next straight phase. If not, correct more often.

It’s not continuous steering. It’s a rhythm of orientation and movement. Sniff, turn, run. Sniff, turn, run. Like a dog following a trail.

What we built

We implemented this biological pattern as a state machine in the training loop. Three states: SNIFF (1 step — measure the gradient), TUMBLE (12 steps — steering impulse), RUN (40 steps — straight ahead, no corrections).

The system includes an improvement check: if scent strength increased after a run, the next straight phase gets extended. The dog found the right direction — keep going. If not, the phase stays short and it corrects more frequently.
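The three states and the improvement check can be sketched as a small class. The step counts are the ones given above. The extension factor, the cap on run length, and the use of the clamped bearing as the turning impulse are assumptions made for this example.

```python
SNIFF_STEPS, TUMBLE_STEPS, RUN_STEPS = 1, 12, 40  # durations from the article
RUN_EXTENSION = 1.5   # assumed multiplier after an improving run
MAX_RUN_STEPS = 120   # assumed cap on the straight phase


class RunAndTumble:
    def __init__(self):
        self.state = "SNIFF"
        self.steps_left = SNIFF_STEPS
        self.run_length = RUN_STEPS
        self.last_scent = None
        self.turn = 0.0

    def step(self, scent_strength, scent_bearing):
        """Advance one control step and return a steering command in [-1, 1].

        scent_bearing is the angle to the source relative to the heading (radians).
        """
        if self.state == "SNIFF":
            improved = self.last_scent is not None and scent_strength > self.last_scent
            if improved:
                self.run_length = min(int(self.run_length * RUN_EXTENSION), MAX_RUN_STEPS)
            else:
                self.run_length = RUN_STEPS
            self.last_scent = scent_strength
            self.turn = max(-1.0, min(1.0, scent_bearing))
        # Steering is applied only while tumbling; SNIFF and RUN go straight.
        command = self.turn if self.state == "TUMBLE" else 0.0
        self.steps_left -= 1
        if self.steps_left == 0:
            self._advance()
        return command

    def _advance(self):
        order = {"SNIFF": ("TUMBLE", TUMBLE_STEPS),
                 "TUMBLE": ("RUN", self.run_length),
                 "RUN": ("SNIFF", SNIFF_STEPS)}
        self.state, self.steps_left = order[self.state]
```

Calling step once per control tick yields one sniff, twelve steering steps, and then forty or more steps with a zero command. The heading is left alone during the run, which is what breaks the feedback loop that made continuous steering spiral.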

Three bugs, one breakthrough

It didn’t work immediately. Three bugs were hiding in the system.

First, the heading computation was wrong. A quaternion has four components, and we used the W component as the yaw angle. In our runs that value stayed at approximately 1.0, because W encodes the cosine of half the rotation angle rather than a heading. The dog had been sniffing in the wrong direction for weeks without us noticing.
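For reference, the standard way to extract a heading from a unit quaternion is shown below. It assumes the (w, x, y, z) component order that MuJoCo uses and a z-up world. The function name is ours, not the project's.

```python
import math


def yaw_from_quaternion(w, x, y, z):
    """Heading (rotation about the vertical z axis) in radians.

    Assumes a unit quaternion in (w, x, y, z) order.
    """
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
```

A 20 degree turn about the vertical axis gives w = cos(10°), which is about 0.985. That is why reading W as the heading looked plausible while carrying almost no directional information.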

Second, the scent radius was too small. The Freenove is a small robot with short steps. At a target radius of 0.5 meters, it walked past targets more often than into them.

Third, new scent sources spawned behind the dog instead of ahead of it, because the respawn logic didn’t account for the walking direction.
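A respawn rule that respects the walking direction could look like this sketch. The distance, the angular spread, and the function name are illustrative assumptions, not values from the project.

```python
import math
import random


def spawn_ahead(x, y, yaw, distance=1.5, max_offset=math.radians(30), rng=random):
    """Place a new target `distance` metres away, within +/- max_offset of the heading.

    distance and max_offset are illustrative defaults, not values from the project.
    """
    bearing = yaw + rng.uniform(-max_offset, max_offset)
    return x + distance * math.cos(bearing), y + distance * math.sin(bearing)
```

Because the bearing is built from the current yaw, the new source always lies in front of the robot, whichever way it happens to be facing when the previous target is reached.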

After fixing all three: 4 targets found, 5.43 meters, zero falls. And the Directed Learning module autonomously tested and confirmed a hypothesis about gait frequency — without us programming that behavior.

What this means

The system now consists of: a spiking neural network that learns motor control, a cerebellum that corrects movements in real time, a central pattern generator that provides the base gait, an emotion and motivation system, episodic memory, and biologically grounded navigation.

None of these modules use external reward. The dog doesn’t get points for walking or arriving. It learns from body signals: losing balance feels bad, moving feels good, curiosity drives it forward.

And it runs on a 100-euro robot. Same code in simulation and on the Raspberry Pi.

What’s next

The next step is a closed learning loop: after each episode, the dog asks itself what worked and what didn’t. The building blocks exist — episodic memory, concept graph, world model, directed learning. They just need to be connected.

On real hardware, scent becomes light: the Freenove has a camera, and brightness in the image is a gradient just like scent in the air. Same algorithm, different sensor. Point a flashlight at the floor, the dog walks toward it.
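As a rough sketch of how a camera frame could stand in for the scent sensor, the function below compares the brightness of the left and right halves of a grayscale image. It takes the frame as plain nested lists to stay dependency-free; a real frame would more likely arrive as an array from cv2. The output scaling and the sign convention are assumptions.

```python
def brightness_gradient(gray):
    """Return (strength, bearing) from a grayscale frame given as rows of 0-255 values.

    strength is the mean brightness scaled to [0, 1]; bearing is in [-1, 1],
    negative when the left half is brighter (sign convention is an assumption).
    """
    width = len(gray[0])
    half = width // 2
    left = sum(sum(row[:half]) for row in gray)
    right = sum(sum(row[width - half:]) for row in gray)
    total = sum(sum(row) for row in gray)
    strength = total / (255.0 * width * len(gray))
    bearing = (right - left) / (left + right) if (left + right) > 0 else 0.0
    return strength, bearing
```

The two return values play the same roles as scent intensity and scent direction, so the navigation logic would not need to know which sensor produced them.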

The video is on YouTube: https://www.youtube.com/watch?v=phYPEFLMlJI

Code on GitHub: github.com/MarcHesse/mhflocke