
The Dog Finds Its Target — No Reward Required

23 Apr 2026 4 min read Marc Hesse

The Freenove robot dog can now navigate to targets on its own. No external reward, no reward shaping, no supervision. It sniffs, turns, runs, and finds what it’s looking for — using a biological navigation pattern that bacteria figured out billions of years ago.

What happened

The Freenove robot dog — a 100-euro kit with a Raspberry Pi and 12 servos — is controlled by a network of 560 spiking Izhikevich neurons. Not conventional neural networks, but biologically realistic neurons that fire like real brain cells.
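For readers who have not met the model before, here is a minimal sketch of a single Izhikevich neuron. It uses the standard published equations and the textbook "regular spiking" parameters. The function name, the 1 ms Euler step, and the constant input current are illustrative assumptions, not values taken from the MH-FLOCKE code.

```python
def simulate_izhikevich(current, steps, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Simulate one Izhikevich neuron with forward Euler; return spike step indices.

    Defaults are the textbook "regular spiking" parameters (an assumption),
    not values taken from the MH-FLOCKE network.
    """
    v = c          # membrane potential in mV, started at the reset value
    u = b * v      # recovery variable
    spikes = []
    for t in range(steps):
        # Membrane equation: dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        # Recovery equation: du/dt = a (b v - u)
        u += dt * a * (b * v - u)
        if v >= 30.0:          # spike threshold reached
            spikes.append(t)
            v = c              # reset membrane potential
            u += d             # bump recovery variable
    return spikes


if __name__ == "__main__":
    print("spikes with input:", len(simulate_izhikevich(10.0, 500)))
    print("spikes without input:", len(simulate_izhikevich(0.0, 500)))
```

With no input current the neuron settles at rest and stays silent; with a constant input it fires repeatedly. A network wires many such units together through synapses, which this sketch leaves out.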

In simulation, we place scent sources on the ground. The dog can sense the scent intensity and direction. It has to figure out by itself how to reach the source.

The result after 33,000 steps: 5.43 meters walked, 4 scent sources found, zero falls. That’s 61% further than the baseline without scent, which just walked straight ahead blindly.

Why this is harder than it sounds

The obvious approach — continuously steer toward the smell — doesn’t work. We tried it. The dog spiraled in circles. Every step, it corrected its heading. Every correction shifted the scent angle. The feedback loop turned into a death spiral.

This is a classic engineering mistake. Real animals don’t navigate like PID controllers.

How real animals navigate

Bacteria solved this problem billions of years ago. The mechanism was described by Berg and Brown in 1972: Run-and-Tumble.

The principle is simple. Sniff — measure the gradient. Tumble — a brief turning impulse toward the source. Run — walk straight, no corrections. Then sniff again. If the scent got stronger during the run, extend the next straight phase. If not, correct more often.

It’s not continuous steering. It’s a rhythm of orientation and movement. Sniff, turn, run. Sniff, turn, run. Like a dog following a trail.

What we built

We implemented this biological pattern as a state machine in the training loop. Three states: SNIFF (1 step — measure the gradient), TUMBLE (12 steps — steering impulse), RUN (40 steps — straight ahead, no corrections).
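The three states can be sketched as a small controller like the one below. The phase durations come from the text; the class name, the steering sign convention (positive means the scent is to the left), and the fixed-magnitude turning command are assumptions made for illustration, not the project's actual code.

```python
from enum import Enum


class Phase(Enum):
    SNIFF = "sniff"
    TUMBLE = "tumble"
    RUN = "run"


# Durations in control steps, as stated in the text.
DURATION = {Phase.SNIFF: 1, Phase.TUMBLE: 12, Phase.RUN: 40}
NEXT_PHASE = {Phase.SNIFF: Phase.TUMBLE, Phase.TUMBLE: Phase.RUN, Phase.RUN: Phase.SNIFF}


class RunAndTumble:
    """Hypothetical controller: sample the scent once, turn briefly, then run straight."""

    def __init__(self):
        self.phase = Phase.SNIFF
        self.steps_left = DURATION[Phase.SNIFF]
        self.turn = 0.0  # steering command held for the whole TUMBLE phase

    def step(self, scent_angle):
        """Advance one control step and return the steering command for it."""
        if self.phase is Phase.SNIFF:
            # Sample the direction once; it is ignored until the next SNIFF.
            self.turn = 1.0 if scent_angle > 0 else -1.0 if scent_angle < 0 else 0.0
            command = 0.0
        elif self.phase is Phase.TUMBLE:
            command = self.turn
        else:  # RUN: straight ahead, no corrections
            command = 0.0

        self.steps_left -= 1
        if self.steps_left == 0:
            self.phase = NEXT_PHASE[self.phase]
            self.steps_left = DURATION[self.phase]
        return command
```

The key design point is that `scent_angle` is only read in SNIFF. During TUMBLE and RUN the controller is deliberately blind to it, which is what breaks the feedback loop that made continuous steering spiral.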

The system includes an improvement check: if scent strength increased after a run, the next straight phase gets extended. The dog found the right direction — keep going. If not, the phase stays short and it corrects more frequently.
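A minimal sketch of that improvement check follows. Only the 40-step base length comes from the text; the function name, the 20-step extension, and the 120-step cap are hypothetical values chosen for illustration.

```python
def next_run_length(scent_before, scent_after, current_length,
                    base_length=40, extension=20, max_length=120):
    """Return the length in steps of the next RUN phase.

    extension and max_length are assumed values, not taken from the project.
    """
    if scent_after > scent_before:
        # The last run moved up the gradient: keep going for longer.
        return min(current_length + extension, max_length)
    # No improvement: fall back to short runs so the dog re-orients more often.
    return base_length


if __name__ == "__main__":
    print(next_run_length(0.2, 0.5, 40))   # improved, run is extended
    print(next_run_length(0.5, 0.4, 100))  # got worse, back to the base length
```

This mirrors bacterial chemotaxis, where a cell tumbles less often while the concentration is rising.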

Three bugs, one breakthrough

It didn’t work immediately. Three bugs were hiding in the system.

First, the heading computation was wrong. A quaternion has four components, and we used the W component as the yaw angle. W is the cosine of half the rotation angle, not a heading, and it stays close to 1.0 whenever the body is only slightly rotated from its reference orientation. The dog had been sniffing in the wrong direction for weeks without us noticing.
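For reference, this is the standard way to extract yaw from a unit quaternion. The sketch assumes a z-up world and (w, x, y, z) component order; both are assumptions about the setup, so check them against your simulator before reusing it.

```python
import math


def yaw_from_quaternion(w, x, y, z):
    """Yaw (rotation about the vertical z axis) in radians, in [-pi, pi].

    Assumes a unit quaternion, z-up world, and (w, x, y, z) component order.
    """
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


if __name__ == "__main__":
    # A pure 20 degree turn about the vertical axis.
    half = math.radians(20.0) / 2.0
    w, x, y, z = math.cos(half), 0.0, 0.0, math.sin(half)
    print("w component:", round(w, 3))
    print("yaw in degrees:", round(math.degrees(yaw_from_quaternion(w, x, y, z)), 3))
```

For a 20 degree turn, W is cos(10°), roughly 0.985, which shows why reading W as the heading produces a value that barely moves.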

Second, the target radius was too small. The Freenove is a small robot with short steps. At a target radius of 0.5 meters, it walked past targets more often than into them.

Third, new scent sources spawned behind the dog instead of ahead of it, because the respawn logic didn’t account for the walking direction.
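One way to make respawning respect the walking direction is to place the new source inside a cone in front of the dog. The sketch below is an illustration under assumptions: the function name, the 1.5 meter spawn distance, and the 30 degree half-angle are hypothetical, not the project's actual respawn logic.

```python
import math
import random


def spawn_ahead(x, y, yaw, distance=1.5, max_offset=math.radians(30.0), rng=random):
    """Return (tx, ty) for a new target inside a cone in front of the dog.

    distance and max_offset are assumed values chosen for illustration.
    yaw is the heading in radians, measured from the world x axis.
    """
    bearing = yaw + rng.uniform(-max_offset, max_offset)
    return (x + distance * math.cos(bearing), y + distance * math.sin(bearing))


if __name__ == "__main__":
    # Dog at the origin, facing along the positive y axis.
    print(spawn_ahead(0.0, 0.0, math.pi / 2))
```

Because the offset is bounded by the half-angle, the vector from the dog to the new target always has a positive component along the heading, so the target can never land behind it.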

After fixing all three: 4 targets found, 5.43 meters, zero falls. And the Directed Learning module autonomously tested and confirmed a hypothesis about gait frequency — without us programming that behavior.

What this means

The system now consists of: a spiking neural network that learns motor control, a cerebellum that corrects movements in real time, a central pattern generator that provides the base gait, an emotion and motivation system, episodic memory, and biologically grounded navigation.

None of these modules use external reward. The dog doesn’t get points for walking or arriving. It learns from body signals: losing balance feels bad, moving feels good, curiosity drives it forward.

And it runs on a 100-euro robot. Same code in simulation and on the Raspberry Pi.

What’s next

The next step is a closed learning loop: after each episode, the dog asks itself what worked and what didn’t. The building blocks exist — episodic memory, concept graph, world model, directed learning. They just need to be connected.

On real hardware, scent becomes light: the Freenove has a camera, and brightness in the image is a gradient just like scent in the air. Same algorithm, different sensor. Point a flashlight at the floor, the dog walks toward it.
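To make the scent-to-light analogy concrete, here is a hedged sketch of how a grayscale camera frame could be reduced to the same two signals the simulated nose provides: an intensity and a left/right direction. The function name and the split-in-halves approach are assumptions; the post does not describe the actual camera pipeline.

```python
def brightness_gradient(image):
    """Reduce a grayscale frame to (intensity, direction).

    image: list of rows of pixel values in 0..255, all rows the same width.
    intensity: mean brightness scaled to 0..1.
    direction: in -1..1, positive when the right half is brighter.
    This is an illustrative stand-in, not the project's camera code.
    """
    height = len(image)
    width = len(image[0])
    half = width // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[width - half:]) for row in image)
    total = sum(sum(row) for row in image)
    intensity = total / (255.0 * width * height)
    direction = 0.0 if left + right == 0 else (right - left) / (left + right)
    return intensity, direction


if __name__ == "__main__":
    frame = [[0, 0, 255, 255],
             [0, 0, 255, 255]]
    print(brightness_gradient(frame))  # bright patch on the right
```

The two return values could then feed the same SNIFF step that the scent signal feeds in simulation.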

The video is on YouTube: https://www.youtube.com/watch?v=phYPEFLMlJI

Code on GitHub: github.com/MarcHesse/mhflocke

#Baby-KI #chemotaxis #Freenove #intrinsic reward #navigation #Run-and-Tumble #SNN