My Robot Dog Couldn’t Walk Straight — 8 Bugs and a New Steering System

The Freenove robot dog has a problem. It drifts to the right. Every unit is different — servo tolerances, weight distribution, surface friction — and no amount of calibration fixes it permanently. The drift changes with battery level, temperature, and floor surface.

For weeks, I tried to fix this with Z-offset steering — shifting all four feet sideways to create a turning moment. It seemed logical. It was useless. A 45-second hardware test proved it: ±5mm of Z-offset produces less than 5 degrees of effect against 70 degrees of mechanical drift. One measurement killed weeks of assumptions.

Tank Steering

The replacement is asymmetric stride — differential hip amplitude between left and right legs. Left legs take a longer step, the dog curves right. Like a tank. Biology does this through the reticulospinal tract, which modulates stride length independently per side.
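
As a rough illustration of the idea, a steering command can be mapped onto left and right hip amplitudes like this. The function name, the `max_asymmetry` parameter, and the sign convention (positive means curve right) are assumptions for the sketch, not names from the MH-FLOCKE code.

```python
def stride_amplitudes(base_amplitude, steer, max_asymmetry=0.5):
    """Return (left, right) hip amplitudes for a steering command in [-1, 1].

    Hypothetical convention: steer > 0 lengthens the left stride and
    shortens the right one, so the body curves to the right.
    """
    steer = max(-1.0, min(1.0, steer))   # saturate the command
    delta = max_asymmetry * steer        # fraction added on one side, removed on the other
    left = base_amplitude * (1.0 + delta)
    right = base_amplitude * (1.0 - delta)
    return left, right


left, right = stride_amplitudes(base_amplitude=0.2, steer=0.4)
```

With `steer=0.4` the left legs step further than the right legs, which is the "left legs take a longer step, the dog curves right" case described above.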

Hardware test: drift reduced from 70 degrees to 8.5 degrees. Three times more effective than Z-offset, and it works on any surface because the IMU provides closed-loop feedback.

The PID controller reads the actual heading from the MPU6050 IMU, compares it to the target heading from the camera (where the light is), and drives the stride asymmetry. No calibration, no per-robot tuning. The I-term accumulates over time to eliminate steady-state offset — exactly what the cerebellum does biologically through long-term depression.
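
A minimal sketch of such a heading loop, assuming headings in degrees and an output that is later interpreted as a stride-asymmetry command. The class name, the angle wrapping helper, and the output limit are illustrative assumptions; the default gains are the hardware-tuned values quoted later in this post.

```python
def wrap_deg(angle):
    """Wrap an angle difference into [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


class HeadingPID:
    """Hypothetical PID loop: heading error in, steering command out."""

    def __init__(self, kp=0.05, ki=0.01, kd=0.015, out_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, target_deg, heading_deg, dt):
        error = wrap_deg(target_deg - heading_deg)
        self.integral += error * dt          # I-term removes steady-state offset
        if self.prev_error is None:
            derivative = 0.0                 # no derivative kick on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))


pid = HeadingPID()
command = pid.update(target_deg=10.0, heading_deg=0.0, dt=0.02)
```

The clamp on the output is where the saturation mentioned in the results would show up: once the error is large enough, the command sits at the limit regardless of the gains.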

Eight Bugs in One Session

The steering replacement exposed eight bugs that had been hiding in the system. Among them:

The steering signal was computed correctly but silently dropped in compute_tendon(): wrong code path, one-line fix.
The log was showing a proxy value instead of the actual steering. The controller had been working the whole time, but the display said zero.
The competence gate required walking speed above 0.03 m/s, but with drift, all locomotion energy went to correction instead of forward progress. The baby never grew up.
The MuJoCo yaw convention is inverted versus hardware: one minus sign.
A threshold prevented target updates when the dog was roughly aimed at the light.
The PID controller initialized its target to zero instead of the current heading.

Each one of these individually prevented the system from working. Finding them required switching between simulation and real hardware, comparing signs and values, and measuring instead of guessing.

The Dog Approaches the Light

After fixing all eight bugs and tuning the PID on hardware (Kp=0.05, Ki=0.01, Kd=0.015), the Freenove robot approaches a light source from 0.52 meters to 0.17 meters in 60 seconds. Not perfect tracking — the steering saturates near the end — but genuine, IMU-corrected, drift-compensated navigation on a 100-euro robot kit.

In simulation with measured hardware drift injected: 50,000 steps, zero falls, three light targets found, actor competence 1.0.

Meta-Learning Loop

This release also introduces the complete autonomous meta-learning loop — four modules that form a closed self-improvement cycle:

EpisodeAnalyzer compares successful versus unsuccessful navigation events and identifies what makes the dog successful. Which context variables (gait quality, heading error, velocity, steering offset) correlate with finding the light?

StrategyAdapter converts those insights into parameter adjustments — modifying run/tumble duration, PID gains, and exploration bias.

CuriosityExplorer uses world model prediction error to drive exploration. High prediction error means unfamiliar territory — explore more. Low prediction error means familiar ground — exploit what works.
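
One simple way to turn prediction error into an explore/exploit decision is a saturating map from error to exploration probability. This is only a sketch of the principle; the function name and the `half_point` parameter are hypothetical and do not describe the actual CuriosityExplorer module.

```python
def exploration_probability(prediction_error, half_point=0.1):
    """Map world-model prediction error to a probability of exploring.

    Zero error gives 0.0 (pure exploitation); an error equal to
    half_point gives 0.5; larger errors approach 1.0.
    """
    pe = max(0.0, prediction_error)
    return pe / (pe + half_point)


familiar = exploration_probability(0.01)
unfamiliar = exploration_probability(1.0)
```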

HypothesisGenerator creates testable motor hypotheses from insights that can be tested autonomously through the existing Directed Learning module.

The loop runs but has not generated insights yet — the dog is too successful in the current scenario (100% success rate, no failures to learn from). Harder scenarios and longer runs will activate it.

Hardware Drift Simulation

The drift profile injected into the simulator has been updated. The previous measurement (-0.4 deg/s) was taken during a stationary test. Under walking load, the actual drift is 1.5 to 2.0 deg/s — servo asymmetry amplifies under dynamic conditions. The updated profile makes simulation training more realistic.
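
A sketch of how a walking-load drift in that range could be injected into a simulated heading. The class name, the per-run sampling of a fixed rate, and the positive sign of the drift are assumptions made for illustration.

```python
import random


class DriftProfile:
    """Hypothetical yaw-drift injector: one fixed rate per simulated robot."""

    def __init__(self, low_deg_s=1.5, high_deg_s=2.0, seed=None):
        # Each unit drifts differently, so sample the rate once per instance.
        self.rate_deg_s = random.Random(seed).uniform(low_deg_s, high_deg_s)

    def apply(self, yaw_deg, dt):
        """Return the heading after dt seconds of uncorrected drift."""
        return yaw_deg + self.rate_deg_s * dt


profile = DriftProfile(seed=0)
yaw = 0.0
for _ in range(500):          # 500 steps of 20 ms = 10 s of walking
    yaw = profile.apply(yaw, dt=0.02)
```

After ten simulated seconds the uncorrected heading has moved by 15 to 20 degrees, which is the error the PID loop has to absorb.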

Every Freenove unit has different drift characteristics. MH-FLOCKE handles this automatically through the PID controller. No manual calibration needed.

What Comes Next

Hardware video with the new PID steering. Longer simulation runs to activate the meta-learning loop. A potential third robot platform (Petoi Bittle X V2) to prove the architecture is body-agnostic.

The code is on GitHub (Apache 2.0). Updated documentation at mhflocke.com/docs.

v0.5.1 — Full changelog

The Dog Follows the Light — Phototaxis with Spiking Neurons

A 560-Neuron Spiking Network Steers a Quadruped Toward Light

For the first time, MH-FLOCKE’s robot dog actively navigates toward a light source in simulation — driven by hardwired reflexes and learned neural adaptations. No external reward signal. No backpropagation. Just body signals.

What You See in the Video

A Freenove Robot Dog (€100, Raspberry Pi, 12 servos) walks across a flat surface in MuJoCo simulation. A light source (yellow dot on the mini-map) is placed ahead and to the side. The dog detects the light gradient and steers toward it using a VOR (Vestibulo-Ocular Reflex), a hardwired brainstem circuit that turns the body toward a visual target.

The mini-map in the bottom-left corner shows the robot’s trail (green line) and the light waypoint (yellow glow). You can see the dog arcing toward the light rather than walking straight. This arc is not a software limitation — it reflects the physical turning radius of the Freenove’s CPG gait, which produces roughly 12% steering asymmetry between left and right legs.

Hardwired vs. Learned: The Biological Design

MH-FLOCKE follows the same principle as biological motor development. A newborn puppy doesn’t learn to walk from scratch — its spinal cord has CPG circuits that produce rhythmic leg movements from birth. The cerebellum calibrates these movements through experience. The brainstem provides reflexes like the VOR. Learning refines what reflexes provide.

Hardwired components (present from “birth”):

CPG (Central Pattern Generator) — Mathematical oscillator producing rhythmic gait. The SNN does not generate the gait pattern; CPG provides the baseline.
VOR (Vestibulo-Ocular Reflex): Reflexive steering toward the light target. Hardwired, like in a real animal’s superior colliculus.
Run-and-Tumble — A bacterial-inspired navigation state machine (Berg & Brown, 1972). Alternates between running straight and turning (tumbling) when the gradient changes.
Spinal reflexes — Righting reflex, cross-extension reflex, terrain compensation.
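
The Run-and-Tumble component in the list above can be sketched as a two-state machine. This version assumes "the gradient changes" means the sensed light intensity drops between steps; the class name, the fixed tumble duration, and the random turn direction are illustrative choices, not the project's implementation.

```python
import random


class RunAndTumble:
    """Hypothetical run-and-tumble navigator driven by light intensity."""

    def __init__(self, tumble_steps=20, seed=None):
        self.tumble_steps = tumble_steps
        self.rng = random.Random(seed)
        self.state = "run"
        self.steps_left = 0
        self.turn = 0.0
        self.prev_intensity = None

    def step(self, intensity):
        """Return a steering command: 0.0 while running, -1.0 or +1.0 while tumbling."""
        getting_worse = (self.prev_intensity is not None
                         and intensity < self.prev_intensity)
        self.prev_intensity = intensity
        if self.state == "run" and getting_worse:
            self.state = "tumble"
            self.steps_left = self.tumble_steps
            self.turn = self.rng.choice((-1.0, 1.0))
        if self.state == "tumble":
            self.steps_left -= 1
            if self.steps_left <= 0:
                self.state = "run"       # tumble finished, run straight again
            return self.turn
        return 0.0


nav = RunAndTumble(tumble_steps=2, seed=1)
commands = [nav.step(x) for x in (0.5, 0.6, 0.4, 0.3, 0.35)]
```

The first two readings improve, so the dog keeps running. The drop to 0.4 triggers a two-step tumble, after which it runs straight again.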

Learned through training (emerges from experience):

SNN weights (R-STDP) — Reward-modulated spike-timing-dependent plasticity adapts 560 neuron connections based on intrinsic reward (vestibular comfort, prediction error, curiosity).
Cerebellar correction (Marr-Albus-Ito) — The cerebellum learns forward-model corrections. Correction magnitude grows from 0.0006 to 0.034 over training — the strongest cerebellar signal ever measured in MH-FLOCKE.
CPG-to-SNN handoff — The CPG starts at 90% control and fades to ~45% as the SNN proves it can maintain stable locomotion. The SNN earns control through competence, not through a timer.

The Numbers

33,000 steps, 9.7 minutes training time (57 sps on CPU)
0 falls, perfect upright streak
2 light targets reached (sf:2) through active VOR-guided steering
VOR signal up to +0.54 — strong, sustained steering toward the light
4 Run-and-Tumble events — the navigation state machine triggered naturally
Cerebellar correction: 0.008 — real Marr-Albus-Ito learning

Why the Dog Arcs Around the Light

You’ll notice the dog doesn’t walk straight to the light — it takes a wide arc. This is not a bug. The Freenove’s CPG produces approximately 12% amplitude asymmetry between left and right legs when steering. This gives the robot a turning radius of roughly 5 meters. The VOR reflex fires correctly, and the CPG responds — but the body can only turn as fast as the legs allow.

This is exactly what happens with real quadrupeds. A horse can’t make the same tight turns as a cat. The steering intention is there; the biomechanics set the limit.

Performance Breakthrough: 6× Speedup

This session also resolved a critical performance bug. Step-time was growing from 20ms to 800ms over 100k steps — making long training runs impossible. The root cause: an O(N²) clustering operation in the Synaptogenesis module that processed 5,000 accumulated experience patterns without clearing the buffer.

The fix (buffer.clear() after consolidation + max_size reduction) brought step-time back to a stable 18ms across 100k steps. Training speed went from 7 sps to 54 sps — a 6× improvement that makes all future development viable.
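
The pattern behind the fix looks roughly like this. The class and method names are hypothetical; only the two ideas (a hard size cap and clearing after consolidation) come from the description above.

```python
from collections import deque


class ExperienceBuffer:
    """Hypothetical bounded buffer that is emptied after each consolidation."""

    def __init__(self, max_size=500):
        # deque with maxlen silently drops the oldest pattern when full.
        self.patterns = deque(maxlen=max_size)

    def add(self, pattern):
        self.patterns.append(pattern)

    def consolidate(self, cluster_fn):
        """Run the (possibly O(N^2)) clustering once, then clear the buffer."""
        result = cluster_fn(list(self.patterns))
        self.patterns.clear()   # without this, N keeps growing between calls
        return result


buf = ExperienceBuffer(max_size=3)
for i in range(5):
    buf.add(i)
clustered_count = buf.consolidate(len)   # len stands in for the real clustering
```

Because the clustering cost grows with the square of the buffer length, capping and clearing the buffer keeps the per-step cost flat instead of growing with the length of the run.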

What’s Next

Hardware phototaxis — The same VOR steering with a real camera (cv2) on the Freenove, following a flashlight on the floor.
Autonomous loop — Instead of pre-placed waypoints, the dog chooses its own targets based on curiosity, exploration drive, and episodic memory. All the modules exist; they need a conductor.
Paper 2 — Sim-to-real transfer on the Freenove Robot Dog. Published: aiXiv 260409.000002.

MH-FLOCKE is an open-source project by Marc Hesse — independent researcher, Potsdam, Germany. Named after Flocke, my late dog.

Code: github.com/MarcHesse/mhflocke (Apache 2.0)
Paper: aiXiv Preprint

The Mogli Oscillator: When Your Robot Dog Gets a Real Spine

Today I replaced the mathematical heart of MH-FLOCKE’s locomotion system. The old Central Pattern Generator — a set of sine and cosine functions that produced smooth, predictable gait patterns — is gone. In its place: 24 Izhikevich spiking neurons that fight each other to make a robot dog walk.

What Changed

Every vertebrate animal has a spinal CPG — a neural circuit in the spinal cord that produces rhythmic locomotion patterns without input from the brain. Thomas Graham Brown discovered this in 1911: take two neurons, connect them with mutual inhibition, add intrinsic adaptation, and you get alternating rhythm. Flexor fires, inhibits extensor. Flexor fatigues, extensor takes over. Repeat forever.

The Mogli Oscillator implements exactly this. Each of the robot’s 12 joints gets a pair of Izhikevich neurons (one flexor, one extensor) that alternate through adaptation-driven switching. Four legs × 3 joints × 2 neurons = 24 neurons total.
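
For readers unfamiliar with the neuron model, here is a minimal sketch of a single Izhikevich neuron integrated with a simple Euler step. It uses the standard "regular spiking" parameters from Izhikevich's model and does not include the flexor/extensor inhibition or the leg coupling; the function name and step size are assumptions for this example.

```python
def simulate_izhikevich(current, steps=2000, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Integrate one Izhikevich neuron and return its spike count.

    v is the membrane potential (mV), u the recovery (adaptation) variable.
    The default 2000 steps of 0.5 ms cover one second.
    """
    v = c
    u = b * v
    spikes = 0
    for _ in range(steps):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:        # spike: reset v and bump the adaptation variable
            v = c
            u += d
            spikes += 1
    return spikes


driven = simulate_izhikevich(current=10.0)   # tonic drive: the neuron fires
silent = simulate_izhikevich(current=0.0)    # no drive: it settles at rest
```

The `u += d` step after each spike is the adaptation that makes a neuron fatigue, which is what lets an inhibited partner take over in a half-center pair.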

The legs are coupled through interneurons: left-right pairs inhibit each other (alternation), diagonal pairs excite each other (walk gait synchronization). The coupling topology determines the gait — and because the coupling weights are stored in a learnable matrix, they can adapt through R-STDP.
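
The coupling topology can be written down as a small signed matrix. In this sketch the leg order, the weight values, and the absence of same-side (front-to-rear) coupling are assumptions; only the signs of the left-right and diagonal connections follow the description above.

```python
LEGS = ("FL", "FR", "RL", "RR")   # front-left, front-right, rear-left, rear-right


def build_coupling(inhibit=-1.0, excite=0.5):
    """Return a symmetric 4x4 inter-leg coupling matrix as nested lists."""
    n = len(LEGS)
    w = [[0.0] * n for _ in range(n)]

    def couple(a, b, weight):
        i, j = LEGS.index(a), LEGS.index(b)
        w[i][j] = w[j][i] = weight

    couple("FL", "FR", inhibit)   # left-right pairs alternate
    couple("RL", "RR", inhibit)
    couple("FL", "RR", excite)    # diagonal pairs synchronize
    couple("FR", "RL", excite)
    return w


coupling = build_coupling()
```

A learning rule such as R-STDP would then adjust the entries of this matrix rather than any hardcoded phase offsets.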

Why It Matters

The old sine/cosine CPG was a placeholder. It worked — the Freenove robot walked, the Go2 ran 45 meters — but it had fundamental limitations:

No per-leg independence. All legs followed the same global clock. If one leg needed to step differently — obstacle, slope, missing leg — the CPG couldn’t accommodate it.

No real turning. Steering was a hack: offset the abduction angle. The Mogli Oscillator turns by giving one side more tonic drive — longer steps on the outside, shorter on the inside. Exactly how animals turn.

No gait transitions. Walk, trot, gallop require different phase relationships between legs. The old CPG had fixed phase offsets. The Mogli Oscillator’s coupling weights can shift to produce any gait.

No fault tolerance. If a leg breaks, the old CPG keeps sending signals to all four legs. The Mogli Oscillator can reorganize: the remaining three oscillators find a new stable pattern through R-STDP. Like a real dog learning to walk on three legs.

Adaptive Gain: The Cautious Newborn

The output gain is not hardcoded — it develops. The oscillator starts with a gain of 3.0 (tiny, cautious movements) and ramps to 8.0 over the first 2000 training steps. This mirrors biological development: serotonin (5-HT) from the brainstem’s raphe nuclei gradually increases motor neuron excitability over the first postnatal days. A puppy’s first steps are small and wobbly — not because it’s broken, but because the gain hasn’t matured yet.
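
A sketch of that developmental schedule, assuming the ramp is linear (the shape of the ramp is not stated here, and the function name is made up for the example):

```python
def output_gain(step, start=3.0, end=8.0, ramp_steps=2000):
    """Gain for a given training step: linear ramp from start to end, then flat."""
    progress = min(max(step / ramp_steps, 0.0), 1.0)
    return start + (end - start) * progress


gains = [output_gain(s) for s in (0, 1000, 2000, 5000)]
```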

CPG Autonomy: The Decerebrate Cat

An interesting finding during development: when the behavior planner switched to “look around” or “alert” behaviors (which reduce movement amplitude), the Mogli Oscillator stalled completely. The robot stopped walking.

This is biologically wrong. Decerebrate cats (cats whose forebrain has been surgically disconnected from the brainstem and spinal cord) still walk on a treadmill. The spinal CPG is autonomous. The cortex modulates locomotion but cannot silence it. Only the basal ganglia can actively suppress locomotion, and I haven’t implemented that yet.

The fix: a CPG autonomy floor of 70%. The behavior planner can slow down the robot, but the spinal rhythm continues. This is not a hack — it’s an anatomical property of the spinal cord.
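
In code, a floor like this is a clamp on whatever amplitude scale the behavior planner requests. The sketch below assumes the 70% floor applies to a scale factor in [0, 1]; the function and parameter names are hypothetical.

```python
def cpg_amplitude_scale(behavior_scale, autonomy_floor=0.7):
    """Clamp the planner's requested amplitude scale to [autonomy_floor, 1.0]."""
    return max(autonomy_floor, min(1.0, behavior_scale))


look_around = cpg_amplitude_scale(0.2)   # planner asks for 20%, rhythm keeps 70%
normal_walk = cpg_amplitude_scale(0.9)   # requests above the floor pass through
```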

First Results

The Mogli Oscillator walks forward, maintains perfect balance (0 falls in 50,000 steps), and the SNN actor takes over faster than with the old CPG (65% actor competence vs 11%). The walk gait emerges from the coupling topology:

Left↔Right correlation: -0.78 (alternating)
Diagonal correlation: +0.73 (synchronized)
Exactly the pattern of a walking dog.

Metric	Mogli Oscillator	Mathematical CPG
Distance (50k steps)	1.21 m	8.2 m
Falls	0	0
Actor Competence	0.649	0.108
CPG Weight	58%	85%
Upright Streak	50,000	50,000

Distance is lower because the system prioritizes stability during learning — exactly what a biological newborn does. The actor learns 6× faster because the Mogli Oscillator provides richer, more variable input signals than the predictable sine wave.

Nothing Is Faked

Everything you see in the video is the real output of neuronal dynamics. No sine functions. No hardcoded trajectories. The 24 Izhikevich neurons fire, inhibit each other, and produce a walking pattern through the same mechanism that Thomas Graham Brown described 115 years ago. The wackiness you see is real — it’s not a smooth sine wave anymore, it’s spike-driven motor commands. Like a real animal.

What’s Next

R-STDP coupling learning — the robot learns to adjust its own gait pattern through reward-modulated plasticity. The weight matrix is prepared, the eligibility traces exist. Activation is next.

Limb loss compensation — simulate a broken leg in MuJoCo, watch the remaining oscillators reorganize. A robot dog that loses a leg and teaches itself to walk again — like a real animal. No retraining needed.

Hardware deployment — the Mogli Oscillator runs at 28ms/step in simulation. The Freenove Pi runs at 25ms/step. Same code, same architecture. The sim-to-real gap is bridged by on-device R-STDP learning.

The Mogli Oscillator is named after my current test pilot — a dog who doesn’t know he inspired a neural architecture.

Code: github.com/MarcHesse/mhflocke (Apache 2.0) — enable with --neural-cpg flag
Video: YouTube @mhflocke
Paper 1 — Architecture & Ablation: doi.org/10.5281/zenodo.19336894
Paper 2 — Sim-to-Real Transfer: doi.org/10.5281/zenodo.19481146

v0.4.3: The Wall — When a Spiking Neural Network Learns to Stop

A dog doesn’t need to crash into a wall twice. The first bump tells the whiskers something is wrong, the brainstem slams the brakes, and the cerebellum remembers: next time, slow down earlier.

Today’s update teaches the Freenove robot dog the same lesson — using the same biological architecture.

The Experiment

The setup is simple: a small quadruped robot, 232 spiking neurons, a wall 80cm ahead. No pre-programming, no path planner — just a single binary signal: “hitting the wall is bad.” The question: can a biologically grounded spiking neural network learn active behavior change from a clear binary signal?

The answer, after 9 bug fixes and 8 training runs: yes.

What I Found (and What Broke Along the Way)

The first seven runs produced corrections of exactly 0.0000. The cerebellum was learning — PF→PkC weights were growing — but the corrections never reached the motors. Three fundamental architecture bugs were hiding in the pipeline:

Bug 1: The DCN was deaf. Deep Cerebellar Nuclei compute motor corrections as the difference between “push” and “pull” populations. But the DCN was reading Purkinje cell spike activity — an exponential moving average that was essentially zero because PkC rarely fire discrete spikes. The fix: read the graded compartment state (apical voltage + dendritic calcium), not just spikes. This is more biologically accurate — PkC→DCN synapses show graded GABAergic release proportional to membrane potential.

Bug 2: Symmetric climbing fibers. When the robot hit the wall, the Inferior Olive sent identical error signals to both push and pull Purkinje cells. Identical calcium → identical DCN inhibition → push minus pull = zero → corrections = zero. Always. The fix: asymmetric CF — push PkC get strong CF (0.9), pull PkC get weak CF (0.1). Biology does the same thing: when an animal hits an obstacle, the correction is to reduce forward drive and increase braking.
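
The cancellation is easy to see in a toy model. The sketch below assumes each DCN population starts at a baseline and is inhibited in proportion to the climbing-fiber signal its Purkinje cells received; this linear model and its names are simplifications for illustration, not the actual cerebellum code.

```python
def dcn_correction(cf_push, cf_pull, baseline=1.0, inhibition_gain=1.0):
    """Push-minus-pull DCN output for given climbing-fiber strengths."""
    dcn_push = baseline - inhibition_gain * cf_push
    dcn_pull = baseline - inhibition_gain * cf_pull
    return dcn_push - dcn_pull


symmetric = dcn_correction(0.5, 0.5)    # identical CF on both sides cancels out
asymmetric = dcn_correction(0.9, 0.1)   # strong CF on push, weak CF on pull
```

With identical inputs the difference is exactly zero no matter how strong the error signal is. With the 0.9 / 0.1 split the result is negative in this toy model, which corresponds to less forward drive.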

Bug 3: Weighted blending killed corrections. The CPG-SNN blend was cpg × weight + correction × (1 - weight). With CPG at 90%, corrections were multiplied by 0.1 — a 10× attenuation. The fix: additive blending — cpg × weight + correction. The cerebellum modulates the CPG via the reticulospinal tract; it doesn’t compete with it.
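
The arithmetic of the two blending rules, as a small sketch (function names are made up for the example; the formulas are the ones quoted above):

```python
def blend_weighted(cpg, correction, cpg_weight):
    """Old rule: the correction competes with the CPG for the same budget."""
    return cpg * cpg_weight + correction * (1.0 - cpg_weight)


def blend_additive(cpg, correction, cpg_weight):
    """New rule: the correction is added on top of the weighted CPG output."""
    return cpg * cpg_weight + correction


# With the CPG output at zero, only the correction term remains visible.
old = blend_weighted(cpg=0.0, correction=0.01, cpg_weight=0.9)
new = blend_additive(cpg=0.0, correction=0.01, cpg_weight=0.9)
```

At a CPG weight of 0.9 the old rule passes on roughly one tenth of the correction, while the additive rule passes it on unchanged.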

The Breakthrough: Run 8

After fixing all three bugs, Run 8 showed something I’d never seen before:

Corrections alive: 0.001–0.012 (was 0.0000 in all previous runs)
6 wall collisions in 20,000 steps (episodic learning working)
Actor competence: 0.000 → 0.299 (first non-zero ever)
CPG weight: 90% → 55% (first handoff ever)

The robot walks toward the wall. The ultrasonic sensor (Channel 18) fires. DA drops from 0.22 to 0.05. The obstacle climbing fiber activates asymmetrically. The cerebellum learns. After each collision, the robot is reset to the start — like a puppy’s owner picking it up after it bumps into furniture.

The Architecture

The obstacle avoidance system adds three new components to MH-FLOCKE’s biological stack:

Ultrasonic Sensor (Channel 18): Simulated HC-SR04 rangefinder in MuJoCo, real HC-SR04 on the Raspberry Pi. Same encoding in both: nonlinear proximity mapping (√ function for urgency). The sensor channel is identical between simulation and hardware, enabling direct brain transfer.
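
One plausible form of such a square-root proximity encoding is shown below. The exact formula used in MH-FLOCKE is not given here, so the mapping, the 0.8 m saturation range, and the function name are all assumptions.

```python
import math


def encode_proximity(distance_m, max_range_m=0.8):
    """Map a range reading to urgency in [0, 1]: 0 at max range, 1 at contact.

    The square root makes urgency rise early, well before the obstacle is close.
    """
    d = min(max(distance_m, 0.0), max_range_m)
    return math.sqrt(1.0 - d / max_range_m)


far = encode_proximity(0.8)
mid = encode_proximity(0.6)
touching = encode_proximity(0.0)
```

Because the same function would run on the simulated and the real HC-SR04 reading, the network sees the same input scale in both worlds.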

Obstacle Climbing Fiber: Three zones with asymmetric error signals — COLLISION (<10cm, strong CF on push PkC), DANGER (<30cm, graded asymmetric CF), WARNING (<80cm, hip yaw CF for turning).
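
The zone boundaries translate directly into a small classifier. The thresholds come from the description above; the function name and the "CLEAR" label for readings beyond 80 cm are hypothetical.

```python
def obstacle_zone(distance_m):
    """Classify an ultrasonic range reading into an obstacle zone."""
    if distance_m < 0.10:
        return "COLLISION"   # strong CF on push PkC
    if distance_m < 0.30:
        return "DANGER"      # graded asymmetric CF
    if distance_m < 0.80:
        return "WARNING"     # hip yaw CF for turning
    return "CLEAR"           # assumed label: no obstacle signal


zones = [obstacle_zone(d) for d in (0.05, 0.2, 0.5, 1.0)]
```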

Trigeminal Brake: A hardwired reflex that reduces CPG amplitude near obstacles. Without this, the CPG at 90% overpowers any cerebellar correction. The brake creates space for learning.

What This Means

This is the first time in MH-FLOCKE’s history that the SNN has produced non-zero motor corrections that actually changed the robot’s behavior. The CPG handoff — from 90% to 55% — means the spiking neural network is taking over motor control from the innate rhythm generator.

The graded DCN fix alone improves every training run, not just obstacle scenes. Any user cloning the repo now gets a cerebellum that actually works.

Try It

git clone https://github.com/MarcHesse/mhflocke
cd mhflocke

# Obstacle avoidance (Freenove, 20k steps)
python scripts/train_v032.py --creature-name freenove \
  --scene "walk toward wall" --steps 20000 \
  --no-terrain --no-sensory --no-vision --hardware-sensors \
  --auto-reset 500 --fresh

# Normal walking (Go2, flat)
python scripts/train_v032.py --creature-name go2 \
  --scene "walk on flat meadow" --steps 50000 --no-terrain

What’s Next

The 50k run with episodic wall training, the same experiment on the Go2 (4,500 neurons), and eventually on the real Freenove hardware with the HC-SR04 sensor. The wall is just the beginning — the architecture now supports any binary sensory signal → cerebellar correction loop.

Named after Flocke — my dog who never needed 9 bug fixes to avoid a wall.

Ball Contact — What 4 Changes Made It Work

Dev Log #1 · March 22, 2026 · MH-FLOCKE Level 15 v0.4.x

For weeks, the dog walked beautifully but ignored the ball completely. It would stroll past it, around it, occasionally bump into it by accident — but never pursue it. The spiking neural network was learning to walk. It just had no reason to care about a red sphere sitting on the grass.

Then, in a single 100k-step training run, everything changed. The Go2 quadruped turned toward the ball, approached it deliberately, and made contact — 294 frames of sustained ball interaction, with a minimum distance of 0.8 centimeters.

No hardcoded “go to ball” command. Four architectural changes made a biologically grounded system do something that PPO with dense rewards still struggles with.

Here’s what happened.

The Problem: Walking Without Purpose

MH-FLOCKE’s brain runs a 15-step cognitive cycle every simulation tick. Spiking neurons fire. The cerebellum predicts motor outcomes. Central pattern generators produce rhythmic gaits. Neuromodulators shift between exploration and exploitation.

But all of this was happening in a closed loop. The SNN received sensory input that included ball distance and angle — the information was there. The network just had no gradient to follow. Ball distance was one of 80+ input dimensions, buried in proprioceptive noise. The R-STDP learning rule couldn’t distinguish “getting closer to the ball” from random fluctuation.

The system needed a way to feel that the ball matters.

Change 1: Task-Specific Prediction Error

Instead of using a generic reward signal, I introduced a task-specific prediction error (TPE) that directly encodes “how far am I from where I should be”:

TPE = (ball_dist - 3.0) / 3.0

When the dog is 3 meters from the ball, TPE is 0 — neutral. Closer than 3 meters, TPE goes negative — the world is better than expected. Further away, TPE grows positive — something is wrong.
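
As a worked example of the formula (the function and parameter names are chosen for illustration):

```python
def task_prediction_error(ball_dist_m, expected_dist_m=3.0):
    """TPE = (ball_dist - 3.0) / 3.0, with the 3.0 m reference made explicit."""
    return (ball_dist_m - expected_dist_m) / expected_dist_m


at_reference = task_prediction_error(3.0)   # neutral
closer = task_prediction_error(1.5)         # negative: better than expected
further = task_prediction_error(6.0)        # positive: something is wrong
```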

This is not a reward. It’s a prediction error in the Free Energy sense: the system expects to be near the ball (because that’s where interesting things happen), and any deviation from that expectation creates a signal to act.

The critical difference from a generic dense reward: TPE doesn’t tell the dog what to do. It tells the dog how surprised it should be.

Change 2: Vision Boost

The TPE signal alone wasn’t enough. The SNN has 80+ input neurons, and the ball-related inputs (distance, angle) were getting drowned out by proprioceptive signals — joint angles, velocities, IMU readings. The network couldn’t hear the ball over the noise of its own body.

The fix: when TPE exceeds a threshold (0.05), the last 16 input neurons — the ones carrying sensory/environmental information — get amplified by TPE × 0.5. Higher prediction error means louder sensory input.
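
A sketch of the boost, under one reading of "amplified by TPE × 0.5": the sensory inputs are multiplied by 1 + 0.5 × TPE. That interpretation, the function name, and the plain-list input format are assumptions; the threshold, the factor 0.5, and the 16 trailing inputs are from the text.

```python
def boost_sensory_inputs(inputs, tpe, threshold=0.05, gain=0.5, n_sensory=16):
    """Amplify the last n_sensory inputs when TPE is above the threshold."""
    if tpe <= threshold:
        return list(inputs)
    factor = 1.0 + gain * tpe        # assumed form of the amplification
    head = list(inputs[:-n_sensory])
    tail = [x * factor for x in inputs[-n_sensory:]]
    return head + tail


inputs = [1.0] * 80
boosted = boost_sensory_inputs(inputs, tpe=0.4)
unchanged = boost_sensory_inputs(inputs, tpe=0.01)
```

The proprioceptive inputs at the front of the vector stay untouched, so only the ball-related channels get louder.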

This mirrors how biological attention works: when something is unexpected, sensory cortex activity increases. The salience of the stimulus goes up proportional to how wrong your predictions are.

The effect was immediate. The SNN started responding to ball distance changes within the first 10k steps.

Change 3: R-STDP Sign Fix

This was the most embarrassing bug. The R-STDP learning rule combines reward and prediction error:

combined = 0.1 × reward + 0.9 × (−PE)

The minus sign on PE is critical. When the dog approaches the ball, PE decreases (less surprise), so −PE increases. Approaching therefore raises the combined signal, which reinforces the synapses that were active during that movement.

The original code had the sign flipped. Approaching the ball was punishing the very synapses that caused the approach. The SNN was literally learning to avoid the ball.
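
The effect of the flipped sign is easy to check numerically. The sketch assumes PE takes the TPE values from Change 1 (0.0 at 3 meters, −0.5 at 1.5 meters) and that reward is zero; the function names are hypothetical.

```python
def combined_signal(reward, pe):
    """Correct rule: falling prediction error increases the learning signal."""
    return 0.1 * reward + 0.9 * (-pe)


def buggy_combined_signal(reward, pe):
    """Original bug: the sign on PE was flipped."""
    return 0.1 * reward + 0.9 * pe


pe_far, pe_near = 0.0, -0.5          # PE at 3.0 m and at 1.5 m from the ball
fixed_gain = combined_signal(0.0, pe_near) - combined_signal(0.0, pe_far)
buggy_gain = buggy_combined_signal(0.0, pe_near) - buggy_combined_signal(0.0, pe_far)
```

With the correct sign, moving from 3 meters to 1.5 meters raises the signal by 0.45. With the bug, the same approach lowers it by 0.45.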

One minus sign. Weeks of debugging.

Change 4: Ball Curriculum

Even with correct gradients, dropping a ball 3 meters away at a random angle is too hard for a system that just learned to walk. The solution: a 5-stage curriculum.

Stage 1 starts the ball at 1.5 meters, directly ahead (0° angle). The dog barely has to turn — just walk forward. When ball_dist_min drops below 0.5 meters, the curriculum advances.

Each stage increases distance and angle: (1.5m, 0°) → (2.0m, 17°) → (2.5m, 23°) → (2.7m, 28°) → (3.0m, 34°).
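
The stage table and the advance rule can be sketched as follows. The stage values are the ones listed above; the class name and the assumption that the 0.5 meter threshold applies to every stage (it is only stated for Stage 1) are illustrative.

```python
# (distance in meters, angle in degrees) for each curriculum stage
STAGES = [(1.5, 0), (2.0, 17), (2.5, 23), (2.7, 28), (3.0, 34)]


class BallCurriculum:
    """Hypothetical curriculum: advance when the dog gets close enough."""

    def __init__(self, advance_below_m=0.5):
        self.stage = 0
        self.advance_below_m = advance_below_m

    def placement(self):
        """Ball placement for the current stage."""
        return STAGES[self.stage]

    def report(self, ball_dist_min_m):
        """Record an episode's minimum ball distance; return the next placement."""
        last_stage = len(STAGES) - 1
        if ball_dist_min_m < self.advance_below_m and self.stage < last_stage:
            self.stage += 1
        return self.placement()


curriculum = BallCurriculum()
first = curriculum.placement()
after_miss = curriculum.report(0.8)   # not close enough, stay on stage 1
after_hit = curriculum.report(0.3)    # close enough, move to stage 2
```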

In the 100k-step run, the dog advanced through two stages. It mastered straight-ahead approach, then learned to turn slightly before approaching. The curriculum let the SNN build on what it already knew.

Results

The numbers from the run:

0.8 cm minimum ball distance — the dog essentially touched it
294 contact frames — sustained interaction, not a single bump
0 falls in 100k steps — stable locomotion throughout
47 ball contact episodes across 5 curriculum stages
CPG at 40% — the dog was trotting, not sprinting

The 10-seed ablation study confirmed this wasn’t a fluke. Configuration B (SNN + Cerebellum) outperforms the PPO baseline by 3.5× on ball approach metrics, with significantly lower variance.

What This Means

This is not a robot dog playing fetch. It’s a proof of concept for something deeper: a biologically grounded system that develops goal-directed behavior through prediction error minimization, not through reward engineering.

The dog doesn’t get a treat for touching the ball. It touches the ball because touching the ball reduces prediction error. The ball is interesting because the system expects it to be interesting — and the Free Energy framework turns that expectation into action.

Four changes. One minus sign. A robot dog that learned to care about a ball.

MH-FLOCKE is an independent research project by Marc Hesse in Potsdam, Germany. The system runs on a Unitree Go2 quadruped in MuJoCo simulation, using spiking neural networks, a cerebellar forward model, and central pattern generators.

Watch the full run: YouTube Video #3 · Read the paper: aiXiv