My Robot Dog Couldn’t Walk Straight — 8 Bugs and a New Steering System

The Freenove robot dog has a problem. It drifts to the right. Every unit is different — servo tolerances, weight distribution, surface friction — and no amount of calibration fixes it permanently. The drift changes with battery level, temperature, and floor surface.

For weeks, I tried to fix this with Z-offset steering — shifting all four feet sideways to create a turning moment. It seemed logical. It was useless. A 45-second hardware test proved it: ±5mm of Z-offset produces less than 5 degrees of effect against 70 degrees of mechanical drift. One measurement killed weeks of assumptions.

Tank Steering

The replacement is asymmetric stride: differential hip amplitude between left and right legs. When the left legs take a longer step, the dog curves right, like a tank. Biology does this through the reticulospinal tract, which modulates stride length independently per side.

Hardware test: drift reduced from 70 degrees to 8.5 degrees. Three times more effective than Z-offset, and it works on any surface because the IMU provides closed-loop feedback.

The PID controller reads the actual heading from the MPU6050 IMU, compares it to the target heading from the camera (where the light is), and drives the stride asymmetry from the difference. No calibration, no per-robot tuning. The I-term accumulates heading error over time to eliminate steady-state offset, loosely analogous to how the cerebellum trims persistent errors through long-term depression.
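A minimal sketch of that loop, assuming a textbook PID and a yaw convention where positive means "to the right". The class and function names, the saturation limit, and the way the output is split into left and right hip amplitudes are illustrative assumptions, not the MH-FLOCKE implementation; the default gains are the hardware-tuned values reported later in this post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass
class HeadingPID:
    kp: float = 0.05
    ki: float = 0.01
    kd: float = 0.015
    limit: float = 0.5                  # assumed saturation of the asymmetry command
    integral: float = 0.0
    prev_error: Optional[float] = None  # None until the first step (avoids a derivative kick)

    def step(self, target_deg: float, heading_deg: float, dt: float) -> float:
        """Return a stride asymmetry in [-limit, +limit]; positive steers right (assumed)."""
        error = wrap_deg(target_deg - heading_deg)
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, u))


def hip_amplitudes(base: float, asymmetry: float) -> Tuple[float, float]:
    """Split a base hip amplitude into (left, right).

    Positive asymmetry lengthens the left stride, which curves the dog to the right.
    """
    return base * (1.0 + asymmetry), base * (1.0 - asymmetry)


pid = HeadingPID()
asym = pid.step(target_deg=2.0, heading_deg=0.0, dt=0.1)
left, right = hip_amplitudes(1.0, asym)
print(asym, left, right)
```

The clamp on the output is where a saturating controller would show up: once the heading error is large enough, the asymmetry sits at the limit and the dog can only turn at a fixed maximum rate.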

Eight Bugs in One Session

The steering replacement exposed eight bugs that had been hiding in the system. Among them:

Dropped steering signal: computed correctly, then silently dropped in compute_tendon(). Wrong code path, one-line fix.
Misleading log: the log showed a proxy value instead of the actual steering. The controller had been working the whole time, but the display said zero.
Starved competence gate: the gate required walking speed above 0.03 m/s, but with drift, all locomotion energy went to correction instead of forward progress. The baby never grew up.
Inverted yaw sign: the MuJoCo yaw convention is inverted versus hardware. One minus sign.
Frozen target: a threshold prevented target updates when the dog was roughly aimed at the light.
Zero-initialized target: the PID controller initialized its target heading to zero instead of the current heading.

Each one of these individually prevented the system from working. Finding them required switching between simulation and real hardware, comparing signs and values, and measuring instead of guessing.

The Dog Approaches the Light

After fixing all eight bugs and tuning the PID on hardware (Kp=0.05, Ki=0.01, Kd=0.015), the Freenove robot approaches a light source from 0.52 meters to 0.17 meters in 60 seconds. Not perfect tracking — the steering saturates near the end — but genuine, IMU-corrected, drift-compensated navigation on a 100-euro robot kit.

In simulation with measured hardware drift injected: 50,000 steps, zero falls, three light targets found, actor competence 1.0.

Meta-Learning Loop

This release also introduces the complete autonomous meta-learning loop — four modules that form a closed self-improvement cycle:

EpisodeAnalyzer compares successful versus unsuccessful navigation events and identifies what makes the dog successful. Which context variables (gait quality, heading error, velocity, steering offset) correlate with finding the light?

StrategyAdapter converts those insights into parameter adjustments — modifying run/tumble duration, PID gains, and exploration bias.

CuriosityExplorer uses world model prediction error to drive exploration. High prediction error means unfamiliar territory — explore more. Low prediction error means familiar ground — exploit what works.

HypothesisGenerator turns those insights into motor hypotheses that the existing Directed Learning module can test autonomously.
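To make the CuriosityExplorer idea concrete, here is a hypothetical sketch of turning world-model prediction error into an exploration probability. Nothing here is taken from the MH-FLOCKE source: the class name, the running-average smoothing, and the squashing function are all assumptions chosen to show the principle.

```python
from typing import Sequence


class CuriositySketch:
    """Toy mapping from prediction error to an exploration probability (hypothetical)."""

    def __init__(self, alpha: float = 0.1, scale: float = 1.0) -> None:
        self.alpha = alpha      # smoothing factor of the running error average
        self.scale = scale      # error level at which exploration probability is 0.5
        self.avg_error = 0.0

    def update(self, predicted: Sequence[float], observed: Sequence[float]) -> float:
        """Feed one prediction/observation pair and return the exploration probability."""
        mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
        self.avg_error += self.alpha * (mse - self.avg_error)
        return self.explore_prob()

    def explore_prob(self) -> float:
        """High average error gives a value near 1 (explore); low error gives a value near 0 (exploit)."""
        return self.avg_error / (self.avg_error + self.scale)


curiosity = CuriositySketch()
familiar = curiosity.update([0.0, 0.0], [0.0, 0.0])    # world model was right
unfamiliar = curiosity.update([0.0, 0.0], [2.0, 2.0])  # world model was wrong
print(familiar, unfamiliar)
```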

The loop runs but has not generated insights yet — the dog is too successful in the current scenario (100% success rate, no failures to learn from). Harder scenarios and longer runs will activate it.

Hardware Drift Simulation

The drift profile injected into the simulator has been updated. The previous measurement (-0.4 deg/s) was taken during a stationary test. Under walking load, the actual drift is 1.5 to 2.0 deg/s — servo asymmetry amplifies under dynamic conditions. The updated profile makes simulation training more realistic.
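A hedged sketch of what injecting such a profile into a simulated heading could look like. The function names are hypothetical, the rate is assumed to be drawn once per episode, and the sign of the walking drift is an assumption because the text only gives its magnitude.

```python
import random

STATIONARY_RATE_DEG_S = -0.4            # earlier stationary measurement
WALKING_RATE_RANGE_DEG_S = (1.5, 2.0)   # measured under walking load (sign assumed)


def sample_drift_rate(rng: random.Random, walking: bool) -> float:
    """Pick a yaw drift rate in deg/s for one episode."""
    if walking:
        low, high = WALKING_RATE_RANGE_DEG_S
        return rng.uniform(low, high)
    return STATIONARY_RATE_DEG_S


def apply_drift(yaw_deg: float, rate_deg_s: float, dt: float) -> float:
    """Advance the simulated yaw by the drift accumulated over one time step."""
    return yaw_deg + rate_deg_s * dt


rng = random.Random(0)
rate = sample_drift_rate(rng, walking=True)
yaw = 0.0
for _ in range(100):                    # 100 steps of 10 ms = 1 second of walking
    yaw = apply_drift(yaw, rate, dt=0.01)
print(rate, yaw)
```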

Every Freenove unit has different drift characteristics. MH-FLOCKE handles this automatically through the PID controller. No manual calibration needed.

What Comes Next

Hardware video with the new PID steering. Longer simulation runs to activate the meta-learning loop. A potential third robot platform (Petoi Bittle X V2) to prove the architecture is body-agnostic.

The code is on GitHub (Apache 2.0). Updated documentation at mhflocke.com/docs.

v0.5.1 — Full changelog

v0.4.3: The Wall — When a Spiking Neural Network Learns to Stop

A dog doesn’t need to crash into a wall twice. The first bump tells the whiskers something is wrong, the brainstem slams the brakes, and the cerebellum remembers: next time, slow down earlier.

Today’s update teaches the Freenove robot dog the same lesson — using the same biological architecture.

The Experiment

The setup is simple: a small quadruped robot, 232 spiking neurons, a wall 80cm ahead. No pre-programming, no path planner — just a single binary signal: “hitting the wall is bad.” The question: can a biologically grounded spiking neural network learn active behavior change from a clear binary signal?

The answer, after 9 bug fixes and 8 training runs: yes.

What I Found (and What Broke Along the Way)

The first seven runs produced corrections of exactly 0.0000. The cerebellum was learning — PF→PkC weights were growing — but the corrections never reached the motors. Three fundamental architecture bugs were hiding in the pipeline:

Bug 1: The DCN was deaf. Deep Cerebellar Nuclei compute motor corrections as the difference between “push” and “pull” populations. But the DCN was reading Purkinje cell spike activity as an exponential moving average, which was essentially zero because the model’s PkC rarely fire discrete spikes. The fix: read the graded compartment state (apical voltage + dendritic calcium), not just spikes. This is arguably closer to biology, where PkC→DCN synapses show graded GABAergic release proportional to membrane potential.

Bug 2: Symmetric climbing fibers. When the robot hit the wall, the Inferior Olive sent identical error signals to both push and pull Purkinje cells. Identical calcium → identical DCN inhibition → push minus pull = zero → corrections = zero. Always. The fix: asymmetric CF — push PkC get strong CF (0.9), pull PkC get weak CF (0.1). Biology does the same thing: when an animal hits an obstacle, the correction is to reduce forward drive and increase braking.
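Why symmetric climbing fibers cancel out can be seen in a few lines. This is a toy model, not the MH-FLOCKE code: it assumes DCN activity is a baseline minus Purkinje inhibition, and that the inhibition is simply proportional to climbing-fiber strength. The function names and the baseline of 1.0 are hypothetical; only the 0.9 and 0.1 values come from the text.

```python
def dcn_activity(cf_strength: float, baseline: float = 1.0) -> float:
    """DCN output after Purkinje inhibition; inhibition assumed proportional to CF strength."""
    return baseline - cf_strength


def motor_correction(cf_push: float, cf_pull: float) -> float:
    """Correction = push DCN minus pull DCN, as described in the text."""
    return dcn_activity(cf_push) - dcn_activity(cf_pull)


symmetric = motor_correction(cf_push=0.5, cf_pull=0.5)   # identical error to both sides
asymmetric = motor_correction(cf_push=0.9, cf_pull=0.1)  # strong CF on push, weak on pull
print(symmetric)    # 0.0: the two sides cancel exactly
print(asymmetric)   # about -0.8: net reduction of forward drive
```

In this toy model a negative correction means the push side is suppressed more than the pull side, which matches the "reduce forward drive, increase braking" reading of the fix.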

Bug 3: Weighted blending killed corrections. The CPG-SNN blend was cpg × weight + correction × (1 - weight). With CPG at 90%, corrections were multiplied by 0.1 — a 10× attenuation. The fix: additive blending — cpg × weight + correction. The cerebellum modulates the CPG via the reticulospinal tract; it doesn’t compete with it.
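The two blending formulas can be compared directly. The function names are illustrative, and the example values for the CPG output and the correction are made up to show the 10× attenuation; the formulas themselves are the ones given above.

```python
def weighted_blend(cpg: float, correction: float, weight: float) -> float:
    """Old behavior: the correction competes with the CPG for a share of the output."""
    return cpg * weight + correction * (1.0 - weight)


def additive_blend(cpg: float, correction: float, weight: float) -> float:
    """New behavior: the correction is added on top of the weighted CPG output."""
    return cpg * weight + correction


cpg, correction, weight = 1.0, 0.01, 0.9

old = weighted_blend(cpg, correction, weight)
new = additive_blend(cpg, correction, weight)

# Contribution of the correction alone, after removing the CPG part:
print(old - cpg * weight)   # about 0.001 (attenuated to one tenth)
print(new - cpg * weight)   # about 0.01  (passed through unchanged)
```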

The Breakthrough: Run 8

After fixing all three bugs, Run 8 showed something I’d never seen before:

Corrections alive: 0.001–0.012 (was 0.0000 in all previous runs)
6 wall collisions in 20,000 steps (episodic learning working)
Actor competence: 0.000 → 0.299 (first non-zero ever)
CPG weight: 90% → 55% (first handoff ever)

The robot walks toward the wall. The ultrasonic sensor (Channel 18) fires. DA drops from 0.22 to 0.05. The obstacle climbing fiber activates asymmetrically. The cerebellum learns. After each collision, the robot is reset to the start — like a puppy’s owner picking it up after it bumps into furniture.

The Architecture

The obstacle avoidance system adds three new components to MH-FLOCKE’s biological stack:

Ultrasonic Sensor (Channel 18): Simulated HC-SR04 rangefinder in MuJoCo, real HC-SR04 on the Raspberry Pi. Same encoding in both: nonlinear proximity mapping (√ function for urgency). The sensor channel is identical between simulation and hardware, enabling direct brain transfer.

Obstacle Climbing Fiber: Three zones with asymmetric error signals — COLLISION (<10cm, strong CF on push PkC), DANGER (<30cm, graded asymmetric CF), WARNING (<80cm, hip yaw CF for turning).

Trigeminal Brake: A hardwired reflex that reduces CPG amplitude near obstacles. Without this, the CPG at 90% overpowers any cerebellar correction. The brake creates space for learning.
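The sensor encoding and the three zones could be sketched as follows. The zone thresholds are the ones listed above; the exact form of the square-root mapping, the 80 cm normalization range, and all names are assumptions, since the text only says that a √ function is used for urgency.

```python
import math

MAX_RANGE_CM = 80.0  # assumed normalization range, matching the WARNING boundary


def proximity(distance_cm: float) -> float:
    """Map a distance to a 0..1 urgency value using a square-root curve (assumed form)."""
    clamped = min(max(distance_cm, 0.0), MAX_RANGE_CM)
    return math.sqrt(1.0 - clamped / MAX_RANGE_CM)


def zone(distance_cm: float) -> str:
    """Classify a distance into the three obstacle zones from the text."""
    if distance_cm < 10.0:
        return "COLLISION"
    if distance_cm < 30.0:
        return "DANGER"
    if distance_cm < 80.0:
        return "WARNING"
    return "CLEAR"


for d in (5.0, 20.0, 60.0, 120.0):
    print(d, zone(d), round(proximity(d), 3))
```

With this particular curve an obstacle at 60 cm already produces an urgency of 0.5, where a linear mapping would give 0.25, so distant obstacles register earlier.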

What This Means

This is the first time in MH-FLOCKE’s history that the SNN has produced non-zero motor corrections that actually changed the robot’s behavior. The CPG handoff — from 90% to 55% — means the spiking neural network is taking over motor control from the innate rhythm generator.

The graded DCN fix alone improves every training run, not just obstacle scenes. Any user cloning the repo now gets a cerebellum that actually works.

Try It

git clone https://github.com/MarcHesse/mhflocke
cd mhflocke

# Obstacle avoidance (Freenove, 20k steps)
python scripts/train_v032.py --creature-name freenove \
  --scene "walk toward wall" --steps 20000 \
  --no-terrain --no-sensory --no-vision --hardware-sensors \
  --auto-reset 500 --fresh

# Normal walking (Go2, flat)
python scripts/train_v032.py --creature-name go2 \
  --scene "walk on flat meadow" --steps 50000 --no-terrain

What’s Next

The 50k run with episodic wall training, the same experiment on the Go2 (4,624 neurons), and eventually on the real Freenove hardware with the HC-SR04 sensor. The wall is just the beginning: the architecture now supports any binary sensory signal → cerebellar correction loop.

Named after Flocke — my dog who never needed 9 bug fixes to avoid a wall.

Seeing the Brain Think: Population-Aware SNN Visualization

The Brain3D visualization in MH-FLOCKE’s rendered videos now shows the actual cerebellar architecture — not an abstract network graph, but the real populations with their correct sizes and live spike activity from training data.

Six Populations, One Brain

The cerebellar SNN in MH-FLOCKE is organized into six biologically inspired populations, each with a distinct computational role:

Mossy Fibers (MF) — sensory input from proprioception, CPG phase, and IMU
Granule Cells (GrC) — sparse expansion layer, the largest population
Golgi Cells (GoC) — inhibitory feedback, regulating granule cell activity
Purkinje Cells (PkC) — the main learning substrate, driven by climbing fiber error
Deep Cerebellar Nuclei (DCN) — motor correction output
Output (OUT) — final motor commands to actuators

The Brain3D visualization now renders each population at its correct size. For the Freenove Robot Dog: 48 MF, 106 GrC, 18 GoC, 24 PkC, 24 DCN, 12 OUT = 232 total. For the Unitree Go2: 304 MF, 4,000 GrC, 200 GoC, 24 PkC, 24 DCN, 72 OUT = 4,624 total.

Data-Driven, Not Decorative

Every aspect of the visualization is driven by actual training data stored in the FLOG (training log). The population sizes come from the FLOG metadata — written by the training script from the live SNN topology. Spike activity in the rendered video comes from the FLOG’s spike data recorded at each training step.

The FLOG metadata now includes a population_sizes dictionary that captures the exact neuron count per population. Both the Freenove and Go2 renderers read this data and pass it to the Brain3D overlay.
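As an illustration, a population_sizes dictionary with the counts quoted above might look like this. The key names and the helper functions are assumptions; the actual FLOG layout may differ.

```python
from typing import Dict

# Neuron counts per population as quoted in this post; key names are assumed.
FREENOVE_SIZES: Dict[str, int] = {
    "MF": 48, "GrC": 106, "GoC": 18, "PkC": 24, "DCN": 24, "OUT": 12,
}
GO2_SIZES: Dict[str, int] = {
    "MF": 304, "GrC": 4000, "GoC": 200, "PkC": 24, "DCN": 24, "OUT": 72,
}


def total_neurons(sizes: Dict[str, int]) -> int:
    """Total neuron count across all populations."""
    return sum(sizes.values())


def layout_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Share of each population, e.g. for sizing its region in an overlay."""
    total = total_neurons(sizes)
    return {name: count / total for name, count in sizes.items()}


print(total_neurons(FREENOVE_SIZES))  # 232
print(total_neurons(GO2_SIZES))       # 4624
print(layout_fractions(FREENOVE_SIZES))
```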

Scalable Architecture

The same cerebellar architecture scales from 232 neurons (Freenove, Raspberry Pi) to 4,624 neurons (Go2, desktop GPU). The topology.py module computes population sizes proportionally: for small networks, the granule cell layer shrinks but the architecture is preserved. Like a mouse cerebellum versus an elephant cerebellum — same cell types, same connectivity, different scale.

This scaling is what makes sim-to-real transfer possible: the Freenove brain is structurally identical to the Go2 brain, just smaller. A brain trained on one platform could theoretically transfer to the other with topology adaptation.

The source code and rendering pipeline are available on GitHub.

From Simulation to Walking Robot: MH-FLOCKE on Real Hardware

MH-FLOCKE now runs on a real robot. A Freenove Robot Dog Kit (~100€) with a Raspberry Pi 4 executes the same spiking neural network that learns to walk in the MuJoCo simulator — same code, same weights, same cerebellar architecture.

One Codebase, Two Platforms

The key design decision: the Pi imports src/brain/ directly. There is no separate hardware implementation, no NumPy approximation, no simplified model. The SNNController and CerebellarLearning classes run identically on both platforms. A brain file (brain.pt) trained in simulation loads on the Pi without conversion.

This was made possible by topology.py, a new module that computes cerebellar population sizes without any MuJoCo dependency. Both the simulator’s MuJoCoCreatureBuilder and the Pi’s freenove_bridge.py call the same function.

What Runs on the Pi

The Freenove profile uses 232 neurons — scaled down from the Go2’s 4,624 but with the same cerebellar architecture:

48 mossy fiber inputs (12 servo + 2 CPG + 4 IMU + padding)
106 granule cells (expansion layer)
18 Golgi cells (inhibitory feedback)
24 Purkinje cells (2 per actuator)
24 DCN neurons (motor correction output)
12 output neurons (one per servo)

At 34ms per step (29Hz), the control loop runs fast enough for stable walking. The cerebellar climbing fiber responds to real IMU orientation errors from the MPU6050, and PF→PkC weights grow from 0.078 to 0.114 over a typical session — the cerebellum is learning on real hardware.

The Live Dashboard

A web dashboard on port 8080 shows what the SNN is doing in real time: all six cerebellar populations with live spike activity, servo angles, the competence gate balance, and neuromodulation levels. Every data point comes directly from the running PyTorch SNN.

Try It Yourself

The complete deployment guide, source code, and servo configuration are on GitHub. The Freenove FNK0050 kit costs about 100€ and requires a Raspberry Pi 4 with 2GB+ RAM. PyTorch runs CPU-only on the Pi — no GPU needed.

Watch the demo video →

Ball Contact — What 4 Changes Made It Work

Dev Log #1 · March 22, 2026 · MH-FLOCKE Level 15 v0.4.x

For weeks, the dog walked beautifully but ignored the ball completely. It would stroll past it, around it, occasionally bump into it by accident — but never pursue it. The spiking neural network was learning to walk. It just had no reason to care about a red sphere sitting on the grass.

Then, in a single 100k-step training run, everything changed. The Go2 quadruped turned toward the ball, approached it deliberately, and made contact — 294 frames of sustained ball interaction, with a minimum distance of 0.8 centimeters.

No hardcoded “go to ball” command. Four architectural changes made a biologically grounded system do something that PPO with dense rewards still struggles with.

Here’s what happened.

The Problem: Walking Without Purpose

MH-FLOCKE’s brain runs a 15-step cognitive cycle every simulation tick. Spiking neurons fire. The cerebellum predicts motor outcomes. Central pattern generators produce rhythmic gaits. Neuromodulators shift between exploration and exploitation.

But all of this was happening in a closed loop. The SNN received sensory input that included ball distance and angle — the information was there. The network just had no gradient to follow. Ball distance was one of 80+ input dimensions, buried in proprioceptive noise. The R-STDP learning rule couldn’t distinguish “getting closer to the ball” from random fluctuation.

The system needed a way to feel that the ball matters.

Change 1: Task-Specific Prediction Error

Instead of using a generic reward signal, I introduced a task-specific prediction error (TPE) that directly encodes “how far am I from where I should be”:

TPE = (ball_dist - 3.0) / 3.0

When the dog is 3 meters from the ball, TPE is 0 — neutral. Closer than 3 meters, TPE goes negative — the world is better than expected. Further away, TPE grows positive — something is wrong.

This is not a reward. It’s a prediction error in the Free Energy sense: the system expects to be near the ball (because that’s where interesting things happen), and any deviation from that expectation creates a signal to act.

The critical difference from a generic dense reward: TPE doesn’t tell the dog what to do. It tells the dog how surprised it should be.
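The TPE formula is small enough to check by hand. The function name and the `expected` parameter are illustrative; the formula and the 3.0 m reference distance are the ones given above.

```python
def task_prediction_error(ball_dist: float, expected: float = 3.0) -> float:
    """TPE = (ball_dist - expected) / expected, with distances in meters."""
    return (ball_dist - expected) / expected


print(task_prediction_error(3.0))   # 0.0   neutral
print(task_prediction_error(1.5))   # -0.5  closer than expected
print(task_prediction_error(4.5))   # 0.5   further than expected
```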

Change 2: Vision Boost

The TPE signal alone wasn’t enough. The SNN has 80+ input neurons, and the ball-related inputs (distance, angle) were getting drowned out by proprioceptive signals — joint angles, velocities, IMU readings. The network couldn’t hear the ball over the noise of its own body.

The fix: when TPE exceeds a threshold (0.05), the last 16 input neurons — the ones carrying sensory/environmental information — get amplified by TPE × 0.5. Higher prediction error means louder sensory input.

This mirrors how biological attention works: when something is unexpected, sensory cortex activity increases. The salience of the stimulus goes up proportional to how wrong your predictions are.

The effect was immediate. The SNN started responding to ball distance changes within the first 10k steps.
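One way to read the vision boost is as a gain applied to the tail of the input vector. The sketch below makes two assumptions that the text does not settle: "amplified by TPE × 0.5" is interpreted as multiplying by 1 + 0.5 × TPE, and the threshold is applied to the signed TPE rather than its magnitude. The function and parameter names are hypothetical.

```python
from typing import List


def vision_boost(inputs: List[float], tpe: float, threshold: float = 0.05,
                 gain: float = 0.5, n_sensory: int = 16) -> List[float]:
    """Amplify the last n_sensory inputs when TPE exceeds the threshold.

    Assumed interpretation: boosted value = value * (1 + gain * tpe).
    """
    if tpe <= threshold:
        return list(inputs)
    split = len(inputs) - n_sensory
    factor = 1.0 + gain * tpe
    return inputs[:split] + [x * factor for x in inputs[split:]]


inputs = [1.0] * 80
boosted = vision_boost(inputs, tpe=0.5)
print(boosted[63], boosted[64])   # 1.0 1.25: only the last 16 channels change
```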

Change 3: R-STDP Sign Fix

This was the most embarrassing bug. The R-STDP learning rule combines reward and prediction error:

combined = 0.1 × reward + 0.9 × (−PE)

The minus sign on PE is critical. When the dog approaches the ball, PE decreases (less surprise), so −PE increases. The combined signal therefore rises during an approach, which reinforces the synapses that were active during that movement.

The original code had the sign flipped. Approaching the ball was punishing the very synapses that caused the approach. The SNN was literally learning to avoid the ball.
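The effect of the flipped sign is easy to reproduce. The sketch assumes that PE here is the TPE from Change 1 and uses a reward of zero to isolate the PE term; the function names are hypothetical.

```python
def combined_signal(reward: float, pe: float) -> float:
    """Corrected rule: combined = 0.1 * reward + 0.9 * (-PE)."""
    return 0.1 * reward + 0.9 * (-pe)


def combined_signal_flipped(reward: float, pe: float) -> float:
    """The buggy variant with the sign on PE flipped."""
    return 0.1 * reward + 0.9 * pe


pe_far = 0.0     # TPE at 3.0 m
pe_near = -0.5   # TPE at 1.5 m, after approaching

print(combined_signal(0.0, pe_far), combined_signal(0.0, pe_near))
# corrected: the signal rises from 0 to about 0.45 as the dog approaches

print(combined_signal_flipped(0.0, pe_far), combined_signal_flipped(0.0, pe_near))
# flipped: the signal falls from 0 to about -0.45, punishing the approach
```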

One minus sign. Weeks of debugging.

Change 4: Ball Curriculum

Even with correct gradients, dropping a ball 3 meters away at a random angle is too hard for a system that just learned to walk. The solution: a 5-stage curriculum.

Stage 1 starts the ball at 1.5 meters, directly ahead (0° angle). The dog barely has to turn — just walk forward. When ball_dist_min drops below 0.5 meters, the curriculum advances.

Each stage increases distance and angle: (1.5m, 0°) → (2.0m, 17°) → (2.5m, 23°) → (2.7m, 28°) → (3.0m, 34°).

In the 100k-step run, the dog advanced through two stages. It mastered straight-ahead approach, then learned to turn slightly before approaching. The curriculum let the SNN build on what it already knew.
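The curriculum logic described above fits in a small state machine. The stage table and the 0.5 m advance criterion come from the text; the class and method names are illustrative, and how ball_dist_min is measured per episode is left open.

```python
from typing import List, Tuple

# (distance in meters, angle in degrees) per stage, as listed in the text
STAGES: List[Tuple[float, float]] = [
    (1.5, 0.0), (2.0, 17.0), (2.5, 23.0), (2.7, 28.0), (3.0, 34.0),
]
ADVANCE_DIST_M = 0.5


class BallCurriculum:
    """Tracks the current stage and advances when the dog gets close enough."""

    def __init__(self) -> None:
        self.stage = 0

    def current(self) -> Tuple[float, float]:
        """Ball placement (distance, angle) for the current stage."""
        return STAGES[self.stage]

    def report(self, ball_dist_min: float) -> bool:
        """Report the closest approach so far; return True if the stage advanced."""
        if ball_dist_min < ADVANCE_DIST_M and self.stage < len(STAGES) - 1:
            self.stage += 1
            return True
        return False


curriculum = BallCurriculum()
print(curriculum.current())     # (1.5, 0.0)
print(curriculum.report(0.6))   # False: not close enough yet
print(curriculum.report(0.4))   # True: advance to stage 2
print(curriculum.current())     # (2.0, 17.0)
```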

Results

The numbers from the run:

0.8 cm minimum ball distance — the dog essentially touched it
294 contact frames — sustained interaction, not a single bump
0 falls in 100k steps — stable locomotion throughout
47 ball contact episodes across 5 curriculum stages
CPG at 40% — the dog was trotting, not sprinting

The 10-seed ablation study confirmed this wasn’t a fluke. Configuration B (SNN + Cerebellum) outperforms the PPO baseline by 3.5× on ball approach metrics, with significantly lower variance.

What This Means

This is not a robot dog playing fetch. It’s a proof of concept for something deeper: a biologically grounded system that develops goal-directed behavior through prediction error minimization, not through reward engineering.

The dog doesn’t get a treat for touching the ball. It touches the ball because touching the ball reduces prediction error. The ball is interesting because the system expects it to be interesting — and the Free Energy framework turns that expectation into action.

Four changes. One minus sign. A robot dog that learned to care about a ball.

MH-FLOCKE is an independent research project by Marc Hesse in Potsdam, Germany. The system runs on a Unitree Go2 quadruped in MuJoCo simulation, using spiking neural networks, a cerebellar forward model, and central pattern generators.

Watch the full run: YouTube Video #3 · Read the paper: aiXiv