Development Roadmap

The journey from research paper to production-ready tool, and what's coming next.

Phase 1: Research Implementation

Completed

January 2026

  • ✅ Studied Google Research paper
  • ✅ Implemented core pyramid algorithm
  • ✅ Initial performance validation
  • ✅ Proof of concept on M3 hardware

Phase 2: 2-Model System

Completed

February 2026

  • ✅ Built traditional 2-model speculative decoding
  • ✅ Achieved a 1.5x baseline speedup
  • ✅ Integrated with OpenClaw
  • ✅ Memory optimization

Phase 3: 3-Model Pyramid

Completed

March 2026

  • ✅ Added intermediate verification tier
  • ✅ Achieved 1.97x speedup
  • ✅ Smart KV cache sharing
  • ✅ Graceful fallback system

Phase 4: Production Ready

v1.0.0 Released

March 19, 2026

  • ✅ Prometheus metrics integration
  • ✅ Grafana dashboards
  • ✅ Comprehensive error handling
  • ✅ Documentation and tutorials
  • ✅ Public release

Q2 2026: Extended Model Support

In Progress

April - June 2026

  • 🔄 Support for Llama, Mistral, and Qwen models
  • 🔄 Custom model configuration
  • 🔄 Automatic tier selection
  • ⏳ Cross-model family cascades

Q3 2026: Performance Optimizations

Planned

July - September 2026

  • ⏳ Neural Engine acceleration
  • ⏳ Dynamic batch sizing
  • ⏳ 4-model pyramid experiments
  • ⏳ Target: 3x speedup

Q4 2026: Enterprise Features

Planned

October - December 2026

  • ⏳ Multi-GPU support
  • ⏳ Distributed inference
  • ⏳ Enterprise monitoring
  • ⏳ SLA guarantees

Status Legend

Phase status: Completed, In Progress, Planned
Item markers: ✅ Done, 🔄 In Progress, ⏳ Planned

Shape the Future

Have ideas for momo-kibidango? We'd love to hear them! Join the conversation in GitHub Discussions or share your use cases in our Discord community.