Development Roadmap

The journey from research paper to production-ready tool, and what's coming next.

Phase 1: Research Implementation

Completed

January 2026

✅ Studied Google Research paper
✅ Implemented core pyramid algorithm
✅ Initial performance validation
✅ Proof of concept on M3 hardware

Phase 2: 2-Model System

Completed

February 2026

✅ Built traditional 2-model speculative decoding
✅ Achieved a 1.5x baseline speedup
✅ Integrated with OpenClaw
✅ Memory optimization

Phase 3: 3-Model Pyramid

Completed

March 2026

✅ Added intermediate verification tier
✅ Achieved 1.97x speedup
✅ Smart KV cache sharing
✅ Graceful fallback system

Phase 4: Production Ready

v1.0.0 Released

March 19, 2026

✅ Prometheus metrics integration
✅ Grafana dashboards
✅ Comprehensive error handling
✅ Documentation and tutorials
✅ Public release

Q2 2026: Extended Model Support

In Progress

April–June 2026

🔄 Support for Llama, Mistral, and Qwen models
🔄 Custom model configuration
🔄 Automatic tier selection
⏳ Cross-model family cascades

Q3 2026: Performance Optimizations

Planned

July–September 2026

⏳ Neural Engine acceleration
⏳ Dynamic batch sizing
⏳ 4-model pyramid experiments
⏳ Target: 3x speedup

Q4 2026: Enterprise Features

Planned

October–December 2026

⏳ Multi-GPU support
⏳ Distributed inference
⏳ Enterprise monitoring
⏳ SLA guarantees

Status Legend

✅ Completed

🔄 In Progress

⏳ Planned

Shape the Future

Have ideas for momo-kibidango? We'd love to hear them! Start a thread in GitHub Discussions or share your use cases in our Discord community.