
🧠 LLM Runner Router

A cutting-edge, fully stack-agnostic neural orchestration system that intelligently adapts to ANY model format, ANY runtime environment, and ANY deployment scenario.

🔮 Universal Format Support

Seamlessly load and manage GGUF, ONNX, Safetensors, HuggingFace models, and future formats with zero configuration.
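
As a sketch of how this looks in practice, using the router.load API from the quick start below (the huggingface: and local: prefixes appear there verbatim; loading ONNX and Safetensors files through the same local: prefix is an assumption for illustration):

import LLMRouter from 'llm-runner-router';

const router = new LLMRouter(); // zero-argument construction assumed; see the quick start for options

// The format is inferred from the source specifier -- no per-format configuration
await router.load('huggingface:meta-llama/Llama-2-7b');  // HuggingFace Hub model
await router.load('local:./models/mistral-7b.gguf');     // GGUF (as in the quick start)
await router.load('local:./models/classifier.onnx');     // ONNX (assumed: same local: prefix)
await router.load('local:./models/weights.safetensors'); // Safetensors (assumed)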

⚡ Multi-Engine Architecture

Harness the power of WebGPU for maximum performance, WASM for universal compatibility, and Node.js for server deployments.
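
The engine is chosen to match the host environment. That selection happens inside the router, but conceptually it works along these lines (an illustrative sketch, not the library's actual code):

// Illustrative only: pick the best engine the current environment supports
function pickEngine() {
  // Browser with WebGPU available: fastest path
  if (typeof navigator !== 'undefined' && navigator.gpu) return 'webgpu';
  // Node.js process: server-side deployment
  if (typeof process !== 'undefined' && process.versions?.node) return 'node';
  // Everywhere else: WASM as the universal fallback
  return 'wasm';
}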

🧭 Intelligent Routing

Advanced algorithms automatically select the optimal model based on quality metrics, cost constraints, speed requirements, or custom strategies.

🚀 Real-Time Streaming

Experience lightning-fast token generation with async generators and real-time streaming capabilities for responsive AI applications.

💰 Cost Optimization

Built-in cost analysis and optimization algorithms ensure maximum efficiency while staying within budget constraints.
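
A hypothetical sketch of what budget-aware configuration could look like; the maxCostPerRequest option and the 'cost-optimized' strategy identifier are assumptions, with only the constructor shape taken from the quick start below:

import LLMRouter from 'llm-runner-router';

const router = new LLMRouter({
  strategy: 'cost-optimized',  // assumed identifier; see Routing Strategies below
  maxCostPerRequest: 0.002,    // hypothetical option: per-call cost ceiling in USD
  fallbacks: ['local-llama']   // cheap local fallback, as in the quick start
});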

🔧 Developer-First API

Intuitive APIs designed for both simple quick-start scenarios and complex enterprise-grade implementations.

Quick Start

Get started with LLM Runner Router in just a few lines of code

// Simple Mode - For Rapid Prototyping
import { quick } from 'llm-runner-router';

const response = await quick("Explain quantum computing to a goldfish");
console.log(response.text);

// Advanced Mode - For Production Systems
import LLMRouter from 'llm-runner-router';

const router = new LLMRouter({
  strategy: 'quality-first',
  fallbacks: ['gpt-3.5', 'local-llama'],
  cacheEnabled: true
});

// Load multiple models
await router.load('huggingface:meta-llama/Llama-2-7b');
await router.load('local:./models/mistral-7b.gguf');

// Intelligent routing in action (with streaming enabled, the result is
// an async iterable of tokens rather than a finished response)
const stream = await router.advanced({
  prompt: "Write a haiku about JavaScript",
  temperature: 0.8,
  maxTokens: 50,
  streaming: true
});

// Stream tokens in real-time
for await (const token of stream) {
  process.stdout.write(token);
}

Performance

  • <500ms model load time
  • <100ms to first token
  • 100+ tokens/sec
  • <50% memory usage

Routing Strategies

Intelligent model selection algorithms designed for every use case (a configuration sketch follows the list)

  • 🏆 Quality First: Prioritizes the highest quality outputs with advanced scoring algorithms
  • 💵 Cost Optimized: Balances performance with cost-effectiveness for budget-conscious deployments
  • ⚡ Speed Priority: Optimizes for minimum latency and maximum throughput
  • ⚖️ Balanced: Intelligently weighs quality, cost, and speed for optimal overall performance
  • 🔄 Round Robin: Distributes requests evenly across available models for load balancing
  • 🎯 Least Loaded: Routes to the least busy model instance for optimal resource utilization
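
Switching strategies is a one-line configuration change. In the sketch below, 'quality-first' appears verbatim in the quick start above; the other identifiers are inferred from the strategy names and should be checked against the documentation:

import LLMRouter from 'llm-runner-router';

const qualityRouter = new LLMRouter({ strategy: 'quality-first' });   // verbatim from the quick start
const costRouter    = new LLMRouter({ strategy: 'cost-optimized' });  // inferred identifier
const balanced      = new LLMRouter({ strategy: 'balanced' });        // inferred identifier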

📚 Documentation & Resources

Everything you need to master LLM Runner Router

  • 📖 Complete Documentation: Comprehensive guides, API reference, and tutorials for all skill levels
  • 💻 Code Examples: Ready-to-use examples for common use cases and advanced implementations
  • 🚀 Quick Start Guide: Get up and running in 5 minutes with our step-by-step installation guide
  • 🔧 Integration Guides: Platform-specific guides for React, Node.js, Docker, and more
  • 🎯 Performance Tuning: Optimization strategies, benchmarking tools, and performance best practices
  • 🤝 Community & Support: Join our community, get help, and contribute to the project's development