
🧠 LLM Runner Router

A cutting-edge, fully stack-agnostic neural orchestration system that intelligently adapts to ANY model format, ANY runtime environment, and ANY deployment scenario.

🔮 Universal Format Support

Seamlessly load and manage GGUF, ONNX, Safetensors, HuggingFace models, and future formats with zero configuration.
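
As a sketch of how this looks in practice, using the router.load API from the quick start below (the huggingface: and local: prefixes appear there verbatim; loading ONNX and Safetensors files through the same local: prefix is an assumption for illustration):

import LLMRouter from 'llm-runner-router';

const router = new LLMRouter(); // zero-argument construction assumed; see the quick start for options

// The format is inferred from the source specifier -- no per-format configuration
await router.load('huggingface:meta-llama/Llama-2-7b');  // HuggingFace Hub model
await router.load('local:./models/mistral-7b.gguf');     // GGUF (as in the quick start)
await router.load('local:./models/classifier.onnx');     // ONNX (assumed: same local: prefix)
await router.load('local:./models/weights.safetensors'); // Safetensors (assumed)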

⚡ Multi-Engine Architecture

Harness the power of WebGPU for maximum performance, WASM for universal compatibility, and Node.js for server deployments.
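
The engine is chosen to match the host environment. That selection happens inside the router, but conceptually it works along these lines (an illustrative sketch, not the library's actual code):

// Illustrative only: pick the best engine the current environment supports
function pickEngine() {
  // Browser with WebGPU available: fastest path
  if (typeof navigator !== 'undefined' && navigator.gpu) return 'webgpu';
  // Node.js process: server-side deployment
  if (typeof process !== 'undefined' && process.versions?.node) return 'node';
  // Everywhere else: WASM as the universal fallback
  return 'wasm';
}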

🧭 Intelligent Routing

Advanced algorithms automatically select the optimal model based on quality metrics, cost constraints, speed requirements, or custom strategies.

🚀 Real-Time Streaming

Experience lightning-fast token generation with async generators and real-time streaming capabilities for responsive AI applications.

💰 Cost Optimization

Built-in cost analysis and optimization algorithms ensure maximum efficiency while staying within budget constraints.
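
A hypothetical sketch of what budget-aware configuration could look like; the maxCostPerRequest option and the 'cost-optimized' strategy identifier are assumptions, with only the constructor shape taken from the quick start below:

import LLMRouter from 'llm-runner-router';

const router = new LLMRouter({
  strategy: 'cost-optimized',  // assumed identifier; see Routing Strategies below
  maxCostPerRequest: 0.002,    // hypothetical option: per-call cost ceiling in USD
  fallbacks: ['local-llama']   // cheap local fallback, as in the quick start
});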

🔧 Developer-First API

Intuitive APIs designed for both simple quick-start scenarios and complex enterprise-grade implementations.

Quick Start

Get started with LLM Runner Router in just a few lines of code

// Simple Mode - For Rapid Prototyping
import { quick } from 'llm-runner-router';

const response = await quick("Explain quantum computing to a goldfish");
console.log(response.text);

// Advanced Mode - For Production Systems
import LLMRouter from 'llm-runner-router';

const router = new LLMRouter({
  strategy: 'quality-first',
  fallbacks: ['gpt-3.5', 'local-llama'],
  cacheEnabled: true
});

// Load multiple models
await router.load('huggingface:meta-llama/Llama-2-7b');
await router.load('local:./models/mistral-7b.gguf');

// Intelligent routing in action (with streaming enabled, the result is
// an async iterable of tokens rather than a finished response)
const stream = await router.advanced({
  prompt: "Write a haiku about JavaScript",
  temperature: 0.8,
  maxTokens: 50,
  streaming: true
});

// Stream tokens in real-time
for await (const token of stream) {
  process.stdout.write(token);
}

Performance

  • <500ms model load time
  • <100ms to first token
  • 100+ tokens/sec
  • <50% memory usage

Routing Strategies

Intelligent model selection algorithms designed for every use case (a configuration sketch follows the list)

  • 🏆 Quality First: Prioritizes the highest quality outputs with advanced scoring algorithms
  • 💵 Cost Optimized: Balances performance with cost-effectiveness for budget-conscious deployments
  • ⚡ Speed Priority: Optimizes for minimum latency and maximum throughput
  • ⚖️ Balanced: Intelligently weighs quality, cost, and speed for optimal overall performance
  • 🔄 Round Robin: Distributes requests evenly across available models for load balancing
  • 🎯 Least Loaded: Routes to the least busy model instance for optimal resource utilization
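
Switching strategies is a one-line configuration change. In the sketch below, 'quality-first' appears verbatim in the quick start above; the other identifiers are inferred from the strategy names and should be checked against the documentation:

import LLMRouter from 'llm-runner-router';

const qualityRouter = new LLMRouter({ strategy: 'quality-first' });   // verbatim from the quick start
const costRouter    = new LLMRouter({ strategy: 'cost-optimized' });  // inferred identifier
const balanced      = new LLMRouter({ strategy: 'balanced' });        // inferred identifier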

📚 Documentation & Resources

Everything you need to master LLM Runner Router

  • 📖 Complete Documentation: Comprehensive guides, API reference, and tutorials for all skill levels
  • 💻 Code Examples: Ready-to-use examples for common use cases and advanced implementations
  • 🚀 Quick Start Guide: Get up and running in 5 minutes with our step-by-step installation guide
  • 🔧 Integration Guides: Platform-specific guides for React, Node.js, Docker, and more
  • 🎯 Performance Tuning: Optimization strategies, benchmarking tools, and performance best practices
  • 🤝 Community & Support: Join our community, get help, and contribute to the project's development