Stop Feeding the Lorry One Chapati at a Time: Model Routing and Costs for Small Teams

Small teams do not need one expensive AI model for every job. This practical guide shows how to route tasks to the right model, control cost, and keep quality high without turning the budget into smoke.

Not every AI task deserves a heavyweight model. Some jobs are simple tea-money tasks, some are lorry-load tasks, and some should be parked for a human. Here is how small teams can route work intelligently and stop paying premium prices for basic chores.

A small team learns the truth about AI very quickly: the machine can be brilliant and still be too expensive for nonsense.

That is the real lesson behind model routing.

If you run a small business, a content team, a startup, or even a two-person operations unit with aspirations and a notebook full of “let’s automate this later,” you do not need one giant model to do everything. You need the right model for the right job at the right time, with a sensible rule for when to stop spending money like a person who just found a new credit card and a curious internet connection.

Model routing is the habit of sending each task to the model that fits it best.

Not the biggest model. Not the loudest model. Not the one with the prettiest demo reel.

The best one for the job.

That is how you stop feeding the lorry one chapati at a time.

The idea sounds technical, but the logic is ordinary. A small team already routes work manually all the time. A junior assistant drafts the first reply. A senior person handles the delicate customer issue. The accountant touches the numbers. The designer takes the visual task. The manager handles the one that could become drama if handled badly.

AI should be organized the same way.

A sensible routing setup usually works through four steps.

First, classify the task. Ask what the job actually is. Is it a short rewrite? A customer reply? A summary of a meeting? A research question? A structured extraction from a form? Code? Translation? Brainstorming? Sensitive business content? The more specific the task label, the easier it is to route.

“Help me with my business” is not a task. That is a cry for help with a tote bag.

Second, estimate the risk. Some tasks can tolerate mistakes. Others cannot. A blog headline can be adjusted. A customer refund note, a pricing reply, a private internal memo, or a message that might go out publicly needs much more care. High-risk tasks may need a stronger model, stricter review, or a human in the loop.

Third, estimate the complexity. A simple FAQ answer does not need the same model as a multi-step workflow that reads files, checks rules, and writes a structured output. Nor does a one-line translation need the same horsepower as a long synthesis of several documents. If the task is basically “turn this into cleaner English,” do not summon the heavyweight champion of the cloud.

Fourth, estimate the cost ceiling. This is where many teams become polite victims of their own enthusiasm. They test a model on a few examples, love the result, and then forget that production traffic is not a demo. Every prompt, every retry, every long context window, every unnecessary attachment, and every overstuffed system instruction can push the bill upward.

A routing system protects the team from that slow bleed.

Think of it like transport. If you need to move one notebook across town, you do not hire a truck with seventeen tyres and a horn loud enough to wake ancestral memory. If you need to move ten sacks of maize, a bicycle is not wisdom. Routing is just matching the vehicle to the cargo.

The same is true with models.

For a small team, a good routing policy might look like this:

1. Fast and cheap model for routine work. Use this for summaries, short rewrites, label tagging, simple Q&A, formatting help, and low-risk content cleanup. If the task is repetitive and the expected answer is predictable, start here.

2. Stronger general model for moderate complexity. Use this when the task needs better reasoning, more nuance, longer context handling, or more careful drafting. This is often the default for customer-facing writing, internal analysis, or multi-step planning.

3. Premium model only for the hard cases. Use the heavyweight model when the task is genuinely difficult, customer-sensitive, or high-impact. That might mean a complex workflow, a dense document, a coding problem, or a situation where correctness matters more than saving a few coins.

4. Human fallback for uncertainty. If the model is confused, the data is messy, the request is sensitive, or the outcome could embarrass the team, do not force the machine to improvise. Route to a person.

That last point is not a weakness. It is maturity.

Now, the money side.

AI cost is usually not just the sticker price on the pricing page. It is the total of prompt length, output length, retries, tool calls, context size, and the number of times the team asks the model to redo work because the first version was almost right in the way a broken umbrella is almost weatherproof.
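A rough sketch of that arithmetic in Python may help. Everything named here is an assumption for illustration: the `Price` values are made-up placeholders rather than any vendor's rates, `retry_rate` is taken to mean the average number of extra calls per task, and each retry is assumed to resend the full prompt. Tool calls are left out to keep it small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical price per million tokens (placeholder numbers only)."""
    input_per_million: float
    output_per_million: float


def cost_per_call(price: Price, prompt_tokens: int, output_tokens: int) -> float:
    """Cost of a single model call: prompt tokens plus output tokens."""
    return (
        prompt_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


def cost_per_task(
    price: Price, prompt_tokens: int, output_tokens: int, retry_rate: float
) -> float:
    """Expected cost of one task once retries are counted.

    Assumes every retry resends the whole prompt and produces a full output,
    so the expected number of calls is 1 + retry_rate.
    """
    return cost_per_call(price, prompt_tokens, output_tokens) * (1 + retry_rate)


# Invented example: a 2,000-token prompt, a 500-token answer,
# and one extra call for every four tasks.
example_price = Price(input_per_million=1.0, output_per_million=4.0)
single_call = cost_per_call(example_price, 2_000, 500)
with_retries = cost_per_task(example_price, 2_000, 500, retry_rate=0.25)
print(f"one call: {single_call:.4f}, with retries: {with_retries:.4f}")
```

The point of the sketch is the multiplier, not the numbers: a long prompt is paid for again on every retry, so trimming the prompt and reducing retries both cut the same line of the bill.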

A small team should track at least five cost signals:

average cost per task
average cost per successful task
retry rate
human correction rate
percentage of tasks that actually needed the expensive model

That last one is a favorite because it exposes vanity. Teams often discover that the premium model was truly needed for only a small slice of the work, something like 10 to 20 percent, while the rest could have been handled by a cheaper lane with a good prompt and a clean output format.
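The five signals listed above can be computed from a plain task log. The sketch below is one possible reading, not a standard: the `TaskRecord` fields are hypothetical, "cost per successful task" is taken as total spend divided by the number of successes (so failed attempts still count against the bill), and "retry rate" is taken as the share of tasks that needed more than one attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    """One logged task. All field names are illustrative assumptions."""
    cost: float            # total spend on this task, retries included
    attempts: int          # model calls made; 1 means no retry
    succeeded: bool        # the output was accepted in the end
    human_corrected: bool  # a person had to fix the output
    needed_premium: bool   # a reviewer judged only the premium lane would do


def cost_signals(records: list[TaskRecord]) -> dict[str, float]:
    """Compute the five cost signals over a batch of task records."""
    if not records:
        raise ValueError("need at least one task record")
    total = len(records)
    total_cost = sum(r.cost for r in records)
    successes = sum(r.succeeded for r in records)
    return {
        "avg_cost_per_task": total_cost / total,
        # Spend on failed tasks is still spend, so divide everything by successes.
        "avg_cost_per_successful_task": (
            total_cost / successes if successes else float("inf")
        ),
        "retry_rate": sum(r.attempts > 1 for r in records) / total,
        "human_correction_rate": sum(r.human_corrected for r in records) / total,
        "premium_needed_share": sum(r.needed_premium for r in records) / total,
    }
```

A spreadsheet with the same five columns does the same job. What matters is that someone records whether the premium lane was actually needed, because that column does not fill itself in.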

This is why routing beats habit.

Habit says, “Always use the best model.” Routing says, “Use the best model where it changes the outcome.”

The difference matters because the expensive model is not just a cost line. It is also a temptation. Once a team gets used to power, they start using it everywhere. That is how a quick customer reply turns into a long essay with footnotes and emotional self-awareness.

Nice for a workshop. Bad for a budget.

A practical routing policy can be built with simple rules:

If the job is repetitive, low risk, and short, use the cheap lane.
If the job involves customers, ambiguity, or medium complexity, use the mid lane.
If the job is hard, public, or high stakes, use the premium lane.
If the model is uncertain or the task touches sensitive data, stop and escalate.
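Those four rules are small enough to write down as code. The sketch below is one way to do it, with assumptions flagged: the `Task` flags and `Lane` names are invented for illustration, the order in which the rules are checked (escalation first, then premium, mid, cheap) is a choice the rules themselves do not spell out, and sending unmatched tasks to the mid lane is a default picked here, not a recommendation from anywhere.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    CHEAP = "cheap"
    MID = "mid"
    PREMIUM = "premium"
    HUMAN = "human"


@dataclass(frozen=True)
class Task:
    """Hypothetical yes/no labels a team might attach to a job."""
    repetitive: bool = False
    low_risk: bool = False
    short: bool = False
    customer_facing: bool = False
    ambiguous: bool = False
    medium_complexity: bool = False
    hard: bool = False
    public: bool = False
    high_stakes: bool = False
    sensitive_data: bool = False
    model_uncertain: bool = False  # usually known only after a first attempt


def route(task: Task) -> Lane:
    """Apply the four rules, most cautious first (ordering is an assumption)."""
    # Stop and escalate before any model lane is considered.
    if task.model_uncertain or task.sensitive_data:
        return Lane.HUMAN
    if task.hard or task.public or task.high_stakes:
        return Lane.PREMIUM
    if task.customer_facing or task.ambiguous or task.medium_complexity:
        return Lane.MID
    if task.repetitive and task.low_risk and task.short:
        return Lane.CHEAP
    # No rule matched: fall back to the mid lane (an assumed default).
    return Lane.MID


print(route(Task(repetitive=True, low_risk=True, short=True)).value)
print(route(Task(hard=True, sensitive_data=True)).value)
```

Checking the escalation rule first means a hard task that also touches sensitive data goes to a person, not to the premium model. A team that prefers a different priority only has to reorder the `if` statements.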

Some teams also route by data sensitivity. If the prompt includes private business records, customer details, or unreleased strategy, it may need a private setup or a human review. The rule is simple: the more sensitive the input, the smaller the circle that sees it.

Routing also makes comparison easier. If tasks always go through the same rule, the team can measure which model actually earns its keep. The biggest savings usually come from the boring questions: Which prompts are bloated? Which tasks really need the premium lane? Which handoffs are failing because the workflow is unclear?

That is how the bill comes down. Not by shrinking ambition. By cleaning up the path.

If you want a simple starter rule, try this:

Route by three tests: complexity, risk, and cost. If at least two of the three are low, use a cheaper model. If two of the three are high, use a stronger model or human review. If all three are high, do not pretend this is a casual task.

That is a tiny policy, but it can stop a lot of waste.
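For teams that like to see a rule as code, the starter rule fits in a few lines. This is a minimal sketch with assumptions: each test is reduced to a yes/no "is it high?" answer that the team decides for itself, the function name and returned labels are invented, and the all-three-high case is read here as "strongest model plus human review", which is one interpretation of "not a casual task".

```python
def starter_route(complexity_high: bool, risk_high: bool, cost_high: bool) -> str:
    """Count how many of the three tests came back high and pick a lane."""
    highs = sum([complexity_high, risk_high, cost_high])
    if highs == 3:
        # Assumed reading of "not a casual task".
        return "strongest model plus human review"
    if highs == 2:
        return "stronger model or human review"
    # Zero or one high means at least two of the three are low.
    return "cheaper model"


print(starter_route(complexity_high=False, risk_high=True, cost_high=False))
print(starter_route(complexity_high=True, risk_high=True, cost_high=False))
```

With three yes/no tests there are only eight possible combinations, so the whole policy can be checked by hand in a minute.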

Small lab note from the Ni Biashara side: this is the sort of thing a Ni Biashara / Nia planning sheet can handle beautifully. Picture a neat map of which jobs deserve the premium model, which jobs can take the small model, and which jobs should never be left alone with a prompt and confidence.

Practical takeaway: make a one-page routing table for your team this week. List your top ten AI tasks, assign each one a model tier, define a human fallback rule, and track cost per task for two weeks. If the bill still feels mysterious after that, the problem is probably not the AI. It is the lack of a routing policy.

Sources

AWS Docs: Understanding Intelligent Prompt Routing in Amazon Bedrock — https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-routing.html
OpenAI Pricing — https://openai.com/pricing
Anthropic Pricing — https://www.anthropic.com/pricing
Google AI Pricing — https://ai.google.dev/pricing
