A Telegram assistant built on n8n that uses Zep memory to pass only the relevant context to the LLM — cutting token usage ~2× while improving response quality.
August 9, 2025
Telegram
n8n
Zep
LLM
OpenAI
Cost Optimization
RAG
Automation
Problem
Plain “chat history → LLM” bots are wasteful: every turn resends all prior messages, so cumulative token usage grows roughly quadratically over the life of a dialog. As dialogs grow, latency and cost spike — and the model gets distracted by stale context.
Idea
Use Zep as a long-term memory service and let the workflow retrieve only the few chunks that matter for the user’s latest message. n8n orchestrates the flow; Telegram is the interface.

Result: in tests, token consumption dropped by ~2× with equal or better answers.
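For orientation, here is a minimal sketch of the retrieval step in plain JavaScript (in the workflow it is an HTTP Request node). The endpoint path, the limit parameter, and the results shape are assumptions modeled on Zep’s v1 session-search API and on what the Code node below consumes; verify them against the docs for your Zep version:

// Hedged sketch: fetch the top-K chunks relevant to the latest message.
// ZEP_URL and the endpoint/field names are assumptions; verify them
// against your Zep deployment's API reference.
async function searchMemory(sessionId, userMessage, k = 3) {
  const res = await fetch(
    `${process.env.ZEP_URL}/api/v1/sessions/${sessionId}/search?limit=${k}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: userMessage }) // query = latest message only
    }
  );
  return (await res.json()).results || []; // these hits feed the Code node
}

An n8n Code node then turns those hits into a compact prompt: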
// n8n Code node: assemble a compact prompt from Zep's search hits.
// Each hit is expected to expose its matched text under `document`
// (adjust the field name to your Zep response shape).
const hits = items[0].json.results || [];

// Trim each chunk to 300 chars so a handful of hits stays cheap.
const context = hits.map(h => `• ${h.document.substring(0, 300)}`).join("\n");

return [{
  json: {
    prompt: [
      "You are a concise assistant. Use CONTEXT only if relevant.",
      "CONTEXT:\n" + context,
      "USER:\n" + $json.text
    ].join("\n\n")
  }
}];
The remaining nodes (sketched below):
OpenAI (or a generic LLM) — chat completion with a small max_tokens cap.
Telegram — sends the reply text back to the user.
HTTP Request → Zep — saves the assistant message so future searches can retrieve it.
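A combined sketch of the two API calls behind those nodes, again in plain JavaScript. The OpenAI endpoint is the standard chat-completions API; the Zep path and payload are assumptions based on its v1 session-memory API, so verify them against your deployment. The model name and max_tokens value are illustrative:

// Hedged sketch: call the LLM with a bounded reply size, then persist
// the assistant message to Zep so the next turn can retrieve it.
async function answerAndPersist(sessionId, prompt) {
  const llmRes = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",                          // illustrative model choice
      messages: [{ role: "user", content: prompt }], // prompt built by the Code node
      max_tokens: 300                                // small cap keeps replies cheap
    })
  });
  const reply = (await llmRes.json()).choices[0].message.content;

  // Assumed Zep v1 endpoint and payload; check your Zep version's docs.
  await fetch(`${process.env.ZEP_URL}/api/v1/sessions/${sessionId}/memory`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "assistant", content: reply }] })
  });

  return reply; // handed to the Telegram node for delivery
}

In n8n these are two separate nodes; the function just makes the data flow between them explicit.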
Prompting pattern
SYSTEM: Be concise. If CONTEXT is empty or irrelevant, answer from general knowledge.
CONTEXT (top-K from Zep):
{{ context_bullets }}
USER:
{{ user_message }}
This keeps the prompt short and on-topic, avoiding mega-histories.
Takeaway: Memory-aware retrieval with Zep lets the bot send only what matters to the LLM, delivering faster answers and roughly halving token spend on real chats.