n8n + Zep — AI-Powered Telegram Bot Assistant

A Telegram assistant built on n8n that uses Zep memory to pass only the relevant context to the LLM — cutting token usage ~2× while improving response quality.
August 9, 2025
Telegram
n8n
Zep
LLM
OpenAI
Cost Optimization
RAG
Automation
Plain “chat history → LLM” bots are wasteful: every turn resends the entire prior conversation. As dialogs grow, latency and cost spike, and the model gets distracted by stale context.

The fix: use Zep as a long-term memory service and let the workflow retrieve only the few chunks relevant to the user’s latest message. n8n orchestrates the flow; Telegram is the interface. Result: in tests, token consumption was roughly halved with equal or better answers.
Telegram → n8n Webhook (Telegram Trigger)
       → Zep: Add message to memory (per chat/user)
       → Zep: Search relevant memory (semantic + metadata filters)
       → Compose Prompt = System + User + Top-K Zep Results
       → LLM (OpenAI/any) → Telegram Reply
       → Zep: Store assistant reply (for future context)
  • Memory scope: one Zep collection per bot; documents keyed by chat_id.
  • Relevance: hybrid search (semantic + keyword), K=5 with score cutoff.
  • Privacy: only minimal metadata (chat_id, timestamps, tags) is stored alongside vectors.

  1. Telegram Trigger — receives messages and basic metadata.
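    For reference, the trigger emits Telegram’s standard Update object; the fields the later steps rely on are message.chat.id and message.text (values here are illustrative):
    {
      "update_id": 123456789,
      "message": {
        "message_id": 42,
        "chat": { "id": 111, "type": "private" },
        "date": 1723190400,
        "text": "How do I reset my API key?"
      }
    }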
  2. HTTP Request → Zep /api/v1/messages — appends user message to memory.
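    The exact body depends on your Zep version; a sketch, assuming the endpoint accepts a role/content pair plus the minimal metadata described above:
    {
      "role": "user",
      "content": "{{ $json.message.text }}",
      "metadata": {
        "chat_id": "{{ $json.message.chat.id }}",
        "timestamp": "{{ $now.toISO() }}",
        "tags": ["recent"]
      }
    }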
  3. HTTP Request → Zep /api/v1/search — retrieves top-K relevant chunks (the expressions reference the trigger node by name, since this node’s direct input is the step-2 response, not the Telegram message):
    {
      "query": "{{ $('Telegram Trigger').item.json.message.text }}",
      "user_id": "{{ $('Telegram Trigger').item.json.message.chat.id }}",
      "limit": 5,
      "score_threshold": 0.62,
      "filters": { "tags": ["kb", "recent"] }
    }
    
  4. Function (Code node) — builds a compact prompt:
    // Turn the top-K Zep hits into a short bulleted context block.
    const hits = $input.first().json.results || [];
    const context = hits
      .map(h => `• ${h.document.substring(0, 300)}`) // cap each chunk at 300 chars
      .join("\n");

    // Pull the original user message from the trigger node by name.
    const userText = $('Telegram Trigger').first().json.message.text;

    return [{ json: {
      prompt: [
        "You are a concise assistant. Use CONTEXT only if relevant.",
        "CONTEXT:\n" + context,
        "USER:\n" + userText
      ].join("\n\n")
    }}];
    
  5. OpenAI (or generic LLM) — completion/chat with small max tokens.
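    If you call the model through a plain HTTP Request node rather than the built-in OpenAI node, the body is roughly as follows (model name and limits are illustrative, not prescriptive):
    {
      "model": "gpt-4o-mini",
      "messages": [{ "role": "user", "content": "{{ $json.prompt }}" }],
      "max_tokens": 300,
      "temperature": 0.3
    }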
  6. Telegram — sends the reply text back to the user.
  7. HTTP Request → Zep — saves the assistant message.
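    The save mirrors step 2 with the role swapped; a sketch, assuming the same endpoint shape (the content expression depends on what the previous node outputs):
    {
      "role": "assistant",
      "content": "{{ $json.text }}",
      "metadata": { "chat_id": "{{ $('Telegram Trigger').item.json.message.chat.id }}" }
    }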

SYSTEM: Be concise. If CONTEXT is empty or irrelevant, answer from general knowledge.
CONTEXT (top-K from Zep):
{{ context_bullets }}
USER:
{{ user_message }}
This keeps the prompt short and on-topic, avoiding mega-histories.
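A simple guard in the prompt composer (the token-budget guard listed in the repo notes below) keeps the final prompt under a fixed budget; a sketch, using the rough heuristic of ~4 characters per token:
  // Trim the lowest-ranked context bullets until the prompt fits the budget.
  const MAX_TOKENS = 800;               // illustrative budget
  const budgetChars = MAX_TOKENS * 4;   // ~4 chars/token heuristic

  function fitToBudget(systemText, contextBullets, userText) {
    const bullets = [...contextBullets];
    const assemble = () =>
      [systemText, "CONTEXT:\n" + bullets.join("\n"), "USER:\n" + userText].join("\n\n");
    while (bullets.length > 0 && assemble().length > budgetChars) {
      bullets.pop(); // drop the weakest (last-ranked) hit first
    }
    return assemble();
  }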
  • Env vars: ZEP_API_URL, ZEP_API_KEY, OPENAI_API_KEY.
  • n8n: run via Docker; expose a webhook URL for Telegram.
  • Telegram: set webhook to https://<your-n8n>/webhook/telegram (one-off script after this list).
  • Zep: one collection; enable background summarization to keep vectors tidy.
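Registering the webhook is a one-off call to the Telegram Bot API; a minimal Node 18+ sketch (run as an .mjs script; TELEGRAM_BOT_TOKEN is an assumed env var, and the URL placeholder matches the bullet above):
  // Point Telegram at the n8n webhook endpoint.
  const token = process.env.TELEGRAM_BOT_TOKEN;
  const url = "https://<your-n8n>/webhook/telegram";
  const res = await fetch(
    `https://api.telegram.org/bot${token}/setWebhook?url=${encodeURIComponent(url)}`
  );
  console.log(await res.json()); // expect { "ok": true, "result": true, ... }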

  • n8n workflow (triggers, HTTP nodes, error branches, retries).
  • Zep integration: message upsert, semantic search, metadata strategy.
  • Prompt composer & token-budget guard; metrics for tokens/req & latency.
  • Telegram UX: typing indicators, fallback replies, admin commands.
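The typing indicator, for example, is a single extra Bot API call fired before the LLM step (sendChatAction is standard Telegram Bot API; the helper below is a sketch):
  // Fire-and-forget “typing…” status while the LLM call is in flight.
  async function showTyping(chatId) {
    const token = process.env.TELEGRAM_BOT_TOKEN; // assumed env var
    await fetch(`https://api.telegram.org/bot${token}/sendChatAction`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ chat_id: chatId, action: "typing" }),
    });
  }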
Takeaway: Memory-aware retrieval with Zep lets the bot send only what matters to the LLM, delivering faster answers and roughly halving token spend on real chats.