A Telegram assistant built on n8n that uses Zep memory to pass only the relevant context to the LLM — cutting token usage ~2× while improving response quality.
August 9, 2025
Telegram
n8n
Zep
LLM
OpenAI
Cost Optimization
RAG
Automation
Problem
Plain “chat history → LLM” bots are wasteful: every turn resends all prior messages, so cumulative token usage grows roughly quadratically over the life of a dialog. As dialogs grow, latency and cost spike — and the model gets distracted by stale context.
Idea
Use Zep as a long-term memory service and let the workflow retrieve only the few chunks that matter for the user’s latest message. n8n orchestrates the flow; Telegram is the interface.

Result: in tests, token consumption dropped by ~2× with equal or better answers.
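For orientation, here is a minimal sketch of the retrieval step in plain JavaScript (in the workflow it is an HTTP Request node). The endpoint path, the limit parameter, and the results shape are assumptions modeled on Zep’s v1 session-search API and on what the Code node below consumes; verify them against the docs for your Zep version:

// Hedged sketch: fetch the top-K chunks relevant to the latest message.
// ZEP_URL and the endpoint/field names are assumptions; verify them
// against your Zep deployment's API reference.
async function searchMemory(sessionId, userMessage, k = 3) {
  const res = await fetch(
    `${process.env.ZEP_URL}/api/v1/sessions/${sessionId}/search?limit=${k}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: userMessage }) // query = latest message only
    }
  );
  return (await res.json()).results || []; // these hits feed the Code node
}

An n8n Code node then turns those hits into a compact prompt: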
// n8n Code node: assemble a compact prompt from Zep's search hits.
// Each hit is expected to expose its matched text under `document`
// (adjust the field name to your Zep response shape).
const hits = items[0].json.results || [];

// Trim each chunk to 300 chars so a handful of hits stays cheap.
const context = hits.map(h => `• ${h.document.substring(0, 300)}`).join("\n");

return [{
  json: {
    prompt: [
      "You are a concise assistant. Use CONTEXT only if relevant.",
      "CONTEXT:\n" + context,
      "USER:\n" + $json.text
    ].join("\n\n")
  }
}];
The remaining nodes (sketched below):
OpenAI (or a generic LLM) — chat completion with a small max_tokens cap.
Telegram — sends the reply text back to the user.
HTTP Request → Zep — saves the assistant message so future searches can retrieve it.
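A combined sketch of the two API calls behind those nodes, again in plain JavaScript. The OpenAI endpoint is the standard chat-completions API; the Zep path and payload are assumptions based on its v1 session-memory API, so verify them against your deployment. The model name and max_tokens value are illustrative:

// Hedged sketch: call the LLM with a bounded reply size, then persist
// the assistant message to Zep so the next turn can retrieve it.
async function answerAndPersist(sessionId, prompt) {
  const llmRes = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",                          // illustrative model choice
      messages: [{ role: "user", content: prompt }], // prompt built by the Code node
      max_tokens: 300                                // small cap keeps replies cheap
    })
  });
  const reply = (await llmRes.json()).choices[0].message.content;

  // Assumed Zep v1 endpoint and payload; check your Zep version's docs.
  await fetch(`${process.env.ZEP_URL}/api/v1/sessions/${sessionId}/memory`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "assistant", content: reply }] })
  });

  return reply; // handed to the Telegram node for delivery
}

In n8n these are two separate nodes; the function just makes the data flow between them explicit.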
Prompting pattern
SYSTEM: Be concise. If CONTEXT is empty or irrelevant, answer from general knowledge.
CONTEXT (top-K from Zep):
{{ context_bullets }}
USER:
{{ user_message }}
This keeps the prompt short and on-topic, avoiding mega-histories.
Takeaway: Memory-aware retrieval with Zep lets the bot send only what matters to the LLM, delivering faster answers and roughly halving token spend on real chats.