Building a Personal Agent Platform with Slack, RAG, and Draft PRs

I have been slowly building a personal agent platform. The goal is simple:

Answer questions in Slack using the knowledge that already exists in Slack, Confluence, Github.
Cite the sources so the answer can be trusted.

The shape is intentionally simple. A FastAPI service receives work, a worker processes jobs, Qdrant stores embeddings, Slack is the interface, and Kubernetes runs the whole thing.

Slack mention
  -> FastAPI / Slack Socket Mode
  -> Redis queue
  -> worker
  -> retrieve from Qdrant
  -> LLM with citations
  -> Slack thread reply

Why Build This At All?

Most of the knowledge is just scattered. I need a quick way to find a fly command stored in 4 year old confluence page. The problem is not lack of information, It is retrieval.

Search works when you know the exact keyword. It is weaker when you ask something like:

How do I access the VPN?

The answer may be split across a runbook, a Slack conversation, and a comment that mentions a recent exception. A useful assistant needs to search across those sources, bring back the right documents, and make it obvious where the answer came from.

Current Architecture

The current version has two deployed processes:

API pod: FastAPI endpoints and Slack Socket Mode.
Worker pod: background jobs for Slack replies, backfills, and ingestion.

Both share Redis for queueing and Qdrant for vector search.

The app runs in Kubernetes and is deployed with Argo CD. Secrets come from Vault, and the LLM calls go through an OpenAI-compatible gateway instead of calling provider APIs directly from the app.

That last part matters more than it sounds. If every service calls every model provider directly, credentials and protocols leak all over the system. A gateway gives one place to route models, hide credentials, and switch providers.

Ingestion

The first useful version ingests Slack history.

For each allowed channel, the worker:

Calls Slack history APIs.
Fetches thread replies.
Normalizes messages into text chunks.
Embeds each chunk.
Upserts points into Qdrant.

Each point has payload fields like:

{
  "source": "slack",
  "channel_id": "C123",
  "thread_ts": "1234567890.123",
  "user_id": "U123",
  "url": "https://slack.com/archives/..."
}

Then I added Confluence as a second source. Confluence pages use the same Qdrant collection but different payload fields:

{
  "source": "confluence",
  "space_key": "OPS",
  "page_id": "12345",
  "title": "VPN Setup Guide",
  "url": "https://example.atlassian.net/wiki/..."
}

Using one collection keeps retrieval simple because both sources use the same embedding model and vector size. Source-specific filters still work through payload indexes.

The First Retrieval Bug

The first version searched one global vector pool. That worked fine for Slack-heavy questions, but it failed on questions where the canonical answer lived in Confluence.

The bug looked like this:

User asks: how do I access VPN?

The top Slack messages were related enough to beat the Confluence runbook in raw vector similarity. The model saw Slack chatter, missed the runbook, and answered poorly.

The fix was not just "add a source bias." Bias can only re-rank documents that made it into the candidate pool. If the best Confluence page is ranked 200 globally, a small re-rank bonus does nothing.

The better fix was dual-pool retrieval:

Search the global collection.
Search a Confluence-filtered pool.
Merge candidates.
Re-rank the combined set.
Send the best sources to the model.

In pseudocode:

global_hits = qdrant.search(query_vector, limit=top_k * 3)
confluence_hits = qdrant.search(
    query_vector,
    filter={"source": "confluence"},
    limit=10,
)

hits = merge_and_dedupe(global_hits, confluence_hits)
ranked = rerank(hits)
return ranked[:top_k]

This is a small change, but it made the assistant feel much less random. For internal knowledge systems, source balance matters. Slack is verbose and recent. Confluence is sparse and authoritative. A naive vector search tends to over-reward the verbose source.

Citations Are The Product

Without citations, the assistant gives a confident answer and the user still has to verify it manually. With citations, the assistant becomes a fast path to the underlying source.

The Slack reply format I like is:

To access the VPN, follow the current setup guide [1].
If you need access approved, contact the support alias listed in the runbook [2].

Sources:
[1] VPN Setup Guide
[2] Access Request Process

In Slack, those source labels become real links to Slack permalinks or Confluence pages.

The model is allowed to use a search_kb tool. The worker forces one search on the first turn, then allows follow-up searches for a few iterations. This is better than blindly stuffing search results into the prompt because the model can ask a narrower second question if the first result set is too broad.

model
  -> tool_call search_kb("vpn setup")
  -> tool_result with sources
  -> optional tool_call search_kb("vpn approval access")
  -> final answer with source markers

Conclusion

This solves the first useful part of a personal agent platform. Now I have a bot to explain same information to multiple person.