Migrating off OpenAI Assistants: a multi-tenant retrospective
And what changes when user state has to come home
I’ve built searchmydocs.ai’s retrieval layer twice and I’m about to do it a third time. The third one isn’t my choice. OpenAI is sunsetting the Assistants API on August 26, 2026, and the multi-tenant patterns I built on it don’t survive the move without rework.
searchmydocs.ai is a doc Q&A app where users upload their files and chat with them. First I built a custom RAG pipeline on Supabase. Then I ditched it for the Assistants API, which worked well for a year. Now I’m migrating to Responses + Conversations. This post is about what shipping multi-tenant on Assistants actually cost me, and what changes when the persistent Assistant object goes away.
The first migration: custom RAG to Assistants API
I started with Supabase’s RAG tutorial for markdown. The plan was to implement that and then extend it with PDF support. The stack was straightforward: pgvector (a Postgres extension) as the vector store, gte-small as the embedding model, running locally inside a Supabase Edge Function on Deno. It worked well for small markdown files. As files got larger I started hitting CPU timeouts on the free tier, and large documents would silently fail mid-embedding.
The bigger problem was that most people want to chat with PDFs, not markdown. PDF parsing is its own deep problem and I didn’t want to take on the scope creep. PDF heterogeneity is brutal: resumes look nothing like SEC quarterly reports, which are full of tables, numerical figures, and embedded images. I outsourced parsing to Azure AI Document Intelligence, which takes a PDF as input and returns markdown. By the time that was wired up, what started as a weekend RAG tutorial had quietly turned into a multi-service retrieval pipeline I was now responsible for maintaining.
After all of that, I didn't want to keep maintaining a RAG pipeline for what was supposed to be a document Q&A app. I needed a solution that handled the problem for me, and that's when I learned about the Assistants API. The model was clean: upload Files, create a Vector Store that handles chunking and embedding for you, and attach it to an Assistant configured with a system prompt and the file_search tool. Threads and citations came built in. I traded control over the retrieval layer for not having to babysit it. In early 2025, that looked like a great trade.
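The old setup fits in one payload. Here's a minimal sketch of the per-user Assistant configuration as the Assistants API expected it; the helper name and model choice are mine, not part of the original setup, and the dict is what you'd pass to client.beta.assistants.create:

```python
def build_assistant_payload(user_prompt: str, vector_store_id: str) -> dict:
    """Kwargs for client.beta.assistants.create(**payload): a per-tenant
    Assistant with a system prompt, the file_search tool, and the user's
    vector store attached via tool_resources."""
    return {
        "model": "gpt-4o",  # illustrative model choice
        "instructions": user_prompt,
        "tools": [{"type": "file_search"}],
        "tool_resources": {
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
    }
```

Note that the prompt and the vector store attachment both live on the Assistant object here, on OpenAI's side; that detail is what makes the migration later in this post expensive.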
A year of not thinking about retrieval
Assistants were easy to use. Uploading and chunking were reliable. The file_search tool worked: I’d ask a question about an uploaded document and the assistant found the right context most of the time. For a default RAG implementation, the Assistants API was the best starting point I’d seen. The retrieval was good enough that I stopped thinking about it as a problem. For about a year, that was the whole story. I shipped features. Users uploaded their files and got answers. The trade I made, control for velocity, was working exactly as I’d hoped.
The deprecation
On August 26, 2025, OpenAI announced the deprecation of the Assistants API in favor of the newer Responses API. The sunset date is exactly one year later: August 26, 2026. My first reaction was “not again.” This wasn’t a bug fix or a small API change. OpenAI was moving to a different mental model entirely: Responses for execution, Conversations for conversation state, and Prompts (managed in the dashboard) for configuration. The retrieval pipeline I had stopped thinking about was now back on my plate. I hadn’t chosen to migrate. I’d been assigned one.
Where state lives
With searchmydocs.ai I used a multi-tenant architecture. Each user could create their own Assistant configured with a custom prompt and attach or detach their own vector stores. In the new model, Prompts live in the dashboard, not per user. There’s no programmatic way to create them per user. The workaround is to store each user’s prompt in my own database and pass it as the instructions parameter on every Responses call. That works, but it means moving state from OpenAI’s side back onto mine. The persistent Assistant object I’d been relying on is gone.
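The workaround looks roughly like this; a sketch assuming the Responses API's file_search tool shape, where vector store IDs ride on the tool definition itself, and with a helper name I made up for illustration:

```python
def build_responses_payload(
    user_prompt: str, vector_store_ids: list[str], question: str
) -> dict:
    """Kwargs for client.responses.create(**payload). The per-user prompt
    now comes from my own database and is passed as `instructions` on
    every call; vector stores are attached per request on the tool."""
    return {
        "model": "gpt-4o",  # illustrative model choice
        "instructions": user_prompt,  # fetched from my DB, not an Assistant
        "input": question,
        "tools": [{
            "type": "file_search",
            "vector_store_ids": vector_store_ids,  # per request, per tenant
        }],
    }
```

The payload is stateless from OpenAI's perspective: nothing about the tenant persists between calls except what I choose to send.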
I will be writing the migration code over the next few months. Most of the work will be locating all the state that used to live in OpenAI's Assistant object and giving it a new home in my database. The persistent Assistant object made it easy to forget that user-specific config (prompts, vector store attachments, conversation context) was living on OpenAI's side. When the abstraction goes away, all of that has to come home.
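Concretely, "coming home" means a per-user record in my database. A hypothetical sketch of the fields involved, with names invented for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserAssistantConfig:
    """Everything the Assistant object used to hold for one tenant,
    now persisted on my side. Field names are hypothetical."""
    user_id: str
    prompt: str  # was Assistant.instructions
    vector_store_ids: list[str] = field(default_factory=list)  # was tool_resources
    conversation_id: str | None = None  # maps to a Conversations object
```

Each Responses call then reads this record and passes `prompt` and `vector_store_ids` along, instead of referencing an Assistant ID.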
One thing did get easier. In the old pattern, you updated the Assistant object whenever you wanted to change vector stores. In the new pattern, you pass vector store IDs per request. Rotation goes from an Assistant update call to a function argument.
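In code, rotation goes from an API round trip (client.beta.assistants.update with new tool_resources) to a pure edit on the next request's payload. A sketch, with the helper name mine:

```python
def rotate_vector_stores(payload: dict, new_ids: list[str]) -> dict:
    """Return a copy of a Responses payload pointing at different vector
    stores. No Assistant update call; the old payload is left untouched."""
    tools = [
        {**tool, "vector_store_ids": new_ids}
        if tool.get("type") == "file_search" else tool
        for tool in payload.get("tools", [])
    ]
    return {**payload, "tools": tools}
```

Because nothing is mutated on OpenAI's side, swapping a user's document set takes effect on the very next request with no update-then-read race to worry about.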

