If you are new to OpenClaw and trying to decide how to organize your repositories, shares, and vector store, start here. The goal is to make the right structural decisions before you index content and before you give agents access.
Step 1: Classify your content by trust boundary
Before choosing a vector store, identify the kinds of content you actually have:
- general family or household files
- operational house and device knowledge
- private legal and financial records
- business and publishing material
- canonical published content
The first design question is not which model to use. It is which agent should be allowed to see which content.
Step 2: Build buckets with one job each
A practical default is:
- Family Share for ordinary non-sensitive family files
- Home_Ops for manuals, appliance docs, and house operations material
- Private_Records for legal, tax, banking, insurance, identity, retirement, and medical records
- HASMaster_Ops for marketing, projects, videos, and working business material
- HASMaster_1000 for canonical and publish-facing content
If a bucket has two jobs, split it now.
Step 3: Organize private records by person first
For sensitive records, the top level should usually be by person, then by category. Requests for private records are usually scoped to one person: people ask for David’s divorce records or Monica’s identity documents, not every tax or insurance file across the whole household at once.
Step 4: Decide which agents get which buckets
Do not mount everything everywhere.
A good default is:
- personal agent: Family Share and Home_Ops
- private-records agent: Private_Records only
- hasmaster agent: HASMaster_Ops and HASMaster_1000
The rule that matters most is simple: if an agent can read private records, it should not publish. If an agent can publish, it should not read private records.
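The agent-to-bucket mapping and the read-versus-publish rule can be written down as data and checked mechanically. The following is a minimal sketch, not an OpenClaw API: the `Agent` dataclass, the `can_publish` flag, and the `violations` helper are hypothetical names introduced for illustration. Only the agent and bucket names come from this guide, and which agent may publish is an assumption.

```python
from dataclasses import dataclass

# Buckets treated as private in this guide.
PRIVATE_BUCKETS = frozenset({"Private_Records"})


@dataclass(frozen=True)
class Agent:
    name: str
    buckets: frozenset          # bucket names the agent may read
    can_publish: bool = False   # hypothetical flag, not an OpenClaw setting


def violations(agents):
    """Return names of agents that can both read private records and publish."""
    return [
        a.name
        for a in agents
        if a.can_publish and a.buckets & PRIVATE_BUCKETS
    ]


AGENTS = [
    Agent("personal", frozenset({"Family Share", "Home_Ops"})),
    Agent("private-records", frozenset({"Private_Records"})),
    # Assumption: only the business agent is allowed to publish.
    Agent("hasmaster", frozenset({"HASMaster_Ops", "HASMaster_1000"}), can_publish=True),
]
```

Calling `violations(AGENTS)` on this layout should return an empty list. Any name it returns is an agent that breaks the rule.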
Step 5: Start with exact search, not vector search
For private records, exact search should come first. Tools like ripgrep, pdfgrep, pdftotext, and Recoll give you reliable document-backed retrieval before you add embeddings.
Vector search is still useful, but it should come second for related passages, similar filings, prior wording, and clustered context across many records.
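The ordering described here, exact matches first and vector results second, can be sketched in a few lines. This is a pure-Python stand-in for the tools named above, assuming the records have already been converted to plain text (for example with pdftotext). The function names and the `vector_search` callable are hypothetical and do not correspond to any particular vector store client.

```python
from pathlib import Path


def exact_search(root, phrase, suffixes=(".txt", ".md")):
    """Return sorted paths of text files under root that contain phrase (case-insensitive)."""
    needle = phrase.lower()
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text.lower():
                hits.append(path)
    return sorted(hits)


def search(root, phrase, vector_search=None):
    """Exact matches come first; vector results are kept separately as related context."""
    exact = exact_search(root, phrase)
    related = []
    if vector_search is not None:
        # Hypothetical callable: takes the query string, returns a list of paths.
        related = [p for p in vector_search(phrase) if p not in exact]
    return {"exact": exact, "related": related}
```

Keeping the two result lists separate lets the caller show document-backed hits as the answer and treat vector hits only as supporting context.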
Step 6: Decide whether you need one vector store or two
Use one vector store if all indexed content belongs to the same trust boundary and the same agents can safely search all of it.
Use two vector stores if you have both private records and business or publishing content, different agents should see different corpora, or you want a real operational boundary instead of one logical boundary.
For most mixed home-plus-business environments, the cleaner answer is one shared business store and one private store.
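One way to make this decision explicit is to tag each indexed bucket with a trust boundary and count the distinct boundaries. The sketch below is illustrative only: the `BOUNDARY` table and the "private" and "business" labels are assumptions, and it covers only the buckets this step discusses.

```python
# Assumption: every indexed bucket is tagged with exactly one trust boundary.
BOUNDARY = {
    "Private_Records": "private",
    "HASMaster_Ops": "business",
    "HASMaster_1000": "business",
}


def stores_needed(indexed_buckets):
    """Return one store label per distinct trust boundary among the indexed buckets."""
    return sorted({BOUNDARY[bucket] for bucket in indexed_buckets})


def store_for(bucket):
    """Return the store a bucket's documents should be written to."""
    return BOUNDARY[bucket]
```

If `stores_needed` returns more than one label for the buckets you plan to index, that is the signal to run separate stores rather than one.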
Step 7: Choose the vector store you can operate cleanly
Weaviate, Qdrant, and LanceDB can all work.
- Use Weaviate if you already run it and want service-based retrieval.
- Use Qdrant if you want a filtering-focused service and you are starting fresh.
- Use LanceDB if you want an embedded, app-local approach.
If you already have Weaviate in the stack, adding a second Weaviate instance for private retrieval is often simpler than replacing the stack.
Step 8: Define ingestion rules before indexing anything
Always exclude filesystem noise such as #recycle, #snapshot, @eaDir, Thumbs.db, desktop.ini, and office lock files.
For every indexed document, keep metadata like source path, bucket, person or household scope, category, and date when known.
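These two rules translate directly into a filter and a metadata record. The following is a minimal sketch under stated assumptions: the function names and metadata keys are hypothetical, the example path layout is illustrative, and the lock-file patterns cover only the `~$` prefix used by Microsoft Office and the `.~lock.` prefix used by LibreOffice.

```python
from pathlib import PurePosixPath

NOISE_DIRS = {"#recycle", "#snapshot", "@eaDir"}
NOISE_FILES = {"thumbs.db", "desktop.ini"}  # compared case-insensitively


def is_noise(path):
    """Return True if the path should be skipped during ingestion."""
    p = PurePosixPath(path)
    # Skip anything that lives inside a noise directory.
    if NOISE_DIRS & set(p.parts):
        return True
    name = p.name
    if name.lower() in NOISE_FILES:
        return True
    # Office lock files, e.g. "~$budget.xlsx" or ".~lock.report.odt#".
    return name.startswith("~$") or name.startswith(".~lock.")


def build_metadata(path, bucket, scope, category, date=None):
    """Build the metadata record stored alongside each indexed document."""
    return {
        "source_path": str(path),
        "bucket": bucket,
        "scope": scope,        # a person's name or "household"
        "category": category,
        "date": date,          # None when the date is not known
    }
```

A typical ingestion loop would call `is_noise` on every candidate path and build metadata only for the paths that pass.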
Step 9: Keep working content separate from canonical content
Especially for business use, keep drafts, planning, marketing source material, and production notes separate from final publish-facing content and stable documentation.
That keeps retrieval cleaner and makes publishing easier to control.
Step 10: Validate before you automate
Before you turn on background ingestion or publishing, confirm that:
- each agent can only see the buckets it should see
- private paths are not mounted into business runtimes
- business agents cannot read private records
- private agents cannot publish
- exact search works before you trust vector retrieval
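The visibility checks in this step can be partly automated by comparing what each agent actually has mounted against what it is allowed to have. This sketch assumes you can produce a dictionary of actual mounts per agent yourself, for example by reading your container or share configuration. `ALLOWED` and `audit_mounts` are hypothetical names, not part of OpenClaw.

```python
# Allowed buckets per agent, following the default layout in this guide.
ALLOWED = {
    "personal": {"Family Share", "Home_Ops"},
    "private-records": {"Private_Records"},
    "hasmaster": {"HASMaster_Ops", "HASMaster_1000"},
}


def audit_mounts(actual):
    """Return sorted (agent, bucket) pairs for buckets mounted beyond what ALLOWED permits."""
    problems = []
    for agent, buckets in actual.items():
        # An agent missing from ALLOWED is allowed nothing.
        extra = set(buckets) - ALLOWED.get(agent, set())
        problems.extend((agent, bucket) for bucket in extra)
    return sorted(problems)
```

An empty result means no agent sees more than it should. It does not replace the manual checks that private agents cannot publish and that exact search works.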
Recommended default
If you want one practical default for a new OpenClaw user, use this:
- Family Share for non-sensitive household files
- Home_Ops for manuals and operational docs
- Private_Records for sensitive person-centered records
- HASMaster_Ops for business working files
- HASMaster_1000 for canonical published business content
- exact search first for private records
- separate private and business vector stores if both trust boundaries exist
That gives you a system that stays understandable when the corpus grows and keeps your OpenClaw agent boundaries clear from the start.