Planning OpenClaw Repositories and Vector Stores

If you are new to OpenClaw and trying to decide how to organize your repositories, shares, and vector store, start here. The goal is to make the right structural decisions before you index content and before you give agents access.

Step 1: Classify your content by trust boundary

Before choosing a vector store, identify the kinds of content you actually have:

  • general family or household files
  • operational house and device knowledge
  • private legal and financial records
  • business and publishing material
  • canonical published content

The first design question is not which model to use. It is which agent should be allowed to see which content.

Step 2: Build buckets with one job each

A practical default is:

  • Family Share for ordinary non-sensitive family files
  • Home_Ops for manuals, appliance docs, and house operations material
  • Private_Records for legal, tax, banking, insurance, identity, retirement, and medical records
  • HASMaster_Ops for marketing, projects, videos, and working business material
  • HASMaster_1000 for canonical and publish-facing content

If a bucket has two jobs, split it now.

Step 3: Organize private records by person first

For sensitive records, the top level should usually be by person, then by category. That works better because requests tend to be person-scoped: people ask for David’s divorce records or Monica’s identity documents, not for every tax or insurance file across the whole household at once.
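As a minimal sketch of the person-first layout, the helper below builds a path in person, then category, order. The function name, the category label, and the file name are hypothetical and used only for illustration.

```python
from pathlib import PurePosixPath


def record_path(person, category, filename, root="Private_Records"):
    """Build a person-first path: root / person / category / filename."""
    return PurePosixPath(root, person, category, filename)


# Hypothetical example: everything for one person lives under one subtree.
example = record_path("David", "divorce", "decree.pdf")
print(example)  # Private_Records/David/divorce/decree.pdf
```

With this layout, a request scoped to one person maps to a single subtree instead of a search across every category folder.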

Step 4: Decide which agents get which buckets

Do not mount everything everywhere.

A good default is:

  • personal agent: Family Share and Home_Ops
  • private-records agent: Private_Records only
  • hasmaster agent: HASMaster_Ops and HASMaster_1000

The rule that matters most is simple: if an agent can read private records, it should not publish. If an agent can publish, it should not read private records.
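The default mapping and the read-or-publish rule can be written down as data and checked mechanically. The sketch below is illustrative only: the AgentPolicy class, the can_publish flag, and the assumption that the hasmaster agent is the one that publishes are not OpenClaw settings.

```python
from dataclasses import dataclass

# Buckets treated as private records in this sketch.
PRIVATE_BUCKETS = frozenset({"Private_Records"})


@dataclass(frozen=True)
class AgentPolicy:
    name: str
    buckets: frozenset
    can_publish: bool = False  # assumption: publishing is a per-agent flag


POLICIES = [
    AgentPolicy("personal", frozenset({"Family Share", "Home_Ops"})),
    AgentPolicy("private-records", frozenset({"Private_Records"})),
    AgentPolicy("hasmaster", frozenset({"HASMaster_Ops", "HASMaster_1000"}),
                can_publish=True),
]


def policy_violations(policies):
    """Return a message for each agent that can both read private records and publish."""
    problems = []
    for policy in policies:
        if policy.can_publish and policy.buckets & PRIVATE_BUCKETS:
            problems.append(f"{policy.name}: can publish and can read private records")
    return problems
```

Running policy_violations(POLICIES) on the default mapping returns an empty list. Adding Private_Records to a publishing agent produces one violation message.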

Step 5: Start with exact search, not vector search

For private records, exact search should come first. Tools like ripgrep, pdfgrep, pdftotext, and Recoll give you reliable document-backed retrieval before you add embeddings.

Vector search is still useful, but it should come second, as a way to find related passages, similar filings, prior wording, and clustered context across many records.
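One way to read "exact first, vector second" is as a lookup order. The sketch below scans plain-text files for a literal phrase and consults a vector search only when the exact pass finds nothing. It is a pure-Python stand-in, not a wrapper around ripgrep or Recoll, and the vector_search callable is a hypothetical placeholder for whatever store you run.

```python
from pathlib import Path


def exact_search(root, phrase, suffixes=(".txt", ".md")):
    """Return (path, line_number, line) for each line containing phrase (case-insensitive)."""
    needle = phrase.casefold()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((str(path), number, line.strip()))
    return hits


def search(root, phrase, vector_search=None):
    """Try exact search first; use the optional vector search only on a miss."""
    hits = exact_search(root, phrase)
    if hits or vector_search is None:
        return {"mode": "exact", "hits": hits}
    # vector_search is a hypothetical callable: phrase -> list of related passages.
    return {"mode": "vector", "hits": vector_search(phrase)}
```

Every exact hit carries a file path and line number, so an answer can always be traced back to a document before any embedding-based result is trusted.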

Step 6: Decide whether you need one vector store or two

Use one vector store if all indexed content belongs to the same trust boundary and the same agents can safely search all of it.

Use two vector stores if any of the following is true: you have both private records and business or publishing content, different agents should see different corpora, or you want a real operational boundary rather than a purely logical one.

For most mixed home-plus-business environments, the cleaner answer is one shared business store and one private store.
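The one-or-two decision reduces to counting distinct trust boundaries among the buckets you plan to index. In the sketch below, the boundary labels, the store names, and the choice of which buckets appear in the mapping are illustrative assumptions.

```python
# Assumed trust-boundary label for each indexed bucket (illustrative only).
BOUNDARY_BY_BUCKET = {
    "Private_Records": "private",
    "HASMaster_Ops": "business",
    "HASMaster_1000": "business",
}


def stores_needed(indexed_buckets, boundary_by_bucket=BOUNDARY_BY_BUCKET):
    """Return one store name per distinct trust boundary among the indexed buckets."""
    boundaries = {boundary_by_bucket[bucket] for bucket in indexed_buckets}
    return sorted(f"{boundary}_store" for boundary in boundaries)


print(stores_needed(["HASMaster_Ops", "HASMaster_1000"]))
# ['business_store']
print(stores_needed(["Private_Records", "HASMaster_Ops", "HASMaster_1000"]))
# ['business_store', 'private_store']
```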

Step 7: Choose the vector store you can operate cleanly

Weaviate, Qdrant, and LanceDB can all work.

  • Use Weaviate if you already run it and want service-based retrieval.
  • Use Qdrant if you want a filtering-focused service and you are starting fresh.
  • Use LanceDB if you want a more embedded, app-local approach.

If you already have Weaviate in the stack, adding a second Weaviate instance for private retrieval is often simpler than replacing the stack.

Step 8: Define ingestion rules before indexing anything

Always exclude filesystem noise such as #recycle, #snapshot, @eaDir, Thumbs.db, desktop.ini, and office lock files.

For every indexed document, keep metadata like source path, bucket, person or household scope, category, and date when known.
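The ingestion rules above can be sketched as a path filter plus a metadata record. The lock-file prefixes are an assumption about what "office lock files" covers (Microsoft Office owner files starting with ~$ and LibreOffice files starting with .~lock.), and the metadata field names are hypothetical.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"#recycle", "#snapshot", "@eaDir"}
EXCLUDED_FILES = {"thumbs.db", "desktop.ini"}  # compared case-insensitively
LOCK_PREFIXES = ("~$", ".~lock.")  # assumed office lock-file prefixes


def should_index(relative_path):
    """Return False for filesystem noise, True for everything else."""
    path = PurePosixPath(relative_path)
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return False
    if path.name.casefold() in EXCLUDED_FILES:
        return False
    return not path.name.startswith(LOCK_PREFIXES)


def build_metadata(relative_path, bucket, scope, category, date=None):
    """Collect the per-document metadata; date stays None when it is not known."""
    return {
        "source_path": relative_path,
        "bucket": bucket,
        "scope": scope,  # a person's name or "household"
        "category": category,
        "date": date,
    }
```

Filtering on every path component, not only the file name, keeps files nested inside #recycle or @eaDir out of the index as well.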

Step 9: Keep working content separate from canonical content

Especially for business use, keep drafts, planning, marketing source material, and production notes separate from final publish-facing content and stable documentation.

That keeps retrieval cleaner and makes publishing easier to control.

Step 10: Validate before you automate

Before you turn on background ingestion or publishing, confirm that:

  • each agent can see only the buckets it should see
  • private paths are not mounted into business runtimes
  • business agents cannot read private records
  • private agents cannot publish
  • exact search works before you trust vector retrieval
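The mount check in this step can be scripted. The sketch below compares a runtime's mounted paths against the roots it is allowed to see and reports anything outside them. The /shares/... paths and the function name are hypothetical; how you obtain the real mount list depends on your runtime.

```python
from pathlib import PurePosixPath


def leaked_mounts(mounts, allowed_roots):
    """Return every mount that is not an allowed root or a path inside one."""
    roots = [PurePosixPath(root) for root in allowed_roots]
    leaked = []
    for mount in mounts:
        path = PurePosixPath(mount)
        inside = any(path == root or root in path.parents for root in roots)
        if not inside:
            leaked.append(mount)
    return leaked


# Hypothetical business runtime that should see only the two business buckets.
business_allowed = ["/shares/HASMaster_Ops", "/shares/HASMaster_1000"]
business_mounts = ["/shares/HASMaster_Ops/videos", "/shares/Private_Records/David"]
print(leaked_mounts(business_mounts, business_allowed))
# ['/shares/Private_Records/David']
```

Comparing whole path components rather than string prefixes means a sibling such as /shares/HASMaster_Ops_backup is not mistaken for an allowed root.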

Recommended default

If you want one practical default for a new OpenClaw user, use this:

  • Family Share for non-sensitive household files
  • Home_Ops for manuals and operational docs
  • Private_Records for sensitive person-centered records
  • HASMaster_Ops for business working files
  • HASMaster_1000 for canonical published business content
  • exact search first for private records
  • separate private and business vector stores if both trust boundaries exist

That gives you a system that stays understandable when the corpus grows and keeps your OpenClaw agent boundaries clear from the start.