Connecting Platforms like Stammer to our GPU via API

When users connect to your GPU-powered LLM via Stammer’s white‑label API, here’s what actually happens with their data and how storage is managed:


1. Where User Data Is Stored

According to Stammer’s own documentation:

  • All chat inputs, conversation history, prompts, and uploaded documents are stored in Stammer's encrypted database, which Stammer controls and hosts (see docs.stammer.ai).

  • Only the contextual snippets needed to generate an LLM response are sent from Stammer to your inference endpoint on your GPU.

  • Those snippets are ephemeral: passed in via the API request, used for response generation, and returned. Your system doesn't automatically persist full transcripts unless it is designed to do so.
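As a rough sketch of what "ephemeral" means here (the field names are hypothetical, not Stammer's documented schema), a single inference request might carry something like:

```python
# Hypothetical shape of one inference request from Stammer to your endpoint.
# Field names are illustrative only, not Stammer's actual API schema.
request_payload = {
    "prompt": "What are your clinic's opening hours?",
    "context_snippets": [
        "Our clinic is open Mon-Fri, 9am-5pm.",
        "Weekend appointments are available on request.",
    ],
    "max_tokens": 256,
}

# Nothing here needs to outlive the request: assemble the model input,
# run inference, return the text, and let the payload go out of scope.
model_input = "\n".join(request_payload["context_snippets"]) + "\n\n" + request_payload["prompt"]
```

Once the response is returned, nothing obliges your server to keep any of this in memory or on disk.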


⚙️ 2. How the Flow Works

  1. User interacts via chat widget powered by Stammer.

  2. Stammer:

    • Stores user message & history.

    • Pulls relevant context (from knowledge-base or embeddings).

    • Sends that context + prompt to your /chat API endpoint.

  3. Your GPU server:

    • Runs LLM inference.

    • Returns the response to Stammer.

  4. Stammer delivers the response back to the user and logs the entire conversation.

You are providing compute power, but Stammer handles full data storage, user state, knowledge base, and logs.
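The steps above can be sketched as a stateless handler. Note that `run_inference` is a placeholder for your actual model call (vLLM, llama.cpp, or similar), not a real API:

```python
def handle_chat(payload: dict) -> dict:
    """Stateless /chat handler sketch: infer, respond, persist nothing.

    The payload shape (prompt + context_snippets) is an assumption for
    illustration, not Stammer's documented request format.
    """
    model_input = "\n".join(payload.get("context_snippets", []))
    model_input += "\n\n" + payload["prompt"]
    answer = run_inference(model_input)  # your GPU inference call
    # Return only the generated text; no transcript is written to disk.
    return {"response": answer}


def run_inference(text: str) -> str:
    # Placeholder so the sketch runs; swap in your real model invocation.
    return f"[model output for {len(text)} input chars]"
```

Because the handler writes nothing, the division of responsibility stays clean: Stammer owns the conversation record, and your server only ever holds a request in flight.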


3. Do You Store Any User Data?

Not by default; nothing is stored unless you choose to:

  • You could optionally log:

    • Chat inputs

    • Response times

    • Errors

  • But storing PHI requires:

    • Your own HIPAA‑compliant encryption at rest

    • Logging in a way that integrates into Stammer’s audit trail

    • Possibly upstreaming logs back to Stammer's database if users need to retrieve them later
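If all you want is operational metrics, one option is to log timings and error info while never writing message content. A minimal stdlib sketch:

```python
import json
import time


def log_request_metadata(log_file, started, ok, error=None):
    """Append operational metadata only: timings and errors.

    Deliberately excludes prompts, context snippets, and responses,
    so no PHI ever reaches this log.
    """
    entry = {
        "ts": started,
        "latency_ms": round((time.time() - started) * 1000, 1),
        "ok": ok,
        "error": error,  # error type/message only, never user input
    }
    log_file.write(json.dumps(entry) + "\n")
```

Keeping the log schema this narrow means the compliance question for these files stays simple: there is nothing sensitive in them to protect.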


4. Configuration Options

  • Default (Recommended):

    • Stammer stores everything, including persistent knowledge bases, user conversations, and auth data.

    • You only receive per-instance inference context, which isn’t stored unless you log it.

    • This gives you simpler compliance responsibilities, since Stammer handles most data.

  • Custom (Advanced):

    • You can implement your own storage on your GPU system for logs, analytics, or PHI—just make sure it’s encrypted, access‑controlled, auditable, etc.

    • You’d also want to share relevant logs back to Stammer so users see full conversation history in their dashboards.
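If you do go the custom route, minimizing what you persist helps. Below is a very rough redaction sketch; regexes alone are nowhere near sufficient for HIPAA compliance (you still need encryption at rest, access control, and audit trails), and this only illustrates scrubbing a few obvious identifiers before a log line is written:

```python
import re

# Illustrative patterns only: masks SSN-like numbers, emails, and
# US-style phone numbers. Real PHI detection needs far more than this.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace recognizable identifiers with tokens before persisting."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Run anything bound for your own storage through a filter like this first, and keep the unfiltered transcript solely on Stammer's side.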


✅ Summary

  • Stammer hosts & stores user data—your endpoint only receives the snippets needed for inference.

  • You don’t need to store PHI locally unless you intentionally log or persist chat inputs.

  • If you do store user data, it becomes your responsibility to secure it in a HIPAA‑compliant way and integrate it with Stammer’s system for full transparency.