Zumik
Framework integrations

LlamaIndex

The Zumik LlamaIndex LLM - a thin OpenAILike configuration pinned to Zumik's /v1 surface - for query engines, chat engines, and agents.

Zumik speaks the OpenAI wire format at https://api.zumik.ai/v1, so the LlamaIndex Zumik LLM is a thin configuration of LlamaIndex's OpenAILike LLM: the base URL is pinned and the key is read from ZUMIK_API_KEY. Use it anywhere a LlamaIndex LLM is accepted - query engines, chat engines, agents.

Install

pip install llama-index-llms-zumik

The package is llama-index-llms-zumik 0.1.0 (Apache-2.0), pinning llama-index-core==0.12.52, llama-index-llms-openai-like==0.4.0, and llama-index-llms-openai==0.4.7.

export ZUMIK_API_KEY="zk_..."
# optional, for staging / self-hosted:
# export ZUMIK_BASE_URL="https://api.zumik.ai/v1"

Basic completion

model is a Zumik alias (code.fast, auto.balanced, reasoning.best, ...) or a concrete provider model; it defaults to auto.balanced. Aliases are chat models, so is_chat_model defaults to True.

from llama_index_llms_zumik import Zumik

llm = Zumik(model="code.fast")          # reads ZUMIK_API_KEY
print(llm.complete("Summarize the CAP theorem in one sentence."))

context_window defaults to a conservative 128,000. Pass the real value if your alias targets a larger window.

Query engine over documents

Set Zumik as Settings.llm and build an index. For embeddings you can run a local model or use Zumik's OpenAI-compatible /v1/embeddings through a LlamaIndex OpenAILike embedding.

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index_llms_zumik import Zumik, DEFAULT_API_BASE
import os

Settings.llm = Zumik(model="code.fast")

# Embeddings over the same Zumik /v1 surface.
Settings.embed_model = OpenAILikeEmbedding(
    model_name="auto.balanced",
    api_base=DEFAULT_API_BASE,
    api_key=os.environ["ZUMIK_API_KEY"],
)

docs = [
    Document(text="Zumik resolves model aliases like code.fast to a pinned provider release."),
    Document(text="QoS classes are interactive, standard, background, and batch."),
    Document(text="Every response reports the resolved release and reuse on its headers."),
]

index = VectorStoreIndex.from_documents(docs)
engine = index.as_query_engine()
print(engine.query("What QoS classes does Zumik expose?"))

The embeddings snippet needs llama-index-embeddings-openai-like, which is not a dependency of this package. Install it separately if you want Zumik-backed embeddings. DEFAULT_API_BASE is exported by the package and equals https://api.zumik.ai/v1.

Documents over a native session

The Zumik LLM rides the OpenAI-compatible /v1 surface, which resends the document context on every query. To reuse a long-lived document prefix instead, pin it to a native /v2 session with the Python SDK: build the prefix once as a bundle of document artifacts, open a session over it, then run queries against that session so the compiled context is reused.

from zumik import ZumikClient   # first-party Zumik SDK
import os

zk = ZumikClient(api_key=os.environ["ZUMIK_API_KEY"])

docs = [
    "Zumik resolves aliases to pinned provider releases.",
    "Reuse is reported on response headers, not guessed.",
]
artifacts = [zk.create_artifact("document", text) for text in docs]
prefix = zk.create_bundle(
    bundle_type="agent_prefix",
    items=[{"artifact_id": a["id"], "role": "context"} for a in artifacts],
)
session = zk.create_session(base_bundle_ids=[prefix["id"]])

# Each query rides the same session, so the document prefix is compiled once and reused.
for q in ("How is reuse reported?", "What does an alias resolve to?"):
    answer = zk.create_native_response(
        model="code.fast",
        input=q,
        session_id=session["id"],
        branch_id=session["default_branch_id"],
    )
    print(answer["output_text"])

See the RAG example for the full retrieval-over-sessions reference and artifacts for the object model.

On this page