LlamaIndex
The Zumik LlamaIndex LLM - a thin OpenAILike configuration pinned to Zumik's /v1 surface - for query engines, chat engines, and agents.
Zumik speaks the OpenAI wire format at https://api.zumik.ai/v1, so the LlamaIndex Zumik LLM is a thin configuration of LlamaIndex's OpenAILike LLM: the base URL is pinned and the key is read from ZUMIK_API_KEY. Use it anywhere a LlamaIndex LLM is accepted - query engines, chat engines, agents.
Install
pip install llama-index-llms-zumikThe package is llama-index-llms-zumik 0.1.0 (Apache-2.0), pinning llama-index-core==0.12.52, llama-index-llms-openai-like==0.4.0, and llama-index-llms-openai==0.4.7.
export ZUMIK_API_KEY="zk_..."
# optional, for staging / self-hosted:
# export ZUMIK_BASE_URL="https://api.zumik.ai/v1"Basic completion
model is a Zumik alias (code.fast, auto.balanced, reasoning.best, ...) or a concrete provider model; it defaults to auto.balanced. Aliases are chat models, so is_chat_model defaults to True.
from llama_index_llms_zumik import Zumik
llm = Zumik(model="code.fast") # reads ZUMIK_API_KEY
print(llm.complete("Summarize the CAP theorem in one sentence."))context_window defaults to a conservative 128,000. Pass the real value if your alias targets a larger window.
Query engine over documents
Set Zumik as Settings.llm and build an index. For embeddings you can run a local model or use Zumik's OpenAI-compatible /v1/embeddings through a LlamaIndex OpenAILike embedding.
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index_llms_zumik import Zumik, DEFAULT_API_BASE
import os
Settings.llm = Zumik(model="code.fast")
# Embeddings over the same Zumik /v1 surface.
Settings.embed_model = OpenAILikeEmbedding(
model_name="auto.balanced",
api_base=DEFAULT_API_BASE,
api_key=os.environ["ZUMIK_API_KEY"],
)
docs = [
Document(text="Zumik resolves model aliases like code.fast to a pinned provider release."),
Document(text="QoS classes are interactive, standard, background, and batch."),
Document(text="Every response reports the resolved release and reuse on its headers."),
]
index = VectorStoreIndex.from_documents(docs)
engine = index.as_query_engine()
print(engine.query("What QoS classes does Zumik expose?"))The embeddings snippet needs llama-index-embeddings-openai-like, which is not a dependency of this package. Install it separately if you want Zumik-backed embeddings. DEFAULT_API_BASE is exported by the package and equals https://api.zumik.ai/v1.
Documents over a native session
The Zumik LLM rides the OpenAI-compatible /v1 surface, which resends the document context on every query. To reuse a long-lived document prefix instead, pin it to a native /v2 session with the Python SDK: build the prefix once as a bundle of document artifacts, open a session over it, then run queries against that session so the compiled context is reused.
from zumik import ZumikClient # first-party Zumik SDK
import os
zk = ZumikClient(api_key=os.environ["ZUMIK_API_KEY"])
docs = [
"Zumik resolves aliases to pinned provider releases.",
"Reuse is reported on response headers, not guessed.",
]
artifacts = [zk.create_artifact("document", text) for text in docs]
prefix = zk.create_bundle(
bundle_type="agent_prefix",
items=[{"artifact_id": a["id"], "role": "context"} for a in artifacts],
)
session = zk.create_session(base_bundle_ids=[prefix["id"]])
# Each query rides the same session, so the document prefix is compiled once and reused.
for q in ("How is reuse reported?", "What does an alias resolve to?"):
answer = zk.create_native_response(
model="code.fast",
input=q,
session_id=session["id"],
branch_id=session["default_branch_id"],
)
print(answer["output_text"])See the RAG example for the full retrieval-over-sessions reference and artifacts for the object model.
LangChain
ChatZumik and ZumikEmbeddings for LangChain (Python and JavaScript) - thin configurations of the LangChain OpenAI wrappers pinned to Zumik's /v1 surface.
Using the OpenAI SDK
Point a stock OpenAI SDK at Zumik's /v1 surface with a one-line base-URL swap, and when to prefer this over a native Zumik SDK.