LangChain
ChatZumik and ZumikEmbeddings for LangChain (Python and JavaScript) - thin configurations of the LangChain OpenAI wrappers pinned to Zumik's /v1 surface.
Zumik speaks the OpenAI wire format at https://api.zumik.ai/v1, so the LangChain integration is a thin configuration of the LangChain OpenAI wrappers - it does not reimplement LangChain. You get ChatZumik and ZumikEmbeddings with the base URL pinned and the key read from ZUMIK_API_KEY. Tool calling, structured output, .stream(), and .batch() all behave exactly as they do against OpenAI.
Install
pip install langchain-zumikThe Python package is langchain-zumik 0.1.0 (pins langchain-openai==0.3.28, langchain-core==0.3.72). The JS package is @zumik/langchain 0.1.0 (depends on @langchain/[email protected], with @langchain/core as a >=0.3.0 <0.4.0 peer). Both are Apache-2.0.
Set your key (never hardcode it):
export ZUMIK_API_KEY="zk_..."
# optional, for staging / self-hosted:
# export ZUMIK_BASE_URL="https://api.zumik.ai/v1"model is a Zumik alias (code.fast, auto.balanced, reasoning.best, ...) or a concrete provider model. Zumik resolves it server-side and reports the resolved release on the response headers. When omitted, both wrappers default to auto.balanced.
Chat
from langchain_zumik import ChatZumik
llm = ChatZumik(model="code.fast", temperature=0.2) # reads ZUMIK_API_KEY
print(llm.invoke("Explain a rolling deploy in two sentences.").content)
# Streaming and batching work exactly as upstream LangChain:
for chunk in llm.stream("List three rollback strategies."):
print(chunk.content, end="", flush=True)Tool calling and structured output are inherited from ChatOpenAI unchanged:
from pydantic import BaseModel
class Severity(BaseModel):
level: str
reason: str
triage = llm.with_structured_output(Severity)
print(triage.invoke("Disk at 100% on the primary DB. How bad?"))import { ChatZumik } from "@zumik/langchain";
const llm = new ChatZumik({ model: "code.fast", temperature: 0.2 });
const res = await llm.invoke("Explain a rolling deploy in two sentences.");
console.log(res.content);
// Streaming works as in upstream LangChain.js:
for await (const chunk of await llm.stream("List three rollback strategies.")) {
process.stdout.write(String(chunk.content));
}Tool calling and structured output are inherited from ChatOpenAI unchanged. Pass { apiKey } to override the env var and { baseUrl } for a staging or self-hosted host.
RAG with Zumik
Run both the embeddings and the generation model through Zumik so a retrieval pipeline lives on one surface. ZumikEmbeddings pairs with ChatZumik.
from langchain_zumik import ChatZumik, ZumikEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
docs = [
"Zumik resolves model aliases like code.fast to a pinned provider release.",
"Every response reports the resolved release and reuse on its headers.",
"QoS classes are interactive, standard, background, and batch.",
]
emb = ZumikEmbeddings(model="auto.balanced")
store = FAISS.from_texts(docs, emb)
retriever = store.as_retriever(search_kwargs={"k": 2})
prompt = ChatPromptTemplate.from_template(
"Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {question}"
)
llm = ChatZumik(model="code.fast")
chain = (
{"context": retriever | (lambda d: "\n".join(x.page_content for x in d)),
"question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
print(chain.invoke("What does Zumik report on response headers?"))langchain-community and faiss-cpu are only needed for the vector store in this snippet; they are not dependencies of langchain-zumik itself. ZumikEmbeddings disables check_embedding_ctx_length because that path tokenizes with a hardcoded OpenAI model name, which does not apply to a Zumik alias - the server handles chunking.
When to reach past this integration
ChatZumik rides the OpenAI-compatible /v1 surface, which is the right default for most chains. To use Zumik's native state - artifacts, sessions, branches, signed purge - drop down to a Zumik SDK for the /v2 calls and keep ChatZumik for generation. See core concepts and the examples.
Coding agents
Use Zumik as a drop-in OpenAI-compatible endpoint for Cline, Roo Code, Continue, Aider, and the OpenAI SDKs - one base URL, one key, full request fidelity, and caching on by default.
LlamaIndex
The Zumik LlamaIndex LLM - a thin OpenAILike configuration pinned to Zumik's /v1 surface - for query engines, chat engines, and agents.