LangChain

ChatZumik and ZumikEmbeddings for LangChain (Python and JavaScript) - thin configurations of the LangChain OpenAI wrappers pinned to Zumik's /v1 surface.

Zumik speaks the OpenAI wire format at https://api.zumik.ai/v1, so the LangChain integration is a thin configuration of the LangChain OpenAI wrappers - it does not reimplement LangChain. You get ChatZumik and ZumikEmbeddings with the base URL pinned and the key read from ZUMIK_API_KEY. Tool calling, structured output, .stream(), and .batch() all behave exactly as they do against OpenAI.

Install

Python

pip install langchain-zumik

The Python package is langchain-zumik 0.1.0 (pins langchain-openai==0.3.28, langchain-core==0.3.72). The JS package is @zumik/langchain 0.1.0 (depends on @langchain/[email protected], with @langchain/core as a >=0.3.0 <0.4.0 peer). Both are Apache-2.0.

Set your key (never hardcode it):

export ZUMIK_API_KEY="zk_..."
# optional, for staging / self-hosted:
# export ZUMIK_BASE_URL="https://api.zumik.ai/v1"

model is a Zumik alias (code.fast, auto.balanced, reasoning.best, ...) or a concrete provider model. Zumik resolves it server-side and reports the resolved release on the response headers. When omitted, both wrappers default to auto.balanced.

Chat

from langchain_zumik import ChatZumik

llm = ChatZumik(model="code.fast", temperature=0.2)  # reads ZUMIK_API_KEY

print(llm.invoke("Explain a rolling deploy in two sentences.").content)

# Streaming and batching work exactly as upstream LangChain:
for chunk in llm.stream("List three rollback strategies."):
    print(chunk.content, end="", flush=True)

Tool calling and structured output are inherited from ChatOpenAI unchanged:

from pydantic import BaseModel

class Severity(BaseModel):
    level: str
    reason: str

triage = llm.with_structured_output(Severity)
print(triage.invoke("Disk at 100% on the primary DB. How bad?"))

import { ChatZumik } from "@zumik/langchain";

const llm = new ChatZumik({ model: "code.fast", temperature: 0.2 });

const res = await llm.invoke("Explain a rolling deploy in two sentences.");
console.log(res.content);

// Streaming works as in upstream LangChain.js:
for await (const chunk of await llm.stream("List three rollback strategies.")) {
  process.stdout.write(String(chunk.content));
}

Tool calling and structured output are inherited from ChatOpenAI unchanged. Pass { apiKey } to override the env var and { baseUrl } for a staging or self-hosted host.

RAG with Zumik

Run both the embeddings and the generation model through Zumik so a retrieval pipeline lives on one surface. ZumikEmbeddings pairs with ChatZumik.

from langchain_zumik import ChatZumik, ZumikEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

docs = [
    "Zumik resolves model aliases like code.fast to a pinned provider release.",
    "Every response reports the resolved release and reuse on its headers.",
    "QoS classes are interactive, standard, background, and batch.",
]

emb = ZumikEmbeddings(model="auto.balanced")
store = FAISS.from_texts(docs, emb)
retriever = store.as_retriever(search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {question}"
)
llm = ChatZumik(model="code.fast")

chain = (
    {"context": retriever | (lambda d: "\n".join(x.page_content for x in d)),
     "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What does Zumik report on response headers?"))

langchain-community and faiss-cpu are only needed for the vector store in this snippet; they are not dependencies of langchain-zumik itself. ZumikEmbeddings disables check_embedding_ctx_length because that path tokenizes with a hardcoded OpenAI model name, which does not apply to a Zumik alias - the server handles chunking.

When to reach past this integration

ChatZumik rides the OpenAI-compatible /v1 surface, which is the right default for most chains. To use Zumik's native state - artifacts, sessions, branches, signed purge - drop down to a Zumik SDK for the /v2 calls and keep ChatZumik for generation. See core concepts and the examples.

LlamaIndex

The same OpenAI-compatible approach for LlamaIndex query and chat engines.

Using the OpenAI SDK

The base-URL swap directly, when a framework wrapper is not needed.