§ 08 / AI

LLM integrations — YandexGPT, GigaChat, Claude, GPT, Gemini, Kimi, GLM, and local models.

Not "let's bolt on AI because everyone has AI." We look through your processes for places where models genuinely save hours every week, and build tools around them. We work with Russian cloud models (YandexGPT, GigaChat, T-Lite), Western ones (Claude, GPT, Gemini, Grok, DeepSeek), Chinese open-weight ones (Qwen3, Kimi K2.5, MiniMax M2.7, GLM 5.1), and local ones (Llama, Mistral) — chosen to fit your data requirements and budget.

§ 08.1 Typical use cases

→ Support

AI support assistant

First-line customer support: the model answers 60–80% of questions and forwards complex cases to an operator. RAG over your knowledge base, conversation context memory.
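The escalation logic can be sketched in a few lines. This is a minimal illustration, not our production code: `retrieve()` here is a naive keyword matcher standing in for a real vector search, and the routing rule (answer only when supporting passages exist) is the simplest possible confidence gate.

```python
def retrieve(question, knowledge_base, min_overlap=2):
    """Naive keyword retriever standing in for a real vector search."""
    words = set(question.lower().split())
    return [doc for doc in knowledge_base
            if len(words & set(doc.lower().split())) >= min_overlap]

def triage(question, knowledge_base):
    passages = retrieve(question, knowledge_base)
    if not passages:
        # No grounding found: forward to a human operator instead of guessing.
        return {"route": "operator", "reason": "no supporting passages"}
    # In production the passages would go into the LLM prompt here.
    return {"route": "model", "context": passages}

kb = ["Refunds are processed within 5 business days.",
      "Password reset links expire after 24 hours."]
print(triage("how long are refunds processed", kb)["route"])  # model
print(triage("integrate with our ERP", kb)["route"])          # operator
```

The real gate is usually retrieval score plus the model's own abstention signal, but the shape is the same: answer when grounded, escalate when not.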

→ Documents

Document processing

Extracting structured data from invoices, bills, contracts, and resumes. Replaces the manual data entry that eats hours a day for several people.
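Extraction only works in a pipeline if the model's output is validated before anything touches your database. A hedged sketch: the field set and the sample model output below are illustrative assumptions, and a real system would use a stricter schema library, but the validate-before-write pattern is the point.

```python
import json

# Illustrative required fields for an invoice; a real schema is project-specific.
REQUIRED = {"vendor": str, "total": float, "currency": str, "date": str}

def parse_invoice(raw: str) -> dict:
    """Reject malformed or incomplete model output before it reaches storage."""
    data = json.loads(raw)
    for field, typ in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise ValueError(f"{field} must be {typ.__name__}")
    return data

model_output = '{"vendor": "ACME GmbH", "total": 1299.5, "currency": "EUR", "date": "2025-01-17"}'
print(parse_invoice(model_output)["total"])  # 1299.5
```

Rows that fail validation go to a review queue instead of the database, which is where the "human hours saved" actually come from.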

→ Search

Internal AI search

Smart search across your documents, wiki, and tickets: ask in natural language, get an answer with citations and source links. A vector database plus the right plumbing.
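The "answer with source links" part reduces to ranked retrieval. A toy sketch using bag-of-words cosine similarity in place of real embeddings (in production this is an embedding model plus a store like pgvector or Qdrant, but the ranking logic is the same shape):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs, top_k=2):
    """Return source IDs ranked by similarity — these become the citations."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), src)
              for src, text in docs.items()]
    scored.sort(reverse=True)
    return [src for score, src in scored[:top_k] if score > 0]

docs = {
    "wiki/vacation": "vacation requests are approved by the team lead",
    "wiki/vpn": "the vpn client is installed from the internal portal",
}
print(search("who approves vacation requests", docs))  # ['wiki/vacation']
```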

→ Analysis

Text and review analysis

Classifying support tickets, scoring review sentiment, surfacing topics from customer conversations, pulling insights out of interview transcripts.

→ Content

Generation and editing

Drafts of product descriptions, email campaigns, social posts, SEO copy. With your tone of voice and a fact-check pass.

→ Agents

Agents and automation

Scenarios where the LLM doesn't just respond, but acts: opening tickets, populating CRM, posting in Slack, fetching data from APIs. With human checkpoints at the critical steps.
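The "human checkpoints at critical steps" idea is simple to express: read-only actions run automatically, anything that writes waits for approval. A sketch with hypothetical tool names (the actual tool set and approval channel depend on the project):

```python
# Actions that never change state are safe to run without approval.
SAFE_ACTIONS = {"fetch_ticket", "search_crm"}

def run_action(action, args, approve):
    """approve is a callable — the human-in-the-loop checkpoint."""
    if action not in SAFE_ACTIONS and not approve(action, args):
        return {"status": "blocked", "action": action}
    # A real implementation would dispatch to the actual tool here.
    return {"status": "done", "action": action}

always_no = lambda action, args: False  # a reviewer who approves nothing
print(run_action("fetch_ticket", {"id": 42}, always_no)["status"])   # done
print(run_action("delete_ticket", {"id": 42}, always_no)["status"])  # blocked
```

In practice `approve` is usually a Slack button or a queue item, not a function call, but the allow-list split between reads and writes carries over directly.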

§ 08.2 What's included

  • Discovery: figuring out where a model genuinely helps and where it stays a toy.
  • Picking the right model for the task: YandexGPT (Yandex), GigaChat (Sber), T-Lite / T-Pro (T-Bank), Claude (Anthropic), GPT-4 / GPT-5 (OpenAI), Gemini (Google), Grok (xAI), DeepSeek, Qwen3 (Alibaba), Kimi K2.5 (Moonshot), MiniMax M2.7, GLM 5.1 (Zhipu), Command (Cohere), local Llama / Mistral / Phi.
  • Prompt engineering, structured output (JSON schemas), function calls.
  • RAG: embeddings, vector database (pgvector, Qdrant, Chroma), retriever, ranking.
  • Eval set: how we measure quality and where the acceptable thresholds sit.
  • Safeguards: rate limits, input/output moderation, logging, cost control.
  • Monitoring, A/B testing of prompts, dashboards for token spend.
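An eval set from the list above is, at its core, a fixed question/answer file scored against a threshold. A toy harness (the questions, the `predict` stand-in, and the 0.9 threshold are all illustrative — real evals use fuzzier scoring than exact match):

```python
# Fixed gold set: the same questions every release, so scores are comparable.
EVAL_SET = [
    ("refund period?", "5 business days"),
    ("reset link lifetime?", "24 hours"),
]
THRESHOLD = 0.9  # acceptable accuracy, chosen per project

def accuracy(predict):
    hits = sum(predict(q) == gold for q, gold in EVAL_SET)
    return hits / len(EVAL_SET)

lookup = dict(EVAL_SET)               # perfect "model" for demonstration
score = accuracy(lambda q: lookup.get(q, ""))
print(score, score >= THRESHOLD)      # 1.0 True
```

Running this on every prompt change is what turns "the bot feels worse" into a number you can act on.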

§ 08.3 Models we connect

Russian cloud models

YandexGPT (Yandex), GigaChat (Sber), T-Lite / T-Pro (T-Bank). Data is processed and stored inside Russia, with a personal-data processing agreement and full regulatory compliance. Integration via Yandex Cloud ML SDK or REST directly, GigaChat API, T-Bank AI. Russian is "native" to these — quality on Russian-language corpora is usually higher than untuned Western models.

Western cloud models

Claude (Anthropic), GPT-4 / GPT-5 (OpenAI), Gemini (Google), Grok (xAI), DeepSeek, Command (Cohere). Highest quality on most tasks, strong reasoning, large context windows, mature tool use and structured output. Plus: integration is dead simple. Minus: data leaves your perimeter, and not every model is reachable from Russia directly.

Chinese open-weight

Qwen3 (Alibaba), Kimi K2.5 (Moonshot), MiniMax M2.7, GLM 5.1 (Zhipu), DeepSeek-V3. A separate branch that over the last two years has caught up with, and on some benchmarks (especially code and math) overtaken, the Western field, at a fraction of the price per token. Available both as a cloud API and as open weights you can self-host. Russian out-of-the-box isn't strong across the board, but Qwen3 and Kimi K2.5 hold up well.

Local open-source

Llama 3.x (Meta), Mistral / Mixtral, Phi (Microsoft), Gemma (Google) plus the Chinese open-weight models above. Deployed on your GPU server or in a dedicated cloud. Data never leaves your perimeter, zero dependence on external providers, predictable costs. They need slightly more careful tuning and a GPU with 24+ GB of memory for 7B–70B parameter models; top open-weight tier (Qwen3-235B, Kimi K2.5) requires a multi-GPU node or quantization.

Hybrid

The best option is often a router that sits between models. Routine queries go to a local or Russian cloud model; complex cases that need reasoning go to Claude or GPT. We build these routers with cost, quality, privacy, and latency all factored in.
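A router like this can be as small as one function. The sketch below is a deliberately crude heuristic with illustrative model names and thresholds — real routers score complexity with a classifier and track per-model cost and latency — but it shows the decision order: privacy first, then complexity, then default to the cheap tier.

```python
def route(prompt: str, contains_pii: bool) -> str:
    if contains_pii:
        return "local-llama"    # personal data must stay inside the perimeter
    if len(prompt.split()) > 200 or "step by step" in prompt.lower():
        return "claude"         # heavy reasoning goes to a frontier model
    return "yandexgpt"          # routine traffic goes to the cheap tier

print(route("summarise this ticket", contains_pii=False))  # yandexgpt
print(route("summarise this ticket", contains_pii=True))   # local-llama
```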

§ 08.4 FAQ

ChatGPT is free. Why pay for an integration?

A free chatbot is a demo where you copy data in and out by hand. An integration is the model working inside your process: reading your database, writing to CRM, sending reports. The difference is the hours you don't spend on copy-paste.

Models lie and make things up.

True, and that has to be designed around. RAG with mandatory citations, response-format validation, fallback to a human operator when confidence drops, eval sets for quality control. Hallucinations don't go away — but their impact can be kept inside tolerable limits.

What about privacy? We handle customer personal data.

For personal data in Russia the natural pick is YandexGPT or GigaChat: data lives and is processed in-country, with a standard PDP agreement and 152-FZ compliance. Option two: local open-weight models (Llama, Qwen3, Kimi K2.5, Mistral, GLM 5.1) on your hardware — data never leaves the perimeter at all. Option three: Western cloud models (Claude, GPT) under an enterprise contract with no-training guarantees. We pick the right fit for your situation and compliance needs.

What does it cost to operate?

Depends on volume and chosen model. For companies up to 100 employees it's typically a few thousand to a few tens of thousands of rubles a month on tokens. At higher volumes a local model usually pays for itself in 2–4 months.
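The arithmetic behind that estimate is simple enough to do yourself. All numbers below are assumptions for illustration (0.20 RUB per 1K input tokens, 0.60 RUB per 1K output tokens — check your provider's current price list):

```python
def monthly_cost(requests_per_day, in_tokens, out_tokens,
                 price_in=0.20, price_out=0.60, days=30):
    """Token spend per month, prices in RUB per 1K tokens (assumed rates)."""
    per_request = in_tokens / 1000 * price_in + out_tokens / 1000 * price_out
    return requests_per_day * days * per_request

# e.g. 300 support questions a day, ~1.5K tokens in, ~300 tokens out
print(round(monthly_cost(300, 1500, 300)))  # 4320 RUB a month
```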

§ — Write to us

Describe a process where a model would help.

hi@weiss.help ↗
or via Telegram · phone

First 20-minute call — free. Integration plan within 24 hours.