feat: initialize OmniClaw skills registry
370
apis/sub2api/gpt-image-2.en.md
Normal file
@@ -0,0 +1,370 @@
# GPT Image 2 API Guide

This guide describes how to call `gpt-image-2` through sub2api or any OpenAI-compatible gateway.

Default examples use:

```text
BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api API key generated from the /keys page>
```

Do not use ChatGPT OAuth tokens from `.codex/auth.json` as API keys.

## Quick Summary

- Direct image generation: call `POST /v1/images/generations` with `model: "gpt-image-2"`.
- Image editing: call `POST /v1/images/edits` with multipart `image[]` files and an optional `mask`.
- Agent/Codex workflows: keep the main model as a text/agent model such as `gpt-5.5`, then call image generation through the Responses API `image_generation` tool.
- Do not use `gpt-image-2` as the Codex main model.
- `gpt-image-2` normally returns base64 image data at `data[0].b64_json`.
- `3840x2160` 4K output works but is high-latency and high-cost; use 180-300 second timeouts for production.

## Official Capability Summary

`gpt-image-2` is an image generation and editing model with text input, image input, and image output support.

Model aliases:

```text
gpt-image-2
gpt-image-2-2026-04-21
```

Supported API surfaces:

```text
/v1/images/generations
/v1/images/edits
/v1/responses     # via image_generation tool
```

Official references:

- https://developers.openai.com/api/docs/models/gpt-image-2
- https://developers.openai.com/api/docs/guides/image-generation
- https://developers.openai.com/api/reference/resources/images

## Authentication

```bash
export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."
```

JSON requests require:

```http
Authorization: Bearer $API_KEY
Content-Type: application/json
```

For multipart image edits, let `curl -F` or the SDK set `Content-Type`.
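
The same two headers can be set from Python without an SDK. The sketch below only builds the request object and does not send it. The helper name `build_json_request` is hypothetical, and it assumes the `BASE_URL` and `API_KEY` environment variables described above.

```python
import json
import os
import urllib.request


def build_json_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    # BASE_URL and API_KEY are the environment variables from this section.
    base_url = os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1")
    api_key = os.environ["API_KEY"]
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=120)` and read the JSON body from the response.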

## Image Generation

### Minimal Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json
```

Decode the response:

```bash
jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
```
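
Where `jq` is not available, the same decode step can be sketched with the Python standard library. The function name `decode_first_image` is hypothetical. It assumes the response body has the `data[0].b64_json` shape described in this guide and that failures come back as an `error` object.

```python
import base64
import json


def decode_first_image(response_text: str) -> bytes:
    """Return the raw bytes of data[0].b64_json from an Images API response."""
    body = json.loads(response_text)
    items = body.get("data") or []
    if not items or not items[0].get("b64_json"):
        # Assumption: gateways report failures as an "error" object.
        raise ValueError(f"no image data in response: {body.get('error')}")
    return base64.b64decode(items[0]["b64_json"])
```

For example, `open("image.png", "wb").write(decode_first_image(open("image.json").read()))` writes the decoded file.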

### 4K Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json
```

Production recommendation: first validate prompts at `1024x1024` or `1536x1024`, then rerun the request at `3840x2160`. Combining 4K with `quality: high` can be slow and expensive.

## Generation Parameters

| Parameter | Type | Recommended value | Notes |
|---|---|---|---|
| `model` | string | `gpt-image-2` | Required. The snapshot `gpt-image-2-2026-04-21` is also valid. |
| `prompt` | string | detailed natural language | Required. Include subject, environment, camera, style, lighting, and constraints. |
| `n` | number | `1` | Number of images. Prefer single-image requests for retry and billing attribution. |
| `size` | string | `1024x1024`, `1536x1024`, `3840x2160` | Flexible sizes are supported when they satisfy the model constraints. |
| `quality` | string | `low`, `medium`, `high`, `auto` | Use `low` for drafts, `medium` for normal output, `high` for final assets. |
| `output_format` | string | `png`, `jpeg`, `webp` | Default is usually `png`; use `jpeg` for latency-sensitive outputs. |
| `output_compression` | number | `0-100` | Only applies to `jpeg` and `webp`. |
| `background` | string | `auto`, `opaque` | `gpt-image-2` currently does not support `transparent`. |
| `moderation` | string | `auto`, `low` | Adjusts filtering level but does not bypass safety policy. |
| `stream` | boolean | `false` | Enables SSE image streaming. |
| `partial_images` | number | `0-3` | Streaming only; partial images increase output token cost. |
| `user` | string | end-user ID | Useful for audit and abuse monitoring. |

## Size Constraints

`size` can be `auto` or a valid `widthxheight` value:

- Maximum edge length is `3840px`.
- Width and height must both be multiples of `16px`.
- Long edge to short edge ratio must be at most `3:1`.
- Total pixels must be between `655,360` and `8,294,400`.

Common values:

```text
1024x1024
1536x1024
1024x1536
2048x2048
2048x1152
3840x2160
2160x3840
auto
```

Treat outputs larger than `2560x1440` as experimental high-pixel workloads with higher latency, higher cost, and higher failure probability.
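
The four size rules can be checked on the client before a request is sent, which avoids a round trip that ends in `400 invalid_request_error`. This is a minimal sketch: the function name `is_valid_size` is hypothetical, and it assumes both pixel bounds are inclusive (the upper bound must be, since `3840x2160` is exactly 8,294,400 pixels).

```python
def is_valid_size(size: str) -> bool:
    """Check a size string against the constraints listed in this section."""
    if size == "auto":
        return True
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        # Wrong number of parts or non-numeric dimensions.
        return False
    if width <= 0 or height <= 0:
        return False
    long_edge, short_edge = max(width, height), min(width, height)
    return (
        long_edge <= 3840                                # maximum edge length
        and width % 16 == 0 and height % 16 == 0         # multiples of 16
        and long_edge <= 3 * short_edge                  # ratio at most 3:1
        and 655_360 <= width * height <= 8_294_400       # total pixel range
    )
```

Every entry in the common values list passes this check, while sizes such as `512x512` (too few pixels) or `3200x1024` (ratio above 3:1) are rejected.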

## Response Shape

Typical response:

```json
{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}
```

Production systems should store:

- `model`
- `size`
- `quality`
- `output_format`
- `usage.total_tokens`
- `usage.input_tokens`
- `usage.output_tokens`
- latency
- upstream account, group, user, and key identifiers
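
One possible way to collect the fields listed above into a single record is sketched here. The function name `usage_record`, the `latency_seconds` key, and the keyword arguments for identifiers are illustrative assumptions, not part of sub2api.

```python
def usage_record(response: dict, started_at: float, finished_at: float, **ids: str) -> dict:
    """Flatten an Images API response into a row for billing and audit storage."""
    usage = response.get("usage") or {}
    return {
        "model": response.get("model"),
        "size": response.get("size"),
        "quality": response.get("quality"),
        "output_format": response.get("output_format"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        # Wall-clock latency measured by the caller, in seconds.
        "latency_seconds": round(finished_at - started_at, 3),
        # Caller-supplied identifiers, e.g. account, group, user, key.
        **ids,
    }
```

A caller would take timestamps with `time.monotonic()` around the request and pass identifiers such as `account="..."` and `key="..."` as keyword arguments.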

## Image Editing

### Single-image Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json
```

### Masked Local Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json
```

Mask requirements:

- `image` and `mask` must have the same format and dimensions.
- Files must be under 50MB.
- `mask` must include an alpha channel.
- Do not pass `input_fidelity` for `gpt-image-2`; the model processes image inputs at high fidelity by default.
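
For PNG inputs, the dimension, size, and alpha-channel requirements can be pre-checked from the file header with the standard library alone. This is a rough sketch with hypothetical helper names. It only reads the IHDR chunk, so it does not detect palette PNGs that carry transparency in a `tRNS` chunk, and it assumes "50MB" means 50 * 1024 * 1024 bytes.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data: bytes) -> tuple[int, int, bool]:
    """Return (width, height, has_alpha_channel) from a PNG's IHDR chunk."""
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Color types 4 (grayscale + alpha) and 6 (RGBA) carry an alpha channel.
    return width, height, color_type in (4, 6)


def mask_matches_image(image_png: bytes, mask_png: bytes,
                       max_bytes: int = 50 * 1024 * 1024) -> bool:
    """Check same dimensions, mask alpha channel, and the file size limit."""
    image_w, image_h, _ = png_info(image_png)
    mask_w, mask_h, mask_alpha = png_info(mask_png)
    return (
        (image_w, image_h) == (mask_w, mask_h)
        and mask_alpha
        and len(image_png) < max_bytes
        and len(mask_png) < max_bytes
    )
```

Running `mask_matches_image(open("input.png", "rb").read(), open("mask.png", "rb").read())` before the upload catches the most common mask mistakes locally.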

## Responses API With `image_generation`

Use this when an agent should reason about the task before generating an image. The main model should be a text/agent model, such as `gpt-5.5`.

```bash
curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json
```

Important:

- `model` is the main reasoning model, not `gpt-image-2`.
- The `image_generation` tool performs the image work.
- sub2api may inject the image tool for official Codex clients, but application calls should pass it explicitly.
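
The response saved to `response-image.json` has a different shape from the Images API response. The sketch below assumes the upstream Responses API convention, in which image results appear as `output` items of type `image_generation_call` with base64 data in `result`. This guide does not confirm that shape for sub2api, so treat the field names as assumptions and verify them against a real response.

```python
import base64


def extract_generated_images(response: dict) -> list[bytes]:
    """Collect decoded images from a Responses API result."""
    images = []
    for item in response.get("output", []):
        # Assumption: tool results are items of type "image_generation_call"
        # whose "result" field holds the base64-encoded image.
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

Text output items from the main model are skipped, so the function returns an empty list when the agent answered without calling the tool.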

## Streaming Images

The Images API supports SSE streaming:

```bash
curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'
```

Events:

```text
image_generation.partial_image
image_generation.completed
```

`partial_images` can be `0-3`. Each partial image adds output token cost.
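
A client has to split the SSE stream into events before it can pick out the final image. The sketch below parses `data:` lines into JSON payloads. The function names are hypothetical, and it assumes each payload is a JSON object with a `type` field holding one of the event names above and the image in `b64_json`; check both against real gateway output.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON payload per SSE event (blank line ends an event)."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # SSE allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            payload = "\n".join(data_lines)
            data_lines = []
            if payload != "[DONE]":
                yield json.loads(payload)
    # Flush a trailing event that was not followed by a blank line.
    if data_lines and "\n".join(data_lines) != "[DONE]":
        yield json.loads("\n".join(data_lines))


def final_image_b64(events: Iterable[dict]) -> Optional[str]:
    """Return the base64 image from the completed event, if one arrived."""
    for event in events:
        if event.get("type") == "image_generation.completed":
            return event.get("b64_json")
    return None
```

Partial images can be shown as previews while the stream is open; only the completed event should be stored as the result.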

## SDK Examples

### Node.js

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));
```

### Python

```py
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

# Guard against an empty result, mirroring the Node.js example.
b64 = result.data[0].b64_json if result.data else None
if not b64:
    raise RuntimeError("No image returned")
with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))
```

## Production Dispatch

- Routing: prefer plus/team/pro OpenAI OAuth accounts for image workloads.
- Timeout: use 120 seconds for normal images and 300 seconds for 4K.
- Retry: retry only transient network failures and 502/503/504 responses, and keep the retry limit low.
- Concurrency: 4K output produces many image tokens; use low per-account concurrency. Standard 1024 images can use higher concurrency.
- Billing: record `usage` and charge based on input and output tokens. 4K can produce far more output tokens than 1024 images.
- Latency: use `jpeg` and `quality: low` for drafts or latency-sensitive previews.
- Fallback: if `4K/high` fails, retry `4K/medium`; if that still fails, generate `1536x1024/medium` and upscale separately.
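
The timeout, retry, and fallback rules above can be expressed as a few small functions. This is a sketch of one possible policy, not sub2api behavior: all names are hypothetical, the default of two attempts is an assumed "low" retry limit, and the portrait fallback to `1024x1536` is an assumption added to preserve orientation.

```python
RETRYABLE_STATUSES = {502, 503, 504}


def is_4k(size: str) -> bool:
    return size in ("3840x2160", "2160x3840")


def timeout_seconds(size: str) -> int:
    """120 seconds for normal images, 300 seconds for 4K."""
    return 300 if is_4k(size) else 120


def should_retry(status: int, attempt: int, max_attempts: int = 2) -> bool:
    """Retry only gateway-style failures, and only a few times."""
    return status in RETRYABLE_STATUSES and attempt < max_attempts


def fallback_chain(size: str, quality: str) -> list[tuple[str, str]]:
    """Return the (size, quality) attempts in the order suggested above."""
    chain = [(size, quality)]
    if is_4k(size) and quality == "high":
        chain.append((size, "medium"))
    if is_4k(size):
        # Last resort: a smaller image that is upscaled in a separate step.
        smaller = "1024x1536" if size == "2160x3840" else "1536x1024"
        chain.append((smaller, "medium"))
    return chain
```

Transient network exceptions are not HTTP statuses, so the caller would handle them in its own `except` branch with the same attempt limit.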

## Common Errors

| Symptom | Likely cause | Action |
|---|---|---|
| `401 INVALID_API_KEY` | Key is not a sub2api key or is disabled/deleted | Generate a new key from `/keys` |
| `400 invalid_request_error` | Incompatible params such as transparent background or invalid size | Check `size`, `background`, and `quality` |
| `429 usage_limit_reached` | Upstream account usage window hit | Switch plus/team/pro account or wait for reset |
| `502 Upstream request failed` | Upstream did not return image data, network failed, or content was refused | Inspect server logs, simplify prompt, lower quality or size |
| Request takes over 2 minutes | High pixels or complex prompt | Increase timeout, use streaming, or test lower resolution first |
| `/v1/models` does not show `gpt-image-2` | Codex/text model list is not the Images API capability list | Call `/v1/images/generations` directly |

## Safety Boundary

Filter clearly disallowed content before sending requests, especially:

- Sexualized minors or young-looking subjects
- Non-consensual sexual content, coercion, or sexual violence
- Explicit nudity or graphic sexual activity
- Illegal, hateful, or extreme violent content

For safe romantic scenes, explicitly constrain prompts with terms such as adult, non-explicit, no nudity, and fully clothed.

365
apis/sub2api/gpt-image-2.zh.md
Normal file
@@ -0,0 +1,365 @@
# GPT Image 2 API Guide

This document covers calling `gpt-image-2` through sub2api or an OpenAI-compatible gateway. The examples use these defaults:

```text
BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api key generated from the /keys page>
```

Do not use the ChatGPT OAuth token from `.codex/auth.json` as an API key.

## Quick Summary

- Generate images directly: use `POST /v1/images/generations` and set `model` to `gpt-image-2`.
- Edit images: use `POST /v1/images/edits` with a multipart upload of `image[]` and an optional `mask`.
- Agent/Codex scenarios: keep `gpt-5.5` as the main model and invoke image generation through the Responses API `image_generation` tool; do not set the Codex main model to `gpt-image-2`.
- `gpt-image-2` returns base64 image data, usually at `data[0].b64_json`.
- `3840x2160` 4K output is available, but it is a high-pixel, long-running workload; production calls should set a 180-300 second timeout.

## Official Capability Summary

`gpt-image-2` is an image generation and editing model that supports text input, image input, and image output. Model alias and snapshot:

```text
gpt-image-2
gpt-image-2-2026-04-21
```

Supported endpoints:

```text
/v1/images/generations
/v1/images/edits
/v1/responses     # via the image_generation tool
```

Official references:

- https://developers.openai.com/api/docs/models/gpt-image-2
- https://developers.openai.com/api/docs/guides/image-generation
- https://developers.openai.com/api/reference/resources/images

## Authentication

```bash
export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."
```

All JSON requests carry:

```http
Authorization: Bearer $API_KEY
Content-Type: application/json
```

For the multipart edit endpoint, `curl -F` or the SDK sets `Content-Type` automatically.

## Generating Images

### Minimal Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json
```

Decode:

```bash
jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
```

### 4K Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json
```

Production recommendation: 4K with `high` quality is slow and costly. Validate the prompt at `1024x1024` or `1536x1024` first, then move up to `3840x2160`.

## Generation Parameters

| Parameter | Type | Recommended value | Notes |
|---|---|---|---|
| `model` | string | `gpt-image-2` | Required. The snapshot `gpt-image-2-2026-04-21` can also be used. |
| `prompt` | string | detailed natural language | Required. Spell out the subject, environment, camera, style, and constraints. |
| `n` | number | `1` | Number of images. In production, schedule single-image requests concurrently to simplify retries and billing. |
| `size` | string | `1024x1024`, `1536x1024`, `3840x2160` | `gpt-image-2` supports flexible sizes; see the size constraints below. |
| `quality` | string | `low`, `medium`, `high`, `auto` | Use `low` for drafts, `medium` for regular output, and `high` for final images. |
| `output_format` | string | `png`, `jpeg`, `webp` | Defaults to `png`. Prefer `jpeg` when latency matters. |
| `output_compression` | number | `0-100` | Only meaningful for `jpeg`/`webp`. |
| `background` | string | `auto`, `opaque` | `gpt-image-2` currently does not support `transparent`. |
| `moderation` | string | `auto`, `low` | Controls the filtering strength for image generation; the content policy still applies. |
| `stream` | boolean | `false` | Enables SSE streaming image events. |
| `partial_images` | number | `0-3` | Returns partial images while streaming; increases output token cost. |
| `user` | string | user ID | End-user identifier, useful for auditing and abuse monitoring. |

## Size Constraints

For `gpt-image-2`, `size` can be `auto` or a `widthxheight` value that satisfies these constraints:

- The longest edge must not exceed `3840px`
- Width and height must both be multiples of `16px`
- The long-edge to short-edge ratio must not exceed `3:1`
- Total pixels must be between `655,360` and `8,294,400`

Common sizes:

```text
1024x1024     # square, usually fastest
1536x1024     # landscape
1024x1536     # portrait
2048x2048     # 2K square
2048x1152     # 2K landscape
3840x2160     # 4K landscape
2160x3840     # 4K portrait
auto
```

Outputs larger than `2560x1440` should generally be treated as experimental high-pixel workloads: higher latency, higher cost, and a higher probability of failure.

## Response Structure

Typical response:

```json
{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}
```

The application side should persist:

- `model`
- `size`
- `quality`
- `output_format`
- `usage.total_tokens`
- `usage.input_tokens`
- `usage.output_tokens`
- request latency
- upstream account/group/user/key

## Editing Images

### Single-image Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json
```

### Masked Local Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json
```

Mask requirements:

- `image` and `mask` must have the same format and dimensions
- Files must be under 50MB
- `mask` must include an alpha channel
- Do not pass `input_fidelity` for `gpt-image-2`; it automatically processes input images at high fidelity

## Calling the image_generation Tool Through the Responses API

Use this for multi-turn agents, where the model should understand the request before calling the image tool. The main model is a text/agent model such as `gpt-5.5`.

```bash
curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json
```

Notes:

- `model` is the main reasoning model, not `gpt-image-2`
- The `image_generation` tool handles image generation
- sub2api injects an `image_generation` tool hint into requests from official Codex clients, but application calls should still pass the tool explicitly

## Streaming Images

The Image API supports streaming generation:

```bash
curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'
```

Event types:

```text
image_generation.partial_image
image_generation.completed
```

`partial_images` can be set to `0-3`. Each partial image incurs additional output token cost.

## SDK Examples

### Node.js

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));
```

### Python

```py
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

# Guard against an empty result, mirroring the Node.js example.
b64 = result.data[0].b64_json if result.data else None
if not b64:
    raise RuntimeError("No image returned")
with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))
```
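
## Production Scheduling Recommendations

- Routing: prefer plus/team/pro OpenAI OAuth accounts for image generation, to avoid the limited capability or rate limiting of free accounts.
- Timeout: set 120 seconds for regular images and 300 seconds for 4K.
- Retry: apply limited retries only to network errors and 502/503/504; do not retry content-policy refusals indefinitely.
- Concurrency: 4K requests produce many output tokens, so keep per-account concurrency low; regular 1024 images can run at higher concurrency.
- Cost: record `usage` and bill by `input_tokens + output_tokens`; 4K output tokens can far exceed those of 1024 images.
- Latency: prefer `jpeg` when latency matters, and use `quality: low` for drafts.
- Failure fallback: when 4K/high fails, drop to 4K/medium; if that still fails, generate at 1536x1024/medium first and then run an upscaling pipeline.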
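
## Common Errors

| Symptom | Likely cause | Action |
|---|---|---|
| `401 INVALID_API_KEY` | The key is not a sub2api key, or it has been deleted/disabled | Generate a new key from `/keys` |
| `400 invalid_request_error` | Incompatible parameters, such as a transparent background or an invalid size | Check `size`, `background`, and `quality` |
| `429 usage_limit_reached` | The OpenAI account usage window was hit | Switch to another plus/team/pro account or wait for recovery |
| `502 Upstream request failed` | The upstream returned no image, the network dropped, or the content was refused and returned as text | Check the server logs; if needed, change the prompt, lower the quality, or change the size |
| Takes over 2 minutes | High pixel count or complex prompt | Set a longer timeout, use streaming, or validate at low resolution first |
| `/v1/models` does not show `gpt-image-2` | The Codex main-model list is not the same as the image endpoint capability list | Call `/v1/images/generations` directly |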
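
## Safety Boundary

The application side should filter clearly violating content before sending requests, especially:

- Sexualized content involving minors or young-looking characters
- Non-consensual, coercive, or sexually violent scenes
- Explicit nudity or graphic sexual acts
- Illegal, hateful, or extremely violent content

Prompts should explicitly state constraints such as "adult, non-explicit, no nudity, fully clothed" to reduce the chance that the upstream refuses the request or returns non-image text.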