feat: initialize OmniClaw skills registry
apis/sub2api/gpt-image-2.en.md
# GPT Image 2 API Guide

This guide describes how to call `gpt-image-2` through sub2api or any OpenAI-compatible gateway.

Default examples use:

```text
BASE_URL=https://claude.omniclaw.store/v1
API_KEY=<sub2api API key generated from the /keys page>
```

Do not use ChatGPT OAuth tokens from `.codex/auth.json` as API keys.

## Quick Summary

- Direct image generation: call `POST /v1/images/generations` with `model: "gpt-image-2"`.
- Image editing: call `POST /v1/images/edits` with multipart `image[]` files and an optional `mask`.
- Agent/Codex workflows: keep the main model as a text/agent model such as `gpt-5.5`, then call image generation through the Responses API `image_generation` tool.
- Do not use `gpt-image-2` as the Codex main model.
- `gpt-image-2` normally returns base64 image data at `data[0].b64_json`.
- `3840x2160` 4K output works but is high-latency and high-cost; use 180-300 second timeouts for production.

## Official Capability Summary

`gpt-image-2` is an image generation and editing model with text input, image input, and image output support.

Model aliases:

```text
gpt-image-2
gpt-image-2-2026-04-21
```

Supported API surfaces:

```text
/v1/images/generations
/v1/images/edits
/v1/responses # via image_generation tool
```

Official references:

- https://developers.openai.com/api/docs/models/gpt-image-2
- https://developers.openai.com/api/docs/guides/image-generation
- https://developers.openai.com/api/reference/resources/images

## Authentication

```bash
export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."
```

JSON requests require:

```http
Authorization: Bearer $API_KEY
Content-Type: application/json
```

For multipart image edits, let `curl -F` or the SDK set `Content-Type`.

## Image Generation

### Minimal Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json
```

Decode the response:

```bash
jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
```

### 4K Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json
```

Production recommendation: first validate prompts at `1024x1024` or `1536x1024`, then re-run the validated prompt at `3840x2160`. Combining 4K with `quality: high` can be slow and expensive.

## Generation Parameters

| Parameter | Type | Recommended value | Notes |
|---|---|---|---|
| `model` | string | `gpt-image-2` | Required. The snapshot `gpt-image-2-2026-04-21` is also valid. |
| `prompt` | string | detailed natural language | Required. Include subject, environment, camera, style, lighting, and constraints. |
| `n` | number | `1` | Number of images. Prefer single-image requests for retry and billing attribution. |
| `size` | string | `1024x1024`, `1536x1024`, `3840x2160` | Flexible sizes are supported when they satisfy the model constraints. |
| `quality` | string | `low`, `medium`, `high`, `auto` | Use `low` for drafts, `medium` for normal output, `high` for final assets. |
| `output_format` | string | `png`, `jpeg`, `webp` | Default is usually `png`; use `jpeg` for latency-sensitive outputs. |
| `output_compression` | number | `0-100` | Only applies to `jpeg` and `webp`. |
| `background` | string | `auto`, `opaque` | `gpt-image-2` currently does not support `transparent`. |
| `moderation` | string | `auto`, `low` | Adjusts filtering level but does not bypass safety policy. |
| `stream` | boolean | `false` | Set to `true` to enable SSE image streaming. |
| `partial_images` | number | `0-3` | Streaming only; partial images increase output token cost. |
| `user` | string | end-user ID | Useful for audit and abuse monitoring. |

## Size Constraints

`size` can be `auto` or a valid `widthxheight` value:

- Maximum edge length is `3840px`.
- Width and height must both be multiples of `16px`.
- Long edge to short edge ratio must be at most `3:1`.
- Total pixels must be between `655,360` and `8,294,400`.

Common values:

```text
1024x1024
1536x1024
1024x1536
2048x2048
2048x1152
3840x2160
2160x3840
auto
```

Treat outputs larger than `2560x1440` as experimental high-pixel workloads with higher latency, higher cost, and higher failure probability.

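The size rules in this section can be checked client-side before a request is sent, which avoids paying latency for a `400 invalid_request_error`. The following is a minimal sketch that assumes the four listed constraints are the complete rule set; `validate_size` and the constant names are hypothetical helpers, not part of any SDK.

```python
MAX_EDGE = 3840          # maximum edge length in pixels
EDGE_MULTIPLE = 16       # width and height must be multiples of this
MAX_RATIO = 3            # long edge : short edge must be at most 3:1
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400


def validate_size(size: str) -> list[str]:
    """Return a list of constraint violations; an empty list means the size passes."""
    if size == "auto":
        return []
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        return [f"expected 'auto' or WIDTHxHEIGHT, got {size!r}"]
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    problems = []
    if max(width, height) > MAX_EDGE:
        problems.append(f"edge length exceeds {MAX_EDGE}px")
    if width % EDGE_MULTIPLE or height % EDGE_MULTIPLE:
        problems.append(f"width and height must be multiples of {EDGE_MULTIPLE}")
    if max(width, height) > MAX_RATIO * min(width, height):
        problems.append(f"aspect ratio exceeds {MAX_RATIO}:1")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append(f"total pixels must be between {MIN_PIXELS} and {MAX_PIXELS}")
    return problems
```

Returning every violation instead of stopping at the first one lets a caller show all problems with a rejected size in one message.
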
## Response Shape

Typical response:

```json
{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}
```

Production systems should store:

- `model`
- `size`
- `quality`
- `output_format`
- `usage.total_tokens`
- `usage.input_tokens`
- `usage.output_tokens`
- latency
- upstream account, group, user, and key identifiers

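One way to capture the fields listed above is to flatten each response into a single record before writing it to a log or billing table. This is a sketch under the assumption that the response matches the typical shape shown in this section; `build_usage_record` and the identifier keyword names are hypothetical.

```python
from typing import Any


def build_usage_record(
    response: dict[str, Any],
    latency_seconds: float,
    **identifiers: str,
) -> dict[str, Any]:
    """Flatten an Images API response into the fields worth storing."""
    # Tolerate a missing or null usage object rather than failing the request.
    usage = response.get("usage") or {}
    record: dict[str, Any] = {
        "model": response.get("model"),
        "size": response.get("size"),
        "quality": response.get("quality"),
        "output_format": response.get("output_format"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "latency_seconds": round(latency_seconds, 3),
    }
    # Caller-supplied identifiers, e.g. account, group, user, and key IDs.
    record.update(identifiers)
    return record
```

A caller would measure latency around the HTTP call and pass its own identifiers, for example `build_usage_record(body, elapsed, account_id="acct-1", key_id="key-9")`. The base64 image data is deliberately left out of the record.
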
## Image Editing

### Single-image Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json
```

### Masked Local Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json
```

Mask requirements:

- `image` and `mask` must have the same format and dimensions.
- Files must be under 50MB.
- `mask` must include an alpha channel.
- Do not pass `input_fidelity` for `gpt-image-2`; the model processes image inputs at high fidelity by default.

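The mask requirements can be checked locally before uploading. The sketch below uses only the standard library and assumes both files are PNG, which the guide's examples use but do not mandate. It also assumes "50MB" means 50 × 1024 × 1024 bytes. It detects alpha only through the PNG color type (4 or 6), so a palette PNG that carries transparency in a `tRNS` chunk would be reported as having no alpha. `check_mask` and `read_png_header` are hypothetical helper names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_FILE_BYTES = 50 * 1024 * 1024   # assumption: "50MB" read as 50 MiB
ALPHA_COLOR_TYPES = {4, 6}          # grayscale+alpha, RGBA


def read_png_header(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from the IHDR chunk of PNG bytes."""
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type


def check_mask(image: bytes, mask: bytes) -> list[str]:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    for name, blob in (("image", image), ("mask", mask)):
        if len(blob) >= MAX_FILE_BYTES:
            problems.append(f"{name} must be under 50MB")
    try:
        image_w, image_h, _ = read_png_header(image)
        mask_w, mask_h, mask_color_type = read_png_header(mask)
    except ValueError:
        problems.append("image and mask must both be PNG files")
        return problems
    if (image_w, image_h) != (mask_w, mask_h):
        problems.append("image and mask dimensions differ")
    if mask_color_type not in ALPHA_COLOR_TYPES:
        problems.append("mask has no alpha channel")
    return problems
```

Typical use is `check_mask(open("input.png", "rb").read(), open("mask.png", "rb").read())`, sending the request only when the result is empty.
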
## Responses API With `image_generation`

Use this when an agent should reason about the task before generating an image. The main model should be a text/agent model, such as `gpt-5.5`.

```bash
curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json
```

Important:

- `model` is the main reasoning model, not `gpt-image-2`.
- The `image_generation` tool performs the image work.
- sub2api may inject the image tool for official Codex clients, but application calls should pass it explicitly.

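The image in a Responses API result is not at `data[0].b64_json`. The sketch below assumes the gateway passes through the upstream Responses format, in which generated images appear as `output` items of type `image_generation_call` with base64 data in a `result` field. That item shape is not described in this guide, so verify it against a real `response-image.json` before relying on it. `extract_images` is a hypothetical helper name.

```python
import base64
from typing import Any


def extract_images(response: dict[str, Any]) -> list[bytes]:
    """Collect decoded image bytes from a Responses API result."""
    images = []
    for item in response.get("output", []):
        # Assumed shape: {"type": "image_generation_call", "result": "<base64>"}
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

With the file written by the `curl` example, usage would look like `extract_images(json.load(open("response-image.json")))`, followed by writing each entry to disk. An empty list means the model answered without calling the tool.
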
## Streaming Images

The Images API supports SSE streaming:

```bash
curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'
```

Events:

```text
image_generation.partial_image
image_generation.completed
```

`partial_images` can be `0-3`. Each partial image adds output token cost.

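A client has to split the stream into events before it can tell partial images from the final one. The following sketch parses standard SSE framing (`event:` and `data:` lines, with a blank line ending each event) and yields the event name with its JSON payload. It assumes each `data:` payload is JSON and, when no `event:` line is sent, falls back to a `type` field in the payload. Both are assumptions, since this guide does not specify the payload fields. The `[DONE]` check is defensive and may never trigger on this endpoint.

```python
import json
from itertools import chain
from typing import Any, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, payload) pairs from decoded SSE lines."""
    event_name = ""
    data_parts: list[str] = []
    # The extra "" flushes a final event if the stream lacks a trailing blank line.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line == "":
            if data_parts:
                data = "\n".join(data_parts)
                if data != "[DONE]":
                    payload = json.loads(data)
                    fallback = payload.get("type", "") if isinstance(payload, dict) else ""
                    yield event_name or fallback, payload
            event_name, data_parts = "", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
```

A caller would feed it the decoded response lines and branch on the name: show a preview for `image_generation.partial_image` and save the result for `image_generation.completed`.
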
## SDK Examples

### Node.js

```ts
import fs from "node:fs";
import OpenAI from "openai";

// Point the official SDK at the gateway instead of the default OpenAI endpoint.
const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

// The image arrives as base64 text; decode it before writing to disk.
const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");
fs.writeFileSync("image.png", Buffer.from(b64, "base64"));
```

### Python

```py
import base64
import os

from openai import OpenAI

# Point the official SDK at the gateway instead of the default OpenAI endpoint.
client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

# The image arrives as base64 text; fail loudly if the gateway returned none.
b64 = result.data[0].b64_json if result.data else None
if not b64:
    raise RuntimeError("No image returned")

with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))
```

## Production Dispatch

- Routing: prefer plus/team/pro OpenAI OAuth accounts for image workloads.
- Timeout: use 120 seconds for normal images and 300 seconds for 4K.
- Retry: retry only transient network failures and 502/503/504 responses, and keep retry counts low.
- Concurrency: 4K output produces many image tokens; use low per-account concurrency. Standard 1024 images can use higher concurrency.
- Billing: record `usage` and charge based on input and output tokens. 4K can produce far more output tokens than 1024 images.
- Latency: use `jpeg` and `quality: low` for drafts or latency-sensitive previews.
- Fallback: if `4K/high` fails, retry `4K/medium`; if that still fails, generate `1536x1024/medium` and upscale separately.

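The timeout, retry, and fallback rules above can be expressed as three small pure functions that a dispatcher calls around each request. This is a sketch, not the sub2api implementation. The cap of two attempts is an assumed reading of "low retry counts", and the portrait fallback to `1024x1536` is an added assumption, since the guide only names `1536x1024`. All function and constant names are hypothetical.

```python
from typing import Optional

RETRYABLE_STATUS = {502, 503, 504}
FOUR_K_SIZES = {"3840x2160", "2160x3840"}


def timeout_seconds(size: str) -> int:
    """120 seconds for normal images, 300 seconds for 4K."""
    return 300 if size in FOUR_K_SIZES else 120


def should_retry(status: Optional[int], attempt: int, max_attempts: int = 2) -> bool:
    """Decide whether to retry; status None means a transient network failure."""
    if attempt >= max_attempts:  # assumption: "low retry counts" means 2 attempts
        return False
    return status is None or status in RETRYABLE_STATUS


def fallback_plan(size: str, quality: str) -> list[tuple[str, str]]:
    """Return the ordered (size, quality) combinations to try for one request."""
    plan = [(size, quality)]
    if size in FOUR_K_SIZES:
        if quality == "high":
            plan.append((size, "medium"))
        # Last resort: generate smaller and upscale in a separate step.
        # Assumption: keep portrait orientation for portrait 4K requests.
        smaller = "1536x1024" if size == "3840x2160" else "1024x1536"
        plan.append((smaller, "medium"))
    return plan
```

Keeping these as pure functions makes the policy easy to unit-test separately from the HTTP client, and `400`, `401`, and `429` responses are never retried because they are not in `RETRYABLE_STATUS`.
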
## Common Errors

| Symptom | Likely cause | Action |
|---|---|---|
| `401 INVALID_API_KEY` | Key is not a sub2api key or is disabled/deleted | Generate a new key from `/keys` |
| `400 invalid_request_error` | Incompatible params such as transparent background or invalid size | Check `size`, `background`, and `quality` |
| `429 usage_limit_reached` | Upstream account usage window hit | Switch plus/team/pro account or wait for reset |
| `502 Upstream request failed` | Upstream did not return image data, network failed, or content was refused | Inspect server logs, simplify prompt, lower quality or size |
| Request takes over 2 minutes | High pixels or complex prompt | Increase timeout, use streaming, or test lower resolution first |
| `/v1/models` does not show `gpt-image-2` | Codex/text model list is not the Images API capability list | Call `/v1/images/generations` directly |

## Safety Boundary

Filter clearly disallowed content before sending requests, especially:

- Sexualized minors or young-looking subjects
- Non-consensual sexual content, coercion, or sexual violence
- Explicit nudity or graphic sexual activity
- Illegal, hateful, or extremely violent content

For safe romantic scenes, explicitly constrain prompts with terms such as adult, non-explicit, no nudity, and fully clothed.