# GPT Image 2 API Guide

This guide describes how to call `gpt-image-2` through sub2api or any OpenAI-compatible gateway.

Default examples use:

```text
BASE_URL=https://claude.omniclaw.store/v1
API_KEY=
```

Do not use ChatGPT OAuth tokens from `.codex/auth.json` as API keys.

## Quick Summary

- Direct image generation: call `POST /v1/images/generations` with `model: "gpt-image-2"`.
- Image editing: call `POST /v1/images/edits` with multipart `image[]` files and an optional `mask`.
- Agent/Codex workflows: keep the main model as a text/agent model such as `gpt-5.5`, then call image generation through the Responses API `image_generation` tool.
- Do not use `gpt-image-2` as the Codex main model.
- `gpt-image-2` normally returns base64 image data at `data[0].b64_json`.
- `3840x2160` 4K output works but is high-latency and high-cost; use 180-300 second timeouts for production.

## Official Capability Summary

`gpt-image-2` is an image generation and editing model with text input, image input, and image output support.

Model aliases:

```text
gpt-image-2
gpt-image-2-2026-04-21
```

Supported API surfaces:

```text
/v1/images/generations
/v1/images/edits
/v1/responses  # via image_generation tool
```

Official references:

- https://developers.openai.com/api/docs/models/gpt-image-2
- https://developers.openai.com/api/docs/guides/image-generation
- https://developers.openai.com/api/reference/resources/images

## Authentication

```bash
export BASE_URL="https://claude.omniclaw.store/v1"
export API_KEY="sk-..."
```

JSON requests require:

```http
Authorization: Bearer $API_KEY
Content-Type: application/json
```

For multipart image edits, let `curl -F` or the SDK set `Content-Type`.

## Image Generation

### Minimal Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A compact Apple-style dashboard UI, clean white background",
    "size": "1024x1024",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image.json
```

Decode the response:

```bash
jq -r '.data[0].b64_json' image.json | base64 --decode > image.png
```

### 4K Request

```bash
curl -sS "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 300 \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A modern product poster, cinematic lighting, premium realistic photography",
    "size": "3840x2160",
    "quality": "medium",
    "output_format": "png",
    "n": 1
  }' > image-4k.json
```

Production recommendation: first validate prompts with `1024x1024` or `1536x1024`, then upscale the request to `3840x2160`. `4K + high` can be slow and expensive.

## Generation Parameters

| Parameter | Type | Recommended value | Notes |
|---|---|---|---|
| `model` | string | `gpt-image-2` | Required. The snapshot `gpt-image-2-2026-04-21` is also valid. |
| `prompt` | string | detailed natural language | Required. Include subject, environment, camera, style, lighting, and constraints. |
| `n` | number | `1` | Number of images. Prefer single-image requests for retry and billing attribution. |
| `size` | string | `1024x1024`, `1536x1024`, `3840x2160` | Flexible sizes are supported when they satisfy the model constraints. |
| `quality` | string | `low`, `medium`, `high`, `auto` | Use `low` for drafts, `medium` for normal output, `high` for final assets. |
| `output_format` | string | `png`, `jpeg`, `webp` | Default is usually `png`; use `jpeg` for latency-sensitive outputs. |
| `output_compression` | number | `0-100` | Only applies to `jpeg` and `webp`. |
| `background` | string | `auto`, `opaque` | `gpt-image-2` currently does not support `transparent`. |
| `moderation` | string | `auto`, `low` | Adjusts filtering level but does not bypass safety policy. |
| `stream` | boolean | `false` | Enables SSE image streaming. |
| `partial_images` | number | `0-3` | Streaming only; partial images increase output token cost. |
| `user` | string | end-user ID | Useful for audit and abuse monitoring. |

## Size Constraints

`size` can be `auto` or a valid `widthxheight` value:

- Maximum edge length is `3840px`.
- Width and height must both be multiples of `16px`.
- Long edge to short edge ratio must be at most `3:1`.
- Total pixels must be between `655,360` and `8,294,400`.

Common values:

```text
1024x1024
1536x1024
1024x1536
2048x2048
2048x1152
3840x2160
2160x3840
auto
```

Treat outputs larger than `2560x1440` as experimental high-pixel workloads with higher latency, higher cost, and higher failure probability.

## Response Shape

Typical response:

```json
{
  "created": 1770000000,
  "background": "opaque",
  "data": [
    {
      "b64_json": "...",
      "revised_prompt": "..."
    }
  ],
  "model": "gpt-image-2",
  "output_format": "png",
  "quality": "medium",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 43,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 43
    },
    "output_tokens": 196,
    "output_tokens_details": {
      "image_tokens": 196,
      "text_tokens": 0
    },
    "total_tokens": 239
  }
}
```

Production systems should store:

- `model`
- `size`
- `quality`
- `output_format`
- `usage.total_tokens`
- `usage.input_tokens`
- `usage.output_tokens`
- latency
- upstream account, group, user, and key identifiers

## Image Editing

### Single-image Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "prompt=Replace the sofa with a minimalist white lounge chair" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  -F "output_format=png" \
  > edit.json
```

### Masked Local Edit

```bash
curl -sS "$BASE_URL/images/edits" \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=gpt-image-2" \
  -F "image[]=@input.png" \
  -F "mask=@mask.png" \
  -F "prompt=Change only the transparent masked region into a glass button" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  > edit-mask.json
```

Mask requirements:

- `image` and `mask` must have the same format and dimensions.
- Files must be under 50MB.
- `mask` must include an alpha channel.
- Do not pass `input_fidelity` for `gpt-image-2`; the model processes image inputs at high fidelity by default.

## Responses API With `image_generation`

Use this when an agent should reason about the task before generating an image. The main model should be a text/agent model, such as `gpt-5.5`.

```bash
curl -sS "$BASE_URL/responses" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Generate a clean product poster for an AI proxy service.",
    "tools": [
      {
        "type": "image_generation",
        "quality": "medium",
        "size": "1536x1024",
        "output_format": "png"
      }
    ]
  }' > response-image.json
```

Important:

- `model` is the main reasoning model, not `gpt-image-2`.
- The `image_generation` tool performs the image work.
- sub2api may inject the image tool for official Codex clients, but application calls should pass it explicitly.

## Streaming Images

The Images API supports SSE streaming:

```bash
curl -N "$BASE_URL/images/generations" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic city skyline at sunrise",
    "stream": true,
    "partial_images": 2,
    "size": "1536x1024",
    "quality": "medium"
  }'
```

Events:

```text
image_generation.partial_image
image_generation.completed
```

`partial_images` can be `0-3`. Each partial image adds output token cost.

## SDK Examples

### Node.js

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.API_KEY,
  baseURL: process.env.BASE_URL ?? "https://claude.omniclaw.store/v1",
});

const result = await client.images.generate({
  model: "gpt-image-2",
  prompt: "A premium product poster for an AI service",
  size: "1536x1024",
  quality: "medium",
  output_format: "png",
  n: 1,
});

const b64 = result.data?.[0]?.b64_json;
if (!b64) throw new Error("No image returned");

fs.writeFileSync("image.png", Buffer.from(b64, "base64"));
```

### Python

```py
import base64
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url=os.environ.get("BASE_URL", "https://claude.omniclaw.store/v1"),
)

result = client.images.generate(
    model="gpt-image-2",
    prompt="A premium product poster for an AI service",
    size="1536x1024",
    quality="medium",
    output_format="png",
    n=1,
)

b64 = result.data[0].b64_json
with open("image.png", "wb") as f:
    f.write(base64.b64decode(b64))
```

## Production Dispatch

- Routing: prefer plus/team/pro OpenAI OAuth accounts for image workloads.
- Timeout: use 120 seconds for normal images and 300 seconds for 4K.
- Retry: only retry transient network failures and 502/503/504 with low retry counts.
- Concurrency: 4K output produces many image tokens; use low per-account concurrency. Standard 1024 images can use higher concurrency.
- Billing: record `usage` and charge based on input and output tokens. 4K can produce far more output tokens than 1024 images.
- Latency: use `jpeg` and `quality: low` for drafts or latency-sensitive previews.
- Fallback: if `4K/high` fails, retry `4K/medium`; if that still fails, generate `1536x1024/medium` and upscale separately.

## Common Errors

| Symptom | Likely cause | Action |
|---|---|---|
| `401 INVALID_API_KEY` | Key is not a sub2api key or is disabled/deleted | Generate a new key from `/keys` |
| `400 invalid_request_error` | Incompatible params such as transparent background or invalid size | Check `size`, `background`, and `quality` |
| `429 usage_limit_reached` | Upstream account usage window hit | Switch plus/team/pro account or wait for reset |
| `502 Upstream request failed` | Upstream did not return image data, network failed, or content was refused | Inspect server logs, simplify prompt, lower quality or size |
| Request takes over 2 minutes | High pixels or complex prompt | Increase timeout, use streaming, or test lower resolution first |
| `/v1/models` does not show `gpt-image-2` | Codex/text model list is not the Images API capability list | Call `/v1/images/generations` directly |

## Safety Boundary

Filter clearly disallowed content before sending requests, especially:

- Sexualized minors or young-looking subjects
- Non-consensual sexual content, coercion, or sexual violence
- Explicit nudity or graphic sexual activity
- Illegal, hateful, or extreme violent content

For safe romantic scenes, explicitly constrain prompts with terms such as adult, non-explicit, no nudity, and fully clothed.