Hey everyone — I'm thrilled to share that I'm joining OpenAI. My mission here is to bring agents to everyone, including the people who've never touche
Good news! We're increasing rate limits across all usage tiers effective immediately. | Tier | Previous RPM | New RPM | |||| | Free | 3 | 10 | | Tier
We're launching the Realtime API beta — enabling lowlatency, speechtospeech experiences with GPT4o. Features Natural speech input and output no se
We're excited to announce GPT4o mini, our most costefficient small model. It's priced at 15 cents per million input tokens and 60 cents per million ou
We've launched Structured Outputs — a new feature that ensures modelgenerated outputs exactly match JSON Schemas you provide. How it works Set resp
The Assistants API v2 is now generally available! This release includes significant improvements based on your feedback. What's new Streaming supp
I've been working with the Assistants API's file search tool and wanted to share some performance tips after uploading ~500 documents. Tips 1. Chun
Whisper doesn't natively support speaker diarization, but here's my pipeline that combines Whisper with pyannote for speaker identification: python f
Our company network requires all outbound traffic to go through a corporate proxy. I'm struggling to configure the OpenAI Python SDK to work with it.
We're seeing a memory leak in our longrunning service that uses streaming responses. After ~1000 streaming requests, memory usage grows from 200MB to
I'm trying to finetune a model to be better at function calling for my specific use case but I'm struggling with the training data format. The docs s
I spent a week getting streaming to work properly with the Assistants API, especially with tool calls. Here's a complete implementation: python from
When building chatbots, you eventually hit the context window limit. Here are strategies I've used: 1. Sliding window Keep the last N messages. Simp
I'm building a RAG system for a startup and trying to decide between textembedding3small and textembedding3large. The cost difference is 5x. My use c
Pure vector search misses keywordheavy queries. I've implemented hybrid search vector + BM25 and the improvement is significant. Architecture 1. St
I set temperature: 0 for reproducible outputs but I'm still getting slightly different responses for the exact same prompt. Is this expected? python
I built AI Dungeon Master — a textbased RPG powered by GPT4o with persistent world state and character memory. Key features: Persistent world state
Training loss going down doesn't mean your finetuned model is actually better. I've learned this the hard way. Evaluation framework I use 1. Tasksp
As more companies ship AIgenerated content to end users, I want to discuss the ethical considerations: 1. Disclosure: Should users always know they'r
I'm confused about the difference between max_tokens and max_completion_tokens in the API. The docs mention both but I'm not sure when to use which.
We ran the numbers on finetuning GPT4omini vs using fewshot prompting with GPT4o for our classification task 10K requests/day. Option A: Fewshot GPT
Is it possible to have multiple assistants respond in the same thread? I want to build a multiagent system where: 1. Assistant A researcher searches
I've run both the Assistants API with file search and a custom RAG pipeline LangChain + Pinecone + GPT4o for the same use case: customer support over
I built FridgeChef — take a photo of your fridge contents and get recipe suggestions! Uses GPT4o vision to identify ingredients, then generates recipe
The textembedding3large model supports reducing dimensions via the dimensions parameter. I tested how this affects retrieval quality. Benchmark resu
I built openaistructured — a thin Python wrapper that makes working with Structured Outputs feel like working with Pydantic models. python from opena
Has anyone dealt with embedding drift when OpenAI updates the embedding model? I have 2M vectors stored in Pinecone generated with textembedding3large
I finetuned GPT4omini to generate SQL queries for our specific database schema and the results are impressive. Sharing my approach. Dataset 3,200 q
I'm building a ChatGPT plugin that connects to Notion and lets users query their workspace. The OAuth flow was the trickiest part. The plugin can: S
I've spent months refining system prompts for consistent outputs across GPT4o. Here's what I've learned: Do's Put format instructions at the END of
I'm trying to understand how prompt caching affects my token usage and billing. My system prompt is ~4000 tokens and I'm making thousands of calls per
When using response_format: { type: "json_object" } with deeply nested schemas, GPT4o sometimes returns invalid JSON. This doesn't happen with Structu
We're running OpenAI API calls in a production microservices architecture and need to implement key rotation. Currently we have a single API key hardc
I've been a traditional ML engineer for 6 years scikitlearn, PyTorch, classical NLP and want to transition to LLMfocused roles. The job market seems t
I'm trying to use the image edit endpoint to replace specific parts of an image using a mask, but the results are ignoring my mask area. python respo
We're deploying a GPT4o powered system for medical document summarization and hallucinations are our biggest concern. Even with retrieval augmentation
I'm using the TTS API to generate audiobookstyle narration but struggling with: 1. Long texts getting cut off seems to have a character limit 2. No n
I'm using streaming with the Chat Completions API and noticing that some chunks are being dropped, resulting in incomplete responses. This happens abo
Let's share our monthly API costs to help others estimate their budgets! I'll start: My usage indie SaaS: GPT4o: $340/month ~2M tokens/day GPT4omin
With the new Responses API, I wanted to document my migration experience from Chat Completions. Key differences 1. Input format is simplified — no
Tip for anyone running OpenAI API calls in production: always log the request ID from response headers. python response = client.chat.completions.wit
When using Structured Outputs with a recursive JSON schema e.g., a tree structure where nodes can contain child nodes, the API enters what seems like
Two years ago, "prompt engineer" was a hot job title. Now I'm seeing fewer dedicated prompt engineering roles and more "AI engineer" or "LLM engineer"
I've been building a RAG system that handles both text and images for a manufacturing client. Their documentation includes diagrams, flowcharts, and p
I just finished processing 1.2 million legal documents through the Batch API and wanted to share some lessons learned: What worked Batch API's 50%
Curious what everyone's full AI stack looks like in 2026. Here's ours: LLM: GPT4o primary, GPT4omini highvolume, o1 complex reasoning Embeddings: tex
After uploading PDFs to a vector store, the file search tool returns empty results even for queries that should clearly match the document content. S
Starting around 9am EST today, we're seeing intermittent 500 errors on gpt4o completions. About 20% of requests fail. Error response: json { "error
I'm using tiktoken to estimate costs before making API calls, but my count consistently differs from what the API reports. python import tiktoken en
We're opensourcing AgentDesk — our framework for building AI customer support agents with the OpenAI API. Features: Multiturn conversation managemen
I've been comparing DALLE 3 standard vs HD quality for product mockups. The HD option costs 2x more $0.080 vs $0.040 per image at 1024x1024. After ge
OpenAI recently added Direct Preference Optimization DPO to the finetuning API. I've been testing it for preference alignment and here are my first im
I've published 3 custom GPTs in the GPT Store. One has 50K+ conversations. Sharing what I've learned about visibility. What worked 1. Clear, keywor
Proper retry logic is essential for production OpenAI API usage. Here's my battletested implementation: python import time import random from openai
I built ReviewBot — a GitHub App that provides realtime code review comments on pull requests using GPT4o. It's been running on our team's repos for 3
I built a custom GPT with an action that queries my database, but it times out almost every time. The action calls my API which typically responds in
I'm building a chatbot using the Assistants API and some messages are disappearing from the thread after a run completes. The user message is there, b
I've been testing o1preview for researchlevel reasoning tasks and comparing it with GPT4o. The cost difference is significant $15/M input for o1 vs $2
I'm trying to use GPT4o's vision capabilities to extract structured data from photos of printed tables invoices, receipts, etc. The accuracy is decent
After migrating from gpt3.5turbo to gpt4omini, I'm seeing a massive spike in 429 rate limit errors even though my request volume hasn't changed. My s
When using function calling with GPT4o, the model sometimes returns multiple tool_calls in a single response for parallel execution. I'm struggling wi
I'm using DALLE 3 to generate marketing images and struggling to maintain consistent brand style across generations. Each image looks completely diffe
I've benchmarked the Whisper API against running whisperlargev3 locally for our podcast transcription service. Here are the results. Test setup 100
I'm finetuning GPT4omini on a customer service dataset 5000 examples and seeing performance peak at epoch 23, then degrade significantly. Training m
About 35% of my function calling requests to gpt4o return malformed JSON in the arguments field. This is causing production issues. Example of a brok
I've been running benchmarks comparing GPT4o and GPT4 Turbo on code generation tasks. Here are my findings from 500 test cases across Python, TypeScri
I've been extensively testing different chunking strategies with textembedding3large for RAG and wanted to share my findings. Strategies tested 1.
