If you're building an AI agent or analyzing data in Python, your JSON needs go a bit deeper than just saving files. You're probably parsing API responses, sending requests with payloads, handling nested JSON, maybe even streaming or validating data.
Take a look at this AI/data analyst JSON cheat sheet (Python edition).
Load API response into Python dict
```python
import json
import requests

response = requests.get("https://api.example.com/data")
data = response.json()  # already parsed to Python dict
```

If you get raw JSON text instead:

```python
data = json.loads(response.text)
```
Extract nested values (basic & smart ways)
Basic:

```python
value = data["person"]["location"]["city"]
```

Safe version:

```python
value = data.get("person", {}).get("location", {}).get("city")
```
With jsonpath-ng (query-style access):

```bash
pip install jsonpath-ng
```

```python
from jsonpath_ng import parse

expr = parse("$.person.location.city")
matches = [match.value for match in expr.find(data)]
```
Handle dynamic or unknown JSON structure
```python
def walk_json(data):
    if isinstance(data, dict):
        for key, value in data.items():
            walk_json(value)
    elif isinstance(data, list):
        for item in data:
            walk_json(item)
    else:
        print(data)
```
Send JSON in an API request (POST)
```python
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Write a poem about tacos"}],
    "temperature": 0.7
}
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json=payload,
)
```
Stream JSON responses (large LLM output or logs)
Example using response.iter_lines():

```python
with requests.post(url, headers=headers, json=payload, stream=True) as r:
    for line in r.iter_lines():
        if line:
            json_line = json.loads(line.decode("utf-8"))
            print(json_line)
```
Flatten nested JSON (for DataFrame)
```python
from pandas import json_normalize

# Nested JSON example:
data = {
    "user": {
        "id": 1,
        "name": "Beyoncé",
        "location": {"city": "Houston", "state": "TX"}
    }
}
flat = json_normalize(data)
```
Works with lists of records too.
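For instance, a list of record dicts flattens into one row per record, with dotted column names (the data here is illustrative):

```python
from pandas import json_normalize

# A list of records, each with nested structure
records = [
    {"user": {"id": 1, "location": {"city": "Houston"}}},
    {"user": {"id": 2, "location": {"city": "Bridgetown"}}},
]
flat = json_normalize(records)
# Columns become dotted paths: "user.id", "user.location.city"
```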
Convert DataFrame → JSON (for saving or API use)

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "Beyoncé", "age": 42},
    {"name": "Rihanna", "age": 36}
])
# Convert to JSON string
json_string = df.to_json(orient="records", indent=2)
```
Validate if a string is valid JSON
```python
def is_json(string):
    try:
        json.loads(string)
        return True
    except ValueError:
        return False
```
Save model outputs (LLM, predictions, etc.) to JSON
```python
outputs = [{"prompt": "hi", "response": "hey there"}]
with open("outputs.json", "w") as f:
    json.dump(outputs, f, indent=2)
```
Load logs/configs for your agent
```python
with open("config.json") as f:
    config = json.load(f)

agent_name = config.get("agent_name", "DefaultAgent")
```
Handle Decimal or datetime
```python
from decimal import Decimal
from datetime import datetime

data = {
    "score": Decimal("92.4"),
    "timestamp": datetime.now()
}

def custom(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("Unhandled type")

json.dumps(data, default=custom)
```
Parse JSON from a file full of JSON objects (LLM logs)
```python
with open("logs.jsonl") as f:
    for line in f:
        data = json.loads(line)
        print(data["output"])
```
Advanced query/validation (with jsonschema):

```bash
pip install jsonschema
```

```python
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"}
    },
    "required": ["name"]
}
data = {"name": "Beyoncé", "age": 42}

try:
    validate(data, schema)
except ValidationError as e:
    print("Invalid:", e)
```
First, a quick mindset shift: when you're working with JSON as an AI agent builder, you're not just reading and writing files. You're…
- Talking to APIs (input/output = JSON)
- Logging predictions
- Saving user settings
- Parsing chat history
- Streaming data
- Building dynamic prompts or chains
- Extracting or validating structured responses
Let's unpack that real quick, huh?
1. Talking to APIs (input/output = JSON)
Whenever your agent hits the OpenAI API (or any AI/ML API), you're sending and receiving JSON.
Example:
```json
{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Summarize this..."}]
}
```
You'll either handcraft these payloads or auto-build them from user inputs.
2. Logging predictions
Anytime your agent gives a response, you'll want to save what was input, what came out, what the timestamp was, maybe confidence scores, and which model/version was used.
Why?
- Reproducibility
- Error fixing
- Comparing responses from different agents
You log this in JSON, often one object per run.
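A minimal sketch of that kind of run log, appending one JSON object per line (the field names are just one reasonable layout):

```python
import json
from datetime import datetime, timezone

def log_prediction(prompt, response, model, path="predictions.jsonl"):
    """Append one JSON object per agent run to a JSONL log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

Appending (mode "a") keeps earlier runs intact, so the log doubles as a history you can replay later.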
3. Saving user settings
You'll start letting users adjust things like:
- Preferred language
- Response length
- Formal vs casual tone
- "Always summarize, never explain"
Instead of using a database at first, you'll just use a .json file per user.
Simple. Works offline. Easy to edit.
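A sketch of that pattern, assuming a settings directory with one file per user (the directory, file layout, and keys are made up for illustration):

```python
import json
from pathlib import Path

DEFAULTS = {"language": "en", "response_length": "short", "tone": "casual"}

def load_settings(user_id, directory="settings"):
    """Load one user's settings file, falling back to defaults for missing keys."""
    path = Path(directory) / f"{user_id}.json"
    if path.exists():
        # Saved values override the defaults
        return {**DEFAULTS, **json.loads(path.read_text())}
    return dict(DEFAULTS)
```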
4. Parsing chat history
Let's say your agent needs context from earlier in the convo, like tools, names, or a goal you set earlier.
Youâll need to track the chat, usually as a list of JSON message objects.
```json
[
  {"role": "user", "content": "Call me Elle."},
  {"role": "assistant", "content": "Hi Elle!"}
]
```
Then you feed this back into GPT for memory/continuity.
5. Streaming data
Sometimes you want to stream a response token-by-token, especially for long outputs.
JSON comes in when:
- You process partial JSON chunks
- You make sure streaming output is still parsable (OpenAI sometimes sends partial JSON)
- You stitch these chunks into a full object on your end
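One way to stitch text fragments back into a full object is to keep appending until the buffer parses (a simplified sketch; real streaming APIs usually wrap each chunk in their own envelope):

```python
import json

def assemble_json(chunks):
    """Accumulate streamed text fragments until the buffer parses as JSON."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            return json.loads(buffer)
        except json.JSONDecodeError:
            continue  # not complete yet, keep accumulating
    raise ValueError("stream ended before JSON was complete")

# Fragments arriving one at a time
obj = assemble_json(['{"summ', 'ary": "AI helps', ' with surgery"}'])
```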
6. Building dynamic prompts or chains
Let's say your agent has 3 steps: extract question → fetch context → answer it.
You'll store "prompt templates" in JSON and dynamically fill in the blanks.
This way you don't hardcode text into your Python.
```json
{
  "step1": "Identify key question in: {{text}}",
  "step2": "Find relevant data for: {{question}}",
  "step3": "Answer based on context: {{context}}"
}
```
7. Extracting or validating structured responses
Let's say GPT gives back:

```json
{"action": "summarize", "target": "last paragraph", "language": "en"}
```
You'll validate:
- Did it return real JSON?
- Does it have the keys you expected?
- Is "action" one of your allowed operations?
This prevents your code from crashing when GPT gets weird.
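Those three checks fit in one small guard function (the allowed actions and expected keys here are illustrative):

```python
import json

ALLOWED_ACTIONS = {"summarize", "translate", "extract"}  # your allowed operations

def check_response(raw):
    """Return the parsed dict if the LLM output passes all checks, else None."""
    try:
        parsed = json.loads(raw)  # did it return real JSON?
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or not {"action", "target"} <= parsed.keys():
        return None  # missing the keys you expected
    if parsed["action"] not in ALLOWED_ACTIONS:
        return None  # not an allowed operation
    return parsed
```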
Wanna squeeze more out of JSON in Python, starting with simplified examples, then ramping up into a focused mini-lab kit for OpenAI agents?
Great!
A. Store & Read Chat Memory (Agent context history)
```python
chat_history = [
    {"role": "user", "content": "Summarize this article"},
    {"role": "assistant", "content": "Sure. Here's a summary..."}
]

with open("chat_log.json", "w") as f:
    json.dump(chat_history, f, indent=2)

# Later, reload it
with open("chat_log.json") as f:
    history = json.load(f)
```
Use: Your agent can pick up where it left off.
B. Merge JSON into prompts dynamically
```python
user_info = {
    "name": "Tate McRae",
    "style": "casual",
    "topic": "how JSON works"
}
prompt = f"Write a blog post for {user_info['name']} in a {user_info['style']} voice about {user_info['topic']}."
```
Use: Compose smarter, personalized prompts using structured data.
C. Extract exact info from LLM JSON output
Let's say your model returns a structured response like:

```json
{"summary": "AI helps with surgery", "confidence": 0.93}
```

```python
response = json.loads(model_output)
print(response["summary"])
```
Use: Don't just print the raw response; grab the piece you need.
D. Keep multiple logs organized by agent run
```python
log = {"run_id": "a1b2c3", "inputs": "Find trends", "outputs": "Here are 3 trends..."}
filename = f"runs/{log['run_id']}.json"
with open(filename, "w") as f:
    json.dump(log, f, indent=2)
```
Use: One JSON file per run. Clean. Searchable. Reproducible.
E. Use JSON to define test cases for your agent
```python
test_cases = [
    {"input": "What's 2+2?", "expected": "4"},
    {"input": "Who is Carlie Hanson?", "expected": "A pop singer"}
]

for case in test_cases:
    output = your_agent(case["input"])
    if output != case["expected"]:
        print(f"FAIL: {case['input']}")
```
Use: Turn test prompts into JSON, keep test logic separate from code.
F. Batch model inputs using JSONL
```json
{"input": "Summarize this..."}
{"input": "What's next in AI?"}
{"input": "Explain JSON in plain English"}
```
Each line = 1 JSON object.
```python
with open("batch.jsonl") as f:
    for line in f:
        task = json.loads(line)
        print("Processing:", task["input"])
```
Use: Scalable bulk processing. Streamlined agent input.
G. Use JSON to inject tools or agent settings
```python
tools = {
    "tools": ["summarizer", "translator", "sentiment_analyzer"],
    "defaults": {
        "language": "en",
        "max_tokens": 1000
    }
}
agent_config = json.dumps(tools, indent=2)
```
Use: Your agent reads this as part of its setup or self-instruction.
H. Trick: JSON within JSON for nested agents
You can save the actual full prompt or response history as a string inside a JSON:
```python
task = {
    "name": "long_summary",
    "raw_prompt": json.dumps(chat_history),
    "result": "It's a summary of five pages..."
}
```
Use: When nesting structured agent flows.
Part 2: Stuff no one warns you about but you'll 100% be doing
Pre-validating the prompt as a JSON-safe string
Before you stick something into a JSON payload, you'll:
- Escape quotes
- Strip bad characters
- Truncate long strings
- Catch encoding bugs
Otherwise, OpenAI rejects your request or gives bad output.
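In practice json.dumps already escapes quotes and control characters for you, so the manual work is mostly stripping and truncating (a sketch with an arbitrary length cap):

```python
import json

def json_safe(text, max_len=2000):
    """Clean and truncate text before embedding it in a JSON payload."""
    # Drop stray surrogates and other unencodable characters
    text = text.encode("utf-8", errors="ignore").decode("utf-8")
    # Truncate long strings
    text = text[:max_len]
    # json.dumps escapes quotes, backslashes, and newlines for us
    return json.dumps(text)

fragment = json_safe('He said "hi"\nthen left')
```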
Auto-cleaning malformed JSON from LLM output
Sometimes GPT says it'll return valid JSON… then doesn't.
You'll need little auto-fixers like:
- Add missing closing brackets
- Replace single quotes with double quotes
- Strip out rogue \n\n before {
This is the dirty job that saves your app.
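A naive fixer along those lines (pure heuristics, so it won't repair every malformed output, and the quote swap can mangle text containing apostrophes):

```python
import json

def try_fix_json(raw):
    """Apply small repairs before parsing suspect LLM output."""
    text = raw.strip()
    # Strip rogue text (including \n\n) before the first brace
    start = text.find("{")
    if start > 0:
        text = text[start:]
    # Replace single quotes with double quotes (crude heuristic)
    if "'" in text and '"' not in text:
        text = text.replace("'", '"')
    # Add missing closing brackets
    text += "}" * (text.count("{") - text.count("}"))
    return json.loads(text)
```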
Building JSON from scratch, line-by-line
When debugging or prototyping, you'll literally build JSON yourself:

```python
out = {"step": 1}
if user_input:
    out["user_said"] = user_input
if summary:
    out["summary"] = summary
```

Way safer than dumping a whole LLM response into one json.loads() and praying.
Creating your own mini-specs
Eventually, you'll invent your own rules for agent outputs:

```json
{
  "action": "summarize",
  "priority": "high",
  "language": "en",
  "style": "playful"
}
```
You'll document this in a comment or README so other devs know how to format things for your agent. It's like API design but micro.
Compressing big JSONs to keep under token limits
You'll often be like, "Damn, this whole chain + memory + context is too long."
So you'll:
- Remove unnecessary keys
- Truncate text fields
- Convert verbose JSON to compact mode
Useful when passing context into prompts.
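The "compact mode" part is just dropping whitespace with json.dumps's separators argument (the context dict here is illustrative):

```python
import json

context = {"summary": "AI helps with surgery", "tags": ["health", "ai"], "debug_info": "..."}

# Remove unnecessary keys before passing as context
context.pop("debug_info", None)

# Compact encoding: no spaces after commas or colons
compact = json.dumps(context, separators=(",", ":"))
```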
Creating fallback values for missing JSON fields
Agent asks "What's their preferred tone?"
GPT forgot to include it in the JSON? You'll use:

```python
tone = parsed.get("tone", "neutral")
```
This is one of those boring-but-vital things that keeps agents from breaking.
Parsing GPT output that has JSON inside a string
Sometimes GPT returns this:

```json
{"result": "{\"topic\": \"race cars\", \"angle\": \"for beginners\"}"}
```

So you'll do:

```python
outer = json.loads(raw)
inner = json.loads(outer["result"])
```
That double-layer is sneakily common with nested agents.
Storing tiny caches or lookups
Don't want to call GPT for the same thing twice?
Store previous results in a quick JSON cache:
```json
{"summarize:carbon emissions": "Carbon emissions are..."}
```
Saves cost. Saves time. Keeps your agent sharp.
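A throwaway file-backed cache built on that idea (the function and file names are illustrative):

```python
import json
from pathlib import Path

def cached_call(key, compute, cache_path="cache.json"):
    """Return the cached result for key if present, else compute and store it."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    if key not in cache:
        cache[key] = compute()  # e.g. an actual GPT call
        path.write_text(json.dumps(cache, indent=2))
    return cache[key]
```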
Serializing non-serializable stuff
You'll eventually want to save objects like datetime, Path, etc.
Python's json module doesn't like that.
You'll write helpers like:
```python
def fix(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)
```

And call: json.dumps(obj, default=fix)
Auto-generating JSON schemas to validate GPT output
Once you know the shape of your responses, use tools like pydantic or jsonschema to validate them.
This protects your code like a bouncer at the door.
GPT can't get in unless it fits the rules.
Injecting tiny functions into JSON chains
Yes, GPT can return JSON with an "action" key… but then you'll trigger Python functions based on it.

```python
if response["action"] == "summarize":
    return summarize(response["text"])
```