Building AI Agents

Build AI: Zero-Ambiguity Model Context Protocol for AI Agents

Let's say a client asks for a spending-accountability agent, and you have to come up with an MCP (Model Context Protocol) for AI agents based on that client's pain points.

What goes in the protocol:

  • Persistent Memory Schema: Custom JSON structure for what should always be remembered (e.g. categories, user-preferred labels like "fun money" instead of "entertainment").
  • Temporal Awareness Layer:
    • Know the current month, quarter, year.
    • Be able to compare "April 2025 vs March 2025" and "April 2025 vs April 2024".
  • Event Threading: Link spending events to real-world context (e.g. "bought flowers → friend's birthday" or "KES 26,500 on skincare → Sephora haul").
  • Change Detection Rules: AI flags anomalies: "This is 3x your usual coffee spend." Not guesses: real, percent-based alerts.
  • Response Memory Rules: Let the agent remember user reactions: "You told me to chill last time I flagged skincare, so I will, unless it's double."
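
A minimal sketch of the label mapping and the percent-based alert rule from the bullets above. The schema fields, the `label_map` name, and the 3x threshold are illustrative assumptions, not a fixed spec:

```python
# Illustrative persistent-memory schema; field names are assumptions.
MEMORY_SCHEMA = {
    "categories": ["Coffee", "Dining", "Transport", "Skincare"],
    "label_map": {"entertainment": "fun money"},  # user-preferred labels
    "alert_threshold": 3.0,                       # flag at 3x usual spend
}

def user_label(category: str, memory: dict) -> str:
    """Translate an internal category into the user's preferred label."""
    return memory["label_map"].get(category.lower(), category)

def flag_anomaly(current: float, usual: float, memory: dict):
    """Percent-based alert: fire only when spend crosses the stored threshold."""
    t = memory["alert_threshold"]
    if usual > 0 and current >= t * usual:
        return f"This is {current / usual:.1f}x your usual spend."
    return None
```

The point is that the alert is deterministic arithmetic, not an LLM guess; the model only phrases what this rule has already flagged.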

Tech Stack

Data Layer

  • PostgreSQL for structured, transactional data (spend logs, metadata, user preferences).
  • TimescaleDB extension if you want time-series smarts built-in for auto-rollups & fast comparisons.

API Ingestion

  • Plaid or Stripe for banking transactions.
  • Custom webhook endpoints to capture spend from manual inputs, emails, or receipts.
  • Normalize data into this shape:

{
  "amount": 23.90,
  "currency": "USD",
  "date": "2025-04-12",
  "merchant": "Starbucks",
  "category": "Coffee",
  "tags": ["latte", "morning", "work"]
}
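
A hedged sketch of the normalization step: the output matches the shape shown above, while the input field names and fallbacks are assumptions about what a webhook or manual entry might send:

```python
from datetime import date

def normalize_transaction(raw: dict) -> dict:
    """Coerce an arbitrary ingestion payload into the canonical spend shape.
    Input-side field names are assumptions; the output follows the schema above."""
    return {
        "amount": round(float(raw["amount"]), 2),
        "currency": raw.get("currency", "USD").upper(),
        "date": str(raw.get("date") or date.today().isoformat()),
        "merchant": raw.get("merchant", "Unknown").strip(),
        "category": raw.get("category", "Uncategorized"),
        "tags": sorted(set(raw.get("tags", []))),  # dedupe, stable order
    }
```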

AI Agent Context Design

Memory Breakdown

  • Short-Term Memory: Active month, past 3 months (in RAM or temp cache).
  • Long-Term Memory: Year-to-year logs (pull from PostgreSQL only when needed).
  • User Style Profile: How you like summaries (bullets? graphs? sass?).

šŸŖ Retrieval Protocol

  • User asks "How was March?" → Query March table → Compare with Feb + last year's March → Run difference detection → Format response.
  • Use LangChain or LlamaIndex with smart retriever logic:
    • Pull relevant chunks only.
    • The context window holds: current month, top spend areas, anomalies, and last user feedback.
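
The "How was March?" flow above can be sketched as one function. `ledger` here is a stand-in for the real database query layer (a dict mapping "YYYY-MM" to total spend); all names are illustrative:

```python
def month_report(ledger: dict, month: str, prev: str, last_year: str) -> dict:
    """Fetch the month, compare against the previous month and the same month
    last year, and emit a compact payload for the formatting step."""
    current = ledger[month]

    def pct_change(vs: str):
        base = ledger.get(vs)
        return round(100 * (current - base) / base, 1) if base else None

    return {
        "month": month,
        "total": current,
        "vs_prev_month_pct": pct_change(prev),
        "vs_same_month_last_year_pct": pct_change(last_year),
    }
```

Everything after this is formatting; the numbers are fixed before any LLM sees them.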

Summary Generator

A monthly summary payload might look like this:

{
  "summary": "You spent $2,389 in March, up 12% from February. Top increase: Dining (+$180). Decline: Transport (–$75).",
  "graph_reference": "spend_trend_mar2025.png",
  "alerts": ["Dining spend increased significantly. Want to set a budget alert?"]
}

  • Use OpenAI, Claude, or a fine-tuned DeepSeek model to generate summaries from structured input only (force the AI to read the clean data; don't ask it to interpret raw logs).
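
One way to enforce "structured input only" is to build the prompt from the payload itself. The wording below is an illustrative template, not a fixed spec, and the actual model call is left out:

```python
import json

def build_summary_prompt(payload: dict) -> str:
    """Embed the clean JSON verbatim and forbid the model from inventing
    numbers; the LLM formats, it does not compute."""
    return (
        "You are a spending-summary writer. Use ONLY the figures in the JSON "
        "below; do not infer or invent numbers.\n\n"
        f"DATA:\n{json.dumps(payload, indent=2)}\n\n"
        "Write a two-sentence summary in a friendly tone."
    )
```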

Integration Architecture

Optional: add voice input, like "Hey, how much did I blow on brunch this month?" → same flow, different entry point.

I would start by defining a structured memory protocol that treats every transaction as a timestamped, categorized event with optional user context.

The agent needs to understand the current time and be able to perform comparisons across various time periods without ambiguity. I would use PostgreSQL for storing transaction data, with a TimescaleDB extension for efficient time-series handling.

Database Alternatives to PostgreSQL + TimescaleDB

It's nice to have options, so here are a few alternatives:

1. SQLite

  • I would use this if the project is local or lightweight.
  • No server needed, just a file.
  • Ideal for mobile/desktop agents with offline use.
  • Works everywhere.

2. DuckDB

  • I’d pick this for fast analytics and local summaries.
  • Built for OLAP-style (analysis-heavy) queries.
  • Zero setup. Reads from CSVs/Parquet directly.

3. ClickHouse

  • I would use this if the transaction data grows huge.
  • It’s columnar, insanely fast for reads.
  • Open-source, runs on low-cost hardware.

4. InfluxDB (OSS version)

  • I’d use this if I wanted to optimize for time-series specifically without relying on Timescale.
  • Built-in time-based functions.
  • Open-source tier works for solo agents.

5. QuestDB

  • I’d pick this if I wanted a super-light, native time-series engine.
  • SQL syntax.
  • Built for speed, even on low-resource setups.

All of these can run on local machines or VPS servers (like Hetzner, Linode, or Contabo—cheap, global reach).

For regions with unreliable internet, I’d lean on SQLite or DuckDB for local-first storage.

For cloud sync or team use, ClickHouse or InfluxDB OSS would handle scale without needing expensive infrastructure.


Bonus Tools for Integration

  • Supabase – I’d use it if I wanted a hosted PostgreSQL + realtime layer without managing infra.
  • Airbyte – I’d use it to pull data from bank APIs or CSVs.
  • Metabase – I’d use this to visualize summaries with a dashboard (even offline).

I would define three memory layers in the agent’s logic:

  1. A short-term window that focuses on the current and previous two months.
  2. A long-term comparison store for the past three years.
  3. A user preference layer that includes tag translations, ignored categories, and sensitivity settings for alerts.
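
The three layers above can be sketched as a small container with a routing rule. The class name, field names, and the two-month cutoff are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Three baseline memory layers; all names are illustrative."""
    short_term: dict = field(default_factory=dict)   # current + previous two months
    long_term: dict = field(default_factory=dict)    # past three years, loaded on demand
    preferences: dict = field(default_factory=dict)  # tag translations, ignored
                                                     # categories, alert sensitivity

    def window_for(self, months_back: int) -> dict:
        """Route a lookup: recent months hit the hot cache, older ones the store."""
        return self.short_term if months_back <= 2 else self.long_term
```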

I would use three memory layers as a baseline because they cover the key timeframes an agent needs to function well without overcomplicating the architecture.

If you’re asking why just three—short-term, long-term, and user preference—I wouldn’t stop there if the use case demands more.

What else I would add depending on the complexity:

4. Relational Memory

  • I’d use this to link spending to events, people, or recurring life patterns.
  • Example: "You usually spend more on gifts around April, related to birthdays?"

5. Exception Memory

  • I’d store any flagged anomalies the user dismissed or explained.
  • Example: "You told me that KES 191,200 electronics bill was a one-off, so I won't compare it next year."

6. Goal/Constraint Memory

  • I’d store the client’s budget rules, goals, or monthly caps here.
  • Used for comparing actual vs expected.
  • Auto-adapts when they override or adjust.

7. Calendar/Contextual Layer

  • I'd add a layer that maps spend to holidays, weekends, travel periods, or personal tags like "on call" or "vacation".
  • Makes spend summaries smarter and more aligned with life context.

To compare periods, I’d use raw SQL with window functions.

SQL window functions let me compare each row of data to the previous one in a clean, precise way, right inside the database. That means I can calculate trends, changes, and patterns instantly, without moving the data around or risking messy logic. It’s fast, accurate, and built for exactly this kind of time-based comparison.
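A runnable sketch of the month-over-month comparison, using `LAG` via SQLite (which ships with Python and has supported window functions since version 3.25). The table and column names are illustrative; the same query works in PostgreSQL:

```python
import sqlite3

# In-memory database standing in for the real spend store.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_spend (month TEXT, total REAL)")
con.executemany(
    "INSERT INTO monthly_spend VALUES (?, ?)",
    [("2025-01", 2100.0), ("2025-02", 2133.0), ("2025-03", 2389.0)],
)

# LAG compares each row to the previous month, entirely inside the database.
rows = con.execute("""
    SELECT month,
           total,
           total - LAG(total) OVER (ORDER BY month) AS change_vs_prev
    FROM monthly_spend
    ORDER BY month
""").fetchall()
# The first month has no predecessor, so its change_vs_prev is NULL (None).
```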


No LLM would touch raw transaction data. LLMs are probabilistic. The AI layer would receive only the distilled diffs, trends, and anomalies. For monthly summaries, I’d calculate percentage changes in each spend category, detect statistically significant deviations, and bundle all that into a tightly scoped JSON object for the LLM to turn into a readable summary.
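
A sketch of the distillation step: per-category percent changes plus a crude "significant deviation" flag. The 50% threshold is a placeholder for a real statistical test, and all names are assumptions:

```python
def category_diffs(current: dict, previous: dict, sigma_pct: float = 50.0) -> dict:
    """Reduce two months of category totals to the diffs and anomalies
    the LLM is allowed to see. `sigma_pct` is an illustrative cutoff."""
    out = {"changes": {}, "anomalies": []}
    for cat, now in current.items():
        base = previous.get(cat)
        if not base:
            continue  # no baseline, nothing to compare
        pct = round(100 * (now - base) / base, 1)
        out["changes"][cat] = pct
        if abs(pct) >= sigma_pct:
            out["anomalies"].append(cat)
    return out
```

This dict, not the raw transaction log, is what gets bundled into the tightly scoped JSON object for the LLM.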

I would train or prompt the LLM using few-shot examples to generate reports in the user’s preferred tone, with logic baked in to avoid false assumptions or filler commentary. The LLM would not decide what’s important—it would only surface what my backend marked as relevant.

The whole thing would run on a scheduler. On the first of each month, the agent would generate a new report, compare it to previous months and the same month in previous years, and deliver that as a push notification, email, or Slack message depending on how I configured delivery.

For integration, I'd run the AI agent in a containerized backend. The API layer would be exposed to a local dashboard or mobile app, using WebSockets or REST for sync. If the user asked, "How did April compare to March?" the agent would instantly query the context window, pull structured results, and pass a compact payload to the LLM for formatting only.

If the user wanted real-time insight mid-month, I’d support that with a rolling projection system that uses current spend rate versus historical average pacing. Everything would be queryable with natural language, but every natural-language response would map back to a verifiable query trace and include a clickable breakdown.
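
The simplest version of that rolling projection is linear pacing: current spend rate extrapolated to month end. A real system would weight by historical pacing; this sketch just shows the shape of the idea:

```python
def project_month_end(spent_so_far: float, day: int, days_in_month: int) -> float:
    """Linear mid-month projection: assumes spend continues at the
    current daily rate for the rest of the month."""
    return round(spent_so_far / day * days_in_month, 2)
```

So KES 600 spent by day 10 of a 30-day month projects to KES 1,800, which the agent can compare against the historical average for that month.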

I would keep the system closed-loop. Every interaction from the user would be logged, not for tracking, but to improve context. If they dismissed a warning twice, the system would soften it. If they flagged a summary as inaccurate, it would trigger a review of the upstream logic, not just the LLM phrasing.
