Architecture | Sandhya Honnappa

A front-end portfolio shows pixels. An architect's shows systems, decisions, and tradeoffs. Each study below is problem → approach → decisions → outcome.

JLL GPT

Key Architect & Lead Engineer

Enterprise multi-provider GenAI platform serving 45,000+ employees globally.

Azure OpenAI
AWS Bedrock
Google Gemini
Baidu Ernie
Azure Cosmos DB
AKS
LiteLLM
Qdrant
SignalR
APIM
Front Door

<problem />

JLL needed one centralised, governed AI surface for internal employees: one that avoids locking the company into a single LLM vendor and stays resilient to provider outages, cost spikes, and regional data constraints.

<approach />

▹Designed a multi-provider routing strategy (Strategy Pattern + ChatBackendServiceLocator) enabling seamless LLM hot-swapping across Azure OpenAI, AWS Bedrock, Gemini, and Baidu Ernie.
▹Architected the backend with Clean/Onion Architecture and CQRS, using JWT/Okta for auth and Azure Cosmos DB for persistence; built real-time persistent chat via SignalR with conversation management, file uploads, and RAG integration.
▹Designed the Azure networking topology: Front Door → APIM → Container Apps → Private Endpoints, with rate-limiting and idempotency protections.

<decisions />

▹Strategy Pattern over per-provider branching — so adding/swapping a provider is configuration, not a code change.
▹Private Endpoints + APIM policies to keep enterprise data inside the network boundary.
▹A dedicated AI gateway (LiteLLM) in front of every provider for caching, cost governance, and a uniform interface.
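
The Strategy Pattern decision above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: the class names (`ChatBackend`, `ChatBackendLocator`, the two backend classes) and the `provider` config key are assumptions made for the example, and the backends return placeholder strings instead of calling a real SDK.

```python
from typing import Callable, Dict, Protocol


class ChatBackend(Protocol):
    """Common interface every provider strategy implements (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class AzureOpenAIBackend:
    def complete(self, prompt: str) -> str:
        # Placeholder: a real strategy would call the provider SDK here.
        return f"[azure-openai] {prompt}"


class BedrockBackend:
    def complete(self, prompt: str) -> str:
        # Placeholder for an AWS Bedrock call.
        return f"[bedrock] {prompt}"


class ChatBackendLocator:
    """Resolves a provider name taken from configuration to a strategy."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], ChatBackend]] = {}

    def register(self, name: str, factory: Callable[[], ChatBackend]) -> None:
        self._factories[name] = factory

    def resolve(self, name: str) -> ChatBackend:
        try:
            return self._factories[name]()
        except KeyError:
            raise ValueError(f"unknown provider: {name}") from None


def build_locator() -> ChatBackendLocator:
    locator = ChatBackendLocator()
    locator.register("azure-openai", AzureOpenAIBackend)
    locator.register("bedrock", BedrockBackend)
    return locator


def answer(config: Dict[str, str], prompt: str) -> str:
    # Callers never branch on the provider; they only read configuration.
    backend = build_locator().resolve(config["provider"])
    return backend.complete(prompt)
```

In this sketch, swapping providers means changing `config["provider"]`; supporting a new provider means registering one more strategy, with no change to `answer` or its callers.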

<outcome />

▹Single governed AI platform adopted by 45,000+ employees worldwide.
▹Provider failover and vendor flexibility at enterprise scale.
▹Real-time, persistent, RAG-enabled chat as the standard JLL AI experience.

The Cosmos DB Incident

Incident Lead

Cosmos DB grew 30 GB → 320 GB in 48 hours. Diagnosed, contained, and hardened.

Azure Cosmos DB
APIM
Azure Blob Storage
Idempotency

<problem />

In production, Cosmos DB storage exploded from 30 GB to 320 GB within 48 hours — a cascading duplicate-upload bug threatening sustained data growth and cost runaway.

<approach />

▹Traced the cascade to duplicate uploads re-entering the write path and amplifying.
▹Contained it with APIM policies and idempotency guards to stop duplicate writes at the edge.
▹Migrated large payloads out of Cosmos into Blob Storage, sizing each store to its job.
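
An idempotency guard of the kind described above might look like the sketch below. It is an in-memory illustration only: the real containment ran through APIM policies at the edge, and the key scheme (user, conversation, SHA-256 of the payload), the class name, and the `doc-N` ids are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional, Tuple


class IdempotentUploadStore:
    """Accepts each distinct upload once; repeats return the original id."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}  # idempotency key -> stored doc id
        self.write_count = 0  # number of writes that reached the store

    @staticmethod
    def key_for(user_id: str, conversation_id: str, payload: bytes) -> str:
        # Derive a key from the content so a retried upload maps to the same key.
        digest = hashlib.sha256(payload).hexdigest()
        return f"{user_id}:{conversation_id}:{digest}"

    def upload(
        self,
        user_id: str,
        conversation_id: str,
        payload: bytes,
        idempotency_key: Optional[str] = None,
    ) -> Tuple[str, bool]:
        """Return (doc_id, created). created is False for a duplicate."""
        key = idempotency_key or self.key_for(user_id, conversation_id, payload)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False  # duplicate: no second write
        self.write_count += 1
        doc_id = f"doc-{self.write_count}"
        self._by_key[key] = doc_id
        return doc_id, True
```

A retried or re-entrant upload with the same content resolves to the existing document instead of producing another write, which is the property that stops amplification.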

<decisions />

▹Stop the bleeding at the gateway (APIM) first, then fix the data model — containment before cleanup.
▹Idempotency as a first-class protection, not an afterthought.
▹Right storage for the right data: documents in Cosmos, blobs in Blob Storage.
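
The "right storage for the right data" split can be illustrated with a small sketch in which large payloads go to a blob store and the document keeps only a reference. The 64 KiB threshold, the field names (`content`, `blobRef`), and the in-memory blob store are assumptions for illustration, not the values or schema used in production.

```python
from typing import Any, Dict

INLINE_LIMIT_BYTES = 64 * 1024  # assumed cut-off for keeping content inline


class InMemoryBlobStore:
    """Stand-in for a blob service; keeps bytes in a dict."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> str:
        self.blobs[name] = data
        return f"blob://{name}"


def build_document(
    doc_id: str, filename: str, payload: bytes, blobs: InMemoryBlobStore
) -> Dict[str, Any]:
    """Build the document to persist, offloading large payloads to blobs."""
    doc: Dict[str, Any] = {"id": doc_id, "filename": filename, "size": len(payload)}
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Small content stays in the document itself.
        doc["content"] = payload.decode("utf-8", errors="replace")
    else:
        # Large content is stored once in the blob store; the document
        # carries only a pointer, so document size stays bounded.
        doc["blobRef"] = blobs.put(f"{doc_id}/{filename}", payload)
    return doc
```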

<outcome />

▹Prevented sustained data growth and cost runaway.
▹Idempotency and rate-limiting became standing protections across the platform.

LiteLLM AI Gateway on AKS

Architect & Owner

Production Python AI gateway with semantic caching and cost governance across two Azure regions.

LiteLLM
Python
AKS
Qdrant
Azure (2 regions)

<problem />

Multiple LLM providers and rising token spend needed a single control point for routing, caching, and cost governance.

<approach />

▹Deployed LiteLLM on AKS across two Azure regions for resilience.
▹Added exact-match and semantic caching via Qdrant to cut redundant LLM calls.
▹Centralised cost governance and a uniform provider interface behind the gateway.

<decisions />

▹Semantic caching (not just exact-match) to catch near-duplicate prompts.
▹Gateway-owned cost controls so governance lives in infrastructure, not scattered in app code.
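
The difference between exact-match and semantic caching can be sketched as a two-tier lookup. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector store such as Qdrant, and the 0.8 similarity threshold is arbitrary.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TwoTierCache:
    """Exact-match lookup first, then nearest-neighbour semantic lookup."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._exact: Dict[str, str] = {}
        self._entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, answer: str) -> None:
        self._exact[prompt] = answer
        self._entries.append((embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        # Tier 1: identical prompt text.
        hit = self._exact.get(prompt)
        if hit is not None:
            return hit
        # Tier 2: most similar cached prompt, if it clears the threshold.
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

The threshold is the main tuning knob: set it too low and unrelated prompts share answers; set it too high and the semantic tier rarely hits.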

<outcome />

▹Reduced LLM costs and latency by serving repeated and near-duplicate prompts from cache instead of re-calling a provider.
▹One operational surface for every provider integration.

MCP Gateway & Agentic AI Marketplace

Architect

Node/Express MCP Gateway mapping tool/resource/prompt primitives onto JLL GPT's strategy pattern.

MCP
Node.js
Express
JSON-RPC

<problem />

Agentic tool-use needed a standard, extensible way to expose JLL GPT capabilities to AI agents.

<approach />

▹Built an MCP Gateway on Node/Express mapping MCP tool/resource/prompt primitives to JLL GPT's multi-provider strategy pattern.
▹Authored ADRs, Claude Code setup guides, and CLAUDE.md slash-command patterns for the team.

<decisions />

▹Adopt MCP as the integration contract so new agentic tools plug in without bespoke wiring.
▹Document architecture decisions (ADRs) so the platform stays legible as it grows.
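
The shape of an MCP tool endpoint can be sketched as a small JSON-RPC 2.0 dispatcher. The gateway itself is Node/Express; Python is used here only to show the request and response shapes. The `tools/list` and `tools/call` method names and the `content` result format follow the MCP specification, while the `chat` tool, its schema, and its placeholder handler are hypothetical.

```python
from typing import Any, Callable, Dict


def _chat_tool(args: Dict[str, Any]) -> str:
    # Hypothetical handler: a real one would route to a provider strategy.
    return f"[{args.get('provider', 'default')}] {args['prompt']}"


TOOLS: Dict[str, Dict[str, Any]] = {
    "chat": {
        "description": "Send a prompt to a configured chat provider.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "prompt": {"type": "string"},
                "provider": {"type": "string"},
            },
            "required": ["prompt"],
        },
        "handler": _chat_tool,
    }
}


def _error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one JSON-RPC request to the matching MCP method."""
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}

    if method == "tools/list":
        result: Dict[str, Any] = {
            "tools": [
                {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req_id, -32602, "unknown tool")  # invalid params
        handler: Callable[[Dict[str, Any]], str] = tool["handler"]
        text = handler(params.get("arguments") or {})
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req_id, -32601, "method not found")

    return {"jsonrpc": "2.0", "id": req_id, "result": result}
```

Adding a tool in this sketch means adding one entry to `TOOLS`; the dispatcher and the wire contract stay unchanged.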

<outcome />

▹Extensible agentic integrations on a standard protocol.
▹Reusable patterns and docs that let other engineers extend the platform.

Nudge — AI Habit Tracker

Solo Builder (Personal Project)

Full-stack Python AI app demonstrating FastAPI + pgvector + LiteLLM + MCP patterns.

FastAPI
PostgreSQL
pgvector
LiteLLM
MCP
Railway

<problem />

Build a real, working personal tool that also demonstrates production Python AI backend patterns end to end.

<approach />

▹Multi-repo architecture: nudge-mcp-server, nudge-agent, nudge-pwa.
▹JWT auth, RAG-ready vector storage in pgvector, and an MCP server for agentic habit logging.
▹Hosted on Railway with managed Postgres.
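
The "RAG-ready vector storage" point can be illustrated with a helper that builds a pgvector nearest-neighbour query. This is a sketch under assumptions: the `habit_logs` table and its `id`, `note`, `user_id`, and `embedding` columns are hypothetical, and the `%s` placeholders assume a psycopg-style driver. The `<=>` operator is pgvector's cosine-distance operator.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_logs_query(
    user_id: int, embedding: Sequence[float], limit: int = 5
) -> Tuple[str, Tuple[str, int, int]]:
    """Return (sql, params) for the user's logs closest to the embedding."""
    sql = (
        "SELECT id, note, embedding <=> %s::vector AS distance "
        "FROM habit_logs "  # hypothetical table name
        "WHERE user_id = %s "
        "ORDER BY distance "
        "LIMIT %s"
    )
    # Values are passed as parameters, never interpolated into the SQL text.
    return sql, (to_vector_literal(embedding), user_id, limit)
```

With a real connection, the pair would be passed to the driver as `cursor.execute(sql, params)`.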

<decisions />

▹Use the exact stack I'd recommend at work (FastAPI, pgvector, LiteLLM, MCP) so the project is also a reference implementation.
▹Multi-repo split to keep the agent, server, and PWA independently deployable.

<outcome />

▹A functioning personal habit tracker.
▹A portfolio demonstration of Python AI backend patterns.

</architecture>
