Group name: ERP for SMEs
Group members: 1. Thanh Phong Le
2. Davar Jamali
3. Pranab Nepal
Assigned Teacher Fellow: Javier Machin
GitHub:
https://github.com/ltphongssvn/ac215e115groupproject

ERP AI Architecture Implementation

Rice Market AI System - Natural Language SQL, RAG, and Time-Series Forecasting


🔄 ERP AI Architecture - Internal API Flow

Detailed execution paths showing how requests flow through the microservices architecture

📊 Flow 1: Natural Language to SQL Query

1
Client Application
User enters: "Show me rice inventory levels for last month"
POST /api/v1/nl-query | Content-Type: application/json
2
API Gateway (Cloud Endpoints + Cloud Run) / BFF Layer
Authentication & Authorization Check
Validates JWT token, checks RBAC permissions for inventory access
~50ms
3
NL+SQL Agent Service (Vertex AI + LangChain)
Gemini 1.5 Pro processes natural language via LangGraph StateGraph
Identifies intent: inventory query, time range: last month
~200ms
4
SQL Generator (LangChain Tool + Vertex AI)
Generates parameterized SQL with Vertex AI validation
SELECT * FROM inventory_data WHERE item_type = 'rice' AND date >= NOW() - INTERVAL '1 month'
~100ms
5
SQL Proxy / Guard (Cloud SQL Auth Proxy)
Query validation with Cloud IAM row-level security
Applies row-level security, checks allowlist, masks sensitive data
~30ms
6
Cloud SQL PostgreSQL with Read Replicas
Query execution on managed read replica instances
Retrieves inventory data from optimized read replica
~150ms
7
Result Formatter
Formats response for user consumption
Converts to JSON, adds metadata, applies presentation logic
~20ms
8
API Gateway (Cloud Endpoints + Cloud Run)
Response caching & delivery
Caches result for 5 minutes, returns to client
~10ms
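
The LangGraph wiring behind steps 3-7 might look like the following minimal Python sketch. The state shape, node names, and the run_guarded_query stub are illustrative assumptions rather than the project's actual code:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro")

class QueryState(TypedDict):
    question: str   # raw user input (step 1)
    sql: str        # generated SQL (step 4)
    rows: list      # query results (step 6)
    response: dict  # formatted JSON payload (step 7)

def run_guarded_query(sql: str) -> list:
    # Steps 5-6: proxy-guard validation and execution on a read replica.
    # Stubbed here; a simplified guard check is sketched further below.
    raise NotImplementedError

def generate_sql(state: QueryState) -> dict:
    # Steps 3-4: the LLM identifies intent and emits a single SELECT.
    prompt = f"Translate to one PostgreSQL SELECT statement: {state['question']}"
    return {"sql": llm.invoke(prompt).content}

def guard_and_execute(state: QueryState) -> dict:
    return {"rows": run_guarded_query(state["sql"])}

def format_result(state: QueryState) -> dict:
    # Step 7: convert rows to JSON with metadata.
    return {"response": {"data": state["rows"], "generated_sql": state["sql"]}}

graph = StateGraph(QueryState)
graph.add_node("generate_sql", generate_sql)
graph.add_node("guard_and_execute", guard_and_execute)
graph.add_node("format_result", format_result)
graph.set_entry_point("generate_sql")
graph.add_edge("generate_sql", "guard_and_execute")
graph.add_edge("guard_and_execute", "format_result")
graph.add_edge("format_result", END)
app = graph.compile()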

Key Architectural Decisions

  • Security First: Every SQL query passes through the proxy guard to prevent injection attacks and enforce data access policies (a simplified guard check is sketched after this list)
  • Performance Optimization: Read replicas handle query load, keeping the primary Cloud SQL database free for transactions
  • Caching Strategy: Results are cached at the gateway level for frequently requested data
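
The proxy-guard check referenced above could be as simple as the following sketch, which uses sqlparse to enforce read-only statements against an allowlist. The table names and the naive token scan are assumptions for illustration, not the real guard:

import sqlparse

ALLOWED_TABLES = {"inventory_data", "purchase_orders", "price_history"}

def guard(sql: str) -> str:
    # Reject anything that is not a single SELECT over allowlisted tables.
    statements = sqlparse.parse(sql)
    if len(statements) != 1 or statements[0].get_type() != "SELECT":
        raise PermissionError("Only single SELECT statements are allowed")
    # Naive check: any identifier right after FROM/JOIN must be allowlisted.
    tokens = [t.value.lower() for t in statements[0].flatten() if not t.is_whitespace]
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("from", "join") and tokens[i + 1] not in ALLOWED_TABLES:
            raise PermissionError(f"Table not allowlisted: {tokens[i + 1]}")
    return sql

A production guard would also apply row-level security and mask sensitive columns, as step 5 describes.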
Error Handling & Circuit Breakers
  • If Vertex AI Gemini service fails → Fallback to keyword-based query
  • If Cloud SQL database timeout → Return cached results if available
  • If SQL validation fails → Return sanitized error message
Performance Metrics

Total latency: ~560ms (sum of the step latencies above) | Target SLA: < 1 second

Throughput: 1000 requests/second with horizontal scaling

🔍 Flow 2: RAG-Based Document Summarization

1
Client Application
User requests: "Summarize all purchase orders from Q3"
POST /api/v1/rag-summary | Content-Type: application/json
2
API Gateway (Cloud Endpoints + Cloud Run)
Rate limiting check & auth validation
Ensures user hasn't exceeded RAG query limits (10/minute)
~40ms
3
RAG Orchestrator (Vertex AI Agent Builder)
Query embedding with textembedding-gecko@003
Converts text to 768-dimensional vector using embedding model
~150ms
4
Retriever Service (Vertex AI Vector Search)
Similarity search with automatic index optimization
Queries Vertex AI Vector Search for top-k relevant documents
~80ms
5
Vertex AI Vector Search + Cloud SQL pgvector
HNSW index with metadata filtering via SQL
Performs approximate nearest neighbor search across sharded indices
~120ms
6
Ranker Module
Re-ranking retrieved documents
Applies cross-encoder model to improve relevance ordering
~200ms
7
Generator Service (Vertex AI Gemini 1.5 Pro)
Vertex AI Gemini synthesis with Document AI context grounding
Creates coherent summary from top-5 documents with citations
~500ms
8
Citation Tracker
Add source references
Appends document IDs and confidence scores to response
~20ms
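
In code, steps 3-7 might look like the sketch below, assuming the Vertex AI Python SDK and the pgvector-backed metadata store described in step 5. The documents table, its columns, and the connection string are invented for illustration, and the cross-encoder re-ranking of step 6 is omitted for brevity:

import psycopg2
from vertexai.language_models import TextEmbeddingModel
from vertexai.generative_models import GenerativeModel

def summarize(question: str) -> str:
    # Step 3: 768-dimensional query embedding.
    embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    qvec = embedder.get_embeddings([question])[0].values

    # Steps 4-5: approximate nearest neighbor search over an HNSW index,
    # with SQL metadata filtering (here: Q3 purchase orders).
    conn = psycopg2.connect("dbname=erp")  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(
            """SELECT doc_id, content FROM documents
               WHERE doc_type = 'purchase_order' AND quarter = 'Q3'
               ORDER BY embedding <=> %s::vector
               LIMIT 5""",
            ("[" + ",".join(map(str, qvec)) + "]",),
        )
        docs = cur.fetchall()

    # Step 7: Gemini synthesis with inline [doc_id] citations.
    context = "\n\n".join(f"[{doc_id}] {content}" for doc_id, content in docs)
    model = GenerativeModel("gemini-1.5-pro")
    prompt = f"Summarize with [doc_id] citations:\n{context}\n\nQuestion: {question}"
    return model.generate_content(prompt).text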

Microservices Orchestration Pattern

  • Service Choreography: Each service has a single responsibility and communicates through well-defined interfaces
  • Async Processing: Vector search and ranking can be parallelized for performance
  • Fallback Mechanisms: If vector store is unavailable, falls back to keyword search in Cloud SQL ERP database
Scaling Considerations

Vertex AI Vector Search uses managed index sharding for horizontal scaling

Generator Service can be replicated with GPU instances for parallel processing

📈 Flow 3: Time-Series Price Forecasting

1
Client Application
Request 6-month rice price forecast
GET /api/v1/forecast/rice-price?horizon=6m
2
API Gateway (Cloud Endpoints + Cloud Run)
Check cache for recent predictions
Forecasts cached for 24 hours to reduce compute load
~30ms
3
TS Forecasting Engine (BigQuery + Feast)
Feature extraction from BigQuery Feature Store via Feast registry
Retrieves price history, weather data, FX rates, market indicators
~100ms
4
BigQuery Feature Store with Feast
Serve pre-computed features
Returns versioned feature vectors with monitoring metadata
~50ms
5
Model Inference (Vertex AI Endpoints)
Load model from Vertex AI Model Registry with MLflow
Retrieves LSTM/Prophet ensemble model v2.3.1
~200ms
6
Vertex AI Custom Training Models
LSTM/Prophet ensemble on managed GPU instances
Runs inference for 180-day horizon with confidence intervals
~300ms
7
Post-processor & Explainability Module
Add interpretability layer
Generates SHAP values, identifies key price drivers
~150ms
8
Confidence Estimator
Calculate prediction confidence
Provides uncertainty bounds based on historical accuracy
~50ms
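
Steps 3-6 could be sketched as below, assuming Feast's Python SDK and the Vertex AI SDK. The feature names, entity key, endpoint path, and instance payload are placeholders, not the project's actual definitions:

from feast import FeatureStore
from google.cloud import aiplatform

def forecast_rice_price(horizon_days: int = 180) -> dict:
    # Steps 3-4: versioned features from the BigQuery-backed Feast store.
    store = FeatureStore(repo_path=".")
    features = store.get_online_features(
        features=[
            "rice_prices:price_history_90d",  # hypothetical feature names
            "weather:rainfall_index",
            "market:fx_rate_usd",
        ],
        entity_rows=[{"commodity_id": "rice"}],
    ).to_dict()

    # Steps 5-6: LSTM/Prophet ensemble served on a Vertex AI endpoint.
    endpoint = aiplatform.Endpoint("projects/.../endpoints/rice-forecast")
    result = endpoint.predict(
        instances=[{"features": features, "horizon_days": horizon_days}]
    )
    return result.predictions[0]  # point forecast plus confidence intervals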

ML Pipeline Architecture

  • BigQuery Feature Store with Feast Pattern: Ensures training-serving consistency and feature reusability
  • MLflow Model Registry on Vertex AI: Enables A/B testing between champion and challenger models
  • Explainability First: Every prediction includes interpretability metrics for business trust
Degradation Strategy
  • If primary model fails → Use simpler baseline model
  • If features unavailable → Use cached historical features
  • If confidence too low → Flag prediction as unreliable
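
The degradation chain above can be expressed as a small wrapper; the 0.6 confidence threshold and the callable signatures are assumptions for illustration:

def forecast_with_degradation(primary, baseline, feature_fn, cached_features):
    try:
        features = feature_fn()          # live features from the feature store
    except Exception:
        features = cached_features       # fall back to cached historical features
    try:
        result = primary(features)       # champion LSTM/Prophet ensemble
    except Exception:
        result = baseline(features)      # simpler baseline model
    if result.get("confidence", 0.0) < 0.6:  # assumed reliability threshold
        result["flag"] = "unreliable"
    return result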

🔧 Cross-Cutting Architectural Patterns

Distributed Tracing

Every request gets a correlation ID at the gateway, propagated through all service calls for end-to-end visibility

X-Correlation-ID: uuid-v4
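
In a Python service this propagation can be a single middleware, shown here as a FastAPI sketch; the header name matches the convention above, everything else is illustrative:

import uuid
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def correlation_id(request: Request, call_next):
    # Reuse the gateway-assigned ID, or mint one if the request arrived without it.
    cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Correlation-ID"] = cid  # pass it along for tracing
    return response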

Service Mesh Communication

All inter-service communication uses gRPC with Protocol Buffers for efficiency, with automatic retry logic and circuit breakers

Protocol: gRPC + Protobuf
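
The retry behavior can be declared in a gRPC service config when the channel is created, as in this sketch; the service name, target, and backoff values are placeholders:

import json
import grpc

retry_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "erp.NlSqlService"}],  # placeholder service name
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
})

channel = grpc.insecure_channel(
    "nl-sql-service:50051",  # placeholder target
    options=[("grpc.enable_retries", 1), ("grpc.service_config", retry_config)],
)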

Observability Stack

Metrics exported to Prometheus, logs to ELK, traces to Jaeger for complete system visibility

Metrics: /metrics endpoint

Async Event Bus

Model updates and data ingestion trigger events via Kafka for decoupled processing

Topics: model.updated, data.ingested
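
Publishing one of these events might look like the following kafka-python sketch; the broker address and payload shape are assumptions:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("model.updated", {"model": "rice-forecast", "version": "v2.3.1"})
producer.flush()

Consumers in the serving services can then reload the registry entry without any synchronous coupling to the training pipeline.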

AI/ML & Language Models

Vertex AI Gemini, Google Gemini, LangChain, TensorFlow, PyTorch, LSTM Models, Prophet, SHAP Values, Embedding Models

Backend & APIs

Cloud Run services, Kong/Nginx, gRPC, Protocol Buffers, WebSocket, REST APIs, OpenAPI/Swagger, Pub/Sub, BFF Layer

Data & Storage

Cloud SQL PostgreSQL, BigQuery, Vertex AI Vector Search, Cloud SQL pgvector, Redis Cache, Feast Feature Store, DVC, HNSW Index, Read Replicas

Frontend & UI

React, Next.js, TypeScript, SSR/SSG, Jest, React Testing Library, WebSocket Client, Chat Interface

MLOps & Model Management

Vertex AI Pipelines, MLflow, Weights & Biases, Vertex AI Endpoints, Model Registry, A/B Testing, Feature Registry, Champion/Challenger

Cloud Infrastructure

Google Cloud Platform, GKE, Docker, Kubernetes, Pulumi IaC, Ansible, GitHub Actions, HPA Auto-scaling

Monitoring & Observability

Prometheus, Grafana, ELK Stack, Jaeger, Distributed Tracing, Circuit Breakers, Correlation IDs, Metrics Endpoint

Security & Testing

OAuth2/JWT, RBAC, SQL Proxy/Guard, OWASP, Pytest, Coverage.py, Locust, PII Scrubbing

Microservices Architecture Flow

Client UI → API Gateway → Microservices (NL+SQL Service, RAG Orchestrator, TS Forecasting) → Data Layer

🔧 GCP Technology Implementation Layer

NL+SQL Service
Primary Database:
• Cloud SQL with PostgreSQL
Read replicas for query optimization
LLM Engine (Vertex AI):
• Vertex AI with Gemini 1.5 Pro
2M token context for complex queries
Orchestration:
• LangGraph StateGraph
Query refinement & error handling
Deployment:
• Cloud Run (Serverless)
Auto-scaling with load balancing
RAG Orchestrator
Core Platform:
• Vertex AI Agent Builder
Managed document processing
Document Processing:
• Document AI
OCR, table extraction, layout parsing
Vector Storage:
• Vertex AI Vector Search
Billions of docs, <10ms latency
• Cloud SQL with pgvector
Metadata filtering & SQL queries
Embeddings:
• textembedding-gecko@003
768-dim vectors for similarity
TS Forecasting
Data Warehouse:
• BigQuery
Time-series data & aggregations
ML Platform:
• Vertex AI Custom Training
LSTM/Prophet ensemble models
Feature Store:
• BigQuery with Feast
Training-serving consistency
Model Registry:
• Vertex AI Model Registry
A/B testing & versioning
Integration Pattern: All services are deployed on Cloud Run, with LangChain for orchestration and LangGraph for stateful workflows
  • Sub-Second Latency: target SLA < 1 second
  • High Throughput: 1000 requests/second
  • Automated ML Pipeline: weekly model retraining

Microservices Team Structure

The Rice Market AI System requires cross-functional teams of 3-4 members each, with complementary skills and structured around business capabilities to ensure autonomy, ownership, and end-to-end responsibility. Teams follow the "you build it, you run it" philosophy, taking ownership from development through production.

I. Core Product Teams (Business Capabilities)

1. Natural Language SQL (NL+SQL) Team
Mission: Enable users to query ERP data using natural language, translating it into secure SQL queries.
Key Responsibilities & Components:
  • NL+SQL Agent Service (Vertex AI Gemini 1.5 Pro + LangChain): Process natural language, identify intent, extract entities using LLM Engine
  • SQL Generator (LangChain + Vertex AI): Create parameterized SQL queries with proper escaping
  • SQL Proxy/Guard (Cloud SQL Auth Proxy): Implement Cloud IAM-based validation, row-level security, allowlist enforcement
  • Result Formatter: Format query responses for user consumption (JSON with metadata)
  • Cloud SQL PostgreSQL with Read Replicas: Optimize query execution on read replicas, manage data retrieval
  • Containerization: Multi-stage Docker builds for the NL+SQL Agent Service (Vertex AI Gemini 1.5 Pro + LangChain), deployed on Cloud Run, with test query processing
  • Model Training: Fine-tune Vertex AI Gemini models for rice market domain, achieve >80% accuracy
Core Skills:
NLP with Vertex AI Gemini 1.5 Pro via LangChain, Vertex AI LLMs (Gemini models), SQL, Cloud SQL database design, Cloud IAM & SQL security (RBAC), Cloud Run, API development
2. RAG-Based Document Summarization Team
Mission: Provide intelligent summarization of documents based on Retrieval-Augmented Generation.
Key Responsibilities & Components:
  • RAG Orchestrator (Vertex AI Agent Builder + Vector Search): Generate query embeddings (768-dimensional vectors)
  • Vertex AI Vector Search: Perform similarity searches using HNSW indexes
  • Ranker Module: Re-rank documents using cross-encoder models
  • Generator Service (Vertex AI Gemini): Create LLM-based summaries with citations
  • Query Embedding (textembedding-gecko@003): Convert text to 768-dimensional vectors, search Vertex AI Vector Search
  • Cloud Run Deployment: RAG Orchestrator (Vertex AI Agent Builder + Vector Search) with vector DB connectivity
  • Model Training: Fine-tune Vertex AI models leveraging RAG for domain accuracy
Core Skills:
NLP with Vertex AI Gemini 1.5 Pro via LangChain, Vertex AI Vector Search for production embeddings, Cloud SQL with pgvector for hybrid search, Document AI & information retrieval, Vertex AI embedding models (gecko family), ML ranking
3. Time-Series Price Forecasting Team
Mission: Generate accurate, interpretable time-series forecasts for rice prices over specified horizons.
Key Responsibilities & Components:
  • Feature Pipeline: Extract features from BigQuery Feature Store with Feast (price history, weather, FX)
  • Model Inference Engine: Run LSTM/Prophet ensembles with confidence intervals
  • Explainability Module: Generate SHAP values for price driver identification
  • Response Formatting: Package predictions with metadata and explanations
  • Forecasting Models: Develop 6-month forecasts with confidence intervals
  • BigQuery Feature Store with Feast Management: Feature registry, generation, versioning
  • MLflow Model Registry on Vertex AI: Version control, champion/challenger strategies
Core Skills:
Time-series analysis with PyTorch LSTM models on Vertex AI, LSTM/Prophet modeling, feature engineering, explainable AI, statistical modeling, data science

II. Supporting Teams (Platform-Oriented)

4. Platform & MLOps Team (Vertex AI Pipelines, MLflow, Weights & Biases)
Mission: Provide a robust, scalable, and secure microservice platform and MLOps infrastructure for all teams.
Key Responsibilities & Components:
  • API Gateway (Cloud Endpoints + Cloud Run)/BFF Layer: Authentication, authorization, rate limiting, caching
  • Environment Setup: GCP project configuration with IAM and billing
  • Repository Structure: Monorepo with proper microservice directories
  • Data Pipeline: Ingestion, PII scrubbing, embedding generation
  • Vector Store Setup: Deploy and manage Vertex AI Vector Search with Cloud SQL pgvector
  • BigQuery Feature Store with Feast: Registry, generation, online/offline storage
  • MLflow Model Registry on Vertex AI: Version control, deployment configurations
  • ML Pipeline: Automated model retraining workflows
  • Container Orchestration: GKE deployment, HPA auto-scaling, Ansible
  • CI/CD Pipeline: GitHub Actions for testing and deployment
  • Monitoring: Prometheus, Grafana, ELK stack, distributed tracing
  • Security: OWASP best practices, security headers implementation
Core Skills:
DevOps with GitHub Actions, Pulumi IaC, and Ansible automation; MLOps with Vertex AI Pipelines, MLflow, and Weights & Biases; Google Cloud Platform (GKE, Vertex AI, BigQuery) and Kubernetes orchestration; Cloud Run containerization with multi-stage builds; CI/CD; Ansible/Terraform; monitoring tools; security
5. Client Application (Frontend) Team
Mission: Design and implement the user interface that interacts with the AI System's capabilities.
Key Responsibilities & Components:
  • Frontend Design: React with the Next.js framework for SSR/SSG, including a chat interface
  • API Development: RESTful APIs with OpenAPI specification
  • User Experience: Intuitive interface for queries and results
  • API Gateway (Cloud Endpoints + Cloud Run) Integration: Consume backend microservices
  • Performance Optimization: Client-side performance tuning
Core Skills:
React with Next.js (SSR/SSG), UI/UX design, API integration, web security, performance optimization

Team Principles and Practices

1
Team Size
Each team consists of 3-4 members to foster effective collaboration and reduce communication overhead.
2
Autonomy
Teams operate with limited dependencies, making rapid, context-aware decisions.
3
Ownership
Long-term accountability from conception to production, including on-call responsibilities.
4
Cross-functional
Diverse skills within each team (ML engineers, backend developers, data engineers).
5
Knowledge Sharing
Communities of practice (Chapters and Guilds) to disseminate knowledge without dictating choices.
6
Design Reviews
RFC process for new services with feedback from various teams to catch issues early.
7
Documentation
Living documentation including service overviews, contracts, runbooks, and metadata.
8
Consistency
Microservice chassis for common functionalities ensuring consistency with technical heterogeneity.