Group name: ERP for SMEs
Group members: 1. Thanh Phong Le
2. Davar Jamali
3. Pranab Nepal
Assigned Teacher Fellow: Javier Machin
GitHub:
https://github.com/ltphongssvn/ac215e115groupproject

ERP AI Architecture Implementation

Rice Market AI System - Natural Language SQL, RAG, and Time-Series Forecasting


🔄 ERP AI Architecture - Internal API Flow

Detailed execution paths showing how requests flow through the microservices architecture

📊 Flow 1: Natural Language to SQL Query

1
Client Application
User enters: "Show me rice inventory levels for last month"
POST /api/v1/nl-query | Content-Type: application/json
2
API Gateway (Cloud Endpoints + Cloud Run) / BFF Layer
Authentication & Authorization Check
Validates JWT token, checks RBAC permissions for inventory access
~50ms
3
NL+SQL Agent Service (Vertex AI + LangChain)
Gemini 1.5 Pro processes natural language via LangGraph StateGraph
Identifies intent: inventory query, time range: last month
~200ms
4
SQL Generator (LangChain Tool + Vertex AI)
Generates parameterized SQL with Vertex AI validation
SELECT * FROM inventory_data WHERE item_type = 'rice' AND date >= NOW() - INTERVAL '1 month'
~100ms
5
SQL Proxy / Guard (Cloud SQL Auth Proxy)
Query validation with Cloud IAM row-level security
Applies row-level security, checks allowlist, masks sensitive data
~30ms
6
Cloud SQL PostgreSQL with Read Replicas
Query execution on managed read replica instances
Retrieves inventory data from optimized read replica
~150ms
7
Result Formatter
Formats response for user consumption
Converts to JSON, adds metadata, applies presentation logic
~20ms
8
API Gateway (Cloud Endpoints + Cloud Run)
Response caching & delivery
Caches result for 5 minutes, returns to client
~10ms
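
The LangGraph wiring behind steps 3-7 might look like the following minimal Python sketch. The state shape, node names, and the run_guarded_query stub are illustrative assumptions rather than the project's actual code:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro")

class QueryState(TypedDict):
    question: str   # raw user input (step 1)
    sql: str        # generated SQL (step 4)
    rows: list      # query results (step 6)
    response: dict  # formatted JSON payload (step 7)

def run_guarded_query(sql: str) -> list:
    # Steps 5-6: proxy-guard validation and execution on a read replica.
    # Stubbed here; a simplified guard check is sketched further below.
    raise NotImplementedError

def generate_sql(state: QueryState) -> dict:
    # Steps 3-4: the LLM identifies intent and emits a single SELECT.
    prompt = f"Translate to one PostgreSQL SELECT statement: {state['question']}"
    return {"sql": llm.invoke(prompt).content}

def guard_and_execute(state: QueryState) -> dict:
    return {"rows": run_guarded_query(state["sql"])}

def format_result(state: QueryState) -> dict:
    # Step 7: convert rows to JSON with metadata.
    return {"response": {"data": state["rows"], "generated_sql": state["sql"]}}

graph = StateGraph(QueryState)
graph.add_node("generate_sql", generate_sql)
graph.add_node("guard_and_execute", guard_and_execute)
graph.add_node("format_result", format_result)
graph.set_entry_point("generate_sql")
graph.add_edge("generate_sql", "guard_and_execute")
graph.add_edge("guard_and_execute", "format_result")
graph.add_edge("format_result", END)
app = graph.compile()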

Key Architectural Decisions

  • Security First: Every SQL query passes through the proxy guard to prevent injection attacks and enforce data access policies (a simplified guard check is sketched after this list)
  • Performance Optimization: Read replicas handle query load, keeping the primary Cloud SQL database free for transactions
  • Caching Strategy: Results are cached at the gateway level for frequently requested data
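
The proxy-guard check referenced above could be as simple as the following sketch, which uses sqlparse to enforce read-only statements against an allowlist. The table names and the naive token scan are assumptions for illustration, not the real guard:

import sqlparse

ALLOWED_TABLES = {"inventory_data", "purchase_orders", "price_history"}

def guard(sql: str) -> str:
    # Reject anything that is not a single SELECT over allowlisted tables.
    statements = sqlparse.parse(sql)
    if len(statements) != 1 or statements[0].get_type() != "SELECT":
        raise PermissionError("Only single SELECT statements are allowed")
    # Naive check: any identifier right after FROM/JOIN must be allowlisted.
    tokens = [t.value.lower() for t in statements[0].flatten() if not t.is_whitespace]
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("from", "join") and tokens[i + 1] not in ALLOWED_TABLES:
            raise PermissionError(f"Table not allowlisted: {tokens[i + 1]}")
    return sql

A production guard would also apply row-level security and mask sensitive columns, as step 5 describes.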
Error Handling & Circuit Breakers
  • If Vertex AI Gemini service fails → Fallback to keyword-based query
  • If Cloud SQL database timeout → Return cached results if available
  • If SQL validation fails → Return sanitized error message
Performance Metrics

Total latency: ~560ms (sum of the step latencies above) | Target SLA: < 1 second

Throughput: 1000 requests/second with horizontal scaling

🔍 Flow 2: RAG-Based Document Summarization

1
Client Application
User requests: "Summarize all purchase orders from Q3"
POST /api/v1/rag-summary | Content-Type: application/json
2
API Gateway (Cloud Endpoints + Cloud Run)
Rate limiting check & auth validation
Ensures user hasn't exceeded RAG query limits (10/minute)
~40ms
3
RAG Orchestrator (Vertex AI Agent Builder)
Query embedding with textembedding-gecko@003
Converts text to 768-dimensional vector using embedding model
~150ms
4
Retriever Service (Vertex AI Vector Search)
Similarity search with automatic index optimization
Queries Vertex AI Vector Search for top-k relevant documents
~80ms
5
Vertex AI Vector Search + Cloud SQL pgvector
HNSW index with metadata filtering via SQL
Performs approximate nearest neighbor search across sharded indices
~120ms
6
Ranker Module
Re-ranking retrieved documents
Applies cross-encoder model to improve relevance ordering
~200ms
7
Generator Service (Vertex AI Gemini 1.5 Pro)
Vertex AI Gemini synthesis with Document AI context grounding
Creates coherent summary from top-5 documents with citations
~500ms
8
Citation Tracker
Add source references
Appends document IDs and confidence scores to response
~20ms
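
In code, steps 3-7 might look like the sketch below, assuming the Vertex AI Python SDK and the pgvector-backed metadata store described in step 5. The documents table, its columns, and the connection string are invented for illustration, and the cross-encoder re-ranking of step 6 is omitted for brevity:

import psycopg2
from vertexai.language_models import TextEmbeddingModel
from vertexai.generative_models import GenerativeModel

def summarize(question: str) -> str:
    # Step 3: 768-dimensional query embedding.
    embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    qvec = embedder.get_embeddings([question])[0].values

    # Steps 4-5: approximate nearest neighbor search over an HNSW index,
    # with SQL metadata filtering (here: Q3 purchase orders).
    conn = psycopg2.connect("dbname=erp")  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(
            """SELECT doc_id, content FROM documents
               WHERE doc_type = 'purchase_order' AND quarter = 'Q3'
               ORDER BY embedding <=> %s::vector
               LIMIT 5""",
            ("[" + ",".join(map(str, qvec)) + "]",),
        )
        docs = cur.fetchall()

    # Step 7: Gemini synthesis with inline [doc_id] citations.
    context = "\n\n".join(f"[{doc_id}] {content}" for doc_id, content in docs)
    model = GenerativeModel("gemini-1.5-pro")
    prompt = f"Summarize with [doc_id] citations:\n{context}\n\nQuestion: {question}"
    return model.generate_content(prompt).text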

Microservices Orchestration Pattern

  • Service Choreography: Each service has a single responsibility and communicates through well-defined interfaces
  • Async Processing: Vector search and ranking can be parallelized for performance
  • Fallback Mechanisms: If vector store is unavailable, falls back to keyword search in Cloud SQL ERP database
Scaling Considerations

Vertex AI Vector Search uses managed index sharding for horizontal scaling

Generator Service can be replicated with GPU instances for parallel processing

📈 Flow 3: Time-Series Price Forecasting

1
Client Application
Request 6-month rice price forecast
GET /api/v1/forecast/rice-price?horizon=6m
2
API Gateway (Cloud Endpoints + Cloud Run)
Check cache for recent predictions
Forecasts cached for 24 hours to reduce compute load
~30ms
3
TS Forecasting Engine (BigQuery + Feast)
Feature extraction from BigQuery Feature Store via Feast registry
Retrieves price history, weather data, FX rates, market indicators
~100ms
4
BigQuery Feature Store with Feast
Serve pre-computed features
Returns versioned feature vectors with monitoring metadata
~50ms
5
Model Inference (Vertex AI Endpoints)
Load model from Vertex AI Model Registry with MLflow
Retrieves LSTM/Prophet ensemble model v2.3.1
~200ms
6
Vertex AI Custom Training Models
LSTM/Prophet ensemble on managed GPU instances
Runs inference for 180-day horizon with confidence intervals
~300ms
7
Post-processor & Explainability Module
Add interpretability layer
Generates SHAP values, identifies key price drivers
~150ms
8
Confidence Estimator
Calculate prediction confidence
Provides uncertainty bounds based on historical accuracy
~50ms
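
Steps 3-6 could be sketched as below, assuming Feast's Python SDK and the Vertex AI SDK. The feature names, entity key, endpoint path, and instance payload are placeholders, not the project's actual definitions:

from feast import FeatureStore
from google.cloud import aiplatform

def forecast_rice_price(horizon_days: int = 180) -> dict:
    # Steps 3-4: versioned features from the BigQuery-backed Feast store.
    store = FeatureStore(repo_path=".")
    features = store.get_online_features(
        features=[
            "rice_prices:price_history_90d",  # hypothetical feature names
            "weather:rainfall_index",
            "market:fx_rate_usd",
        ],
        entity_rows=[{"commodity_id": "rice"}],
    ).to_dict()

    # Steps 5-6: LSTM/Prophet ensemble served on a Vertex AI endpoint.
    endpoint = aiplatform.Endpoint("projects/.../endpoints/rice-forecast")
    result = endpoint.predict(
        instances=[{"features": features, "horizon_days": horizon_days}]
    )
    return result.predictions[0]  # point forecast plus confidence intervals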

ML Pipeline Architecture

  • BigQuery Feature Store with Feast Pattern: Ensures training-serving consistency and feature reusability
  • MLflow Model Registry on Vertex AI: Enables A/B testing between champion and challenger models
  • Explainability First: Every prediction includes interpretability metrics for business trust
Degradation Strategy
  • If primary model fails → Use simpler baseline model
  • If features unavailable → Use cached historical features
  • If confidence too low → Flag prediction as unreliable
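
The degradation chain above can be expressed as a small wrapper; the 0.6 confidence threshold and the callable signatures are assumptions for illustration:

def forecast_with_degradation(primary, baseline, feature_fn, cached_features):
    try:
        features = feature_fn()          # live features from the feature store
    except Exception:
        features = cached_features       # fall back to cached historical features
    try:
        result = primary(features)       # champion LSTM/Prophet ensemble
    except Exception:
        result = baseline(features)      # simpler baseline model
    if result.get("confidence", 0.0) < 0.6:  # assumed reliability threshold
        result["flag"] = "unreliable"
    return result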

🔧 Cross-Cutting Architectural Patterns

Distributed Tracing

Every request gets a correlation ID at the gateway, propagated through all service calls for end-to-end visibility

X-Correlation-ID: uuid-v4
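
In a Python service this propagation can be a single middleware, shown here as a FastAPI sketch; the header name matches the convention above, everything else is illustrative:

import uuid
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def correlation_id(request: Request, call_next):
    # Reuse the gateway-assigned ID, or mint one if the request arrived without it.
    cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Correlation-ID"] = cid  # pass it along for tracing
    return response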

Service Mesh Communication

All inter-service communication uses gRPC with Protocol Buffers for efficiency, with automatic retry logic and circuit breakers

Protocol: gRPC + Protobuf
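
The retry behavior can be declared in a gRPC service config when the channel is created, as in this sketch; the service name, target, and backoff values are placeholders:

import json
import grpc

retry_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "erp.NlSqlService"}],  # placeholder service name
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
})

channel = grpc.insecure_channel(
    "nl-sql-service:50051",  # placeholder target
    options=[("grpc.enable_retries", 1), ("grpc.service_config", retry_config)],
)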

Observability Stack

Metrics exported to Prometheus, logs to ELK, traces to Jaeger for complete system visibility

Metrics: /metrics endpoint

Async Event Bus

Model updates and data ingestion trigger events via Kafka for decoupled processing

Topics: model.updated, data.ingested
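
Publishing one of these events might look like the following kafka-python sketch; the broker address and payload shape are assumptions:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("model.updated", {"model": "rice-forecast", "version": "v2.3.1"})
producer.flush()

Consumers in the serving services can then reload the registry entry without any synchronous coupling to the training pipeline.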

AI/ML & Language Models

Vertex AI Gemini, Google Gemini, LangChain, TensorFlow, PyTorch, LSTM Models, Prophet, SHAP Values, Embedding Models

Backend & APIs

Cloud Run services, Kong/Nginx, gRPC, Protocol Buffers, WebSocket, REST APIs, OpenAPI/Swagger, Pub/Sub, BFF Layer

Data & Storage

Cloud SQL PostgreSQL, BigQuery, Vertex AI Vector Search, Cloud SQL pgvector, Redis Cache, Feast Feature Store, DVC, HNSW Index, Read Replicas

Frontend & UI

React, Next.js, TypeScript, SSR/SSG, Jest, React Testing Library, WebSocket Client, Chat Interface

MLOps & Model Management

Vertex AI Pipelines, MLflow, Weights & Biases, Vertex AI Endpoints, Model Registry, A/B Testing, Feature Registry, Champion/Challenger

Cloud Infrastructure

Google Cloud Platform, GKE, Docker, Kubernetes, Pulumi IaC, Ansible, GitHub Actions, HPA Auto-scaling

Monitoring & Observability

Prometheus, Grafana, ELK Stack, Jaeger, Distributed Tracing, Circuit Breakers, Correlation IDs, Metrics Endpoint

Security & Testing

OAuth2/JWT, RBAC, SQL Proxy/Guard, OWASP, Pytest, Coverage.py, Locust, PII Scrubbing

Microservices Architecture Flow

Client UI → API Gateway → Microservices (NL+SQL Service, RAG Orchestrator, TS Forecasting) → Data Layer

🔧 GCP Technology Implementation Layer

NL+SQL Service
Primary Database:
• Cloud SQL with PostgreSQL
Read replicas for query optimization
LLM Engine (Vertex AI):
• Vertex AI with Gemini 1.5 Pro
2M token context for complex queries
Orchestration:
• LangGraph StateGraph
Query refinement & error handling
Deployment:
• Cloud Run (Serverless)
Auto-scaling with load balancing
RAG Orchestrator
Core Platform:
• Vertex AI Agent Builder
Managed document processing
Document Processing:
• Document AI
OCR, table extraction, layout parsing
Vector Storage:
• Vertex AI Vector Search
Billions of docs, <10ms latency
• Cloud SQL with pgvector
Metadata filtering & SQL queries
Embeddings:
• textembedding-gecko@003
768-dim vectors for similarity
TS Forecasting
Data Warehouse:
• BigQuery
Time-series data & aggregations
ML Platform:
• Vertex AI Custom Training
LSTM/Prophet ensemble models
Feature Store:
• BigQuery with Feast
Training-serving consistency
Model Registry:
• Vertex AI Model Registry
A/B testing & versioning
Integration Pattern: All services are deployed on Cloud Run, with LangChain for orchestration and LangGraph for stateful workflows
  • Sub-Second Latency: target SLA < 1 second
  • High Throughput: 1000 requests/second
  • Automated ML Pipeline: weekly model retraining

Microservices Team Structure

The Rice Market AI System requires cross-functional teams of 3-4 members each, with complementary skills and structured around business capabilities to ensure autonomy, ownership, and end-to-end responsibility. Teams follow the "you build it, you run it" philosophy, taking ownership from development through production.

I. Core Product Teams (Business Capabilities)

1. Natural Language SQL (NL+SQL) Team
Mission: Enable users to query ERP data using natural language, translating it into secure SQL queries.
Key Responsibilities & Components:
  • NL+SQL Agent Service (Vertex AI Gemini 1.5 Pro + LangChain): Process natural language, identify intent, extract entities using LLM Engine
  • SQL Generator (LangChain + Vertex AI): Create parameterized SQL queries with proper escaping
  • SQL Proxy/Guard (Cloud SQL Auth Proxy): Implement Cloud IAM-based validation, row-level security, allowlist enforcement
  • Result Formatter: Format query responses for user consumption (JSON with metadata)
  • Cloud SQL PostgreSQL with Read Replicas: Optimize query execution on read replicas, manage data retrieval
  • Containerization: Multi-stage Docker builds for the NL+SQL Agent Service (Vertex AI Gemini 1.5 Pro + LangChain), deployed on Cloud Run, with test query processing
  • Model Training: Fine-tune Vertex AI Gemini models for rice market domain, achieve >80% accuracy
Core Skills:
NLP with Vertex AI Gemini 1.5 Pro via LangChain, Vertex AI LLMs (Gemini models), SQL, Cloud SQL database design, Cloud IAM & SQL security (RBAC), Cloud Run, API development
2. RAG-Based Document Summarization Team
Mission: Provide intelligent summarization of documents based on Retrieval-Augmented Generation.
Key Responsibilities & Components:
  • RAG Orchestrator (Vertex AI Agent Builder + Vector Search): Generate query embeddings (768-dimensional vectors)
  • Vertex AI Vector Search: Perform similarity searches using HNSW indexes
  • Ranker Module: Re-rank documents using cross-encoder models
  • Generator Service (Vertex AI Gemini): Create LLM-based summaries with citations
  • Query Embedding (textembedding-gecko@003): Convert text to 768-dimensional vectors, search Vertex AI Vector Search
  • Cloud Run Deployment: RAG Orchestrator (Vertex AI Agent Builder + Vector Search) with vector DB connectivity
  • Model Training: Fine-tune Vertex AI models leveraging RAG for domain accuracy
Core Skills:
NLP with Vertex AI Gemini 1.5 Pro via LangChain, Vertex AI Vector Search for production embeddings, Cloud SQL with pgvector for hybrid search, Document AI & information retrieval, Vertex AI embedding models (gecko family), ML ranking
3. Time-Series Price Forecasting Team
Mission: Generate accurate, interpretable time-series forecasts for rice prices over specified horizons.
Key Responsibilities & Components:
  • Feature Pipeline: Extract features from BigQuery Feature Store with Feast (price history, weather, FX)
  • Model Inference Engine: Run LSTM/Prophet ensembles with confidence intervals
  • Explainability Module: Generate SHAP values for price driver identification
  • Response Formatting: Package predictions with metadata and explanations
  • Forecasting Models: Develop 6-month forecasts with confidence intervals
  • BigQuery Feature Store with Feast Management: Feature registry, generation, versioning
  • MLflow Model Registry on Vertex AI: Version control, champion/challenger strategies
Core Skills:
Time-series analysis with PyTorch LSTM models on Vertex AI, LSTM/Prophet modeling, feature engineering, explainable AI, statistical modeling, data science

II. Supporting Teams (Platform-Oriented)

4. Platform & MLOps Team (Vertex AI Pipelines, MLflow, Weights & Biases)
Mission: Provide a robust, scalable, and secure microservice platform and MLOps infrastructure for all teams.
Key Responsibilities & Components:
  • API Gateway (Cloud Endpoints + Cloud Run)/BFF Layer: Authentication, authorization, rate limiting, caching
  • Environment Setup: GCP project configuration with IAM and billing
  • Repository Structure: Monorepo with proper microservice directories
  • Data Pipeline: Ingestion, PII scrubbing, embedding generation
  • Vector Store Setup: Deploy and manage Vertex AI Vector Search with Cloud SQL pgvector
  • BigQuery Feature Store with Feast: Registry, generation, online/offline storage
  • MLflow Model Registry on Vertex AI: Version control, deployment configurations
  • ML Pipeline: Automated model retraining workflows
  • Container Orchestration: GKE deployment, HPA auto-scaling, Ansible
  • CI/CD Pipeline: GitHub Actions for testing and deployment
  • Monitoring: Prometheus, Grafana, ELK stack, distributed tracing
  • Security: OWASP best practices, security headers implementation
Core Skills:
DevOps with GitHub Actions, Pulumi IaC, and Ansible automation; MLOps with Vertex AI Pipelines, MLflow, and Weights & Biases; Google Cloud Platform (GKE, Vertex AI, BigQuery) and Kubernetes orchestration; Cloud Run containerization with multi-stage builds; CI/CD; Ansible/Terraform; monitoring tools; security
5. Client Application (Frontend) Team
Mission: Design and implement the user interface that interacts with the AI System's capabilities.
Key Responsibilities & Components:
  • Frontend Design: React with the Next.js framework for SSR/SSG, including a chat interface
  • API Development: RESTful APIs with OpenAPI specification
  • User Experience: Intuitive interface for queries and results
  • API Gateway (Cloud Endpoints + Cloud Run) Integration: Consume backend microservices
  • Performance Optimization: Client-side performance tuning
Core Skills:
React with Next.js (SSR/SSG), UI/UX design, API integration, web security, performance optimization

Team Principles and Practices

1
Team Size
Each team consists of 3-4 members to foster effective collaboration and reduce communication overhead.
2
Autonomy
Teams operate with limited dependencies, making rapid, context-aware decisions.
3
Ownership
Long-term accountability from conception to production, including on-call responsibilities.
4
Cross-functional
Diverse skills within each team (ML engineers, backend developers, data engineers).
5
Knowledge Sharing
Communities of practice (Chapters and Guilds) to disseminate knowledge without dictating choices.
6
Design Reviews
RFC process for new services with feedback from various teams to catch issues early.
7
Documentation
Living documentation including service overviews, contracts, runbooks, and metadata.
8
Consistency
Microservice chassis for common functionalities ensuring consistency with technical heterogeneity.