AI QA Architect - Georgin Advisory Pvt. Ltd.

📋 Course Information

Course Code	AIQA-201
Duration	12 Weeks (25 Sessions)
Schedule	Weekends Only (Saturday & Sunday)
Session Duration	2.5 hours per session
Total Contact Hours	62.5 hours live
Self-Study Hours	60 hours (recommended)
Mode	Live Online / Hybrid
Level	Intermediate to Expert
Prerequisites	Basic QA or IT experience (no prior AI QA needed)
Language	English / Hindi
Certificate	Professional Certificate in AI QA Engineering & Architecture
Batch Size	Maximum 30 students (personalized attention)

🎯 Course Overview

This intensive 12-week programme is designed for QA professionals, IT engineers, and IT managers who want to master the testing of AI and GenAI systems — from basic LLM output testing all the way to multi-agent QA and AI governance.

Students will use a single real product story across all 25 sessions, applying every technique to the same AI application until they have built a complete, production-grade AI QA Architect portfolio.

👥 Target Audience

•

QA Engineers wanting to specialise in AI and GenAI systems testing

•

QA Managers and Test Leads overseeing AI product delivery

•

Automation Engineers ready to move into AI QA architecture

•

DevOps Engineers adding AI quality gates to CI/CD pipelines

•

IT Professionals transitioning into AI QA roles

•

Software Developers who want to properly test what they build with AI

•

MLOps Engineers needing quality engineering skills

•

Anyone preparing for an AI QA Architect role or interview

📖 Course Structure

Phase 1: AI Mindset + Vibe Coding + Local AI (Weeks 1–2)

AI QA landscape, LLM fundamentals, vibe coding with Copilot/Cursor, LM Studio local models, AI-assisted test design

Phase 2: Modern QA Toolkit + n8n (Weeks 3–4)

API testing (Bruno/Hurl/Pact/Schemathesis), n8n workflow testing, Playwright E2E vibe-coded, database SQL test suites

Phase 3: Testing GenAI Systems (Weeks 5–6)

LLM output quality, RAG pipeline testing, AI security red-teaming, MCP AI QA agents

Phase 4: Performance + AI Load Testing + Security (Weeks 7–8)

k6 performance, AI load testing (TTFT/token cost/RAG under load), chaos engineering, DAST security

Phase 5: Agentic QA + Multi-Agent + Observability (Weeks 9–10)

Goal-based agent testing, multi-agent systems, advanced n8n orchestration, AI drift monitoring

Phase 6: QA Architecture + Governance + Demo Day (Weeks 11–12)

Test pyramid design, CI/CD quality gates, EU AI Act compliance, fairness testing, capstone presentation

🗓️ Weekly Navigation

Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12

WEEK 1: AI Mindset + Vibe Coding + ShopSmart Kickoff

Sessions 1-2 • Building your AI QA foundation with real-world ShopSmart project

SESSION 1

The AI QA Revolution & ShopSmart Kickoff

Duration: 2.5 hours

🧵 ShopSmart product manager shares a 1-page brief: "Add an AI chatbot to handle 80% of tier-1 support queries." This is Session 1 of the ShopSmart story.

Topics Covered:

The AI QA revolution — how AI changes testing fundamentally
Traditional QA vs AI QA — what is different, what stays the same
Practical AI stack walkthrough: UI → n8n → LLM → RAG → API → DB
First AI test plan — paste ShopSmart brief into Claude, review output critically

💭 Think Before You Build:

What does it mean to "test" a system that gives a different answer every time?
How is testing an LLM different from testing a REST API?
What are the 4 layers of the ShopSmart AI stack — and what test type does each need?

🛠️ Hands-on Lab:

All 8 services running (shopmart-app, n8n, postgres, mockoon, prometheus, grafana, influxdb, localai)
Map ShopSmart AI stack layers on paper: identify every failure point
Generate first ShopSmart test plan using Claude — then review and identify gaps
Create GitHub repository: first commit

📝 Assignment: Write a 2-page ShopSmart AI QA Architecture Overview covering: stack diagram, test types per layer, risk assessment

SESSION 2

Vibe Coding for QA — AI-Accelerated Test Engineering

Duration: 2.5 hours

🧵 ShopSmart user story #1: "As a customer, I want to ask 'Where is my order?' and get a tracked answer." First Playwright smoke test needed.

Topics Covered:

Vibe coding methodology — Comments → Code workflow
Cursor IDE composer and agent mode — live demo
GitHub Copilot — chat, inline, multi-file generation
AI-generated code review — how to spot and fix hallucinated selectors
Vibe-code ShopSmart Playwright smoke tests — live coding
AI Audit technique — documenting what the AI got wrong

🛠️ Hands-on Lab:

Vibe-code ShopSmart Playwright smoke tests using Copilot scaffold in under 10 minutes
Review every generated test: find and fix at least 5 errors the AI introduced
Cursor agent mode: build full ShopSmart chat UI POM from a 1-paragraph description
Istanbul coverage: measure what % of ShopSmart flows the AI missed

📝 Assignment: 30-test ShopSmart AI-generated Playwright suite + AI Audit column (document what AI got wrong in each test)

WEEK 2: Local AI Models + AI Test Design

Sessions 3-4 • LM Studio setup, golden datasets, and risk-based testing

SESSION 3

🖥️ LM Studio: Running Local AI Models for QA

Duration: 2.5 hours • No API Cost, No Internet Required

🧵 ShopSmart QA team has no cloud API budget in the dev environment and cannot send customer order data to cloud providers. They need a local model for test generation and CI evals.

Topics Covered:

Why run local AI models? — cost, privacy, offline CI, rate limits
LM Studio installation — macOS, Windows, Linux, Apple Silicon M1/M2/M3
Downloading models: Mistral 7B, Phi-3.5 Mini, Llama 3.1, Codestral
Starting the OpenAI-compatible local server on localhost:1234
Connecting QA tools to LM Studio: Promptfoo, Bruno, Cursor IDE, n8n
Quality comparison: Mistral 7B vs Claude — scoring ShopSmart test generation
LocalAI in Docker for CI without internet

📝 Assignment: LM Studio config + Promptfoo local provider config + 20 ShopSmart test cases (local) + quality comparison table + LocalAI CI Docker setup

SESSION 4

AI-Powered Test Design: Requirements to Test Cases

Duration: 2.5 hours

🧵 ShopSmart story #2: "I want to know my refund status." 50 edge cases need to be identified and a golden dataset created for all future AI evals.

Topics Covered:

AI-assisted test design — from PRD to test cases in 3 minutes
What makes a test case "good"? — the 5-criteria scoring rubric
Building a golden dataset — the most important AI QA artefact
Risk-based testing for AI systems — how to prioritise when everything can fail
Exploratory testing with AI assistance

📝 Assignment: 50-test suite (AI-generated + reviewed) + ShopSmart golden dataset (20 pairs) + risk matrix

WEEK 3: LLM Internals + API Testing

Sessions 5-6 • Understanding GenAI architecture and comprehensive API testing

SESSION 5

GenAI Context: How LLMs, RAG & Agents Work (QA Perspective)

Duration: 2.5 hours

🧵 ShopSmart developer explains: the chatbot uses RAG to search a 500-document FAQ corpus before answering. QA must understand this to test it effectively.

Topics Covered:

LLM mechanics simplified — tokens, temperature, context window
Temperature 0 vs 0.7 — what changes and what that means for test design
RAG architecture — 4 failure points a QA engineer must know
Agent anatomy — goals, tools, memory, observation loop (ReAct pattern)
MCP (Model Context Protocol) — what it is and why it changes QA

📝 Assignment: Temperature comparison report + ShopSmart RAG failure map + agent anatomy diagram + 10 agent scenarios

SESSION 6

API Testing: Bruno, Hurl, Contract & Fuzz

Duration: 2.5 hours

🧵 ShopSmart backend exposes POST /chat, GET /order/{id}, POST /refund — all need contract tests before the frontend team depends on them.

Topics Covered:

API testing for AI systems — what is different from traditional REST testing
Bruno — git-native offline API client, generating from OpenAPI spec
Hurl — CLI HTTP testing, CI-native, scriptable
Pact — consumer-driven contract testing for AI APIs
Schemathesis — automated OpenAPI fuzz testing
Mockoon — API mocking, critical for isolating n8n AI node

📝 Assignment: Bruno collection (35 requests) + Hurl smoke + Pact contract + Schemathesis fuzz report

WEEK 4: n8n Integration + Playwright E2E + Database

Sessions 7-9 • Workflow testing, E2E automation, and SQL validation

SESSION 7

n8n Integration Testing: Workflows, AI Nodes & Failure Injection

Duration: 2.5 hours

🧵 ShopSmart uses n8n: webhook (new message) → AI classification node → route to refund/tracking/escalate. All 3 routing paths need testing, isolated from the LLM.

Topics Covered:

n8n architecture for QA — workflows, triggers, AI nodes, webhook handling
Testing trigger nodes — webhook validation and schema checking
Testing AI nodes — why you must mock the LLM output for routing tests
Failure injection with Mockoon — simulating AI node failures
Idempotency testing — why duplicate messages must not create duplicate DB records

📝 Assignment: n8n ShopSmart workflow + 12 test cases + Mockoon config + idempotency evidence

📦 PROJECT 1 SUBMISSION — ShopSmart AI QA Foundations Pack (End of Week 4)

Must include all of the following:

✅

ShopSmart AI QA Architecture Overview (2 pages) — Session 1

✅

LM Studio quality comparison: local vs cloud test generation scores — Session 3

✅

50-test suite with AI Audit column — Session 4

✅

Bruno API collection (35 requests) + Pact contract + Schemathesis fuzz — Session 6

✅

n8n test suite: 12 test cases + Mockoon failure evidence — Session 7

✅

Playwright POM: 12 ShopSmart E2E journeys + cross-browser + visual snapshots — Session 8

✅

SQL test suite: 15 ShopSmart business rules — Session 9

✅

GitHub Issues: 10 ShopSmart bugs with severity, priority, AI-suggested root cause

Assessment: Pass/Fail — instructor reviews GitHub repo. CI green badge required for Playwright.

WEEK 5: LLM Output Quality + RAG Testing

Sessions 10-11 • Promptfoo, DeepEval, RAGAS, and LangSmith

SESSION 10

🤖 LLM Output Quality: Promptfoo, DeepEval & Hallucination Detection

Duration: 2.5 hours

🧵 ShopSmart QA discovers the chatbot sometimes says "returns are free for 90 days" when the actual policy is 30 days. A systematic evaluation framework is needed.

Topics Covered:

Types of LLM failures: hallucination, refusal, relevance drift, consistency
Promptfoo — YAML eval suites, multi-provider, assertions
DeepEval — Answer Relevancy, Hallucination metrics
SelfCheckGPT — consistency-based hallucination scoring (no ground truth needed)
CI quality gate design — what threshold is "good enough" for ShopSmart?

📝 Assignment: 30-test Promptfoo YAML + hallucination report + CI quality gate

WEEK 6: AI Security + MCP Agents

Sessions 12-13 • Red-teaming, prompt injection, and AI QA agents

SESSION 12

🤖 AI Security: Prompt Injection, Garak & ShopSmart Red Team

Duration: 2.5 hours

🧵 ShopSmart red team challenge: "Can a customer trick the chatbot into revealing other customers' order details?" A full security test is required.

Topics Covered:

AI attack surface — how AI systems fail differently from traditional systems
Garak — automated LLM red-teaming: 100+ attack probe categories
Manual injection patterns — 15 ShopSmart-specific attack types
IDOR via prompt injection — testing cross-user data leakage
Indirect injection — embedding instructions in uploaded documents
Detoxify — toxicity scoring for AI responses

📝 Assignment: AI Security Report: Garak scan + 15 ShopSmart injection tests + IDOR evidence + Detoxify scores

WEEK 7: Performance Testing + ⚡ AI Load Testing

Sessions 14-15 • k6, Grafana, and the career-differentiating AI load testing session

SESSION 15 ⭐

⚡ Load Testing AI Systems: TTFT, Token Cost, RAG Quality & n8n Concurrent

Duration: 2.5 hours • THE SESSION ALMOST NO QA COURSE COVERS — YOUR CAREER DIFFERENTIATOR

🧵 ShopSmart CFO asks: "What will the AI chatbot COST at 5,000 concurrent users on Black Friday — and will the quality hold up?" Traditional load testing cannot answer this.

New Metrics You Will Learn:

Metric	What It Measures	Target
TTFT	Time-to-First-Token (streaming start)	< 800ms P95
TGT	Total Generation Time (full response)	< 4s P95
Tokens/sec	LLM processing speed under load	> 20 t/s at 50 VUs
Cost/request	Dollar cost per LLM call under concurrency	Track per level
RAGAS@Load	RAG quality degradation under concurrency	Faithfulness delta < 0.15

📝 Assignment: ⚡ ShopSmart AI Load Report: "Chatbot costs $847/hr at Black Friday peak · RAG quality drops 28% at 100 concurrent · n8n saturates at 47 concurrent" + Grafana dashboard

WEEK 8: Chaos Engineering + Security Testing

Sessions 16-17 • ToxiProxy, OWASP ZAP, and SOC2 compliance

📦 PROJECT 2 SUBMISSION — ShopSmart Automated AI QA Framework (End of Week 8)

Must include all of the following:

✅

Everything from Project 1 (updated)

✅

Promptfoo YAML suite: 30 test cases + hallucination report + CI gate — Session 10

✅

RAGAS eval: all 4 metrics + LangSmith trace + before/after improvement evidence — Session 11

✅

AI Security Report: Garak scan + 15 injection tests + IDOR evidence — Session 12

✅

MCP agent demo: ShopSmart issue → AI test → Playwright runs → PR (3-min video) — Session 13

✅

k6 load: ShopSmart baseline + Black Friday spike + Grafana dashboard — Session 14

✅

⚡ AI Load Report: TTFT + token cost + RAG quality under load + n8n saturation — Session 15

✅

Chaos Report: 6 failure scenarios + ToxiProxy config — Session 16

✅

Security Report: ZAP + nuclei + IDOR + CI integration — SOC2-ready pack — Session 17

Assessment: GitHub repo URL + CI green badge + live Allure + MCP demo video link

WEEK 9: Agentic QA — Testing AI That Decides & Acts

Sessions 18-19 • Goal-based testing and multi-agent systems

WEEK 10: Advanced n8n + AI Observability

Sessions 20-21 • Orchestrator testing and drift detection

WEEK 11: QA Architecture + CI/CD Quality Gates

Sessions 22-23 • Enterprise strategy and full pipeline implementation

WEEK 12: AI Governance + Capstone Demo Day

Sessions 24-25 • EU AI Act compliance and final presentation

SESSION 25 🎓

Demo Day: ShopSmart Full Journey Presentation & Certification

Duration: 2.5 hours

🧵 STORY COMPLETE: You started with a 1-page product brief in Session 1. You now have a fully tested, monitored, governed, CI/CD-integrated AI QA platform. Time to prove it.

Demo Day Activities:

Portfolio demo preparation — the 10-minute ShopSmart story structure
Student presentations — 10 minutes each
Peer review and scoring
Q&A challenge: "How would you adapt this for a financial services AI product?"
Certificate ceremony + career guidance + next steps

📊 Assessment & Grading

Grading Components:

Component	Weight	Description
Weekly Assignments	30%	24 practical assignments (one per session)
Project 1: Foundations Pack	15%	End of Week 4 submission
Project 2: Automated AI QA Framework	20%	End of Week 8 submission
Project 3: Capstone Portfolio	25%	Session 25 live demo + submission
Attendance & Participation	10%	Live session presence + GitHub activity

Certification Requirements:

Minimum 70% overall score
All 3 capstone projects submitted
80% attendance (minimum 20 out of 25 sessions)
Final ShopSmart portfolio on GitHub with CI green badge
Allure report deployed to GitHub Pages

🏆 PROFESSIONAL CERTIFICATION

Program Details

Duration

12 Weeks (25 Sessions)

Schedule

Weekends Only (Sat & Sun)

Total Hours

62.5 live + 60 self-study

Certificate

AI QA Engineer & Architect

Enroll Now Request Syllabus

AI-Native QA Engineer & Architect Mastery

📋 Course Information

🎯 Course Overview

👥 Target Audience

📖 Course Structure

Phase 1: AI Mindset + Vibe Coding + Local AI (Weeks 1–2)

Phase 2: Modern QA Toolkit + n8n (Weeks 3–4)

Phase 3: Testing GenAI Systems (Weeks 5–6)

Phase 4: Performance + AI Load Testing + Security (Weeks 7–8)

Phase 5: Agentic QA + Multi-Agent + Observability (Weeks 9–10)

Phase 6: QA Architecture + Governance + Demo Day (Weeks 11–12)

🗓️ Weekly Navigation

WEEK 1: AI Mindset + Vibe Coding + ShopSmart Kickoff

The AI QA Revolution & ShopSmart Kickoff

Topics Covered:

💭 Think Before You Build:

🛠️ Hands-on Lab:

Vibe Coding for QA — AI-Accelerated Test Engineering

Topics Covered:

🛠️ Hands-on Lab:

WEEK 2: Local AI Models + AI Test Design

🖥️ LM Studio: Running Local AI Models for QA

Topics Covered:

AI-Powered Test Design: Requirements to Test Cases

Topics Covered:

WEEK 3: LLM Internals + API Testing

GenAI Context: How LLMs, RAG & Agents Work (QA Perspective)

Topics Covered:

API Testing: Bruno, Hurl, Contract & Fuzz

Topics Covered:

WEEK 4: n8n Integration + Playwright E2E + Database

n8n Integration Testing: Workflows, AI Nodes & Failure Injection

Topics Covered:

📦 PROJECT 1 SUBMISSION — ShopSmart AI QA Foundations Pack (End of Week 4)

WEEK 5: LLM Output Quality + RAG Testing

🤖 LLM Output Quality: Promptfoo, DeepEval & Hallucination Detection

Topics Covered:

WEEK 6: AI Security + MCP Agents

🤖 AI Security: Prompt Injection, Garak & ShopSmart Red Team

Topics Covered:

WEEK 7: Performance Testing + ⚡ AI Load Testing

⚡ Load Testing AI Systems: TTFT, Token Cost, RAG Quality & n8n Concurrent

New Metrics You Will Learn:

WEEK 8: Chaos Engineering + Security Testing

📦 PROJECT 2 SUBMISSION — ShopSmart Automated AI QA Framework (End of Week 8)

WEEK 9: Agentic QA — Testing AI That Decides & Acts

WEEK 10: Advanced n8n + AI Observability

WEEK 11: QA Architecture + CI/CD Quality Gates

WEEK 12: AI Governance + Capstone Demo Day

Demo Day: ShopSmart Full Journey Presentation & Certification

Demo Day Activities:

📊 Assessment & Grading

Grading Components:

Certification Requirements:

Program Details

Technologies Covered

Key Outcomes