Monday, May 18, 2026

VMware — Complete Guide

VMware — Complete Guide

vSphere  ·  ESXi  ·  vCenter  ·  NSX-T  ·  vSAN  ·  Tanzu  ·  HCX  ·  Site Recovery  ·  Aria  ·  VCF  ·  Cheat Sheet



1. Core Concepts — VMware Portfolio Overview

What is VMware and how is its product portfolio organised?

VMware (now part of Broadcom) is the industry leader in virtualization and software-defined infrastructure — enabling organisations to abstract compute, network, and storage resources from underlying hardware and manage them as software.

CategoryProducts
Compute VirtualizationvSphere (ESXi + vCenter), VMware Workstation, VMware Fusion
Network VirtualizationNSX-T Data Center, NSX Advanced Load Balancer (Avi)
Storage VirtualizationvSAN, vSAN ESA, vSphere Virtual Volumes (vVols)
Cloud ManagementAria Suite (Operations, Automation, Log Insight, Network Insight)
Cloud-Native / ContainersTanzu (TKGs, TKGi, TMC, App Platform)
Hybrid CloudVMware Cloud Foundation (VCF), VMware Cloud on AWS/Azure/GCP
Disaster RecoverySite Recovery Manager (SRM), VMware Live Recovery
MigrationVMware HCX, vSphere Replication
Desktop VirtualizationHorizon (VDI), App Volumes, Dynamic Environment Manager
Tip: VMware's shift under Broadcom consolidates the portfolio into VMware Cloud Foundation (VCF) as the primary commercial offering — bundling vSphere, vSAN, and NSX-T into a single subscription. Understanding VCF is essential for modern VMware deployments.

What is the difference between VMware vSphere, ESXi, and vCenter?

ESXi (hypervisor):
→ Type-1 bare-metal hypervisor — runs directly on physical hardware
→ Replaces the OS: no underlying Windows or Linux required
→ Manages: CPUs, memory, storage, network interfaces
→ Hosts: virtual machines (VMs) and containers
→ Single host: can be managed standalone via DCUI or Host Client
→ Version: ESXi 8.x (current under Broadcom)

vCenter Server:
→ Centralised management platform for multiple ESXi hosts
→ Provides: vSphere Client (HTML5 UI), REST APIs, PowerCLI
→ Features: DRS, HA, vMotion, DPM, vSphere Lifecycle Manager
→ Deployment: vCenter Server Appliance (VCSA — Linux OVA)
→ No Windows vCenter — VCSA only since vSphere 7.x
→ Single Sign-On (SSO): authentication domain for all vSphere services

vSphere:
→ The product suite = ESXi + vCenter Server + related features
→ Not a single component — it's the umbrella term for the platform
→ Licences: vSphere Standard, vSphere Enterprise Plus (now VCF bundles)

Relationship:
Physical Server → ESXi (hypervisor) → Virtual Machines
ESXi (one or many) → managed by → vCenter Server
vCenter Server → part of → vSphere (the platform)

What are the core virtualisation concepts every VMware professional must know?

Hypervisor types:
Type-1 (bare-metal): ESXi, Hyper-V, KVM
  → Runs directly on hardware — no host OS
  → Lower overhead, better performance, enterprise use
Type-2 (hosted):     VMware Workstation, VirtualBox, Fusion
  → Runs on top of an existing OS (Windows/macOS)
  → Higher overhead, dev/test use only

Virtual Machine (VM):
→ Software-emulated computer — has vCPU, vRAM, vNIC, vDisk
→ Isolated from other VMs on same host
→ Runs any OS independently of underlying hardware
→ Portable: can migrate between hosts (vMotion), datacentres (HCX)

VM files:
.vmx       — VM configuration file (settings, hardware definition)
.vmdk      — VM disk file (virtual hard disk data)
-flat.vmdk — actual disk data; descriptor .vmdk is the header
.nvram     — VM BIOS/EFI state
.vmsd      — snapshot metadata
.vmsn      — snapshot state
.vmxf      — supplemental config
.log       — VM log file

Snapshots:
→ Point-in-time state capture (disk + memory + VM state optional)
→ NOT a backup — child disks grow; performance degrades over time
→ Best practice: no more than 2–3 snapshots per VM
→ Consolidate before decommissioning: right-click → Manage Snapshots

VMFS (Virtual Machine File System):
→ VMware's clustered file system for shared storage (SAN/iSCSI/FC)
→ Allows multiple ESXi hosts to access the same datastore concurrently
→ Supports vMotion, HA, DRS — requires shared VMFS datastore
→ Current version: VMFS-6 (supports 4K native drives, automatic UNMAP)

2. vSphere & ESXi — Deep Dive

How do vMotion, Storage vMotion, and Cross-vCenter vMotion work?

vMotion (live migration — compute):
→ Migrates a running VM between ESXi hosts with zero downtime
→ Requirements:
   - Shared storage (VMFS datastore accessible by both hosts)
   - Compatible CPUs (EVC mode if CPU generations differ)
   - L2 network connectivity (same port group or NSX-T stretched segment)
   - vMotion VMkernel port on both hosts
→ Process:
   1. vCenter pre-copies VM memory pages to destination host
   2. VM continues running on source; changed pages tracked
   3. At switchover: VM pauses (< 1 second), final memory state transferred
   4. VM resumes on destination; source VM deleted
→ Use case: host maintenance (Enter Maintenance Mode triggers auto-vMotion)

Storage vMotion (live migration — storage):
→ Migrates VM disk files between datastores with VM running
→ No CPU/network compatibility requirements (moves storage, not compute)
→ Requirements: vCenter, no RDMs (or use mapped LUN)
→ Use case: datastore maintenance, storage tiering, LUN reclamation

Cross-vCenter vMotion (xvMotion — long-distance):
→ Migrate VM between different vCenter Servers / sites
→ Can move compute + storage simultaneously
→ Requires: Enhanced Linked Mode OR HCX for stretched layer-2

EVC (Enhanced vMotion Compatibility):
→ Masks CPU features to a common baseline across a cluster
→ Allows vMotion between hosts with different CPU generations
→ Set at cluster level: Intel Broadwell, Cascade Lake, Ice Lake, etc.
→ EVC mode cannot be lowered if running VMs have higher baseline

What are vSphere HA, DRS, and DPM and when does each activate?

vSphere HA (High Availability):
→ Restarts VMs on surviving hosts if a host fails
→ Heartbeat: hosts exchange heartbeats every second (UDP 902)
→ Failure detection: no heartbeats for 10s → host declared failed
→ VM restart: vCenter restarts failed VMs on remaining hosts
→ VM/App monitoring: restarts VMs that stop sending VMware Tools heartbeats
→ Admission Control policies:
   - Percentage cluster resources reserved (e.g., 25% = tolerate 1/4 hosts)
   - Slot-based: reserves slots per host for worst-case failover
   - Dedicated failover hosts

DRS (Distributed Resource Scheduler):
→ Balances VM workloads across hosts in a cluster automatically
→ Monitors: CPU and memory utilisation per host every 5 minutes
→ Automation levels:
   Manual:         suggestions only — admin approves each migration
   Partially Auto: initial placement auto, balance manual
   Fully Auto:     vCenter migrates VMs automatically
→ DRS rules:
   Affinity:      keep VMs on same host (e.g., app + DB for latency)
   Anti-affinity: keep VMs on different hosts (e.g., HA pairs)
   Host affinity: pin VMs to specific hosts (licensing, hardware)

DPM (Distributed Power Management):
→ Consolidates workloads and powers down idle hosts to save energy
→ Wakes hosts when capacity is needed (IPMI/iLO/iDRAC WoL)
→ Best practice: enable only if hosts support remote power management

Cluster design best practice:
→ N+1 minimum for HA (lose 1 host, all VMs still fit)
→ N+2 for business-critical clusters
→ Never over-commit with HA admission control disabled
→ DRS Fully Automatic + Balanced threshold for production clusters

What is the VMkernel and what ports does it require?

VMkernel (vmk) — management network interfaces on ESXi:
→ Not a VM NIC — it's ESXi's own network stack
→ Each vmk port has: IP, subnet mask, gateway, VLAN tag, MTU

Service               Default vmk  Port / Protocol
─────────────────────────────────────────────────────────
Management            vmk0         TCP 443 (vSphere Client)
                                   TCP 902 (vCenter heartbeat)
vMotion               vmk1         TCP/UDP 8000 (vMotion data)
vSAN                  vmk2         UDP 12321, 23451 (vSAN traffic)
iSCSI / NFS           vmk3         TCP 3260 (iSCSI)
                                   TCP/UDP 111, 2049 (NFS)
Fault Tolerance (FT)  vmk4         TCP 80, 8100-8200 (FT logging)
Replication           vmk5         TCP 31031 (vSphere Replication)

Best practice:
→ Dedicate separate vmk ports to each service type
→ Use separate physical NICs (pNICs) for management vs vMotion vs vSAN
→ Jumbo frames (MTU 9000) for vMotion and vSAN traffic
→ vSAN and vMotion on 10/25/100 GbE NICs; management can share 1GbE

Networking constructs:
vSwitch (Standard): configured per host — no shared state
dvSwitch (VDS):     configured in vCenter — shared across hosts
  → Required for vMotion, DRS, HA features
  → Port groups: named network segments with policy (VLAN, security, QoS)
NSX-T segments:     overlay networks — no VLAN dependency

3. vCenter Server & Administration

How do you manage vSphere via PowerCLI?

# Connect to vCenter:
Connect-VIServer -Server vcenter.domain.local -Credential (Get-Credential)

# List all VMs in a cluster:
Get-Cluster "Production" | Get-VM | Select Name, PowerState, NumCpu, MemoryGB

# Power operations:
Start-VM -VM "WebServer01"
Stop-VMGuest -VM "WebServer01" -Confirm:$false    # graceful shutdown via Tools
Stop-VM -VM "WebServer01"     -Confirm:$false    # power off (hard)

# vMotion — live migrate VM to a different host:
Move-VM -VM "AppServer01" -Destination (Get-VMHost "esxi02.domain.local")

# Storage vMotion — migrate VM disk to a different datastore:
Move-VM -VM "AppServer01" -Datastore (Get-Datastore "SSD-DS-02")

# Create a snapshot (quiesced, no memory):
New-Snapshot -VM "DBServer01" -Name "Pre-Patch-$(Get-Date -f yyyyMMdd)" `
             -Description "Before monthly patching" -Memory $false -Quiesce $true

# Remove all snapshots on a VM:
Get-VM "DBServer01" | Get-Snapshot | Remove-Snapshot -Confirm:$false

# VM inventory export:
Get-VM | Select Name, @{N="ToolsVersion";E={$_.ExtensionData.Guest.ToolsVersion}},
                @{N="GuestOS";E={$_.ExtensionData.Guest.GuestFullName}},
                NumCpu, MemoryGB, ProvisionedSpaceGB |
         Export-Csv "vm-inventory.csv" -NoTypeInformation

# List hosts in maintenance mode:
Get-VMHost | Where-Object {$_.ConnectionState -eq "Maintenance"} |
             Select Name, ConnectionState, Version

# Get datastore space usage — sorted by used %:
Get-Datastore | Select Name, CapacityGB,
  @{N="FreeGB";E={[math]::Round($_.FreeSpaceGB,1)}},
  @{N="UsedPct";E={[math]::Round((1-($_.FreeSpaceGB/$_.CapacityGB))*100,1)}} |
  Sort UsedPct -Descending

# Place host into maintenance mode (evacuates VMs via vMotion):
Set-VMHost -VMHost "esxi01.domain.local" -State Maintenance

What are the key vCenter components and their dependencies?

vCenter Server Appliance (VCSA) components:
────────────────────────────────────────────────────────────────
Service              Port       Purpose
────────────────────────────────────────────────────────────────
vSphere Client       443        HTML5 web UI
vCenter SSO          443/7444   Authentication — issues SAML tokens
Lookup Service       7444       Service registry for vSphere services
Inventory Service    10443      VM/host/datastore inventory
vSphere API (SDK)    443        REST + SOAP APIs for automation
PostgreSQL DB        5432       Embedded DB (events, tasks, inventory)
vSAN Health          8006       vSAN monitoring service
Update Manager       9084/9087  vSphere Lifecycle Manager (patching)
────────────────────────────────────────────────────────────────

VCSA sizing (production recommendations):
Small:   up to 100 hosts /  1,000 VMs  —  2 vCPU, 12 GB RAM, 415 GB storage
Medium:  up to 400 hosts /  4,000 VMs  —  8 vCPU, 24 GB RAM, 480 GB storage
Large:   up to 1000 hosts / 10,000 VMs — 16 vCPU, 32 GB RAM, 870 GB storage
X-Large: up to 2000 hosts / 35,000 VMs — 24 vCPU, 48 GB RAM, 1665 GB storage

vCenter High Availability (vCHA):
→ Active-Passive pair with a Witness node (3 nodes total)
→ Automatic failover if Active VCSA fails (RTO: ~5 minutes)
→ No impact on running VMs (ESXi HA is independent of vCenter)
→ Requires: 3 VMs, shared/stretched network, 250 Mbps between sites

vSphere Lifecycle Manager (vLCM):
→ Manages ESXi patches, updates, firmware (replaces VUM in 7.x+)
→ Image-based management: define desired ESXi image per cluster
→ Depot: download patches from VMware or air-gapped offline bundle
→ Remediation: puts host in maintenance mode → patches → reboots → exits

4. VMware NSX-T — Network Virtualization

What is NSX-T and what problem does it solve?

NSX-T Data Center decouples networking and security from physical hardware — delivering software-defined networking, micro-segmentation, and consistent policy across on-premises and cloud environments.

Why NSX-T:
→ Traditional VLANs require physical switch changes for every new segment
→ Firewall rules applied at perimeter only — east-west traffic unprotected
→ No consistent networking policy across vSphere, bare-metal, and cloud
→ NSX-T solves all three with overlay networking and distributed security

NSX-T Architecture:
┌─────────────────────────────────────────────────────────────┐
│ NSX Manager (Control Plane + Management Plane)              │
│  - 3-node cluster for HA                                    │
│  - Stores policy, pushes config to data plane               │
└──────────────────────────────┬──────────────────────────────┘
                               │ Pushes fabric config
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Transport Nodes (Data Plane — ESXi hosts + bare-metal)      │
│  - NSX kernel modules installed on ESXi                     │
│  - N-VDS or VDS7+ as the virtual switch                     │
│  - TEP (Tunnel Endpoint): GENEVE overlay tunnels            │
└─────────────────────────────────────────────────────────────┘

Key NSX-T constructs:
Overlay Segments:           Layer-2 logical switches — no VLAN dependency
Tier-1 (T1) Router:         per-tenant/application gateway
Tier-0 (T0) Router:         datacenter edge — BGP/OSPF to physical fabric
Edge Nodes:                 NSX-T gateways for N-S traffic
Distributed Firewall (DFW): stateful firewall in ESXi kernel — every vNIC
NSX ALB (Avi):              L4/L7 load balancer, WAF, GSLB

Micro-segmentation with DFW:
→ Firewall rules enforced at vNIC level — every VM protected
→ Rules follow the VM (vMotion, migrations) — not tied to IP/port
→ Groups: dynamic workload grouping by tag, OS, VM name
→ East-west traffic secured without hairpinning to perimeter firewall
→ Example policy:
   Group "WebTier" (tag:Web) → Group "AppTier" (tag:App) → Allow TCP 8080
   Group "AppTier"            → Group "DBTier"  (tag:DB)  → Allow TCP 1433
   Any                        → Any                        → Block (default deny)

How does NSX-T routing work (Tier-0 / Tier-1)?

Two-tier routing model:

Tier-0 (T0) Gateway:
→ Connects NSX overlay to physical network (underlay)
→ Runs BGP/OSPF with physical ToR switches
→ Handles: N-S traffic, NAT (SNAT/DNAT), VPN (IPsec/L2VPN)
→ Deployed on Edge Transport Nodes (dedicated VMs or bare-metal)
→ Active-Active or Active-Standby HA modes

Tier-1 (T1) Gateway:
→ Per-application or per-tenant logical router
→ Connected to T0 upstream, overlay segments downstream
→ Handles: inter-segment routing, DHCP relay/server, DNS forwarder
→ Runs distributed on all ESXi transport nodes (low latency)

Traffic flow examples:
VM-to-VM (same segment):             switch locally on host — no router
VM-to-VM (different segments, T1):   routed at distributed T1 in host kernel
VM-to-VM (different T1s):            routed up to T0, back down to T1
VM-to-Internet:                      T1 SR → T0 SR on Edge → physical → Internet

Route redistribution:
→ T0 advertises NSX overlay prefixes into BGP to physical fabric
→ Physical fabric advertises default route / external routes into T0
→ T1 routes auto-redistributed to T0 via internal protocol

5. vSAN — Software-Defined Storage

What is vSAN and how does it work?

VMware vSAN aggregates local disks from ESXi hosts into a shared distributed datastore — eliminating the need for external SAN or NAS in most workloads.

vSAN architecture:

Each ESXi host contributes: NVMe/SSD (cache) + SSD/HDD (capacity)
→ All-Flash:  NVMe cache tier + SSD capacity tier (best performance)
→ Hybrid:     SSD cache tier + HDD capacity tier (lower cost)
→ ESA (Express Storage Architecture): all-NVMe, single-tier, vSAN 8.x

vSAN Objects and Components:
VM Storage Objects:
  - VM Home namespace (vmx, logs)
  - VM Swap object
  - VMDK data object (one per disk)
Each object is split into Components distributed across hosts:
  Component = chunk of data stored on a single host's disk group
  Witness   = tie-breaker metadata component (no user data)

Storage Policy Based Management (SPBM):
→ Per-VM storage policies — not per-datastore
→ FTT (Failures to Tolerate):
   FTT=1: survives 1 host/disk failure
   FTT=2: survives 2 failures
→ RAID type:
   RAID-1 (mirroring):  FTT=1 requires 3 hosts
   RAID-5 (erasure):    FTT=1 requires 4 hosts (more efficient)
   RAID-6:              FTT=2 requires 6 hosts

vSAN Cluster sizing:
Minimum hosts: 3 (FTT=1 RAID-1), 4 (FTT=1 RAID-5), 6 (FTT=2 RAID-6)
Recommended:   6+ hosts for production flexibility
vSAN network:  dedicated VMkernel port, 10/25 GbE, MTU 9000
2-node cluster: requires separate Witness Host (no storage contributed)

vSAN Stretched Cluster:
→ Two active sites + witness site
→ All writes mirrored to both sites (synchronous) — RPO = 0
→ Automatic HA failover between sites
→ Requires: < 5ms RTT between sites, < 200ms RTT to witness

What are vSAN performance monitoring best practices?

Key vSAN metrics (Skyline Health / vSAN Proactive Tests):
Latency:      read/write latency (target: <1ms read, <5ms write — all-flash)
Throughput:   MB/s per disk group and per VM object
IOPS:         read/write IOPS per VM, per host, per disk group
Congestion:   >0 congestion events = disk group bottleneck
Resync:       % resync traffic — elevated after host failure/add
Cache hit %:  flash read cache hit ratio (hybrid tier only)

Common vSAN issues and resolutions:
Issue                               Resolution
─────────────────────────────────────────────────────────────────────
vSAN component inaccessible         Check host connectivity, disk health
High congestion / latency           Check disk group IOPS; add hosts/disks
Resync traffic impacting production Enable resync throttling (IOPS limit)
Capacity imbalance between hosts    Rebalance via vSAN proactive rebalance
Disk failures                       Replace disk; policy compliance auto-restores
Snapshot performance degradation    Limit snapshot count; consolidate regularly

PowerCLI — vSAN health check:
Get-VsanDisk | Select VMHost, CanonicalName, State, Tier, OperationalState |
               Where-Object {$_.OperationalState -ne "ok"} |
               Format-Table -AutoSize

Get-VsanClusterConfiguration -Cluster "vSAN-Cluster" |
  Select SpaceEfficiencyEnabled, EncryptionEnabled, HealthCheckEnabled

6. VMware Tanzu & Cloud-Native

What is VMware Tanzu and how does it fit with vSphere?

VMware Tanzu is VMware's portfolio of products for running, managing, and securing Kubernetes workloads — both on-premises (on vSphere) and across clouds.

Tanzu product breakdown:
────────────────────────────────────────────────────────────────────────
Product                       Description
────────────────────────────────────────────────────────────────────────
TKGs (Tanzu K8s Grid          K8s clusters provisioned in vSphere Namespaces
 with Supervisor)              vSphere 7/8 required; NSX-T or NSX ALB for LB

TKGi (Tanzu K8s Grid          Enterprise K8s with BOSH; air-gapped, FIPS
 Integrated)                   Required for PKS migrations

Tanzu Mission Control (TMC)   Centralised multi-cluster K8s management (SaaS)
                               Policy, compliance, cost visibility

Tanzu Application Platform    Developer-focused app platform (TAP) on K8s
(TAP)                          Supply chain security, app accelerators

Tanzu Service Mesh             Service mesh for micro-services (Global Namespace)
────────────────────────────────────────────────────────────────────────

vSphere with Tanzu (Workload Control Plane — WCP):
→ vSphere 7+ Supervisor feature: run K8s natively in vCenter
→ vSphere Namespace: RBAC-controlled project space for K8s clusters
→ Storage: vSAN/vVols as PersistentVolumes via CNS (Container Native Storage)
→ Networking: NSX-T recommended; VDS 7+ also supported

Key concepts:
Supervisor Cluster:   vSphere control plane K8s cluster (runs on ESXi hosts)
Tanzu K8s Cluster:    guest cluster in a vSphere Namespace for workloads
VM Class:             pre-defined VM sizes (best-effort, guaranteed)
Storage Policy:       maps to vSAN SPBM policy for PVC provisioning
Load Balancer:        NSX ALB or HA Proxy for external service exposure

kubectl on vSphere:
# Login to Supervisor:
kubectl vsphere login --vsphere-username admin@vsphere.local \
                      --server wcp.domain.local

# List vSphere Namespaces:
kubectl get namespaces

# Apply a Tanzu K8s Cluster manifest:
kubectl apply -f tanzu-cluster.yaml

# Access workload cluster:
kubectl vsphere login --server wcp.domain.local \
  --vsphere-username admin@vsphere.local \
  --tanzu-kubernetes-cluster-name prod-cluster \
  --tanzu-kubernetes-cluster-namespace dev-namespace

7. VMware Cloud Foundation & Multi-Cloud

What is VMware Cloud Foundation (VCF) and what does it include?

VCF = VMware's integrated private cloud platform
→ Bundles: vSphere, vSAN, NSX-T, Aria Suite under one subscription
→ Deployed on validated hardware (VxRail, off-the-shelf servers)
→ Managed by: SDDC Manager (lifecycle automation, brownfield import)

VCF components:
┌─────────────────────────────────────────────────────────────────────┐
│ SDDC Manager — orchestrates bring-up, patching, scaling             │
├─────────────────────────────────────────────────────────────────────┤
│ vCenter Server (management) │ vSAN (storage) │ NSX-T (networking)   │
├─────────────────────────────────────────────────────────────────────┤
│ Aria Operations │ Aria Automation │ Aria Log Insight │ Aria Networks  │
└─────────────────────────────────────────────────────────────────────┘

VCF domains:
Management Domain:   vCenter, NSX Manager, SDDC Manager, Aria Suite
Workload Domains:    isolated vSphere+vSAN+NSX-T cluster for production VMs
VI Workload Domain:  standard VM workloads
VVD (Validated):     pre-tested reference architecture for specific use cases

Bring-up workflow:
1. Physical server preparation (BIOS settings, NIC connectivity verified)
2. Cloud Builder VM: orchestrates automated VCF bring-up (JSON spec input)
3. SDDC Manager deployed → manages ESXi patching, domain expansion
4. Workload domain: add hosts → SDDC Manager creates vSphere+vSAN+NSX cluster

VMware HCX (Hybrid Cloud Extension):
→ Application mobility across any vSphere environment or VMware Cloud
→ Migration types:
   Cold migration:              offline VM copy — no vMotion required
   vMotion:                     live migration (zero downtime) up to 150ms RTT
   Bulk migration:              mass move with maintenance window
   Replication-assisted vMotion (RAV): pre-seeds data, then live cutover
→ Components: HCX Manager, Service Mesh, Interconnect, WAN Opt, Network Ext
→ Network Extension: stretch L2 VLAN/NSX segment between sites — no IP renumbering

VMware Cloud on AWS (VMC):
→ VMware-managed vSphere SDDC running on dedicated AWS bare-metal
→ Same vSphere/NSX-T/vSAN stack — no re-training required
→ Native access to AWS services (RDS, S3, ELB in same AZ)
→ HCX included: migrate VMs from on-prem to VMC non-disruptively
→ Use case: DR, datacenter extension, cloud bursting

8. Scenario-Based Questions

Scenario 1: A production ESXi host fails. Walk through the HA and recovery process.

Immediate HA response (automatic):

  1. Failure detection: surviving ESXi hosts stop receiving heartbeats from the failed host (default 10 seconds). vCenter marks the host as "Not Responding."
  2. HA Master election: the HA Master host (elected from the cluster via datastore heartbeats) takes ownership of recovery.
  3. VM restart: the Master instructs surviving hosts to restart VMs from the failed host. Restart order follows VM restart priority (High → Medium → Low → Disabled).
  4. Admission Control validation: HA checks reserved capacity (N+1 design). If sufficient, all VMs restart within 1–3 minutes.

Post-failure remediation:

  1. Identify root cause: check iDRAC/iLO hardware alerts, DCUI on host, vCenter Alarms, Aria Log Insight for ESXi syslog around failure time.
  2. Host replacement/repair: replace failed components (PSU, NIC, disk). If connectivity issue, check ToR switch and vmk0 management network.
  3. Reintroduce host: reconnect host to vCenter. vSphere Lifecycle Manager (vLCM) remediates host to cluster image baseline.
  4. vSAN resync: if vSAN cluster, failed host's components are rebuilt on surviving hosts within FTT policy. Monitor resync completion in vSAN Health.
  5. Post-mortem: document timeline, RCA, and verify HA design covers future single-host failures.
Scenario 2: Design a VMware platform for a 500-VM production environment with DR.

Primary datacenter design:

  1. Compute: 6 ESXi hosts per cluster (N+2 for HA; DRS Fully Automatic). Two clusters: Management (vCenter, NSX Manager, Aria) + Workload (production VMs). All hosts: 2× 25GbE (vSAN + NSX-T TEP), 2× 10GbE (management + vMotion).
  2. Storage: vSAN All-Flash on workload hosts. Policy: FTT=1 RAID-5 for standard workloads, FTT=2 RAID-6 for Tier-1 databases. vSAN stretched cluster for RPO=0 if budget permits.
  3. Networking: NSX-T Data Center. Tier-0 BGP peering to ToR switches (ECMP for bandwidth). Tier-1 per application tier. DFW micro-segmentation: default deny east-west. NSX ALB for application load balancing.
  4. Management: VCSA in vCHA (3-node HA) on management cluster. NSX Manager 3-node cluster. Aria Operations for performance monitoring and capacity planning.

DR site design:

  1. Site Recovery Manager (SRM): policy-based replication and automated failover orchestration. Recovery Plans define failover order, IP customisation, pre/post scripts.
  2. vSphere Replication: RPO 5-minute minimum per VM (no shared storage required). Configure replication groups matching application tiers.
  3. RTO/RPO targets: RTO 4 hours (SRM automated failover + smoke tests), RPO 15 minutes. Tier-1: RPO 5 minutes.
  4. DR testing: SRM supports non-disruptive test failovers (isolated network bubble). Test quarterly; review recovery plan annually.
Scenario 3: Migrate 200 VMs from on-premises vSphere to VMware Cloud on AWS.
  1. Assessment: inventory VM sizes, dependencies, storage consumption, and network segments using RVTools and vSphere Replication assessment.
  2. HCX deployment: deploy HCX Connector on-premises (OVA), activate with VMC HCX Cloud Manager. Create Service Mesh (interconnect, WAN optimisation, network extension).
  3. Network Extension: extend on-premises VLANs/NSX segments to VMC via HCX Network Extension. VMs retain IP addresses — no renumbering during migration.
  4. Migration wave planning: group VMs by application dependency (DB before App, App before Web). Schedule maintenance windows for stateful workloads.
  5. Migration execution:
    • Tier-3 (dev/test): Bulk Migration — replicate overnight, cutover in window
    • Tier-2 (standard): RAV — pre-seed data, live cutover under 1 minute
    • Tier-1 (business-critical): vMotion — zero downtime live migration
  6. DNS/LB cutover: update DNS records and load balancer backends to VMC IPs after each wave. Validate application health before proceeding.
  7. Network extension retirement: once all VMs migrated, unextend segments, update routing, decommission HCX Network Extension.
  8. Post-migration: right-size VMs using Aria Operations recommendations. Apply VMC-appropriate storage policies. Remove HCX Service Mesh.
Scenario 4: Troubleshoot a VM with network connectivity issues in an NSX-T environment.
Step 1 — VM layer:
→ Ping default gateway from within VM
→ Check NIC driver and VMware Tools version
→ Verify IP/subnet/gateway configuration

Step 2 — NSX-T logical port:
→ NSX Manager → Networking → Segments → find VM's segment → check logical port
→ Verify port state: UP, Admin State: UP, Attachment: correct VM
→ API: GET /api/v1/logical-ports?attachment_id={vm-id}

Step 3 — Distributed Firewall:
→ NSX Manager → Security → Distributed Firewall → check flow analysis
→ Enable DFW logging on suspect rules temporarily
→ On ESXi host:
   vsipioctl getrules -f nic:xxxx   (DFW rules applied to vNIC)
   vsipioctl getflows -f nic:xxxx   (active connections)

Step 4 — Segment and T1 Gateway:
→ Check T1 routing table: NSX Manager → Tier-1 → Routes
→ Verify segment is attached to correct T1
→ Test: SSH to T1 edge SR node → ping VM gateway IP

Step 5 — T0 / Edge Node:
→ Check T0 BGP status: NSX Manager → Tier-0 → BGP Neighbours
→ Verify routes propagated to physical switches
→ Test N-S: ping from VM to external IP → check SNAT rules

Step 6 — Transport Node (ESXi):
→ NSX Manager → System → Fabric → Transport Nodes → Health (check VTEP)
→ On host: esxcli network vswitch dvs vmware vxlan stats get
→ Packet capture on TEP vmkernel: pktcap-uw --dir 2 --vmk vmk2

9. Cheat Sheet — Quick Reference

VMware Component Selection Guide

Requirement                                → Component
─────────────────────────────────────────────────────────────────────
Server virtualisation                      → vSphere (ESXi + vCenter)
Shared storage (all-flash, hyperconverged) → vSAN All-Flash / ESA
Network virtualisation + micro-segment.    → NSX-T Data Center
Load balancing (L4/L7, WAF, GSLB)         → NSX Advanced LB (Avi)
Kubernetes on vSphere                      → vSphere with Tanzu (WCP)
Multi-cluster K8s management               → Tanzu Mission Control (TMC)
VM performance monitoring / forecasting    → Aria Operations
Log aggregation (vSphere/NSX/guest)        → Aria Log Insight
IaC / self-service cloud automation        → Aria Automation
Network visibility and flow analysis       → Aria Networks (vRNI)
Live VM migration between sites            → vMotion / HCX vMotion
Mass migration on-prem → VMC/cloud         → HCX Bulk / RAV
DR orchestration + automated failover      → Site Recovery Manager (SRM)
Replication (no shared storage required)   → vSphere Replication
Integrated SDDC lifecycle management       → VMware Cloud Foundation (VCF)
VDI / virtual desktops                     → VMware Horizon

vSphere HA / DRS / DPM / FT Summary

FeatureTriggerAction
HAHost heartbeat loss (10s)Restart VMs on surviving hosts
HA App MonitorVMware Tools heartbeat lossRestart VM OS
DRSCPU/Mem imbalance (every 5 min)vMotion VMs to balance hosts
DRS RulesAffinity policy definedKeep / separate VMs on hosts
DPMLow cluster utilisationPower off hosts; wake on demand
FTVM failureInstant failover to shadow VM (RPO=0, RTO=0)

ESXi Networking Quick Reference

Standard vSwitch (VSS):
→ Per-host config — no central management
→ No distributed features (no vMotion port group portability)
→ Use for: management vmk0 on standalone/small hosts only

Distributed vSwitch (VDS):
→ Managed centrally in vCenter — config pushed to all member hosts
→ Required for DRS, vMotion, and advanced HA features
→ LACP, LLDP, NetFlow, port mirroring support
→ Use for: all production clusters

NSX-T N-VDS / VDS7+:
→ Overlay networking — VMs on virtual segments (no VLAN dependency)
→ GENEVE encapsulation for TEP tunnels (replaces VXLAN in NSX-T)
→ vSphere 7.x+: N-VDS merged into VDS — single switch for both

NIC Teaming policies (VDS):
Route based on originating port: default; good for most workloads
Route based on IP hash:          requires LACP/EtherChannel on physical switch
Route based on physical NIC load: LBT — VDS only; dynamic balancing
Use explicit failover order:      active-passive, deterministic

PowerCLI Quick Reference

# Connect / Disconnect:
Connect-VIServer -Server vcenter.domain.local
Disconnect-VIServer -Confirm:$false

# VM management:
Get-VM | Where PowerState -eq "PoweredOff"
Get-VM "VMName" | Start-VM
Get-VM "VMName" | Restart-VMGuest
Get-VM "VMName" | Get-Snapshot | Remove-Snapshot -Confirm:$false
Move-VM -VM "VMName" -Destination (Get-VMHost "esxi02")
Move-VM -VM "VMName" -Datastore  (Get-Datastore "DS01")

# Host management:
Get-VMHost | Select Name, ConnectionState, Version, NumCpu, MemoryTotalGB
Set-VMHost -VMHost "esxi01" -State Maintenance
Get-VMHost -State Maintenance | Set-VMHost -State Connected

# Inventory and reporting:
Get-VM | Select Name, NumCpu, MemoryGB, ProvisionedSpaceGB |
         Export-Csv "vm-report.csv" -NoTypeInformation
Get-Datastore | Select Name, CapacityGB, FreeSpaceGB |
                Sort FreeSpaceGB | Format-Table

# vSAN health:
Get-VsanDisk | Where OperationalState -ne "ok"
Get-VsanClusterConfiguration -Cluster "vSAN-Cluster"

# Alarms and events:
Get-AlarmDefinition | Where Enabled -eq $true
Get-VIEvent -MaxSamples 100 -Types Error | Select CreatedTime, FullFormattedMessage

vSAN Storage Policy Reference

FTTRAID TypeMin HostsOverheadUse Case
1RAID-13100%Small clusters, Tier-1 (mirroring)
1RAID-5433%Standard production (efficient)
2RAID-15200%Highest resilience, Tier-0
2RAID-6650%Balanced resilience + efficiency
0None10%Test/Dev only — no protection
All-Flash recommendations: Tier-1 (DB/critical): FTT=2 RAID-6 (6+ hosts)  ·  Tier-2 (standard VMs): FTT=1 RAID-5 (4+ hosts)  ·  Tier-3 (dev/test): FTT=1 RAID-1 (3+ hosts). Never use FTT=0 in production.

Top 10 VMware Tips

  1. vSphere is not just ESXi. vSphere is the platform (ESXi + vCenter + features). Always distinguish ESXi (hypervisor), VCSA (management), and vSAN/NSX-T (infrastructure) when designing or troubleshooting solutions.
  2. Snapshots are not backups. Snapshots consume disk space proportional to VM change rate and degrade performance over time. Use a dedicated backup solution (Veeam, VADP-based) for data protection. Consolidate snapshots before every maintenance window.
  3. EVC protects vMotion across CPU generations. Set EVC at cluster creation, before adding hosts. You can raise EVC mode but cannot lower it while VMs are powered on. Baseline to the oldest CPU family in the cluster.
  4. vSAN network requires jumbo frames (MTU 9000) end-to-end. Configure MTU 9000 on VMkernel port, VDS uplink, ToR switch ports, and all inter-switch links. An MTU mismatch silently drops vSAN traffic above 1500 bytes — often misdiagnosed as disk failure.
  5. NSX-T DFW follows the VM — not the IP. Firewall rules applied to Security Groups (dynamic by VM tag/name) stay with VMs through vMotion, DR failover, and migrations. Design security policy around workload identity, not network location.
  6. vCenter HA is not the same as vSphere HA. vSphere HA restarts VMs on host failure. vCenter HA protects the VCSA management appliance itself. Both coexist independently. Running VMs are completely unaffected by a vCenter outage.
  7. HCX is the safest migration path for large-scale moves. HCX Network Extension eliminates IP renumbering. Use RAV migration for Tier-1 workloads — replication pre-seeds data and final cutover is under 1 minute with minimal downtime.
  8. SPBM decouples storage policy from infrastructure. Define VM storage requirements (FTT, RAID, IOPS) in a policy; vSAN enforces it automatically. When hosts are added, vSAN rebalances to meet all policies without manual intervention.
  9. VCF SDDC Manager owns lifecycle management. In VCF deployments, never patch ESXi, vCenter, or NSX manually. SDDC Manager orchestrates the patching sequence (NSX → vSAN → vCenter → ESXi) to maintain component compatibility. Manual patching breaks SDDC Manager state.
  10. Monitor capacity proactively with Aria Operations. vSAN "time remaining" and Aria capacity forecasting give weeks of notice before exhaustion. Set alarms at 70% capacity threshold (not 90%) to allow orderly expansion without emergency procurement.

Sunday, May 17, 2026

Microsoft Azure Solutions Architect Expert (AZ-305) Complete Guide

Microsoft Azure Solutions Architect Expert (AZ-305) — Complete Guide

Well-Architected Framework · Identity & Governance · Data Storage · Infrastructure · Business Continuity · Landing Zones · Scenarios · Cheat Sheet

Top Hashtags: #AZ305, #AzureSolutionsArchitect, #MicrosoftAzure, #CloudArchitecture, #AzureArchitect, #WellArchitectedFramework, #CloudAdoptionFramework, #AzureDesign, #MicrosoftCertified, #CloudComputing


Table of Contents

  1. Exam Overview & Architect Mindset
  2. Design Identity, Governance & Monitoring (25–30%)
  3. Design Data Storage Solutions (25–30%)
  4. Design Infrastructure Solutions (30–35%)
  5. Design Business Continuity Solutions (10–15%)
  6. Azure Well-Architected Framework & Landing Zones
  7. Scenario-Based Questions
  8. Cheat Sheet — Quick Reference

1. Exam Overview & Architect Mindset

AZ-305 Exam at a Glance

The AZ-305 exam validates expertise in designing cloud and hybrid solutions — translating business requirements into Azure architectures aligned with the Well-Architected Framework and Cloud Adoption Framework. This is a design exam, not a configuration exam.

Skill Domain Exam Weight
Design infrastructure solutions 30–35% ← LARGEST
Design identity, governance, and monitoring solutions 25–30%
Design data storage solutions 25–30%
Design business continuity solutions 10–15%

Prerequisite: AZ-104 (Azure Administrator Associate) must be held before earning AZ-305 Expert certification.

Critical mindset shift: AZ-305 questions present a business scenario with stated requirements and ask you to recommend, evaluate, or compare Azure services. The key skill is understanding WHY one service fits a scenario better than another — not just knowing what each service does.


What is the Azure Well-Architected Framework?

The Azure Well-Architected Framework (WAF) is Microsoft's set of guiding principles for building high-quality cloud solutions. Every AZ-305 design decision should be justified using WAF pillars.

Five WAF Pillars:

1. Reliability:
→ Design for failure — assume components will fail
→ Use redundancy: Availability Zones, geo-replication, multi-region
→ Define and meet RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
→ Health monitoring, automatic failover, circuit breakers
→ Example: deploy SQL Database with zone-redundant Business Critical tier
             + auto-failover group to paired region

2. Security:
→ Defence in depth: multiple layers of security
→ Principle of least privilege: minimum permissions needed
→ Zero Trust: verify explicitly, assume breach, limit blast radius
→ Encrypt data at rest and in transit; manage secrets in Key Vault
→ Example: Private Endpoints for PaaS, WAF on Application Gateway,
             Managed Identity instead of service accounts

3. Cost Optimisation:
→ Right-size resources — don't over-provision
→ Match consumption to business value (pay for what you use)
→ Reserved Instances for predictable workloads
→ Use PaaS over IaaS where possible (lower management overhead)
→ Azure Hybrid Benefit for existing licences

4. Operational Excellence:
→ Infrastructure as Code (Bicep/Terraform) — repeatable, version-controlled
→ CI/CD pipelines for deployment automation
→ Monitoring: Azure Monitor, Log Analytics, Application Insights
→ Runbooks and automated remediation
→ Change management: blue-green deployments, feature flags

5. Performance Efficiency:
→ Scale horizontally (add instances) not just vertically (bigger VM)
→ Cache frequently accessed data (Azure Cache for Redis)
→ CDN for static content delivery (Azure Front Door / CDN)
→ Choose the right service tier for the workload profile
→ Autoscaling: VMSS, App Service autoscale, AKS cluster autoscaler

What is the Cloud Adoption Framework (CAF)?

Cloud Adoption Framework (CAF):
→ Microsoft's guidance for adopting Azure at enterprise scale
→ Covers: strategy, planning, landing zones, governance, operations

CAF phases:
Strategy:    define business justification, expected outcomes
Plan:        assess current state, create migration backlog
Ready:       set up landing zones, management groups hierarchy
Adopt:       migrate or modernise workloads (lift-and-shift, re-architecture)
Govern:      establish guardrails — Azure Policy, cost management, security baseline
Manage:      operations baseline — monitoring, backup, disaster recovery
Innovate:    build cloud-native solutions, AI/ML, data analytics

Azure Landing Zone:
→ Preconfigured, governance-ready Azure environment for enterprise adoption
→ Includes: management group hierarchy, Azure Policy assignments, networking,
            identity, security baseline, monitoring
→ Conceptual architecture: Platform vs Application landing zones
→ Platform: connectivity hub, identity, management subscriptions (managed by central IT)
→ Application: individual workload subscriptions (managed by application teams)
→ Accelerator: Bicep/Terraform modules to deploy landing zone in hours

Management Group hierarchy (CAF-recommended):
Tenant Root Group
  ├── Platform (managed by central IT)
  │     ├── Management (Log Analytics, Automation)
  │     ├── Identity (AD DS, Entra Connect)
  │     └── Connectivity (hub VNet, VPN/ExpressRoute, Firewall)
  └── Landing Zones (workload teams)
        ├── Corp (connected to hub, private)
        └── Online (internet-facing, DMZ)

2. Design Identity, Governance & Monitoring (25–30%)

How do you design an identity architecture for a hybrid enterprise?

Hybrid identity decision tree:

Question 1: Do you have on-premises Active Directory?
No:  → Cloud-only Entra ID. Use Entra ID Joined devices. Done.
Yes: → Need Entra Connect for identity synchronisation

Question 2: Authentication method for synchronised identities?
Password Hash Sync (PHS): → RECOMMENDED for most orgs
  Pros: works if on-prem unavailable, leaked credential detection,
        seamless SSO with seamless SSO extension
  Cons: password hash in cloud (some orgs have policy against this)

Pass-Through Auth (PTA): → when policy prohibits cloud password hashes
  Pros: passwords never leave on-prem, real-time account state
  Cons: requires on-prem PTA agent availability, no leaked cred detection

Federation (AD FS): → only when advanced requirements (smart cards,
                       complex claims transformation)
  Pros: full control over authentication claims
  Cons: complex, expensive infrastructure, Microsoft recommends migrating away

Question 3: Device management?
Cloud-native devices:  → Entra ID Joined + Intune (MDM)
Existing domain:       → Hybrid Entra ID Joined (domain join + Entra registration)
BYOD:                  → Entra ID Registered + MAM (App Protection Policies)

Identity architecture for a 10,000-person hybrid enterprise:
→ Entra Connect with PHS + Seamless SSO
→ All admin roles in PIM (eligible assignments, not permanent)
→ Break-glass accounts: 2 cloud-only Global Admins, excluded from all CA
→ Conditional Access: MFA all users, block legacy auth, require compliant device
→ Identity Protection: sign-in risk ≥ Medium → MFA, user risk High → password reset
→ Named locations: corporate IP ranges for trusted sign-in signals
→ B2B: Entitlement Management for partner access, time-limited, self-service
→ Entra ID P2 required for PIM, Identity Protection risk policies, Access Reviews

How do you design a governance architecture at scale?

Governance architecture layers:

Layer 1 — Management Group hierarchy:
Root → Decommissioned MG | Platform MG | Landing Zones MG | Sandbox MG
→ Azure Policy applied at MG level inherits to all child subscriptions
→ RBAC applied at MG level inherits to all child subscriptions

Layer 2 — Azure Policy:
Initiative: "CIS Azure Benchmark" or custom baseline
Key policies:
  Deny:  "No public IP on VMs" (enforces private-only VMs)
  Deny:  "Allowed locations: UK South, UK West" (data residency)
  Deny:  "Allowed VM SKUs" (prevent large SKUs in dev)
  Audit: "Storage accounts should use private endpoints"
  DINE:  "Deploy Azure Monitor Agent if missing" (auto-remediate)

Layer 3 — Azure Blueprints (or Bicep/Terraform for landing zones):
→ Combine: role assignments + policy assignments + ARM templates
→ Assign to subscriptions for consistent baseline deployment
→ Note: Bicep + Azure DevOps is now preferred over Blueprints for new designs

Layer 4 — RBAC design:
Management Group → Owner (cloud platform team only — use PIM)
Subscription → Contributor (workload team — via PIM)
Resource Group → Contributor/specific roles (workload developers)
Resources → Reader/specific data roles (application service principals)

Layer 5 — Tagging strategy:
Mandatory tags enforced by Azure Policy:
  Environment: Production | Development | Test
  Owner:        team or product owner email
  CostCentre:   cost allocation code
  Application:  application name/ID
  DataClass:    Public | Internal | Confidential | Restricted
Cost Management: filter by tag to see per-team, per-app spending

Layer 6 — Subscription design:
Option A: One subscription per environment (simplest)
  Prod-Sub | Dev-Sub | Test-Sub
Option B: One subscription per workload (better isolation)
  Sales-Prod | HR-Prod | Finance-Prod (separate blast radius, billing)
Option C: Hybrid (recommended for large orgs)
  Platform subs (connectivity, identity, management) + Workload subs

How do you design a monitoring solution for Azure?

Monitoring architecture:
→ Centralised Log Analytics workspace (one per environment or per org)
→ All resources send diagnostic logs and metrics to central workspace
→ Azure Monitor alerts trigger action groups (email, Teams, PagerDuty)

Data sources and collection:
Activity Log:        subscription-level control plane operations (all subs → workspace)
Resource Diagnostics: service-level logs (SQL queries, App Gateway access, Key Vault ops)
VM Guest OS:          Azure Monitor Agent + Data Collection Rules → workspace
Application:          Application Insights SDK → telemetry (requests, exceptions, dependencies)
Container:            Container Insights → AKS node + pod metrics to workspace
Security:             Microsoft Defender for Cloud → security findings to workspace
Network:              Network Watcher → flow logs, connection monitor

Application Insights:
→ APM (Application Performance Monitoring) for web applications
→ Instrumentation: add SDK or use auto-instrumentation (App Service, AKS)
→ Telemetry: requests, exceptions, dependencies (SQL, HTTP calls), custom events
→ Live Metrics: real-time monitoring during deployment
→ Availability tests: ping test every 5 minutes from multiple Azure regions
→ Smart detection: ML-based anomaly detection (failure rate spikes, performance degradation)

Azure Monitor Alerts:
Metric alerts:   trigger on: CPU > 80%, DTU > 90%, response time > 2s
Log alerts (KQL): trigger on: error rate > 5%, specific exception count
Activity log:    trigger on: resource deleted, policy denied, security event

Action Groups (what happens when alert fires):
Email:           notify team
SMS:             notify on-call
Teams webhook:   post to Teams channel
Azure Function:  auto-remediate (restart service, scale out)
Logic App:       create ITSM ticket, update status page
Automation Runbook: execute remediation script

Dashboard and visualisation:
Azure Monitor Dashboards: operational view for NOC
Azure Workbooks:          custom parameterised reporting and analysis
Power BI:                 management-level KPI reporting from Log Analytics
Grafana:                  popular open-source dashboards (integrates with Azure Monitor)

3. Design Data Storage Solutions (25–30%)

How do you choose the right Azure storage service?

Storage service selection matrix:

Requirement                                  → Best Service
Unstructured blobs (documents, images, video) → Azure Blob Storage
Virtual machine disks                         → Azure Managed Disks
File shares (SMB/NFS, lift-and-shift)         → Azure Files
Hybrid file server (on-prem + cloud cache)    → Azure File Sync
NoSQL document database (JSON)                → Azure Cosmos DB (SQL API)
NoSQL key-value at massive scale              → Azure Cosmos DB (Table API)
Globally distributed, multi-write            → Azure Cosmos DB (any API)
Relational data, OLTP workloads              → Azure SQL Database
SQL Server features, lift-and-shift           → Azure SQL Managed Instance
Data warehouse, analytical queries (OLAP)    → Azure Synapse Analytics / Fabric
Real-time analytics over streaming data      → Azure Synapse Analytics + Event Hubs
Message queue (simple, cheap, massive)        → Azure Storage Queues
Enterprise messaging (ordering, DLQ, pub/sub) → Azure Service Bus
Event streaming (millions/sec, telemetry)     → Azure Event Hubs
Time-series / IoT analytics                  → Azure Data Explorer / KQL
Graph data (relationships between entities)  → Azure Cosmos DB (Gremlin API)
Cache layer (session, query results)          → Azure Cache for Redis

Blob storage access tiers (cost vs latency):
Hot:     frequent access, highest storage cost, lowest access cost
Cool:    30-day minimum, cheaper storage, higher access cost
Cold:    90-day minimum, very cheap storage, higher access cost
Archive: 180-day minimum, cheapest storage, OFFLINE (rehydrate before read)
Lifecycle policy: automatically tier blobs based on last modified age

Blob storage redundancy (durability vs cost):
LRS:   3 copies, single datacenter, cheapest
ZRS:   3 copies across AZs, zone-resilient (recommended for most production)
GRS:   LRS + async copy to paired region, secondary readable only after failover
GZRS:  ZRS + async to paired region, highest durability (16 nines)
RA-GRS/RA-GZRS: read from secondary at all times (DR read scale-out)

How do you design a relational database solution on Azure?

Azure SQL options:

Azure SQL Database (fully managed PaaS):
→ Serverless tier: auto-pause during idle (cost saving for dev/test)
→ General Purpose: balanced compute/storage (most production OLTP)
→ Business Critical: in-memory OLTP, readable secondary, local SSD
→ Hyperscale: up to 100 TB, fast backup/restore (large databases)
→ Elastic Pool: share resources across multiple databases (multi-tenant SaaS)
→ Auto-failover groups: automatic geo-replication + failover (RPO < 5 sec)
→ Zone-redundant: Business Critical and General Purpose (zonal HA)

Azure SQL Managed Instance (PaaS with near-full SQL Server compatibility):
→ Lift-and-shift of SQL Server workloads needing: SQL Agent, CLR, linked servers,
  Service Broker, CDC, cross-database queries (not available in SQL Database)
→ Deployed INSIDE a VNet (private, no public endpoint by default)
→ Business Critical tier: in-memory OLTP + readable secondary
→ Migration: Database Migration Service from on-prem SQL Server
→ Choice: if application needs SQL Agent or cross-database queries → Managed Instance

SQL Server on Azure VM (IaaS):
→ Full SQL Server control (any version, any edition, OS-level access)
→ Use when: need features only available on bare SQL Server (SSRS, SSAS, SSIS),
            or specific versions/configurations not supported in PaaS
→ Higher management overhead vs PaaS options

Decision framework:
Can PaaS SQL Database handle the requirements?
  Yes: use SQL Database (lowest management overhead)
  Needs SQL Agent, cross-DB queries, Service Broker?
    Yes: use SQL Managed Instance
    Needs full OS control or specific SQL Server version?
      Yes: SQL Server on Azure VM

High availability for SQL:
Single region:    Business Critical tier → built-in Always On AG (3 replicas, ZR)
Multi-region:     Auto-failover group → geo-replication + automatic DNS failover
RPO:              < 5 seconds (geo-replication log shipping)
RTO:              < 30 seconds (automatic failover with DNS redirect)

How do you design a Cosmos DB solution?

Azure Cosmos DB:
→ Globally distributed, multi-model NoSQL database
→ SLA: 99.999% availability, single-digit millisecond latency (P99)
→ APIs: SQL (Core), MongoDB, Cassandra, Gremlin (graph), Table, PostgreSQL

Key design decisions:

1. Partition key selection (MOST CRITICAL decision):
→ Must provide even distribution of data (high cardinality)
→ Must be included in most queries (enables partition-targeted reads)
→ Cannot be changed after container creation
→ Good: UserId, CustomerId, TenantId, ProductId
→ Bad: Boolean, Status (low cardinality — hot partitions)
→ Synthetic key: if single key has low cardinality, combine fields
   e.g., {country}-{city} instead of country alone

2. Consistency levels (5 levels, stronger = more latency/cost):
Strong:           linearisable — always reads latest write (highest cost)
Bounded Staleness: reads lag writes by configurable window (K ops or T seconds)
Session:           strong consistency within a session (DEFAULT — best for most apps)
Consistent Prefix: reads never see out-of-order writes
Eventual:          weakest — reads may lag, highest performance

3. Request Units (RUs):
→ Currency of Cosmos DB — normalised unit of throughput
→ Every operation (read, write, query) consumes RUs
→ Provision throughput (guaranteed RU/s) or serverless (pay per op)
→ 1 RU = reading a 1 KB document by its partition key + id

4. Global distribution:
→ Add regions with one click — data replicated automatically
→ Single-write region: one primary write, any region reads (default)
→ Multi-region write: any region can write (use for write-heavy global apps)
→ Conflict resolution: needed for multi-write (last write wins or custom)

5. Change Feed:
→ Ordered stream of inserts and updates in a Cosmos DB container
→ Use for: event-driven microservices, materialised views, ETL to analytics
→ Consumers: Azure Functions trigger, Spark, SDK change feed processor

4. Design Infrastructure Solutions (30–35%)

How do you design a compute solution — choosing between VM, App Service, AKS, and Functions?

Compute service selection framework:

Azure Virtual Machines (IaaS):
→ Use when: need full OS control, specific OS/software, GPU, legacy apps
→ Lift-and-shift: move existing apps without code changes
→ HA pattern: Availability Zones + Standard Load Balancer (99.99% SLA)
→ Scale: VM Scale Sets (VMSS) for auto-scaling clusters
→ Avoid for: stateless web apps (use App Service/containers instead)

Azure App Service (PaaS):
→ Use when: web apps, REST APIs, mobile backends in .NET, Java, Node, Python, PHP
→ No VM management: Azure handles OS, patching, load balancing, scaling
→ Deployment slots: zero-downtime deployments, blue-green, A/B testing
→ VNet Integration: outbound private connectivity to databases/services
→ Private Endpoint: inbound private access (no public internet)
→ Scale: horizontal (add instances) up to 30, vertical (bigger plan)
→ Choose tier: Free/Basic (dev/test) → Standard (production, autoscale)
             → Premium (VNet integration) → Isolated (ASE, dedicated)

Azure Kubernetes Service (AKS):
→ Use when: microservices, containerised workloads, complex orchestration
→ Benefits: auto-scaling, rolling deployments, self-healing, rich ecosystem
→ Design: multiple node pools (system + user), node autoscaler, HPA
→ Networking: Azure CNI (direct pod IPs from VNet — needed for private)
→ Storage: Azure Disk (ReadWriteOnce) or Azure Files (ReadWriteMany)
→ Security: Managed Identity for pod-level Azure access (Workload Identity)
→ Ingress: Azure Application Gateway Ingress Controller (AGIC) for WAF

Azure Functions (Serverless):
→ Use when: event-driven compute, short-lived operations, unknown/spiky traffic
→ Scale to zero: pay only during execution (Consumption plan)
→ Premium plan: pre-warmed instances, VNet integration, longer timeout
→ Triggers: HTTP, Service Bus, Event Hub, Blob Storage, Timer, Cosmos DB
→ Durable Functions: stateful orchestration (fan-out/fan-in, long workflows)
→ Avoid for: long-running processes > 10 min (use App Service or AKS)

Decision tree:
Need OS-level control?              → VM
Containerised microservices?        → AKS
Web/API app, managed PaaS?          → App Service
Event-driven, short duration?       → Azure Functions
Burst traffic, stateless?           → Functions or App Service with autoscale
Complex stateful workflow?          → Durable Functions or Logic Apps

How do you design a network architecture for enterprise Azure?

Hub-Spoke network topology (recommended for most enterprises):

Hub VNet (central):
→ Shared services: Azure Firewall, VPN/ExpressRoute Gateway, Bastion, DNS
→ Managed by central networking team
→ All spoke traffic routes through hub (forced tunnelling)
→ Hub in each Azure region for global presence

Spoke VNets (per workload/environment):
→ Each application or team gets own spoke VNet
→ Peered to Hub: VNet Peering (low latency, same region)
               Global VNet Peering (cross-region)
→ No direct spoke-to-spoke peering (routes through hub firewall for inspection)
→ Subnet design: AppSubnet, DataSubnet, PrivateEndpointSubnet, etc.

Virtual WAN (alternative for large/global organisations):
→ Microsoft-managed hub-and-spoke at scale
→ Supports: any-to-any connectivity (spoke-spoke via VWAN hub)
→ Integrates: VPN, ExpressRoute, SD-WAN partners, Azure Firewall Manager
→ Use for: organisations with 10+ spokes or multiple regions

On-premises connectivity:
VPN Gateway:     encrypted IPsec over internet (up to 10 Gbps, simpler, cheaper)
ExpressRoute:    private dedicated circuit via connectivity provider
                 (consistent latency, up to 100 Gbps, higher cost, compliance)
ExpressRoute + VPN: redundant connectivity (ExpressRoute primary, VPN backup)

DNS architecture:
Azure Private DNS zones: private name resolution within Azure VNet
  privatelink.blob.core.windows.net → maps to storage private endpoint IPs
  privatelink.database.windows.net  → maps to SQL private endpoint IPs
Azure DNS Resolver: forward on-prem DNS queries to Azure private zones
                    forward Azure queries to on-prem DNS servers
  → Enables split-horizon DNS for hybrid environments

DDoS Protection:
DDoS Network Protection: per-VNet (recommended for public-facing)
DDoS IP Protection:      per-public IP (simpler, less features)
WAF:                     Application Gateway WAF or Front Door WAF
                         (protects against OWASP Top 10, layer 7 attacks)

How do you design a secure solution using Azure Key Vault?

Azure Key Vault:
→ Centralised secrets, keys, and certificates management
→ Every application should use Key Vault — NEVER store secrets in code/config

Key Vault uses:
Secrets:      connection strings, API keys, passwords
Keys:         RSA/EC keys for envelope encryption, signing
Certificates: X.509 certificates with auto-renewal (Let's Encrypt, DigiCert)

Access patterns:
Managed Identity (RECOMMENDED):
  → Application Managed Identity → assigned Key Vault Secrets User role
  → No credentials to manage, no rotation risk
  → Code: var secret = await client.GetSecretAsync("ConnectionString");

RBAC roles for Key Vault:
  Key Vault Secrets User:    read secret values (applications)
  Key Vault Secrets Officer: manage secrets, not read values (operations)
  Key Vault Administrator:   full control (admin — use PIM)
  Key Vault Reader:          read metadata, not values (audit)

Key Vault networking:
  Private Endpoint:     Key Vault accessible only via private IP in VNet
  Service Endpoint:     traffic stays on Azure backbone (no private IP)
  Firewall:             allowlist specific IPs/VNets; deny all others
  Production: always Private Endpoint + disable public access

Soft delete + Purge Protection:
  Soft delete (90 days): deleted secrets recoverable for 90 days
  Purge Protection:      prevents permanent deletion for retention period
  Both required for: compliance, ransomware protection, accidental deletion
  NEVER disable purge protection in production

Key rotation:
  Manual:   generate new version, update application to use latest version
  Automatic (certificates): Key Vault rotates before expiry via CA integration
  Event-driven: Key Vault fires Event Grid event on near-expiry
                → Logic App / Azure Function generates and stores new key
                → Notifies application to reload secret

5. Design Business Continuity Solutions (10–15%)

How do you design for high availability and disaster recovery?

Key definitions:
RTO (Recovery Time Objective):
  Maximum acceptable time for service to be restored after failure
  "We can be down for no more than 4 hours" → RTO = 4 hours

RPO (Recovery Point Objective):
  Maximum acceptable data loss measured in time
  "We cannot lose more than 15 minutes of transactions" → RPO = 15 minutes

SLA tiers and what they require:
99.9%   = 8.7 hours downtime/year  → single instance + Standard SSD (min)
99.95%  = 4.4 hours downtime/year  → Availability Set (2+ VMs)
99.99%  = 52 minutes downtime/year → Availability Zones (2+ VMs, 2+ AZs)
99.999% = 5.2 minutes downtime/year → multi-region active-active

Azure HA architecture patterns:

Single region HA (protects against hardware/datacenter failure):
  Azure Load Balancer + VM Scale Set across 3 Availability Zones
  SQL: Business Critical (3 replicas, zone-redundant)
  Storage: ZRS (3 copies across AZs)
  App Service: Premium tier (zone redundancy)
  SLA: 99.99% for compute + 99.99% for database

Multi-region HA (protects against region-wide outage):
Active-Passive (failover):
  Primary region handles traffic
  Secondary region: pre-deployed, inactive (warm standby)
  DNS failover via Azure Traffic Manager or Azure Front Door
  SQL: auto-failover group with automatic DNS failover
  RTO: minutes (DNS TTL), RPO: < 5 seconds (geo-replication)
  Use for: moderate RTO/RPO requirements

Active-Active (both regions serve traffic simultaneously):
  Both regions active, load balanced globally
  Data: Cosmos DB multi-write, or synchronous replication
  DNS: Azure Front Door (anycast routing, fastest region wins)
  RTO: near-zero (no failover needed — traffic shifts automatically)
  RPO: near-zero (no replication lag for active-active write)
  Use for: highest availability requirements, global user base

Azure Site Recovery (ASR):
  VM replication to secondary region
  Recovery Plans: orchestrated failover in correct order
  Test failover: validate DR without impact to production
  RTO: 1-2 hours (VM startup + DNS propagation)
  RPO: 15 minutes (replication interval)

How do you design a backup strategy?

Azure Backup architecture:

Recovery Services Vault vs Backup Vault:
Recovery Services Vault: VMs, SQL in VM, Azure Files, on-prem (MARS/MABS)
Backup Vault:            newer vault type — Blobs, Disks, AKS, PostgreSQL Flexible

Backup policies (design per RPO/RTO/retention):
Frequency:  hourly, daily, weekly
Retention:  daily (7-30 days), weekly (up to 5 years), monthly/yearly (compliance)
Cross-region restore: restore to paired region for DR (opt-in, extra cost)

Azure Backup for VMs:
→ Crash-consistent (default) or application-consistent (VSS on Windows)
→ Instant restore: restore individual files in minutes from snapshot
→ Soft delete: 14 days protection against accidental/ransomware deletion
→ Immutable vault: prevents deletion of backup data (locked permanently)

SQL Server in Azure VM:
→ Workload-aware backup: full + differential + log backup
→ Log backup every 15 minutes → RPO of 15 minutes for SQL
→ Compression enabled by default

Azure SQL Database:
→ Built-in automated backups: full weekly, differential 12-24h, log every 5-12 min
→ Point-In-Time Restore (PITR): restore to any second within retention period
→ Long-term retention (LTR): store backups up to 10 years (compliance)

Geo-redundant backup storage:
→ Recovery Services Vault: GRS (default) or LRS
→ Cross-region restore: restore backup from primary in secondary region
→ Required for: regulatory compliance, DR strategy validation

Backup monitoring:
→ Azure Backup Reports (Power BI): compliance dashboard, which VMs NOT backed up
→ Azure Monitor alerts: backup job failures → email/Teams notification
→ Recovery Services Vault → Backup Jobs → filter by status = Failed

6. Azure Well-Architected Framework & Landing Zones

What is the Azure Architecture Centre and key architecture patterns?

Azure Architecture Centre (docs.microsoft.com/azure/architecture):
→ Microsoft's official reference for proven Azure architectural patterns
→ Reference architectures: tested designs for common scenarios
→ Design patterns: reusable solutions to recurring problems

Key cloud design patterns for AZ-305:

1. Retry pattern:
   Transient failures (network timeout, throttling) → automatically retry
   Exponential backoff: wait 1s, 2s, 4s, 8s between retries
   Circuit Breaker: stop retrying after threshold → fail fast

2. Circuit Breaker pattern:
   Prevent cascading failures when a downstream service is unhealthy
   States: Closed (normal) → Open (failing, reject calls) → Half-Open (test)
   Implementation: Polly (C#), resilience4j (Java)

3. CQRS (Command Query Responsibility Segregation):
   Separate read and write models
   Write model: transactional database (SQL), optimised for writes
   Read model: denormalised read store (Cosmos DB, Redis), optimised for reads
   Event sourcing: store events, not state; rebuild state by replaying events

4. Strangler Fig pattern:
   Incrementally replace a monolith with microservices
   New requests → new service; old requests → legacy system
   Gradually migrate functionality until legacy system is empty

5. Event-driven architecture:
   Services communicate via events (loose coupling)
   Event Grid: routing discrete state change events
   Service Bus: reliable ordered message delivery
   Event Hubs: high-volume event streaming

6. Bulkhead pattern:
   Isolate critical resources — if one component fails, others continue
   e.g., separate thread pools, connection pools per downstream service
   Prevents one slow service from exhausting all resources

7. Cache-Aside pattern:
   Application manages cache explicitly:
   1. Check cache (Redis) — if found, return cached value
   2. If not found: query database, store in cache, return value
   3. On update: update database, INVALIDATE cache (not update)
   Use: Azure Cache for Redis + Azure SQL / Cosmos DB

8. Federated Identity pattern:
   Delegate authentication to external identity provider (IdP)
   Application trusts tokens from Entra ID (or ADFS, Google, etc.)
   Benefits: SSO across apps, no password management per app

What are the key networking design patterns for AZ-305?

Private Endpoint (most important networking concept for AZ-305):
→ Assign a private IP from your VNet to an Azure PaaS service
→ Traffic between your VNet and the PaaS service never leaves the Microsoft backbone
→ Disable public endpoint on PaaS resource for maximum security
→ Requires: Private DNS Zone to resolve FQDN to private IP

Pattern: secure multi-tier application
  App Service (private endpoint inbound) ← only VNet traffic allowed
    ↓ (VNet Integration — outbound)
  Azure SQL Database (private endpoint) ← no public access
  Azure Key Vault (private endpoint) ← no public access
  Azure Storage (private endpoint) ← no public access

Azure Front Door:
→ Global application delivery network: CDN + WAF + load balancing + failover
→ Anycast: routes users to the closest Azure PoP (Point of Presence)
→ WAF: OWASP protection globally at the edge
→ Origin groups: pool of backends (App Service in US + App Service in EU)
→ Health probes: automatically route away from unhealthy backends
→ Use for: global web applications with multi-region active-active

Azure Traffic Manager:
→ DNS-based global load balancing (NOT a reverse proxy — users connect directly to origin)
→ Routing methods:
  Priority:      primary region + failover regions
  Weighted:      distribute traffic by percentage (canary deployments)
  Performance:   route to lowest latency region for the user
  Geographic:    route users to specific region (data residency)
  Multivalue:    return all healthy endpoints, client picks
→ Use for: global routing of any protocol (not just HTTP), DNS-based failover

Azure Front Door vs Traffic Manager:
Front Door:       HTTP/HTTPS only, reverse proxy, WAF, anycast, CDN, SSL offload
Traffic Manager:  any protocol, DNS-based (no proxy), no WAF, no CDN
Choose Front Door for modern web applications
Choose Traffic Manager for non-HTTP workloads or existing architectures

7. Scenario-Based Questions

Scenario: Design a globally available, zero-trust e-commerce platform handling 1M users.

Requirements:
→ 99.99% SLA, global users, GDPR compliance (EU data residency)
→ Zero Trust security, no public access to backend services
→ Auto-scaling for seasonal traffic spikes

Architecture:

GLOBAL LAYER:
Azure Front Door Premium:
  → WAF policy: OWASP 3.2, custom rules (bot protection, rate limiting)
  → Origins: App Service (West Europe) + App Service (East US)
  → Health probes: route away from unhealthy origins automatically
  → CDN: static assets (images, JS, CSS) cached at edge PoPs
  → Private Link to App Service backend (Front Door → App Service private)

COMPUTE LAYER (West Europe — primary):
Azure App Service (Premium v3, zone-redundant):
  → Outbound via VNet Integration → private access to all backends
  → Private Endpoint for inbound (Front Door is the only entry point)
  → Auto-scale: 2-50 instances based on CPU and HTTP queue
  → Deployment slots: blue-green deployments (swap staging → production)

DATA LAYER:
Azure SQL Database (Business Critical, zone-redundant):
  → Auto-failover group → East US (passive secondary)
  → Private Endpoint: no public internet access
  → Encryption: Always Encrypted for PII columns (GDPR)
  → Auditing: all queries logged to Log Analytics

Azure Cosmos DB (SQL API):
  → Multi-region: West Europe (write) + East US (read)
  → Product catalogue, shopping cart, user sessions
  → Session consistency (strong within session, eventual across regions)
  → PITR: 30-day point-in-time restore

Azure Cache for Redis:
  → Product catalogue cache (60-minute TTL)
  → Session state store (24-hour TTL)
  → Private Endpoint

SECURITY LAYER:
Azure Key Vault (private endpoint): connection strings, API keys, TLS certs
Managed Identity: App Service → Key Vault, SQL, Storage (no stored credentials)
Private DNS Zones: all PaaS services resolve to private IPs
DDoS Network Protection: on all public-facing VNets
Microsoft Defender for Cloud: security posture management across all services
Microsoft Purview: classify and track PII data (GDPR compliance)

MONITORING:
Application Insights: request traces, exceptions, dependency calls, availability tests
Azure Monitor: infrastructure metrics, alerts, action groups
Log Analytics workspace: centralised logs from all services
KQL alert: failed login rate > 5% in 5 minutes → Teams alert

Scenario: A company wants to migrate a 3-tier on-premises application to Azure. Design the migration approach.

Current state: 
  Web tier: 3 IIS servers (Windows Server 2019)
  App tier:  4 .NET application servers
  Data tier: SQL Server 2019 cluster (500 GB, 100 concurrent connections)

Migration phases using CAF Migrate + Modernise:

Phase 1 — Assess (2 weeks):
  Azure Migrate: discover and assess all servers
    → Right-size recommendations (map on-prem specs to Azure SKUs)
    → Dependency analysis (what does each server call?)
    → Cost estimates for IaaS (lift-and-shift) vs PaaS (modernise)

Phase 2 — Lift-and-Shift (IaaS first — 4 weeks):
  Web tier:  3 × D4s_v5 VMs → Application Gateway → 3 VMs in AZ1/2/3
  App tier:  4 × D8s_v5 VMs → Internal Load Balancer → AZ1/2/3
  Data tier: SQL Server on Azure VM (business critical, zone-redundant)
  Migration tool: Azure Migrate + Replication (agent-based for physical)
  Cutover: DNS switch, 4-hour maintenance window, rollback plan ready
  
Phase 3 — Optimise and Modernise (3 months post-migration):
  Web tier:  migrate IIS apps → Azure App Service (Premium, ZR)
             Benefits: no VM management, built-in scaling, deployment slots
  App tier:  containerise .NET apps → Azure Kubernetes Service
             Benefits: better resource utilisation, rolling deployments
  Data tier: assess SQL Managed Instance compatibility
             Azure Database Migration Service (DMS) → online migration (< 1h cutover)
             If SQL Agent or Service Broker needed → SQL Managed Instance
             If standard T-SQL → Azure SQL Database Business Critical

Phase 4 — Govern and Secure (ongoing):
  Azure Policy: enforce tagging, no public IPs on VMs, encryption at rest
  Cost Management: reserved instances for stable workloads (save 40%)
  Monitoring: Azure Monitor + Application Insights (enable for modernised apps)
  Security: Defender for Servers (VMs) + Defender for SQL (databases)

Scenario: Design a multi-tenant SaaS architecture on Azure.

Multi-tenant SaaS: one deployment serves multiple customer organisations

Tenancy model decision:

Model A — Shared everything (full multi-tenant):
  → One database, one compute, all tenants share
  → Lowest cost, highest density
  → Risk: noisy neighbour, data isolation challenges
  → Use for: small tenants, cost-sensitive, lower compliance requirements
  → Data isolation: Row-Level Security in SQL Database (TenantId filter)
  → Compute isolation: App Service multi-tenant with Entra ID OIDC tenant filtering

Model B — Shared compute, separate databases (hybrid):
  → One App Service plan, one Cosmos DB container per tenant (partition by TenantId)
  → OR: Elastic Pool — separate databases sharing one SQL server
  → Elastic Pool: up to 500 databases, pooled DTUs/vCores
  → Good balance: isolation at data layer, shared compute cost
  → Use for: SMB SaaS with moderate isolation requirements

Model C — Fully isolated (silo):
  → Separate subscription or resource group per tenant
  → Highest isolation, highest cost
  → Use for: enterprise tenants, strict compliance (FinTech, Healthcare)
  → Deployment: Bicep template, deploy new resources per tenant onboarding

SaaS-specific design patterns:
Onboarding automation: Logic App / Azure Function → provision new tenant infra
Tenant routing:   App Service → header/domain-based routing to correct data
Tenant billing:   Azure Cost Management tags (TenantId) → per-tenant cost tracking
Tenant offboarding: Data export → soft delete retention → hard delete after 90 days
Compliance:       Purview sensitivity labels per tenant, separate Key Vault per tenant

Scenario: How do you design for cost optimisation without sacrificing reliability?

Cost optimisation architecture decisions:

1. Right-size with Azure Advisor:
   → Azure Advisor → Cost → Underutilised resources
   → VMs with < 5% average CPU → downsize (D4 → D2)
   → SQL DTUs < 20% average → move to smaller service objective

2. Reserved Instances (40-72% savings):
   → 1-year RI for VMs known to run 24/7 (web servers, databases)
   → 3-year RI for very stable workloads (domain controllers, monitoring)
   → Scope: shared (across all subscriptions) for maximum flexibility

3. Azure Savings Plans (up to 65%):
   → Commitment to hourly spend on compute (any region, any size)
   → More flexible than Reserved Instances
   → Best for variable but consistent compute usage

4. Azure Hybrid Benefit:
   → Windows Server with SA: 40% savings on Windows VMs
   → SQL Server with SA: 55% savings on SQL licensing
   → Stack together with Reserved Instances for up to 80% savings

5. PaaS over IaaS:
   → Serverless/PaaS: pay only for what you use
   → App Service vs VMs: same capability, App Service scales to zero in some tiers
   → Azure Functions (Consumption): pay per execution (dev/test = free tier)

6. Autoscaling + scheduled scaling:
   → Scale in nights/weekends for non-production
   → VMSS: minimum 2 instances (HA), scale out on CPU > 70%
   → Dev VMs: auto-shutdown at 6pm via Azure Automation

7. Storage lifecycle management:
   → Blob: auto-tier (Hot → Cool after 30 days → Archive after 365 days)
   → Saves 70-95% on old data that must be retained

8. Dev/Test pricing:
   → Azure Dev/Test subscriptions: discounted Windows Server licensing
   → SQL in Dev/Test: no SQL licence cost (dev only workloads)

9. Spot VMs (up to 90% discount):
   → Fault-tolerant, interruptible workloads: batch processing, rendering, CI runners
   → Use VMSS with Spot priority + fallback to regular (Azure auto-evicts when needed)

10. Network costs:
    → Outbound bandwidth costs money; keep data in same region as consumers
    → VNet Peering cross-region: transfer costs (consider VPN vs peering)
    → CDN: serve static content from edge to reduce origin egress

8. Cheat Sheet — Quick Reference

AZ-305 Exam Domain Weights

Domain                                     Weight   Focus
Design infrastructure solutions            30-35%   Compute (VM/AKS/App Service/Functions)
                                                     Networking (hub-spoke, Front Door, VPN)
                                                     Security (Key Vault, Private Endpoints)
Design identity, governance & monitoring   25-30%   Hybrid identity, Conditional Access, PIM
                                                     Management Groups, Azure Policy, tagging
                                                     Azure Monitor, App Insights, Log Analytics
Design data storage solutions              25-30%   SQL DB vs MI vs VM, Cosmos DB design
                                                     Blob tiers, redundancy, Key Vault
                                                     Synapse / Fabric for analytics
Design business continuity solutions       10-15%   RTO/RPO, AZ vs multi-region, ASR
                                                     Azure Backup, immutable vault, PITR

Compute Decision Tree

Need OS-level control / specific software?     → VM + Availability Zones
Containerised microservices / complex scaling? → AKS
Web/API managed PaaS (.NET/Java/Node/Python)?  → App Service
Event-driven, short burst, unknown traffic?    → Azure Functions
Stateful orchestration (long workflow)?        → Durable Functions / Logic Apps
GPU workloads (AI/ML training, rendering)?     → N-series VMs
Burst compute (CI runners, batch)?             → Spot VMs or Batch
Virtual desktop for remote workers?            → Azure Virtual Desktop

SLA achieved by:
Single VM + Premium SSD:  99.9%
Availability Set (2+ VMs): 99.95%
Availability Zones (2+ AZs): 99.99%
Multi-region active-active: 99.999%

Storage Decision Matrix

Use case                              → Service
Blobs (unstructured: media, docs)     → Blob Storage (Hot/Cool/Archive)
VM Disks                              → Managed Disks (Premium SSD for prod)
Shared file system (SMB/NFS)          → Azure Files
Hybrid file server caching            → Azure File Sync
Relational OLTP (managed)             → Azure SQL Database
SQL Server features (SQL Agent, etc.) → Azure SQL Managed Instance
Full SQL Server control               → SQL Server on Azure VM
NoSQL document / JSON                 → Azure Cosmos DB (SQL API)
Global multi-region, multi-write      → Azure Cosmos DB
Data warehouse / OLAP                 → Azure Synapse Analytics / Fabric
Enterprise message queue              → Azure Service Bus
Event streaming (IoT, telemetry)      → Azure Event Hubs
Session cache, query cache            → Azure Cache for Redis
Graph data                            → Cosmos DB (Gremlin API)

Networking Quick Reference

Topology:           Hub-Spoke (most enterprises) or Azure Virtual WAN (large scale)
On-prem private:    ExpressRoute (dedicated, consistent) preferred over VPN
On-prem encrypted:  VPN Gateway (encrypted over internet, simpler, cheaper)
Global HTTP/HTTPS:  Azure Front Door (anycast, WAF, CDN, private link origins)
Global any protocol: Azure Traffic Manager (DNS-based, any protocol)
Regional HTTP:      Application Gateway (L7, WAF, SSL termination, URL routing)
Regional TCP/UDP:   Azure Load Balancer Standard (L4, zone-redundant)
Private PaaS:       Private Endpoint + Private DNS Zone (production standard)
Admin access:       Azure Bastion (no public IP on VMs, browser-based RDP/SSH)
Perimeter firewall: Azure Firewall Premium (IDPS, TLS inspection, centralised)
DDoS:               DDoS Network Protection (per-VNet for public-facing workloads)

Business Continuity Quick Reference

RTO/RPO targets → design pattern:
Hours/Hours:     Azure Backup + ASR warm standby (passive failover)
Minutes/Minutes: Auto-failover group + Traffic Manager / Front Door failover
Near-zero/zero:  Active-active multi-region (Cosmos DB multi-write, Front Door)

SQL DR:
Single region: Business Critical (3 replicas, AZ) — 99.995% SLA
Multi-region:  Auto-failover group → automatic DNS failover
               RPO: < 5 seconds, RTO: < 30 seconds

VM DR:
Azure Site Recovery: replicate VMs to secondary region
Recovery Plans: orchestrate failover order (DC first, then app, then DB)
Test failover: validate quarterly without impacting production

Backup essentials:
Soft delete: 14 days recovery after accidental deletion
Immutable vault (locked): ransomware protection (cannot be disabled)
Cross-region restore: restore backup in secondary region for DR validation
PITR (SQL): restore to any second in the retention window

Well-Architected Framework Pillars Summary

Pillar           Key services / patterns
Reliability      AZs, multi-region, auto-failover, health probes, retry+circuit breaker
Security         Zero Trust, Private Endpoints, Key Vault, Managed Identity, WAF
Cost             Reserved Instances, right-sizing, lifecycle tiers, PaaS over IaaS
Operational Exc. IaC (Bicep), CI/CD, Azure Monitor, runbooks, deployment slots
Performance      Autoscale, CDN (Front Door), Redis cache, read replicas, AKS HPA

Top 10 Tips

  1. AZ-305 is a DESIGN exam, not a configuration exam — every question presents a business scenario asking you to RECOMMEND, COMPARE, or EVALUATE. The key skill is justifying WHY one service fits better than another — not knowing every configuration option.

  2. Infrastructure is 30–35% of the exam — the largest domain. Master: compute service selection (VM vs App Service vs AKS vs Functions), hub-spoke networking, Private Endpoints, Key Vault, and Azure Front Door vs Traffic Manager.

  3. Always anchor decisions in the Well-Architected Framework — every design choice has a WAF justification. "Why Private Endpoints?" → Security pillar (defence in depth). "Why Reserved Instances?" → Cost pillar. love WAF-framed answers.

  4. Private Endpoints are the production networking standard — any PaaS service (SQL, Storage, Key Vault, App Service) in production should have a Private Endpoint and disabled public access. Service Endpoints are insufficient for most enterprise requirements.

  5. Managed Identity eliminates credential management — any Azure resource authenticating to another Azure service should use Managed Identity. No secrets in code, no rotation risk, no leakage. Always the first recommendation for any "how do you authenticate" question.

  6. Hub-Spoke topology with Azure Firewall — the standard enterprise network architecture. Hub contains shared services (Firewall, VPN/ExpressRoute GW, Bastion). Spokes are workload VNets peered to hub. All spoke-to-spoke traffic inspected via hub Azure Firewall.

  7. RTO and RPO drive the HA design — before answering any HA/DR question, establish the RTO and RPO requirements. RTO/RPO in hours → warm standby + ASR. Minutes → auto-failover groups + Traffic Manager. Near-zero → active-active multi-region with Cosmos DB.

  8. SQL Database vs SQL Managed Instance — the most commonly confused SQL decision. SQL Database = most managed, best for greenfield. SQL Managed Instance = when you need SQL Agent, cross-database queries, Service Broker, or CLR. Know this boundary cold.

  9. Azure Front Door vs Traffic Manager — Front Door is a reverse proxy (HTTP/HTTPS, WAF, CDN, anycast, SSL offload). Traffic Manager is DNS-based routing (any protocol, no proxy, no WAF). Global web apps → Front Door. Non-HTTP global routing → Traffic Manager.

  10. The Cloud Adoption Framework provides the governance story — management group hierarchy, landing zones, Azure Policy at scale, and subscription design are all CAF concepts. For any "how do you govern 50 subscriptions" question, the answer starts with CAF landing zones.



Power Platform ALM & DevOps (PL-400) Complete Guide

Power Platform ALM & DevOps (PL-400) — Complete Guide

Technical Design · Dataverse Plugins · PCF Controls · Custom Connectors · ALM · Pipelines · Power Platform CLI · Azure DevOps · Scenarios · Cheat Sheet

Top Hashtags: #PowerPlatformALM, #PL400, #PowerPlatformDeveloper, #PowerPlatform, #DataversePlugins, #PCFControls, #PowerPlatformCLI, #ALMPowerPlatform, #LowCode, #MicrosoftDeveloper


Table of Contents

  1. Exam Overview & Technical Design (10–15%)
  2. Build Power Platform Solutions (10–15%)
  3. Extend the User Experience (10–15%)
  4. Extend the Platform — Plugins & APIs (40–45%)
  5. Power Platform ALM & DevOps
  6. Develop Integrations (5–10%)
  7. Scenario-Based Questions
  8. Cheat Sheet — Quick Reference

1. Exam Overview & Technical Design (10–15%)

PL-400 Exam at a Glance

The PL-400 Microsoft Power Platform Developer exam validates expertise in designing, developing, testing, and troubleshooting Power Platform solution components using extension points. Candidates should be proficient in Microsoft Power Platform services, C#, JavaScript, TypeScript, JSON, HTML, RESTful Web APIs, and Azure.

Skill Domain Exam Weight
Create a technical design 10–15%
Build Power Platform Solutions 10–15%
Implement Power Apps improvements 5–10%
Extend the user experience 10–15%
Extend the platform 40–45% ← LARGEST
Develop integrations 5–10%

Key fact: Extend the platform is 40–45% of the exam. Prioritise: Dataverse plugins (C#), custom connectors, Power Platform CLI, ALM, and Dataverse Web API.


How do you decide between low-code and pro-code in Power Platform?

Decision framework: Low-code vs Pro-code

Always try low-code first (cheaper, faster, maintainable by citizen devs):
→ Standard Power Apps (canvas or model-driven) for the UI
→ Power Automate cloud flows for automation
→ Dataverse built-in business rules for simple field validation
→ Dataverse calculated/rollup columns for derived values

When to escalate to pro-code:
→ Complex server-side business logic requiring ACID transactions
  → Use: Dataverse C# Plugin
→ Performance-critical real-time validation across multiple tables
  → Use: Dataverse C# Plugin (synchronous pre-operation)
→ Custom UI control behaviour beyond standard controls
  → Use: PCF (Power Apps Component Framework) control
→ Connect to a service with no existing connector
  → Use: Custom Connector (OpenAPI/Swagger definition)
→ Long-running or complex operations with Azure infrastructure
  → Use: Azure Function called from plugin or Power Automate
→ Batch data processing or complex ETL
  → Use: Azure Function or Azure Logic App

Architecture decision checklist:
1. Can it be done with out-of-the-box Power Platform? → Do it low-code
2. Needs server-side logic run on create/update/delete? → C# Plugin
3. Needs a custom UI component? → PCF Control
4. Needs external system not in connector gallery? → Custom Connector
5. Needs Azure compute, ML, or AI? → Azure Function + connector
6. Needs human approval + complex branching? → Power Automate flow

What is the Power Platform solution structure and types?

Solutions:
→ Container for customisations and components that can be transported
  between environments (Dev → Test → Production)
→ Everything deployable must be inside a solution

Solution types:
Unmanaged solution:
→ Used in DEVELOPMENT environments only
→ All components editable
→ No version enforcement
→ Contains references to components (not copies)
→ Can be exported as unmanaged (.zip) for source control

Managed solution:
→ Deployed to TEST and PRODUCTION environments
→ Components are READ-ONLY (cannot be edited directly)
→ Version enforced — only newer version can replace it
→ Uninstalling removes all components cleanly
→ Created by exporting an unmanaged solution as managed

Key decision: NEVER develop in managed solution environments
             NEVER deploy unmanaged solutions to production

Solution components (what can be inside a solution):
Tables (entities), columns (fields), relationships
Views, forms, charts, dashboards
Cloud flows, canvas apps, model-driven apps
Security roles, environment variables, connection references
Plugins, custom workflow activities, web resources
PCF controls, custom connectors, dataflows

Environment variables:
→ Store environment-specific configuration (URLs, keys, thresholds)
→ Different values per environment (Dev/Test/Prod) without code change
→ Types: string, number, boolean, JSON, data source
→ Referenced in: flows, plugins, canvas apps, cloud flows

Connection references:
→ Abstractly reference a connector (instead of hardcoding a connection)
→ When solution deployed: admin configures the actual connection
→ Enables the same flow to connect to different accounts per environment
→ Always use connection references instead of direct connections in flows

2. Build Power Platform Solutions (10–15%)

How do you configure Dataverse tables and relationships for complex solutions?

Dataverse table types:
Standard table:    most tables — PK is a GUID, full Dataverse features
Activity table:    special type for activities (email, task, appointment)
                   enables activity timeline, sender/recipient model
Elastic table:     designed for high-volume, time-series or IoT data
                   uses Cosmos DB underneath — horizontal scale
Virtual table:     maps to external data source (SQL, OData, etc.)
                   appears as Dataverse table, queries external system live

Relationship types:
One-to-many (1:N): one parent row → many child rows
                   creates a lookup column on the child table
                   e.g., Account (1) → Contacts (N)

Many-to-many (N:N): via an intersect (junction) table
                    e.g., Contacts ↔ Marketing Lists
                    Use: relate records without a hierarchy

Polymorphic lookup: lookup that can point to multiple table types
                    e.g., Regarding field on Activity — can be Account,
                    Contact, Lead, or Opportunity

Column types:
Simple: Text, Number, Date, Choice (picklist), Lookup, File/Image
Calculated: value computed from formula (no triggers, re-computed on read)
Rollup: aggregate from child records (sum, count, avg, min, max)
        runs on a schedule (hourly) — not real-time
Formula: Power Fx expressions (Dataverse-native, new)
Auto-number: automatically incremented unique identifiers

Business rules (no-code server + client logic):
→ Set field values, show/hide fields, validate values, lock fields
→ Scope: entity (server-side — always runs) or form (client-side only)
→ Cannot: query other tables, send emails, perform HTTP calls
→ For complex logic: use plugins instead

What is Power Fx and how is it used in Power Apps?

Power Fx:
→ Open-source, declarative formula language based on Excel-style syntax
→ Used in: Canvas Apps, Model-driven app command bars, Dataverse formulas
→ Reactive: formulas automatically recalculate when data changes

Key Power Fx patterns for PL-400:

Collections and local data:
// Create a collection:
ClearCollect(colProducts,
  Filter(Products, Status.Value = "Active"))

// Add a record:
Collect(colProducts, {Name: "Widget", Price: 9.99})

// Update in collection:
UpdateIf(colProducts, ProductId = varSelectedId,
  {Price: 12.99})

Delegation (CRITICAL concept):
→ Delegation = Dataverse handles the filter/sort on the server
→ Non-delegable = Power Apps downloads ALL records (max 2000) then filters
→ Always use delegable functions for large datasets

Delegable functions with Dataverse:
  Filter(Products, Status.Value = "Active")  ← DELEGABLE (server-filters)
  Filter(Products, Text(Price) = "9.99")    ← NOT DELEGABLE (Text() blocks delegation)

Workaround for non-delegable:
  Set(varProducts, Filter(Products, Status.Value = "Active"))  // 2000 limit
  Filter(varProducts, Price > 5)  // then filter collection client-side

Named formulas (performance):
// Named formula — evaluated once, cached:
ProductCount = CountRows(Filter(Products, Status = "Active"))

With() — local variables for readability:
With({total: Sum(OrderLines, Quantity * UnitPrice),
       tax: Sum(OrderLines, Quantity * UnitPrice) * 0.20},
  total + tax)

3. Extend the User Experience (10–15%)

What is the Power Apps Component Framework (PCF) and when do you use it?

PCF (Power Apps Component Framework):
→ Framework for building custom reusable UI components for Power Apps
→ Replaces HTML web resources for custom UI in model-driven apps
→ Available in: Model-driven apps AND canvas apps
→ Built with: TypeScript, React (optional), CSS, third-party libraries

Two types of PCF controls:
Field controls: replace a single field on a form
  → e.g., custom date picker, colour picker, rating stars, signature pad
Dataset controls: replace a grid/subgrid view
  → e.g., custom chart, calendar view, map view, Kanban board

PCF control lifecycle:
init(context, notifyOutputChanged, state, container):
  → Called once when control loads
  → Receive context (current field value, user info, client info)
  → Set up the component

updateView(context):
  → Called whenever field value OR model changes
  → Re-render the component with new data

getOutputs():
  → Called when notifyOutputChanged() is triggered
  → Return the new value to write back to Dataverse

destroy():
  → Called when control removed from DOM — cleanup

PCF TypeScript example (field control — custom star rating):
import { IInputs, IOutputs } from "./generated/ManifestTypes";

export class StarRating implements ComponentFramework.StandardControl<IInputs, IOutputs> {
  private _value: number;
  private _notifyOutputChanged: () => void;
  private _container: HTMLDivElement;

  init(context: ComponentFramework.Context<IInputs>,
       notifyOutputChanged: () => void,
       state: ComponentFramework.Dictionary,
       container: HTMLDivElement): void {
    this._notifyOutputChanged = notifyOutputChanged;
    this._container = container;
    this._value = context.parameters.rating.raw ?? 0;
    this.renderStars();
  }

  updateView(context: ComponentFramework.Context<IInputs>): void {
    this._value = context.parameters.rating.raw ?? 0;
    this.renderStars();
  }

  getOutputs(): IOutputs {
    return { rating: this._value };
  }

  private renderStars(): void {
    this._container.innerHTML = '';
    for (let i = 1; i <= 5; i++) {
      const star = document.createElement('span');
      star.textContent = i <= this._value ? '★' : '☆';
      star.onclick = () => { this._value = i; this._notifyOutputChanged(); };
      this._container.appendChild(star);
    }
  }

  destroy(): void { /* cleanup */ }
}

PCF CLI commands:
pac pcf init --namespace Contoso --name StarRating --template field
npm install
npm run build
pac pcf push --publisher-prefix contoso  # push to dev environment for testing

What are client-side scripts (JavaScript) in model-driven apps?

Client-side scripts (JavaScript) in model-driven apps:
→ Customise form behaviour without server-side code
→ Registered on: form events, field events, grid events, ribbon buttons

Key form events:
OnLoad:   fires when form loads (initialise, set defaults, hide/show)
OnSave:   fires when user saves (validate, prevent save, show warnings)
OnChange: fires when a specific field value changes

Xrm object model — the JavaScript API for model-driven apps:
// Get a field value:
const accountName = formContext.getAttribute("name").getValue();

// Set a field value:
formContext.getAttribute("creditlimit").setValue(50000);

// Show/hide a field:
formContext.getControl("telephone1").setVisible(false);

// Make a field required:
formContext.getAttribute("emailaddress1")
  .setRequiredLevel("required");

// Display a form notification:
formContext.ui.setFormNotification(
  "Credit limit exceeded!",
  "WARNING",   // INFO | WARNING | ERROR
  "creditWarning"
);

// Prevent save with validation error:
function validateForm(executionContext) {
  const formContext = executionContext.getFormContext();
  const email = formContext.getAttribute("emailaddress1").getValue();
  if (email && !email.includes("@")) {
    executionContext.getEventArgs().preventDefault();
    formContext.ui.setFormNotification(
      "Invalid email address", "ERROR", "emailError");
  }
}

// Call Dataverse Web API from client script:
Xrm.WebApi.retrieveMultipleRecords("account",
  "?$select=name,creditlimit&$filter=statecode eq 0&$top=10")
  .then(result => {
    result.entities.forEach(account => {
      console.log(account.name, account.creditlimit);
    });
  });

// Open a related record:
Xrm.Navigation.openForm({
  entityName: "contact",
  entityId: contactId
});

4. Extend the Platform — Plugins & APIs (40–45%)

What is a Dataverse Plugin and how does it work?

Dataverse Plugin:
→ C# class library that executes server-side business logic
→ Triggered by Dataverse events (create, update, delete, retrieve, etc.)
→ Runs within the Dataverse platform — no external hosting needed
→ Enforced regardless of HOW data is modified (Portal, API, App, import)

Plugin architecture:
public class AccountValidationPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    // Get plugin execution context:
    var context = (IPluginExecutionContext)
      serviceProvider.GetService(typeof(IPluginExecutionContext));

    // Get organisation service for Dataverse operations:
    var serviceFactory = (IOrganizationServiceFactory)
      serviceProvider.GetService(typeof(IOrganizationServiceFactory));
    var service = serviceFactory.CreateOrganizationService(context.UserId);

    // Get tracing service for logging:
    var tracingService = (ITracingService)
      serviceProvider.GetService(typeof(ITracingService));

    // Only run on Account Create:
    if (context.MessageName != "Create" ||
        context.PrimaryEntityName != "account") return;

    // Get the target entity (the record being created):
    var target = (Entity)context.InputParameters["Target"];

    // Business logic:
    if (target.Contains("creditlimit") &&
        target.GetAttributeValue<Money>("creditlimit").Value > 1000000)
    {
      throw new InvalidPluginExecutionException(
        "Credit limit cannot exceed £1,000,000. Please contact Finance.");
    }

    tracingService.Trace("Account validation passed");
  }
}

What are plugin pipeline stages and what happens in each?

Plugin execution pipeline (for each Dataverse message):

PRE-VALIDATION (stage 10):
→ Fires BEFORE the database transaction begins
→ Can query external systems without transaction overhead
→ Can prevent the operation with an exception
→ Use for: input validation, external system checks
→ No access to pre-entity image (record doesn't exist yet for Create)

PRE-OPERATION (stage 20):
→ Fires WITHIN the database transaction (BEFORE write to DB)
→ Can modify the Target entity before it's saved
→ Changes to Target automatically saved — no explicit update needed
→ Use for: set default field values, enrich the record before save
→ Pre-entity image: available for Update (the record as it was before)

MAIN OPERATION (stage 30):
→ The actual Dataverse platform operation (Core operation)
→ Cannot register custom plugins at this stage
→ Platform writes data to the database here

POST-OPERATION (stage 40):
→ Fires WITHIN the database transaction (AFTER write to DB)
→ Record is now in the database
→ Can trigger follow-on operations (create related records, send emails)
→ Post-entity image: available (record as it is now after save)
→ Changes require explicit service.Update() calls
→ Use for: creating related records, triggering workflows, notifications

Execution mode:
Synchronous: runs in the same thread as the user operation
             failure rolls back the entire transaction
             use for: validation, data enrichment (pre-operation)
Asynchronous: runs in background queue (separate thread)
              failure does NOT roll back the user operation
              use for: send emails, call external APIs, non-critical operations

Plugin images (snapshots of the record):
Pre-image:  snapshot of the record BEFORE the operation
Post-image: snapshot of the record AFTER the operation
Both must be registered in Plugin Registration Tool (specify column names)

What are Custom Workflow Activities?

Custom Workflow Activities:
→ C# classes that extend Power Automate / Classic Workflows
→ Appear as steps that can be added in Power Automate (classic) or
  used in low-code flows
→ Use when: need reusable server-side C# logic callable from flows

Structure:
public class CalculateDiscount : CodeActivity
{
  // Input parameters from the flow:
  [Input("Order Amount")]
  [RequiredArgument]
  public InArgument<decimal> OrderAmount { get; set; }

  [Input("Customer Tier")]
  public InArgument<string> CustomerTier { get; set; }

  // Output parameters returned to the flow:
  [Output("Discount Percentage")]
  public OutArgument<decimal> DiscountPercentage { get; set; }

  protected override void Execute(CodeActivityContext context)
  {
    var amount = OrderAmount.Get(context);
    var tier = CustomerTier.Get(context) ?? "Standard";

    decimal discount = tier switch {
      "Gold"     => amount > 10000 ? 0.15m : 0.10m,
      "Silver"   => 0.05m,
      _          => 0m
    };

    DiscountPercentage.Set(context, discount);
  }
}

Plugin vs Custom Workflow Activity:
Plugin:                    Triggered by Dataverse events (create/update/delete)
Custom Workflow Activity:  Called from Power Automate or classic workflows
                           Reusable across multiple flows
                           User provides input, receives output

What is the Dataverse Web API and how do you use it?

Dataverse Web API:
→ RESTful OData v4 API for all Dataverse operations
→ Endpoint: https://{org}.api.crm11.dynamics.com/api/data/v9.2/
→ Authentication: OAuth 2.0 bearer token (Entra ID)
→ Available from: external apps, Azure Functions, Logic Apps, plugins

CRUD operations:

CREATE (POST):
POST /api/data/v9.2/accounts
Content-Type: application/json
{
  "name": "Contoso Ltd",
  "creditlimit": 50000,
  "telephone1": "0207 123 4567",
  "primarycontactid@odata.bind": "/contacts(00000000-0000-0000-0000-000000000001)"
}
Response: 201 Created + OData-EntityId header with new record GUID

READ (GET with OData):
GET /api/data/v9.2/accounts
  ?$select=name,creditlimit,telephone1
  &$filter=creditlimit gt 10000 and statecode eq 0
  &$orderby=name asc
  &$top=50
  &$expand=primarycontactid($select=fullname,emailaddress1)

UPDATE (PATCH — partial update):
PATCH /api/data/v9.2/accounts({accountId})
Content-Type: application/json
{
  "creditlimit": 75000,
  "telephone1": "0207 999 8888"
}

DELETE:
DELETE /api/data/v9.2/accounts({accountId})

Upsert (PATCH with If-None-Match: *):
PATCH /api/data/v9.2/accounts({alternateKey})
If-None-Match: *   ← create if not exists, update if exists
{ ... }

Batch operations (multiple requests in one HTTP call):
POST /api/data/v9.2/$batch
Content-Type: multipart/mixed; boundary=batch_boundary
→ Reduces round trips for bulk operations
→ Supports change sets (atomic — all succeed or all fail)

Execute actions:
POST /api/data/v9.2/WinOpportunity
{
  "OpportunityClose": {
    "subject": "Won deal",
    "opportunityid@odata.bind": "/opportunities({opportunityId})"
  },
  "Status": 3
}

5. Power Platform ALM & DevOps

What is Power Platform ALM and what are the key components?

ALM (Application Lifecycle Management) for Power Platform:
→ The practices and tools for managing the full lifecycle of Power Platform
  solutions from development through test to production

Core ALM components:
Environments:     Dev → Test/UAT → Production
                  Each environment is isolated with its own data
Solutions:        the packaging unit — transport customisations between envs
Source control:   Git repository stores solution source files
CI/CD pipelines:  automate build, test, deploy using Azure DevOps or GitHub

Environment strategy (recommended):
Developer (individual):   each developer has their own personal dev env
                          unmanaged solution, develop and unit test here
Test/QA:                  shared managed solution deployment, QA testing
UAT/Pre-prod:             user acceptance testing with business users
Production:               live environment — only managed solutions deployed

Key principles:
→ Never develop in Test or Production environments
→ Always deploy MANAGED solutions to non-dev environments
→ Every customisation lives inside a solution (nothing ungrouped)
→ Source control is the source of truth — not any environment
→ Automated deployment pipelines — no manual exports/imports

What is the Power Platform CLI and what can you do with it?

Power Platform CLI (pac CLI):
→ Command-line tool for Power Platform development and ALM
→ Install: dotnet tool install -g Microsoft.PowerApps.CLI
→ Or: via npm, VS Code extension, or Power Platform Tools installer

Key commands:

Authentication:
pac auth create --url https://org.crm.dynamics.com  # add environment
pac auth list                                        # list connections
pac auth select --index 1                            # switch environment

Solution management:
pac solution list                                    # list all solutions in env
pac solution export --path ./exported --name MySolution --managed false
pac solution import --path ./MySolution.zip
pac solution clone --name MySolution --outputDirectory ./src
# Clone: creates local folder structure with each component in separate file

Environment management:
pac env list                                         # list environments
pac env select --environment "Dev Environment"
pac env create --name "My-Dev-Env" --type Developer --region unitedstates

PCF development:
pac pcf init --namespace Contoso --name MyControl --template field --run-npm-install
pac pcf push --publisher-prefix contoso              # push to environment
pac pcf version --updateType patch                   # bump version

Plugin development:
pac plugin init                                      # scaffold plugin project
pac plugin push                                      # push assembly to env

Canvas app:
pac canvas pack --sources ./CanvasApp --msapp ./output/MyApp.msapp
pac canvas unpack --msapp ./MyApp.msapp --sources ./unpacked  # for source control

Solution unpack (for source control — YAML format):
pac solution unpack --zipFile MySolution.zip --folder ./src --processCanvasApps
# Creates: Entities, Flows, Workflows, WebResources folders
# Each component in separate file → meaningful Git diffs

Solution pack (from source control → zip for deployment):
pac solution pack --zipFile MySolution_managed.zip --folder ./src --managed

How do you implement CI/CD for Power Platform with Azure DevOps?

Power Platform Build Tools (Azure DevOps):
→ Microsoft-provided extension: "Power Platform Build Tools"
→ Provides DevOps tasks for: export, import, publish, check solution

Complete CI/CD pipeline pattern:

CI Pipeline (Build) — triggered on PR to main branch:
1. Power Platform: Who Am I            # validate connection
2. Power Platform: Export Solution     # export unmanaged from Dev env
3. Power Platform: Unpack Solution     # unpack to source files
4. Power Platform: Check Solution      # run Solution Checker (code quality)
5. Power Platform: Pack Solution       # pack as MANAGED for deployment
6. Publish Artifact                    # publish zip for release pipeline

CD Pipeline (Release) — triggered on merge to main:
Stage 1 — Deploy to Test:
  Power Platform: Import Solution      # import managed solution to Test
  Power Platform: Publish Customizations
  Run automated tests (EasyRepro / PAC test)

Stage 2 — Deploy to UAT (manual approval gate):
  Approval required from business stakeholder
  Power Platform: Import Solution      # import managed to UAT

Stage 3 — Deploy to Production (manual approval):
  Approval required from release manager
  Power Platform: Import Solution      # import managed to Production

Azure DevOps YAML pipeline example:
trigger:
  branches:
    include: [main]

pool:
  vmImage: 'windows-latest'

steps:
- task: PowerPlatformToolInstaller@2
  displayName: 'Install Power Platform Build Tools'

- task: PowerPlatformWhoAmi@2
  displayName: 'Authenticate to Dev'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'   # service connection

- task: PowerPlatformExportSolution@2
  displayName: 'Export Solution'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'
    SolutionName: 'ContosoSolution'
    SolutionOutputFile: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    Managed: false

- task: PowerPlatformUnpackSolution@2
  displayName: 'Unpack Solution'
  inputs:
    SolutionInputFile: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    SolutionTargetFolder: '$(Build.SourcesDirectory)/src/solution'

- task: PowerPlatformChecker@2
  displayName: 'Solution Checker'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'
    FilesToAnalyze: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    RuleSet: '0ad12346-e108-40b8-a956-9a373e9909ff'  # AppSource ruleset
    ErrorLevel: High
    FailOnPowerAppsCheckerAnalysisError: true

What are Power Platform Pipelines (in-product ALM)?

Power Platform Pipelines (Fabric/Managed Pipelines):
→ Microsoft-native ALM solution BUILT INTO Power Platform (no Azure DevOps needed)
→ Configured entirely in the Power Platform Admin Centre / Pipelines app
→ Designed for: citizen developer ALM, simpler governance without DevOps expertise

Pipeline concepts:
Host environment:   dedicated environment that hosts the pipeline configuration
                    + deployment history + logs
Linked environments: Dev → Test → Production environments connected to the pipeline
Stage gates:         manual approvals before deployment to each stage

Features:
→ One-click deployment from within the Maker experience (no DevOps knowledge needed)
→ Pre-deployment and post-deployment steps (run cloud flows)
→ Deployment notes and audit trail
→ Environment variable configuration per stage
→ Connection reference configuration per stage
→ Deployment history with rollback capability

When to use Pipelines vs Azure DevOps:
Power Platform Pipelines:
  → Simpler orgs, citizen developer-led ALM
  → No DevOps team or expertise
  → Quick setup, managed by Power Platform admins
  → Limited customisation and branching support

Azure DevOps / GitHub Actions:
  → Complex orgs, pro-dev teams
  → Need automated testing, branch strategies, complex approval gates
  → Full control over pipeline steps, scripts, and tooling
  → Integrates with broader DevOps ecosystem (work items, boards, repos)

6. Develop Integrations (5–10%)

What are Custom Connectors and how do you build one?

Custom Connector:
→ Wrapper around a REST or SOAP API that makes it usable in
  Power Automate, Power Apps, and Logic Apps
→ Defined using OpenAPI (Swagger) specification
→ Supports: authentication (API key, OAuth 2.0, Basic, Windows)

Building a custom connector:
Step 1 — Define the API:
→ Import an existing OpenAPI (Swagger) .json or .yaml file
→ Or: enter Postman collection URL
→ Or: build manually in the custom connector wizard

Step 2 — Configure authentication:
No auth:      public APIs, development
API key:      pass key in header or query string
OAuth 2.0:    full OAuth flow — user signs in to the external service
Generic OAuth: for non-standard OAuth flows

Step 3 — Define actions and triggers:
Action:    user calls this to perform an operation (POST order, GET customer)
Trigger:   external system notifies Power Automate of an event
  Polling trigger: Power Automate polls your API every N seconds
  Webhook trigger: external system sends event to Power Automate's webhook URL

Step 4 — Test and share:
→ Test each operation in the connector wizard
→ Share with specific users or the whole organisation
→ Submit to Microsoft AppSource for public listing (ISV scenario)

Custom connector policies (inline code transformations):
→ Policy templates: convert XML to JSON, set header, route request
→ Use for: API quirks without modifying the backend API
   e.g., external API uses XML response — policy converts to JSON
         external API URL changes per customer — route policy selects URL

Certification (certified connectors):
→ Submit to Microsoft for certification review
→ Certified connectors available to all Power Platform users
→ Requirements: public documentation, test accounts, SLA

What are Virtual Tables in Dataverse?

Virtual Tables (Virtual Entities):
→ Dataverse tables that map to EXTERNAL data sources
→ Data stays in the external system — Dataverse is a logical view
→ CRUD operations pass through to the external system in real time
→ Appear and behave like regular Dataverse tables (queries, forms, views)

How they work:
User opens a canvas app → retrieves Virtual Table records
→ Dataverse calls the Virtual Table Data Provider
→ Data Provider calls the external system (SQL, OData, REST API)
→ External system returns data → Dataverse returns to the user
→ No data copy, no sync, always live

Virtual Table providers:
OData v4 provider:  built-in — connect to any OData v4 endpoint
Custom provider:    C# plugin registered as a data provider
                    handles: retrieve, retrieve multiple, create, update, delete

Custom virtual table data provider (C#):
public class ExternalProductsPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    var context = serviceProvider.GetService<IPluginExecutionContext>();

    if (context.MessageName == "RetrieveMultiple")
    {
      // Call external API (e.g., legacy ERP system):
      var products = CallERPSystemAPI();

      // Convert to EntityCollection:
      var entityCollection = new EntityCollection();
      foreach (var product in products)
      {
        var entity = new Entity("contoso_product");
        entity["contoso_name"] = product.Name;
        entity["contoso_price"] = product.Price;
        entityCollection.Entities.Add(entity);
      }
      context.OutputParameters["BusinessEntityCollection"] = entityCollection;
    }
  }
}

When to use virtual tables:
→ External data that must appear in Power Apps without ETL
→ Read-only reference data from a legacy system
→ Data too large to copy into Dataverse (compliance, licensing)
→ Real-time inventory, pricing from ERP — must not have sync lag

Limitations:
→ No offline access (requires live connection to external system)
→ No Dataverse-native features (rollup columns, duplicate detection)
→ Performance depends on external system response time
→ No audit history (Dataverse doesn't own the data)

7. Scenario-Based Questions

Scenario: A business rule is causing performance issues on a large Account table. How do you fix it?

Problem analysis:
→ Business rules with entity scope run server-side on every record operation
→ Complex business rules with many conditions slow down create/update operations
→ Business rules cannot be indexed or optimised by the database

Diagnosis steps:
1. Review business rules: Dataverse table → Business Rules → check scope
   Entity-scoped rules run on every create/update from ANY source
2. Check if rule fires unnecessarily: is it checking conditions that
   rarely change? e.g., checking account name on every address update
3. Profile with XrmToolBox → Plugin Trace Log: measure execution time

Solutions:
Option A — Convert to Pre-Operation Plugin (best performance):
→ Plugin only fires on specific messages you configure
→ Can check context.InputParameters to abort early if not relevant
→ Can use pre-image to only run if relevant fields actually changed:

var preImage = context.PreEntityImages["PreImage"];
var oldCreditLimit = preImage.GetAttributeValue<Money>("creditlimit")?.Value ?? 0;
var newCreditLimit = ((Entity)context.InputParameters["Target"])
                      .GetAttributeValue<Money>("creditlimit")?.Value ?? 0;
if (oldCreditLimit == newCreditLimit) return; // skip if not changed

Option B — Scope to Form only (not entity):
→ Changes business rule scope from Entity to Form
→ Rule only fires client-side when user is on the form
→ Not enforced for API/import operations — use only for UX enhancement

Option C — Replace with Calculated/Formula column:
→ If rule is setting a derived value, use a calculated column instead
→ Evaluated on read, not on write — zero write overhead

Scenario: Design a plugin that auto-creates a follow-up Task when an Opportunity is marked as Won.

public class OpportunityWonPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    var context = (IPluginExecutionContext)
      serviceProvider.GetService(typeof(IPluginExecutionContext));
    var serviceFactory = (IOrganizationServiceFactory)
      serviceProvider.GetService(typeof(IOrganizationServiceFactory));
    var service = serviceFactory.CreateOrganizationService(context.UserId);
    var tracingService = (ITracingService)
      serviceProvider.GetService(typeof(ITracingService));

    // Only on Opportunity Update:
    if (context.MessageName != "Update" ||
        context.PrimaryEntityName != "opportunity") return;

    var target = (Entity)context.InputParameters["Target"];

    // Only if statecode changed to Won (statecode = 1):
    if (!target.Contains("statecode")) return;
    var stateCode = target.GetAttributeValue<OptionSetValue>("statecode")?.Value;
    if (stateCode != 1) return;  // 1 = Won

    // Get the Opportunity for owner details:
    var opportunity = service.Retrieve("opportunity",
      context.PrimaryEntityId,
      new ColumnSet("name", "ownerid", "estimatedclosedate"));

    tracingService.Trace($"Creating follow-up task for Won Opportunity: {opportunity["name"]}");

    // Create a follow-up Task (async plugin — won't block the user):
    var task = new Entity("task");
    task["subject"] = $"Follow-up: {opportunity["name"]} - Won!";
    task["description"] = "Send welcome pack and onboarding information";
    task["regardingobjectid"] = new EntityReference("opportunity", context.PrimaryEntityId);
    task["ownerid"] = opportunity["ownerid"];  // assign to opp owner
    task["scheduledend"] = DateTime.UtcNow.AddDays(3);  // due in 3 days
    task["prioritycode"] = new OptionSetValue(2);  // High priority

    var taskId = service.Create(task);
    tracingService.Trace($"Task created: {taskId}");
  }
}

Plugin registration:
Entity:   opportunity
Message:  Update
Stage:    Post-Operation (40)   ← after save, record is in DB
Mode:     Asynchronous           ← non-blocking, failure won't rollback opp win
Images:   none required (statecode is in Target)

Scenario: How do you implement a full ALM pipeline for a Power Platform solution from scratch?

Full ALM implementation steps:

1. Environment setup (Power Platform Admin Centre):
   Developer environment:  personal, each developer has their own
   Test environment:       shared, F-licence or Managed Environment
   Production:             Managed Environment, DLP policies enforced

2. Solution setup (Developer environment):
   → Create publisher: prefix "ctr", publisher name "Contoso"
   → Create solution: "ContosoSales" (associate with publisher)
   → ALL customisations inside this solution only
   → Set up Environment Variables for connection strings
   → Use Connection References for all flows

3. Azure DevOps setup:
   → Create repository: ContosoSales
   → Install "Power Platform Build Tools" extension from Marketplace
   → Create service connections (SPN):
     PP-Dev-SPN:  service principal with Maker role on Dev env
     PP-Test-SPN: service principal with Maker role on Test env
     PP-Prod-SPN: service principal with Maker role on Prod env

4. CI pipeline (builds/PRs):
   → Export solution (unmanaged) from Dev
   → Unpack to source files (pac solution unpack)
   → Run Solution Checker (fail on Critical errors)
   → Pack as Managed
   → Publish artifact (managed zip)

5. CD pipeline (deployments):
   → Import managed to Test (automatic)
   → Configure environment variables + connection references
   → Run smoke tests (EasyRepro UI tests or manual test checklist)
   → Approval gate: QA lead approves
   → Import managed to Production
   → Approval gate: Release manager approves

6. Developer daily workflow:
   git checkout -b feature/TICKET-123-account-validation
   # develop in personal dev environment
   # export + unpack + commit
   git push origin feature/TICKET-123-account-validation
   # raise PR → CI pipeline runs → solution checker passes → peer review
   # merge to main → CD pipeline deploys to Test

Scenario: A custom connector is failing authentication to a third-party API. How do you debug?

Systematic debugging approach:

Step 1 — Test the API directly (Postman or curl):
→ Call the API directly with the same credentials to confirm the API works
→ If direct call fails: the API itself has an issue (key expired, endpoint changed)
→ If direct call succeeds: the connector configuration is the issue

Step 2 — Check connector authentication configuration:
→ Power Automate → Custom Connectors → [connector] → Security tab
→ Verify: auth type, client ID, client secret, token URL, scopes

Step 3 — Check Power Automate run history:
→ Open failed flow run → expand the action → check error body
→ Common errors:
  401 Unauthorized:   wrong credentials, expired token, missing scope
  403 Forbidden:      correct auth but insufficient permissions
  400 Bad Request:    malformed request body — check connector definition

Step 4 — Test in connector wizard:
→ Custom Connectors → [connector] → Test tab
→ Create a new connection → test each operation individually
→ See raw request/response in test results

Step 5 — Check token refresh:
→ OAuth connectors: check if refresh token has expired (user must re-auth)
→ API key: check if key was rotated in the external system
→ Solution: update the connection in Power Automate with new credentials

Step 6 — Check environment variable / connection reference:
→ If connection reference used: verify the connection reference is configured
  correctly in the target environment
→ Power Platform Admin Centre → Connections → verify active connections

8. Cheat Sheet — Quick Reference

PL-400 Exam Domain Priorities

Domain                          Weight    Focus Areas
Extend the platform             40-45%    C# plugins, Custom connectors,
                                          Dataverse Web API, Virtual tables,
                                          Power Platform CLI, ALM
Create a technical design       10-15%    Low-code vs pro-code decision,
                                          Architecture, Environment variables
Extend the user experience      10-15%    PCF controls (TypeScript),
                                          Client-side JavaScript (Xrm API)
Build Power Platform solutions  10-15%    Solutions, Dataverse tables,
                                          Power Fx, Delegation
Develop integrations            5-10%     Custom connectors, Virtual tables,
                                          Azure Functions, OData
Implement Power Apps improvements 5-10%  Performance, delegation, formula optimisation

Plugin Pipeline Quick Reference

Stage    When                    Use for                    Transaction
10 Pre-  Before transaction      External validation,       No
   Valid                         external system checks
20 Pre-  In transaction,         Set defaults, enrich       Yes
   Op    before DB write         record before save
30 Main  The actual operation    Not available for plugins  Yes
40 Post- In transaction,         Create related records,    Yes
   Op    after DB write          trigger integrations, notify

Execution modes:
Sync:  blocks user, failure rolls back → use for validation
Async: background queue, failure does NOT roll back → use for notifications

Plugin images:
Pre-image (update/delete):  record state BEFORE this operation
Post-image (create/update): record state AFTER this operation
Register in Plugin Registration Tool with specific column names

Dataverse Web API Quick Reference

Base URL: https://{org}.api.crm11.dynamics.com/api/data/v9.2/

OData operations:
GET    /accounts?$select=name&$filter=...    → retrieve multiple
GET    /accounts({id})                        → retrieve single
POST   /accounts                              → create
PATCH  /accounts({id})                        → update (partial)
DELETE /accounts({id})                        → delete

OData query options:
$select    choose columns    $select=name,creditlimit
$filter    filter rows       $filter=creditlimit gt 50000
$orderby   sort              $orderby=name asc
$top       limit rows        $top=100
$expand    join related      $expand=primarycontactid($select=fullname)
$count     count results     $count=true

Association:
POST /accounts({id})/contact_customer_accounts/$ref
{ "@odata.id": "/contacts({contactId})" }

Unbound action example:
POST /WinOpportunity
{ "OpportunityClose": {...}, "Status": 3 }

ALM Quick Reference

Solution types:
Unmanaged: development environment only — editable
Managed:   non-dev environments — read-only, clean uninstall

Environment strategy:
Developer (personal) → Test (shared) → UAT → Production

Power Platform CLI key commands:
pac auth create --url {envUrl}
pac solution clone --name {solution}
pac solution pack --zipFile ./out.zip --managed
pac solution import --path ./out.zip
pac pcf init, push, version
pac canvas pack, unpack

Azure DevOps tasks:
PowerPlatformToolInstaller    → install tools
PowerPlatformWhoAmi           → validate connection
PowerPlatformExportSolution   → export from source env
PowerPlatformUnpackSolution   → unpack to source files
PowerPlatformChecker          → run solution checker
PowerPlatformImportSolution   → deploy to target env
PowerPlatformPublishCustomizations → publish after import

Pipeline pattern: Export → Unpack → Check → Pack → Import

PCF Control Lifecycle

Method          When called                       Use for
init()          Control loads (once)              Setup, subscribe to events
updateView()    Value or context changes           Re-render the control
getOutputs()    After notifyOutputChanged()        Return new value to Dataverse
destroy()       Control removed from DOM           Cleanup, unsubscribe

Key context properties:
context.parameters.{fieldName}.raw        → current field value
context.parameters.{fieldName}.formatted  → formatted display value
context.userSettings.userName             → current user name
context.client.isOffline                  → offline mode?
context.mode.isControlDisabled            → read-only?
context.factory.requestRender()           → request re-render

Top 10 Tips

  1. Extend the platform = 40–45% of PL-400 — this single domain is almost half the exam. Master: C# plugins (all pipeline stages), Dataverse Web API (OData), custom connectors, Power Platform CLI, and ALM. Everything else is secondary to this domain.

  2. Plugin stage selection is heavily tested — Pre-Operation (stage 20) to modify the Target before save (no explicit Update needed). Post-Operation (stage 40) to create related records (explicit Update needed). Pre-Validation (stage 10) for external system checks without a transaction. Know which stage to use for which scenario.

  3. Sync vs Async plugins = validation vs notification — synchronous plugins block the user and roll back on failure (use for validation). Asynchronous plugins run in background and don't roll back the triggering operation (use for sending emails, calling external APIs).

  4. Always try low-code first — the exam tests your ability to decide when pro-code is justified. If a Business Rule or Power Automate flow can solve the problem, that's the right answer. Only escalate to plugins when genuinely needed.

  5. Connection references and environment variables are the ALM foundation — flows must use Connection References (not direct connections) and variables must use Environment Variables (not hardcoded values). Without these, solutions break when moved between environments.

  6. pac solution unpack enables meaningful Git diffs — without unpacking, a solution is a binary ZIP file that can't be diff'd or reviewed. Unpacking creates YAML/XML files per component — pull requests show exactly what changed.

  7. PCF controls replace HTML web resources — HTML web resources are legacy. PCF controls are the modern extension point for custom UI, with proper TypeScript support, lifecycle management, and canvas app compatibility.

  8. Virtual tables = no data copy, always live — data stays in the external system and Dataverse proxies queries through. No sync lag, no duplication, but no offline access and no Dataverse-native features (rollup columns, duplicate detection).

  9. Power Platform Pipelines vs Azure DevOps — Pipelines for simpler orgs and citizen developer ALM (built into Power Platform, no DevOps expertise needed). Azure DevOps for complex orgs needing branch strategies, automated testing, and full DevOps integration.

  10. Solution Checker is mandatory before production — Solution Checker (run via Azure DevOps task or pac solution check) detects performance and reliability issues in plugins, JavaScript, and flows. Failing on Critical issues as a pipeline gate is the enterprise best practice for preventing broken deployments.



Featured Post

VMware — Complete Guide

VMware — Complete Guide vSphere  ·  ESXi  ·  vCenter  ·  NSX-T  ·  vSAN  ·  Tanzu  ·  HCX  ·  Site Recovery  ·  Aria  ·  VC...

Popular posts