Sunday, May 17, 2026

Microsoft Azure Solutions Architect Expert (AZ-305) Complete Guide

Microsoft Azure Solutions Architect Expert (AZ-305) — Complete Guide

Well-Architected Framework · Identity & Governance · Data Storage · Infrastructure · Business Continuity · Landing Zones · Scenarios · Cheat Sheet

Top Hashtags: #AZ305, #AzureSolutionsArchitect, #MicrosoftAzure, #CloudArchitecture, #AzureArchitect, #WellArchitectedFramework, #CloudAdoptionFramework, #AzureDesign, #MicrosoftCertified, #CloudComputing


Table of Contents

  1. Exam Overview & Architect Mindset
  2. Design Identity, Governance & Monitoring (25–30%)
  3. Design Data Storage Solutions (25–30%)
  4. Design Infrastructure Solutions (30–35%)
  5. Design Business Continuity Solutions (10–15%)
  6. Azure Well-Architected Framework & Landing Zones
  7. Scenario-Based Questions
  8. Cheat Sheet — Quick Reference

1. Exam Overview & Architect Mindset

AZ-305 Exam at a Glance

The AZ-305 exam validates expertise in designing cloud and hybrid solutions — translating business requirements into Azure architectures aligned with the Well-Architected Framework and Cloud Adoption Framework. This is a design exam, not a configuration exam.

Skill Domain Exam Weight
Design infrastructure solutions 30–35% ← LARGEST
Design identity, governance, and monitoring solutions 25–30%
Design data storage solutions 25–30%
Design business continuity solutions 10–15%

Prerequisite: AZ-104 (Azure Administrator Associate) must be held before earning AZ-305 Expert certification.

Critical mindset shift: AZ-305 questions present a business scenario with stated requirements and ask you to recommend, evaluate, or compare Azure services. The key skill is understanding WHY one service fits a scenario better than another — not just knowing what each service does.


What is the Azure Well-Architected Framework?

The Azure Well-Architected Framework (WAF) is Microsoft's set of guiding principles for building high-quality cloud solutions. Every AZ-305 design decision should be justified using WAF pillars.

Five WAF Pillars:

1. Reliability:
→ Design for failure — assume components will fail
→ Use redundancy: Availability Zones, geo-replication, multi-region
→ Define and meet RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
→ Health monitoring, automatic failover, circuit breakers
→ Example: deploy SQL Database with zone-redundant Business Critical tier
             + auto-failover group to paired region

2. Security:
→ Defence in depth: multiple layers of security
→ Principle of least privilege: minimum permissions needed
→ Zero Trust: verify explicitly, assume breach, limit blast radius
→ Encrypt data at rest and in transit; manage secrets in Key Vault
→ Example: Private Endpoints for PaaS, WAF on Application Gateway,
             Managed Identity instead of service accounts

3. Cost Optimisation:
→ Right-size resources — don't over-provision
→ Match consumption to business value (pay for what you use)
→ Reserved Instances for predictable workloads
→ Use PaaS over IaaS where possible (lower management overhead)
→ Azure Hybrid Benefit for existing licences

4. Operational Excellence:
→ Infrastructure as Code (Bicep/Terraform) — repeatable, version-controlled
→ CI/CD pipelines for deployment automation
→ Monitoring: Azure Monitor, Log Analytics, Application Insights
→ Runbooks and automated remediation
→ Change management: blue-green deployments, feature flags

5. Performance Efficiency:
→ Scale horizontally (add instances) not just vertically (bigger VM)
→ Cache frequently accessed data (Azure Cache for Redis)
→ CDN for static content delivery (Azure Front Door / CDN)
→ Choose the right service tier for the workload profile
→ Autoscaling: VMSS, App Service autoscale, AKS cluster autoscaler

What is the Cloud Adoption Framework (CAF)?

Cloud Adoption Framework (CAF):
→ Microsoft's guidance for adopting Azure at enterprise scale
→ Covers: strategy, planning, landing zones, governance, operations

CAF phases:
Strategy:    define business justification, expected outcomes
Plan:        assess current state, create migration backlog
Ready:       set up landing zones, management groups hierarchy
Adopt:       migrate or modernise workloads (lift-and-shift, re-architecture)
Govern:      establish guardrails — Azure Policy, cost management, security baseline
Manage:      operations baseline — monitoring, backup, disaster recovery
Innovate:    build cloud-native solutions, AI/ML, data analytics

Azure Landing Zone:
→ Preconfigured, governance-ready Azure environment for enterprise adoption
→ Includes: management group hierarchy, Azure Policy assignments, networking,
            identity, security baseline, monitoring
→ Conceptual architecture: Platform vs Application landing zones
→ Platform: connectivity hub, identity, management subscriptions (managed by central IT)
→ Application: individual workload subscriptions (managed by application teams)
→ Accelerator: Bicep/Terraform modules to deploy landing zone in hours

Management Group hierarchy (CAF-recommended):
Tenant Root Group
  ├── Platform (managed by central IT)
  │     ├── Management (Log Analytics, Automation)
  │     ├── Identity (AD DS, Entra Connect)
  │     └── Connectivity (hub VNet, VPN/ExpressRoute, Firewall)
  └── Landing Zones (workload teams)
        ├── Corp (connected to hub, private)
        └── Online (internet-facing, DMZ)

2. Design Identity, Governance & Monitoring (25–30%)

How do you design an identity architecture for a hybrid enterprise?

Hybrid identity decision tree:

Question 1: Do you have on-premises Active Directory?
No:  → Cloud-only Entra ID. Use Entra ID Joined devices. Done.
Yes: → Need Entra Connect for identity synchronisation

Question 2: Authentication method for synchronised identities?
Password Hash Sync (PHS): → RECOMMENDED for most orgs
  Pros: works if on-prem unavailable, leaked credential detection,
        seamless SSO with seamless SSO extension
  Cons: password hash in cloud (some orgs have policy against this)

Pass-Through Auth (PTA): → when policy prohibits cloud password hashes
  Pros: passwords never leave on-prem, real-time account state
  Cons: requires on-prem PTA agent availability, no leaked cred detection

Federation (AD FS): → only when advanced requirements (smart cards,
                       complex claims transformation)
  Pros: full control over authentication claims
  Cons: complex, expensive infrastructure, Microsoft recommends migrating away

Question 3: Device management?
Cloud-native devices:  → Entra ID Joined + Intune (MDM)
Existing domain:       → Hybrid Entra ID Joined (domain join + Entra registration)
BYOD:                  → Entra ID Registered + MAM (App Protection Policies)

Identity architecture for a 10,000-person hybrid enterprise:
→ Entra Connect with PHS + Seamless SSO
→ All admin roles in PIM (eligible assignments, not permanent)
→ Break-glass accounts: 2 cloud-only Global Admins, excluded from all CA
→ Conditional Access: MFA all users, block legacy auth, require compliant device
→ Identity Protection: sign-in risk ≥ Medium → MFA, user risk High → password reset
→ Named locations: corporate IP ranges for trusted sign-in signals
→ B2B: Entitlement Management for partner access, time-limited, self-service
→ Entra ID P2 required for PIM, Identity Protection risk policies, Access Reviews

How do you design a governance architecture at scale?

Governance architecture layers:

Layer 1 — Management Group hierarchy:
Root → Decommissioned MG | Platform MG | Landing Zones MG | Sandbox MG
→ Azure Policy applied at MG level inherits to all child subscriptions
→ RBAC applied at MG level inherits to all child subscriptions

Layer 2 — Azure Policy:
Initiative: "CIS Azure Benchmark" or custom baseline
Key policies:
  Deny:  "No public IP on VMs" (enforces private-only VMs)
  Deny:  "Allowed locations: UK South, UK West" (data residency)
  Deny:  "Allowed VM SKUs" (prevent large SKUs in dev)
  Audit: "Storage accounts should use private endpoints"
  DINE:  "Deploy Azure Monitor Agent if missing" (auto-remediate)

Layer 3 — Azure Blueprints (or Bicep/Terraform for landing zones):
→ Combine: role assignments + policy assignments + ARM templates
→ Assign to subscriptions for consistent baseline deployment
→ Note: Bicep + Azure DevOps is now preferred over Blueprints for new designs

Layer 4 — RBAC design:
Management Group → Owner (cloud platform team only — use PIM)
Subscription → Contributor (workload team — via PIM)
Resource Group → Contributor/specific roles (workload developers)
Resources → Reader/specific data roles (application service principals)

Layer 5 — Tagging strategy:
Mandatory tags enforced by Azure Policy:
  Environment: Production | Development | Test
  Owner:        team or product owner email
  CostCentre:   cost allocation code
  Application:  application name/ID
  DataClass:    Public | Internal | Confidential | Restricted
Cost Management: filter by tag to see per-team, per-app spending

Layer 6 — Subscription design:
Option A: One subscription per environment (simplest)
  Prod-Sub | Dev-Sub | Test-Sub
Option B: One subscription per workload (better isolation)
  Sales-Prod | HR-Prod | Finance-Prod (separate blast radius, billing)
Option C: Hybrid (recommended for large orgs)
  Platform subs (connectivity, identity, management) + Workload subs

How do you design a monitoring solution for Azure?

Monitoring architecture:
→ Centralised Log Analytics workspace (one per environment or per org)
→ All resources send diagnostic logs and metrics to central workspace
→ Azure Monitor alerts trigger action groups (email, Teams, PagerDuty)

Data sources and collection:
Activity Log:        subscription-level control plane operations (all subs → workspace)
Resource Diagnostics: service-level logs (SQL queries, App Gateway access, Key Vault ops)
VM Guest OS:          Azure Monitor Agent + Data Collection Rules → workspace
Application:          Application Insights SDK → telemetry (requests, exceptions, dependencies)
Container:            Container Insights → AKS node + pod metrics to workspace
Security:             Microsoft Defender for Cloud → security findings to workspace
Network:              Network Watcher → flow logs, connection monitor

Application Insights:
→ APM (Application Performance Monitoring) for web applications
→ Instrumentation: add SDK or use auto-instrumentation (App Service, AKS)
→ Telemetry: requests, exceptions, dependencies (SQL, HTTP calls), custom events
→ Live Metrics: real-time monitoring during deployment
→ Availability tests: ping test every 5 minutes from multiple Azure regions
→ Smart detection: ML-based anomaly detection (failure rate spikes, performance degradation)

Azure Monitor Alerts:
Metric alerts:   trigger on: CPU > 80%, DTU > 90%, response time > 2s
Log alerts (KQL): trigger on: error rate > 5%, specific exception count
Activity log:    trigger on: resource deleted, policy denied, security event

Action Groups (what happens when alert fires):
Email:           notify team
SMS:             notify on-call
Teams webhook:   post to Teams channel
Azure Function:  auto-remediate (restart service, scale out)
Logic App:       create ITSM ticket, update status page
Automation Runbook: execute remediation script

Dashboard and visualisation:
Azure Monitor Dashboards: operational view for NOC
Azure Workbooks:          custom parameterised reporting and analysis
Power BI:                 management-level KPI reporting from Log Analytics
Grafana:                  popular open-source dashboards (integrates with Azure Monitor)

3. Design Data Storage Solutions (25–30%)

How do you choose the right Azure storage service?

Storage service selection matrix:

Requirement                                  → Best Service
Unstructured blobs (documents, images, video) → Azure Blob Storage
Virtual machine disks                         → Azure Managed Disks
File shares (SMB/NFS, lift-and-shift)         → Azure Files
Hybrid file server (on-prem + cloud cache)    → Azure File Sync
NoSQL document database (JSON)                → Azure Cosmos DB (SQL API)
NoSQL key-value at massive scale              → Azure Cosmos DB (Table API)
Globally distributed, multi-write            → Azure Cosmos DB (any API)
Relational data, OLTP workloads              → Azure SQL Database
SQL Server features, lift-and-shift           → Azure SQL Managed Instance
Data warehouse, analytical queries (OLAP)    → Azure Synapse Analytics / Fabric
Real-time analytics over streaming data      → Azure Synapse Analytics + Event Hubs
Message queue (simple, cheap, massive)        → Azure Storage Queues
Enterprise messaging (ordering, DLQ, pub/sub) → Azure Service Bus
Event streaming (millions/sec, telemetry)     → Azure Event Hubs
Time-series / IoT analytics                  → Azure Data Explorer / KQL
Graph data (relationships between entities)  → Azure Cosmos DB (Gremlin API)
Cache layer (session, query results)          → Azure Cache for Redis

Blob storage access tiers (cost vs latency):
Hot:     frequent access, highest storage cost, lowest access cost
Cool:    30-day minimum, cheaper storage, higher access cost
Cold:    90-day minimum, very cheap storage, higher access cost
Archive: 180-day minimum, cheapest storage, OFFLINE (rehydrate before read)
Lifecycle policy: automatically tier blobs based on last modified age

Blob storage redundancy (durability vs cost):
LRS:   3 copies, single datacenter, cheapest
ZRS:   3 copies across AZs, zone-resilient (recommended for most production)
GRS:   LRS + async copy to paired region, secondary readable only after failover
GZRS:  ZRS + async to paired region, highest durability (16 nines)
RA-GRS/RA-GZRS: read from secondary at all times (DR read scale-out)

How do you design a relational database solution on Azure?

Azure SQL options:

Azure SQL Database (fully managed PaaS):
→ Serverless tier: auto-pause during idle (cost saving for dev/test)
→ General Purpose: balanced compute/storage (most production OLTP)
→ Business Critical: in-memory OLTP, readable secondary, local SSD
→ Hyperscale: up to 100 TB, fast backup/restore (large databases)
→ Elastic Pool: share resources across multiple databases (multi-tenant SaaS)
→ Auto-failover groups: automatic geo-replication + failover (RPO < 5 sec)
→ Zone-redundant: Business Critical and General Purpose (zonal HA)

Azure SQL Managed Instance (PaaS with near-full SQL Server compatibility):
→ Lift-and-shift of SQL Server workloads needing: SQL Agent, CLR, linked servers,
  Service Broker, CDC, cross-database queries (not available in SQL Database)
→ Deployed INSIDE a VNet (private, no public endpoint by default)
→ Business Critical tier: in-memory OLTP + readable secondary
→ Migration: Database Migration Service from on-prem SQL Server
→ Choice: if application needs SQL Agent or cross-database queries → Managed Instance

SQL Server on Azure VM (IaaS):
→ Full SQL Server control (any version, any edition, OS-level access)
→ Use when: need features only available on bare SQL Server (SSRS, SSAS, SSIS),
            or specific versions/configurations not supported in PaaS
→ Higher management overhead vs PaaS options

Decision framework:
Can PaaS SQL Database handle the requirements?
  Yes: use SQL Database (lowest management overhead)
  Needs SQL Agent, cross-DB queries, Service Broker?
    Yes: use SQL Managed Instance
    Needs full OS control or specific SQL Server version?
      Yes: SQL Server on Azure VM

High availability for SQL:
Single region:    Business Critical tier → built-in Always On AG (3 replicas, ZR)
Multi-region:     Auto-failover group → geo-replication + automatic DNS failover
RPO:              < 5 seconds (geo-replication log shipping)
RTO:              < 30 seconds (automatic failover with DNS redirect)

How do you design a Cosmos DB solution?

Azure Cosmos DB:
→ Globally distributed, multi-model NoSQL database
→ SLA: 99.999% availability, single-digit millisecond latency (P99)
→ APIs: SQL (Core), MongoDB, Cassandra, Gremlin (graph), Table, PostgreSQL

Key design decisions:

1. Partition key selection (MOST CRITICAL decision):
→ Must provide even distribution of data (high cardinality)
→ Must be included in most queries (enables partition-targeted reads)
→ Cannot be changed after container creation
→ Good: UserId, CustomerId, TenantId, ProductId
→ Bad: Boolean, Status (low cardinality — hot partitions)
→ Synthetic key: if single key has low cardinality, combine fields
   e.g., {country}-{city} instead of country alone

2. Consistency levels (5 levels, stronger = more latency/cost):
Strong:           linearisable — always reads latest write (highest cost)
Bounded Staleness: reads lag writes by configurable window (K ops or T seconds)
Session:           strong consistency within a session (DEFAULT — best for most apps)
Consistent Prefix: reads never see out-of-order writes
Eventual:          weakest — reads may lag, highest performance

3. Request Units (RUs):
→ Currency of Cosmos DB — normalised unit of throughput
→ Every operation (read, write, query) consumes RUs
→ Provision throughput (guaranteed RU/s) or serverless (pay per op)
→ 1 RU = reading a 1 KB document by its partition key + id

4. Global distribution:
→ Add regions with one click — data replicated automatically
→ Single-write region: one primary write, any region reads (default)
→ Multi-region write: any region can write (use for write-heavy global apps)
→ Conflict resolution: needed for multi-write (last write wins or custom)

5. Change Feed:
→ Ordered stream of inserts and updates in a Cosmos DB container
→ Use for: event-driven microservices, materialised views, ETL to analytics
→ Consumers: Azure Functions trigger, Spark, SDK change feed processor

4. Design Infrastructure Solutions (30–35%)

How do you design a compute solution — choosing between VM, App Service, AKS, and Functions?

Compute service selection framework:

Azure Virtual Machines (IaaS):
→ Use when: need full OS control, specific OS/software, GPU, legacy apps
→ Lift-and-shift: move existing apps without code changes
→ HA pattern: Availability Zones + Standard Load Balancer (99.99% SLA)
→ Scale: VM Scale Sets (VMSS) for auto-scaling clusters
→ Avoid for: stateless web apps (use App Service/containers instead)

Azure App Service (PaaS):
→ Use when: web apps, REST APIs, mobile backends in .NET, Java, Node, Python, PHP
→ No VM management: Azure handles OS, patching, load balancing, scaling
→ Deployment slots: zero-downtime deployments, blue-green, A/B testing
→ VNet Integration: outbound private connectivity to databases/services
→ Private Endpoint: inbound private access (no public internet)
→ Scale: horizontal (add instances) up to 30, vertical (bigger plan)
→ Choose tier: Free/Basic (dev/test) → Standard (production, autoscale)
             → Premium (VNet integration) → Isolated (ASE, dedicated)

Azure Kubernetes Service (AKS):
→ Use when: microservices, containerised workloads, complex orchestration
→ Benefits: auto-scaling, rolling deployments, self-healing, rich ecosystem
→ Design: multiple node pools (system + user), node autoscaler, HPA
→ Networking: Azure CNI (direct pod IPs from VNet — needed for private)
→ Storage: Azure Disk (ReadWriteOnce) or Azure Files (ReadWriteMany)
→ Security: Managed Identity for pod-level Azure access (Workload Identity)
→ Ingress: Azure Application Gateway Ingress Controller (AGIC) for WAF

Azure Functions (Serverless):
→ Use when: event-driven compute, short-lived operations, unknown/spiky traffic
→ Scale to zero: pay only during execution (Consumption plan)
→ Premium plan: pre-warmed instances, VNet integration, longer timeout
→ Triggers: HTTP, Service Bus, Event Hub, Blob Storage, Timer, Cosmos DB
→ Durable Functions: stateful orchestration (fan-out/fan-in, long workflows)
→ Avoid for: long-running processes > 10 min (use App Service or AKS)

Decision tree:
Need OS-level control?              → VM
Containerised microservices?        → AKS
Web/API app, managed PaaS?          → App Service
Event-driven, short duration?       → Azure Functions
Burst traffic, stateless?           → Functions or App Service with autoscale
Complex stateful workflow?          → Durable Functions or Logic Apps

How do you design a network architecture for enterprise Azure?

Hub-Spoke network topology (recommended for most enterprises):

Hub VNet (central):
→ Shared services: Azure Firewall, VPN/ExpressRoute Gateway, Bastion, DNS
→ Managed by central networking team
→ All spoke traffic routes through hub (forced tunnelling)
→ Hub in each Azure region for global presence

Spoke VNets (per workload/environment):
→ Each application or team gets own spoke VNet
→ Peered to Hub: VNet Peering (low latency, same region)
               Global VNet Peering (cross-region)
→ No direct spoke-to-spoke peering (routes through hub firewall for inspection)
→ Subnet design: AppSubnet, DataSubnet, PrivateEndpointSubnet, etc.

Virtual WAN (alternative for large/global organisations):
→ Microsoft-managed hub-and-spoke at scale
→ Supports: any-to-any connectivity (spoke-spoke via VWAN hub)
→ Integrates: VPN, ExpressRoute, SD-WAN partners, Azure Firewall Manager
→ Use for: organisations with 10+ spokes or multiple regions

On-premises connectivity:
VPN Gateway:     encrypted IPsec over internet (up to 10 Gbps, simpler, cheaper)
ExpressRoute:    private dedicated circuit via connectivity provider
                 (consistent latency, up to 100 Gbps, higher cost, compliance)
ExpressRoute + VPN: redundant connectivity (ExpressRoute primary, VPN backup)

DNS architecture:
Azure Private DNS zones: private name resolution within Azure VNet
  privatelink.blob.core.windows.net → maps to storage private endpoint IPs
  privatelink.database.windows.net  → maps to SQL private endpoint IPs
Azure DNS Resolver: forward on-prem DNS queries to Azure private zones
                    forward Azure queries to on-prem DNS servers
  → Enables split-horizon DNS for hybrid environments

DDoS Protection:
DDoS Network Protection: per-VNet (recommended for public-facing)
DDoS IP Protection:      per-public IP (simpler, less features)
WAF:                     Application Gateway WAF or Front Door WAF
                         (protects against OWASP Top 10, layer 7 attacks)

How do you design a secure solution using Azure Key Vault?

Azure Key Vault:
→ Centralised secrets, keys, and certificates management
→ Every application should use Key Vault — NEVER store secrets in code/config

Key Vault uses:
Secrets:      connection strings, API keys, passwords
Keys:         RSA/EC keys for envelope encryption, signing
Certificates: X.509 certificates with auto-renewal (Let's Encrypt, DigiCert)

Access patterns:
Managed Identity (RECOMMENDED):
  → Application Managed Identity → assigned Key Vault Secrets User role
  → No credentials to manage, no rotation risk
  → Code: var secret = await client.GetSecretAsync("ConnectionString");

RBAC roles for Key Vault:
  Key Vault Secrets User:    read secret values (applications)
  Key Vault Secrets Officer: manage secrets, not read values (operations)
  Key Vault Administrator:   full control (admin — use PIM)
  Key Vault Reader:          read metadata, not values (audit)

Key Vault networking:
  Private Endpoint:     Key Vault accessible only via private IP in VNet
  Service Endpoint:     traffic stays on Azure backbone (no private IP)
  Firewall:             allowlist specific IPs/VNets; deny all others
  Production: always Private Endpoint + disable public access

Soft delete + Purge Protection:
  Soft delete (90 days): deleted secrets recoverable for 90 days
  Purge Protection:      prevents permanent deletion for retention period
  Both required for: compliance, ransomware protection, accidental deletion
  NEVER disable purge protection in production

Key rotation:
  Manual:   generate new version, update application to use latest version
  Automatic (certificates): Key Vault rotates before expiry via CA integration
  Event-driven: Key Vault fires Event Grid event on near-expiry
                → Logic App / Azure Function generates and stores new key
                → Notifies application to reload secret

5. Design Business Continuity Solutions (10–15%)

How do you design for high availability and disaster recovery?

Key definitions:
RTO (Recovery Time Objective):
  Maximum acceptable time for service to be restored after failure
  "We can be down for no more than 4 hours" → RTO = 4 hours

RPO (Recovery Point Objective):
  Maximum acceptable data loss measured in time
  "We cannot lose more than 15 minutes of transactions" → RPO = 15 minutes

SLA tiers and what they require:
99.9%   = 8.7 hours downtime/year  → single instance + Standard SSD (min)
99.95%  = 4.4 hours downtime/year  → Availability Set (2+ VMs)
99.99%  = 52 minutes downtime/year → Availability Zones (2+ VMs, 2+ AZs)
99.999% = 5.2 minutes downtime/year → multi-region active-active

Azure HA architecture patterns:

Single region HA (protects against hardware/datacenter failure):
  Azure Load Balancer + VM Scale Set across 3 Availability Zones
  SQL: Business Critical (3 replicas, zone-redundant)
  Storage: ZRS (3 copies across AZs)
  App Service: Premium tier (zone redundancy)
  SLA: 99.99% for compute + 99.99% for database

Multi-region HA (protects against region-wide outage):
Active-Passive (failover):
  Primary region handles traffic
  Secondary region: pre-deployed, inactive (warm standby)
  DNS failover via Azure Traffic Manager or Azure Front Door
  SQL: auto-failover group with automatic DNS failover
  RTO: minutes (DNS TTL), RPO: < 5 seconds (geo-replication)
  Use for: moderate RTO/RPO requirements

Active-Active (both regions serve traffic simultaneously):
  Both regions active, load balanced globally
  Data: Cosmos DB multi-write, or synchronous replication
  DNS: Azure Front Door (anycast routing, fastest region wins)
  RTO: near-zero (no failover needed — traffic shifts automatically)
  RPO: near-zero (no replication lag for active-active write)
  Use for: highest availability requirements, global user base

Azure Site Recovery (ASR):
  VM replication to secondary region
  Recovery Plans: orchestrated failover in correct order
  Test failover: validate DR without impact to production
  RTO: 1-2 hours (VM startup + DNS propagation)
  RPO: 15 minutes (replication interval)

How do you design a backup strategy?

Azure Backup architecture:

Recovery Services Vault vs Backup Vault:
Recovery Services Vault: VMs, SQL in VM, Azure Files, on-prem (MARS/MABS)
Backup Vault:            newer vault type — Blobs, Disks, AKS, PostgreSQL Flexible

Backup policies (design per RPO/RTO/retention):
Frequency:  hourly, daily, weekly
Retention:  daily (7-30 days), weekly (up to 5 years), monthly/yearly (compliance)
Cross-region restore: restore to paired region for DR (opt-in, extra cost)

Azure Backup for VMs:
→ Crash-consistent (default) or application-consistent (VSS on Windows)
→ Instant restore: restore individual files in minutes from snapshot
→ Soft delete: 14 days protection against accidental/ransomware deletion
→ Immutable vault: prevents deletion of backup data (locked permanently)

SQL Server in Azure VM:
→ Workload-aware backup: full + differential + log backup
→ Log backup every 15 minutes → RPO of 15 minutes for SQL
→ Compression enabled by default

Azure SQL Database:
→ Built-in automated backups: full weekly, differential 12-24h, log every 5-12 min
→ Point-In-Time Restore (PITR): restore to any second within retention period
→ Long-term retention (LTR): store backups up to 10 years (compliance)

Geo-redundant backup storage:
→ Recovery Services Vault: GRS (default) or LRS
→ Cross-region restore: restore backup from primary in secondary region
→ Required for: regulatory compliance, DR strategy validation

Backup monitoring:
→ Azure Backup Reports (Power BI): compliance dashboard, which VMs NOT backed up
→ Azure Monitor alerts: backup job failures → email/Teams notification
→ Recovery Services Vault → Backup Jobs → filter by status = Failed

6. Azure Well-Architected Framework & Landing Zones

What is the Azure Architecture Centre and key architecture patterns?

Azure Architecture Centre (docs.microsoft.com/azure/architecture):
→ Microsoft's official reference for proven Azure architectural patterns
→ Reference architectures: tested designs for common scenarios
→ Design patterns: reusable solutions to recurring problems

Key cloud design patterns for AZ-305:

1. Retry pattern:
   Transient failures (network timeout, throttling) → automatically retry
   Exponential backoff: wait 1s, 2s, 4s, 8s between retries
   Circuit Breaker: stop retrying after threshold → fail fast

2. Circuit Breaker pattern:
   Prevent cascading failures when a downstream service is unhealthy
   States: Closed (normal) → Open (failing, reject calls) → Half-Open (test)
   Implementation: Polly (C#), resilience4j (Java)

3. CQRS (Command Query Responsibility Segregation):
   Separate read and write models
   Write model: transactional database (SQL), optimised for writes
   Read model: denormalised read store (Cosmos DB, Redis), optimised for reads
   Event sourcing: store events, not state; rebuild state by replaying events

4. Strangler Fig pattern:
   Incrementally replace a monolith with microservices
   New requests → new service; old requests → legacy system
   Gradually migrate functionality until legacy system is empty

5. Event-driven architecture:
   Services communicate via events (loose coupling)
   Event Grid: routing discrete state change events
   Service Bus: reliable ordered message delivery
   Event Hubs: high-volume event streaming

6. Bulkhead pattern:
   Isolate critical resources — if one component fails, others continue
   e.g., separate thread pools, connection pools per downstream service
   Prevents one slow service from exhausting all resources

7. Cache-Aside pattern:
   Application manages cache explicitly:
   1. Check cache (Redis) — if found, return cached value
   2. If not found: query database, store in cache, return value
   3. On update: update database, INVALIDATE cache (not update)
   Use: Azure Cache for Redis + Azure SQL / Cosmos DB

8. Federated Identity pattern:
   Delegate authentication to external identity provider (IdP)
   Application trusts tokens from Entra ID (or ADFS, Google, etc.)
   Benefits: SSO across apps, no password management per app

What are the key networking design patterns for AZ-305?

Private Endpoint (most important networking concept for AZ-305):
→ Assign a private IP from your VNet to an Azure PaaS service
→ Traffic between your VNet and the PaaS service never leaves the Microsoft backbone
→ Disable public endpoint on PaaS resource for maximum security
→ Requires: Private DNS Zone to resolve FQDN to private IP

Pattern: secure multi-tier application
  App Service (private endpoint inbound) ← only VNet traffic allowed
    ↓ (VNet Integration — outbound)
  Azure SQL Database (private endpoint) ← no public access
  Azure Key Vault (private endpoint) ← no public access
  Azure Storage (private endpoint) ← no public access

Azure Front Door:
→ Global application delivery network: CDN + WAF + load balancing + failover
→ Anycast: routes users to the closest Azure PoP (Point of Presence)
→ WAF: OWASP protection globally at the edge
→ Origin groups: pool of backends (App Service in US + App Service in EU)
→ Health probes: automatically route away from unhealthy backends
→ Use for: global web applications with multi-region active-active

Azure Traffic Manager:
→ DNS-based global load balancing (NOT a reverse proxy — users connect directly to origin)
→ Routing methods:
  Priority:      primary region + failover regions
  Weighted:      distribute traffic by percentage (canary deployments)
  Performance:   route to lowest latency region for the user
  Geographic:    route users to specific region (data residency)
  Multivalue:    return all healthy endpoints, client picks
→ Use for: global routing of any protocol (not just HTTP), DNS-based failover

Azure Front Door vs Traffic Manager:
Front Door:       HTTP/HTTPS only, reverse proxy, WAF, anycast, CDN, SSL offload
Traffic Manager:  any protocol, DNS-based (no proxy), no WAF, no CDN
Choose Front Door for modern web applications
Choose Traffic Manager for non-HTTP workloads or existing architectures

7. Scenario-Based Questions

Scenario: Design a globally available, zero-trust e-commerce platform handling 1M users.

Requirements:
→ 99.99% SLA, global users, GDPR compliance (EU data residency)
→ Zero Trust security, no public access to backend services
→ Auto-scaling for seasonal traffic spikes

Architecture:

GLOBAL LAYER:
Azure Front Door Premium:
  → WAF policy: OWASP 3.2, custom rules (bot protection, rate limiting)
  → Origins: App Service (West Europe) + App Service (East US)
  → Health probes: route away from unhealthy origins automatically
  → CDN: static assets (images, JS, CSS) cached at edge PoPs
  → Private Link to App Service backend (Front Door → App Service private)

COMPUTE LAYER (West Europe — primary):
Azure App Service (Premium v3, zone-redundant):
  → Outbound via VNet Integration → private access to all backends
  → Private Endpoint for inbound (Front Door is the only entry point)
  → Auto-scale: 2-50 instances based on CPU and HTTP queue
  → Deployment slots: blue-green deployments (swap staging → production)

DATA LAYER:
Azure SQL Database (Business Critical, zone-redundant):
  → Auto-failover group → East US (passive secondary)
  → Private Endpoint: no public internet access
  → Encryption: Always Encrypted for PII columns (GDPR)
  → Auditing: all queries logged to Log Analytics

Azure Cosmos DB (SQL API):
  → Multi-region: West Europe (write) + East US (read)
  → Product catalogue, shopping cart, user sessions
  → Session consistency (strong within session, eventual across regions)
  → PITR: 30-day point-in-time restore

Azure Cache for Redis:
  → Product catalogue cache (60-minute TTL)
  → Session state store (24-hour TTL)
  → Private Endpoint

SECURITY LAYER:
Azure Key Vault (private endpoint): connection strings, API keys, TLS certs
Managed Identity: App Service → Key Vault, SQL, Storage (no stored credentials)
Private DNS Zones: all PaaS services resolve to private IPs
DDoS Network Protection: on all public-facing VNets
Microsoft Defender for Cloud: security posture management across all services
Microsoft Purview: classify and track PII data (GDPR compliance)

MONITORING:
Application Insights: request traces, exceptions, dependency calls, availability tests
Azure Monitor: infrastructure metrics, alerts, action groups
Log Analytics workspace: centralised logs from all services
KQL alert: failed login rate > 5% in 5 minutes → Teams alert

Scenario: A company wants to migrate a 3-tier on-premises application to Azure. Design the migration approach.

Current state: 
  Web tier: 3 IIS servers (Windows Server 2019)
  App tier:  4 .NET application servers
  Data tier: SQL Server 2019 cluster (500 GB, 100 concurrent connections)

Migration phases using CAF Migrate + Modernise:

Phase 1 — Assess (2 weeks):
  Azure Migrate: discover and assess all servers
    → Right-size recommendations (map on-prem specs to Azure SKUs)
    → Dependency analysis (what does each server call?)
    → Cost estimates for IaaS (lift-and-shift) vs PaaS (modernise)

Phase 2 — Lift-and-Shift (IaaS first — 4 weeks):
  Web tier:  3 × D4s_v5 VMs → Application Gateway → 3 VMs in AZ1/2/3
  App tier:  4 × D8s_v5 VMs → Internal Load Balancer → AZ1/2/3
  Data tier: SQL Server on Azure VM (business critical, zone-redundant)
  Migration tool: Azure Migrate + Replication (agent-based for physical)
  Cutover: DNS switch, 4-hour maintenance window, rollback plan ready
  
Phase 3 — Optimise and Modernise (3 months post-migration):
  Web tier:  migrate IIS apps → Azure App Service (Premium, ZR)
             Benefits: no VM management, built-in scaling, deployment slots
  App tier:  containerise .NET apps → Azure Kubernetes Service
             Benefits: better resource utilisation, rolling deployments
  Data tier: assess SQL Managed Instance compatibility
             Azure Database Migration Service (DMS) → online migration (< 1h cutover)
             If SQL Agent or Service Broker needed → SQL Managed Instance
             If standard T-SQL → Azure SQL Database Business Critical

Phase 4 — Govern and Secure (ongoing):
  Azure Policy: enforce tagging, no public IPs on VMs, encryption at rest
  Cost Management: reserved instances for stable workloads (save 40%)
  Monitoring: Azure Monitor + Application Insights (enable for modernised apps)
  Security: Defender for Servers (VMs) + Defender for SQL (databases)

Scenario: Design a multi-tenant SaaS architecture on Azure.

Multi-tenant SaaS: one deployment serves multiple customer organisations

Tenancy model decision:

Model A — Shared everything (full multi-tenant):
  → One database, one compute, all tenants share
  → Lowest cost, highest density
  → Risk: noisy neighbour, data isolation challenges
  → Use for: small tenants, cost-sensitive, lower compliance requirements
  → Data isolation: Row-Level Security in SQL Database (TenantId filter)
  → Compute isolation: App Service multi-tenant with Entra ID OIDC tenant filtering

Model B — Shared compute, separate databases (hybrid):
  → One App Service plan, one Cosmos DB container per tenant (partition by TenantId)
  → OR: Elastic Pool — separate databases sharing one SQL server
  → Elastic Pool: up to 500 databases, pooled DTUs/vCores
  → Good balance: isolation at data layer, shared compute cost
  → Use for: SMB SaaS with moderate isolation requirements

Model C — Fully isolated (silo):
  → Separate subscription or resource group per tenant
  → Highest isolation, highest cost
  → Use for: enterprise tenants, strict compliance (FinTech, Healthcare)
  → Deployment: Bicep template, deploy new resources per tenant onboarding

SaaS-specific design patterns:
Onboarding automation: Logic App / Azure Function → provision new tenant infra
Tenant routing:   App Service → header/domain-based routing to correct data
Tenant billing:   Azure Cost Management tags (TenantId) → per-tenant cost tracking
Tenant offboarding: Data export → soft delete retention → hard delete after 90 days
Compliance:       Purview sensitivity labels per tenant, separate Key Vault per tenant

Scenario: How do you design for cost optimisation without sacrificing reliability?

Cost optimisation architecture decisions:

1. Right-size with Azure Advisor:
   → Azure Advisor → Cost → Underutilised resources
   → VMs with < 5% average CPU → downsize (D4 → D2)
   → SQL DTUs < 20% average → move to smaller service objective

2. Reserved Instances (40-72% savings):
   → 1-year RI for VMs known to run 24/7 (web servers, databases)
   → 3-year RI for very stable workloads (domain controllers, monitoring)
   → Scope: shared (across all subscriptions) for maximum flexibility

3. Azure Savings Plans (up to 65%):
   → Commitment to hourly spend on compute (any region, any size)
   → More flexible than Reserved Instances
   → Best for variable but consistent compute usage

4. Azure Hybrid Benefit:
   → Windows Server with SA: 40% savings on Windows VMs
   → SQL Server with SA: 55% savings on SQL licensing
   → Stack together with Reserved Instances for up to 80% savings

5. PaaS over IaaS:
   → Serverless/PaaS: pay only for what you use
   → App Service vs VMs: same capability, App Service scales to zero in some tiers
   → Azure Functions (Consumption): pay per execution (dev/test = free tier)

6. Autoscaling + scheduled scaling:
   → Scale in nights/weekends for non-production
   → VMSS: minimum 2 instances (HA), scale out on CPU > 70%
   → Dev VMs: auto-shutdown at 6pm via Azure Automation

7. Storage lifecycle management:
   → Blob: auto-tier (Hot → Cool after 30 days → Archive after 365 days)
   → Saves 70-95% on old data that must be retained

8. Dev/Test pricing:
   → Azure Dev/Test subscriptions: discounted Windows Server licensing
   → SQL in Dev/Test: no SQL licence cost (dev only workloads)

9. Spot VMs (up to 90% discount):
   → Fault-tolerant, interruptible workloads: batch processing, rendering, CI runners
   → Use VMSS with Spot priority + fallback to regular (Azure auto-evicts when needed)

10. Network costs:
    → Outbound bandwidth costs money; keep data in same region as consumers
    → VNet Peering cross-region: transfer costs (consider VPN vs peering)
    → CDN: serve static content from edge to reduce origin egress

8. Cheat Sheet — Quick Reference

AZ-305 Exam Domain Weights

Domain                                     Weight   Focus
Design infrastructure solutions            30-35%   Compute (VM/AKS/App Service/Functions)
                                                     Networking (hub-spoke, Front Door, VPN)
                                                     Security (Key Vault, Private Endpoints)
Design identity, governance & monitoring   25-30%   Hybrid identity, Conditional Access, PIM
                                                     Management Groups, Azure Policy, tagging
                                                     Azure Monitor, App Insights, Log Analytics
Design data storage solutions              25-30%   SQL DB vs MI vs VM, Cosmos DB design
                                                     Blob tiers, redundancy, Key Vault
                                                     Synapse / Fabric for analytics
Design business continuity solutions       10-15%   RTO/RPO, AZ vs multi-region, ASR
                                                     Azure Backup, immutable vault, PITR

Compute Decision Tree

Need OS-level control / specific software?     → VM + Availability Zones
Containerised microservices / complex scaling? → AKS
Web/API managed PaaS (.NET/Java/Node/Python)?  → App Service
Event-driven, short burst, unknown traffic?    → Azure Functions
Stateful orchestration (long workflow)?        → Durable Functions / Logic Apps
GPU workloads (AI/ML training, rendering)?     → N-series VMs
Burst compute (CI runners, batch)?             → Spot VMs or Batch
Virtual desktop for remote workers?            → Azure Virtual Desktop

SLA achieved by:
Single VM + Premium SSD:  99.9%
Availability Set (2+ VMs): 99.95%
Availability Zones (2+ AZs): 99.99%
Multi-region active-active: 99.999%

Storage Decision Matrix

Use case                              → Service
Blobs (unstructured: media, docs)     → Blob Storage (Hot/Cool/Archive)
VM Disks                              → Managed Disks (Premium SSD for prod)
Shared file system (SMB/NFS)          → Azure Files
Hybrid file server caching            → Azure File Sync
Relational OLTP (managed)             → Azure SQL Database
SQL Server features (SQL Agent, etc.) → Azure SQL Managed Instance
Full SQL Server control               → SQL Server on Azure VM
NoSQL document / JSON                 → Azure Cosmos DB (SQL API)
Global multi-region, multi-write      → Azure Cosmos DB
Data warehouse / OLAP                 → Azure Synapse Analytics / Fabric
Enterprise message queue              → Azure Service Bus
Event streaming (IoT, telemetry)      → Azure Event Hubs
Session cache, query cache            → Azure Cache for Redis
Graph data                            → Cosmos DB (Gremlin API)

Networking Quick Reference

Topology:           Hub-Spoke (most enterprises) or Azure Virtual WAN (large scale)
On-prem private:    ExpressRoute (dedicated, consistent) preferred over VPN
On-prem encrypted:  VPN Gateway (encrypted over internet, simpler, cheaper)
Global HTTP/HTTPS:  Azure Front Door (anycast, WAF, CDN, private link origins)
Global any protocol: Azure Traffic Manager (DNS-based, any protocol)
Regional HTTP:      Application Gateway (L7, WAF, SSL termination, URL routing)
Regional TCP/UDP:   Azure Load Balancer Standard (L4, zone-redundant)
Private PaaS:       Private Endpoint + Private DNS Zone (production standard)
Admin access:       Azure Bastion (no public IP on VMs, browser-based RDP/SSH)
Perimeter firewall: Azure Firewall Premium (IDPS, TLS inspection, centralised)
DDoS:               DDoS Network Protection (per-VNet for public-facing workloads)

Business Continuity Quick Reference

RTO/RPO targets → design pattern:
Hours/Hours:     Azure Backup + ASR warm standby (passive failover)
Minutes/Minutes: Auto-failover group + Traffic Manager / Front Door failover
Near-zero/zero:  Active-active multi-region (Cosmos DB multi-write, Front Door)

SQL DR:
Single region: Business Critical (3 replicas, AZ) — 99.995% SLA
Multi-region:  Auto-failover group → automatic DNS failover
               RPO: < 5 seconds, RTO: < 30 seconds

VM DR:
Azure Site Recovery: replicate VMs to secondary region
Recovery Plans: orchestrate failover order (DC first, then app, then DB)
Test failover: validate quarterly without impacting production

Backup essentials:
Soft delete: 14 days recovery after accidental deletion
Immutable vault (locked): ransomware protection (cannot be disabled)
Cross-region restore: restore backup in secondary region for DR validation
PITR (SQL): restore to any second in the retention window

Well-Architected Framework Pillars Summary

Pillar           Key services / patterns
Reliability      AZs, multi-region, auto-failover, health probes, retry+circuit breaker
Security         Zero Trust, Private Endpoints, Key Vault, Managed Identity, WAF
Cost             Reserved Instances, right-sizing, lifecycle tiers, PaaS over IaaS
Operational Exc. IaC (Bicep), CI/CD, Azure Monitor, runbooks, deployment slots
Performance      Autoscale, CDN (Front Door), Redis cache, read replicas, AKS HPA

Top 10 Tips

  1. AZ-305 is a DESIGN exam, not a configuration exam — every question presents a business scenario asking you to RECOMMEND, COMPARE, or EVALUATE. The key skill is justifying WHY one service fits better than another — not knowing every configuration option.

  2. Infrastructure is 30–35% of the exam — the largest domain. Master: compute service selection (VM vs App Service vs AKS vs Functions), hub-spoke networking, Private Endpoints, Key Vault, and Azure Front Door vs Traffic Manager.

  3. Always anchor decisions in the Well-Architected Framework — every design choice has a WAF justification. "Why Private Endpoints?" → Security pillar (defence in depth). "Why Reserved Instances?" → Cost pillar. love WAF-framed answers.

  4. Private Endpoints are the production networking standard — any PaaS service (SQL, Storage, Key Vault, App Service) in production should have a Private Endpoint and disabled public access. Service Endpoints are insufficient for most enterprise requirements.

  5. Managed Identity eliminates credential management — any Azure resource authenticating to another Azure service should use Managed Identity. No secrets in code, no rotation risk, no leakage. Always the first recommendation for any "how do you authenticate" question.

  6. Hub-Spoke topology with Azure Firewall — the standard enterprise network architecture. Hub contains shared services (Firewall, VPN/ExpressRoute GW, Bastion). Spokes are workload VNets peered to hub. All spoke-to-spoke traffic inspected via hub Azure Firewall.

  7. RTO and RPO drive the HA design — before answering any HA/DR question, establish the RTO and RPO requirements. RTO/RPO in hours → warm standby + ASR. Minutes → auto-failover groups + Traffic Manager. Near-zero → active-active multi-region with Cosmos DB.

  8. SQL Database vs SQL Managed Instance — the most commonly confused SQL decision. SQL Database = most managed, best for greenfield. SQL Managed Instance = when you need SQL Agent, cross-database queries, Service Broker, or CLR. Know this boundary cold.

  9. Azure Front Door vs Traffic Manager — Front Door is a reverse proxy (HTTP/HTTPS, WAF, CDN, anycast, SSL offload). Traffic Manager is DNS-based routing (any protocol, no proxy, no WAF). Global web apps → Front Door. Non-HTTP global routing → Traffic Manager.

  10. The Cloud Adoption Framework provides the governance story — management group hierarchy, landing zones, Azure Policy at scale, and subscription design are all CAF concepts. For any "how do you govern 50 subscriptions" question, the answer starts with CAF landing zones.



Power Platform ALM & DevOps (PL-400) Complete Guide

Power Platform ALM & DevOps (PL-400) — Complete Guide

Technical Design · Dataverse Plugins · PCF Controls · Custom Connectors · ALM · Pipelines · Power Platform CLI · Azure DevOps · Scenarios · Cheat Sheet

Top Hashtags: #PowerPlatformALM, #PL400, #PowerPlatformDeveloper, #PowerPlatform, #DataversePlugins, #PCFControls, #PowerPlatformCLI, #ALMPowerPlatform, #LowCode, #MicrosoftDeveloper


Table of Contents

  1. Exam Overview & Technical Design (10–15%)
  2. Build Power Platform Solutions (10–15%)
  3. Extend the User Experience (10–15%)
  4. Extend the Platform — Plugins & APIs (40–45%)
  5. Power Platform ALM & DevOps
  6. Develop Integrations (5–10%)
  7. Scenario-Based Questions
  8. Cheat Sheet — Quick Reference

1. Exam Overview & Technical Design (10–15%)

PL-400 Exam at a Glance

The PL-400 Microsoft Power Platform Developer exam validates expertise in designing, developing, testing, and troubleshooting Power Platform solution components using extension points. Candidates should be proficient in Microsoft Power Platform services, C#, JavaScript, TypeScript, JSON, HTML, RESTful Web APIs, and Azure.

Skill Domain Exam Weight
Create a technical design 10–15%
Build Power Platform Solutions 10–15%
Implement Power Apps improvements 5–10%
Extend the user experience 10–15%
Extend the platform 40–45% ← LARGEST
Develop integrations 5–10%

Key fact: Extend the platform is 40–45% of the exam. Prioritise: Dataverse plugins (C#), custom connectors, Power Platform CLI, ALM, and Dataverse Web API.


How do you decide between low-code and pro-code in Power Platform?

Decision framework: Low-code vs Pro-code

Always try low-code first (cheaper, faster, maintainable by citizen devs):
→ Standard Power Apps (canvas or model-driven) for the UI
→ Power Automate cloud flows for automation
→ Dataverse built-in business rules for simple field validation
→ Dataverse calculated/rollup columns for derived values

When to escalate to pro-code:
→ Complex server-side business logic requiring ACID transactions
  → Use: Dataverse C# Plugin
→ Performance-critical real-time validation across multiple tables
  → Use: Dataverse C# Plugin (synchronous pre-operation)
→ Custom UI control behaviour beyond standard controls
  → Use: PCF (Power Apps Component Framework) control
→ Connect to a service with no existing connector
  → Use: Custom Connector (OpenAPI/Swagger definition)
→ Long-running or complex operations with Azure infrastructure
  → Use: Azure Function called from plugin or Power Automate
→ Batch data processing or complex ETL
  → Use: Azure Function or Azure Logic App

Architecture decision checklist:
1. Can it be done with out-of-the-box Power Platform? → Do it low-code
2. Needs server-side logic run on create/update/delete? → C# Plugin
3. Needs a custom UI component? → PCF Control
4. Needs external system not in connector gallery? → Custom Connector
5. Needs Azure compute, ML, or AI? → Azure Function + connector
6. Needs human approval + complex branching? → Power Automate flow

What is the Power Platform solution structure and types?

Solutions:
→ Container for customisations and components that can be transported
  between environments (Dev → Test → Production)
→ Everything deployable must be inside a solution

Solution types:
Unmanaged solution:
→ Used in DEVELOPMENT environments only
→ All components editable
→ No version enforcement
→ Contains references to components (not copies)
→ Can be exported as unmanaged (.zip) for source control

Managed solution:
→ Deployed to TEST and PRODUCTION environments
→ Components are READ-ONLY (cannot be edited directly)
→ Version enforced — only newer version can replace it
→ Uninstalling removes all components cleanly
→ Created by exporting an unmanaged solution as managed

Key decision: NEVER develop in managed solution environments
             NEVER deploy unmanaged solutions to production

Solution components (what can be inside a solution):
Tables (entities), columns (fields), relationships
Views, forms, charts, dashboards
Cloud flows, canvas apps, model-driven apps
Security roles, environment variables, connection references
Plugins, custom workflow activities, web resources
PCF controls, custom connectors, dataflows

Environment variables:
→ Store environment-specific configuration (URLs, keys, thresholds)
→ Different values per environment (Dev/Test/Prod) without code change
→ Types: string, number, boolean, JSON, data source
→ Referenced in: flows, plugins, canvas apps, cloud flows

Connection references:
→ Abstractly reference a connector (instead of hardcoding a connection)
→ When solution deployed: admin configures the actual connection
→ Enables the same flow to connect to different accounts per environment
→ Always use connection references instead of direct connections in flows

2. Build Power Platform Solutions (10–15%)

How do you configure Dataverse tables and relationships for complex solutions?

Dataverse table types:
Standard table:    most tables — PK is a GUID, full Dataverse features
Activity table:    special type for activities (email, task, appointment)
                   enables activity timeline, sender/recipient model
Elastic table:     designed for high-volume, time-series or IoT data
                   uses Cosmos DB underneath — horizontal scale
Virtual table:     maps to external data source (SQL, OData, etc.)
                   appears as Dataverse table, queries external system live

Relationship types:
One-to-many (1:N): one parent row → many child rows
                   creates a lookup column on the child table
                   e.g., Account (1) → Contacts (N)

Many-to-many (N:N): via an intersect (junction) table
                    e.g., Contacts ↔ Marketing Lists
                    Use: relate records without a hierarchy

Polymorphic lookup: lookup that can point to multiple table types
                    e.g., Regarding field on Activity — can be Account,
                    Contact, Lead, or Opportunity

Column types:
Simple: Text, Number, Date, Choice (picklist), Lookup, File/Image
Calculated: value computed from formula (no triggers, re-computed on read)
Rollup: aggregate from child records (sum, count, avg, min, max)
        runs on a schedule (hourly) — not real-time
Formula: Power Fx expressions (Dataverse-native, new)
Auto-number: automatically incremented unique identifiers

Business rules (no-code server + client logic):
→ Set field values, show/hide fields, validate values, lock fields
→ Scope: entity (server-side — always runs) or form (client-side only)
→ Cannot: query other tables, send emails, perform HTTP calls
→ For complex logic: use plugins instead

What is Power Fx and how is it used in Power Apps?

Power Fx:
→ Open-source, declarative formula language based on Excel-style syntax
→ Used in: Canvas Apps, Model-driven app command bars, Dataverse formulas
→ Reactive: formulas automatically recalculate when data changes

Key Power Fx patterns for PL-400:

Collections and local data:
// Create a collection:
ClearCollect(colProducts,
  Filter(Products, Status.Value = "Active"))

// Add a record:
Collect(colProducts, {Name: "Widget", Price: 9.99})

// Update in collection:
UpdateIf(colProducts, ProductId = varSelectedId,
  {Price: 12.99})

Delegation (CRITICAL concept):
→ Delegation = Dataverse handles the filter/sort on the server
→ Non-delegable = Power Apps downloads ALL records (max 2000) then filters
→ Always use delegable functions for large datasets

Delegable functions with Dataverse:
  Filter(Products, Status.Value = "Active")  ← DELEGABLE (server-filters)
  Filter(Products, Text(Price) = "9.99")    ← NOT DELEGABLE (Text() blocks delegation)

Workaround for non-delegable:
  Set(varProducts, Filter(Products, Status.Value = "Active"))  // 2000 limit
  Filter(varProducts, Price > 5)  // then filter collection client-side

Named formulas (performance):
// Named formula — evaluated once, cached:
ProductCount = CountRows(Filter(Products, Status = "Active"))

With() — local variables for readability:
With({total: Sum(OrderLines, Quantity * UnitPrice),
       tax: Sum(OrderLines, Quantity * UnitPrice) * 0.20},
  total + tax)

3. Extend the User Experience (10–15%)

What is the Power Apps Component Framework (PCF) and when do you use it?

PCF (Power Apps Component Framework):
→ Framework for building custom reusable UI components for Power Apps
→ Replaces HTML web resources for custom UI in model-driven apps
→ Available in: Model-driven apps AND canvas apps
→ Built with: TypeScript, React (optional), CSS, third-party libraries

Two types of PCF controls:
Field controls: replace a single field on a form
  → e.g., custom date picker, colour picker, rating stars, signature pad
Dataset controls: replace a grid/subgrid view
  → e.g., custom chart, calendar view, map view, Kanban board

PCF control lifecycle:
init(context, notifyOutputChanged, state, container):
  → Called once when control loads
  → Receive context (current field value, user info, client info)
  → Set up the component

updateView(context):
  → Called whenever field value OR model changes
  → Re-render the component with new data

getOutputs():
  → Called when notifyOutputChanged() is triggered
  → Return the new value to write back to Dataverse

destroy():
  → Called when control removed from DOM — cleanup

PCF TypeScript example (field control — custom star rating):
import { IInputs, IOutputs } from "./generated/ManifestTypes";

export class StarRating implements ComponentFramework.StandardControl<IInputs, IOutputs> {
  private _value: number;
  private _notifyOutputChanged: () => void;
  private _container: HTMLDivElement;

  init(context: ComponentFramework.Context<IInputs>,
       notifyOutputChanged: () => void,
       state: ComponentFramework.Dictionary,
       container: HTMLDivElement): void {
    this._notifyOutputChanged = notifyOutputChanged;
    this._container = container;
    this._value = context.parameters.rating.raw ?? 0;
    this.renderStars();
  }

  updateView(context: ComponentFramework.Context<IInputs>): void {
    this._value = context.parameters.rating.raw ?? 0;
    this.renderStars();
  }

  getOutputs(): IOutputs {
    return { rating: this._value };
  }

  private renderStars(): void {
    this._container.innerHTML = '';
    for (let i = 1; i <= 5; i++) {
      const star = document.createElement('span');
      star.textContent = i <= this._value ? '★' : '☆';
      star.onclick = () => { this._value = i; this._notifyOutputChanged(); };
      this._container.appendChild(star);
    }
  }

  destroy(): void { /* cleanup */ }
}

PCF CLI commands:
pac pcf init --namespace Contoso --name StarRating --template field
npm install
npm run build
pac pcf push --publisher-prefix contoso  # push to dev environment for testing

What are client-side scripts (JavaScript) in model-driven apps?

Client-side scripts (JavaScript) in model-driven apps:
→ Customise form behaviour without server-side code
→ Registered on: form events, field events, grid events, ribbon buttons

Key form events:
OnLoad:   fires when form loads (initialise, set defaults, hide/show)
OnSave:   fires when user saves (validate, prevent save, show warnings)
OnChange: fires when a specific field value changes

Xrm object model — the JavaScript API for model-driven apps:
// Get a field value:
const accountName = formContext.getAttribute("name").getValue();

// Set a field value:
formContext.getAttribute("creditlimit").setValue(50000);

// Show/hide a field:
formContext.getControl("telephone1").setVisible(false);

// Make a field required:
formContext.getAttribute("emailaddress1")
  .setRequiredLevel("required");

// Display a form notification:
formContext.ui.setFormNotification(
  "Credit limit exceeded!",
  "WARNING",   // INFO | WARNING | ERROR
  "creditWarning"
);

// Prevent save with validation error:
function validateForm(executionContext) {
  const formContext = executionContext.getFormContext();
  const email = formContext.getAttribute("emailaddress1").getValue();
  if (email && !email.includes("@")) {
    executionContext.getEventArgs().preventDefault();
    formContext.ui.setFormNotification(
      "Invalid email address", "ERROR", "emailError");
  }
}

// Call Dataverse Web API from client script:
Xrm.WebApi.retrieveMultipleRecords("account",
  "?$select=name,creditlimit&$filter=statecode eq 0&$top=10")
  .then(result => {
    result.entities.forEach(account => {
      console.log(account.name, account.creditlimit);
    });
  });

// Open a related record:
Xrm.Navigation.openForm({
  entityName: "contact",
  entityId: contactId
});

4. Extend the Platform — Plugins & APIs (40–45%)

What is a Dataverse Plugin and how does it work?

Dataverse Plugin:
→ C# class library that executes server-side business logic
→ Triggered by Dataverse events (create, update, delete, retrieve, etc.)
→ Runs within the Dataverse platform — no external hosting needed
→ Enforced regardless of HOW data is modified (Portal, API, App, import)

Plugin architecture:
public class AccountValidationPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    // Get plugin execution context:
    var context = (IPluginExecutionContext)
      serviceProvider.GetService(typeof(IPluginExecutionContext));

    // Get organisation service for Dataverse operations:
    var serviceFactory = (IOrganizationServiceFactory)
      serviceProvider.GetService(typeof(IOrganizationServiceFactory));
    var service = serviceFactory.CreateOrganizationService(context.UserId);

    // Get tracing service for logging:
    var tracingService = (ITracingService)
      serviceProvider.GetService(typeof(ITracingService));

    // Only run on Account Create:
    if (context.MessageName != "Create" ||
        context.PrimaryEntityName != "account") return;

    // Get the target entity (the record being created):
    var target = (Entity)context.InputParameters["Target"];

    // Business logic:
    if (target.Contains("creditlimit") &&
        target.GetAttributeValue<Money>("creditlimit").Value > 1000000)
    {
      throw new InvalidPluginExecutionException(
        "Credit limit cannot exceed £1,000,000. Please contact Finance.");
    }

    tracingService.Trace("Account validation passed");
  }
}

What are plugin pipeline stages and what happens in each?

Plugin execution pipeline (for each Dataverse message):

PRE-VALIDATION (stage 10):
→ Fires BEFORE the database transaction begins
→ Can query external systems without transaction overhead
→ Can prevent the operation with an exception
→ Use for: input validation, external system checks
→ No access to pre-entity image (record doesn't exist yet for Create)

PRE-OPERATION (stage 20):
→ Fires WITHIN the database transaction (BEFORE write to DB)
→ Can modify the Target entity before it's saved
→ Changes to Target automatically saved — no explicit update needed
→ Use for: set default field values, enrich the record before save
→ Pre-entity image: available for Update (the record as it was before)

MAIN OPERATION (stage 30):
→ The actual Dataverse platform operation (Core operation)
→ Cannot register custom plugins at this stage
→ Platform writes data to the database here

POST-OPERATION (stage 40):
→ Fires WITHIN the database transaction (AFTER write to DB)
→ Record is now in the database
→ Can trigger follow-on operations (create related records, send emails)
→ Post-entity image: available (record as it is now after save)
→ Changes require explicit service.Update() calls
→ Use for: creating related records, triggering workflows, notifications

Execution mode:
Synchronous: runs in the same thread as the user operation
             failure rolls back the entire transaction
             use for: validation, data enrichment (pre-operation)
Asynchronous: runs in background queue (separate thread)
              failure does NOT roll back the user operation
              use for: send emails, call external APIs, non-critical operations

Plugin images (snapshots of the record):
Pre-image:  snapshot of the record BEFORE the operation
Post-image: snapshot of the record AFTER the operation
Both must be registered in Plugin Registration Tool (specify column names)

What are Custom Workflow Activities?

Custom Workflow Activities:
→ C# classes that extend Power Automate / Classic Workflows
→ Appear as steps that can be added in Power Automate (classic) or
  used in low-code flows
→ Use when: need reusable server-side C# logic callable from flows

Structure:
public class CalculateDiscount : CodeActivity
{
  // Input parameters from the flow:
  [Input("Order Amount")]
  [RequiredArgument]
  public InArgument<decimal> OrderAmount { get; set; }

  [Input("Customer Tier")]
  public InArgument<string> CustomerTier { get; set; }

  // Output parameters returned to the flow:
  [Output("Discount Percentage")]
  public OutArgument<decimal> DiscountPercentage { get; set; }

  protected override void Execute(CodeActivityContext context)
  {
    var amount = OrderAmount.Get(context);
    var tier = CustomerTier.Get(context) ?? "Standard";

    decimal discount = tier switch {
      "Gold"     => amount > 10000 ? 0.15m : 0.10m,
      "Silver"   => 0.05m,
      _          => 0m
    };

    DiscountPercentage.Set(context, discount);
  }
}

Plugin vs Custom Workflow Activity:
Plugin:                    Triggered by Dataverse events (create/update/delete)
Custom Workflow Activity:  Called from Power Automate or classic workflows
                           Reusable across multiple flows
                           User provides input, receives output

What is the Dataverse Web API and how do you use it?

Dataverse Web API:
→ RESTful OData v4 API for all Dataverse operations
→ Endpoint: https://{org}.api.crm11.dynamics.com/api/data/v9.2/
→ Authentication: OAuth 2.0 bearer token (Entra ID)
→ Available from: external apps, Azure Functions, Logic Apps, plugins

CRUD operations:

CREATE (POST):
POST /api/data/v9.2/accounts
Content-Type: application/json
{
  "name": "Contoso Ltd",
  "creditlimit": 50000,
  "telephone1": "0207 123 4567",
  "primarycontactid@odata.bind": "/contacts(00000000-0000-0000-0000-000000000001)"
}
Response: 201 Created + OData-EntityId header with new record GUID

READ (GET with OData):
GET /api/data/v9.2/accounts
  ?$select=name,creditlimit,telephone1
  &$filter=creditlimit gt 10000 and statecode eq 0
  &$orderby=name asc
  &$top=50
  &$expand=primarycontactid($select=fullname,emailaddress1)

UPDATE (PATCH — partial update):
PATCH /api/data/v9.2/accounts({accountId})
Content-Type: application/json
{
  "creditlimit": 75000,
  "telephone1": "0207 999 8888"
}

DELETE:
DELETE /api/data/v9.2/accounts({accountId})

Upsert (PATCH with If-None-Match: *):
PATCH /api/data/v9.2/accounts({alternateKey})
If-None-Match: *   ← create if not exists, update if exists
{ ... }

Batch operations (multiple requests in one HTTP call):
POST /api/data/v9.2/$batch
Content-Type: multipart/mixed; boundary=batch_boundary
→ Reduces round trips for bulk operations
→ Supports change sets (atomic — all succeed or all fail)

Execute actions:
POST /api/data/v9.2/WinOpportunity
{
  "OpportunityClose": {
    "subject": "Won deal",
    "opportunityid@odata.bind": "/opportunities({opportunityId})"
  },
  "Status": 3
}

5. Power Platform ALM & DevOps

What is Power Platform ALM and what are the key components?

ALM (Application Lifecycle Management) for Power Platform:
→ The practices and tools for managing the full lifecycle of Power Platform
  solutions from development through test to production

Core ALM components:
Environments:     Dev → Test/UAT → Production
                  Each environment is isolated with its own data
Solutions:        the packaging unit — transport customisations between envs
Source control:   Git repository stores solution source files
CI/CD pipelines:  automate build, test, deploy using Azure DevOps or GitHub

Environment strategy (recommended):
Developer (individual):   each developer has their own personal dev env
                          unmanaged solution, develop and unit test here
Test/QA:                  shared managed solution deployment, QA testing
UAT/Pre-prod:             user acceptance testing with business users
Production:               live environment — only managed solutions deployed

Key principles:
→ Never develop in Test or Production environments
→ Always deploy MANAGED solutions to non-dev environments
→ Every customisation lives inside a solution (nothing ungrouped)
→ Source control is the source of truth — not any environment
→ Automated deployment pipelines — no manual exports/imports

What is the Power Platform CLI and what can you do with it?

Power Platform CLI (pac CLI):
→ Command-line tool for Power Platform development and ALM
→ Install: dotnet tool install -g Microsoft.PowerApps.CLI
→ Or: via npm, VS Code extension, or Power Platform Tools installer

Key commands:

Authentication:
pac auth create --url https://org.crm.dynamics.com  # add environment
pac auth list                                        # list connections
pac auth select --index 1                            # switch environment

Solution management:
pac solution list                                    # list all solutions in env
pac solution export --path ./exported --name MySolution --managed false
pac solution import --path ./MySolution.zip
pac solution clone --name MySolution --outputDirectory ./src
# Clone: creates local folder structure with each component in separate file

Environment management:
pac env list                                         # list environments
pac env select --environment "Dev Environment"
pac env create --name "My-Dev-Env" --type Developer --region unitedstates

PCF development:
pac pcf init --namespace Contoso --name MyControl --template field --run-npm-install
pac pcf push --publisher-prefix contoso              # push to environment
pac pcf version --updateType patch                   # bump version

Plugin development:
pac plugin init                                      # scaffold plugin project
pac plugin push                                      # push assembly to env

Canvas app:
pac canvas pack --sources ./CanvasApp --msapp ./output/MyApp.msapp
pac canvas unpack --msapp ./MyApp.msapp --sources ./unpacked  # for source control

Solution unpack (for source control — YAML format):
pac solution unpack --zipFile MySolution.zip --folder ./src --processCanvasApps
# Creates: Entities, Flows, Workflows, WebResources folders
# Each component in separate file → meaningful Git diffs

Solution pack (from source control → zip for deployment):
pac solution pack --zipFile MySolution_managed.zip --folder ./src --managed

How do you implement CI/CD for Power Platform with Azure DevOps?

Power Platform Build Tools (Azure DevOps):
→ Microsoft-provided extension: "Power Platform Build Tools"
→ Provides DevOps tasks for: export, import, publish, check solution

Complete CI/CD pipeline pattern:

CI Pipeline (Build) — triggered on PR to main branch:
1. Power Platform: Who Am I            # validate connection
2. Power Platform: Export Solution     # export unmanaged from Dev env
3. Power Platform: Unpack Solution     # unpack to source files
4. Power Platform: Check Solution      # run Solution Checker (code quality)
5. Power Platform: Pack Solution       # pack as MANAGED for deployment
6. Publish Artifact                    # publish zip for release pipeline

CD Pipeline (Release) — triggered on merge to main:
Stage 1 — Deploy to Test:
  Power Platform: Import Solution      # import managed solution to Test
  Power Platform: Publish Customizations
  Run automated tests (EasyRepro / PAC test)

Stage 2 — Deploy to UAT (manual approval gate):
  Approval required from business stakeholder
  Power Platform: Import Solution      # import managed to UAT

Stage 3 — Deploy to Production (manual approval):
  Approval required from release manager
  Power Platform: Import Solution      # import managed to Production

Azure DevOps YAML pipeline example:
trigger:
  branches:
    include: [main]

pool:
  vmImage: 'windows-latest'

steps:
- task: PowerPlatformToolInstaller@2
  displayName: 'Install Power Platform Build Tools'

- task: PowerPlatformWhoAmi@2
  displayName: 'Authenticate to Dev'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'   # service connection

- task: PowerPlatformExportSolution@2
  displayName: 'Export Solution'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'
    SolutionName: 'ContosoSolution'
    SolutionOutputFile: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    Managed: false

- task: PowerPlatformUnpackSolution@2
  displayName: 'Unpack Solution'
  inputs:
    SolutionInputFile: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    SolutionTargetFolder: '$(Build.SourcesDirectory)/src/solution'

- task: PowerPlatformChecker@2
  displayName: 'Solution Checker'
  inputs:
    authenticationType: 'PowerPlatformSPN'
    PowerPlatformSPN: 'PP-Dev-SPN'
    FilesToAnalyze: '$(Build.ArtifactStagingDirectory)/ContosoSolution.zip'
    RuleSet: '0ad12346-e108-40b8-a956-9a373e9909ff'  # AppSource ruleset
    ErrorLevel: High
    FailOnPowerAppsCheckerAnalysisError: true

What are Power Platform Pipelines (in-product ALM)?

Power Platform Pipelines (Fabric/Managed Pipelines):
→ Microsoft-native ALM solution BUILT INTO Power Platform (no Azure DevOps needed)
→ Configured entirely in the Power Platform Admin Centre / Pipelines app
→ Designed for: citizen developer ALM, simpler governance without DevOps expertise

Pipeline concepts:
Host environment:   dedicated environment that hosts the pipeline configuration
                    + deployment history + logs
Linked environments: Dev → Test → Production environments connected to the pipeline
Stage gates:         manual approvals before deployment to each stage

Features:
→ One-click deployment from within the Maker experience (no DevOps knowledge needed)
→ Pre-deployment and post-deployment steps (run cloud flows)
→ Deployment notes and audit trail
→ Environment variable configuration per stage
→ Connection reference configuration per stage
→ Deployment history with rollback capability

When to use Pipelines vs Azure DevOps:
Power Platform Pipelines:
  → Simpler orgs, citizen developer-led ALM
  → No DevOps team or expertise
  → Quick setup, managed by Power Platform admins
  → Limited customisation and branching support

Azure DevOps / GitHub Actions:
  → Complex orgs, pro-dev teams
  → Need automated testing, branch strategies, complex approval gates
  → Full control over pipeline steps, scripts, and tooling
  → Integrates with broader DevOps ecosystem (work items, boards, repos)

6. Develop Integrations (5–10%)

What are Custom Connectors and how do you build one?

Custom Connector:
→ Wrapper around a REST or SOAP API that makes it usable in
  Power Automate, Power Apps, and Logic Apps
→ Defined using OpenAPI (Swagger) specification
→ Supports: authentication (API key, OAuth 2.0, Basic, Windows)

Building a custom connector:
Step 1 — Define the API:
→ Import an existing OpenAPI (Swagger) .json or .yaml file
→ Or: enter Postman collection URL
→ Or: build manually in the custom connector wizard

Step 2 — Configure authentication:
No auth:      public APIs, development
API key:      pass key in header or query string
OAuth 2.0:    full OAuth flow — user signs in to the external service
Generic OAuth: for non-standard OAuth flows

Step 3 — Define actions and triggers:
Action:    user calls this to perform an operation (POST order, GET customer)
Trigger:   external system notifies Power Automate of an event
  Polling trigger: Power Automate polls your API every N seconds
  Webhook trigger: external system sends event to Power Automate's webhook URL

Step 4 — Test and share:
→ Test each operation in the connector wizard
→ Share with specific users or the whole organisation
→ Submit to Microsoft AppSource for public listing (ISV scenario)

Custom connector policies (inline code transformations):
→ Policy templates: convert XML to JSON, set header, route request
→ Use for: API quirks without modifying the backend API
   e.g., external API uses XML response — policy converts to JSON
         external API URL changes per customer — route policy selects URL

Certification (certified connectors):
→ Submit to Microsoft for certification review
→ Certified connectors available to all Power Platform users
→ Requirements: public documentation, test accounts, SLA

What are Virtual Tables in Dataverse?

Virtual Tables (Virtual Entities):
→ Dataverse tables that map to EXTERNAL data sources
→ Data stays in the external system — Dataverse is a logical view
→ CRUD operations pass through to the external system in real time
→ Appear and behave like regular Dataverse tables (queries, forms, views)

How they work:
User opens a canvas app → retrieves Virtual Table records
→ Dataverse calls the Virtual Table Data Provider
→ Data Provider calls the external system (SQL, OData, REST API)
→ External system returns data → Dataverse returns to the user
→ No data copy, no sync, always live

Virtual Table providers:
OData v4 provider:  built-in — connect to any OData v4 endpoint
Custom provider:    C# plugin registered as a data provider
                    handles: retrieve, retrieve multiple, create, update, delete

Custom virtual table data provider (C#):
public class ExternalProductsPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    var context = serviceProvider.GetService<IPluginExecutionContext>();

    if (context.MessageName == "RetrieveMultiple")
    {
      // Call external API (e.g., legacy ERP system):
      var products = CallERPSystemAPI();

      // Convert to EntityCollection:
      var entityCollection = new EntityCollection();
      foreach (var product in products)
      {
        var entity = new Entity("contoso_product");
        entity["contoso_name"] = product.Name;
        entity["contoso_price"] = product.Price;
        entityCollection.Entities.Add(entity);
      }
      context.OutputParameters["BusinessEntityCollection"] = entityCollection;
    }
  }
}

When to use virtual tables:
→ External data that must appear in Power Apps without ETL
→ Read-only reference data from a legacy system
→ Data too large to copy into Dataverse (compliance, licensing)
→ Real-time inventory, pricing from ERP — must not have sync lag

Limitations:
→ No offline access (requires live connection to external system)
→ No Dataverse-native features (rollup columns, duplicate detection)
→ Performance depends on external system response time
→ No audit history (Dataverse doesn't own the data)

7. Scenario-Based Questions

Scenario: A business rule is causing performance issues on a large Account table. How do you fix it?

Problem analysis:
→ Business rules with entity scope run server-side on every record operation
→ Complex business rules with many conditions slow down create/update operations
→ Business rules cannot be indexed or optimised by the database

Diagnosis steps:
1. Review business rules: Dataverse table → Business Rules → check scope
   Entity-scoped rules run on every create/update from ANY source
2. Check if rule fires unnecessarily: is it checking conditions that
   rarely change? e.g., checking account name on every address update
3. Profile with XrmToolBox → Plugin Trace Log: measure execution time

Solutions:
Option A — Convert to Pre-Operation Plugin (best performance):
→ Plugin only fires on specific messages you configure
→ Can check context.InputParameters to abort early if not relevant
→ Can use pre-image to only run if relevant fields actually changed:

var preImage = context.PreEntityImages["PreImage"];
var oldCreditLimit = preImage.GetAttributeValue<Money>("creditlimit")?.Value ?? 0;
var newCreditLimit = ((Entity)context.InputParameters["Target"])
                      .GetAttributeValue<Money>("creditlimit")?.Value ?? 0;
if (oldCreditLimit == newCreditLimit) return; // skip if not changed

Option B — Scope to Form only (not entity):
→ Changes business rule scope from Entity to Form
→ Rule only fires client-side when user is on the form
→ Not enforced for API/import operations — use only for UX enhancement

Option C — Replace with Calculated/Formula column:
→ If rule is setting a derived value, use a calculated column instead
→ Evaluated on read, not on write — zero write overhead

Scenario: Design a plugin that auto-creates a follow-up Task when an Opportunity is marked as Won.

public class OpportunityWonPlugin : IPlugin
{
  public void Execute(IServiceProvider serviceProvider)
  {
    var context = (IPluginExecutionContext)
      serviceProvider.GetService(typeof(IPluginExecutionContext));
    var serviceFactory = (IOrganizationServiceFactory)
      serviceProvider.GetService(typeof(IOrganizationServiceFactory));
    var service = serviceFactory.CreateOrganizationService(context.UserId);
    var tracingService = (ITracingService)
      serviceProvider.GetService(typeof(ITracingService));

    // Only on Opportunity Update:
    if (context.MessageName != "Update" ||
        context.PrimaryEntityName != "opportunity") return;

    var target = (Entity)context.InputParameters["Target"];

    // Only if statecode changed to Won (statecode = 1):
    if (!target.Contains("statecode")) return;
    var stateCode = target.GetAttributeValue<OptionSetValue>("statecode")?.Value;
    if (stateCode != 1) return;  // 1 = Won

    // Get the Opportunity for owner details:
    var opportunity = service.Retrieve("opportunity",
      context.PrimaryEntityId,
      new ColumnSet("name", "ownerid", "estimatedclosedate"));

    tracingService.Trace($"Creating follow-up task for Won Opportunity: {opportunity["name"]}");

    // Create a follow-up Task (async plugin — won't block the user):
    var task = new Entity("task");
    task["subject"] = $"Follow-up: {opportunity["name"]} - Won!";
    task["description"] = "Send welcome pack and onboarding information";
    task["regardingobjectid"] = new EntityReference("opportunity", context.PrimaryEntityId);
    task["ownerid"] = opportunity["ownerid"];  // assign to opp owner
    task["scheduledend"] = DateTime.UtcNow.AddDays(3);  // due in 3 days
    task["prioritycode"] = new OptionSetValue(2);  // High priority

    var taskId = service.Create(task);
    tracingService.Trace($"Task created: {taskId}");
  }
}

Plugin registration:
Entity:   opportunity
Message:  Update
Stage:    Post-Operation (40)   ← after save, record is in DB
Mode:     Asynchronous           ← non-blocking, failure won't rollback opp win
Images:   none required (statecode is in Target)

Scenario: How do you implement a full ALM pipeline for a Power Platform solution from scratch?

Full ALM implementation steps:

1. Environment setup (Power Platform Admin Centre):
   Developer environment:  personal, each developer has their own
   Test environment:       shared, F-licence or Managed Environment
   Production:             Managed Environment, DLP policies enforced

2. Solution setup (Developer environment):
   → Create publisher: prefix "ctr", publisher name "Contoso"
   → Create solution: "ContosoSales" (associate with publisher)
   → ALL customisations inside this solution only
   → Set up Environment Variables for connection strings
   → Use Connection References for all flows

3. Azure DevOps setup:
   → Create repository: ContosoSales
   → Install "Power Platform Build Tools" extension from Marketplace
   → Create service connections (SPN):
     PP-Dev-SPN:  service principal with Maker role on Dev env
     PP-Test-SPN: service principal with Maker role on Test env
     PP-Prod-SPN: service principal with Maker role on Prod env

4. CI pipeline (builds/PRs):
   → Export solution (unmanaged) from Dev
   → Unpack to source files (pac solution unpack)
   → Run Solution Checker (fail on Critical errors)
   → Pack as Managed
   → Publish artifact (managed zip)

5. CD pipeline (deployments):
   → Import managed to Test (automatic)
   → Configure environment variables + connection references
   → Run smoke tests (EasyRepro UI tests or manual test checklist)
   → Approval gate: QA lead approves
   → Import managed to Production
   → Approval gate: Release manager approves

6. Developer daily workflow:
   git checkout -b feature/TICKET-123-account-validation
   # develop in personal dev environment
   # export + unpack + commit
   git push origin feature/TICKET-123-account-validation
   # raise PR → CI pipeline runs → solution checker passes → peer review
   # merge to main → CD pipeline deploys to Test

Scenario: A custom connector is failing authentication to a third-party API. How do you debug?

Systematic debugging approach:

Step 1 — Test the API directly (Postman or curl):
→ Call the API directly with the same credentials to confirm the API works
→ If direct call fails: the API itself has an issue (key expired, endpoint changed)
→ If direct call succeeds: the connector configuration is the issue

Step 2 — Check connector authentication configuration:
→ Power Automate → Custom Connectors → [connector] → Security tab
→ Verify: auth type, client ID, client secret, token URL, scopes

Step 3 — Check Power Automate run history:
→ Open failed flow run → expand the action → check error body
→ Common errors:
  401 Unauthorized:   wrong credentials, expired token, missing scope
  403 Forbidden:      correct auth but insufficient permissions
  400 Bad Request:    malformed request body — check connector definition

Step 4 — Test in connector wizard:
→ Custom Connectors → [connector] → Test tab
→ Create a new connection → test each operation individually
→ See raw request/response in test results

Step 5 — Check token refresh:
→ OAuth connectors: check if refresh token has expired (user must re-auth)
→ API key: check if key was rotated in the external system
→ Solution: update the connection in Power Automate with new credentials

Step 6 — Check environment variable / connection reference:
→ If connection reference used: verify the connection reference is configured
  correctly in the target environment
→ Power Platform Admin Centre → Connections → verify active connections

8. Cheat Sheet — Quick Reference

PL-400 Exam Domain Priorities

Domain                          Weight    Focus Areas
Extend the platform             40-45%    C# plugins, Custom connectors,
                                          Dataverse Web API, Virtual tables,
                                          Power Platform CLI, ALM
Create a technical design       10-15%    Low-code vs pro-code decision,
                                          Architecture, Environment variables
Extend the user experience      10-15%    PCF controls (TypeScript),
                                          Client-side JavaScript (Xrm API)
Build Power Platform solutions  10-15%    Solutions, Dataverse tables,
                                          Power Fx, Delegation
Develop integrations            5-10%     Custom connectors, Virtual tables,
                                          Azure Functions, OData
Implement Power Apps improvements 5-10%  Performance, delegation, formula optimisation

Plugin Pipeline Quick Reference

Stage    When                    Use for                    Transaction
10 Pre-  Before transaction      External validation,       No
   Valid                         external system checks
20 Pre-  In transaction,         Set defaults, enrich       Yes
   Op    before DB write         record before save
30 Main  The actual operation    Not available for plugins  Yes
40 Post- In transaction,         Create related records,    Yes
   Op    after DB write          trigger integrations, notify

Execution modes:
Sync:  blocks user, failure rolls back → use for validation
Async: background queue, failure does NOT roll back → use for notifications

Plugin images:
Pre-image (update/delete):  record state BEFORE this operation
Post-image (create/update): record state AFTER this operation
Register in Plugin Registration Tool with specific column names

Dataverse Web API Quick Reference

Base URL: https://{org}.api.crm11.dynamics.com/api/data/v9.2/

OData operations:
GET    /accounts?$select=name&$filter=...    → retrieve multiple
GET    /accounts({id})                        → retrieve single
POST   /accounts                              → create
PATCH  /accounts({id})                        → update (partial)
DELETE /accounts({id})                        → delete

OData query options:
$select    choose columns    $select=name,creditlimit
$filter    filter rows       $filter=creditlimit gt 50000
$orderby   sort              $orderby=name asc
$top       limit rows        $top=100
$expand    join related      $expand=primarycontactid($select=fullname)
$count     count results     $count=true

Association:
POST /accounts({id})/contact_customer_accounts/$ref
{ "@odata.id": "/contacts({contactId})" }

Unbound action example:
POST /WinOpportunity
{ "OpportunityClose": {...}, "Status": 3 }

ALM Quick Reference

Solution types:
Unmanaged: development environment only — editable
Managed:   non-dev environments — read-only, clean uninstall

Environment strategy:
Developer (personal) → Test (shared) → UAT → Production

Power Platform CLI key commands:
pac auth create --url {envUrl}
pac solution clone --name {solution}
pac solution pack --zipFile ./out.zip --managed
pac solution import --path ./out.zip
pac pcf init, push, version
pac canvas pack, unpack

Azure DevOps tasks:
PowerPlatformToolInstaller    → install tools
PowerPlatformWhoAmi           → validate connection
PowerPlatformExportSolution   → export from source env
PowerPlatformUnpackSolution   → unpack to source files
PowerPlatformChecker          → run solution checker
PowerPlatformImportSolution   → deploy to target env
PowerPlatformPublishCustomizations → publish after import

Pipeline pattern: Export → Unpack → Check → Pack → Import

PCF Control Lifecycle

Method          When called                       Use for
init()          Control loads (once)              Setup, subscribe to events
updateView()    Value or context changes           Re-render the control
getOutputs()    After notifyOutputChanged()        Return new value to Dataverse
destroy()       Control removed from DOM           Cleanup, unsubscribe

Key context properties:
context.parameters.{fieldName}.raw        → current field value
context.parameters.{fieldName}.formatted  → formatted display value
context.userSettings.userName             → current user name
context.client.isOffline                  → offline mode?
context.mode.isControlDisabled            → read-only?
context.factory.requestRender()           → request re-render

Top 10 Tips

  1. Extend the platform = 40–45% of PL-400 — this single domain is almost half the exam. Master: C# plugins (all pipeline stages), Dataverse Web API (OData), custom connectors, Power Platform CLI, and ALM. Everything else is secondary to this domain.

  2. Plugin stage selection is heavily tested — Pre-Operation (stage 20) to modify the Target before save (no explicit Update needed). Post-Operation (stage 40) to create related records (explicit Update needed). Pre-Validation (stage 10) for external system checks without a transaction. Know which stage to use for which scenario.

  3. Sync vs Async plugins = validation vs notification — synchronous plugins block the user and roll back on failure (use for validation). Asynchronous plugins run in background and don't roll back the triggering operation (use for sending emails, calling external APIs).

  4. Always try low-code first — the exam tests your ability to decide when pro-code is justified. If a Business Rule or Power Automate flow can solve the problem, that's the right answer. Only escalate to plugins when genuinely needed.

  5. Connection references and environment variables are the ALM foundation — flows must use Connection References (not direct connections) and variables must use Environment Variables (not hardcoded values). Without these, solutions break when moved between environments.

  6. pac solution unpack enables meaningful Git diffs — without unpacking, a solution is a binary ZIP file that can't be diff'd or reviewed. Unpacking creates YAML/XML files per component — pull requests show exactly what changed.

  7. PCF controls replace HTML web resources — HTML web resources are legacy. PCF controls are the modern extension point for custom UI, with proper TypeScript support, lifecycle management, and canvas app compatibility.

  8. Virtual tables = no data copy, always live — data stays in the external system and Dataverse proxies queries through. No sync lag, no duplication, but no offline access and no Dataverse-native features (rollup columns, duplicate detection).

  9. Power Platform Pipelines vs Azure DevOps — Pipelines for simpler orgs and citizen developer ALM (built into Power Platform, no DevOps expertise needed). Azure DevOps for complex orgs needing branch strategies, automated testing, and full DevOps integration.

  10. Solution Checker is mandatory before production — Solution Checker (run via Azure DevOps task or pac solution check) detects performance and reliability issues in plugins, JavaScript, and flows. Failing on Critical issues as a pipeline gate is the enterprise best practice for preventing broken deployments.



Microsoft Fabric Analytics Engineer (DP-600) Complete Guide

Microsoft Fabric Analytics Engineer (DP-600) — Complete Guide

OneLake · Lakehouse · Data Warehouse · Semantic Models · DAX · Dataflows · Pipelines · DirectLake · Real-Time Analytics · Scenarios · Cheat Sheet

Top Hashtags: #MicrosoftFabric, #DP600, #FabricAnalytics, #OneLake, #DataEngineering, #SemanticModels, #DAX, #PowerBI, #DataAnalytics, #MicrosoftCertified


Table of Contents

  1. Exam Overview & Microsoft Fabric Platform
  2. Plan, Implement & Manage Analytics Solutions (10–15%)
  3. Prepare and Serve Data (40–45%)
  4. Implement and Manage Semantic Models (20–25%)
  5. Explore and Analyze Data (20–25%)
  6. Scenario-Based Questions
  7. Cheat Sheet — Quick Reference

1. Exam Overview & Microsoft Fabric Platform

DP-600 Exam at a Glance

The DP-600 Microsoft Fabric Analytics Engineer exam validates expertise in designing, creating, and managing enterprise-scale analytics solutions using Microsoft Fabric. Candidates should have subject matter expertise in designing, creating, and managing analytical assets such as semantic models, warehouses, or lakehouses — and be able to query and analyze data using SQL, KQL, and DAX.

Skill Domain Exam Weight
Plan, implement, and manage a solution for data analytics 10–15%
Prepare and serve data 40–45%
Implement and manage semantic models 20–25%
Explore and analyze data 20–25%

Key fact: Candidates should have expertise in designing, creating, and deploying enterprise-scale data analytics solutions using Microsoft Fabric components like lakehouses, data warehouses, notebooks, dataflows, data pipelines, semantic models, and reports.


What is Microsoft Fabric and how does it unify the analytics platform?

Microsoft Fabric is a unified, end-to-end analytics platform that brings together data engineering, data integration, data warehousing, data science, real-time analytics, and business intelligence in a single SaaS offering.

Microsoft Fabric workloads (all in one platform):
┌─────────────────────────────────────────────────────────┐
│                    Microsoft Fabric                     │
├──────────┬──────────┬──────────┬──────────┬─────────────┤
│  Data    │  Data    │   Data   │   Data   │  Real-Time  │
│Engineering│Warehouse│ Science  │  Factory │  Analytics  │
│(Lakehouse│(SQL      │(Notebooks│(Pipelines│(Eventstream │
│ Spark)   │ Analytics│ ML)      │ Dataflows│  KQL DB)    │
├──────────┴──────────┴──────────┴──────────┴─────────────┤
│              Power BI (Business Intelligence)           │
├─────────────────────────────────────────────────────────┤
│                    OneLake (Storage)                    │
└─────────────────────────────────────────────────────────┘

Key unification points:
→ OneLake: single storage layer — all workloads read/write the same data
→ One security model: workspace-level and item-level permissions
→ One governance: Microsoft Purview integration for data lineage and cataloguing
→ One billing: Fabric capacity (F-SKUs) covers all workloads
→ No data silos: Lakehouse tables appear in Warehouse, Power BI, Notebooks

What is OneLake and why is it the foundation of Fabric?

OneLake:
→ Fabric's single, unified data lake — built on Azure Data Lake Storage Gen2
→ One lake per Fabric tenant (like OneDrive for data)
→ All Fabric items (Lakehouses, Warehouses, KQL Databases) store
  their data in OneLake automatically
→ Delta Parquet format: all table data stored as Delta Lake tables
→ No data duplication: Warehouse tables are Delta tables in OneLake;
  same files accessible from Lakehouse SQL endpoint, Notebooks, Power BI

OneLake hierarchy:
Tenant → Workspaces → Items (Lakehouses, Warehouses, KQL DBs)
         (like folders)   (each item = a folder in OneLake)

Shortcuts:
→ Virtual links to data stored OUTSIDE OneLake (or in other Fabric items)
→ No data movement — shortcut references the data in place
→ Sources: Azure Data Lake Storage Gen2, Amazon S3, Google Cloud Storage,
            another Fabric workspace (cross-workspace shortcut)
→ Use case: query S3 data from a Fabric Lakehouse without copying it

OneLake file explorer:
→ Windows app that mounts OneLake as a network drive (like OneDrive sync)
→ Browse and manage OneLake files directly from Windows Explorer

What are the Fabric experience roles and when do you use each?

Lakehouse (Data Engineering experience):
→ Combines a data lake (unstructured files) + a data warehouse (Delta tables)
→ Two endpoints automatically created:
  SQL analytics endpoint: T-SQL read-only access to Delta tables
  Semantic model: default Power BI dataset over all Delta tables
→ Storage: Files section (raw files) + Tables section (Delta Lake tables)
→ Compute: Apache Spark notebooks or Spark pipelines
→ Use for: raw data ingestion, Spark transformations, medallion architecture

Data Warehouse (Data Warehouse experience):
→ Enterprise SQL data warehouse with full T-SQL read/write support
→ Columnar storage, MPP (Massively Parallel Processing)
→ Supports: INSERT, UPDATE, DELETE, CREATE TABLE, stored procedures
→ Storage: Delta tables in OneLake (same as Lakehouse!)
→ Cross-database queries: query Lakehouse SQL endpoint from Warehouse
→ Use for: structured analytics, BI reporting, complex SQL workloads

KQL Database (Real-Time Analytics experience):
→ Optimised for time-series, log, and telemetry data
→ Query with Kusto Query Language (KQL)
→ Ingestion from: Eventstream, Azure Event Hubs, IoT Hub, REST API
→ Use for: IoT telemetry, application logs, clickstream, security events

Notebooks (Data Science / Data Engineering):
→ Interactive Jupyter notebooks with Spark (Python, Scala, R, SQL)
→ Connect to Lakehouse as default storage
→ Libraries: pandas, scikit-learn, PyTorch, MLflow, SynapseML
→ Use for: data exploration, ML model training, complex transformations

Dataflows Gen2 (Data Factory experience):
→ Power Query-based ETL — low-code data transformation
→ 150+ connectors (SQL Server, Salesforce, SharePoint, REST APIs)
→ Output: write transformed data to Lakehouse, Warehouse, or other destinations
→ Use for: self-service ETL, non-technical data preparation

Data Pipelines (Data Factory experience):
→ Orchestration tool for complex multi-step data workflows
→ Activities: Copy Data, Notebook, Stored Procedure, Dataflow, Web, Delete
→ Scheduling, monitoring, error handling
→ Use for: orchestrating end-to-end data movement and transformation

2. Plan, Implement & Manage Analytics Solutions (10–15%)

How do you design a Fabric workspace strategy?

Workspace = collaboration unit in Fabric (like a project or team space)

Workspace strategy patterns:
1. Environment-based (recommended for enterprise):
   Prod-Sales-WS    → production reports and models
   Dev-Sales-WS     → development and testing
   Staging-Sales-WS → pre-production validation

2. Domain-based:
   Finance-WS   → all Finance items (Lakehouse, Warehouse, Reports)
   HR-WS        → all HR items
   Marketing-WS → all Marketing items

3. Hybrid: domain + environment
   Finance-Dev-WS / Finance-Prod-WS

Workspace capacity:
→ Each workspace assigned to a Fabric capacity (F2, F4, F8, F16 ... F2048)
→ Capacity determines: compute power, storage throughput, concurrency
→ Premium capacity (P-SKUs) or Fabric capacity (F-SKUs) required for
  most Fabric features (Lakehouse, Warehouse, etc.)
→ Free tier: limited features (Power BI reports only)

F-SKU vs P-SKU:
F-SKUs (Fabric capacities):    pay-as-you-go or reserved
P-SKUs (Power BI Premium):     legacy, being phased out in favour of F-SKUs
F64 ≈ P1 in compute equivalence

Workspace roles:
Admin:       full control — manage members, settings, delete workspace
Member:      create/edit/delete all items, manage permissions on items
Contributor: create/edit items they own, cannot delete others' items
Viewer:      read-only access — cannot create or edit items

What is Git integration in Microsoft Fabric and why does it matter?

Fabric Git integration:
→ Connect a Fabric workspace to an Azure DevOps Git repository (or GitHub)
→ Fabric items serialised to YAML/JSON definition files in the repo
→ Enables: version control, code review (PRs), branching strategies,
           rollback, CI/CD deployment pipelines

Supported items for Git sync:
→ Semantic models (Power BI datasets)
→ Reports
→ Notebooks
→ Data pipelines
→ Dataflows Gen2
→ Lakehouses (metadata, not data)
→ Warehouses (metadata, not data)

Git workflow:
Developer → edits item in Fabric workspace
         → commits changes to feature branch in Git
         → raises Pull Request for review
         → approved PR merges to main branch
         → CI/CD pipeline deploys to production workspace

Deployment Pipelines (Fabric native CI/CD):
→ Move Fabric items between Dev → Test → Production workspaces
→ Deployment rules: map workspace-specific settings
  (e.g., different data source connection strings per environment)
→ Compare stages: see what changed between Dev and Production
→ Supported with both manual promotion and automated triggers

Git + Deployment Pipelines together:
Git = source control (version history, branching, review)
Deployment Pipelines = promotion workflow (Dev → Staging → Prod)
Best practice: use both together for enterprise Fabric governance

How do you implement security in Microsoft Fabric?

Fabric security layers:

1. Workspace-level security (Entra ID groups):
   Assign Entra ID security groups to workspace roles
   All items in workspace inherit the workspace permissions

2. Item-level permissions (share individual items):
   Share a specific report or semantic model without granting
   workspace access → viewer can only see that one item

3. OneLake data access roles (preview):
   Grant read access to specific folders/tables in a Lakehouse
   without exposing the entire Lakehouse

4. Semantic model Row-Level Security (RLS):
   Define roles in the semantic model with DAX filter rules
   Users see only the data their role permits
   e.g., UK managers see only UK rows; US managers see only US rows

5. Object-level security (OLS):
   Restrict access to specific tables or columns in a semantic model
   Hidden tables/columns don't appear in the model for restricted users

6. Warehouse column-level security:
   T-SQL GRANT/DENY on specific columns
   Restrict PII columns from certain users

7. Dynamic data masking:
   Mask sensitive column values (e.g., show **** instead of SSN)
   without changing the underlying data

Microsoft Purview integration:
→ Fabric data assets automatically catalogued in Purview
→ Sensitivity labels flow from Purview to Fabric items
→ Data lineage: track data flow from source through transformations to report
→ Scan Fabric workspaces for sensitive information (PII, financial data)

3. Prepare and Serve Data (40–45%)

What is the Medallion Architecture in Fabric and how do you implement it?

Medallion Architecture (Bronze → Silver → Gold):
→ Layered data organisation pattern for lakehouses
→ Each layer increases data quality and refinement

Bronze (Raw) layer:
→ Land raw data exactly as received — no transformation
→ Append-only: never delete or modify
→ Formats: CSV, JSON, Parquet, Delta, images, whatever the source sends
→ Serves as the "source of truth" — can reprocess from here if needed
→ Implementation: Files section of Lakehouse or Bronze Lakehouse

Silver (Cleansed/Conformed) layer:
→ Validated, deduplicated, standardised data
→ Apply business rules: correct data types, handle nulls, join related data
→ Delta Lake tables: schema enforced, history preserved
→ Implementation: Silver Lakehouse (separate workspace item)

Gold (Business/Serving) layer:
→ Aggregated, business-ready data modelled for analytics
→ Star schema: fact and dimension tables
→ Optimised for query performance (ZORDER, vacuum, partitioning)
→ Powers semantic models, reports, dashboards
→ Implementation: Gold Lakehouse or Fabric Data Warehouse

Fabric implementation:
Bronze Lakehouse → Notebook/Pipeline → Silver Lakehouse
                                     → Notebook/Pipeline → Gold Lakehouse/Warehouse
                                                         → Semantic Model → Power BI

Shortcuts between layers:
→ Silver Lakehouse shortcuts Bronze files — no data copy
→ Gold Warehouse shortcuts Silver Lakehouse tables
→ Same physical data, different logical views

What are the data ingestion options in Microsoft Fabric?

1. Data Pipelines (Copy Data activity):
→ Orchestrated, scheduled data movement
→ 100+ connectors: SQL Server, Oracle, Salesforce, REST APIs, SAP, etc.
→ Supports: full load, incremental load (watermark or Change Data Capture)
→ Configure in: Fabric workspace → New item → Data Pipeline
→ Best for: production ETL, scheduled batch ingestion, complex orchestration

2. Dataflows Gen2 (Power Query):
→ Low-code ETL with 150+ connectors
→ Transformations: filter, merge, pivot, custom columns, M language
→ Output destinations: Lakehouse, Warehouse, Azure SQL, etc.
→ Best for: self-service ETL, non-technical data prep, connecting to
  data sources that pipelines don't natively support well

3. Notebooks (Spark):
→ Python, PySpark, Scala, R, Spark SQL
→ Full programmatic control — complex transformations, ML prep
→ Write directly to Lakehouse Delta tables
→ Best for: complex transformations, ML feature engineering, custom logic

4. Eventstream (Real-Time Analytics):
→ Ingest streaming data: Azure Event Hubs, IoT Hub, Kafka, custom endpoints
→ Route, transform, and fan-out streams to: KQL Database, Lakehouse, Warehouse
→ Best for: IoT telemetry, application events, real-time data feeds

5. Mirroring:
→ Continuously replicate data FROM external databases INTO OneLake
→ Sources: Azure SQL Database, Azure Cosmos DB, Snowflake, Azure Databricks
→ Near real-time replication — no pipeline needed
→ Data appears as Delta tables in Fabric — queryable immediately
→ Best for: zero-ETL analytics on operational data

6. COPY INTO (Warehouse T-SQL):
→ Bulk load Parquet or CSV files from OneLake into Warehouse tables
COPY INTO dbo.Sales
FROM 'https://onelake.dfs.fabric.microsoft.com/.../Files/sales.parquet'
WITH (FILE_TYPE = 'PARQUET');

How do Delta Lake tables work in Fabric Lakehouses?

Delta Lake:
→ Open-source storage layer that adds ACID transactions to data lakes
→ All Lakehouse Tables are Delta Lake tables in Fabric
→ File format: Parquet data files + _delta_log (transaction log as JSON)

Delta Lake key features:
ACID transactions:   reads and writes are atomic — no partial data
Schema enforcement:  reject data that doesn't match table schema
Schema evolution:    add/rename columns without rewriting data
Time travel:         query historical versions of the table
  SELECT * FROM lakehouse.Sales VERSION AS OF 5  -- version 5
  SELECT * FROM lakehouse.Sales TIMESTAMP AS OF '2025-01-01'
Upserts (MERGE):     update + insert in one operation

Creating and writing Delta tables in Notebooks:
# PySpark — write DataFrame as Delta table:
df.write.format("delta").mode("overwrite").saveAsTable("Sales")

# Append mode:
df.write.format("delta").mode("append").saveAsTable("Sales")

# Upsert (MERGE):
from delta.tables import DeltaTable
deltaTable = DeltaTable.forName(spark, "Sales")
deltaTable.alias("target").merge(
  updates_df.alias("source"),
  "target.SaleId = source.SaleId"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

Delta maintenance:
OPTIMIZE: compacts small files into larger Parquet files (better query performance)
  OPTIMIZE lakehouse.Sales
  OPTIMIZE lakehouse.Sales ZORDER BY (Region, SaleDate)  -- sort for better pruning

VACUUM: removes old files no longer needed (frees storage)
  VACUUM lakehouse.Sales RETAIN 168 HOURS  -- keep 7 days of history

V-ORDER: Fabric-specific write-time optimisation for faster reads
  Automatically applied when writing Delta tables in Fabric
  Uses sorting, row groups, and compression for Power BI performance

What is the Lakehouse SQL Analytics Endpoint?

SQL Analytics Endpoint:
→ Auto-generated, read-only T-SQL interface over all Delta tables
  in a Lakehouse — no configuration needed
→ Every Lakehouse gets one automatically
→ Accessible via: SQL Server Management Studio (SSMS), Azure Data Studio,
                  any JDBC/ODBC tool using the connection string

What you can do:
→ Query Delta tables with T-SQL (SELECT, JOINs, aggregations)
→ Create SQL views, stored procedures, functions
→ Share with business users for SQL-based reporting
→ Connect from Power BI Desktop (DirectQuery or Import)

What you cannot do:
→ INSERT, UPDATE, DELETE on Lakehouse tables from SQL endpoint
  (data modifications only via Spark notebooks or pipelines)
→ CREATE TABLE with new data (read-only for tables created by Spark)

Cross-item queries (Fabric virtual network of items):
-- Query Lakehouse table from Warehouse:
SELECT s.*, c.CustomerName
FROM [Sales_Lakehouse].[dbo].[FactSales] s
JOIN [DimCustomers_Warehouse].[dbo].[Customer] c
  ON s.CustomerId = c.CustomerId

-- Cross-workspace query:
SELECT * FROM [OtherWorkspace_Lakehouse].[dbo].[FactOrders]
-- Requires appropriate permissions on the other workspace

What is the Fabric Data Warehouse and how does it differ from Lakehouse?

Feature              Lakehouse                    Data Warehouse
Primary compute:     Apache Spark (notebooks)    T-SQL (SQL engine)
DML support:         Spark only (not SQL EP)      Full T-SQL (INSERT/UPDATE/DELETE)
Schema:              Schema-on-read (flexible)   Schema-on-write (enforced)
Best for:            Big data, ML, raw data       BI reporting, structured analytics
Tables format:       Delta Lake in OneLake         Delta Lake in OneLake (same!)
SQL endpoint:        Read-only (SQL analytics EP) Full read/write T-SQL
Stored procedures:   No                           Yes
Cross-DB queries:    Via shortcuts                Via cross-database T-SQL
Streaming ingest:    Via Spark notebooks          Via pipelines or COPY INTO
Star schema:         Can implement                Recommended approach

Fabric Warehouse T-SQL capabilities:
→ CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION
→ INSERT, UPDATE, DELETE, MERGE (full DML)
→ Distributed transactions across warehouse tables
→ Column-level security and row-level security
→ Automatic statistics maintenance for query optimisation
→ Cross-database queries: query Lakehouse SQL endpoint tables
→ COPY INTO: bulk load from OneLake files

When to use Warehouse over Lakehouse:
→ Traditional BI/DWH team comfortable with T-SQL
→ Need stored procedures and full DML for complex ETL
→ Strict schema enforcement required
→ Multiple data consumers using BI/reporting tools via SQL
→ Existing SQL Server/Synapse Analytics migration

4. Implement and Manage Semantic Models (20–25%)

What is a Fabric semantic model (Power BI dataset)?

Semantic model:
→ The business logic layer on top of data — defines metrics, relationships,
  hierarchies, and calculations for business users and reports
→ All Power BI reports connect to a semantic model
→ Every Lakehouse and Warehouse auto-creates a default semantic model
→ Custom semantic models can be created manually in Power BI Desktop
  and published to Fabric workspace

Semantic model components:
Tables:        imported or DirectQuery/DirectLake views of data
Relationships: define how tables join (one-to-many, many-to-many)
Measures:      DAX calculations (Total Sales, YoY Growth, Profit Margin)
Calculated columns: row-level computed values in a table
Hierarchies:   drill paths (Year → Quarter → Month → Day)
Perspectives:  subset views of the model for different audiences
Row-Level Security (RLS): data access filters per user/role

Connectivity modes:
Import:      data copied INTO the semantic model (fast, but has refresh lag)
DirectQuery: queries go to source on each visual interaction (live, slower)
DirectLake:  Fabric-only: reads Delta files directly — combines speed
             of Import with freshness of DirectQuery
Composite:   mix of Import and DirectQuery tables in one model

What is DirectLake mode and why is it a game-changer?

DirectLake (Fabric-exclusive):
→ Power BI reads Delta Parquet files from OneLake directly
→ No data import, no DirectQuery round-trips to a database
→ Transcoding: Delta Parquet → in-memory columnar format on first access
→ Parquet data stays hot in memory — subsequent queries serve from memory

Performance comparison:
Import:       Fast (in-memory), but data is stale until next refresh
DirectQuery:  Always fresh, but every visual = live SQL query (slower)
DirectLake:   Fast AS Import (in-memory after first load) AND fresh
              AS DirectQuery (always reads latest Delta files)

How DirectLake works:
1. Delta table updated (by pipeline, notebook, or streaming)
2. Power BI report loads — no import/refresh needed
3. On query: Fabric engine reads Delta files, transcodes to memory
4. Subsequent queries: served from in-memory transcoded data
5. When Delta table updated again: cache invalidated, re-read on next query

DirectLake limitations (when it falls back to DirectQuery):
→ Complex DAX not optimisable in DirectLake engine
→ Non-Delta data sources
→ Exceeds row/column limits for the capacity tier

DirectLake framing (parquet file optimisation):
→ Framing: Fabric creates an optimised version of Delta table for DL
→ Run manually: Semantic model → Refresh → Frame table
→ V-ORDER on write automatically helps DirectLake performance
→ Avoid excessively wide tables (many columns) — increases frame size

What are the key DAX patterns for DP-600?

Measures vs Calculated Columns:
Calculated column:  evaluated row by row at data refresh time
                    stored in the model, increases model size
                    use for: row-level values used in slicers/filters
Measure:            evaluated at query time based on filter context
                    not stored, always calculated on the fly
                    use for: aggregations, KPIs, dynamic calculations
                    ALWAYS prefer measures over calculated columns

Common DAX functions tested:

CALCULATE — modify filter context:
Total Sales EU = CALCULATE([Total Sales], Region[Region] = "EU")
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))

FILTER — row-by-row table filter:
High Value Sales = CALCULATE([Total Sales],
  FILTER(Sales, Sales[Amount] > 10000))

SUMX / AVERAGEX / MAXX (iterator functions):
Weighted Avg Price = DIVIDE(
  SUMX(Sales, Sales[Quantity] * Sales[UnitPrice]),
  SUM(Sales[Quantity]))

Time intelligence:
YTD Sales = TOTALYTD([Total Sales], Date[Date])
MTD Sales = TOTALMTD([Total Sales], Date[Date])
QTD Sales = TOTALQTD([Total Sales], Date[Date])
YoY Growth = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
Rolling 12M = CALCULATE([Total Sales],
  DATESINPERIOD(Date[Date], LASTDATE(Date[Date]), -12, MONTH))

ALLSELECTED — include slicer filters but remove model filters:
Market Share = DIVIDE([Total Sales],
  CALCULATE([Total Sales], ALLSELECTED(Product[Category])))

Window functions (DAX 2023+):
Rank = RANKX(ALL(Product[Product]), [Total Sales],, DESC)
Running Total = CALCULATE([Total Sales],
  WINDOW(1, ABS, 0, REL, ORDERBY(Date[Date])))

Variables (always use for readability and performance):
YoY % =
VAR CurrentSales = [Total Sales]
VAR PriorYearSales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
VAR YoYGrowth = DIVIDE(CurrentSales - PriorYearSales, PriorYearSales)
RETURN
  IF(ISBLANK(PriorYearSales), BLANK(), YoYGrowth)

What are calculation groups and when do you use them?

Calculation groups:
→ Reusable sets of DAX calculations applied to any measure in the model
→ Solve the "many measures × many time periods" explosion
→ Defined in Tabular Editor (external tool — not native in Power BI Desktop)

Without calculation groups:
Sales Current Month, Sales Prior Month, Sales YoY, Sales YTD
Profit Current Month, Profit Prior Month, Profit YoY, Profit YTD
Revenue Current Month, Revenue Prior Month, Revenue YoY, Revenue YTD
→ 12 measures! For 10 base measures: 40 measures to maintain

With calculation groups:
Time Intelligence calculation group with items:
  Current:      SELECTEDMEASURE()  -- passes through whatever measure is selected
  Prior Month:  CALCULATE(SELECTEDMEASURE(), PREVIOUSMONTH(Date[Date]))
  YoY:          CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR(Date[Date]))
  YTD:          TOTALYTD(SELECTEDMEASURE(), Date[Date])

→ 4 calculation items × any number of base measures = instant time intelligence
→ User selects measure (Sales) + time item (YoY) → sees Sales YoY automatically

Other calculation group use cases:
→ Currency conversion: apply FX rates to any financial measure
→ Format strings: dynamic number formatting per calculation item
→ Scenario comparison: Actual vs Budget vs Forecast for any metric

Precedence:
Multiple calculation groups applied to one measure:
  Time Intelligence (precedence 10) applied first
  Currency (precedence 5) applied second
  Lower precedence number = applied later (outer context)

What is Row-Level Security (RLS) in semantic models?

Row-Level Security (RLS):
→ Restricts which data rows a user can see in reports
→ Defined as roles with DAX filter expressions in the semantic model
→ Works in Power BI service — enforced by the service (users cannot bypass)

Static RLS:
Create role "UK Sales" with table filter:
  Geography[Country] = "UK"
→ Users assigned to "UK Sales" role see ONLY UK rows

Dynamic RLS (most common for enterprise):
Create role "Employee Data Filter" with table filter:
  SalesPerson[Email] = USERPRINCIPALNAME()
→ User alice@contoso.com sees only rows where SalesPerson email = alice@contoso.com
→ One role definition, automatically filters per signed-in user

RLS implementation steps:
1. Power BI Desktop: Modelling → Manage Roles → New Role → add DAX filter
2. Publish to Fabric workspace
3. Semantic model → Security → assign Entra ID users/groups to each role
4. Test: Semantic model → Security → "Test as role" to verify filter

Object-Level Security (OLS):
→ Hide entire tables or columns from specific roles
→ Hidden objects don't appear in the model at all for restricted users
→ Defined in Tabular Editor (not Power BI Desktop)

RLS vs OLS:
RLS: controls WHICH ROWS a user sees
OLS: controls WHICH TABLES/COLUMNS a user sees
Use both together for comprehensive data security

5. Explore and Analyze Data (20–25%)

How do you query data in Fabric using SQL, KQL, and Notebooks?

T-SQL in Fabric Warehouse or Lakehouse SQL endpoint:
-- Aggregation:
SELECT
  d.Year,
  d.Quarter,
  p.Category,
  SUM(f.SalesAmount) AS TotalSales,
  COUNT(DISTINCT f.CustomerId) AS UniqueCustomers
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Quarter, p.Category
ORDER BY d.Year, d.Quarter, TotalSales DESC;

-- Window function:
SELECT
  ProductName,
  SalesAmount,
  RANK() OVER (PARTITION BY Category ORDER BY SalesAmount DESC) AS RankInCategory,
  SUM(SalesAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM FactSales;

-- CROSS DATABASE query (Lakehouse → Warehouse):
SELECT w.ProductKey, w.UnitPrice, l.StockLevel
FROM [SalesWarehouse].[dbo].[DimProduct] w
JOIN [InventoryLakehouse].[dbo].[StockLevels] l
  ON w.ProductKey = l.ProductKey;

KQL in KQL Database or Real-Time Analytics:
// Query telemetry data:
SensorReadings
| where Timestamp > ago(1h)
| where DeviceId startswith "factory-"
| summarize avg(Temperature), max(Temperature), count()
    by bin(Timestamp, 5m), DeviceId
| order by Timestamp desc

// Detect anomalies:
AppLogs
| where Level == "Error"
| summarize ErrorCount=count() by bin(Timestamp, 1h), AppName
| where ErrorCount > 100  // alert threshold
| render timechart

PySpark in Notebooks:
# Read Delta table:
df = spark.read.format("delta").load("Tables/FactSales")

# Transformation:
from pyspark.sql.functions import col, sum, year, month

monthly_sales = df \
  .withColumn("Year", year(col("SaleDate"))) \
  .withColumn("Month", month(col("SaleDate"))) \
  .groupBy("Year", "Month", "ProductCategory") \
  .agg(sum("SalesAmount").alias("TotalSales")) \
  .orderBy("Year", "Month")

# Write to Gold Lakehouse:
monthly_sales.write \
  .format("delta") \
  .mode("overwrite") \
  .option("overwriteSchema", "true") \
  .saveAsTable("GoldLakehouse.MonthlySalesSummary")

What is Activator (Data Activator) in Fabric?

Microsoft Fabric Activator (formerly Data Activator):
→ No-code, event-driven alerting and automation within Fabric
→ Monitors data in real time and triggers actions when conditions are met

Key concepts:
Reflex:       the container for data monitoring rules and actions
Object:       business entity being monitored (Order, Device, Customer)
Trigger:      condition that fires an action (temperature > 80°C,
              order value > £10,000, stock level < 100 units)
Action:       what happens when trigger fires (Teams alert, Power Automate
              flow, send email, call Logic App)

Data sources for Activator:
→ Power BI report visuals (monitor a KPI on a report)
→ Eventstream (real-time streaming data)
→ Fabric Real-Time Analytics KQL query results

Example use cases:
→ Alert operations team via Teams when IoT temperature exceeds threshold
→ Notify sales manager when a high-value deal is marked as Lost in CRM
→ Trigger Power Automate reorder flow when inventory drops below min level
→ Alert finance when daily spend exceeds budget threshold

Integration:
→ Activator natively integrates with Teams, Outlook, Power Automate
→ Accessible in Fabric: New item → Activator (Reflex)

How do you implement data lineage and impact analysis in Fabric?

Lineage view in Fabric:
→ Visual map showing data flow from source to report
→ Access: Workspace → Lineage view (top-right button)
→ Shows: source → Dataflow/Pipeline → Lakehouse → Semantic Model → Reports

Impact analysis:
→ Before modifying a Lakehouse table or Warehouse view, check impact
→ Right-click a Fabric item → Impact analysis
→ Shows: all downstream items that depend on the selected item
→ Critical before: renaming tables, changing column types, deleting data

Scenarios requiring impact analysis:
→ Renaming a column in a Lakehouse table:
  Impact analysis → shows which semantic models use this column
  → update semantic models and reports before renaming
→ Modifying a dataflow output:
  Impact analysis → shows which Lakehouses and semantic models
  receive output from this dataflow
→ Deleting a data source connection:
  Impact analysis → shows all pipelines and dataflows using the connection

Microsoft Purview integration:
→ Fabric workspaces automatically scanned by Purview
→ Data map: lineage tracked across Fabric, Azure Data Factory, ADLS
→ Sensitivity labels: propagate from Purview to Fabric items
→ Data catalogue: business glossary terms linked to Fabric tables
→ Access governance: classify and govern Fabric data assets in Purview

6. Scenario-Based Questions

Scenario: Design a medallion lakehouse for a retail company's sales analytics platform.

Requirements: ingest from 5 source systems, daily batch + real-time,
Power BI reports for 500 business users, 3-year data history

Architecture:
SOURCE SYSTEMS:
  SQL Server (ERP), Salesforce CRM, Azure Event Hubs (clickstream),
  SharePoint (product catalogue), REST API (weather data)

BRONZE Lakehouse — Raw Zone:
  Data Pipeline (Copy Data):
    → SQL Server → incremental CDC → Delta tables
    → Salesforce → Dataflow Gen2 → CSV → Bronze Files
  Eventstream:
    → Azure Event Hubs → Eventstream → Bronze Lakehouse Delta table
  Notebooks schedule (daily):
    → REST API weather data → JSON files → Bronze Files

SILVER Lakehouse — Cleansed Zone:
  Notebooks (PySpark, scheduled by Pipeline):
    → Deduplicate customer records (fuzzy match on email/phone)
    → Standardise: date formats, currency, country codes
    → Validate: null checks, referential integrity, range checks
    → Upsert MERGE → Silver Delta tables
  Schema: FactSales (raw), DimCustomer (cleansed), DimProduct (cleansed)

GOLD Lakehouse — Serving Zone:
  Notebooks (PySpark):
    → Build star schema: FactSales, DimCustomer, DimProduct, DimDate
    → Aggregations: DailySalesSummary, MonthlyRegionalSales
    → Apply OPTIMIZE + ZORDER for query performance
    → V-ORDER enabled automatically in Fabric

SEMANTIC MODEL (DirectLake):
  → Connect to Gold Lakehouse via DirectLake mode
  → Define: measures (Total Sales, YoY Growth, Customer LTV)
  → RLS: regional managers see only their region's data
  → Calculation groups: Current / Prior Period / YTD / Rolling 12M

POWER BI REPORTS:
  → Executive dashboard (summary KPIs)
  → Regional sales report (filtered by RLS)
  → Real-time clickstream dashboard (DirectQuery to KQL Database)

Scenario: A DirectLake semantic model is running slowly. How do you diagnose and fix it?

  1. Check fallback to DirectQuery: Open Performance Analyzer in Power BI Desktop. If queries show "Direct Query" instead of "Storage Engine", DirectLake has fallen back. Common causes: complex DAX, non-Delta tables, unsupported features.

  2. Check V-ORDER: Fabric applies V-ORDER on Delta writes automatically. Verify the table was written using Spark in Fabric (not copied from external Delta without V-ORDER). Re-write the table via a Fabric notebook to apply V-ORDER.

  3. Run OPTIMIZE: small Delta files slow DirectLake framing.

    # In a Fabric notebook:
    spark.sql("OPTIMIZE GoldLakehouse.FactSales ZORDER BY (DateKey, ProductKey)")
    
  4. Reduce table width: DirectLake reads all columns in a table into memory. Remove unused columns from the Lakehouse table. Don't include columns that aren't needed in the semantic model.

  5. Check capacity: if the Fabric capacity is undersized (F2 or F4), increase to F8 or F16 — DirectLake in-memory transcoding is CPU-bound.

  6. Review DAX measures: complex measures force DirectLake to DirectQuery fallback. Simplify measures, use variables, pre-aggregate data in Gold layer rather than computing in DAX.

  7. Frame the tables: manually trigger framing for large tables: Semantic model → Settings → Scheduled refresh → Refresh now (includes framing).


Scenario: How do you implement a near real-time analytics solution for IoT sensor data?

Architecture:
IoT Devices
  ↓
[Azure IoT Hub]
  ↓
[Fabric Eventstream]
  ↓ (fan-out to multiple destinations simultaneously)
  ├── [KQL Database] ← real-time querying (last 30 days hot data)
  │     ↓
  │   [Activator Reflex] ← alert when temperature > 85°C
  │     ↓
  │   Teams alert to Operations team
  │
  └── [Lakehouse (Bronze)] ← all raw events for historical analysis
        ↓ (hourly Notebook job)
      [Lakehouse (Silver)] ← cleansed, validated sensor readings
        ↓ (daily Notebook job)
      [Lakehouse (Gold)] ← aggregated: hourly avg/max per device/location
        ↓
      [Semantic Model (DirectLake)] ← Power BI operational dashboard
        ↓
      [Power BI Real-Time Dashboard] ← auto-refresh every 5 minutes

KQL real-time query:
SensorReadings
| where Timestamp > ago(1h)
| summarize
    AvgTemp = avg(Temperature),
    MaxTemp = max(Temperature),
    AlertCount = countif(Temperature > 85)
  by bin(Timestamp, 5m), DeviceId, Location
| order by Timestamp desc

Activator trigger:
Object: DeviceId
Monitor: MaxTemp every 1 minute from Eventstream
Trigger: when MaxTemp > 85 for 3 consecutive readings
Action: Send Teams message to #ops-alerts channel

Scenario: Migrate a traditional Power BI Premium dataset to a Fabric semantic model with DirectLake.

Current state:
→ Power BI Premium workspace with large Import dataset
→ 50GB semantic model, 3-hour daily refresh window
→ 8 reports connected, 200 daily users
→ Data source: Azure Synapse Analytics SQL Pool

Target state:
→ Fabric workspace with DirectLake semantic model
→ No scheduled refresh needed — always current
→ Same reports and users, same RLS rules

Migration steps:

Step 1 — Set up Fabric workspace:
→ Create Fabric workspace on F64 capacity (equivalent to P1)
→ Configure Git integration with Azure DevOps

Step 2 — Create Fabric Lakehouse:
→ New Lakehouse "SalesLakehouse" in the workspace
→ Create Shortcut to the Synapse SQL Pool data
  OR
→ Create Data Pipeline: Synapse SQL → Lakehouse Delta tables (initial load)
→ Incremental pipeline: daily CDC from Synapse → Lakehouse (MERGE)

Step 3 — Optimise Lakehouse tables:
→ Notebook: OPTIMIZE + ZORDER on all large fact tables
→ Verify V-ORDER enabled (default in Fabric)
→ Apply partitioning on date columns for FactSales

Step 4 — Create DirectLake semantic model:
→ Fabric workspace → New semantic model
→ Select Lakehouse tables
→ Import existing DAX measures from old .pbix file
→ Re-create RLS roles (DAX filters identical)
→ Re-create calculation groups in Tabular Editor

Step 5 — Reconnect reports:
→ Each report: Change data source → new Fabric semantic model
→ Verify all visuals render correctly with DirectLake

Step 6 — Validate and switch:
→ Run Query Performance Analyzer — confirm DirectLake (not DirectQuery)
→ Compare report output: DirectLake vs old Import model (data parity check)
→ Update app/workspace access for 200 users
→ Decommission old Power BI Premium workspace

Benefits achieved:
→ Eliminated 3-hour refresh window — data always fresh
→ Reduced model size (no data import — reads Delta files directly)
→ Reduced costs: one Fabric capacity covers all workloads

7. Cheat Sheet — Quick Reference

DP-600 Exam Domain Quick Reference

Domain                                    Weight   Key Topics
Plan, implement & manage (10-15%)         Low      Workspaces, Git, Deployment
                                                   Pipelines, Security, Purview
Prepare and serve data (40-45%)           HIGHEST  Lakehouse, Warehouse, Delta,
                                                   Pipelines, Dataflows, Mirroring,
                                                   Medallion architecture
Implement & manage semantic models (20-25%) High   DAX, DirectLake, RLS, OLS,
                                                   Calculation groups, Relationships
Explore and analyze data (20-25%)         High     SQL, KQL, Spark notebooks,
                                                   Lineage, Impact analysis, Activator

Fabric Item Selection Guide

Need to...                                      → Use
Ingest streaming data in real time              → Eventstream
Query time-series/log data with KQL             → KQL Database
Big data transformation with Spark/Python       → Lakehouse + Notebooks
SQL-based ETL with full DML                     → Data Warehouse
Low-code ETL from 150+ sources                  → Dataflow Gen2
Orchestrate multi-step data workflows           → Data Pipeline
No-code alerting when data conditions met       → Activator
Replicate operational DB data to Fabric         → Mirroring
Create business metrics and KPIs                → Semantic Model
Self-service BI for business users              → Power BI Reports
Always-fresh Power BI (no scheduled refresh)    → DirectLake semantic model

Delta Lake Quick Reference

Write modes:
overwrite     → replace all data
append        → add new rows (no deduplication)
merge/upsert  → update existing + insert new (no duplicates)

Read historical data (time travel):
df = spark.read.format("delta").option("versionAsOf", 5).load(...)
df = spark.read.format("delta").option("timestampAsOf", "2025-01-01").load(...)

Maintenance:
OPTIMIZE tableName                 → compact small files
OPTIMIZE tableName ZORDER BY (col) → sort for query pruning
VACUUM tableName RETAIN 168 HOURS  → clean old files (keep 7-day history)

V-ORDER: Fabric automatically applies on write → faster DirectLake reads

DAX Quick Reference

Key patterns:
Total Sales = SUM(FactSales[SalesAmount])
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
YoY % = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
YTD = TOTALYTD([Total Sales], Date[Date])
Running Total = CALCULATE([Total Sales], DATESYTD(Date[Date]))
Market Share = DIVIDE([Total Sales], CALCULATE([Total Sales], ALL(Product)))
Dynamic = SELECTEDMEASURE()  ← inside calculation group item

Variables (always use):
Measure =
VAR x = [Base Measure]
VAR y = CALCULATE([Base Measure], PREVIOUSMONTH(Date[Date]))
RETURN DIVIDE(x - y, y)

Measures vs Calculated Columns:
Calculated columns: row-level, at refresh, stored → use sparingly
Measures:           query-time, dynamic, not stored → always prefer

Security Layers Summary

Layer                   Scope                   Tool
Workspace roles         All items in workspace  Entra ID groups → Admin/Member/
                                                Contributor/Viewer
Item-level sharing      Specific item           Share individual item
OneLake data access     Specific folder/table   OneLake data access roles (preview)
Semantic model RLS      Row-level data filter   DAX roles in semantic model
Semantic model OLS      Table/column hiding     Tabular Editor roles
Warehouse column sec.   Column-level access     T-SQL GRANT/DENY
Sensitivity labels      Item classification     Microsoft Purview labels

Medallion Architecture Summary

Layer    Storage          Purpose              Quality
Bronze   Files/Delta      Raw as-received      Untransformed
Silver   Delta tables     Cleaned, validated   Deduplicated, typed
Gold     Delta tables     Business-ready       Aggregated, star schema

Compute per layer:
Bronze: Pipelines (Copy Data) + Eventstream
Silver: Notebooks (PySpark MERGE) + Dataflows Gen2
Gold:   Notebooks (aggregations, star schema) + Pipelines

Performance optimisation:
→ OPTIMIZE + ZORDER on Gold tables (query pruning)
→ V-ORDER on all writes (DirectLake speed)
→ Partition large tables by date column
→ VACUUM regularly (remove old Delta files)

Top 10 Tips

  1. Prepare and serve data is 40–45% of the exam — the largest domain by far. Prioritise: medallion architecture, Delta Lake operations (OPTIMIZE, VACUUM, MERGE, time travel), Lakehouse vs Warehouse decisions, and all ingestion patterns (Pipelines, Dataflows, Notebooks, Mirroring, Eventstream).

  2. DirectLake is Fabric's biggest innovation — it combines Import speed with DirectQuery freshness by reading Delta Parquet files directly. Know: when it falls back to DirectQuery, how V-ORDER helps, how to optimise tables for DirectLake, and how framing works.

  3. OneLake unifies everything — all Fabric items (Lakehouse, Warehouse, KQL DB) store data in OneLake as Delta Parquet. The same physical files are accessible from all Fabric experiences. Shortcuts let you reference external data without copying it.

  4. Measures over calculated columns always — calculated columns are stored in the model and computed at refresh time. Measures are computed at query time and are dynamic. Using calculated columns where measures should be used is a common design mistake.

  5. DAX variables improve both readability and performance — always use VAR to avoid evaluating the same expression multiple times. Variables are evaluated once in the current filter context — referencing them multiple times doesn't re-evaluate.

  6. Calculation groups solve measure proliferation — instead of creating YTD/YoY/Prior Period versions of every measure, one calculation group applies those calculations to any measure dynamically. This is a senior analytics engineer pattern.

  7. RLS is enforced in the service, not the model — users cannot bypass RLS through Power BI Desktop or external tools once published. Dynamic RLS using USERPRINCIPALNAME() is the enterprise pattern — one role definition scales to all users.

  8. Impact analysis before modifying anything — always run impact analysis before renaming columns, changing table schemas, or deleting items. Unplanned changes cascade to semantic models and reports, breaking downstream consumers.

  9. Git integration + Deployment Pipelines together — Git provides source control and code review (PRs). Deployment Pipelines manage Dev → Staging → Production promotion. Using both together is the enterprise Fabric governance best practice.

  10. Know when to use Lakehouse vs Warehouse — Lakehouse (Spark + flexible schema) for big data, ML, and raw/intermediate layers. Warehouse (T-SQL + full DML) for structured BI, stored procedures, and when your team is SQL-native. The Gold layer can use either — choose based on team skills and query patterns.



Featured Post

Microsoft Azure Solutions Architect Expert (AZ-305) Complete Guide

Microsoft Azure Solutions Architect Expert (AZ-305) — Complete Guide Well-Architected Framework · Identity & Governance · Data Storage ...

Popular posts