Sunday, June 7, 2026

The AI Buzzword Glossary: From Agents to Agentic AI

The AI Buzzword Glossary: From Agents to Agentic AI

Every week brings a fresh wave of AI terminology. “Agentic AI,” “RAG,” “MCP,” “context engineering” — the words pile up faster than anyone can keep track of them, and half the time they’re used loosely or interchangeably.

This post is a no-jargon reference. Each term gets a plain-English definition (what it is), a quick real-world example, and a short note on what it’s for — so you can read a vendor blog or a conference talk without nodding along to words you don’t actually use yourself.

The terms are grouped by theme so related ideas sit together.


1. Core Agent Terminology

This is the cluster everyone is talking about — the shift from chatbots that answer to systems that act.

TermWhat it isWhat it’s for
AI AgentAn LLM-powered system that can take actions — call tools, hit APIs, run code — to accomplish a goal, not just produce texte.g., an agent that reads a support ticket, looks up the order, and issues a refundAutomating multi-step tasks like data lookups, ticket resolution, or bookings
Agentic AIThe broader paradigm where AI plans, decides, executes, and self-corrects with some degree of autonomye.g., a system that books a trip end-to-end — comparing flights, choosing, and reservingThe umbrella term for moving from “assistant that talks” to “system that does”
Autonomous AgentAn agent that runs with minimal or no human input, looping until its goal is mete.g., a monitor that watches logs overnight and opens a ticket when it spots an anomalyBackground jobs, monitoring, “set and forget” workflows
Multi-Agent System (MAS)Several specialized agents collaborating, each owning a sub-taske.g., a research agent hands findings to a writer agent, which hands copy to an editor agentComplex workflows where one agent coordinates others (planner + researcher + writer)
OrchestrationThe control layer that decides which agent, tool, or model handles each stepe.g., routing a customer query to the billing agent vs. the tech-support agent“Who does what next” in a multi-agent flow
Copilot / AssistantA human-in-the-loop agent embedded in an app that suggests or assists rather than acting alonee.g., code suggestions that appear inline as you typeProductivity inside everyday tools

2. Architecture & Reasoning Patterns

How agents are actually wired up under the hood.

TermWhat it isWhat it’s for
RAG (Retrieval-Augmented Generation)Fetching relevant documents or data and feeding them to the model before it answerse.g., a chatbot that pulls from your policy document before answering an HR questionGrounding answers in your own data; reducing hallucination
GroundingTying model output to authoritative source datae.g., an answer that links to the exact document the fact came fromTrust, citations, accuracy
Chain-of-Thought (CoT)Prompting the model to reason step-by-step before answeringe.g., showing the working before giving the final answer to a word problemBetter results on math, logic, and multi-step problems
ReAct (Reason + Act)A pattern where the model alternates between reasoning and tool-callinge.g., “I need today’s weather” → calls the weather API → then answersThe backbone of most agent loops
Reasoning ModelA model trained to “think” longer before respondinge.g., spending extra time to work through a tricky coding bugHard problems that need deliberation
Tool Use / Function CallingThe model invoking external functions or APIs in a structured waye.g., calling getWeather(“Bengaluru”) instead of guessing the temperatureHow agents actually do things rather than just describe them
Context EngineeringDeliberately designing everything that goes into the model’s context windowe.g., choosing which past messages and documents to include in a long support chatThe evolution of “prompt engineering” for agents

3. Models & Foundations

The engines that power everything above.

TermWhat it isWhat it’s for
LLM (Large Language Model)A model trained on massive text to generate languagee.g., the model behind a chat assistant that drafts an emailThe core engine of most GenAI
SLM (Small Language Model)A compact, cheaper, faster modele.g., a small model running on a phone to summarize notes offlineOn-device, low-latency, cost-sensitive tasks
Foundation ModelA large pre-trained model reusable across many taskse.g., one base model adapted for chat, coding, and searchThe base you build or fine-tune on
Frontier ModelThe newest, most capable models at the cutting edgee.g., the latest flagship model topping the benchmarksBenchmark-leading and demanding workloads
MultimodalHandles a mix of text, images, audio, and videoe.g., asking a question about a photo you uploadedDocument understanding, vision tasks, voice
GenAI (Generative AI)Umbrella term for AI that creates contente.g., generating text, images, code, or audio from a promptThe whole category

4. Data & Memory

Where agents get their knowledge — and how they remember.

TermWhat it isWhat it’s for
EmbeddingsNumeric vector representations of text or data that capture meaninge.g., “cancel my order” and “I want a refund” land close together as similarSemantic search, similarity, RAG retrieval
Vector DatabaseA store optimized for embeddingse.g., a search index that finds the most relevant document chunks for a questionFast semantic lookup that powers RAG
Knowledge SourceThe grounding data an agent draws one.g., a document library the agent searches before answeringAccurate, domain-specific answers
MemoryPersistence of context across turns or sessionse.g., the assistant remembering your name and preferences across chatsPersonalization and long-running agents

5. Protocols & Interoperability

The newest layer — standards that let agents and tools talk to each other without custom glue code.

TermWhat it isWhat it’s for
MCP (Model Context Protocol)An open standard introduced in late 2024 for connecting AI models to external tools and data sourcese.g., linking an assistant to your files and calendar through one standardPlug-and-play connectors — often described as “USB-C for AI tools”
A2A (Agent2Agent)An open protocol introduced in 2025 (now under the Linux Foundation) for agents to discover, message, and delegate to one anothere.g., a travel agent handing a payment task to a separate payments agentCross-vendor collaboration between agents
Connector / PluginA packaged integration giving an agent access to a specific systeme.g., a connector that lets an agent read tickets from a service-desk toolExtending agents to enterprise systems
MCP vs. A2A in one line: MCP connects an agent to its tools and data; A2A connects an agent to other agents. They’re complementary, not competing.

6. Build & Tuning

The practical side — making, shaping, and trusting models.

TermWhat it isWhat it’s for
Fine-tuningFurther training a model on your own datae.g., training on past support replies so the model matches your toneSpecialized tone, domain, or behavior
Prompt EngineeringCrafting effective instructionse.g., adding “answer in three bullet points” to control the formatSteering model output
InferenceRunning the model to produce output (as opposed to training it)e.g., the compute cost charged each time you send a requestThe “cost per use” side of the equation
GuardrailsSafety and policy filters around inputs and outputse.g., blocking the model from revealing personal or sensitive dataPreventing harmful or off-policy responses
Evals (Evaluations)Systematic testing of model or agent qualitye.g., running 100 test questions to check accuracy before launchMeasuring accuracy and catching regressions
HallucinationConfident but wrong or fabricated outpute.g., the model inventing a citation or statistic that doesn’t existThe core problem that grounding and RAG try to solve

How It All Fits Together

If you read the tables top to bottom, a rough mental model emerges:

  1. A foundation model (an LLM) provides the raw intelligence.
  2. Wrap it in a reasoning pattern like ReAct, give it tool use, and you have an agent.
  3. Ground it with RAG over a vector database so it answers from real data.
  4. Connect it to systems via MCP, and to other agents via A2A.
  5. Coordinate several of them through orchestration to build a multi-agent system.
  6. Keep it honest with guardrails and measure it with evals.

That progression — from a single model to a coordinated, tool-using, self-correcting system — is essentially what people mean when they say agentic AI.


Know a buzzword that should be on this list? The vocabulary keeps growing — this glossary will too.

Wednesday, June 3, 2026

How to Create a Self-Signed Certificate and Upload It to Azure App Registration Using PowerShell

How to Create a Self-Signed Certificate and Upload It to Azure App Registration Using PowerShell

Securing your Azure app with certificate-based authentication is a best practice — especially when building automation with Microsoft Graph, Exchange Online, or Teams. In this post, I'll walk you through how to create a self-signed certificate using PowerShell on your local machine and upload it to an Azure App Registration in Microsoft Entra ID.


Why Use Certificate-Based Authentication?

Certificate-based authentication is more secure than client secrets because:

  • Certificates are harder to leak accidentally (no plain-text secret string)
  • They support expiry enforcement with a clear renewal cycle
  • They work seamlessly with GitHub Actions, Azure Automation, and PowerShell scripts
  • They are the recommended approach for unattended/service account automation in M365

Prerequisites

  • Windows machine with PowerShell 5.1+ or PowerShell 7+
  • Access to Azure Portal with permissions to manage App Registrations
  • An existing App Registration in Microsoft Entra ID (or create a new one)

Step 1 — Create the Self-Signed Certificate

Open PowerShell as Administrator and run the following script to generate a self-signed certificate and store it in your local certificate store.

# ─────────────────────────────────────────────────────────────
# STEP 1: Create Self-Signed Certificate
# ─────────────────────────────────────────────────────────────

$certName   = "MyAppAuthCert"
$validYears = 2

# Generate the certificate in CurrentUser\My store
$cert = New-SelfSignedCertificate `
    -Subject           "CN=$certName" `
    -CertStoreLocation "Cert:\CurrentUser\My" `
    -KeyExportPolicy   Exportable `
    -KeySpec           Signature `
    -KeyLength         2048 `
    -KeyAlgorithm      RSA `
    -HashAlgorithm     SHA256 `
    -NotAfter          (Get-Date).AddYears($validYears)

# Confirm creation
Write-Host "✅ Certificate created successfully!" -ForegroundColor Green
Write-Host "   Subject    : $($cert.Subject)"
Write-Host "   Thumbprint : $($cert.Thumbprint)"
Write-Host "   Valid Until: $($cert.NotAfter)"

Note: The certificate is stored in Cert:\CurrentUser\My. You can verify it by opening certmgr.msc → Personal → Certificates.


Step 2 — Export the Public Key (.cer)

The App Registration only needs the public key in .cer format. No password is required for this export.

# ─────────────────────────────────────────────────────────────
# STEP 2: Export Public Key (.cer) for Azure Portal Upload
# ─────────────────────────────────────────────────────────────

$cerPath = "$env:USERPROFILE\Desktop\$certName.cer"

Export-Certificate `
    -Cert     "Cert:\CurrentUser\My\$($cert.Thumbprint)" `
    -FilePath $cerPath

Write-Host "✅ Public key exported!" -ForegroundColor Green
Write-Host "   Path: $cerPath"

Step 3 — Export the Private Key (.pfx)

Keep your .pfx file safe — this is used by your PowerShell scripts and automation pipelines to authenticate.

# ─────────────────────────────────────────────────────────────
# STEP 3: Export Private Key (.pfx) for Script Authentication
# ─────────────────────────────────────────────────────────────

$pfxPassword = Read-Host -Prompt "Enter PFX password" -AsSecureString
$pfxPath     = "$env:USERPROFILE\Desktop\$certName.pfx"

Export-PfxCertificate `
    -Cert     "Cert:\CurrentUser\My\$($cert.Thumbprint)" `
    -FilePath $pfxPath `
    -Password $pfxPassword

Write-Host "✅ Private key (PFX) exported!" -ForegroundColor Green
Write-Host "   Path: $pfxPath"
Write-Host "⚠️  Keep this file secure. Do NOT commit it to source control." -ForegroundColor Yellow

⚠️ Security Tip: Store the .pfx file in Azure Key Vault or as a GitHub Actions secret (base64-encoded). Never commit it to a repository.


Step 4 — Upload the Certificate to Azure App Registration

  1. Sign in to the Azure Portal
  2. Navigate to Microsoft Entra IDApp registrations
  3. Select your application
  4. In the left menu, click Certificates & secrets
  5. Click the Certificates tab → Upload certificate
  6. Browse and select the .cer file exported to your Desktop
  7. Add a Description (e.g., MyAppAuthCert-2026) → click Add

You will now see the certificate listed with its Thumbprint, Start date, and Expiry date.


Step 5 — Complete PowerShell Script (All-in-One)

Here is the complete end-to-end script you can save and run directly:

# ═══════════════════════════════════════════════════════════════
# Create Self-Signed Certificate & Export for Azure App Reg
# Author : Your Name
# Version: 1.0
# Date   : June 2026
# ═══════════════════════════════════════════════════════════════

# ── Configuration ─────────────────────────────────────────────
$certName   = "MyAppAuthCert"
$validYears = 2
$exportPath = "$env:USERPROFILE\Desktop"

Write-Host "`nšŸ” Starting Certificate Creation..." -ForegroundColor Cyan

# ── Step 1: Create Self-Signed Certificate ────────────────────
$cert = New-SelfSignedCertificate `
    -Subject           "CN=$certName" `
    -CertStoreLocation "Cert:\CurrentUser\My" `
    -KeyExportPolicy   Exportable `
    -KeySpec           Signature `
    -KeyLength         2048 `
    -KeyAlgorithm      RSA `
    -HashAlgorithm     SHA256 `
    -NotAfter          (Get-Date).AddYears($validYears)

Write-Host "✅ Certificate created: $($cert.Thumbprint)" -ForegroundColor Green

# ── Step 2: Export Public Key (.cer) ─────────────────────────
$cerPath = "$exportPath\$certName.cer"

Export-Certificate `
    -Cert     "Cert:\CurrentUser\My\$($cert.Thumbprint)" `
    -FilePath $cerPath | Out-Null

Write-Host "✅ Public key exported : $cerPath" -ForegroundColor Green

# ── Step 3: Export Private Key (.pfx) ────────────────────────
$pfxPassword = Read-Host -Prompt "`nEnter a secure password for the PFX file" -AsSecureString
$pfxPath     = "$exportPath\$certName.pfx"

Export-PfxCertificate `
    -Cert     "Cert:\CurrentUser\My\$($cert.Thumbprint)" `
    -FilePath $pfxPath `
    -Password $pfxPassword | Out-Null

Write-Host "✅ Private key exported: $pfxPath" -ForegroundColor Green

# ── Summary ───────────────────────────────────────────────────
Write-Host "`n════════════════════════════════════════" -ForegroundColor Cyan
Write-Host "  Certificate Summary" -ForegroundColor Cyan
Write-Host "════════════════════════════════════════" -ForegroundColor Cyan
Write-Host "  Subject     : $($cert.Subject)"
Write-Host "  Thumbprint  : $($cert.Thumbprint)"
Write-Host "  Valid From  : $($cert.NotBefore)"
Write-Host "  Valid Until : $($cert.NotAfter)"
Write-Host "  CER Path    : $cerPath"
Write-Host "  PFX Path    : $pfxPath"
Write-Host "════════════════════════════════════════" -ForegroundColor Cyan
Write-Host "`nšŸ“Œ Next Step: Upload the .cer file to your Azure App Registration" -ForegroundColor Yellow
Write-Host "   Portal → Entra ID → App Registrations → Certificates & Secrets`n"

Step 6 — Authenticate Using the Certificate in PowerShell

Once the certificate is uploaded to the App Registration, use the thumbprint to authenticate in your automation scripts.

Microsoft Graph

# ─────────────────────────────────────────────────────────────
# Authenticate with Microsoft Graph using Certificate
# ─────────────────────────────────────────────────────────────

$tenantId   = "<your-tenant-id>"
$clientId   = "<your-app-client-id>"
$thumbprint = "<certificate-thumbprint>"

Connect-MgGraph `
    -TenantId             $tenantId `
    -ClientId             $clientId `
    -CertificateThumbprint $thumbprint

Write-Host "✅ Connected to Microsoft Graph" -ForegroundColor Green

Exchange Online

# ─────────────────────────────────────────────────────────────
# Authenticate with Exchange Online using Certificate
# ─────────────────────────────────────────────────────────────

Connect-ExchangeOnline `
    -AppId        $clientId `
    -Organization "yourtenant.onmicrosoft.com" `
    -CertificateThumbprint $thumbprint

Write-Host "✅ Connected to Exchange Online" -ForegroundColor Green

Quick Reference Summary

Step Action Output
1 New-SelfSignedCertificate Cert created in local store
2 Export-Certificate .cer public key file
3 Export-PfxCertificate .pfx private key file
4 Upload .cer in Azure Portal App Registration cert added
5 Use thumbprint in scripts Cert-based auth working

Tips & Best Practices

  • šŸ” Renewal reminder — Set a calendar alert 30 days before expiry (NotAfter date)
  • šŸ”’ Store PFX securely — Use Azure Key Vault or GitHub Actions secrets (base64-encoded)
  • šŸ·️ Use descriptive names — Include the year in the cert description (e.g., AppAuthCert-2026)
  • šŸ“‹ Document the thumbprint — Save it in your runbook/wiki for reference during troubleshooting
  • 🚫 Never use secrets for automation — Cert-based auth is always preferred over client secrets for unattended scripts

Hashtags

#MicrosoftAzure #EntraID #PowerShell #AzureAppRegistration #CertificateAuthentication #M365 #MicrosoftGraph #DevOps #CloudSecurity #AzureAutomation

Building an End-to-End Retail Analytics Platform on Microsoft Fabric

Building an End-to-End Retail Analytics Platform on Microsoft Fabric

A complete, reproducible walkthrough — from raw CSVs to a secured, refreshable Power BI report — using the Lakehouse medallion pattern, Direct Lake, dynamic row-level security, and a Dev → Prod deployment pipeline.

Companion repo: https://github.com/usreekanthreddy/contoso-retail-fabric


Why this project

Most "Fabric demos" stop at loading a CSV and drawing a bar chart. Real analytics platforms need more: a layered data architecture you can trust, a dimensional model that performs, security that respects who is looking, and a release process that gets changes from development to production safely.

This post documents a complete build for a fictional retailer, Contoso Retail 360. Everything here is reproducible — the repo ships seed data, the notebooks, the SQL validation, and the DAX so you can stand the whole thing up in your own Fabric workspace in well under an hour.

The finished platform answers questions like:

  • How are sales trending month over month, and how does that compare to last year?
  • Which regions and stores are pulling their weight, and which channel (Store vs. Online) is winning?
  • What's our gross margin, and which products drive it?
  • …while making sure a regional manager only ever sees their own region's numbers.

The architecture at a glance

The solution follows the medallion architecture — three progressively refined layers inside a single Fabric Lakehouse (lh_retail):

Source CSVs (Files/landing/)
        │   file-based ingest
        ▼
┌─────────────────┐
│  BRONZE         │  raw, as-landed, + ingest metadata
│  *_raw tables   │  (_ingest_ts, _source_file)
└─────────────────┘
        │   clean, cast, de-dupe, hash PII
        ▼
┌─────────────────┐
│  SILVER          │  conformed, typed business entities
│  sales/products  │  customers (FullName hashed)
│  stores/customers│
└─────────────────┘
        │   build star schema, surrogate keys
        ▼
┌─────────────────┐
│  GOLD           │  DimDate / DimProduct / DimStore /
│  star schema    │  DimCustomer / DimChannel + FactSales
│  (+ UserRegion) │  Delta change-data-feed ON
└─────────────────┘
        │   Direct Lake
        ▼
   Semantic model  →  Power BI report (with RLS)

Why these choices:

  • Bronze / Silver / Gold keeps raw data immutable and auditable, isolates cleaning logic, and gives BI a stable, well-shaped surface to build on. If a transform is wrong, you fix Silver/Gold and re-run — Bronze is untouched.
  • Direct Lake lets Power BI read the Gold Delta tables directly from OneLake — no import refresh, no DirectQuery latency. You get import-like speed with live data.
  • Delta change data feed on the Gold tables makes Direct Lake "framing" (the refresh that picks up new data) efficient — only changed files are reprocessed.

The data model: a classic star schema

Gold is a textbook star: one fact table surrounded by conformed dimensions.

FactSales (one row per order line)

ColumnNotes
SalesKeysurrogate key
DateKey, ProductKey, StoreKey, CustomerKey, ChannelKeyforeign keys to dimensions
OrderNumberdegenerate dimension (for distinct order counts)
Quantity, UnitPrice, DiscountAmountadditive measures
SalesAmountQuantity × UnitPrice − DiscountAmount
UnitCostcarried for margin math

Dimensions

  • DimDate — a generated calendar with Day/Month/Quarter/Year, MonthName, DayOfWeek, IsWeekend. Marked as the date table to unlock time-intelligence.
  • DimProduct — product, category, subcategory, brand, list price, unit cost.
  • DimStore — store, city, region, country, format, open date.
  • DimCustomer — segment, city, loyalty tier, signup date (name hashed for privacy).
  • DimChannel — Store / Online.

All relationships are single-direction, one-to-many from each dimension to FactSales.


Step 1 — Create the Lakehouse and schemas

In the Fabric workspace, create a Lakehouse named lh_retail and three schemas: bronze, silver, gold. The notebooks create the schemas defensively too:

for s in ["bronze", "silver", "gold"]:
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {s}")

Step 2 — Land the source data

The kit ships four seed CSVs (generated deterministically with seed=42, so everyone gets identical numbers):

  • sales.csv — 3,001 order lines (Dec 2025 – May 2026)
  • products.csv — 30 products
  • stores.csv — 12 stores across West / Central / East
  • customers.csv — 500 customers

Upload them to the Lakehouse under Files/landing/. In production this folder would be the drop zone for a Data Factory Copy pipeline (pl_ingest_bronze); the notebook plays that role for the quick-test path.


Step 3 — Bronze: file-based ingestion

Bronze reads the landing CSVs and writes them straight to Delta tables, stamping each row with ingest metadata so you always know where a record came from and when it arrived:

def read_csv(name):
    return (spark.read.option("header", True).option("inferSchema", True)
            .csv(f"Files/landing/{name}.csv")
            .withColumn("_ingest_ts", F.current_timestamp())
            .withColumn("_source_file", F.lit(f"Files/landing/{name}.csv")))

W = lambda df, tbl: (df.write.format("delta")
                       .mode("overwrite")
                       .option("overwriteSchema", "true")
                       .saveAsTable(tbl))

W(read_csv("products"),  "bronze.products_raw")
W(read_csv("stores"),    "bronze.stores_raw")
W(read_csv("customers"), "bronze.customers_raw")
W(read_csv("sales"),     "bronze.sales_raw")

mode("overwrite") means every notebook is safe to re-run from scratch — no duplicate-on-rerun surprises.


Step 4 — Silver: clean, type, de-duplicate, and protect PII

Silver is where raw becomes trustworthy. Strings become dates and decimals, duplicates are dropped on natural keys, and — importantly — customer names are hashed and the raw value dropped before anything sensitive reaches the analytics layer:

# Sales: type-cast, default discounts, de-dupe on the natural key
sl = (spark.read.table("bronze.sales_raw")
      .withColumn("OrderDate", F.to_date("OrderDate"))
      .withColumn("Quantity", F.col("Quantity").cast("int"))
      .withColumn("UnitPrice", F.col("UnitPrice").cast("decimal(18,2)"))
      .withColumn("DiscountAmount",
                  F.coalesce(F.col("DiscountAmount").cast("decimal(18,2)"), F.lit(0)))
      .dropDuplicates(["OrderNumber", "ProductId"]))
W(sl.drop("_ingest_ts", "_source_file"), "silver.sales")

# Customers: SHA-256 the name, then drop the raw column
W(spark.read.table("bronze.customers_raw")
    .withColumn("SignupDate", F.to_date("SignupDate"))
    .withColumn("FullNameHash", F.sha2(F.col("FullName"), 256))
    .drop("FullName", "_ingest_ts", "_source_file")
    .dropDuplicates(["CustomerId"]), "silver.customers")

The hash preserves the ability to count and join on a customer without ever exposing their name downstream — a small change that makes the platform far more defensible from a privacy standpoint.


Step 5 — Gold: build the star schema

Gold assembles dimensions with surrogate keys and joins everything into FactSales.

# Surrogate-key helper
def skey(df, name):
    return df.withColumn(name, F.row_number().over(
        Window.orderBy(F.monotonically_increasing_id())))

W(skey(spark.read.table("silver.products"),  "ProductKey"),  "gold.DimProduct")
W(skey(spark.read.table("silver.stores"),    "StoreKey"),    "gold.DimStore")
W(skey(spark.read.table("silver.customers"), "CustomerKey"), "gold.DimCustomer")
W(spark.createDataFrame([(1, "Store"), (2, "Online")],
                        ["ChannelKey", "Channel"]), "gold.DimChannel")

The date dimension is generated from the actual min/max order dates in the data:

b = (spark.read.table("silver.sales")
        .agg(F.min("OrderDate").alias("mn"), F.max("OrderDate").alias("mx"))
        .collect()[0])
dd = spark.sql(
    f"SELECT explode(sequence(to_date('{b.mn}'), to_date('{b.mx}'), "
    f"interval 1 day)) AS Date")
dd = (dd.withColumn("DateKey", F.date_format("Date", "yyyyMMdd").cast("int"))
        .withColumn("Year", F.year("Date")).withColumn("Quarter", F.quarter("Date"))
        .withColumn("Month", F.month("Date"))
        .withColumn("MonthName", F.date_format("Date", "MMMM"))
        .withColumn("DayOfWeek", F.date_format("Date", "EEEE"))
        .withColumn("IsWeekend", F.dayofweek("Date").isin([1, 7])))
W(dd, "gold.DimDate")

And the fact table resolves every natural key to a surrogate key, computes SalesAmount, and selects the final shape:

fact = (s.join(dp, "ProductId", "left").join(ds, "StoreId", "left")
         .join(dc, "CustomerId", "left").join(dch, "Channel", "left")
         .withColumn("DateKey", F.date_format("OrderDate", "yyyyMMdd").cast("int"))
         .withColumn("SalesAmount",
                     (F.col("Quantity") * F.col("UnitPrice")
                      - F.col("DiscountAmount")).cast("decimal(18,2)"))
         .select(F.monotonically_increasing_id().alias("SalesKey"),
                 "DateKey", "ProductKey", "StoreKey", "CustomerKey", "ChannelKey",
                 "OrderNumber", "Quantity", "UnitPrice", "DiscountAmount",
                 "SalesAmount", "UnitCost"))
W(fact, "gold.FactSales")

A security table for dynamic RLS

Gold also includes a small mapping table that drives dynamic row-level security — which user can see which region:

user_region = spark.createDataFrame([
    ("sreekanth@vkn2k.onmicrosoft.com",       "West"),
    ("east.manager@vkn2k.onmicrosoft.com",    "East"),
    ("central.manager@vkn2k.onmicrosoft.com", "Central"),
], ["Email", "Region"])
W(user_region, "gold.UserRegion")

Turn on change data feed

Finally, enable Delta change data feed on every Gold table so Direct Lake refreshes stay incremental and fast:

for t in ["DimProduct","DimStore","DimCustomer","DimChannel",
          "DimDate","UserRegion","FactSales"]:
    spark.sql(f"ALTER TABLE gold.{t} "
              f"SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

Step 6 — Validate the Gold layer

Trust, but verify. The notebook prints a built-in validation block, and sql/validate_gold.sql runs the same checks against the SQL analytics endpoint. With the shipped seed data you should get exactly these numbers:

MetricExpected value
Orphan fact rows0
Distinct orders1,224
Total units9,097
Total sales1,673,866.79
Total cost1,028,681.42
Gross margin645,185.37
Gross margin %38.5%
Date range2025-12-01 → 2026-05-31
Raw FullName present in GoldFalse

Sales by region and channel (rounded):

RegionOnlineStore
Central224,875317,640
East226,002356,374
West226,808322,167

The orphan check is the most important line: it confirms every fact row resolved to a real product, store, customer, and channel. Zero orphans means the joins are clean.

orphans = f.filter(
    F.col("ProductKey").isNull() | F.col("StoreKey").isNull()
    | F.col("CustomerKey").isNull() | F.col("ChannelKey").isNull()).count()
assert orphans == 0

Step 7 — Build the Direct Lake semantic model

From the Lakehouse SQL analytics endpoint, choose New semantic model and select only the gold.* tables. Then wire up the relationships — all single-direction, one-to-many from dimension to fact:

  • DimDate[DateKey]FactSales[DateKey]
  • DimProduct[ProductKey]FactSales[ProductKey]
  • DimStore[StoreKey]FactSales[StoreKey]
  • DimCustomer[CustomerKey]FactSales[CustomerKey]
  • DimChannel[ChannelKey]FactSales[ChannelKey]

Mark DimDate as the date table (on DimDate[Date]) — this is what makes time-intelligence work.


Step 8 — Add the DAX measures

The model ships with a full measure library. The base measures:

Total Sales   = SUM ( FactSales[SalesAmount] )
Total Units   = SUM ( FactSales[Quantity] )
Total Cost    = SUMX ( FactSales, FactSales[Quantity] * FactSales[UnitCost] )
Gross Margin  = [Total Sales] - [Total Cost]
Gross Margin %= DIVIDE ( [Total Sales] - [Total Cost], [Total Sales] )
Order Count   = DISTINCTCOUNT ( FactSales[OrderNumber] )
Avg Basket Size = DIVIDE ( [Total Sales], [Order Count] )

A note on a bug worth remembering: define Gross Margin % as a self-contained expression — DIVIDE([Total Sales] - [Total Cost], [Total Sales]) — rather than referencing [Gross Margin]. During the build, having % depend on the plain Gross Margin measure (and accidentally overwriting one with the other) produced a circular-dependency error. Making the percentage stand on its own avoids the trap.

Time-intelligence (these are why DimDate had to be marked):

Sales LY   = CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( DimDate[Date] ) )
Sales YoY  = [Total Sales] - [Sales LY]
Sales YoY %= DIVIDE ( [Sales YoY], [Sales LY] )
Sales MTD  = CALCULATE ( [Total Sales], DATESMTD ( DimDate[Date] ) )
Sales YTD  = CALCULATE ( [Total Sales], DATESYTD ( DimDate[Date] ) )
Sales MoM %= DIVIDE ( [Total Sales] - [Sales PM], [Sales PM] )

Mix and ranking:

Active Customers = DISTINCTCOUNT ( FactSales[CustomerKey] )
Online Sales %   = DIVIDE (
        CALCULATE ( [Total Sales], DimChannel[Channel] = "Online" ),
        [Total Sales] )
Product Sales Rank = RANKX (
        ALLSELECTED ( DimProduct[ProductName] ), [Total Sales], , DESC )

Step 9 — Dynamic row-level security

This is where the UserRegion table earns its place. Instead of hard-coding a role per region, a single role filters DimStore to the regions mapped to the logged-in user:

[Region] IN
    SELECTCOLUMNS (
        FILTER ( UserRegion, UserRegion[Email] = USERPRINCIPALNAME () ),
        "r", UserRegion[Region]
    )

Add one manager's email + region to gold.UserRegion, and they automatically see only their slice — no model changes, no new roles. The West manager sees West; the East manager sees East; an admin row could map to all three.


Step 10 — Build the report

Two pages cover the core audience:

  • Executive Overview — KPI cards (Total Sales, Gross Margin %, Order Count, Active Customers), a sales-over-time line with YoY, a region/channel matrix, and a top-products bar using the rank measure.
  • Store Performance — store-level table with margin, format breakdown, and a regional map, laid out so the most-scanned numbers sit top-left.

Because the model is Direct Lake, the report reflects new data as soon as the Gold tables are framed — no scheduled import refresh required.


Step 11 — Orchestrate, monitor, and deploy

A platform isn't done when the report renders. Three operational pieces close the loop:

Orchestration. A Data Factory pipeline chains ingest → Silver → Gold notebooks and runs on a schedule, so the medallion stays current without manual runs.

Failure notification. The pipeline includes an on-failure activity that alerts when a run breaks — you find out from a notification, not from a stakeholder asking why the numbers look stale.

Dev → Prod deployment pipeline. Fabric deployment pipelines move the Lakehouse, semantic model, and report from a Development stage to Production with a controlled promotion, instead of editing production artifacts by hand.

The semantic model is also certified, signalling to report authors across the tenant that it's the trusted, governed source for retail sales.


What's in the repo

contoso-retail-fabric/
├─ README.md
├─ .gitignore
├─ data/landing/         products.csv · stores.csv · customers.csv · sales.csv
├─ notebooks/            nb_01_bronze_ingest · nb_02_silver_transform · nb_03_gold_model
├─ sql/                  validate_gold.sql
└─ semantic-model/       measures.dax

Clone it, follow the run order, and compare your validation numbers against the table above. If they match, your pipeline ran correctly end to end.


Lessons learned

A few things that are easy to get wrong and worth doing right the first time:

  • Hash PII in Silver, not later. Once a name reaches Gold or the semantic model, it's much harder to claw back. Do it at the conforming step.
  • Keep Gross Margin % self-contained to avoid circular-dependency errors between dependent measures.
  • Mark the date table before writing time-intelligenceSAMEPERIODLASTYEAR and friends silently misbehave otherwise.
  • Enable change data feed on Gold so Direct Lake framing stays incremental as data grows.
  • Make RLS data-driven. A UserRegion mapping table beats a proliferation of hard-coded roles — onboarding a new manager becomes a single row insert.
  • Validate with assertions, not eyeballs. The orphan check and the fixed expected totals turn "looks right" into "is right."

Wrapping up

What started as raw CSVs is now a governed analytics platform: layered and auditable, modeled for performance, secured per-user, monitored, and promotable from dev to prod. The same pattern scales well beyond a demo — swap the seed CSVs for a real Copy pipeline, point Bronze at your sources, and the rest of the medallion carries the weight.

The full, reproducible kit lives at https://github.com/usreekanthreddy/contoso-retail-fabric

Featured Post

The AI Buzzword Glossary: From Agents to Agentic AI

The AI Buzzword Glossary: From Agents to Agentic AI Every week brings a fresh wave of AI terminology. “Agentic AI,” “RAG,” “MCP,” “co...

Popular posts