Azure SQL Data Warehouse Architecture Training (AZU102)
This course provides a comprehensive exploration of Microsoft Azure SQL Data Warehouse, focusing on its architecture, table structures, data distribution, and advanced technical details. Designed for database administrators, data engineers, and IT professionals, this course covers the essential concepts and best practices for managing and optimizing Azure SQL Data Warehouse environments.
The course begins with an Introduction to the Azure SQL Data Warehouse, where you will explore the family of SQL Server products and delve into Azure SQL Data Warehouse architecture. Topics include Symmetric Multi-Processing (SMP), parallel processing, and the basics of how Azure SQL Data Warehouse achieves linear scalability. You'll gain insights into key components like the Control Node, Data Rack, Landing Zone, and Backup Node, and learn about the role of Software as a Service (SaaS), Azure Data Lake, disaster recovery, and security compliance.
Next, in The Azure SQL Data Warehouse Table Structures module, you’ll explore the various table structures available in Azure SQL Data Warehouse, including distributed, replicated, and partitioned tables. You’ll learn about the differences between row-based and column-based storage, the use of clustered indexes, and best practices for creating and managing tables with distribution keys. This section equips you with the skills to design efficient data storage strategies tailored to your specific needs.
The Hashing and Data Distribution section dives into the hashing process and its role in data distribution across nodes. You’ll learn about distribution keys, how they affect data spread, and the impact of non-unique distribution keys on performance. This module provides best practices for choosing distribution keys and understanding the underlying mechanics of data movement within Azure SQL Data Warehouse.
In The Technical Details module, you will delve into the inner workings of data storage and retrieval in Azure SQL Data Warehouse. Topics include how data is stored across distributions, the organization of data blocks and pages, and the differences between heap tables and tables with clustered indexes. You’ll explore B-Trees, index creation, and the benefits of different indexing strategies, enhancing your ability to optimize query performance and manage data effectively.
The course concludes with CREATE Statistics, a detailed look at statistics creation and management in Azure SQL Data Warehouse. You’ll learn how to generate and update statistics to optimize query performance, use DBCC SHOW_STATISTICS to view statistics details, and implement best practices for maintaining accurate and useful statistics across your database tables.
By the end of this course, you will have gained an in-depth understanding of Azure SQL Data Warehouse, including how to design efficient table structures, distribute data effectively, and optimize performance through indexing and statistics management. You’ll be equipped with the knowledge and skills needed to manage complex data warehouse environments, ensuring scalability, reliability, and high performance in your cloud-based data solutions.
- Learn to gain a deeper knowledge and understanding of the Azure SQL Data Warehouse Architecture and how to write it.
- Introduction to the Azure SQL Data Warehouse
- Introduction to the Family of SQL Server Products
- Introduction to the Family Continued
- Microsoft Azure SQL Data Warehouse
- Symmetric Multi-Processing (SMP)
- What is Parallel Processing?
- The Basics of a Single Computer
- Data in Memory is fast as Lightning
- Parallel Processing of Data
- A Table has Columns and Rows
- The Azure SQL Data Warehouse has Linear Scalability
- The Architecture of the Azure SQL Data Warehouse
- Nexus is now available on the Microsoft Azure Cloud
- The MPP Engine is the Optimizer
- The Azure SQL Data Warehouse System
- The Azure SQL Data Warehouse System is Scalable
- The Control Node
- The Data Rack
- The Landing Zone
- The Backup Node
- Software as a Service (SaaS) and the Elastic Database
- Azure Data Lake
- Azure Disaster Recovery
- Security and Compliance
- How to Get an EXPLAIN Plan
- The Azure SQL Data Warehouse Table Structures
- The 5 Concepts of Azure SQL Data Warehouse Tables
- Tables are Either Distributed by Hash or Replicated (1 of 5)
- Table Rows are Either Sorted or Unsorted (2 of 5)
- Tables are Stored in Either Row or Columnar Format (3 of 5)
- Tables can be Partitioned (4 of 5)
- There are Permanent, Temporary and External Tables (5 of 5)
- Creating a Table with a Distribution Key
- Creating a Table that is replicated
- Distributed by Hash vs. Replication
- The Concept is all about the Joins
- Creation of a Hash Distributed Table with a Clustered Index
- A Clustered Index Sorts the Data Stored on Disk
- Each Node Has 8 Distributions
- How Hashed Tables are Stored among a Single Node
- Hashed Tables Will Be Distributed Among All Distributions
- Creation of a Replicated Table
- How Replicated Tables are Stored among a Single Node
- Replicated Table will be duplicated among Each Node
- Distributed by Replication
- How Hashed and Replicated Tables Work Together
- Tables are stored as Row-based or Column-based
- Creation of a Columnar Table that is hashed
- How Hashed Columnar Tables are Stored on a Single Node
- How Hashed Columnar Tables are Stored on All Distributions
- Comparing Normal Table vs. Columnar Tables
- Columnar can move just One Segment to Memory
- Segments on Distributions are aligned to rebuild a Row
- Why Columnar?
- Columnar Tables Store Each Column in Separate Pages
- Visualize the Data – Rows vs. Columns
- Creation of a Columnar Table that is replicated
- Creating a Partitioned Table per Month
- A Visual of One Year of Data with Range per Month
- Another Create Example of a Partitioned Table
- Creating a Partitioned Table per Month That is a Columnstore
- Visual of Row Partitioning and Columnar Storage
- CREATE TABLE AS (CTAS) Example
- Creating a Temporary Table
- Facts about Tables
- Hashing and Data Distribution
- Distribution Keys Hashed on Unique Values Spread Evenly
- Distribution Keys with Non-Unique Values Spread Unevenly
- Best Practices for Choosing a Distribution Key
- The Hash Map determines which Distribution owns the Row
- The Hash Map determines which Node will own the Row
- A Review of the Hashing Process
- Non-Unique Distribution Keys have Skewed Data
- The Technical Details
- Every Node has the Exact Same Tables
- Hashed Tables are spread across All Distributions
- The Table Header and the Data Rows are Stored Separately
- A Distribution Stores the Rows of a Table inside a Data Block
- To Read a Data Block a Node Moves the Block into Memory
- A Full Table Scan Means All Nodes Must Read All Rows
- Rows are organized inside a Page
- Moving Data Blocks is Like Checking in Luggage
- As Row-Based Tables Get Bigger, the Page Splits
- Data Pages are Processed One at a Time per Unit
- Creating a Table that is a Heap
- Heap Page
- Extents
- Creating a Table that has a Clustered Index
- Clustered Index Page
- The Row Offset Array is the Guidance System for Every Row
- The Row Offset Array Provides Two Search Options (1 of 2)
- The Row Offset Array Provides Two Search Options (2 of 2)
- The Row Offset Array Helps with Inserts
- B-Trees
- The Building of a B-Tree for a Clustered Index (1 of 3)
- The Building of a B-Tree for a Clustered Index (2 of 3)
- The Building of a B-Tree for a Clustered Index (3 of 3)
- When Do I Create a Clustered Index?
- When Do I Create a Non Clustered Index?
- B-Tree for Non Clustered Index on a Clustered Table (1 of 2)
- B-Tree for Non Clustered Index on a Clustered Table (2 of 2)
- Adding a Non Clustered Index to A Heap
- B-Tree for Non Clustered Index on a Heap Table (1 of 2)
- B-Tree for Non Clustered Index on a Heap Table (2 of 2)
- Max Levels on the Azure SQL Data Warehouse
- Azure SQL Data Warehouse Data Types
- Character Data Types for SQL Server
- Numeric Data Types for SQL Server
- Date and Time Data Types for SQL Server
- Additional Data Types for SQL Server
- CREATE Statistics
- CREATE Statistics Syntax
- CREATE Statistics on a Percentage of a Table
- CREATE Statistics on a Sample by Using the System Default
- CREATE Statistics on a Multi-Column Join Key
- What to Column(s) to CREATE Statistics On
- CREATE Statistics Using a WHERE Clause
- Updating All Statistics on a Table
- Updating Only Certain Statistics on a Table
- Dropping Statistics on Certain Statistics on a Table
- Showing the Statistics
- DBCC SHOW_STATISTICS
- DBCC SHOW_STATISTICS WITH HISTOGRAM
Each student will receive a comprehensive set of materials, including course notes and all the class examples.
Live Private Class
- Private Class for your Team
- Live training
- Online or On-location
- Customizable
- Expert Instructors