Doc Watson Round the Table Again

CREATE TABLE AS SELECT (Azure Synapse Analytics)

Article
11/30/2021

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

CREATE TABLE AS SELECT (CTAS) is one of the most important T-SQL features available. It is a fully parallelized operation that creates a new table based on the output of a SELECT statement. CTAS is the simplest and fastest way to create a copy of a table.

For example, use CTAS to:

Re-create a table with a different hash distribution column.
Re-create a table as replicated.
Create a columnstore index on just some of the columns in the table.
Query or import external data.

Note

Since CTAS adds to the capabilities of creating a table, this topic tries not to repeat the CREATE TABLE topic. Instead, it describes the differences between the CTAS and CREATE TABLE statements. For the CREATE TABLE details, see the CREATE TABLE (Azure Synapse Analytics) statement.

Note

This syntax is not supported by serverless SQL pool in Azure Synapse Analytics.

Transact-SQL Syntax Conventions

Syntax

    CREATE TABLE { database_name.schema_name.table_name | schema_name.table_name | table_name }
        [ ( column_name [ ,...n ] ) ]
        WITH (
          <distribution_option> -- required
          [ , <table_option> [ ,...n ] ]
        )
        AS <select_statement>
        [ OPTION <query_hint> ]
    [;]

    <distribution_option> ::=
        {
            DISTRIBUTION = HASH ( distribution_column_name )
          | DISTRIBUTION = ROUND_ROBIN
          | DISTRIBUTION = REPLICATE
        }

    <table_option> ::=
        {
            CLUSTERED COLUMNSTORE INDEX --default for Synapse Analytics
          | CLUSTERED COLUMNSTORE INDEX ORDER (column[,...n])
          | HEAP --default for Parallel Data Warehouse
          | CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,...n ] ) --default is ASC
        }
          | PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] --default is LEFT
              FOR VALUES ( [ boundary_value [,...n] ] ) )

    <select_statement> ::=
        [ WITH <common_table_expression> [ ,...n ] ]
        SELECT select_criteria

    <query_hint> ::=
        {
            MAXDOP
        }

Arguments

For details, see the Arguments section in CREATE TABLE.

Column options

column_name [ ,...n ]
Column names do not allow the column options mentioned in CREATE TABLE. Instead, you can provide an optional list of one or more column names for the new table. The columns in the new table will use the names you specify. When you specify column names, the number of columns in the column list must match the number of columns in the select results. If you don't specify any column names, the new target table will use the column names in the select statement results.

You cannot specify any other column options such as data types, collation, or nullability. Each of these attributes is derived from the results of the SELECT statement. However, you can use the SELECT statement to change the attributes. For an example, see Use CTAS to change column attributes.
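A minimal sketch of the optional column list, assuming the dbo.DimCustomer table from the AdventureWorksDW sample used in the examples below. The target table dbo.CustomerNames and the column names CustKey and FamilyName are hypothetical. The names are applied to the SELECT results in order, so the list must contain exactly as many names as the SELECT returns; data types and nullability are still taken from the SELECT.

```sql
-- Hypothetical target table: CustKey and FamilyName replace the source names.
CREATE TABLE dbo.CustomerNames (CustKey, FamilyName)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
-- Two selected columns, matching the two names in the column list above.
SELECT CustomerKey, LastName
FROM dbo.DimCustomer;
```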

Table distribution options

DISTRIBUTION = HASH ( distribution_column_name ) | ROUND_ROBIN | REPLICATE
The CTAS statement requires a distribution option and does not have default values. This is different from CREATE TABLE, which has defaults.

For details and to understand how to choose the best distribution column, see the Table distribution options section in CREATE TABLE.

Table partition options

The CTAS statement creates a non-partitioned table by default, even if the source table is partitioned. To create a partitioned table with the CTAS statement, you must specify the partition option.

For details, see the Table partition options section in CREATE TABLE.

Select statement

The select statement is the fundamental difference between CTAS and CREATE TABLE.

WITH common_table_expression
Specifies a temporary named result set, known as a common table expression (CTE). For more information, see WITH common_table_expression (Transact-SQL).
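The syntax places the optional CTE inside <select_statement>, that is, after AS rather than before CREATE TABLE. A sketch under that reading, assuming the FactInternetSales table from the examples below; the table name dbo.CustomerOrderCounts and the CTE name OrderCounts are hypothetical.

```sql
CREATE TABLE dbo.CustomerOrderCounts
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
-- The CTE is part of the select statement that populates the new table.
WITH OrderCounts AS
(
    SELECT CustomerKey, COUNT(*) AS OrderCount
    FROM dbo.FactInternetSales
    GROUP BY CustomerKey
)
SELECT CustomerKey, OrderCount
FROM OrderCounts;
```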

SELECT select_criteria
Populates the new table with the results from a SELECT statement. select_criteria is the body of the SELECT statement that determines which data to copy to the new table. For information about SELECT statements, see SELECT (Transact-SQL).

Query hint

Users can set MAXDOP to an integer value to control the maximum degree of parallelism. When MAXDOP is set to 1, the query is executed by a single thread.

Permissions

CTAS requires SELECT permission on any objects referenced in the select_criteria.

For permissions to create a table, see Permissions in CREATE TABLE.
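The grants below are a sketch of what a CTAS caller typically needs, assuming the usual CREATE TABLE requirements (CREATE TABLE permission in the database plus ALTER permission on the target schema) alongside the SELECT permission stated above. Check them against Permissions in CREATE TABLE before relying on them. The principal [etl_user] is hypothetical.

```sql
-- Hypothetical principal; replace with your own user or role.
GRANT CREATE TABLE TO [etl_user];                             -- create tables in the database
GRANT ALTER ON SCHEMA::dbo TO [etl_user];                     -- place them in the dbo schema
GRANT SELECT ON OBJECT::dbo.FactInternetSales TO [etl_user];  -- read the objects in select_criteria
```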

General Remarks

For details, see General Remarks in CREATE TABLE.

Limitations and Restrictions

An ordered clustered columnstore index can be created on columns of any data types supported in Azure Synapse Analytics except for string columns.

SET ROWCOUNT (Transact-SQL) has no effect on CTAS. To achieve a similar behavior, use TOP (Transact-SQL).
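A sketch of using TOP in place of SET ROWCOUNT, assuming the FactInternetSales table from the examples below; dbo.FactInternetSales_sample is a hypothetical name. Without an ORDER BY, which rows make up the 1,000 is not guaranteed, so treat the result as an arbitrary sample.

```sql
-- TOP caps the number of rows written to the new table.
CREATE TABLE dbo.FactInternetSales_sample
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT TOP (1000) *
FROM dbo.FactInternetSales;
```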

For details, see Limitations and Restrictions in CREATE TABLE.

Locking Behavior

For details, see Locking Behavior in CREATE TABLE.

Performance

For a hash-distributed table, you can use CTAS to choose a different distribution column to achieve better performance for joins and aggregations. If choosing a different distribution column is not your goal, you will have the best CTAS performance if you specify the same distribution column, since this avoids redistributing the rows.
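A sketch of the same-column case, assuming dbo.FactInternetSales is already hash-distributed on ProductKey; dbo.FactInternetSales_heap is a hypothetical name. Because the distribution column does not change, each row already sits in the distribution it will occupy in the new table, so the statement only rewrites storage rather than shuffling rows between distributions.

```sql
-- Same distribution column as the source: no redistribution of rows is needed.
CREATE TABLE dbo.FactInternetSales_heap
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    HEAP
)
AS
SELECT * FROM dbo.FactInternetSales;
```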

If you are using CTAS to create a table and performance is not a factor, you can specify ROUND_ROBIN to avoid having to decide on a distribution column.

To avoid data movement in subsequent queries, you can specify REPLICATE at the cost of increased storage for loading a full copy of the table on each Compute node.

Examples for copying a table

A. Use CTAS to copy a table

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

Perhaps one of the most common uses of CTAS is creating a copy of a table so that you can change the DDL. For example, if you originally created your table as ROUND_ROBIN and now want to change it to a table distributed on a column, CTAS is how you would change the distribution column. CTAS can also be used to change partitioning, indexing, or column types.

Let's say you created this table using the default distribution type of ROUND_ROBIN, since no distribution column was specified in the CREATE TABLE statement.

    CREATE TABLE FactInternetSales
    (
        ProductKey INT NOT NULL,
        OrderDateKey INT NOT NULL,
        DueDateKey INT NOT NULL,
        ShipDateKey INT NOT NULL,
        CustomerKey INT NOT NULL,
        PromotionKey INT NOT NULL,
        CurrencyKey INT NOT NULL,
        SalesTerritoryKey INT NOT NULL,
        SalesOrderNumber NVARCHAR(20) NOT NULL,
        SalesOrderLineNumber TINYINT NOT NULL,
        RevisionNumber TINYINT NOT NULL,
        OrderQuantity SMALLINT NOT NULL,
        UnitPrice MONEY NOT NULL,
        ExtendedAmount MONEY NOT NULL,
        UnitPriceDiscountPct FLOAT NOT NULL,
        DiscountAmount FLOAT NOT NULL,
        ProductStandardCost MONEY NOT NULL,
        TotalProductCost MONEY NOT NULL,
        SalesAmount MONEY NOT NULL,
        TaxAmt MONEY NOT NULL,
        Freight MONEY NOT NULL,
        CarrierTrackingNumber NVARCHAR(25),
        CustomerPONumber NVARCHAR(25)
    );

Now you want to create a new copy of this table with a clustered columnstore index so that you can take advantage of the performance of clustered columnstore tables. You also want to distribute this table on ProductKey, since you are anticipating joins on this column and want to avoid data movement during joins on ProductKey. Lastly, you also want to add partitioning on OrderDateKey so that you can quickly delete old data by dropping old partitions. Here is the CTAS statement that would copy your old table into a new table.

    CREATE TABLE FactInternetSales_new
    WITH
    (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH(ProductKey),
        PARTITION
        (
            OrderDateKey RANGE RIGHT FOR VALUES
            (
            20000101,20010101,20020101,20030101,20040101,20050101,20060101,20070101,20080101,20090101,
            20100101,20110101,20120101,20130101,20140101,20150101,20160101,20170101,20180101,20190101,
            20200101,20210101,20220101,20230101,20240101,20250101,20260101,20270101,20280101,20290101
            )
        )
    )
    AS SELECT * FROM FactInternetSales;

Finally, you can rename your tables to swap in your new table and then drop your old table.

    RENAME OBJECT FactInternetSales TO FactInternetSales_old;
    RENAME OBJECT FactInternetSales_new TO FactInternetSales;

    DROP TABLE FactInternetSales_old;

Examples for column options

B. Use CTAS to change column attributes

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

This example uses CTAS to change data types, nullability, and collation for several columns in the DimCustomer2 table.

    -- Original table
    CREATE TABLE [dbo].[DimCustomer2] (
        [CustomerKey] INT NOT NULL,
        [GeographyKey] INT NULL,
        [CustomerAlternateKey] NVARCHAR(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    )
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH([CustomerKey]));

    -- CTAS example to change data types, nullability, and column collations
    CREATE TABLE test
    WITH (HEAP, DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT
        CustomerKey AS CustomerKeyNoChange,
        CustomerKey*1 AS CustomerKeyChangeNullable,
        CAST(CustomerKey AS DECIMAL(10,2)) AS CustomerKeyChangeDataTypeNullable,
        ISNULL(CAST(CustomerKey AS DECIMAL(10,2)),0) AS CustomerKeyChangeDataTypeNotNullable,
        GeographyKey AS GeographyKeyNoChange,
        ISNULL(GeographyKey,0) AS GeographyKeyChangeNotNullable,
        CustomerAlternateKey AS CustomerAlternateKeyNoChange,
        CASE WHEN CustomerAlternateKey = CustomerAlternateKey
            THEN CustomerAlternateKey END AS CustomerAlternateKeyNullable,
        CustomerAlternateKey COLLATE Latin1_General_CS_AS_KS_WS AS CustomerAlternateKeyChangeCollation
    FROM [dbo].[DimCustomer2];

    -- Resulting table
    CREATE TABLE [dbo].[test] (
        [CustomerKeyNoChange] INT NOT NULL,
        [CustomerKeyChangeNullable] INT NULL,
        [CustomerKeyChangeDataTypeNullable] DECIMAL(10, 2) NULL,
        [CustomerKeyChangeDataTypeNotNullable] DECIMAL(10, 2) NOT NULL,
        [GeographyKeyNoChange] INT NULL,
        [GeographyKeyChangeNotNullable] INT NOT NULL,
        [CustomerAlternateKeyNoChange] NVARCHAR(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [CustomerAlternateKeyNullable] NVARCHAR(15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [CustomerAlternateKeyChangeCollation] NVARCHAR(15) COLLATE Latin1_General_CS_AS_KS_WS NOT NULL
    )
    WITH (DISTRIBUTION = ROUND_ROBIN);

As a final step, you can use RENAME (Transact-SQL) to switch the table names. This makes DimCustomer2 the new table.

    RENAME OBJECT DimCustomer2 TO DimCustomer2_old;
    RENAME OBJECT test TO DimCustomer2;

    DROP TABLE DimCustomer2_old;

Examples for table distribution

C. Use CTAS to change the distribution method for a table

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

This simple example shows how to change the distribution method for a table. To show the mechanics of how to do this, it changes a hash-distributed table to round-robin and then changes the round-robin table back to hash-distributed. The final table matches the original table.

In most cases you won't need to change a hash-distributed table to a round-robin table. More often, you might need to change a round-robin table to a hash-distributed table. For example, you might initially load a new table as round-robin and then later move it to a hash-distributed table to get better join performance.

This example uses the AdventureWorksDW sample database. To load the Azure Synapse Analytics version, see Load sample data into Azure Synapse Analytics.

    -- DimSalesTerritory is hash-distributed.
    -- Copy it to a round-robin table.
    CREATE TABLE [dbo].[myTable]
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = ROUND_ROBIN
      )
    AS SELECT * FROM [dbo].[DimSalesTerritory];

    -- Switch table names

    RENAME OBJECT [dbo].[DimSalesTerritory] TO [DimSalesTerritory_old];
    RENAME OBJECT [dbo].[myTable] TO [DimSalesTerritory];

    DROP TABLE [dbo].[DimSalesTerritory_old];

Next, change it back to a hash-distributed table.

    -- You just made DimSalesTerritory a round-robin table.
    -- Change it back to the original hash-distributed table.
    CREATE TABLE [dbo].[myTable]
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH(SalesTerritoryKey)
      )
    AS SELECT * FROM [dbo].[DimSalesTerritory];

    -- Switch table names

    RENAME OBJECT [dbo].[DimSalesTerritory] TO [DimSalesTerritory_old];
    RENAME OBJECT [dbo].[myTable] TO [DimSalesTerritory];

    DROP TABLE [dbo].[DimSalesTerritory_old];

D. Use CTAS to convert a table to a replicated table

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

This example applies to converting round-robin or hash-distributed tables to a replicated table. This particular example takes the previous method of changing the distribution type one step further. Since DimSalesTerritory is a dimension and likely a smaller table, you can choose to re-create the table as replicated to avoid data movement when joining to other tables.

    -- DimSalesTerritory is hash-distributed.
    -- Copy it to a replicated table.
    CREATE TABLE [dbo].[myTable]
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = REPLICATE
      )
    AS SELECT * FROM [dbo].[DimSalesTerritory];

    -- Switch table names

    RENAME OBJECT [dbo].[DimSalesTerritory] TO [DimSalesTerritory_old];
    RENAME OBJECT [dbo].[myTable] TO [DimSalesTerritory];

    DROP TABLE [dbo].[DimSalesTerritory_old];

E. Use CTAS to create a table with fewer columns

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

The following example creates a round-robin distributed table named myTable (c, ln). The new table only has two columns. It uses the column aliases in the SELECT statement for the names of the columns.

    CREATE TABLE myTable
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = ROUND_ROBIN
      )
    AS SELECT CustomerKey AS c, LastName AS ln
        FROM dimCustomer;

Examples for query hints

F. Use a Query Hint with CREATE TABLE AS SELECT (CTAS)

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

This query shows the basic syntax for using a query join hint with the CTAS statement. After the query is submitted, Azure Synapse Analytics applies the hash join strategy when it generates the query plan for each individual distribution. For more information on the hash join query hint, see OPTION Clause (Transact-SQL).

    CREATE TABLE dbo.FactInternetSalesNew
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = ROUND_ROBIN
      )
    AS SELECT T1.* FROM dbo.FactInternetSales T1 JOIN dbo.DimCustomer T2
    ON ( T1.CustomerKey = T2.CustomerKey )
    OPTION ( HASH JOIN );

Examples for external tables

G. Use CTAS to import data from Azure Blob storage

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

To import data from an external table, simply use CREATE TABLE AS SELECT to select from the external table. The syntax to select data from an external table into Azure Synapse Analytics is the same as the syntax for selecting data from a regular table.

The following example defines an external table on data in an Azure blob storage account. It then uses CREATE TABLE AS SELECT to select from the external table. This imports the data from Azure blob storage text-delimited files and stores the data into a new Azure Synapse Analytics table.

    -- Use your own processes to create the text-delimited files on Azure blob storage.
    -- Create the external table called ClickStreamExt.
    CREATE EXTERNAL TABLE ClickStreamExt (
        url VARCHAR(50),
        event_date DATE,
        user_IP VARCHAR(50)
    )
    WITH (
        LOCATION='/logs/clickstream/2015/',
        DATA_SOURCE = MyAzureStorage,
        FILE_FORMAT = TextFileFormat)
    ;

    -- Use CREATE TABLE AS SELECT to import the Azure blob storage data into a new
    -- Synapse Analytics table called ClickStreamData
    CREATE TABLE ClickStreamData
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH (user_IP)
      )
    AS SELECT * FROM ClickStreamExt
    ;

H. Use CTAS to import Hadoop data from an external table

Applies to: Analytics Platform System (PDW)

To import data from an external table, simply use CREATE TABLE AS SELECT to select from the external table. The syntax to select data from an external table into Analytics Platform System (PDW) is the same as the syntax for selecting data from a regular table.

The following example defines an external table on a Hadoop cluster. It then uses CREATE TABLE AS SELECT to select from the external table. This imports the data from Hadoop text-delimited files and stores the data into a new Analytics Platform System (PDW) table.

    -- Create the external table called ClickStreamExt.
    CREATE EXTERNAL TABLE ClickStreamExt (
        url VARCHAR(50),
        event_date DATE,
        user_IP VARCHAR(50)
    )
    WITH (
        LOCATION = 'hdfs://MyHadoop:5000/tpch1GB/employee.tbl',
        FORMAT_OPTIONS ( FIELD_TERMINATOR = '|')
    )
    ;

    -- Use your own processes to create the Hadoop text-delimited files
    -- on the Hadoop Cluster.

    -- Use CREATE TABLE AS SELECT to import the Hadoop data into a new
    -- table called ClickStreamPDW
    CREATE TABLE ClickStreamPDW
    WITH
      (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH (user_IP)
      )
    AS SELECT * FROM ClickStreamExt
    ;

Examples using CTAS to replace SQL Server code

Use CTAS to work around some unsupported features. Besides being able to run your code on the data warehouse, rewriting existing code to use CTAS will usually improve performance. This is a result of its fully parallelized design.

Note

Try to think "CTAS first". If you think you can solve a problem using CTAS, then that is generally the best way to approach it, even if you are writing more data as a result.

I. Use CTAS instead of SELECT..INTO

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

SQL Server code typically uses SELECT..INTO to populate a table with the results of a SELECT statement. This is an example of a SQL Server SELECT..INTO statement.

    SELECT  *
    INTO    #tmp_fct
    FROM    [dbo].[FactInternetSales];

This syntax is not supported in Azure Synapse Analytics and Parallel Data Warehouse. This example shows how to rewrite the previous SELECT..INTO statement as a CTAS statement. You can choose any of the DISTRIBUTION options described in the CTAS syntax. This example uses the ROUND_ROBIN distribution method.

    CREATE TABLE #tmp_fct
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN
    )
    AS
    SELECT  *
    FROM    [dbo].[FactInternetSales]
    ;

J. Use CTAS and implicit joins to replace ANSI joins in the `FROM` clause of an `UPDATE` statement

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

You may find that you have a complex update that joins more than two tables together using ANSI joining syntax to perform the UPDATE or DELETE.

Imagine you had to update this table:

    CREATE TABLE [dbo].[AnnualCategorySales]
    (   [EnglishProductCategoryName]    NVARCHAR(50)    NOT NULL
    ,   [CalendarYear]                  SMALLINT        NOT NULL
    ,   [TotalSalesAmount]              MONEY           NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN
    )
    ;

The original query might have looked something like this:

    UPDATE  acs
    SET     [TotalSalesAmount] = [fis].[TotalSalesAmount]
    FROM    [dbo].[AnnualCategorySales]     AS acs
    JOIN    (
            SELECT  [EnglishProductCategoryName]
            ,       [CalendarYear]
            ,       SUM([SalesAmount])              AS [TotalSalesAmount]
            FROM    [dbo].[FactInternetSales]       AS s
            JOIN    [dbo].[DimDate]                 AS d    ON s.[OrderDateKey]          = d.[DateKey]
            JOIN    [dbo].[DimProduct]              AS p    ON s.[ProductKey]            = p.[ProductKey]
            JOIN    [dbo].[DimProductSubCategory]   AS u    ON p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
            JOIN    [dbo].[DimProductCategory]      AS c    ON u.[ProductCategoryKey]    = c.[ProductCategoryKey]
            WHERE   [CalendarYear] = 2004
            GROUP BY
                    [EnglishProductCategoryName]
            ,       [CalendarYear]
            ) AS fis
    ON  [acs].[EnglishProductCategoryName]  = [fis].[EnglishProductCategoryName]
    AND [acs].[CalendarYear]                = [fis].[CalendarYear]
    ;

Since Azure Synapse Analytics does not support ANSI joins in the FROM clause of an UPDATE statement, you cannot use this SQL Server code without changing it slightly.

You can use a combination of a CTAS and an implicit join to replace this code:

    -- Create an interim table
    CREATE TABLE CTAS_acs
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT  ISNULL(CAST([EnglishProductCategoryName] AS NVARCHAR(50)),0)  AS [EnglishProductCategoryName]
    ,       ISNULL(CAST([CalendarYear] AS SMALLINT),0)                    AS [CalendarYear]
    ,       ISNULL(CAST(SUM([SalesAmount]) AS MONEY),0)                   AS [TotalSalesAmount]
    FROM    [dbo].[FactInternetSales]       AS s
    JOIN    [dbo].[DimDate]                 AS d    ON s.[OrderDateKey]          = d.[DateKey]
    JOIN    [dbo].[DimProduct]              AS p    ON s.[ProductKey]            = p.[ProductKey]
    JOIN    [dbo].[DimProductSubCategory]   AS u    ON p.[ProductSubcategoryKey] = u.[ProductSubcategoryKey]
    JOIN    [dbo].[DimProductCategory]      AS c    ON u.[ProductCategoryKey]    = c.[ProductCategoryKey]
    WHERE   [CalendarYear] = 2004
    GROUP BY
            [EnglishProductCategoryName]
    ,       [CalendarYear]
    ;

    -- Use an implicit join to perform the update
    UPDATE  AnnualCategorySales
    SET     AnnualCategorySales.TotalSalesAmount = CTAS_acs.TotalSalesAmount
    FROM    CTAS_acs
    WHERE   CTAS_acs.[EnglishProductCategoryName] = AnnualCategorySales.[EnglishProductCategoryName]
    AND     CTAS_acs.[CalendarYear]               = AnnualCategorySales.[CalendarYear]
    ;

    -- Drop the interim table
    DROP TABLE CTAS_acs
    ;

K. Use CTAS to specify which data to keep instead of using ANSI joins in the FROM clause of a DELETE statement

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

Sometimes the best approach for deleting data is to use CTAS. Rather than deleting the data, simply select the data you want to keep. This is especially true for DELETE statements that use ANSI joining syntax, since Azure Synapse Analytics does not support ANSI joins in the FROM clause of a DELETE statement.

An example of a converted DELETE statement is available below:

    CREATE TABLE dbo.DimProduct_upsert
    WITH
    (   DISTRIBUTION = HASH(ProductKey)
    ,   CLUSTERED INDEX (ProductKey)
    )
    AS -- Select data you wish to keep
    SELECT     p.ProductKey
    ,          p.EnglishProductName
    ,          p.Color
    FROM       dbo.DimProduct p
    RIGHT JOIN dbo.stg_DimProduct s
    ON         p.ProductKey = s.ProductKey
    ;

    RENAME OBJECT dbo.DimProduct        TO DimProduct_old;
    RENAME OBJECT dbo.DimProduct_upsert TO DimProduct;
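For comparison, the same keep-instead-of-delete idea can also be written with NOT EXISTS, keeping every DimProduct row whose key is not listed for deletion. This is a sketch to run instead of, not after, the statements above; the staging table dbo.stg_DimProduct_delete (one row per ProductKey to remove) and the name dbo.DimProduct_keep are hypothetical.

```sql
-- Keep every row that is NOT in the hypothetical delete list.
CREATE TABLE dbo.DimProduct_keep
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED INDEX (ProductKey)
)
AS
SELECT p.*
FROM dbo.DimProduct AS p
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.stg_DimProduct_delete AS d
    WHERE d.ProductKey = p.ProductKey
);

-- Swap the tables; drop the old copy only after checking the result.
RENAME OBJECT dbo.DimProduct      TO DimProduct_old;
RENAME OBJECT dbo.DimProduct_keep TO DimProduct;
DROP TABLE dbo.DimProduct_old;
```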

L. Use CTAS to simplify merge statements

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

MERGE statements can be replaced, at least in part, by using CTAS. You can consolidate the INSERT and the UPDATE into a single statement. Any deleted records would need to be closed off in a second statement.

An example of an UPSERT is available below:

    CREATE TABLE dbo.[DimProduct_upsert]
    WITH
    (   DISTRIBUTION = HASH([ProductKey])
    ,   CLUSTERED INDEX ([ProductKey])
    )
    AS
    -- New rows and new versions of rows
    SELECT      s.[ProductKey]
    ,           s.[EnglishProductName]
    ,           s.[Color]
    FROM        dbo.[stg_DimProduct] AS s
    UNION ALL
    -- Keep rows that are not being touched
    SELECT      p.[ProductKey]
    ,           p.[EnglishProductName]
    ,           p.[Color]
    FROM        dbo.[DimProduct] AS p
    WHERE NOT EXISTS
    (   SELECT  *
        FROM    [dbo].[stg_DimProduct] s
        WHERE   s.[ProductKey] = p.[ProductKey]
    )
    ;

    RENAME OBJECT dbo.[DimProduct]          TO [DimProduct_old];
    RENAME OBJECT dbo.[DimProduct_upsert]   TO [DimProduct];

M. Explicitly state data type and nullability of output

Applies to: Azure Synapse Analytics and Analytics Platform System (PDW)

When migrating SQL Server code to Azure Synapse Analytics, you might find you run across this type of coding pattern:

    DECLARE @d DECIMAL(7,2) = 85.455
    ,       @f FLOAT(24)    = 85.455
    ;

    CREATE TABLE result
    (result DECIMAL(7,2) NOT NULL
    )
    WITH (DISTRIBUTION = ROUND_ROBIN)
    ;

    INSERT INTO result
    SELECT @d*@f
    ;

Instinctively, you might think you should migrate this code to a CTAS, and you would be correct. However, there is a hidden issue here.

The following code does NOT yield the same result:

    DECLARE @d DECIMAL(7,2) = 85.455
    ,       @f FLOAT(24)    = 85.455
    ;

    CREATE TABLE ctas_r
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT @d*@f AS result
    ;

Notice that the column "result" carries forward the data type and nullability values of the expression. This can lead to subtle variances in values if you aren't careful.

Try the following as an example:

    -- Run in the same batch as the DECLARE statements above so that @d is in scope.
    SELECT result, result*@d FROM result;

    SELECT result, result*@d FROM ctas_r;

The value stored for result is different. As the persisted value in the result column is used in other expressions, the error becomes even more significant.
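To compare the metadata of the two tables as well as their values, one option is to query the catalog views. A sketch, assuming both tables were created in the dbo schema (the usual default for unqualified names):

```sql
-- Column type, precision, scale, and nullability for result and ctas_r.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name                   AS column_name,
       t.name                   AS type_name,
       c.precision,
       c.scale,
       c.is_nullable
FROM sys.columns AS c
JOIN sys.types   AS t
  ON c.user_type_id = t.user_type_id
WHERE c.object_id IN (OBJECT_ID('dbo.result'), OBJECT_ID('dbo.ctas_r'));
```

Based on the explanation that follows, the result table should report a decimal column with is_nullable = 0, while ctas_r reports whatever type the expression produced, with is_nullable = 1.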

Figure: CREATE TABLE AS SELECT results

This is especially important for data migrations. Even though the second query is arguably more accurate, there is a problem. The data would be different compared to the source system, and that leads to questions of integrity in the migration. This is one of those rare cases where the "wrong" answer is actually the right one!

The reason we see this disparity between the two results is implicit type casting. In the first example, the table defines the column definition. When the row is inserted, an implicit type conversion occurs. In the second example, there is no implicit type conversion, as the expression defines the data type of the column. Notice also that the column in the second example has been defined as a NULLable column, whereas in the first example it has not. When the table was created in the first example, column nullability was explicitly defined. In the second example, it was left to the expression, and by default this results in a NULLable definition.

To resolve these issues, you must explicitly set the type conversion and nullability in the SELECT portion of the CTAS statement. You cannot set these properties in the CREATE TABLE part.

The example below demonstrates how to fix the code:

    DECLARE @d DECIMAL(7,2) = 85.455
    ,       @f FLOAT(24)    = 85.455
    ;

    -- If ctas_r from the previous example still exists, drop it first.
    CREATE TABLE ctas_r
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT ISNULL(CAST(@d*@f AS DECIMAL(7,2)),0) AS result
    ;

Note the following:

CAST or CONVERT could have been used
ISNULL is used to force NULLability, not COALESCE
ISNULL is the outermost function
The second part of the ISNULL is a constant, i.e. 0

Note

For the nullability to be correctly set, it is vital to use ISNULL and not COALESCE. COALESCE is not a deterministic function, and so the result of the expression will always be NULLable. ISNULL is different. It is deterministic. Therefore, when the second part of the ISNULL function is a constant or a literal, the resulting value will be NOT NULL.
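A small sketch of the difference, assuming the dbo.DimCustomer table from the AdventureWorksDW sample and that its GeographyKey column is nullable; dbo.nullability_demo is a hypothetical name. Per the note above, the ISNULL column is expected to come out NOT NULL and the COALESCE column NULLable.

```sql
-- The same fallback written two ways; only the wrapper differs.
CREATE TABLE dbo.nullability_demo
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT ISNULL(GeographyKey, 0)   AS geo_isnull,    -- expected: NOT NULL
       COALESCE(GeographyKey, 0) AS geo_coalesce   -- expected: NULLable
FROM dbo.DimCustomer;

-- Inspect the nullability that CTAS derived for each column.
SELECT name, is_nullable
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.nullability_demo');
```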

This tip is not just useful for ensuring the integrity of your calculations. It is also important for table partition switching. Imagine you have this table defined as your fact:

    CREATE TABLE [dbo].[Sales]
    (   [date]      INT     NOT NULL
    ,   [product]   INT     NOT NULL
    ,   [store]     INT     NOT NULL
    ,   [quantity]  INT     NOT NULL
    ,   [price]     MONEY   NOT NULL
    ,   [amount]    MONEY   NOT NULL
    )
    WITH
    (   DISTRIBUTION = HASH([product])
    ,   PARTITION   (   [date] RANGE RIGHT FOR VALUES
                        (20000101,20010101,20020101
                        ,20030101,20040101,20050101
                        )
                    )
    )
    ;

However, the amount field is a calculated expression; it is not part of the source data.

To create your partitioned dataset, you might want to do this:

    CREATE TABLE [dbo].[Sales_in]
    WITH
    (   DISTRIBUTION = HASH([product])
    ,   PARTITION   (   [date] RANGE RIGHT FOR VALUES
                        (20000101,20010101
                        )
                    )
    )
    AS
    SELECT
        [date]
    ,   [product]
    ,   [store]
    ,   [quantity]
    ,   [price]
    ,   [quantity]*[price]  AS [amount]
    FROM [stg].[source]
    OPTION (LABEL = 'CTAS : Partition IN table : Create')
    ;

The query would run perfectly fine. The problem comes when you try to perform the partition switch: the table definitions do not match. To make the table definitions match, the CTAS needs to be modified.

    CREATE TABLE [dbo].[Sales_in]
    WITH
    (   DISTRIBUTION = HASH([product])
    ,   PARTITION   (   [date] RANGE RIGHT FOR VALUES
                        (20000101,20010101
                        )
                    )
    )
    AS
    SELECT
        [date]
    ,   [product]
    ,   [store]
    ,   [quantity]
    ,   [price]
    ,   ISNULL(CAST([quantity]*[price] AS MONEY),0) AS [amount]
    FROM [stg].[source]
    OPTION (LABEL = 'CTAS : Partition IN table : Create');

You can see, therefore, that type consistency and maintaining nullability properties on a CTAS is an engineering best practice. It helps to maintain integrity in your calculations and also ensures that partition switching is possible.
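Once the definitions match, the switch itself is a single statement. A sketch, assuming both tables were created as shown above and that partition 2 of dbo.Sales is empty. With RANGE RIGHT, partition 2 in both tables holds [date] values from 20000101 up to but not including 20010101, so the boundaries of that partition line up.

```sql
-- Move the staged rows for the year 2000 into the fact table.
ALTER TABLE [dbo].[Sales_in] SWITCH PARTITION 2 TO [dbo].[Sales] PARTITION 2;
```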

N. Create an ordered clustered columnstore index with MAXDOP 1

    CREATE TABLE Table1
    WITH (DISTRIBUTION = HASH(c1), CLUSTERED COLUMNSTORE INDEX ORDER(c1))
    AS SELECT * FROM ExampleTable
    OPTION (MAXDOP 1);
