Single-row INSERT…SELECT much slower than separate SELECT
Given the following heap table with 400 rows numbered from 1 to 400:
    DROP TABLE IF EXISTS dbo.N;
    GO
    SELECT
        SV.number
    INTO dbo.N
    FROM master.dbo.spt_values AS SV
    WHERE
        SV.[type] = N'P'
        AND SV.number BETWEEN 1 AND 400;
and the following settings:
    SET NOCOUNT ON;
    SET STATISTICS IO, TIME OFF;
    SET STATISTICS XML OFF;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
The following SELECT statement completes in around 6 seconds (demo, plan):
    DECLARE @n integer = 400;
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.N AS N
    CROSS JOIN dbo.N AS N2
    CROSS JOIN dbo.N AS N3
    WHERE
        N.number <= @n
        AND N2.number <= @n
        AND N3.number <= @n
    OPTION
        (OPTIMIZE FOR (@n = 1));
When the single-row output is written to a table, it takes 19 seconds (demo, plan):
    DECLARE @T table (c bigint NOT NULL);
    DECLARE @n integer = 400;
    INSERT @T
        (c)
    SELECT
        c = COUNT_BIG(*)
    FROM dbo.N AS N
    CROSS JOIN dbo.N AS N2
    CROSS JOIN dbo.N AS N3
    WHERE
        N.number <= @n
        AND N2.number <= @n
        AND N3.number <= @n
    OPTION
        (OPTIMIZE FOR (@n = 1));
The execution plans appear identical aside from the insert of one row.
All the extra time seems to be consumed by CPU usage.
Why is the INSERT statement so much slower?
sql-server query-performance execution-plan
asked 1 hour ago by Paul White♦
1 Answer
SQL Server chooses to scan the heap tables on the inner side of the nested loops joins using row-level locks. A full scan would normally choose page-level locking, but the combination of the small table size and the predicate means the storage engine chooses row locks, since that appears to be the cheapest strategy.
The cardinality misestimation deliberately introduced by the OPTIMIZE FOR means that the heaps are scanned many more times than the optimizer expects, and it does not introduce a spool as it normally would.
This combination of factors means that performance is very sensitive to the number of locks required at runtime.
The SELECT statement benefits from an optimization that allows row-level shared locks to be skipped (taking only intent-shared page-level locks) when there is no danger of reading uncommitted data, and there is no off-row data.
The INSERT...SELECT statement does not benefit from this optimization, so millions of RID locks are taken and released each second in the second case, along with the intent-shared page-level locks.
The enormous amount of locking activity accounts for the extra CPU and elapsed time.
The most natural workaround is to ensure the optimizer (and storage engine) get decent cardinality estimates so they can make good choices.
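As a rough sketch of that first workaround in the demo, the misleading hint can be swapped for one that lets the optimizer see the runtime value. This assumes the OPTIMIZE FOR (@n = 1) hint is the only source of the misestimate; OPTION (RECOMPILE) is my choice for illustration, and whether the resulting plan adds a spool or changes the locking strategy is not verified here.

```sql
DECLARE @T table (c bigint NOT NULL);
DECLARE @n integer = 400;

INSERT @T
    (c)
SELECT
    c = COUNT_BIG(*)
FROM dbo.N AS N
CROSS JOIN dbo.N AS N2
CROSS JOIN dbo.N AS N3
WHERE
    N.number <= @n
    AND N2.number <= @n
    AND N3.number <= @n
OPTION
    -- Assumption: replaces OPTIMIZE FOR (@n = 1) so the estimate
    -- is based on the actual value of @n at compile time.
    (RECOMPILE);
```

Comparing the estimated and actual row counts on the heap scans in the new plan is a quick way to check whether the estimate improved.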
If that is not practical in the real use case, the INSERT and SELECT statements could be separated, with the result of the SELECT held in a variable. This will allow the SELECT statement to benefit from the lock-skipping optimization.
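A minimal sketch of that separation, reusing the demo objects from the question. The variable name @c is hypothetical; the SELECT is unchanged apart from assigning its result to the variable, and the insert then writes a single known value.

```sql
DECLARE @T table (c bigint NOT NULL);
DECLARE @n integer = 400;
DECLARE @c bigint; -- hypothetical holding variable

-- Step 1: compute the count as a standalone SELECT.
SELECT
    @c = COUNT_BIG(*)
FROM dbo.N AS N
CROSS JOIN dbo.N AS N2
CROSS JOIN dbo.N AS N3
WHERE
    N.number <= @n
    AND N2.number <= @n
    AND N3.number <= @n
OPTION
    (OPTIMIZE FOR (@n = 1));

-- Step 2: write the single row separately.
INSERT @T
    (c)
VALUES
    (@c);
```

The cost is one extra statement, which is usually negligible next to the scan work.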
Changing the isolation level can also be made to work, either by not taking shared locks, or by ensuring that lock escalation takes place quickly.
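The answer does not name specific isolation levels, so the following is only my reading of the two options: READ UNCOMMITTED for "not taking shared locks", and REPEATABLE READ for holding shared locks long enough that escalation can occur. Both carry their usual semantic trade-offs, and neither has been timed here.

```sql
-- Option A (assumption): read without taking shared locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Option B (assumption): hold shared locks so they can escalate.
-- SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

DECLARE @T table (c bigint NOT NULL);
DECLARE @n integer = 400;

INSERT @T
    (c)
SELECT
    c = COUNT_BIG(*)
FROM dbo.N AS N
CROSS JOIN dbo.N AS N2
CROSS JOIN dbo.N AS N3
WHERE
    N.number <= @n
    AND N2.number <= @n
    AND N3.number <= @n
OPTION
    (OPTIMIZE FOR (@n = 1));

-- Restore the isolation level used in the question's setup.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```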
As a final point of interest, the query can be made to run even faster than the optimized SELECT case by forcing the use of spools using undocumented trace flag 8691.
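For experimentation only, the trace flag can be applied to a single statement with the QUERYTRACEON hint, sketched below against the demo query. The flag is undocumented and unsupported, its behaviour may differ between builds, and QUERYTRACEON normally requires sysadmin membership, so this is not something to rely on in production code.

```sql
DECLARE @n integer = 400;

SELECT
    c = COUNT_BIG(*)
FROM dbo.N AS N
CROSS JOIN dbo.N AS N2
CROSS JOIN dbo.N AS N3
WHERE
    N.number <= @n
    AND N2.number <= @n
    AND N3.number <= @n
OPTION
    -- Undocumented trace flag, scoped to this statement only.
    (OPTIMIZE FOR (@n = 1), QUERYTRACEON 8691);
```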
answered 1 hour ago by Paul White♦