I have 3 possible answers.
The sql fiddle for your question is here: http://sqlfiddle.com/#!3/7c7a0/3/0
The sql fiddle for my answer is here: http://sqlfiddle.com/#!3/5d257/1
Warnings:
- Query Analyzer is not enough – I notice a number of answers were rejected because their query plans are more expensive than the original query. The Analyzer is only guide. Depending on the actual data set, hardware, and use case, more expensive queries can return results faster than less expensive ones. You have to test in your environment.
- Query Analyzer is ineffective – even if you find a way to remove the “most expensive step” from a query, it often makes no difference to your query.
- Query changes alone rarely mitigate schema/design issues – Some answers were rejected because they involve schema level changes such as triggers and additional tables. Complex queries that resist optimization are a strong sign that problem is with the underlying design or with my expectations. You may not like it, but you might have to accept that the problem is not solvable at the query level.
- Indexed view cannot contain row_number()/partitition clause – Working around the self-join problem by creating six copies of objects table is not enough to allow you to create the indexed view you’ve suggested. I tried it in this sqlfiddle. If you uncomment that last “create index” statement, you’ll get an error because your view “contains a ranking or aggregate window function.”
Working Answers:
- Left Join instead of row_number() – You can use a query that uses left joins to exclude results that have been overriden lower in the tree. Removing the finally “order by” from this query actually removes the sort that has been plaguing you! The execution plan for this query is still more expensive than your original, but see Disclaimer #1 above.
- Indexed view for part of the query – Using some serious query magic (based on this technique), I created an indexed view for part of the query. This view can be used to enhance the original question query or answer #1.
- Actualize into a well indexed table – Someone else suggested this answer, but they might not have explained it well. Unless your result set is very large or you are doing very frequent updates to the source tables, actualizing the results of a query and using a trigger to keep them up-to-date is a perfectly fine way to work around this kind of issue. Once you create a view for your query, it is easy enough to test this option. You can reuse answer #2 to speed up the trigger, and then further improve it over time. (You are talking about creating six copies of your tables, try this first. It guarantees that the performance for the select you care about will be as good as possible.)
Here’s is the schema part of my answer from sqlfiddle:
Create Table Objects
(
Id int not null identity primary key,
LeftIndex int not null default 0,
RightIndex int not null default 0
)
alter table Objects add ParentId int null references Objects
CREATE TABLE TP
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
insert into Objects(LeftIndex, RightIndex) values(1, 10)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 2, 5)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 6, 9)
insert into Objects(ParentId, LeftIndex, RightIndex) values(2, 3, 4)
insert into Objects(ParentId, LeftIndex, RightIndex) values(3, 7, 8)
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P1', 'abc')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P2', 'xyz')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,3,4, 'P1', '123')
insert into TP(Object1, Object2, Object3, Property, Value) values(2,4,5, 'P1', '098')
create index ix_LeftIndex on Objects(LeftIndex)
create index ix_RightIndex on Objects(RightIndex)
create index ix_Objects on TP(Property, Value, Object1, Object2, Object3)
create index ix_Prop on TP(Property)
GO
---------- QUESTION ADDITIONAL SCHEMA --------
CREATE VIEW TPResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by Objects1.LeftIndex desc, Objects2.LeftIndex desc, Objects3.LeftIndex desc
)
as Idx
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
) as x
WHERE idx = 1
GO
---------- ANSWER 1 SCHEMA --------
CREATE VIEW TPIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, Objects1.LeftIndex as PL1, Objects2.LeftIndex as PL2, Objects3.LeftIndex as PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
GO
---------- ANSWER 2 SCHEMA --------
-- Partial calculation using an indexed view
-- Circumvented the self-join limitation using a black magic technique, based on
-- http://jmkehayias.blogspot.com/2008/12/creating-indexed-view-with-self-join.html
CREATE TABLE dbo.multiplier (i INT PRIMARY KEY)
INSERT INTO dbo.multiplier VALUES (1)
INSERT INTO dbo.multiplier VALUES (2)
INSERT INTO dbo.multiplier VALUES (3)
GO
CREATE VIEW TPIndexed
WITH SCHEMABINDING
AS
SELECT tp.Object1, tp.object2, tp.object3, tp.property, tp.value,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL3,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.RightIndex ELSE NULL END, 0)) as PR1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.RightIndex ELSE NULL END, 0)) as PR2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.RightIndex ELSE NULL END, 0)) as PR3,
COUNT_BIG(*) as ID
FROM dbo.tp
cross join dbo.multiplier M
inner join dbo.Objects
on (M.i = 1 AND Objects.Id = tp.Object1)
or (M.i = 2 AND Objects.Id = tp.Object2)
or (M.i = 3 AND Objects.Id = tp.Object3)
GROUP BY tp.Object1, tp.object2, tp.object3, tp.property, tp.value
GO
-- This index is mostly useless but required
create UNIQUE CLUSTERED index pk_TPIndexed on dbo.TPIndexed(property, value, object1, object2, object3)
-- Once we have the clustered index, we can create a nonclustered that actually addresses our needs
create NONCLUSTERED index ix_TPIndexed on dbo.TPIndexed(property, value, PL1, PL2, PL3, PR1, PR2, PR3)
GO
-- NOTE: this View is not indexed, but is uses the indexed view
CREATE VIEW TPIndexedResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by tp.Property, Children1.Id, Children2.Id, Children3.Id
order by tp.Property, Tp.PL1 desc, Tp.PL2 desc, Tp.PL3 desc
)
as Idx
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
) as x
WHERE idx = 1
GO
-- NOTE: this View is not indexed, but is uses the indexed view
CREATE VIEW TPIndexedIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, PL1, PL2, PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
GO
---------- ANSWER 3 SCHEMA --------
-- You're talking about making six copies of the TP table
-- If you're going to go that far, you might as well, go the trigger route
-- The performance profile is much the same - slower on insert, faster on read
-- And instead of still recalculating on every read, you'll be recalculating
-- only when the data changes.
CREATE TABLE TPResult
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
GO
create UNIQUE index ix_Result on TPResult(Property, Value, Object1, Object2, Object3)
--You'll have to imagine this trigger, sql fiddle doesn't want to do it
--CREATE TRIGGER tr_TP
--ON TP
-- FOR INSERT, UPDATE, DELETE
--AS
-- DELETE FROM TPResult
-- -- For this example we'll just insert into the table once
INSERT INTO TPResult
SELECT O1, O2, O3, Property, Value
FROM TPResultView
Query part of my answer from sqlfiddle:
-------- QUESTION QUERY ----------
-- Original query, modified to use the view I added
SELECT O1, O2, O3, Property, Value
FROM TPResultView
WHERE property = 'P1' AND value="abc"
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
-- ORDER BY O1, O2, O3
GO
-------- ANSWER 1 QUERY ----------
-- A different way to get the same result.
-- Query optimizer says this is more expensive, but I've seen cases where
-- it says a query is more expensive but it returns results faster.
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIntermediate A
LEFT JOIN TPIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value="abc"
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- Howeer, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 2 QUERIES ----------
-- Same as above but using an indexed view to partially calculate results
SELECT O1, O2, O3, Property, Value
FROM TPIndexedResultView
WHERE property = 'P1' AND value="abc"
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
--ORDER BY O1, O2, O3
GO
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIndexedIntermediate A
LEFT JOIN TPIndexedIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value="abc"
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- Howeer, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 3 QUERY ----------
-- Returning results from a pre-calculated table is fast and easy
-- Unless your are doing many more inserts than reads, or your result
-- set is very large, this is a fine way to compensate for a poor design
-- in one area of your database.
SELECT Object1 as O1, Object2 as O2, Object3 as O3, Property, Value
FROM TPResult
WHERE property = 'P1' AND value="abc"
ORDER BY O1, O2, O3