Question: Optimal Way to Avoid Duplicate SubQuery in SQL

Question

Optimal Way to Avoid Duplicate SubQuery in SQL

Answers 2
Added at 2016-12-30 20:12
Tags
Question

I have a simple two column table. For example's sake we can use the following to build the data:

CREATE TABLE Duplicates
  (assignmentid varchar(5), questionid varchar(5));

INSERT INTO Duplicates
  (assignmentid, questionid)
VALUES
  ('aaaaa', '11111'),
  ('aaaaa', '22222'),
  ('bbbbb', '22222'),
  ('bbbbb', '33333'),
  ('bbbbb', '33333');

There are two rows that are identical. There is also a question that appears on multiple assignments. The latter is a valid scenario, and I am trying to get a query of all the questions that are part of multiple assignments. So my desired output is:

  aaaaa, 22222
  bbbbb, 22222

I was able to get this results with this:

SELECT main.questionid, sub.assignmentid 
FROM (
   SELECT questionid, count(assignmentid) AS AssignmentCount 
   FROM ( 
      SELECT DISTINCT questionid, assignmentid 
      FROM Duplicates
   ) sub 
   GROUP BY questionid
   HAVING AssignmentCount > 1
) main
INNER JOIN (
     SELECT DISTINCT questionid, assignmentid 
     FROM Duplicates
) sub ON main.questionid = sub.questionid;

As you can see the DISTINCT subquery is being repeated twice. I could avoid this by using the WITH command, but my understanding is that this does not necessarily mean that the subquery will be executed only once. So now I am here in StackOverflow, to ask if anyone knows a more efficient way to run this query.

Answers to

Optimal Way to Avoid Duplicate SubQuery in SQL

nr: #1 dodano: 2016-12-30 20:12

Just use window functions. One method is to compare the count of answers to the count of distinct answers:

select distinct answerid, questionid
from (select d.*,
             count(distinct answerid) over (partition by questionid) as cntd,
             count(*) over (partition by questionid) as cnt
      from duplicates d
     ) d
where cntd <> cnt;

EDIT:

You can do this without count(distinct) but it requires one extra subquery:

select distinct answerid, questionid
from (select d.*,
             count((seqnum = 1)::int) over (partition by questionid) as cntd,
             count(*) over (partition by questionid) as cnt
      from (select d.*,
                   row_number() over (partition by questionid, answerid order by questionid) as seqnum
            from duplicates d
           ) d
     ) d
where cntd <> cnt;

This uses the row number for the distinct calculation.

nr: #2 dodano: 2016-12-30 21:12

You can simplify that to:

select *
from duplicates
where questionid in (select questionid
                     from duplicates
                     group by questionid
                     having count(distinct assignmentid) > 1);

The sub-query returns all questionids that are assigned to more then one assignmentid.

Source Show
◀ Wstecz