We have a cupcake data table with two columns product_id and flavor. Here product_id is unique, which means it will not duplicate. Some of the flavors are duplicating. Our objective is to delete duplicated Flavors.
First what we are going to do is give row numbers to each flavor in a sequential order. Wherever vanilla comes it gives number row number 1 for the first occurrence and for the second occurrence row number 2 . This is what is meant by sequential order.
We are using a function called window function -row number () , to give a unique integer value in a sequential order.
The syntax of row_number is
ROW_NUMBER() OVER(
[PARTITION BY
column_1, column_2,…]
[ORDER BY
column_3,column_4,…]
)
The set of rows on which the ROW_NUMBER() function operates is called a window. Here the window is the flavor on which the row number function operates.
The PARTITION BY clause divides the window into smaller sets
or partitions. If we specify the PARTITION BY clause, the row number for each
partition starts with one and increments by one.
Then we can sort the result if required by order by clause.
The query for assigning row number will be like….
select product_id,flavor, row_number() over (partition by
flavor) as rn from cupcakes
order by flavor;
Result
This was our original table. Then we assigned the row number to the table.
To delete only duplicate records we have to select only the
duplicated rows, all we have to do is select the row number column where row number >1
We use a subquery to achieve this. A sub-query is a query
inside a query. Sub-query is also called an inner query. The inner query is
kept inside the main query.
Subquery
from (select product_id,flavor,
row_number() over (partition by flavor) as rn from cupcakes
group by product_id,flavor) as x where x.rn>1;
Result
We can see that the result has only rows with rn>1 which are the duplicates.
We have identified
the duplicates, and now all we do is delete the duplicated columns with by a
delete statement in the query.
Query to delete duplicates
Select * from cupcakes; run the query in the result we saw all the duplicates are removed.
No comments:
Post a Comment