Thursday, February 8, 2024

How to identify duplicate records from a table?

Finding duplicates in SQL involves identifying rows in a table where certain columns have identical values. Here is a very common method to achieve this:

Using GROUP BY and HAVING

Let us take an example.

We have cupcake data. In this table, the flavor of the cupcake is duplicated. Because our data is small, we can say by looking at the table the duplicate records are 2-Vanilla and  2-Chocolate. 

Query for cupcakes table

CREATE TABLE cupcakes (

    flavor VARCHAR(50)

);

INSERT INTO cupcakes (flavor) VALUES

    ('Vanilla'),

    ('Chocolate'),

    ('Strawberry'),

    ('Vanilla'),

    ('Chocolate'),

    ('Red Velvet');


select * from cupcakes;


Example Table




Remember we intend to identify only the duplicated flavor of cupcakes, we are not deleting the records.     

First, we need to check the count of each flavor.

Query for finding the count of each flavor

select flavor, count(flavor) from cupcakes

group by flavor;


Result




If the count is one we know the records are not duplicated, but if the count is more than one that means it has duplicated records. 

Now how do we extract only the duplicated records?

For this we give another condition in the query with having clause, the condition is that it returns the records with count >1. Along with our previous query, we add this condition using the having function. When adding having condition where count>1, it selects only those records whose count is greater than 1.

Query for finding duplicated flavors

select flavor, count(flavor) from cupcakes

group by flavor

having count(flavor)>1;

Result









                 




















No comments:

Post a Comment