Friday, September 6, 2024

How to import csv files to postgreSQL?

 

How to import csv files to postgreSQL?

Create the data base

Create table sales_data(Order_id int,

Order_date date,

Customer_Id varchar,

City varchar,

Product_Id varchar,

Cupcake_Flavor varchar,

Quantity int,

Unit_Price float,

Rating int);

 

Copy sales_data from ‘C:\Program Files\PostgreSQL\Sales_data\ Sales_data.csv’ csv header;

 Select * from sales_data;

Friday, March 1, 2024

Transpose a Table in SQL

 Write an SQL query to convert the given table in Table A to Table B



Table A



Table B 


Table A Query.

CREATE TABLE cupcake (

    customer_id varchar ,

    flavor VARCHAR(50),

    Qty INT

);


INSERT INTO cupcake (customer_id, flavor, Qty)

VALUES

    ('C1', 'Vanilla', 180),

    ('C1', 'Chocolate', 200),

    ('C1', 'Strawberry', 150),

('C2', 'Vanilla', 250),

    ('C2', 'Chocolate', 300),

    ('C2', 'Strawberry', 170);

select * from cupcake

Table A

Table B Query

select customer_id,

sum(case when flavor = 'Vanilla' then Qty else 0 end) as vanilla,

sum(case when flavor = 'Chocolate' then Qty else 0 end) as chocolate,

sum(case when flavor = 'Strawberry' then Qty else 0 end) as strawberry

from cupcake

group by customer_id;

Table B






Sunday, February 25, 2024

Group By in SQL

 Group by is an unavoidable clause for any data analyst who start analysis in SQL.

Let me explain the concept with an example

We have a  cupcake table that contains some records repeating like Vanilla, Chocolate, and Strawberry

What grouping means  here is arranging the cupcakes into identical groups.

All the chocolate together, all the strawberry together similar the rest of the Flavors.












In short SQL GROUP BY statement is used to arrange identical data into groups.

The GROUP BY clause combines all those records(row) that have identical values in a particular column and GROUPING can be done by column names.

Let's see how the group by works on the cupcake table

Query

select flavor from cupcakes group by flavor;

Result











Here what happened is the group by collapsed identical records into a single record

Now what are the situations where we can use group by sql clause?

The first one is, if you examine this sql query we ran earlier , with a simple group by, we saw that all the duplicated data is also removed from this given table.  Hence group by can also be used to remove duplicates from a table. This is a common interview question as well. If the interviewer gives a simple column and asks a question to remove the duplicates you may use the group by clause to get the result.

 The next important function of group by is , it is used with aggregate functions like sum, count, min, max etc in which case the aggregate produces a value for each group.

Let's see a query where group by can be used with aggregate function count.

Query

select flavor, count(flavor)from cupcakes

group by flavor;

Result







 






In the result, we get the flavor and the aggregated count.

Now we are going to discuss placing conditions on groups, here we use group by along with having clause. Lets say in our cupcake table we want to select only the records whose count is > 1.

Query for Group By with Having

select flavor, count(flavor) from cupcakes group by flavor having count(flavor)>1;

The query will select flavor and count of flavor according to groups, and then it checks for records whose count is >1. Remember Always group by is used before placing the condition with having clause.

 Result















One thing that we should keep in mind is that while grouping, we should include only those values in the SELECT list.

If we check the above example, flavor comes with the select statement and we group according to flavor.

I hope you understand how group by clause is used in SQL queries. 




Saturday, February 24, 2024

How to delete duplicate records ?- SQL

 We have a cupcake data table with two columns product_id and flavor. Here product_id is unique, which means it will not duplicate. Some of the flavors are duplicating. Our objective is to delete duplicated Flavors.










First what we are going to do is give row numbers to each flavor in a sequential order. Wherever vanilla comes it gives number row number 1 for the first occurrence and for the second occurrence row number 2 . This is what is meant by sequential order.



 







We are using a function called window function -row number () , to give a unique integer value in a sequential order.

The syntax of row_number is

ROW_NUMBER() OVER(

    [PARTITION BY column_1, column_2,…]

    [ORDER BY column_3,column_4,…]

)

The set of rows on which the ROW_NUMBER() function operates is called a window. Here the window is the flavor on which the row number function operates.











The PARTITION BY clause divides the window into smaller sets or partitions. If we specify the PARTITION BY clause, the row number for each partition starts with one and increments by one.

Then we can sort the result if required by order by clause.

The query for assigning row number will be like….

select product_id,flavor, row_number() over (partition by flavor) as rn from cupcakes

order by flavor;

Result


Wherever Chocolate comes it gives number row number 1 for the first record, row number 2 for the second record and row number 3 for 3rd occurrence of the record, and wherever vanilla comes it gives number row number 1 for the first record and row number 2 for the second record. We have noticed the row number has been given sequentially for duplicated records. Other flavors lemon and strawberry have only one record, so the row number function gives row number 1 for each of them.

This was our original table. Then we  assigned the row number to the table.





 





To delete only duplicate records we have to select only the duplicated rows,  all we have to do is select  the row number column where row number >1

We use a subquery to achieve this. A sub-query is a query inside a query. Sub-query is also called an inner query. The inner query is kept inside the main query.

Subquery

 select product_id,flavor,rn

from (select product_id,flavor, 

  row_number() over (partition by flavor) as rn from cupcakes

group by product_id,flavor) as x where x.rn>1;


Result







We can see that the result has only rows with rn>1 which are the duplicates.

We have identified the duplicates, and now all we do is delete the duplicated columns with by a delete statement in the query.

Query to delete duplicates








Select * from cupcakes; run the query in the result we saw all the duplicates are removed.

Result













Friday, February 23, 2024

SQL - CASE STATEMENT

 We will see the use of case statement in sql  with the help of an example

If we have cupcake sales data,  and if the sales amount > 10000, we want to set “Big Order” and if the sales order <10000, I want to set “Small Order”.

Example









So the logic will be if sales order >10000, then it will give “ Big Order” else “Small Order”

Basically, a case statement in SQL allows us to create conditional expressions similar to an IF-THEN-ELSE statement.

Another feature of the SQL case statement is it adds a column with values based on the condition.

The syntax for case statement is

SELECT column1, column2,... ,

CASE

  WHEN condition THEN result

END AS alias_name

FROM table;

The syntax of CASE always starts with the CASE keyword and ends with the END keyword followed by a column name alias.

Let's see this with an example

Query for Table cupcake

CREATE TABLE cupcake (

    product_id INT PRIMARY KEY,

    flavor VARCHAR(50),

    sale_amount INT

);

 

INSERT INTO cupcake (product_id, flavor, sale_amount)

VALUES

    (1, 'Vanilla', 10000),

    (2, 'Chocolate', 20000),

    (3, 'Strawberry', 2750),

    (4, 'Red Velvet', 3500),

    (5, 'Lemon', 2250),

    (6, 'Blueberry', 3250),

    (7, 'Caramel', 4000),

    (8, 'Pumpkin Spice', 3750),

    (9, 'Mocha', 4250),

    (10, 'Raspberry', 2900);

              

select * from cupcake    

CUPCAKE SALES DATA



Query with case statement

select *,

case

when sale_amount >=10000 then 'Big Order'

when sale_amount <10000 then 'Small Order'

end as order_volume

from cupcake;

 

Result



 






Here is the result. We can see as per our condition wherever the sales amount > 10000, we are getting “Big Order” and if the sales order <10000, we are getting “small order”., in a new column.


                          Important points to take care of while using a case statement are

  1. CASE  statement always starts with the CASE keyword
  2. Ends with  END keyword
  3. The END keyword is followed by a column name alias.
  4. A case statement adds a column based on the given condition                                             

 


Thursday, February 8, 2024

How to identify duplicate records from a table?

Finding duplicates in SQL involves identifying rows in a table where certain columns have identical values. Here is a very common method to achieve this:

Using GROUP BY and HAVING

Let us take an example.

We have cupcake data. In this table, the flavor of the cupcake is duplicated. Because our data is small, we can say by looking at the table the duplicate records are 2-Vanilla and  2-Chocolate. 

Query for cupcakes table

CREATE TABLE cupcakes (

    flavor VARCHAR(50)

);

INSERT INTO cupcakes (flavor) VALUES

    ('Vanilla'),

    ('Chocolate'),

    ('Strawberry'),

    ('Vanilla'),

    ('Chocolate'),

    ('Red Velvet');


select * from cupcakes;


Example Table




Remember we intend to identify only the duplicated flavor of cupcakes, we are not deleting the records.     

First, we need to check the count of each flavor.

Query for finding the count of each flavor

select flavor, count(flavor) from cupcakes

group by flavor;


Result




If the count is one we know the records are not duplicated, but if the count is more than one that means it has duplicated records. 

Now how do we extract only the duplicated records?

For this we give another condition in the query with having clause, the condition is that it returns the records with count >1. Along with our previous query, we add this condition using the having function. When adding having condition where count>1, it selects only those records whose count is greater than 1.

Query for finding duplicated flavors

select flavor, count(flavor) from cupcakes

group by flavor

having count(flavor)>1;

Result









                 




















Wednesday, February 7, 2024

How to remove duplicates using SQL query?

 Ever wondered why your database feels a bit messy? Meet the troublemakers called duplicates – they slow

things down, mess up accuracy, and hog space. In this article, I will share how you can delete duplicate

rows in SQL.

Method 1- By using distinct function

The DISTNCT keyword removes duplicate rows from the results of a SELECT statement

 or in technical language we can say a distinct keyword is used to remove redundant data

Query to create the example table

CREATE TABLE cupcakes (flavor VARCHAR(50));

INSERT INTO cupcakes (flavor) VALUES

    ('Vanilla'),

    ('Chocolate'),

    ('Strawberry'),

    ('Vanilla'),

    ('Chocolate'),

    ('Red Velvet');

Example Table






Query using "distinct" function

select distinct flavor from cupcakes;

Result Table




Method 2 -By using group by clause 

Group by clause allows us to collapse a field into its distinct values.

Example Table



Here the vanilla is repeated 2 times, similarly chocolate is also repeated two times. What group by clause does is it will collapse the two vanilla into one group and the two chocolate into one group. Other records remains the same. 

Query using "group by" clause

select flavor from cupcakes group by flavor; 

Result Table