Thursday, 9 February 2012

Basic SQL for Data Reconciliation


Excel’s VLOOKUP Function is probably the most used one for basic Data Analysis in a spreadsheet environment. Below are some SQL code snippets that do what VLOOKUP does in theory, but can be used when the data is in SQL Database for any reason.

The following tables will be used to describe on some very common scenarios. Let’s assume we have two independent data stores which are not integrated hence require regular reconciliation. Table A contains the Products Inventory. Table B contains the Online Catalog of those products. Table C is a mapping table across the two platforms. Table C contains sales figures made Online.


Product_ID
ProductName
Price
Category
OnlineCode
TotalSale
ZJU897JHG
JB Raincoat
110.00
Clothing
NULL
 NULL
JOK23MK30
Autograph Luxury Col
180.00
Clothing
NULL
 NULL
UIO0-SDNK
Animal Leather Wallet
20.00
Clothing
NULL
 NULL
PO0IKJU-90
Austin Reed B/W Shirt
60.00
Clothing
NULL
 NULL
78H-09-POI
Nike Sports Jacket
80.00
Clothing
NULL
 NULL
Table A: ProductDetails

OnlineCode
CreationDate
Stock
OL-FAS-002
12.03.2011
5
OL-FOD-091
23.06.2011
25
EX-COM-09
13.05.2011
60
OL-EXC-178
30.11.2011
14
EX-FAS-450
05.10.2011
10
Table B: OnlineProductCatalog

Product_ID
OnlineCode
ZJU897JHG
OL-FAS-002
II9080JM77
OL-FOD-091
JOK23MK30
OL-EXC-178
UIO0-SDNK
EX-FAS-450
Table C: ProductOnlineMapping

OnlineCode
Date
Qty
OL-FAS-002
14.03.2011
3
OL-FAS-002
15.03.2011
6
OL-FOD-091
23.06.2011
12
OL-FOD-091
25.06.2011
9
EX-COM-09
13.05.2011
20
EX-COM-09
13.05.2011
13
OL-EXC-178
30.11.2011
5
OL-EXC-178
30.11.2011
5
OL-EXC-178
30.11.2011
3
EX-FAS-450
05.10.2011
2
EX-FAS-450
05.10.2011
7
Table D: OnlineSales




Case 1:   Identify products that are not available in the Online Store
SQL Code A:          Nested queries:

SELECT DISTINCT Product_ID
FROM ProductDetails
WHERE Product_ID NOT IN
                (
SELECT DISTINCT Product_ID
FROM ProductOnlineMapping
                )

SQL Code B:          LEFT JOIN
SELECT DISTINCT P.Product_ID
FROM ProductDetails P     LEFT JOIN ProductOnlineMapping PM ON P.Product_ID = PM. Product_ID
                                                LEFT JOIN OnlineProductCatalog OC ON PM. OnlineCode = OC. OnlineCode
WHERE OC.OnlineCode IS NULL


Case 2:   Update Product Details table so it contains each item’s Online Code
SQL Code :             UPDATE script with INNER JOIN

UPDATE ProductDetails
SET OnlineCode = PM. OnlineCode
FROM ProductDetails P INNER JOIN ProductOnlineMapping PM
ON P.Product_ID = PM. Product_ID


Case 3:   Update ProductDetails table so it contains total online sale per item
SQL Code:             

-- Declare a temporary table / view to store results of the aggregate function
DECLARE @ProductTotalSale table(
   
Product_ID varchar(20),
    TotalSale int    );
 

-- Store results of the aggregate function
INSERT INTO  @ProductTotalSale
SELECT P.
Product_ID, SUM(OS.Qty)
from @ProductDetails P INNER JOIN @
ProductOnlineMapping PO ON P.Product_ID = PO.Product_ID
INNER JOIN @OnlineSales OS ON PO.OnlineCode = OS.OnlineCode
GROUP BY P.Product_ID


-- Update Products table using the stored values
UPDATE @
ProductDetails SET TotalSale= PTS.TotalSale
FROM @
ProductDetails P INNER JOIN @ProductTotalSale PTS ON P.Product_ID = PTS.Product_ID

Feel free to leave a comment if you have suggestions / queries.