Corey Nyako

Hello!
I am Corey Nyako, a data analyst in San Francisco.
Machine
Learning
Python D3.js Other

Data Patterns

SQL Function

I created this small SQL Server function for summarizing patterns of data in fields. It comes in useful when the data is messy and needs to be standardized. For example, phone numbers can come in different formats: 123-123-1234, (123) 123–1234, (123) 1231234. The patternMask function can summarize the patterns of the values in the field.

It works by replacing lower case letters with a lower case 'x', upper case letters with an upper case 'X', numbers with a '#', and non-alphanumeric characters with a '*'. Alternatively, the non-alphanumeric characters can be returned as they are.


Usage

SELECT field1, [dbo].[patternMask](field1, 0) 'without', [dbo].[patternMask](field1, 1) 'with'
FROM table1


Result

field1 w/o non-alphanumeric mask w non-alphanumeric mask
(()76&^%0(45eDD (()##&^%#(##xXX ***##***#*##xXX

Usage in Grouping

SELECT [dbo].[patternMask](field1, 0) Collate SQL_Latin1_General_CP1_CS_AS, COUNT(*)
FROM table1
GROUP BY [dbo].[patternMask](field1, 0) Collate SQL_Latin1_General_CP1_CS_AS