Tuesday 2 July 2013

MSSQL - Query Templates - GROUP_CONCAT

Today I will show how to implement aggregation of strings in MSSQL  (similar to MySQL GROUP_CONCAT). My favorite way is with usage of FOR XML expression.

Imagine that we have, such a database:



We need to build a list of table fields so, that if a field with the same name is in multiple tables, then the second column in result set will contain alphabetically sorted and comma separated set of table names. For example, in our case, for all fields terminating by '_ID':



To solve this task, we will use the data from INFORMATION_SCHEMA tables.
In the first step, we will get a list of COLUMN-TABLE combinations:
SELECT DISTINCT
    COLUMN_NAME,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS
ORDER BY
    COLUMN_NAME, TABLE_NAME

now the same, but as XML:
SELECT DISTINCT
    COLUMN_NAME,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS
ORDER BY
    COLUMN_NAME, TABLE_NAME
FOR XML RAW

Now for each field name we need get own list of tables. Result should be also in XML form.
For this we need first a list of unique field names:
SELECT DISTINCT
    COLUMN_NAME
FROM
    INFORMATION_SCHEMA.COLUMNS
ORDER BY
    COLUMN_NAME

now combine this query with a query that returns list of COLUMN-TABLE combinations:
SELECT DISTINCT
    [COLUMN] = C1.COLUMN_NAME,
    [TABLEs] = (
            SELECT C2.TABLE_NAME
            FROM INFORMATION_SCHEMA.COLUMNS C2
            WHERE
                C2.COLUMN_NAME = C1.COLUMN_NAME
            ORDER BY TABLE_NAME
            FOR XML RAW
        )
FROM INFORMATION_SCHEMA.COLUMNS C1
ORDER BY [COLUMN]

We obtain the following table:



It now remains:
- Filter out the unnecessary tables (sysdiagrams)
- Leaving only the fields that terminating by '_ID'
- Replace the start and end XML tags with delimiter (comma)
SELECT DISTINCT
    [COLUMN] = C1.COLUMN_NAME,
    [TABLEs] = REPLACE(REPLACE(REPLACE(
        (
            SELECT T=C2.TABLE_SCHEMA + '.' + C2.TABLE_NAME
            FROM INFORMATION_SCHEMA.COLUMNS C2
            WHERE
                C2.TABLE_NAME != 'sysdiagrams'
                AND
                C2.COLUMN_NAME = C1.COLUMN_NAME
            ORDER BY 1
            FOR XML RAW
        ),
        '"/><row T="', ', '),
        '<row T="', ''),
        '"/>', '')
FROM INFORMATION_SCHEMA.COLUMNS C1
WHERE
    C1.TABLE_NAME != 'sysdiagrams'
    AND
    C1.COLUMN_NAME LIKE '%_ID'
ORDER BY [COLUMN]

The result of the query:



It remains to convert all of this in the form of T-SQL template:
USE <DataBaseName, sysname, tempdb>;
SELECT DISTINCT
    [COLUMN] = C1.COLUMN_NAME,
    [TABLEs] = REPLACE(REPLACE(REPLACE(
        (
            SELECT T=CASE C2.TABLE_SCHEMA
      WHEN SCHEMA_NAME()
      THEN ''
      ELSE C2.TABLE_SCHEMA + '.'
     END + C2.TABLE_NAME
            FROM INFORMATION_SCHEMA.COLUMNS C2
            WHERE
                C2.TABLE_NAME != 'sysdiagrams'
                AND
                C2.COLUMN_NAME = C1.COLUMN_NAME
            ORDER BY T
            FOR XML RAW
        ),
        '"/><row T="', ', '),
        '<row T="', ''),
        '"/>', '')
FROM INFORMATION_SCHEMA.COLUMNS C1
WHERE
    C1.TABLE_NAME != 'sysdiagrams'
    AND
    C1.COLUMN_NAME LIKE <ColumnNamePattern, sysname, '%'>
ORDER BY [COLUMN]

Save it in the list of SQL-templates.
Now we can use this template to find the fields of the same name in different tables:


About templates and usage of the system tables you can read here:
MSSQL - Query Templates
MSSQL - Query Templates - usage of system tables