Using SQL to identify embedded resources for your CMS site's CSP

It's good practice these days to add a Content Security Policy and couple it with the consent dialog. Both should include all third parties a site might have embedded.

But especially if being too relaxed in the past and allowing for example a generic HTML block or similar; it's hard to overview and easy to miss external embedded resources that might reside in the content database.

Putting together a CSP and activating it in content-security-policy-report-only mode is a good way to start but I thought it also valuable to try to extract possible embeddings from CMS content.

To work in a more direct fashion I figured that using SQL queries would be nice.

This probably might have some holes but I thought that looking for src="// and src="http in LongString-based properties should give decent coverage.

A Stack Overflow answer from Darka had a smart function to involve so after downloading the production database from the self-service portal I added that function to my local copy.

I only needed to do a small modification to the extraction pattern to look for // only.

CREATE FUNCTION dbo.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE(TheLink NVARCHAR(500))
AS
BEGIN
    DECLARE @FirstIndexOfChar INT,
            @LastIndexOfChar INT,
            @LengthOfStringBetweenChars INT,
            @String VARCHAR(MAX)

    SET @FirstIndexOfChar = CHARINDEX('//', @Tekstas, 0)

    WHILE @FirstIndexOfChar > 0
    BEGIN

        SET @String = ''
        SET @LastIndexOfChar = CHARINDEX('/', @Tekstas,@FirstIndexOfChar+7)
        SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1

        SET @String = SUBSTRING(@Tekstas, @FirstIndexOfChar, @LengthOfStringBetweenChars)
        INSERT INTO @Data (TheLink) VALUES (@String);

        SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas))
        SET @FirstIndexOfChar = CHARINDEX('//', @Tekstas, 0)
    END

    RETURN
END

All results query

SELECT fkContentID, LongString, Extracted.*
FROM tblContentProperty
OUTER APPLY dbo.getLinksFromText(LongString) as Extracted
WHERE LongString LIKE '%src="http%'
OR LongString LIKE '%src="//%'
OR LongString LIKE '%src=http%'
OR LongString LIKE '%src=//%'
ORDER BY fkContentID

This gave me a good amount of rows with the property value included.

All results example

fkContentID LongString TheLink
202732 <iframe src="https://www.youtube-nocookie.com/embed/qda .. //www.youtube-nocookie.com/

Distinct results query

I also complemented the full report with a listing of the distinct host names.

SELECT DISTINCT TheLink FROM (
    SELECT fkContentID, Extracted.*
    FROM tblContentProperty
    OUTER APPLY dbo.getLinksFromText(LongString) as Extracted
    WHERE LongString LIKE '%src="http%'
    OR LongString LIKE '%src="//%'
    OR LongString LIKE '%src=http%'
    OR LongString LIKE '%src=//%'
) AS tbl
ORDER BY TheLink

Distinct results example

TheLink
//www.facebook.com/
//www.youtube-nocookie.com/

Comments?

Published and tagged with these categories: Optimizely, CMS, Microsoft SQL Server