I tested this before I saw this post and it does work (sort of). Or does it cause some other issues?
Before adding to whitelist.txt
Code:
grep 'google-analytics.com' blocking_file
0.0.0.0 google-analytics.com
0.0.0.0 ssl.google-analytics.com
0.0.0.0 www.google-analytics.com
Then I added this to whitelist.txt.
And it filtered out ALL 3 of the entries above, which is not what I would expect at all.
So I did a little investigation and found that the whitelist entries are being removed using egrep -v (-v prints everything NOT matched).
A space is padded before each entry and an end-of-line anchor ($) is added after; I guess the space is meant to match the space between the 0.0.0.0 and the domain.
So what actually gets checked after processing is ' *.google-analytics.com$'
Using that, I realized the '.' in the domain will match any single character, since this is a regex match and not a literal string match.
And the ' *' (that's [space]*) means "zero or more spaces", so it can match no space at all; the * only applies to the one space right in front of it. Combined with the unescaped '.', the pattern is not really tied to the space between the 0.0.0.0 and the domain the way you would expect, which is why the subdomain lines match too.
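To double-check, I fed that exact pattern back through egrep against the same blocking_file (the printf just recreates the three entries from above):
Code:
printf '0.0.0.0 google-analytics.com\n0.0.0.0 ssl.google-analytics.com\n0.0.0.0 www.google-analytics.com\n' > blocking_file
egrep ' *.google-analytics.com$' blocking_file
0.0.0.0 google-analytics.com
0.0.0.0 ssl.google-analytics.com
0.0.0.0 www.google-analytics.com
egrep -v ' *.google-analytics.com$' blocking_file
# no output here, so all 3 lines end up removed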
Anyway, I have some ideas that would make it more robust and also allow a standard wildcard, if you are interested.
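Something along these lines is what I mean. This is only a sketch, not the actual update script, and the file names are just examples: escape the dots so they only match a literal dot, treat a leading '*.' as a real subdomain wildcard, and anchor each pattern to the whole hostname field (IP, whitespace, domain, end of line).
Code:
# sketch only: build proper regexes from whitelist.txt instead of using the raw entries
sed -e 's/\./\\./g' \
    -e 's/^\*\\\./([^ ]*\\.)*/' \
    -e 's/^/^[0-9.]+[[:space:]]+/' \
    -e 's/$/$/' \
    whitelist.txt > whitelist.regex
egrep -v -f whitelist.regex blocking_file > blocking_file.new
With that, whitelisting google-analytics.com only removes the '0.0.0.0 google-analytics.com' line, while *.google-analytics.com would also catch the ssl. and www. entries.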
As it is right now, the '.' used in the domains matches any character, so there is a small but real potential for matches that were never intended.
For example:
Whitelist entry:
googl..com
Would match:
google.com
googly.com
googlr.com
googl.zcom
etc....
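You can check that pretty quickly (the test file here is just made up for the demonstration):
Code:
printf 'google.com\ngoogly.com\ngooglr.com\ngoogl.zcom\n' > test_domains
egrep 'googl..com' test_domains
google.com
googly.com
googlr.com
googl.zcom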