Hostsblock

An ad- and malware-blocking utility for POSIX systems

Contents

  1. Description: Features
  2. Installation: Dependencies, Arch Linux, Other POSIX
  3. Configuration: Edit hostsblock.conf, Enable Timer, Enable Postprocessing
  4. Usage: Configuring sudo, Manual Usage, UrlCheck Usage (examples)
  5. FAQ
  6. News & Bugs: Upgrading to 0.999.8
  7. License

Description

Hostsblock is a POSIX-compatible script designed to take advantage of the /etc/hosts file to provide system-wide blocking of internet advertisements, malicious domains, trackers, and other undesirable content.

To do so, it downloads a configurable set of blocklists and processes their entries into a single HOSTS file.
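
As a rough illustration of that processing step, the sketch below folds several blocklist formats into one sorted, de-duplicated list. It is not hostsblock's actual code: the function name normalize_entries and the exact filtering rules are assumptions made for this example.

```sh
# Hypothetical helper (not part of hostsblock): read blocklist lines on stdin
# and print "<redirect-ip> <domain>" entries, sorted and de-duplicated.
normalize_entries() {
    redirect="${1:-0.0.0.0}"    # assumed default, same as the documented redirecturl
    tr -d '\r' |                # drop carriage returns from DOS-formatted lists
    sed -e 's/#.*$//' |         # drop comments
    awk -v ip="$redirect" '
        # "IP domain" lines: keep the domain, skip the loopback name itself
        NF == 2 && ($1 == "0.0.0.0" || $1 == "127.0.0.1") && $2 != "localhost" {
            print ip, tolower($2); next
        }
        # bare domain lines, e.g. "ads.example.com" (but not a lone IP address)
        NF == 1 && $1 ~ /\./ && $1 !~ /^[0-9.]+$/ { print ip, tolower($1) }
    ' |
    sort -u
}
```

For example, piping the two lines "127.0.0.1 ads.example.com" and "ads.example.com" through normalize_entries yields the single entry "0.0.0.0 ads.example.com".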

Hostsblock also provides a command-line utility that allows you to configure how individual websites and any other domains contained in that website are handled.

Features

  • Enhanced security - Runs as an unprivileged user instead of root. New: Includes systemd service files that heavily sandbox the background process.

  • System-wide blocking - All non-proxied connections use the HOSTS file (Proxied connections can be modified to use the HOSTS file)

  • Compression-friendly - Can download and process zip- and 7zip-compressed files automatically. (Provided that unzip and p7zip are installed)

  • Non-interactive - Can be run as a periodic background job without needing user interaction.

  • Extensive configurability - Allows for custom deny & allow listing, redirection, post-processing scripting (now provided via systemd configuration), etc.

  • Bandwidth-efficient - Only downloads blocklists that have been changed, using HTTP compression when available.

  • Resource-efficient - Only processes blocklists when changes are registered.

  • High-performance blocking - Only when using DNS caching.

  • Redirection capability - Enhances security by combating DNS cache poisoning.

  • Extensive choice of blocklists included - Allowing the user to choose how much or how little is blocked/redirected.
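
The "Resource-efficient" behaviour can be pictured with a small change-detection sketch. It assumes a checksum comparison using cksum (one of the listed dependencies); the function name cache_changed and the state file are hypothetical, and hostsblock's real mechanism may differ.

```sh
# Hypothetical helper: succeed (exit 0) when the files in cache directory $1
# differ from the checksums recorded in state file $2, and update the record.
# The state file should live outside the cache directory.
cache_changed() {
    cachedir="$1"
    statefile="$2"
    # cksum prints "<crc> <size> <name>" per file; sort gives a stable order.
    new_state=$(find "$cachedir" -type f -exec cksum {} + | sort)
    old_state=$(cat "$statefile" 2>/dev/null) || old_state=""
    if [ "$new_state" = "$old_state" ]; then
        return 1    # nothing changed, so reprocessing can be skipped
    fi
    printf '%s\n' "$new_state" > "$statefile"
    return 0
}
```

A caller would then rebuild the target file only when cache_changed succeeds.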

Installation

Dependencies

  • curl
  • A POSIX environment (which should already be in place on most Linux, *BSD, and macOS systems), including the following commands: sh (e.g. bash or dash), chmod, cksum, cp, cut, file, find, grep, id, mkdir, mv, rm, sed, sort, tee, touch, tr, wc, and xargs.

Optional dependencies for additional features

  • sudo to enable the user-friendly wrapper script (highly recommended)

Unarchivers to use archive blocklists instead of plain text:

  • unzip (for zip archives)
  • p7zip (for 7z archives); it must provide at least one of the 7z, 7za, or 7zr executables

A DNS caching daemon to help speed up DNS resolutions, e.g. dnsmasq (see Configure Postprocessing below).

If you use 127.0.0.1 as your blocking redirect address (redirecturl in hostsblock.conf), a pseudo-server that serves blank pages, which removes the boilerplate error page and speeds up page resolution on blocked domains, e.g. pixelserv or kwakd.

Note that the default configuration (redirecturl="0.0.0.0") gets no benefit from having a pseudo-server.

Arch Linux

If you have yaourt installed: yaourt -S hostsblock or yaourt -S hostsblock-git

Or use one of the AUR packages: hostsblock, hostsblock-git

Don't forget to enable and start the systemd timer by running this:

$ sudo systemctl enable --now hostsblock.timer

For Other POSIX Flavors and Distros

The Best and Easiest Way

Please check with your distribution to see if a package is available. If there is not, ask for it or contribute your own!

If you are a package maintainer, let me know so that I can post the instructions here.

The Easy Way

First download the archive here or with curl like so: curl -L -o hostsblock-master.zip "https://github.com/gaenserich/hostsblock/archive/master.zip"

Unzip the archive, e.g. unzip hostsblock-master.zip

Execute the install.sh script as root, which will guide you through installation.

Configuration

By default, the configuration files are included in the /var/lib/hostsblock/config.examples/ directory. Copy them over to /var/lib/hostsblock/ to customize your setup.

Editing hostsblock.conf

Most of the hostsblock configuration is done in hostsblock.conf. The file is thoroughly commented, so please read through it before first use:

# CACHE DIRECTORY. Directory where blocklists will be downloaded and stored.

#cachedir="$HOME/cache" # DEFAULT


# WORK DIRECTORY. Temporary directory where interim files will be unzipped and
# processed. This directory will be deleted after hostsblock completes.

#tmpdir="/tmp/hostsblock" # DEFAULT

# FINAL HOSTSFILE. Final hosts file that combines together all downloaded blocklists.

#hostsfile="$HOME/hosts.block" # DEFAULT


# REDIRECT URL. IP address to which blocked hosts will be redirected, either 0.0.0.0 or
# 127.0.0.1. This replaces any entries to 0.0.0.0 and 127.0.0.1. If you run a
# pixelserver such as pixelserv or kwakd, it is advisable to use 127.0.0.1.

#redirecturl="0.0.0.0" # DEFAULT


# HEAD FILE. File containing hosts file entries which you want at the beginning
# of the resultant hosts file, e.g. for loopback devices and IPv6 entries. Use
# your original /etc/hosts file here if you are writing your final blocklist to
# /etc/hosts so as to preserve your loopback devices. Give hostshead="0" to
# disable this feature. For those targeting /etc/hosts, it is advisable to copy
# their old /etc/hosts file to this file so as to preserve existing entries.

#hostshead="0" # DEFAULT


# DENYLISTED SUBDOMAINS. File containing specific subdomains to denylist which
# may not be in the downloaded denylists. Be sure to provide not just the
# domain, e.g. "google.com", but also the specific subdomain a la
# "adwords.google.com" without quotations.

#denylist="$HOME/deny.list" # DEFAULT


# ALLOWLIST. File containing the specific subdomains to allow through that may
# be blocked by the downloaded blocklists. In this file, put a space in front of
# a string in order to let through that specific site (without quotations), e.g.
# " www.example.com" will unblock "http://www.example.com" but not
# "http://subdomain.example.com". Leave no space in front of the entry to
# unblock all subdomains that contain that string, e.g. ".dropbox.com" will let
# through "www.dropbox.com", "dl.www.dropbox.com", "foo.dropbox.com",
# "bar.dropbox.com", etc.

#allowlist="$HOME/allow.list"


# CONNECT_TIMEOUT. Parameter passed to curl. Determines how long to try to
# connect to each blocklist url before giving up.

#connect_timeout=60 # DEFAULT


# RETRY. Parameter passed to curl. Number of times to retry connecting to
# each blocklist url before giving up.

#retry=0 # DEFAULT


# MAX SIMULTANEOUS DOWNLOADS. Hostsblock can check and download files in parallel.
# By default, it will attempt to check and download four files at a time.

#max_simultaneous_downloads=4 # DEFAULT


# BLOCKLISTS FILE. File containing URLs of blocklists to be downloaded,
# each on a separate line. Downloaded files may be either
# plaintext, zip, or 7z files. Hostsblock will automatically
# identify the file type.

#blocklists="$HOME/block.urls"


# REDIRECTLISTS FILE. File containing URLs of redirectlists to be downloaded,
# each on a separate line. Downloaded files may be either
# plaintext, zip, or 7z files. Hostsblock will automatically
# identify the file type.

#redirectlists="" # DEFAULT, otherwise "$HOME/redirect.urls"


# If you have any additional lists, please post a bug report to
# https://github.com/gaenserich/hostsblock/issues 
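
Since every setting above is an ordinary shell variable assignment, a script could load the file and fall back to the documented defaults roughly as follows. This is a sketch under the assumption that the file is sourced; load_config is a hypothetical name, not a function hostsblock is known to provide.

```sh
# Hypothetical loader: source a hostsblock.conf-style file, then fill in the
# documented defaults for anything the file left unset or commented out.
load_config() {
    conf="$1"
    # Assumption: the file contains only shell variable assignments.
    # Pass a path containing a slash so that "." does not search PATH.
    if [ -r "$conf" ]; then
        . "$conf"
    fi
    : "${cachedir:=$HOME/cache}"
    : "${tmpdir:=/tmp/hostsblock}"
    : "${hostsfile:=$HOME/hosts.block}"
    : "${redirecturl:=0.0.0.0}"
    : "${hostshead:=0}"
    : "${connect_timeout:=60}"
    : "${retry:=0}"
    : "${max_simultaneous_downloads:=4}"
}
```

After load_config /var/lib/hostsblock/hostsblock.conf, a variable such as $redirecturl holds either the value from the file or the default shown in the comments.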

Enable the systemd service

Don't forget to enable and start the systemd timer with:

$ sudo systemctl enable --now hostsblock.timer

Configure Postprocessing

Hostsblock does not write to /etc/hosts or manipulate any DNS caching daemons anymore. Instead, it will just compile a hosts-formatted file to /var/lib/hostsblock/hosts.block. To make this file actually do work, you have one of two options:

OPTION 1: Using a DNS Caching Daemon (Here: dnsmasq)

Using a DNS caching daemon like dnsmasq offers better performance.

To use hostsblock together with dnsmasq, first configure dnsmasq as your DNS caching daemon. Please refer to your distribution's manual; for Arch Linux, see the corresponding Arch Wiki section.

After that, add the following line to dnsmasq.conf (usually under /etc/dnsmasq.conf) so that dnsmasq will reference the file:

addn-hosts=/var/lib/hostsblock/hosts.block
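
If you prefer to script this step, the following sketch appends the line only when it is not already present, so it is safe to run repeatedly. The helper name ensure_line is hypothetical, and the dnsmasq.conf location is the usual one mentioned above, which may differ on your system.

```sh
# Hypothetical helper: append LINE ($1) to FILE ($2) unless an identical
# line is already there.
ensure_line() {
    line="$1"
    file="$2"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -q -F -x -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Run as root, this could be called as: ensure_line 'addn-hosts=/var/lib/hostsblock/hosts.block' /etc/dnsmasq.conf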

Enable and start hostsblock-dnsmasq-restart.path:

$ sudo systemctl enable --now hostsblock-dnsmasq-restart.path

This has systemd watch the target file /var/lib/hostsblock/hosts.block for changes and then restart dnsmasq whenever they are found.

OPTION 2: Copy /var/lib/hostsblock/hosts.block to /etc/hosts

It is possible to have systemd overwrite /etc/hosts with the generated file.

Configure hostshead= in hostsblock.conf to make sure you don't remove the default system loopback address(es), e.g.:

hostshead="/var/lib/hostsblock/hosts.head"

Then put your necessary loopback entries in /var/lib/hostsblock/hosts.head. For example, you can copy over your existing /etc/hosts to this file:

$ sudo cp /etc/hosts /var/lib/hostsblock/hosts.head
$ sudo chown hostsblock:hostsblock /var/lib/hostsblock/hosts.head
$ sudo chmod 600 /var/lib/hostsblock/hosts.head

Enable and start hostsblock-hosts-clobber.path:

$ sudo systemctl enable --now hostsblock-hosts-clobber.path

This has systemd watch the target file /var/lib/hostsblock/hosts.block for changes and then copy /var/lib/hostsblock/hosts.block to /etc/hosts.

Usage

In its normal systemd-job configuration, hostsblock requires no interaction from the user aside from the steps above. If, however, you want to manually run the process, or to use the UrlCheck tool (hostsblock -c URL), you need to configure sudo:

Configuring sudo

Because hostsblock executes as a heavily sandboxed unprivileged user (instead of root), you must configure sudo to allow other users to manually execute it.

To do so, edit sudoers by typing sudo visudo and add the following line to the end:

%hostsblock	ALL	=	(hostsblock)	NOPASSWD:	/usr/lib/hostsblock.sh

Add any users you want to be able to manually execute or use the urlcheck mode to the hostsblock group:

$ sudo gpasswd -a [MY USER NAME] hostsblock

The wrapper script installed in your PATH will automatically use sudo to execute the main script as the user hostsblock.

hostsblock [OPTION...] - download and combine HOSTS files

Without the -c URL option, hostsblock will check to see if its monitored blocklists have changed. If it detects changes (or if forced by the -u flag), it will download the changed blocklist(s) and recompile the target HOSTS file.

Help Options:
  -h                            Show help options

Options:
  -f CONFIGFILE         Specify an alternative configuration file
  -q                    Show only fatal errors
  -v                    Be verbose
  -d                    Be very verbose/debug
  -u                    Force hostsblock to update its target file

hostsblock [OPTION...] -c URL [COMMANDS...] - Manage how URL is handled

With the -c URL flag option, hostsblock can check and manipulate how it handles specific domains.

Note: The hostsblock-urlcheck symlink is now officially deprecated. Use hostsblock -c instead.

In addition to the above options, the following commands and subcommands can be used with hostsblock -c URL:

hostsblock -c URL (urlCheck) Commands:
  -s [-r -k]            State how hostsblock modifies URL
  -b [-o -r]            Temporarily (un)block URL
  -e [-o -r -b]         Add/remove URL to/from denylist
  -a [-o -r -b]         Add/remove URL to/from allowlist
  -i [-o -r -k]         Interactively inspect URL

hostsblock -c URL Command Subcommands:
  -r                    COMMAND recurses to all domains on URL's page
  -k                    COMMAND recurses for all BLOCKED domains on page
  -o                    Perform opposite of COMMAND (e.g. UNblock)
  -b                    With "-e", immediately block URL
                        With "-a", immediately unblock URL

Note that the -o subcommand turns a command into its opposite, e.g.

  • hostsblock -c URL -b -o unblocks URL
  • hostsblock -c URL -e -o removes URL from the denylist
  • hostsblock -c URL -a -o removes URL from the allowlist

Examples:

Once you have configured sudo, you can execute the following as any user in the hostsblock group:

See if "http://github.com/gaenserich/hostsblock" is blocked, denylisted, allowlisted, or redirected by hostsblock:
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -s
Do the same thing for any of the sites referenced on this page:
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -s -r
Do the same thing for any of the sites referenced on this page that are presently blocked:
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -s -k
Block the domain containing "http://github.com/gaenserich/hostsblock" (that is, "github.com"):
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -b

Note that "blocking" (and "unblocking", i.e. -b -o) a domain only lasts until the next time hostsblock refreshes /var/lib/hostsblock/hosts.block, unless one of your blocklists also includes the domain. To permanently block this domain, use the denylist (-e) command.

Permanently block (denylist) the domain containing "http://github.com/gaenserich/hostsblock" (that is, "github.com"):
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -e

Note that "denylisting" on its own will not block the target domain until hostsblock refreshes. You can combine both "blocking" and "denylisting" in one command, however:

Permanently and immediately block the domain containing "http://github.com/gaenserich/hostsblock" (that is, "github.com"):
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -e -b
Temporarily unblock all blocked domains on "http://github.com/gaenserich/hostsblock" (helpful if the page isn't working quite right):
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -b -o -k
Interactively scan through "http://github.com/gaenserich/hostsblock", prompting you whether the domains referenced therein should be blocked, denylisted, or allowlisted:
$ hostsblock -c "http://github.com/gaenserich/hostsblock" -i -r
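
The examples above act on the domain contained in a URL (here "github.com"). How hostsblock derives it is not shown in this README; the sketch below is one plausible way to do it with sed, and the name url_domain is an assumption made for illustration.

```sh
# Hypothetical helper: print the lower-cased host part of URL $1.
url_domain() {
    printf '%s\n' "$1" |
    sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' \
        -e 's|[/?#].*$||' \
        -e 's|^[^@]*@||' \
        -e 's|:[0-9]*$||' |
    tr '[:upper:]' '[:lower:]'
}
```

The four sed expressions strip, in order, the scheme, everything from the first "/", "?" or "#", any user information, and a trailing port. With the URL used above, url_domain "http://github.com/gaenserich/hostsblock" prints github.com.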

FAQ

  • Why isn't it working with Chrome/Chromium?

    • Because they bypass the system's DNS settings and use their own.

    To force them to use the system's DNS settings, refer to this superuser.com question.

  • Hostsblock's systemd job fails with error "FAILED TO COMPILE BLOCK/REDIRECT ENTRIES FROM [...]" and leaves an empty hosts.block.new file.

    • You may have a blank line with a single space in your allowlist. Hostsblock matches that line with the space in between the IP address and the domain name that every single line has, i.e. it matches every single would-be entry in your target file. Remove the empty line, and hostsblock will function as expected.
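
The allowlist behaviour described in that answer is what you would get from a fixed-string filter. The sketch below assumes the allowlist is applied like grep -v -F -f (an assumption; the real script may differ) and guards against whitespace-only lines. filter_allowlist is a hypothetical name.

```sh
# Hypothetical filter: read hosts entries on stdin and drop every entry that
# contains one of the fixed strings listed in the allowlist file $1.
filter_allowlist() {
    allowlist="$1"
    patterns=$(mktemp) || return 1
    # Discard empty and whitespace-only lines; a lone " " would otherwise
    # match the separator in every "IP domain" entry.
    # (grep exits 1 when nothing is left, which is fine here.)
    grep -v '^[[:space:]]*$' "$allowlist" > "$patterns" || :
    # -F: fixed strings, -f: read patterns from a file, -v: keep non-matches
    grep -v -F -f "$patterns" || :
    rm -f "$patterns"
}
```

With an allowlist containing " www.example.com" and ".dropbox.com", the entry "0.0.0.0 www.example.com" and every dropbox.com subdomain are removed, while "0.0.0.0 subdomain.example.com" stays blocked.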

News & Bugs

Upgrading to 0.999.8

For existing hostsblock users, please note the following changes in version 0.999.8:

Changes in hostsblock.conf

Due to the shift to POSIX-shell compatibility, the list of blocklists to be downloaded cannot be held in hostsblock.conf via the blocklists= parameter. Instead, this parameter contains the path to a file that contains the list of URLs, e.g. /var/lib/hostsblock/block.urls.

The new block.urls file is simply a newline separated list of URLs without quotations. Whitespace and text after # are ignored. An example block.urls file could look like this:

http://hosts-file.net/download/hosts.zip # General blocking meta-list
http://winhelp2002.mvps.org/hosts.zip

http://hostsfile.mine.nu/Hosts.zip

See the example block.urls in the /var/lib/hostsblock/config.examples directory for details.
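
To make the format rules concrete, here is a minimal sketch of a reader for such a file. It only follows the rules stated above (one URL per line, whitespace and text after # ignored); the function name read_urls is hypothetical and this is not necessarily how hostsblock parses the file.

```sh
# Hypothetical reader: print the URLs listed in block.urls-style file $1,
# one per line, without comments, surrounding whitespace, or blank lines.
read_urls() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' "$1" |
    grep -v '^$'
}
```

Applied to the example file above, read_urls prints the three URLs and nothing else.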

No more postprocessing within script

Due to enhanced security and sandboxing, hostsblock no longer handles postprocessing on its own. Instead, users should use other systemd capabilities to replace the postprocess() {} functionality.

Hostsblock comes with systemd service files that replicate the most common scenarios. See the directions above for instructions on how to enable them.

Changes with sudo

sudo is no longer as widely used as before. The main systemd service no longer requires it. You only need it if you want to use the hostsblock -c URL (urlcheck) utility. See the above directions for details.

Other Caveats

  • The hostsblock-urlcheck symlink is deprecated. Please use hostsblock -c URL instead.
  • In UrlCheck mode, large hosts files will generate large temporary cache files that will eat up a lot of temporary storage. If you have a machine with little RAM (<6GB) and want to block a lot of domains, consider changing your $tmpdir to an HDD- or SSD-backed filesystem instead of the default tmpfs under /tmp.
  • UrlCheck mode will not be able to provide information on which blocklist blocked which domains anymore (annotation feature removed)
  • Hostsblock uses 0.0.0.0 as default redirection IP address instead of 127.0.0.1. 0.0.0.0 theoretically offers better performance without the need of a pseudo-server.

Other Changes from 0.999.7 to 0.999.8

Systemd Job Improvements
  • Systemd service now heavily hardened and sandboxed for enhanced security
  • Fixed simultaneous download feature so that it actually does what it is supposed to
  • Added processing support for source blocklists that just list domain names to be blocked, e.g. ads.google.com instead of 0.0.0.0 ads.google.com
  • Added support to read directly from zip and 7z files containing a single file without decompressing to a cache
  • Optimized filters used to process domains with improved throughput
  • If run with dash instead of bash, hostsblock has significant performance improvements
  • Removed annotation feature to reduce dependencies and overall processing demands
  • Vastly expanded list of potential blocklists (see block.urls)
POSIX-Compatibility Improvements
  • Supports POSIX shells (dash, ash, zsh) instead of just bash
  • Removed GNU-specific utilities, relies only on POSIX options
  • Should now run on *BSD and macOS (and perhaps even Android and iOS!) if proper POSIX environments are installed. UNTESTED
UrlCheck Mode Improvements

License

Hostsblock is licensed under GNU GPL

hostsblock's People

Contributors

gaenserich, jakevanderkolk, matkoniecz, pickfire, robs898, sadi58, salothin, tlvince, wabuo

hostsblock's Issues

Performance review

Review all files to make sure they comply with bash best practices, e.g. using "<" instead of cat, etc.

Sort, add compression to hostsblock.db

The hostsblock annotation file (by default /var/lib/hostsblock.db), which tracks which blocklists contain which entries (so that users can be informed which blocklist may contain overly-aggressive entries when using hostsblock-urlcheck) gets very large as just a plain text file (currently 157M on my box), and is also not in a sensible order (each pass of the target file compilation loop just appends to the same file). To make this file smaller and more human-readable, hostsblock should compress (with gzip/pigz, so as to just re-use the same mechanism used to compress previous target hosts files) and sort this file.
Ideas on implementation:
  • hostsblock.sh: if gzip/pigz is detected, just "sort -u hostsblock.db | gzip -9c > hostsblock.db.gz && rm hostsblock.db" (or pigz in place of gzip; note -9c rather than -9dc, since -d would decompress), etc.
  • hostsblock-urlcheck: add a conditional that selects hostsblock.db.gz if it exists and otherwise falls back to hostsblock.db, and append a gzip/pigz step to the write procedures for whichever file is detected.
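
A minimal sketch of both halves of this idea, assuming plain gzip is available. The function names compress_db and read_db are hypothetical, and the intermediate ".sorted" file is an addition so that the original is only removed after both steps succeed.

```sh
# Hypothetical helper: sort and de-duplicate the plain-text database $1,
# write it gzip-compressed to "$1.gz", then remove the uncompressed files.
compress_db() {
    db="$1"
    sort -u "$db" > "$db.sorted" &&
        gzip -9c "$db.sorted" > "$db.gz" &&
        rm -f "$db" "$db.sorted"
}

# Hypothetical reader: print the database, preferring the compressed copy
# and falling back to the plain-text file.
read_db() {
    db="$1"
    if [ -f "$db.gz" ]; then
        gzip -dc "$db.gz"
    else
        cat "$db"
    fi
}
```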

rlwpx.free.fr entries aren't extracted back to hosts.block.d/...hosts.hosts files

Tested with pre-alpha, it looks like the directories are created, the file is extracted from 7z, and later there's no information about redirections. Below are parts of log created with verbosity=4:

rlwpx.free.fr.WPFF.hrsk.7z is a 7z archive. Will use 7za to extract it...
Extracting entries from /var/cache/hostsblock/rlwpx.free.fr.WPFF.hrsk.7z...
Created directory /dev/shm/hostsblock/rlwpx.free.fr.WPFF.hrsk.7z.d.
Moved rlwpx.free.fr.WPFF.hrsk.7z to /dev/shm/hostsblock/rlwpx.free.fr.WPFF.hrsk.7z.d.
[...]
Un7zipped rlwpx.free.fr.WPFF.hrsk.7z.
Extracting obvious entries from rlwpx.free.fr.WPFF.hrsk.7z...
Extracted obvious entries from rlwpx.free.fr.WPFF.hrsk.7z.
Extracting less-obvious entries from rlwpx.free.fr.WPFF.hrsk.7z
Extracted less-obvious entries from rlwpx.free.fr.WPFF.hrsk.7z.
Deleting /dev/shm/hostsblock/rlwpx.free.fr.WPFF.hrsk.7z.d...

dnscacher handling

I believe that hostsblock.conf should mention the dnscacher variable and its recognized values. In my case dnsmasq auto-configuration fails (due to a non-standard dnsmasq config), so I set dnscacher=manual, which works well up to the point where hostsblock tries to restart it, resulting in:

[WARN] FAILED to restart manual.

That could be fixed by extending the test condition below:

# IF WE HAVE A DNS CACHER, LET'S RESTART IT
if [ "$dnscacher" != "none" ]; then
    _notify 3 "Restarting $dnscacher..."
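
One way the condition could be extended, sketched as a standalone function. The value "manual" comes from the report above; treating an empty value like "none" and the name should_restart_dnscacher are assumptions, and echo stands in for the script's own _notify.

```sh
# Hypothetical predicate: succeed only when $1 names a DNS cacher that
# hostsblock should try to restart itself.
should_restart_dnscacher() {
    case "$1" in
        none|manual|"") return 1 ;;   # nothing to restart, or the user handles it
        *)              return 0 ;;
    esac
}

dnscacher="manual"
if should_restart_dnscacher "$dnscacher"; then
    echo "Restarting $dnscacher..."
fi
```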

getting rid of some odd entries

In addition to "0.0.0.0" and "0.0.0.0 0.0.0.0" I've mentioned before, today I've noticed this line in my hosts.block file - from http://hostsfile.mine.nu: "0.0.0.0 PNAP-WDC002"

Although these are harmless, I wonder if we could get rid of such entries by adding this (hopefully harmless) command as well: sed "/^[0-9].*[0-9][$\r]/d"
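
As an alternative to the sed expression, a field-based filter may be easier to reason about. The sketch below is only a suggestion (drop_odd_entries is a hypothetical name): it keeps lines whose second field looks like a dotted hostname and discards the three kinds of entries mentioned above.

```sh
# Hypothetical filter: keep only "IP hostname" lines whose hostname contains
# a dot and is not itself a bare IPv4-style address.
drop_odd_entries() {
    tr -d '\r' |
    awk 'NF >= 2 && $2 ~ /\./ && $2 !~ /^[0-9.]+$/'
}
```

Note that this also drops single-label names such as "localhost", so loopback entries would have to come from the head file rather than from the filtered blocklists.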

Always update /etc/hosts.block when white/black.list changed

Right now, /etc/hosts.block will not always be updated when the white/black.list files are changed. This is because $changed is only set to true when the downloaded files change, but not when these configuration files change. The $changed check should therefore also monitor these files.
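
A possible shape for that check, comparing modification times with find -newer. The function name lists_changed is hypothetical, and wiring its result into $changed is left to the main script.

```sh
# Hypothetical check: succeed when the target file $1 is missing or when any
# of the remaining arguments (list files) is newer than it.
lists_changed() {
    target="$1"
    shift
    [ -e "$target" ] || return 0
    for list in "$@"; do
        [ -e "$list" ] || continue
        # find prints the path only if the list is newer than the target
        if [ -n "$(find "$list" -newer "$target")" ]; then
            return 0
        fi
    done
    return 1
}
```

For example: if lists_changed /etc/hosts.block white.list black.list; then changed=1; fi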

Reduce overall number of grep calls with grep -f lists.

Reduce the number of instances of grep piping to another grep instance by using the "-f" flag. Will probably have to create a whole "hostsblock" directory under "/usr/lib" to contain those "grep -f" files and hostsblock-common.sh. This should hopefully improve performance.

Default configuration: yandex.ru (site #1 in Russia) is blocked

After installing hostsblock on my system and running it with the default config, I found that yandex.ru is blocked.

For ex-USSR users (especially in Russia), this is like blocking google.com by default: it is the main search engine, among other things.

I don't know yet which blacklist contains that entry, but it definitely should be turned off by default.

Have main processes run as non-privileged user

Instead of having all of hostsblock run as root, have only the needed processes do so, e.g.

  • sending signals to any dnscachers to restart
  • write to files under /etc

Will sudo work for these other situations, or is there another method?

hostsblock-0.999.2-1: same version, 2 different scripts and results

hostsblock-0.999.2-1 is installed on two of my machines running Arch; both are up-to-date, AUR included (as of yesterday).
The hostsblock (and dnsmasq) conf files are identical on both machines. It works on box One but not on the other.

box One
With DNS caching on, hosts.block is updated and so on, single issue:

oct. 29 00:03:35 One hostsblock[3169]: gzip: invalid option -- 'z'
oct. 29 00:03:35 One hostsblock[3169]: Try `gzip --help' for more information.
oct. 29 00:03:38 One hostsblock[3169]: sort: write failed: standard output: Broken pipe
oct. 29 00:03:38 One hostsblock[3169]: sort: write error

from line 210 in /usr/bin/hostsblock --which is absent from the same version script on box Zero ~8-|

Box Zero
With DNS caching ON, hostsblock exits and the hosts.block file is not updated

oct. 29 00:00:01 Zero hostsblock[13141]: [FATAL] NO VIABLE DNSCACHER CONFIGURATION FOUND. EXITING...
oct. 29 00:00:01 Zero systemd[1]: hostsblock.service: Main process exited, code=exited, status=1/FAILURE
oct. 29 00:00:01 Zero systemd[1]: Failed to start Block bad domains system-wide.
oct. 29 00:00:01 Zero systemd[1]: hostsblock.service: Unit entered failed state.
oct. 29 00:00:01 Zero systemd[1]: hostsblock.service: Failed with result 'exit-code'.

I triple-checked that the dnsmasq.conf files are identical between the two boxes, and they are.

But I noticed that /usr/bin/hostsblock differs considerably. pacman -Qi says both are v0.999.2-1, with a different size ?#-|

For the moment I changed hostsblock on box One to run without DNS caching, and it runs. Though it still checks dnsmasq:
/usr/bin/hostsblock

[INFO] Auto-detecting dnscaching settings...
[INFO] DNSMASQ does NOT have the correct 'listen-address' entry.
[INFO] DNSMASQ incorrectly configured.
[INFO] Checking blocklists for updates...

I suspect a script change forgetting to update version. What's your suggestion?

[Conf file] question on default configuration

Hi, in hostsblock.conf all DEFAULT settings are set for a NO DNS CACHER setup. But that one:

hostsfile="/etc/hosts.block"  # For use with a dnscacher, e.g dnsmasq

Unless I'm wrong, of course. Please enlighten me.

Add option to disable configuration checks.

The configuration that the hostsblock script checks for is not the only possible one, e.g. I usually avoid modifying dnsmasq.conf directly and place config overrides in /etc/dnsmasq.d/hostsblock instead. Also, "prepend domain-name-servers 127.0.0.1" is not always required.

It would be helpful to have a way to disable hostsblock config checks and assume that everything is already configured and working properly.

database format and _check_url

I am just trying to understand the format for the file /var/lib/hostsblock.db.
It seems to me it is something like "%IP %address ! %source"
However

  • line 262 in common: _strip_entries " $@$" "$annotate" . The extra $ in the first argument makes it match the address and the end of line. I would expect "$@ !"
  • Is there a special format for local files? The "!" character is escaped in many calls to sed and grep, so the database ends up with "%IP %address \! %source"

missing rc.conf

I wanted to install this package, but I didn't find it on AUR. So I decided to adopt the "general installation" from this git: I downloaded the tarball and extracted it, but I found out that the rc.conf link pointed to nowhere... now I see that it has been deleted from the git 5 days ago.
What happened? :(

[SOLVED] How to temporarily disable the whole block list, for checking?

Hello,
How can I temporarily disable the whole block list?
I suspect my hostsblock config may be causing my inability to download any data from Google Webmaster Tools. On my main computer with hostsblock, it occurs with four different browsers (plus Firefox with no plugins). Clicking "Download this table" on GWT gives no result except the following error in the Firefox console:

AssertionError: Assertion failed: GA not loaded yet.

There's no visible URL on GWT's button, so I cannot make use of hostsblock-urlcheck in this case.
It works on my laptop, hence the need to check my main computer's network configuration.

Repeated entries in target hosts file

A small group of entries repeat themselves in the target hosts file despite filtering. Textfile encoding is suspected. The repeated entries:
ads.cnn.com
ads.doubleclick.com
ads.theweek.com
adsyndication.msn.com
advertising.demandmedia.com
beacon.scorecardresearch.com
cdn.betrad.com
click.linksynergy.com
content.dl-rms.com
csi.gstatic.com
dsa.csdata1.com
example.com
getglue.com
mediakit.theonion.com
nbcpeacock.rresults.com
ogp.me
pixel.mathtag.com
ru.youporn.com
s7.addthis.com
widgets.getglue.com
www.nbcudigitaladops.com
www.rooshv.com

Update hostsblock.conf

It's been some time since hostsblock scripts have been using hostsblock.conf and not rc.conf (which probably should have been deleted already).
So the recent adaway source list should be added to this file as well as moving someonewhocares.org to the highly recommended group together with it.
#54

what if a server FAILED?

Today I received "xxxxx FAILED" message for the first time, and I only saw it when I opened the log file after hostsblock ended quietly as normal.
Apparently no action was taken in such a case, although perhaps it would be better if hostsblock could (a) simply move on to the next server and complete the whole process using that server's hostsblock list in the cache, or (b) re-try connecting to such a server a (configurable?) specified number of times at (configurable?) specified intervals, and perhaps switch to mode (a) if it still FAILED.
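
Option (b) followed by a fallback to (a) could look roughly like the sketch below. The helper name retry is hypothetical (it is unrelated to the retry= setting in hostsblock.conf); it simply re-runs any command a limited number of times with a pause in between.

```sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as one attempt succeeds.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

A download step could then be written as retry 3 30 curl -s -o "$cachefile" "$url" || echo "$url FAILED, keeping cached copy", where $cachefile and $url are placeholders for whatever the script uses.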

Feature request: option to check updates only

I don't know if this is very easy (e.g. a little modification in the "DOWNLOAD BLOCKLISTS" section of hostsblock.sh) or difficult to implement without much change in the present code, but I thought it would be very handy if we could enter a command like hostsblock.sh -v 3 -checkonly &>/tmp/hostsblock-lists-status.log to see if there are any pending updates.

Note: What I have in mind actually is to try and create, in due course, an app indicator similar to https://github.com/Sadi58/grive-indicator and https://github.com/Sadi58/indicator-chars
And my "work-in-progress" to this end (adding several files under subfolder ./src/indicator/ only) is now here: https://github.com/Sadi58/hostsblock/tree/master/src/indicator

Using hostsblock for polipo

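I hope you can add a note to the README.md showing how to use hostsblock with polipo by converting /etc/hosts (the append goes through sudo tee because a plain >> redirection would be performed without root privileges):

sed 's/^127\.0\.0\.1 //' /etc/hosts | sudo tee -a /etc/polipo/forbidden > /dev/null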

detect_dnscacher function error?

While testing the new pre-alpha version I get the error "USING no DNSCACHER, BUT THE 'hostsfile' VARIABLE IS NOT SET TO /etc/hosts"
But I am using DNSCACHER.

False "CHANGES FOUND": someonewhocares.org

I've noticed that the blocklist at http://someonewhocares.org/hosts/hosts gets updated too frequently, and when I checked I didn't see such frequent updates there. For instance, hostsblock somehow has recently found changes every day although the file itself has this line: "Last updated: Nov 19th, 2015 at 21:40"
I wonder if this is caused by a server flaw or what.

hostsblock.log lines now have strange leading text

After the latest commits I've begun getting the text below followed by a space before each line in the hostsblock.log file:
[�[0;33mINFO�[0m]
The odd character above (white question mark against black background) is displayed in my text editor as a square including 00 in the first line and 18 or 1B in the second line (my eyes are a bit old ;-)
I use this launcher script:

#!/bin/bash
/etc/hostsblock/hostsblock -v 3 &>/var/log/hostsblock.log
cat <<EOF >> /var/log/hostsblock.log
Hostsblock completed at `date +'%d/%m/%Y %H:%M'`
EOF
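
The leading text looks like ANSI color escape sequences (ESC [ ... m) being written to the file. Until the script stops emitting them for non-terminal output, one possible workaround is to filter them out in the launcher. The function name strip_colors is hypothetical.

```sh
# Hypothetical filter: remove ANSI color sequences (ESC [ ... m) from stdin.
strip_colors() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}
```

In the launcher above, the first command could become /etc/hostsblock/hostsblock -v 3 2>&1 | strip_colors > /var/log/hostsblock.log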

notify when completed?

I don't know about other distros or macOS, but adding the following line at the very end displays a nice notification each time this cron job completes under Ubuntu:

DISPLAY=:0.0 XAUTHORITY=~/.Xauthority notify-send "Hostsblock updated just now!"

Underlying assumptions in function _check_dnsmasq_config appear to be wrong

In trying out the hostsblock check on a Slackware machine that uses NetworkManager combined with dnsmasq, the script fails for me due to a number of what appear to be wrong assumptions in the _check_dnsmasq_config function in /usr/lib/hostsblock-common.sh

The background is that when someone uses NetworkManager to manage the network, it becomes responsible for starting dnsmasq (by setting dns=dnsmasq). When started this way, dnsmasq no longer uses a regular dnsmasq config file; instead it reads files that are put in a user-defined conf-dir directory (the user can pick the name of that directory). Note that this can be the case without NetworkManager as well (see the config file for dnsmasq).

For reference I get the following output for the options dnsmasq runs with:

bash-4.3$ ps -eo comm,args| grep dnsm
dnsmasq /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/NetworkManager/dnsmasq.pid --listen-address=127.0.0.1 --conf-file=/var/run/NetworkManager/dnsmasq.conf --cache-size=400 --proxy-dnssec --conf-dir=/etc/NetworkManager/dnsmasq.d

  1. The above line has both a conf-file and a conf-dir; however, the settings added in the conf-dir DO NOT get added to the conf-file written by NetworkManager. So to check the configuration you need to read both the conf-file and the content of all the files in the conf-dir. If you don't read the content of the conf-dir, the check will always fail, as the user has no control over the content of the conf-file.

  2. Note that I explicitly write "conf-file" instead of the "config-file" that you grep for in the code of this function. Judging by the ps output above, "conf-file" is the correct spelling. It is also how the option is specified in the dnsmasq man page, see http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html (search for conf-file).

  3. Some of the dnsmasq options you check for are hardcoded in the NetworkManager code that starts dnsmasq and don't appear in either the config file or the config directory you are trying to read. See lines 273-290 in http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/dns-manager/nm-dns-dnsmasq.c for this. You would miss, for example, that the listen-address is in fact configured correctly.
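To illustrate point 1, here is a rough sketch of collecting every config source named on a dnsmasq command line. dnsmasq_conf_sources is a hypothetical helper, not existing hostsblock code. It assumes the argument string uses the --conf-file= and --conf-dir= forms shown in the ps output above, that paths contain no spaces or glob characters, and it does not reproduce every file-skipping rule dnsmasq applies inside a conf-dir.

```shell
# Hypothetical helper: list the config sources named in a dnsmasq
# argument string: the --conf-file path, then each regular file in --conf-dir.
dnsmasq_conf_sources() {
    for arg in $1; do               # intentional word splitting on the args
        case "$arg" in
            --conf-file=*)
                printf '%s\n' "${arg#--conf-file=}"
                ;;
            --conf-dir=*)
                dir=${arg#--conf-dir=}
                dir=${dir%%,*}      # drop optional ",<extension>" filters
                for f in "$dir"/*; do
                    if [ -f "$f" ]; then
                        printf '%s\n' "$f"
                    fi
                done
                ;;
        esac
    done
}

# Possible usage against the running process:
# dnsmasq_conf_sources "$(ps -eo args | grep '[d]nsmasq')"
```

A check would then have to grep all of the listed files, and even so it would not see the options that NetworkManager passes directly on the command line (point 3).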

For now I have given up on getting the script working. A quick glance shows that I would also have issues with the dhcpcd_config function, which likewise appears to look for files that don't exist. This is set in /etc/resolv.conf, as NetworkManager manages this.

Personally I would opt for giving clear instructions on which settings should be used instead of these checks, simply because getting the checks to work correctly is hard given distro- and setup-specific differences in network configuration.

Add systemd units

Provide hostsblock.service and hostsblock.timer files to allow hostsblock to be used with systemd.

configuration file name

While testing the new pre-alpha version, got this error: "Configuration file /etc/hostsblock/hostsblock.conf NOT FOUND, using defaults."
It seems the file "rc.conf" should be renamed as "hostsblock.conf"

Whitelist not being applied

Under 0.999.3-1 the whitelist is not applied to the newly processed hosts file.

To reproduce, use hostsblock-urlcheck to explicitly add a URL to the whitelist (e.g., http://www.lrb.co.uk/) and then run hostsblock. The newly generated /etc/hosts file will include the whitelisted URL.

To debug, I appended a message to the relevant line in /usr/bin/hostsblock, like so
sed "s/ \!.*$//g" | sort -u | grep -vf "$whitelist" >> "$hostsfile" && _notify 4 "Whitelist applied +++++"; then

This never gets printed.
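For reference, the filtering step in that pipeline (grep -vf "$whitelist") can be tried in isolation. The sketch below is a stand-alone illustration, not the hostsblock code: apply_whitelist is a hypothetical name, and it treats whitelist lines as fixed strings (-F), which is an assumption. It also drops blank lines from the whitelist first, because an empty pattern matches every line and would filter out all entries.

```shell
# Hypothetical helper: print the lines of $1 (hosts entries) that do not
# contain any of the fixed strings listed in $2 (whitelist).
apply_whitelist() {
    entries=$1
    whitelist=$2
    patterns=$(mktemp)
    # Remove blank lines: an empty pattern would match every entry.
    grep -v '^[[:space:]]*$' "$whitelist" > "$patterns" || true
    # grep exits 1 when nothing is printed; that is not an error here.
    grep -vFf "$patterns" "$entries" || true
    rm -f "$patterns"
}

# Example: apply_whitelist /tmp/hosts.compiled /etc/hostsblock/white.list
```

Running this against the compiled entries and the whitelist file shows whether the whitelist contents themselves would filter the URL, independently of whether hostsblock reaches that line.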

The output of hostsblock (with level 4 debugging) is:

[INFO] Old /etc/hosts will not be recycled into new version.
[DETAIL] Backing up old version of /etc/hosts...
/usr/lib/hostsblock-common.sh: line 55: 0: command not found
[WARN] FAILED to compress /etc/hosts with 0.
[DETAIL] Using a hostshead file, so overwriting /etc/hosts with /etc/hostsblock/hosts.head...
[INFO] Replaced existing /etc/hosts with .
[INFO] Compiling block entries into /etc/hosts...
[INFO] Compiled block entries into /etc/hosts.
[DETAIL] Skipping redirect entries...
[INFO] Appending blacklisted entries to /etc/hosts...
[INFO] Appended blacklisted entries to /etc/hosts.
[INFO] /etc/hosts: 354005 urls redirected to 127.0.0.1.
[INFO] Executing postprocessing...
[INFO] Postprocessing completed.
[DETAIL] Cleaning up temporary directory /dev/shm/hostsblock...
[NOTE] Cleaned up /dev/shm/hostsblock.
[INFO] DONE.

The error message about 0 not being found points to /etc/hostsblock/hostsblock.conf, where I have tried both 0 and no for this setting:

backup_old="0"

Sorry I haven't been able to debug more...
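The "0: command not found" line suggests the value of backup_old is being executed as a compression command. A guard along these lines would avoid that. is_enabled is a hypothetical helper, and the list of values treated as "disabled" is an assumption, not something taken from the hostsblock documentation.

```shell
# Hypothetical guard: return 1 when a config value means "disabled",
# 0 when it should be treated as a real setting (e.g. a command name).
is_enabled() {
    case "${1:-}" in
        ""|0|no|false) return 1 ;;
        *) return 0 ;;
    esac
}

backup_old="0"
if is_enabled "$backup_old"; then
    echo "would back up /etc/hosts using: $backup_old"
else
    echo "backup skipped"
fi
# prints: backup skipped
```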

invalid gzip option

I installed hostsblock from the AUR, but the /etc/hostsblock/rc.conf file mentioned in the documentation wasn't created. (It appears that hostsblock.conf is the file that holds the configuration now, so the documentation is probably out of date.)

$ ls /etc/hostsblock 
black.list  hosts.head  hostsblock.conf  white.list

so I just ran

$ sudo /usr/bin/hostsblock
gzip: invalid option -- 'z'
Try `gzip --help' for more information.

On a subsequent run there was no message, and I got my shell back after a few seconds.

Edit: when I clear the cache directory in /var/cache/hostsblock/ I get the message back.

FAILED to unzip ...

The first run works OK; subsequent runs return the following:

$ sudo /usr/bin/hostsblock
[/var/cache/hostsblock/hosts-file.net.download.hosts.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /var/cache/hostsblock/hosts-file.net.download.hosts.zip or
        /var/cache/hostsblock/hosts-file.net.download.hosts.zip.zip, and cannot find /var/cache/hostsblock/hosts-file.net.download.hosts.zip.ZIP, period.
[WARN] FAILED to unzip hosts-file.net.download.hosts.zip.
[/var/cache/hostsblock/hostsfile.org.Downloads.BadHosts.unx.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /var/cache/hostsblock/hostsfile.org.Downloads.BadHosts.unx.zip or
        /var/cache/hostsblock/hostsfile.org.Downloads.BadHosts.unx.zip.zip, and cannot find /var/cache/hostsblock/hostsfile.org.Downloads.BadHosts.unx.zip.ZIP, period.
[WARN] FAILED to unzip hostsfile.org.Downloads.BadHosts.unx.zip.
[/var/cache/hostsblock/winhelp2002.mvps.org.hosts.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /var/cache/hostsblock/winhelp2002.mvps.org.hosts.zip or
        /var/cache/hostsblock/winhelp2002.mvps.org.hosts.zip.zip, and cannot find /var/cache/hostsblock/winhelp2002.mvps.org.hosts.zip.ZIP, period.
[WARN] FAILED to unzip winhelp2002.mvps.org.hosts.zip.

PS: I've found this problem reported in Arch Forums hostsblock thread too but thought it's better to track it here.
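The "End-of-central-directory signature not found" message usually means the cached file is not a valid zip archive, for example an HTML error page or an incomplete download saved under a .zip name. A quick way to check the cached files is to look at the first two bytes, which are "PK" for zip archives. is_zip is a hypothetical helper; it does not detect truncated archives that still start with the right bytes.

```shell
# Hypothetical helper: return 0 if the file starts with the zip magic "PK".
is_zip() {
    magic=$(dd if="$1" bs=1 count=2 2>/dev/null)
    [ "$magic" = "PK" ]
}

# Example usage on the cache directory:
# for f in /var/cache/hostsblock/*.zip; do
#     is_zip "$f" || echo "not a zip archive: $f"
# done
```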

urlcheck can't always find the host argument when passed an explicit configfile

It can find a url including http(s):// just fine:

sudo hostsblock-urlcheck -f test/hostsblock.conf http://www.foo.com
[DETAIL] Using configuration file test/hostsblock.conf.
Checking to see if url is blocked or unblocked...

'www.foo.com' NOT BLOCKED/REDIRECTED
    1) Block www.foo.com
    2) Block www.foo.com and delete all whitelist url entries containing www.foo.com
    3) Keep unblocked (default)
1-3 (default: 3)

But with foo.com or www.foo.com:

$ sudo hostsblock-urlcheck -f test/hostsblock.conf foo.com
[DETAIL] Using configuration file test/hostsblock.conf.
Checking to see if url is blocked or unblocked...

'-f' NOT BLOCKED/REDIRECTED
    1) Block -f
    2) Block -f and delete all whitelist url entries containing -f
    3) Keep unblocked (default)
1-3 (default: 3):
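The output above suggests the script takes the first argument ("-f") as the host when the argument has no http(s):// prefix. A common way to avoid this is to parse options with getopts and shift them away before reading the positional argument. The sketch below is not the hostsblock-urlcheck code: parse_urlcheck_args is a hypothetical function, and the default config path is assumed from the paths mentioned in these reports.

```shell
# Hypothetical parser: handle "-f <configfile>", then treat the first
# remaining argument as the URL or host, with or without a scheme.
parse_urlcheck_args() {
    config=/etc/hostsblock/hostsblock.conf   # assumed default
    OPTIND=1
    while getopts "f:" opt; do
        case "$opt" in
            f) config=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))        # drop the options that were parsed
    url=${1:-}
    host=${url#*://}             # strip "http://" or "https://" if present
    host=${host%%/*}             # strip any path
    printf '%s %s\n' "$config" "$host"
}

parse_urlcheck_args -f test/hostsblock.conf foo.com
# prints: test/hostsblock.conf foo.com
```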

Leverage DNSMasq

Just FYI, dnsmasq will read hosts files and generate zones from them.

So you could use that for ALL of the heavy lifting. It even has a config option / flag (-b) to instantly report failed resolution for host-local addresses (127.anything, 0.0.0.0) and can cache somewhere around a thousand names in RAM (uses ~8MB RAM after running 5 days of heavy browsing). That would make it a hard dependency though, so ...

Source: http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html

Can even specify a directory containing multiple hosts-formatted files ;)

hosts.block file has now 2 erroneous entries

After the latest commits I can see better processing in producing the hosts.block file with only 2 exceptions.
I have the following two lines as the 1st and 3rd entries:

0.0.0.0 0.0.0.0
0.0.0.0 0.0.0.0 www.zergnet.com

I don't think this is caused by the fact that I use 0.0.0.0 (which works fine for me) instead of 127.0.0.1 (which creates a little larger file).
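Entries like these could come from source lines that list an address more than once or put several names on one line. A normalizing pass such as the following would emit one "address hostname" pair per hostname and skip any field that is itself an IPv4 address. normalize_hosts is a hypothetical filter, not the processing hostsblock actually does.

```shell
# Hypothetical filter: read hosts-format lines on stdin and print one
# "<ip> <hostname>" line per hostname, ignoring comments and any
# field after the first that looks like an IPv4 address.
normalize_hosts() {
    awk -v ip="${1:-0.0.0.0}" '
        { sub(/#.*/, "") }      # strip trailing comments
        NF < 2 { next }         # nothing to block on this line
        {
            for (i = 2; i <= NF; i++)
                if ($i !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                    print ip, $i
        }'
}

printf '0.0.0.0 0.0.0.0\n0.0.0.0 0.0.0.0 www.zergnet.com\n' | normalize_hosts
# prints: 0.0.0.0 www.zergnet.com
```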

Data loss in the process of merging and sorting blocklists?

I have HostsMan running on a Windows machine configured to use exactly the same sources as hostsblock on Linux, but it produces a considerably larger list.

The sources used are:
http://winhelp2002.mvps.org/hosts.zip
http://pgl.yoyo.org/as/serverlist.php?hostformat=hosts&mimetype=plaintext
http://www.malwaredomainlist.com/hostslist/hosts.txt
http://hosts-file.net/ad_servers.asp
http://hostsfile.mine.nu/Hosts.zip
http://someonewhocares.org/hosts/hosts
http://sysctl.org/cameleon/hosts

When I use Meld to compare two lists (after stripping everything other than domain names) I see that there are also a number of domain names missing in the Windows hosts list, but it has 147957 entries whereas hosts.block has 127335.

For instance, I saw large chunks of entries ending in hosts.block such as:

a.collective-media.net...302br.net
ad*.adk2.co
ad.amtk-media.com...302br.net
ad.doubleclick.net...302br.net
ad-emea.doubleclick.net...302br.net
adfarm.mediaplex.com...302br.net
admin*.testandtarget.omniture.com
ads.pointroll.com...302br.net
ams*.ib.adnxs.com

Should the code be organized into folders?

I think it is time to organize the code into src, man and perhaps other folders. The top-level ls listing is getting long, and this layout would make it easier to install everything in src into /usr/bin and everything in man into /usr/share/man.

gzip: invalid option -- 'z' when running command 'hostsblock' and 'hostsblock-urlchecker'

Hi.
It seems that everywhere the gzip command is used, there is a -z parameter passed along, which makes it fail.
The consequences are that the hostsblock.db.gz isn't created properly and then, other commands fail too.

The fix is pretty straightforward: remove the -z parameter.
BTW, the -z parameter is also used with the command pigz (parallel gzip), but as I don't use that program, I'm not sure if this issue affects it too. I'd guess it does.
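A small wrapper that calls either tool with only the -c flag would sidestep the problem. As far as I know, pigz does accept -z, but there it selects zlib (.zz) output rather than gzip format, so dropping it seems right in both cases. compress_to is a hypothetical helper, not a function from hostsblock.

```shell
# Hypothetical helper: gzip-compress file $1 into $2, using pigz when it
# is installed and plain gzip otherwise. No -z flag is passed to either.
compress_to() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -c "$1" > "$2"
    else
        gzip -c "$1" > "$2"
    fi
}

# Example: compress_to /etc/hosts /tmp/hosts.backup.gz
```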

Add license info [was: Add alternatives]

There is an alternative hosts-blocking program written in Python, with a well-documented README.md, that only changes /etc/hosts. It follows the KISS principle: you can just run python updateHostsFile.py to use it. I hope you can add it to the README.md. I also hope that you can add a license to hostsblock (I recommend WTFPL).

302 Redirected URLs Don't Work

Sites that use a 302 redirect on a URL cause cURL to break; it's fixed by adding "-L" or "--location" to the cURL arguments. However, hostsblock will still fail to parse the resulting file if it's compressed, because it saves the file under the initial URL rather than the redirected URL. The initial URL doesn't end with a file extension, so the file is treated as plaintext. I'd submit a pull request to fix it, but I'm not skilled enough in Bash to do so.
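As a starting point for a fix, curl can report the final URL after redirects through its --write-out variable url_effective, and the file type can then be derived from that URL instead of the initial one. Both functions below are hypothetical sketches, not hostsblock code, and the extension logic is deliberately simple.

```shell
# Hypothetical helper: download $1 to $2, following redirects, and print
# the final (effective) URL on stdout.
fetch_following_redirects() {
    curl -sL -o "$2" -w '%{url_effective}' "$1"
}

# Hypothetical helper: print the file extension of a URL's last path
# component, or an empty line if there is none.
url_extension() {
    path=${1%%[?#]*}             # drop query string and fragment
    rest=${path#*://}            # drop the scheme
    case "$rest" in
        */*) file=${rest##*/} ;;
        *)   file= ;;            # host only, no path
    esac
    case "$file" in
        *.*) printf '%s\n' "${file##*.}" ;;
        *)   printf '\n' ;;
    esac
}

# Example usage:
# final=$(fetch_following_redirects "$url" "$tmpfile")
# ext=$(url_extension "$final")   # e.g. "zip" selects the unzip path
```

A server can still redirect to a URL without a useful extension, so checking the file contents would be more robust than relying on the name alone.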
