The Squid log analyzer

SquidAnalyzer

Squid proxy native log analyser and reports generator with full statistics about times, hits, bytes, users, networks, top urls and top domains. Statistic reports are oriented toward user and bandwidth control; this is not a pure cache statistics generator.

SquidAnalyzer uses flat files to store data and doesn't need any SQL, SQLite or Berkeley databases.

This log analyzer is incremental and should be run from a daily cron job, or more often on networks with heavy traffic.

News

CentOS 8 RPM available - Sun Mar 15 2020

Thanks to Klaus Tachtler, a SquidAnalyzer RPM package for CentOS 8 is available at https://copr.fedorainfracloud.org/coprs/tachtler/squidanalyzer/

6.6 - Sun May 7 16:38:14 CEST 2017

This is a maintenance release that fixes one year of issues reported by users. There are also some additional features and configuration directives, all listed here:

    * Add TopStorage configuration directive to limit the number of URLs
      stored in the data file, sorted by OrderUrl. On huge access logs it
      greatly improves performance, but the top URL reports will be less
      precise. Defaults to 0: all URLs are stored.
    
      Here is the performance of SquidAnalyzer on a 1.4 GB access log
      file to parse and compute full reports over one week:
    
      UrlReport | UserReport | Duration
      ----------+------------+---------
          0     |     0      |  2m30s
          0     |     1      |  3m00s
          1     |     1      | 18m15s
          1     |     1      |  9m55s when TopStorage is set to 100

    * Add a cache to network and user aliases for speed improvement. Thanks to
      Louis-Berthier Soulliere for the report.
    * Add TimeStart and TimeStop configuration directives to specify a
      start and stop time. Log lines outside this time range will not be
      parsed. The format of the value is HH:MM. These directives can be
      overridden with the -s | --start and -S | --stop command line
      options. Thanks to Louis-Berthier Soulliere for the feature request.
    * Add UpdateAlias configuration directive to immediately apply the changes
      made in aliases files to avoid duplicates. You still have to use
      --rebuild to recreate previous reports with new aliases. Enabling
      this implies a loss of performance with huge log files.
    * Add UseUrlPort configuration directive to be able to include port number
      into Url statistics. Default is to remove the port information from the
      Url. Thanks to Tobias Wigand for the feature request.
    * Add report of top denied url on user statistic page. Thanks to delumerlino
      and Pavel Podkorytov for the feature request.
    * Add last visited timestamp on URL reports and show the last ten visits
      on the user URL report. The last visits are counted after 5 minutes in
      hour view, after 30 minutes in day views and per day in month view.
      Thanks to Ringa Mari Sundberg for the feature request.
    * Add support for IPv6 address DNS resolution; you need perl > 5.014.
      Thanks to Brian J. Murrell for the report.
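
The effect of TopStorage described above can be pictured with a short sketch. This is not SquidAnalyzer's Perl code. It is a minimal Python illustration that assumes URL statistics are held in a dictionary and that OrderUrl names one of the hits, bytes or duration counters. The function name `limit_url_storage` and the data layout are hypothetical.

```python
import heapq


def limit_url_storage(url_stats, top_storage=0, order_url="bytes"):
    """Keep only the top `top_storage` URLs ranked by the `order_url` counter.

    url_stats maps a URL to a dict with "hits", "bytes" and "duration" keys
    (a hypothetical layout). A top_storage of 0 keeps every URL, mirroring
    the documented default.
    """
    if top_storage <= 0:
        return dict(url_stats)
    kept = heapq.nlargest(
        top_storage, url_stats.items(), key=lambda item: item[1][order_url]
    )
    return dict(kept)


# Made-up sample data, for illustration only.
stats = {
    "a.example": {"hits": 10, "bytes": 500, "duration": 40},
    "b.example": {"hits": 3, "bytes": 9000, "duration": 15},
    "c.example": {"hits": 7, "bytes": 100, "duration": 90},
}
top_two = limit_url_storage(stats, top_storage=2, order_url="bytes")
```

In this sketch the URLs that fall outside the limit are discarded before storage, which is one way to see why a non-zero TopStorage trades report precision for speed.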

Full list of other bug fixes:

    - Change user top url title from "Top n/N Url" into "Top n/N sites". Thanks
      to Daniel Bareiro for the report.
    - Update documentation to clarify the use of space character in aliases
      files. Thanks to Darren Spruell for the report.
    - Fix explanation of UserAlias file format about ip address vs DNS name.
      Thanks to Darren Spruell for the report.
    - Fix missing report of TCP_DENIED_REPLY messages. Thanks to Jeff Gebhardt
      for the report.
    - Add license file about resources file and a script to retrieve original
      javascript libraries.
    - Fix html report building that was limited to the last day.
    - Fix missing network alias replacement.
    - Update year in copyrights.
    - Disabled bandwidth cost report by default.
    - Fix removing of obsolete year directory.
    - Fix obsolete statistics no longer being deleted. Thanks to andreybrasil
      for the report.
    - Allow parsing of access.log generated through syslog. Thanks to Celine
      Labrude for the report.
    - Add Url_Hit label in translation files.
    - Fix remaining _SPC_ in username. Thanks to roshanroche for the report.
    - Fix remaining SA_CALENDAR_SA in html output. Thanks to roshanroche for
      the report.
    - Add more fixes for denied stat datafile corruption. Thanks to PiK2K for
      the report.
    - Fix denied stat datafile corruption. Thanks to PiK2K for the report.
    - Use CORE::localtime to format denied first and last hit.
    - Fix potential unparsed log case when log files are set in the
      configuration file and not on the command line.
    - Change the in-line popup (on top domain and top URL) to show hits on hits
      tables, bytes on the bytes tables and duration on the duration tables,
      instead of count. Thanks to Wesley Bresson for the feature request.
    - Only apply OrderUrl to user url list, other reports in Top domain and Top
      Url are now always ordered following the first column, which is the sorted
      column of the report (hits, bytes and duration).
    - Fix missing limit on the total number of URLs shown for a user to
      TopNumber. Thanks to Graham Wing for the report.
    - Update statistic on users with DENIED code to have the full list of
      user/ip even if they never hit an url.
    - Change Perl install directory from vendor to site to avoid a well-known
      issue on BSD. Thanks to dspruell for the report.
    - Add initial Debian package build files
    - Update squidanalyzer.css to change the width of the single menu tabs.
      In German the "TOP DENIED" tab is labelled "TOP VERBOTEN", which now
      displays without word wrapping. Thanks to Klaus Tachtler for the
      patch.
    - Fix Throughput label for unit/s that was not dynamically changed during
      value formatting and always labelled as B/s. Thanks to aabaker for the
      report.
    - Fix typo in graph titles. Thanks to aabaker for the patch.
    - Update missing fields to German language file. Thanks to Klaus Tachtler
      for the patch.
    - Fix top URL report that was no longer cumulating statistics. Thanks to
      Wesley Bresson for the report.
    - Fix typo about Network exclusion. Thanks to Mathieu Parent for the patch.
    - Manpages fixes. Thanks to Mathieu Parent for the patch.
    - Use FHS for manpages path. Thanks to Mathieu Parent for the patch.
    - Update russian language file. Thanks to Yuri Voinov for the patch.
    - Fix typo in mime type redefinition.
    - Mark mime-types with invalid characters as "invalid/type". Thanks to
      gitdevmod for the report.
    - Add missing throughput translation entries in lang files. Thanks to Yuri
      Voinov for the report.
    - Fix major issue in squidGuard and ufdbGuard history file management.
      Thanks to Guttilla Elmi for the report and the help.
    - Fix path to xzcat program during install. Thanks to Johan Glenac for
      the report.
    - Fix auto detection of SquidGuard log file when there is no denied entry
      in the first lines.
    - Fix typo in debug messages
    - Add warning when DNSLookupTimeout is reached. Thanks to gitdevmod for
      the report.

6.5 - Sun Jan 3 16:12:12 CET 2016

This is a maintenance release to fix an overlapping bug on bytes charts with recent versions of browsers like firefox, iceweasel and chrome.

  - Fix height of bytes graphs that was overlapping the third graph. Thanks
    to Daniel Bareiro for the report.
  - Update russian translation. Thanks to Yuri Voinov for the patch.
  - Update copyright year.

6.4 - Wed Dec 16 22:12:45 CET 2015

This release adds throughput statistics to all reports. It also allows adding a ufdbGuard log to the list of log files and reporting blocked URLs in the Denied reports. It also adds support for xz compressed files.

There's also a new configuration directive and command line option:

  * Add -t | --timezone and TimeZone directive to change the timezone. When set,
    SquidAnalyzer will read time from log file as UTC time and will add the
    hours specified in the timezone option. This is useful if the log file is
    not parsed on a computer with the same timezone as the squid server.
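
The timezone adjustment described above can be sketched as follows. This is an illustration in Python, not the Perl implementation. It assumes the log timestamp is a Unix epoch value, as in Squid's native access.log format, and the function name `shift_log_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def shift_log_time(epoch_seconds, tz_hours=0):
    """Read a log timestamp as UTC, then add the configured hour offset."""
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Drop the tzinfo so the result is a plain wall-clock time after shifting.
    return utc_time.replace(tzinfo=None) + timedelta(hours=tz_hours)


# A log entry stamped at the epoch, reported for a server two hours ahead.
shifted = shift_log_time(0, tz_hours=2)
```

With an offset of 0 the timestamp is simply reported as UTC.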

It also includes several bug fixes since last release.

  - Fix a graph overlapping one of the other graphs. Thanks to Daniel Bareiro
    for the report.
  - Add throughput calculation (ratio between bytes and duration) to all reports.
  - Fix missing largest URL in networks detailed report. Thanks to Juan Martin
    for the report.
  - Fix use of network-aliases together with a network include entry that made
    networks disappear from the report. Thanks to Juan Martin for the report.
  - Add -t | --timezone and TimeZone directive to change the timezone. When set,
    SquidAnalyzer will read time from log file as UTC time and will add the
    hours specified in the timezone option. Thanks to Anderson - BR Suporte for
    the feature request.
  - Add support for ufdbGuard log files. squidGuard and ufdbGuard files can be
    given together with the squid log file as a list in the LogFile
    configuration directive or as command line arguments. Thanks to Martin
    Hoffmann for the feature request.
  - Fix some division by zero. Thanks to cueda for the report.
  - Fix some potential illegal division by zero.
  - Fix negative duration with http like log file when duration is not set (-).
    Thanks to cedua for the report.
  - Add new throughput (Bytes/sec) column in all reports and a throughput graph.
    Thanks to Mike Lerley for the feature request.
  - Allow parsing of xz compressed files. Thanks to Markus Maikis for the patch.
  - Fix bug with include/exclude networks or clients preventing user reports
    from being built. Thanks to Juan Martin for the report.
  - Fix SquidAnalyzer failing to update statistics after cleanup of access.log.
    Thanks to mkhallaf for the report.
  - Limit parsing of ufdbGuard logs to BLOCK lines.
  - Replace SquidGuard label by Blocklist as we use more blocklist tools.
  - Update Italian translation file. Thanks to Stefano Cailotto for the update.
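
The throughput figure mentioned in the list above is the ratio between bytes and duration. A minimal Python sketch of that ratio follows. It assumes durations are in milliseconds, as in the elapsed field of Squid's native access.log, and it assumes a fallback of 0 when no duration is recorded; how SquidAnalyzer itself handles that case is not stated here. The function name is hypothetical.

```python
def throughput_bytes_per_sec(total_bytes, duration_ms):
    """Return bytes per second, or 0.0 when no duration was recorded."""
    if duration_ms <= 0:
        # Guard against division by zero (assumed fallback value).
        return 0.0
    return total_bytes / (duration_ms / 1000.0)


# 1500 bytes transferred in half a second.
rate = throughput_bytes_per_sec(1500, 500)
```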

6.3 - Mon Oct 12 07:56:29 CEST 2015

This release adds a new report to show statistics about Denied URLs. It also allows adding a SquidGuard log to the list of log files and reporting blocked URLs in the Denied reports. It also adds a pie chart on SquidGuard ACLs use.

There are also four new configuration directives:

  - UserReport to be able to remove any user related reports but statistics
    about URL and domains will remain.
  - ExcludedCodes to be able to exclude some log entries following the TCP
    code returned.
  - UrlHitsOnly to be able to enable the generation of additional HTML tables
    with top Url per byte and per duration in Top Urls and Domains report.
  - MaxFormatError to not exit immediately when a bad format error is
    encountered. SquidAnalyzer will tolerate up to MaxFormatError such errors
    before exiting.

Note that UrlHitsOnly is disabled by default, so if you still want the three tables in the reports, you must set UrlHitsOnly to 1.

A Catalan translation file has been added to the lang directory.

It also includes several bug fixes since last release.

  - Skip immediately lines that squid is not able to tag: TAG_NONE. Thanks to
    David Touzeau for the report.
  - Fix display order when OrderUrl was set in Top Url and Top Domain views.
    Thanks to Wesley Bresson for the report.
  - Convert fr_FR.txt translation file from ISO_8859-1 to UTF8 and change
    charset value. Thanks to zezinho42 for the report.
  - Change order in de_DE.txt of WeekDay to So Mo Di Mi Do Fr Sa, the week
    days in translation file must start with Sunday unlike in calendar.
  - Fix case sensitivity in command line options. Thanks to Pavel Podkorytov
    for the report.
  - Add SquidGuard.current state file to be able to do incremental parsing of
    both squid and squidguard log files without issues.
  - Try to fix bad characters in mime_type field and add MaxFormatError to not
    exit immediately when a bad format error is encountered. SquidAnalyzer will
    tolerate up to MaxFormatError such errors before exiting.
  - Add information about how to parse SquidGuard log together with Squid Cache
    access log file.
  - Add pie chart on SquidGuard ACLs use.
  - Remove redundant regular expressions.
  - Try to fix case when method or code in log file are corrupted with non
    printable characters. This should never happen but some injections have
    been reported.
  - Add support for SquidGuard log parsing to report denied ACLs. Thanks to
    Pavel Podkorytov for the feature request.
  - Fix detection of new log file from history when log file was in common
    http format.
  - Fix possible POSIX::strftime error with debug mode activated.
  - Add / at end of WebUrl when it is set but does not terminate with a slash.
  - Remove extra slash in week link, update russian translation file and fix
    some misprints. Thanks to badfiles for the patches.
  - Add Catalan translation file. Thanks to atorrillasmat for the file.
  - Fix two misprints. Thanks to badfiles for the patch.
  - Add TCP_REDIRECT to be counted as a DENIED tag from log file for users of
    squidGuard/ufdbGuard-style URL rewriters. Thanks to slashdoom for the patch.
  - Force SquidAnalyzer to use locale C internally.
  - Limit the exclusion/inclusion check when reading data files to rebuild,
    otherwise too much performance is lost.
  - Apply exclusion/inclusion on cumulative reports even if rebuild is not
    used.
  - Fix some issues with rebuild and exclusion.
  - Show more information when a log is skipped because its size is detected
    as lower than expected.
  - Print SquidAnalyzer version when debug mode is used.
  - Add TCP_TUNNEL used by Squid 3.5 for streaming to cache miss statistics.
    Thanks to MangOuste for the report.
  - Apply exclusion/inclusion definitions on old data when rebuild is used.
    Thanks to niccarp for the feature request.
  - Fix unwanted message when QuietMode is enabled.
  - Fix typo that was crashing squid-analyzer. Thanks to Juan Jose Pablos for
    the report.
  - Fix output of benchmark info when debug is not enabled. Thanks to Juan Jose
    Pablos for the report.
  - Fix issue when rebuilding previous data without denied url stat. Thanks to
    Stepan Andreev for the report.
  - Add top denied label to translation file.
  - Add UrlHitsOnly configuration directive to be able to disable the generation
    of tables ordered per byte and duration in Top Urls and Domains report.
    Thanks to Cesar Vazquez for the feature request.
  - Add top denied url statistics. Thanks to tierpod for the feature request.
  - Replace calls to localtime() with CORE::localtime() to prevent
    Time::localtime from overriding the default behaviour. Thanks to
    oldnrustyreaper for the report.

6.2 - Sat Feb 21 16:50:25 CET 2015

This release adds support for the common and combined squid log formats and a new Italian translation file. There's also a new configuration directive, UserReport, to remove all user related reports; statistics about URL and domains will remain. The second new directive is ExcludedCodes, to exclude some log entries based on the TCP code returned.

It also includes several bug fixes since last release.

    - Update year in copyright
    - Add documentation about log format.
    - Allow mime type report for common or combined log format. This requires
      the use of %mt at end of the log format.
    - Add support for common and combined (http like) log format.
    - Fix hidden control character in configuration file.
    - Add UserReport configuration directive to not produce any report about
      users. Thanks to Razerlikes for the feature request.
    - Fix several issues with non default installation path. Thanks to Yuri
      Voinov for the report.
    - Force squid-analyzer to find perl executable from env. Thanks to Yuri
      Voinov for the report.
    - Fix russian translation. Thanks to Yuri Voinov for the patch.
    - Fix install on Solaris. Thanks to Yuri Voinov for the report.
    - Fix error message when --rebuild is used and configuration directive
      LogFile is empty. Thanks to Michael Gauthier for the report.
    - Remove any access.log file from the parser list when --rebuild is used
      to avoid double entry. Thanks to Michael Gauthier for the report.
    - Fix user anonymization with --rebuild option.
    - Fix issues in week reports when the week overlaps two years. Thanks to
      Michael Gauthier for the report.
    - SquidAnalyzer will look for include/exclude users using format: user,
      user@domain.tld and domain\user. Thanks to Jacques Serfontein for the
      feature request.
    - Fix case where file was not parsed in incremental mode when log file
      size was lower than history offset. Thanks to Amir Mottaghian for the
      report.
    - Add Italian translation file. Thanks to Stefano Cailotto for the patch
    - Add ExcludedCodes configuration directive to be able to remove some
      log entries from statistics based on the TCP code. Thanks to Peter C.
      Ndikuwera for the feature request. For example: TCP_DENIED/403, which
      are generated when a user accesses a page the first time without
      authentication.
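
The ExcludedCodes filtering described above can be illustrated with a small Python sketch. It is not the Perl implementation. It assumes Squid's native log format, where the fourth whitespace-separated field holds the result code and HTTP status, and it compares the whole `code/status` token; whether the real directive matches the full token or only the code part is an assumption. The sample log lines and the function name `is_excluded` are made up for illustration.

```python
def is_excluded(log_line, excluded_codes):
    """Return True when the line's result code is in excluded_codes.

    Assumes the native Squid log layout: timestamp, elapsed, client,
    code/status, bytes, method, URL, user, hierarchy, mime type.
    """
    fields = log_line.split()
    if len(fields) < 4:
        return False  # malformed line: left for the parser to handle
    return fields[3] in excluded_codes


excluded = {"TCP_DENIED/403"}
lines = [
    "1424533825.123     12 192.168.1.10 TCP_DENIED/403 3745 GET "
    "http://example.com/ - HIER_NONE/- text/html",
    "1424533826.456    230 192.168.1.10 TCP_MISS/200 10240 GET "
    "http://example.com/ alice HIER_DIRECT/192.0.2.1 text/html",
]
kept = [line for line in lines if not is_excluded(line, excluded)]
```

Only the second line survives the filter, so the unauthenticated first attempt no longer counts in the statistics.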

6.1 - Mon Oct 13 11:36:52 CEST 2014

This release fixes several major issues and adds a new feature to disable weekly reports with a new command line option --no-week-stat.

        - Fix top domains report where url with port was reported in unknown
          domains. Thanks to Michael Gauthier for the report.
        - Add --no-week-stat to disable weekly reports generation. Thanks to
          Mang0uste for the feature request.
        - Fix and update Ukrainian translation file. Thanks to Oleg A. Deordiev
          for the patch.
        - Save the last parsed line when squid-analyzer is interrupted to
          avoid loading the same data twice after restarting. Thanks to Michael
          Gauthier for the report.
        - Fix missing calendar menu on daily report. Thanks to Cesar Vazquez for
          the report.
        - Fix problem with links to weekly summaries in SquidAnalyzer.pm. Thanks
          to David Murrel for the patch.
        - Add IO::Handle and FileHandle in Perl modules that should be loaded.
          Thanks to Jeetendra Poojari for the report.

Multiprocess benchmark

A benchmark is available showing the speed improvement obtained by using multiprocess mode.

6.0 - Sat Aug 30 21:48:14 CEST 2014

This major release adds several new features, many speed improvements and some major bug fixes.

  • Multiple access.log files can be processed at the same time.
  • Multiprocess mode can be activated using the -j N command line option.
  • New ExcludedMimes configuration directive to exclude from statistics a comma separated list of mime-type or using regex like text/.*.
  • New ExcludedMethods configuration directive to exclude from statistics a comma separated list of HTTP methods (GET,POST,CONNECT,...).
  • New translation available: pl_PL

Using 4 CPU cores (-j 4) to run SquidAnalyzer can divide the time used in single process mode by up to 4. In single process mode, processing a 1.4GB access.log file takes 50 minutes on my computer; using 4 CPUs takes around 15 minutes.

	squid-analyzer --no-year-stat -j 4 /var/log/squid3/access.log*

Here is the full list of changes:

	- Freshmeat/Freecode site is down, release announcement will be done on
	  twitter now, see https://twitter.com/SquidAnalyzer
	- Allow multiple log files to be given as command line arguments.
	- Add support to ETCDIR instead of CONFDIR during installation process,
	  where real config files are installed on some distributions (BSD).
	- Add support to parse multiple access log files at a time in
	  multiprocess mode.
	- Add documentation about multiprocess usage.
	- Add multiprocess support to SquidAnalyzer, see -j option. This can
	  greatly improve speed. See notes at bottom of issue #18
	  for more details. Great thanks to Francisco Rodriguez for his help.
	- Add more timing information during SquidAnalyzer execution.
	- Add some other minor speed improvements.
	- Remove call to tell, we were spending too much time in this method
	  unnecessarily.
	- Fix reports with --no-year-stat. It now reports cache stat only in year
	  and month view instead of empty page.
	- Remove intermediate build of week reports.
	- Fix Mime-Type transfer's chart title to reflect the unit used: MBytes.
	  Thanks to IMiGS for the report.
	- Little fix in a translation. Thanks to atlhon for the patch.
	- Fix case where days in calendar do not appear when DateFormat was
	  changed. Thanks to joseh-henrique for the report.
	- Update Makefile with META_MERGE and MAN3PODS information.
	- Fix missing cleaning of pid file when early error occurs.
	- Automatically remove \r when reading configuration file.
	- Improve incremental mode by seeking directly to last position in
	  logfile and automatic detection of already parsed log.
	- Fix issue on calendar when days of a month spread over 6 weeks. Thanks
	  to Michael Gauthier for the report.
	- Update cs_CZ language file. Thanks to Martin Kylian for the patch.
	- Fix weeks graph when a week overlaps over 2 months.
	- Add missing install of included file. Thanks to Klaus Tachtler for
	  the patch.
	- Force removing of pid file after the process dies. Thanks to Klaus
	  Tachtler for the report.
	- Fix german language de_DE.txt. Thanks to Klaus Tachtler for the patch.
	- Add ExcludedMimes configuration directive to allow exclusion from
	  statistics of a comma separated list of mime-type full name or using
	  regex like text/.*. Thanks to Ajayaks for the feature request.
	- Add ExcludedMethods configuration directive to allow exclusion from
	  statistics of a comma separated list of HTTP methods. Thanks to Ajayaks
	  for the feature request.
	- Fix error when rebuilding with old data repository that does not have
	  week view repository. Thanks to Adam Ciarcinski for the report.
	- Add pl_PL translation. Thanks to Adam Ciarcinski for the patch.
	- Fix en_US translation. Thanks to Adam Ciarcinski for the patch.
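
The incremental mode improvement listed above, seeking directly to the last position in the log file, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl implementation: the function name `read_new_lines` is hypothetical, the stored offset is assumed to be a byte position, and a file smaller than the stored offset is assumed to have been rotated or truncated. The automatic detection of already parsed logs is not covered.

```python
import os
import tempfile


def read_new_lines(log_path, last_offset):
    """Return (new_lines, new_offset) for data appended after last_offset.

    If the file is now smaller than the stored offset, it is assumed to have
    been rotated or truncated and is read again from the start.
    """
    size = os.path.getsize(log_path)
    if size < last_offset:
        last_offset = 0
    with open(log_path, "rb") as handle:
        handle.seek(last_offset)
        data = handle.read()
    # Only consume complete lines so a partially written line is retried later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, last_offset + end


# Demonstration on a temporary file standing in for access.log.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "access.log")
    with open(demo_path, "wb") as demo:
        demo.write(b"line 1\nline 2\n")
    first_batch, offset = read_new_lines(demo_path, 0)
    with open(demo_path, "ab") as demo:
        demo.write(b"line 3\n")
    second_batch, offset = read_new_lines(demo_path, offset)
```

The second call reads only the appended line, since the first two were consumed on the previous run.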

UPDATE: you must overwrite your whole installation: Perl scripts, configuration file, CSS and Javascript files. Backward compatibility with 5.x data files is preserved.

Release announcement on twitter

Release announcements used to be done on freshmeat.net or freecode.net, but unfortunately the site is now down. Please follow us on twitter now to receive release announcements and latest news.

Copyright (c) 2001-2017 Gilles Darold - All rights reserved. (GPL v3).