Tracking user behaviour on your website

Why would you track user behaviour?

Before I start explaining ‘the how’, let me first go into ‘the why’.
There are plenty of reasons really, but here are my top 3 reasons, which motivated me into implementing user behaviour tracking:

  1. Search Engine Optimization. Knowing which pages get hit via which search engine and crawled by which bots first hand is a big plus.
  2. Tools like ‘Google Analytics’ are very useful and powerful, but I wanted to make sure that there is no BIAS or error on their side involved in the results.
  3. User Experience Optimization. Knowing which pages your users frequent the most and how they got there can help you to enhance the experience for your users and even lead to a major navigational overhaul.

Deciding on what to store

For this tutorial I have focussed mainly on storing the more useful information that can be found in the $_SERVER variable.

In the end, it is of course up to you to decide which data you want to keep for later analysis. Other than the suggested fields in my tutorial, you could track the time your server takes from request to response, a user identification ID (if your website has a login feature), … Or remove fields that you won’t be needing in your particular case.

Creating the table

You can use the MySQL query found below to create the table:

CREATE TABLE IF NOT EXISTS `tblVisitLog` (
  `VisitID` bigint(20) unsigned NOT NULL auto_increment,
  `RequestTimeTimestamp` int(10) NOT NULL,
  `PhpSelf` varchar(1023) NOT NULL,
  `RequestUri` varchar(1023) NOT NULL,
  `HttpReferer` varchar(1023) NOT NULL,
  `HttpUserAgent` varchar(1023) NOT NULL,
  `UserIP` varchar(16) NOT NULL,
  `UserPort` int(5) NOT NULL,
  `HttpAccept` varchar(1023) NOT NULL,
  `HttpAcceptCharset` varchar(1023) NOT NULL,
  `HttpAcceptEncoding` varchar(1023) NOT NULL,
  `HttpAcceptLanguage` varchar(1023) NOT NULL,
  `HttpConnection` varchar(1023) NOT NULL,
  PRIMARY KEY  (`VisitID`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Populating the table

I’m sure you’re already aware that you can never trust user input. While at first glance, the data we use here provided by $_SERVER doesn’t look like it can be manipulated, it is easy for a client to place anything they like in their HTTP request(s)! So be sure to always escape these fields when you are handling them.

Place the PHP code found below in a piece of your website that will always get loaded (f.e. the header or footer) in order to populate the table with data whenever a page gets accessed:

$query=sprintf("INSERT INTO `tblVisitLog`(`VisitID`, `RequestTimeTimestamp`, `PhpSelf`, `RequestUri`, `HttpReferer`, `HttpUserAgent`, `UserIP`, `UserPort`, `HttpAccept`, `HttpAcceptCharset`, `HttpAcceptEncoding`, `HttpAcceptLanguage`, `HttpConnection`) VALUES (NULL,'%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s')",
        mysql_real_escape_string($_SERVER['REQUEST_TIME']),
        mysql_real_escape_string($_SERVER['PHP_SELF']),
        mysql_real_escape_string($_SERVER['REQUEST_URI']),        
        mysql_real_escape_string($_SERVER['HTTP_REFERER']),
        mysql_real_escape_string($_SERVER['HTTP_USER_AGENT']),
        mysql_real_escape_string($_SERVER['REMOTE_ADDR']),
        mysql_real_escape_string($_SERVER['REMOTE_PORT']),
        mysql_real_escape_string($_SERVER['HTTP_ACCEPT']),
        mysql_real_escape_string($_SERVER['HTTP_ACCEPT_CHARSET']),
        mysql_real_escape_string($_SERVER['HTTP_ACCEPT_ENCODING']),
        mysql_real_escape_string($_SERVER['HTTP_ACCEPT_LANGUAGE']),
        mysql_real_escape_string($_SERVER['HTTP_CONNECTION']));
mysql_query($query);

Analysis of the data

The sky is the limit, using any combination of fields and functions you can create queries that will give you invaluable information about your website.

User-Agent example

This simple query will tell you which User-Agents were used to access your website, and how many times. (Ordered from most to least)

SELECT COUNT(VisitID) as 'Visits', 
HttpUserAgent as 'User-Agent' 
FROM `tblVisitLog` GROUP BY HttpUserAgent
ORDER BY `Visits` DESC

Google Bot example

If you want to know how many times the Google Bot has passed by your website, you can use the following query:

SELECT COUNT( VisitID ) AS 'Visits', HttpUserAgent AS 'User-Agent'
FROM `tblVisitLog` 
WHERE HttpUserAgent = 
'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
GROUP BY HttpUserAgent

Most Popular Page example

If you want to see which of your pages are the most popular, you can use the following query:
You can change the number after ‘LIMIT’ by the amount of results you would like to have returned.

SELECT COUNT( VisitID ) AS 'Amount', PhpSelf as 'Page'
FROM `tblVisitLog` 
GROUP BY PhpSelf
ORDER BY `Amount` DESC
LIMIT 5  OFFSET 0

A full list of all the available options that can be used with the $_SERVER variable.
A full list of the User-Agents used by Google Crawlers.

1 Comment on “Tracking user behaviour on your website

Leave a Reply