Web server log analysis

Introductory articles

  • Server log analysis
    From Usability.gov, an introduction to server log analysis and links to articles and other websites providing information on this topic.

  • Web log analysis
    Getting to know your audience is key to designing a successful website. Because your audience may be spread around the world, learning about the users of your site may be quite a challenge. Even if you think you have a pretty good idea of who your audience is, in many cases, there's a lot of information that you won't know--for example, what browsers your users are using, whether or not they are connecting from on or off campus, or what pages they find most useful.

  • Web traffic analytics and user experience
    As a specialist in the user, you gain knowledge through observation and direct questioning of individual users. Now, you can add to that insights gained from data pulled during their actions on the site. By looking at this information, you will get a fuller picture of user behavior, not in a lab, but in the true user environment.

  • What's important to measure on your website?
    Websites are very measurable. However, working through reams of data can be time-consuming and confusing. The knack is to know what is really important to measure. This includes the following: reader actions; reader numbers; most and least popular pages; subscribers; external links; search keywords; page size; broken links and malfunctioning processes.

Discussion articles

  • Are you using the wrong web metrics?
    "Do you base success on measuring the volume of visitors and page impressions? Such measures may in fact reflect the failure--rather than the success--of your website."
    (Gerry McGovern - Content Critical)

  • Assessing web site usability from server log files (PDF)
    This 1999 whitepaper from Tec-Ed Inc. discusses how log file data can offer valuable insight into web site usage.

  • Measuring user motivation from server log files
    Estimating user interest and motivation by just counting page requests from a web server log provides a distorted metric of user activity. This paper proposes that measures of how much time users spend looking at a page are better estimates of user interest than page hits, provided that simple human factors principles have been applied. An extended example of how this method might be used to collect and analyze data is also included. The types of decisions that authors and system administrators can make based on a time-based metric of user interest are summarised.

  • The end of the hit parade
    According to Forrester Research, many companies still use hits as the primary measurement of website success, followed by page views and session length. Yet few companies make good use of this basic server data. They don't know how to slice it and dice it, and even if they do, the information they glean is often too simplistic: Hit levels alone can't demonstrate customer loyalty or satisfaction, nor can they tell a company whether its website has helped reduce costs or bring in new business.

  • The value of web log data in use-based design and testing
    Web server logs and client-side logs can provide naturally-occurring, unobtrusive usage data, partially amenable to normative use assessments but particularly useful in experimental research comparing alternative web designs. Types of web server logs and client logs, types and uses of log data, and issues associated with the validity of these data are enumerated in this article. Frameworks that outline how sources of use-based data can be triangulated to assess web design are also illustrated. Finally, an approach to experimentation that overcomes many data validity issues is presented and illustrated through a pilot experiment that used server logs to compare user responses to frames, pop-up, and scrolling arrangements of a single web site.

  • Tracking site growth
    Jakob Nielsen on analysing numbers related to the growth of a website: "I normally recommend looking at them on a logarithmic scale. The reason is that the Web and the Internet both experience exponential growth. Therefore, Web statistics are better analysed in terms of growth rates than in terms of linear growth".

  • Traffic log patterns
    "The relative popularity of a site's pages, the number of visitors referred by other sites, and the traffic from search queries continue to follow a Zipf distribution. In comparing the new data with data from 10 years ago, the biggest finding is that the curves look almost the same. Several measures of Web traffic followed a Zipf curve in 1996, and they still do."
    (Jakob Nielsen - Alertbox)

  • Web analytics: the voice of users in information architecture projects
    "If you have been given the responsibility of redesigning your company’s website, and you want to avoid an internal political minefield, I urge you to read on… Web analytics will help you understand the needs, preferences and behaviors of your website users. More specifically, it will show you who is using the website, what content and functional elements they are favouring, how well the taxonomy or content categories are performing, which site navigation represents a high degree of conversion propensity, how much affinity exists between different content categories, what role the site search function plays, and much more."
    (Hurol Inan)

  • Web traffic analysis and user experience
    By looking at the data on what users do on the site you can enhance your effectiveness as a specialist in the user. You already have information and knowledge gained through observation and direct questioning of individual users. Now, you can add to that insights gained from the broad swath of information pulled during their actions on the site. These numbers represent the real-world behavior and interests of the user.

  • Why web usage statistics are (worse than) meaningless
    Web usage statistics, such as those produced by programs like Analog, cannot be used to make strong inferences about the number of people who have read a website or webpage. Although those who compile these statistics usually try to make this clear, people still insist on misusing them to make overly strong inferences.
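
Several of the articles above argue that time spent on a page says more about user interest than raw hit counts. The following Python sketch illustrates one way such a time-based measure might be estimated from log data; it is not the method of any of the papers listed. It assumes the log has already been parsed into (visitor, timestamp, page) records, treats the IP address as a stand-in for a visitor, and uses a 30-minute cut-off to separate visits. The sample records, the function names and the cut-off are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (visitor, timestamp, page).
# Using the IP address as the visitor is an assumption; proxies and shared
# machines mean one address may hide several people.
SAMPLE = [
    ("10.0.0.1", "2024-01-01 10:00:00", "/index.html"),
    ("10.0.0.1", "2024-01-01 10:00:40", "/courses.html"),
    ("10.0.0.1", "2024-01-01 10:03:40", "/contact.html"),
    ("10.0.0.2", "2024-01-01 10:01:00", "/index.html"),
    ("10.0.0.2", "2024-01-01 10:01:20", "/courses.html"),
    ("10.0.0.2", "2024-01-01 11:30:00", "/index.html"),
]

# Assumed cut-off: a longer gap between requests is treated as a new visit.
SESSION_GAP = timedelta(minutes=30)


def time_per_page(records, session_gap=SESSION_GAP):
    """Return {page: [seconds, ...]} of estimated viewing times."""
    by_visitor = defaultdict(list)
    for visitor, stamp, page in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((when, page))

    durations = defaultdict(list)
    for requests in by_visitor.values():
        requests.sort()
        # Pair each request with the next request from the same visitor.
        for (start, page), (end, _next_page) in zip(requests, requests[1:]):
            gap = end - start
            if gap <= session_gap:
                durations[page].append(gap.total_seconds())
        # The last page of a visit has no following request, so its viewing
        # time cannot be estimated and it is left out.
    return dict(durations)


def mean_time_per_page(records):
    """Average the estimated viewing times for each page."""
    return {page: sum(v) / len(v) for page, v in time_per_page(records).items()}


if __name__ == "__main__":
    for page, seconds in sorted(mean_time_per_page(SAMPLE).items()):
        print(f"{page}: {seconds:.0f} s")
```

Note that the final page of every visit drops out of the estimate, and that caching can hide repeat views from the server altogether, so figures like these are best read as rough indicators and not as exact viewing times.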

Tools for analysing web server logs

  • Analog
    Analog is a free program to measure the usage on your web server. It tells you which pages are most popular, which countries people are visiting from, which sites they tried to follow broken links from, and all sorts of other useful information. It is highly configurable and runs on many platforms including Unix, Mac and Windows.

  • Log analysis tools
    Links to a range of web server log analysis tools.

  • The Webalizer
    The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser.
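
To give a feel for the kind of report these tools produce, here is a minimal Python sketch that counts the most requested pages and lists the referring pages that led to broken links. It is not how Analog or The Webalizer work internally. It assumes the log is in the common "combined" access log format; the regular expression, the sample lines and the function name are illustrative assumptions, and real logs contain variations that this pattern does not handle.

```python
import re
from collections import Counter

# Assumed layout: the "combined" access log format
# (host, identity, user, [time], "request", status, size, "referrer", "agent").
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical sample log lines, made up for illustration.
SAMPLE_LOG = """\
192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.2 - - [01/Jan/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.org/links.html" "Mozilla/5.0"
192.0.2.1 - - [01/Jan/2024:10:00:30 +0000] "GET /courses.html HTTP/1.1" 200 2048 "http://example.edu/index.html" "Mozilla/5.0"
192.0.2.3 - - [01/Jan/2024:10:01:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.org/links.html" "Mozilla/5.0"
this line is malformed and is skipped
"""


def summarise(lines):
    """Return (successful page counts, broken-link counts) as Counters."""
    pages = Counter()    # path -> number of successful (2xx) requests
    broken = Counter()   # (path, referrer) -> number of 404 responses
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status = match.group("status")
        if status.startswith("2"):
            pages[match.group("path")] += 1
        elif status == "404":
            broken[(match.group("path"), match.group("referrer"))] += 1
    return pages, broken


if __name__ == "__main__":
    pages, broken = summarise(SAMPLE_LOG.splitlines())
    print("Most requested pages:")
    for path, count in pages.most_common(10):
        print(f"  {count:5d}  {path}")
    print("Broken links (page requested, referring page):")
    for (path, referrer), count in broken.most_common(10):
        print(f"  {count:5d}  {path}  from  {referrer}")
```

As the discussion articles above point out, counts like these describe requests and not people, so they are better suited to spotting broken links and relative popularity than to estimating audience size.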