The goals of the LSAM environmental analysis component are: (1) create and insert algorithms into web servers that select files that are advantageous to transmit to proxies via a multicast group, and (2) insert into proxies an algorithm that determines whether to join specific server multicast groups.

Loosely stated, candidate files for multicasting by a server should be files that are requested repeatedly over a considerable interval. In special cases this selection could be done by hand: for example, an administrator could select a group of URLs that become hot because of a popular current event and deselect them when the event becomes less popular. Ideally, however, the selection procedure should be automated. Determining what that automated procedure should be is a near-term goal for LSAM.

Problem Definition

In the absence of special hints given by a server administrator, an automated selection procedure has only the retrieval history on which to base a decision. Using that historical information, the procedure must identify server files that are likely to be fetched frequently in the future. However, apart from perhaps a few key home page files, there is no reason to assume that a server holds any files that exhibit such behavior.

Closely related to the problem of selecting candidate files for multicasting is the question posed to a proxy when it is presented with the option of joining a server's multicast group. Would joining be advantageous to that proxy?

To determine whether a successful automated selection procedure could be created, and how successful it might be, a package of log analysis software was written. This package is used to collect statistics on the frequency of URL retrieval across long intervals.
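
As an illustration of the kind of statistic involved, the following is a minimal sketch that counts URL retrievals per fixed-length interval. It is not the LSAM package itself. It assumes the server writes Common Log Format entries, and the names LOG_LINE, count_by_interval, and interval_seconds are hypothetical, chosen only for this example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: Common Log Format, e.g.
# host - - [10/Oct/1997:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD|POST) (\S+)[^"]*" \d{3} \S+'
)


def count_by_interval(lines, interval_seconds=86400):
    """Return {interval index: Counter of URL -> retrieval count}.

    The interval index is the request time in seconds since the epoch,
    integer-divided by interval_seconds (one day by default).
    """
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not parse as log entries
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        bucket = int(stamp.timestamp()) // interval_seconds
        counts[bucket][match.group(2)] += 1
    return counts
```

Applying this to a long log gives one retrieval-count table per interval, and tables from successive intervals can then be compared.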

The questions that we are examining are: Does past URL retrieval history indicate the likelihood of future retrieval? If it does, what algorithms are effective at selecting URLs as candidates for a multicast group?
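
One simple way to probe the first question is to select the most frequently retrieved URLs from one interval and measure what fraction of the following interval's requests they account for. The sketch below assumes a plain top-k selection by retrieval count. That selection is only an illustrative baseline, not an algorithm LSAM has adopted, and the names top_k and hit_fraction are hypothetical.

```python
from collections import Counter


def top_k(counts, k):
    """Return the set of the k most frequently retrieved URLs."""
    return {url for url, _ in counts.most_common(k)}


def hit_fraction(past, future, k):
    """Fraction of requests in `future` that go to the top-k URLs of `past`.

    `past` and `future` are Counters mapping URL -> retrieval count for
    two consecutive intervals.
    """
    selected = top_k(past, k)
    total = sum(future.values())
    if total == 0:
        return 0.0  # no requests in the later interval
    hits = sum(n for url, n in future.items() if url in selected)
    return hits / total
```

A hit fraction that stays high across many pairs of intervals would suggest that past retrievals do indicate future ones. A fraction near zero would suggest that this baseline, at least, does not find suitable candidates.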

