Abstract thoughts
One of the most impressive bits of Microsoft's Internet technology is Index Server, which has been in its ver 2 incarnation for some time now, ever since the NT Option Pack came along. Index Server enables you to completely index sections of your hard disc, be they web sites or document stores, and then query this index via a number of methods. A brief abstract of each document that matches your query can be displayed, followed of course by the full document itself.
These documents can be any form of information, as long as Index Server has the necessary filter installed so that it can read the document, make sense of it and index it properly. Microsoft has provided filters for the Office suite of programs, and there is also a filter for Adobe's PDF format available at http://www.adobe.com/supportservice/custsupport/LIBRARY/acwin.htm, although we have not tried this. Indexing does not require you to set keywords or pre-process the documents in any way. You just point Index Server at the directories that you wish to index via the Management Console and off it goes.
The indexing process is incredibly fast, and once the bulk of the indexing is done it makes very few demands on the server box as new files are added. Querying is incredibly fast as well. So with all this functionality, and with Index Server being free, there must be a catch? Well there isn't, or rather there wasn't until ver 2. We have been using Index Server on live web sites and against our office file server since ver 1 came out some years ago. A couple of weeks ago one of the web servers needed to be upgraded. This server had escaped any major upgrades and was still running Internet Information Server 3 and Index Server 1.1. We needed the latest version of ASP on this box so that another web site would work.
So the upgrade was started. This is not a simple upgrade: first NT4 Service Pack 3 needs to be installed, which thankfully was already the case with this machine. Next the ubiquitous IE4 has to be installed on the server, then the NT4 Option Pack, and it is this that upgrades Internet Information Server and Index Server. After the minimum three reboots this requires, all the sites came back up and all was well. Well, not quite. One site in particular uses Index Server and had been quite happy with ver 1.1. With ver 2, however, things were not quite right. The abstracts looked wrong. These abstracts are normally the first 150 characters of the decoded document. If the necessary filter is installed you should simply see the text at the beginning of the document; if not, you will just see a jumble of characters. In this case the documents were HTML with JavaScript in the first few lines of the <BODY> section. Fairly normal stuff, but the abstracts were showing parts of the JavaScript. It all looked very messy.
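To make the expected behaviour concrete, here is a minimal sketch in Python of how an abstract might be built from an HTML page while leaving client-side script out. This is only an illustration of the idea and not Index Server's actual filter; the names TextExtractor and make_abstract are our own, and the 150-character limit is simply the figure mentioned above.

```python
from html.parser import HTMLParser

ABSTRACT_LENGTH = 150  # the abstract length quoted in the column


class TextExtractor(HTMLParser):
    """Collects displayable text and ignores <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def make_abstract(html, length=ABSTRACT_LENGTH):
    """Return the first `length` displayable characters of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the abstract reads as one line of text.
    text = " ".join("".join(parser.parts).split())
    return text[:length]


page = (
    '<html><body><script language="JavaScript">var x = 1;</script>'
    "<p>Welcome to the site.</p></body></html>"
)
abstract = make_abstract(page)
```

With the sample page above, the abstract contains only the paragraph text and none of the script, which is what we had been seeing under ver 1.1.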
Script code shows in abstracts
Obviously there must be something about this problem in the Microsoft Knowledge Base, and there is. Microsoft states that there is a bug in the HTML filter (remember us mentioning these important filters earlier?) that will cause any scripting to appear in the abstracts, and that "Service Pack 4 updates the HTML filter used by Index Server so that it no longer treats client-side scripting as part of the document abstract." So we applied Service Pack 4 from an MSDN CD, rebooted and tried again. No luck. Just in case there had been a change to the service pack, we tried the upgrade from Microsoft's web site. Still no luck. The next step was to put a support call in to Microsoft about this problem.
A few hours later we got a call from one of their support guys, Andy Dow, who walked Mark through some registry settings and DLL versions, which all seemed to check out fine. He suggested that we send him the files so that he could try to reproduce the fault there. In the meantime we set up another server and the fault duplicated itself there. After a few days, Microsoft got back to us saying that they had duplicated the fault, had tried various versions of this filter file and had not found a fix. They have escalated the problem to the team who wrote Index Server and are awaiting an outcome. This is still the position as we write this column; when a fix is available we will let you know. In the meantime, if you have to use scripts in your HTML documents, all should be OK as long as they come after the first 150 displayable characters in the body of your HTML. We won't be adopting this solution, however, as the site in question consists of over ten thousand files, so we will have to wait.
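If you want to know which of your own pages would be caught out, a rough check along these lines might help. It is a sketch under our own assumptions: the names ScriptPositionFinder and script_within_abstract are hypothetical, and "displayable characters" is taken here to mean body text with runs of whitespace collapsed, which may not match exactly how Index Server counts.

```python
from html.parser import HTMLParser


class ScriptPositionFinder(HTMLParser):
    """Records how much displayable body text precedes the first <script>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.in_script = False
        self.visible_chars = 0       # displayable characters seen so far
        self.first_script_at = None  # count when the first body script began

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "script" and self.in_body:
            self.in_script = True
            if self.first_script_at is None:
                self.first_script_at = self.visible_chars

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_body and not self.in_script:
            # Assumption: whitespace runs count as a single character.
            self.visible_chars += len(" ".join(data.split()))


def script_within_abstract(html, limit=150):
    """True if a body script starts before `limit` displayable characters."""
    finder = ScriptPositionFinder()
    finder.feed(html)
    finder.close()
    return finder.first_script_at is not None and finder.first_script_at < limit


early = "<html><body><script>var a = 1;</script><p>Hello</p></body></html>"
late = (
    "<html><body><p>" + "x" * 200 + "</p>"
    "<script>var a = 1;</script></body></html>"
)
```

Running script_within_abstract over each file in a site would give a list of the pages whose abstracts are likely to be affected, which at least tells you the size of the job before deciding whether to move the scripts.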
'Never mind the quality...'
We all hear about sites that generate large numbers of hits or page impressions, and the advertisers flock to put their adverts on these sites. What many advertisers are not asking themselves is, 'what is the quality of these page impressions?' By this we mean: how many of these Internet users are interested in the site that they have just visited? Are they just passing through? Or are they using some search engine's site and being frustrated, not only because the URLs fed back to them are useless but also because the whole thing is slowed down by totally irrelevant banner adverts being downloaded as well? These people are hardly likely even to look at your advert, let alone click on it. If your advert appeared on a well focused site that appeals to people with the same interests or desires as your ideal customer, then although the site may have 10% of the hits of a larger unfocused site, the success of your advert will be much greater.
On TV the adverts are often extremely well focused. A classic recent example was when Vanish, the stain remover, had an advert during an interview with Monica Lewinsky. More commonly, during Formula One coverage there are adverts for tyres, oils and cars. Another area that is grossly overlooked on the web is the quality of the adverts. We have all said at times that the adverts on TV are often better than the programs themselves. How often can you say this about web adverts? When did you last email a friend with the URL of a really great web advert? These adverts need to get better than the 'click here' variety that we see at the moment.
'I want it NOW!'
The other day we needed a Visual Basic control urgently, which we duly downloaded and paid for with a credit card. The credit card details were taken and, after the order was submitted, a screen informed us that we would receive an unlock code within 3 days! This delay was caused by the software house using a separate company to handle its credit card transactions. If you or your company are considering any form of on-line commerce then you should aim for it to be as instant an experience for the customer as possible. On-line users expect products and information to be provided instantly. Waiting three days for an email with an unlock code is just plain crazy when all this could be automated with a little effort.
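As a rough idea of how little effort that might be, here is a minimal sketch of issuing an unlock code the moment a payment is confirmed. We have no idea how the software house in question generates its codes; this simply derives one from the order details using a keyed hash (HMAC). The names SECRET_KEY, issue_unlock_code and verify_unlock_code, and the order and product identifiers, are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical private key, known only to the vendor's order system.
SECRET_KEY = b"replace-with-a-private-key"


def issue_unlock_code(order_id, product_id, key=SECRET_KEY):
    """Derive an unlock code of the form XXXX-XXXX-XXXX-XXXX for an order."""
    message = f"{order_id}:{product_id}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest().upper()
    short = digest[:16]  # shortened so the customer can type it in
    return "-".join(short[i:i + 4] for i in range(0, 16, 4))


def verify_unlock_code(order_id, product_id, code, key=SECRET_KEY):
    """Check a code supplied by the customer against the expected one."""
    expected = issue_unlock_code(order_id, product_id, key)
    return hmac.compare_digest(expected, code.strip().upper())


code = issue_unlock_code("ORD-1001", "vb-grid-control")
```

The order system could call issue_unlock_code as soon as the card transaction is approved and show the result on the confirmation page, as well as emailing it. Because the code is derived from the order rather than stored, the same call will reproduce it if the customer loses the email.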