Search for the Grail
The topic of Internet search engines has cropped up in more than one conversation this month; such is the witty repartee that flows at dinner parties. We were not looking at it from the web user's side, of how best to use search engines to find that elusive web site, but from the web site owner's side, of how you get your site represented fairly on the search engines. The general feeling is one of disappointment and frustration. Web site owners are in a very good position to check how efficiently the search engines index the pages of their site; after all, they should know which search query ought to bring up their site.
There are many types of web search engine out there, but the aim is the same: to make it possible for users to find information on the Internet. It's as simple as that. Many of the search engines now index the newsgroups as well as the web, but it's searching the web that we will deal with here. So what happens when you type your question into the search box on one of these sites? After a short wait you usually get several thousand matches. This is hopeless: most of the time the user is not trying to find a single page with the information on it, but a web site that might have the information they are interested in.
In the early days of the web, when most web sites were large academic sites with thousands of pages on a variety of subjects, the ability to find a single page on a subject was useful. We suggest that nowadays the default way for a search engine to present its results should be to show just one match per web site, with the ability to drill down to the individual pages should this be needed. By doing this, the number of matches returned would be reduced to a manageable amount. Simple enough to achieve, so why don't they do it? The search engines like us to think that the more matches their product returns, the better their engine is, and so even very obscure matches are presented. This is not making the web easier to navigate; in fact, quite the opposite.
This, coupled with the fact that many of the links presented are broken, results in the frustration that we all know. Sometimes these broken links are caused by sites disappearing or moving, but often they are caused by a site changing its information and removing pages. So perhaps the web site administrator should keep these old pages on the site for the benefit of the search engine's index. But hang on a minute: the search engines are supposed to be indexing our sites, not us making considerations for them. Why should a site keep old, out-of-date information just so that links from a search engine work? Why can't the search engine simply recheck its links on a regular basis? You might say that would require a lot of processing power. Well, tough: if you aim to be the foremost search engine on the web, then you should do the job properly.
On the matter of making allowances for the search engines, why should we put meta tags on our pages when the search engines appear to ignore them? Submitting to search engines is a pain as well. We are told that if you make multiple submissions then your site's rating will be lowered, so what do you do if your site does not appear after a few days of submitting? You have your boss or client screaming at you that they can't find their web site on the search engines, so you email the companies concerned and they don't reply. So you resubmit; still no luck. So what is the answer? Well, at least Alta Vista has come up with a solution to this very common and desperate position: you can now pay them to have your web site at the top of a particular search. They have probably taken this somewhat unpopular route to generate some more revenue to appease their shareholders.
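For what it's worth, the meta tags in question are the description and keywords tags that sit in the head of a page. A typical pair looks something like this (the site and wording here are made up purely for illustration):

    <head>
    <title>Widgets R Us - hand-made widgets</title>
    <meta name="description" content="Hand-made widgets, delivered anywhere in the UK.">
    <meta name="keywords" content="widgets, hand-made, UK, mail order">
    </head>

In theory the engines use the description for the summary they display and the keywords to help classify the page; in practice, as we say, many appear to take little notice of either.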
Returning to paid-for placement: bearing in mind the high values that these companies have been floated at, we are sure that we will see more of this form of revenue generation. Some sites, notably Yahoo, will take months, if ever, to put a submitted site on their search pages. Yahoo is a managed search site, relying more on human intervention than, say, Alta Vista. We know of sites whose owners have submitted them on numerous occasions and still there is no entry. Others still have entries linking to a site that moved two years ago. When you ask Yahoo for sites in the UK & Ireland you often get sites as far away as Australia! Again the mantra of 'more is better' applies here as well. Users do not want this superfluous information, and it is they who are paying for the site indirectly, by providing the clicks that the advertisers are falling over their wallets to get at.
This whole technology starts to fall over as more and more sites generate their HTML pages dynamically from information sources such as databases. The same goes for sites that use Flash and similar technologies extensively, or sites that protect their pages from people who download whole web sites to reuse in their own. None of these types of site can be indexed by search engines without employing various tricks. We thought it would be good to get a definitive answer from the horse's mouth as to the best way to configure sites such as these so that they will be indexed fully, so we emailed the major players and awaited their replies. We did not go in with the PcPro hat on, but rather as a web site designer. Several weeks later we still await their replies. This is typical of the arrogant view that some of the larger sites have, thinking that they control much of the access to the web and that the smaller sites need them. The opposite is just as true: unless the search engines are comprehensive and accurate, users will stop visiting them.

Why are search engines still this bad? Mainly because of the huge number of new users to the web who need some way of navigating around. They use the search engines as their main way of navigating the web; many times users have been surprised when one of us types a URL directly into the address bar of the browser. If the search engine does not return what they are looking for, they blame themselves. For every dissatisfied experienced user of the search engines there are probably a hundred new users coming online, keen and ready to search for their own grail!
There is a major opening for a new and properly designed search engine; if anyone wants the challenge of building this 'better mousetrap', then the world truly will be 'beating a path to their door'!
Flicking good software!
We had a need to set up a web site the other day that required a public and a private area, with the private area protected by a user name and password. There are several ways of doing this, ranging from writing your own to getting a copy of Microsoft's Commerce Server. This last option is not only expensive but complicated to set up, as Mr Moss of this parish will vouch. So, being lazy and always wanting an easy way out, Mark started searching the web and asking around. After a little while he came across a product which is so simple to set up and so flexible to use that at times it seems too good to be true. The product is Authentix by Flicks (www.flicks.com); basically it's an ISAPI filter that is installed in IIS4.
After downloading it you run the setup program, which takes you through the procedures needed to register the component in IIS4. It's all simple stuff as long as you take it slowly and do as you are told. After restarting the web server, you find that nothing seems to be different. Then, via a little user interface program provided, you tell it which directories you want to restrict access to and which users or user groups have access to them. The users do not have to have NT accounts: they can be held either in Authentix's internal database or in an ODBC source such as SQL Server or Access, so that you can maintain other user details alongside the user name and password.
Maintaining accounts via the software provided
However, the simplest way of maintaining the user list is a plain text file with the user name and password separated by a space or comma. This file can be put anywhere on the server, well outside the web root, so that there is no chance of web visitors seeing it! The client can then upload a new file via a secure route when a subscription is paid. If you want to provide the client with a slicker way of maintaining the user list, Flicks provides some ASP pages for you to edit that enable all the user changes to be made via a web page, secured of course. These ASP pages use the internal database that Flicks provides to store the user accounts. You can even configure the component to use user lists from multiple sources.
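To give an idea of just how simple the text file is, each line holds one user name and password, separated by a space or a comma. The names and passwords below are invented purely for illustration:

    fred letmein
    wilma,bedrock99
    trial-user trial

When a subscription changes, the client simply uploads a replacement file over a secure route, as described above.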
One nice feature is the ability to grant or refuse access to all users from a particular domain or referrer, so if you want to let in users who have come to your private site via a link on another trusted private site, you don't have to worry about maintaining two lots of user names and passwords. The passwords can be encrypted as well, and the whole solution works with all the main browsers, unlike some of the solutions we have seen that are IE-only. We have employed this solution on a client's site and it all works well; basically it's simple, it's flexible and it works.
The product costs $299 and is available from their web site, as is a free, fully functional evaluation version that times out.
We wait
Last month we reported on the broken state of Index Server 2 and how Microsoft were working on a solution. So far there has been no fix, although Microsoft ring twice a week to inform us that they are working on it. Perhaps NT Service Pack 5, which is currently in beta, might have a nice surprise in it for us.