Google and the Dissemination of Knowledge


Because of the decentralized nature of the web, the traditional methods of finding information have been transformed. Previously, people used reference books or established indexes to find material. These reference sources were created by a large number of organizations with differing objectives. Some were primarily commercial, like telephone directories; some were academic, like Medline; and some addressed "lifestyle" issues, like The Whole Earth Catalog.

So the person seeking information had a wide variety of approaches to pursue. With the growth of the web, this function has been consolidated into a very few web search engines. At present just three have most of the traffic: Google, Yahoo and Microsoft. This presents several problems for anyone seeking information.

There has been considerable discussion devoted to the commercial aspects of search engines. Because of human nature, the items returned at the top of the first page get a disproportionate amount of activity whether they are relevant or not. So many people spend a lot of effort trying to get their sites listed "first". The search engines know this and have created a lucrative business model that provides prominent site display for a fee.

This is not the issue that I wish to discuss, however. What I'm interested in is access to non-commercial material. This can be scholarly, political or even self-expression. Without the assistance of search engines this material is "invisible". It does not appear in printed indexes, and its only other connection may be from other sites that have chosen to create a link to it. If it cannot be found through a search engine, it effectively doesn't exist. This has serious implications for the dissemination of knowledge as online material starts to replace print as the principal medium for many.

Commercial Limitations

If the search engines decline to index a certain site, then it can't be found. This can be done for commercial reasons. There is no reason why a search engine company should expend funds on sites from which it expects to receive no income. So far these companies have included much non-commercial material as a way to gain credibility and to drive viewers to their service. There is no guarantee that this will continue in the future.

Government or Corporate Censorship

Another danger is that sites may present information that the search engine owners find offensive. We see this problem already in broadcast media. Negative comments about the parent company never appear on NBC (General Electric) or ABC (Disney), for example. Thus the news programs on these networks have been compromised. Because of the interrelationship between business and government, sites that present unpopular political positions may be suppressed. The Chinese government, for example, censors web traffic. Does this also mean that search engines are required to remove references to sites the government is blocking? How would anyone even know this was occurring? The search engines don't disclose what they are skipping.

Poor Retrieval Techniques

The methods that search engines use to organize the material are quite primitive. There is a body of nearly fifty years of work on techniques for improving the relevance of the material retrieved. One of the first lessons learned was that Boolean searching on keywords is just about the poorest strategy. This is a fairly technical area, and if you wish to read my comments follow this link: Search Techniques. The result is that material that contains abstract concepts is overlooked. The ranking techniques are closely tied to the popularity of the links being displayed rather than to their relevance. This may work fine for directing viewers to sites where a keyword is highly selective. So, for example, searching on a famous person will almost always lead to relevant pages, since the name has high selectivity. Thus search engines appear to be doing a good job because the needs of the majority of non-critical users are being met.
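
To make the contrast between Boolean keyword matching and relevance ranking concrete, here is a minimal sketch in Python. It is an illustration only and not a description of how any actual search engine works. The three sample documents, the query, and the function names boolean_and and tfidf_rank are all invented for this example, and TF-IDF with cosine similarity stands in for the broader family of relevance techniques from the retrieval literature.

```python
import math
from collections import Counter

# Hypothetical three-document collection, invented for illustration.
docs = [
    "freedom of speech and the press",
    "speech recognition software for dictation",
    "essays on press freedom and censorship",
]
query = "freedom speech censorship"


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def boolean_and(query, docs):
    """Return indexes of documents that contain every query term."""
    terms = set(tokenize(query))
    return [i for i, d in enumerate(docs) if terms <= set(tokenize(d))]


def tfidf_rank(query, docs):
    """Return indexes of documents ordered by TF-IDF cosine similarity.

    Documents that share no term with the query are left out.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency, so rarer terms weigh more.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vector(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(tokenize(query))
    scored = [(cosine(q, vector(toks)), i) for i, toks in enumerate(tokenized)]
    # Highest score first; ties are broken by document order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0]


print(boolean_and(query, docs))
print(tfidf_rank(query, docs))
```

With this toy data the Boolean search returns nothing, because no single document contains all three query words. The ranked search returns all three documents, placing the essay on censorship first and the speech recognition page last. The searcher gets partial matches in order of likely relevance instead of an empty result.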

So not only are we not finding the material we are looking for, but we don't even know what we are missing. Even in the darkest days of the Soviet Union, banned publications were circulated by being retyped and photocopied. They may have reached a much smaller audience than officially sanctioned publications, but they still existed and influenced the political discourse in the society. The organization of electronic material is still in its early stages, but if the promise of the free flow of knowledge is not to be subverted into yet another tightly controlled method for promoting consumerism, then a serious discussion needs to take place on how to deal with the limited pathways into the web.

Part 2

Moral: Freedom of speech doesn't mean much if nobody can hear you


If you have any comments you would like to add, email me at robert.feinman@gmail.com
Copyright © 2004 Robert D Feinman
Feel free to use the ideas, but the words are mine.