Google and the Dissemination of Knowledge
Because of the decentralized nature of the web, the traditional methods of finding
information have been transformed. Previously, people would use reference books
or established indexes to find material. These reference sources were created
by a large number of organizations with differing objectives. Some were primarily
commercial, like telephone directories; some were academic, like Medline; and some
were aimed at "lifestyle" issues, like The Whole Earth Catalog.
So the person seeking information had a wide variety of approaches to pursue.
With the growth of the web this function has become consolidated into a very
few web search engines. At the present time just three have most of the traffic:
Google, Yahoo and Microsoft. This presents several problems for anyone seeking
information.
There has been considerable discussion devoted to the commercial aspects of
search engines. Because of human nature, the items returned at the top of
the first page get a disproportionate amount of activity whether they are
relevant or not. So many people spend a lot of effort trying to get their
sites listed "first". The search engines know this and have created a lucrative
business model to provide prominent site display for a fee.
This is not the issue that I wish to discuss, however. What I'm interested in
is the access to non-commercial material. This can be scholarly work, political
writing or even self-expression. Without the assistance of search engines this material
is "invisible". It does not appear in printed indexes, and its only other
connection may be from other sites that have chosen to create a link to it.
Without being found by a search engine, it effectively doesn't exist. This has
serious implications for the dissemination of knowledge as online material
starts to replace print as the principal medium for many.
Commercial Limitations
If the search engines decline to index a certain site then it can't be found.
This can be done for commercial reasons. There is no reason why a search engine
site should expend funds on those sites for which it expects to receive no income.
So far these companies have included much non-commercial material as a way to
gain credibility and to drive viewers to their service. There is no guarantee
that this will continue in the future.
Government or Corporate Censorship
Another danger is that sites may present information that is found offensive to
the search engine owners. We see this problem already in broadcast media. Never
do negative comments about the parent company appear on NBC (General Electric)
or ABC (Disney), for example. Thus the news programs on these networks have been
compromised. Because of the interrelationship between business and government,
sites that present unpopular political positions may be suppressed. The Chinese
government, for example, censors web traffic. Does this also mean that search
engines are required to remove references to sites the government is blocking?
How would anyone even know this was occurring? The search engines don't disclose
what they are skipping.
Poor Retrieval Techniques
The methods that search engines use to organize the material are quite primitive.
There is a body of nearly fifty years of work on techniques for
improving the relevance of the material retrieved. One of the first lessons
learned was that Boolean searches on keywords are just about the poorest strategy.
This is a fairly technical area; if you wish to read my comments, follow this link:
Search Techniques. The result of relying on keyword matching is that material
that contains abstract concepts is overlooked. The ranking
techniques are closely tied to the popularity of the links being displayed rather
than the relevance. This may work fine for directing viewers to sites where a
keyword is highly selective. So, for example, searching for a famous person's name
will almost always lead to relevant pages since the name has high selectivity.
Thus, search engines appear to be doing a good job because the needs of the
majority of non-critical users are being met.
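
To make the contrast concrete, here is a minimal sketch of a strict Boolean keyword match next to a simple weighted ranking. It is only an illustration: the three sample documents, the function names and the use of TF-IDF weighting are assumptions chosen for the example, not a description of how any actual search engine works.

```python
import math
from collections import Counter

# Hypothetical three-document collection, invented for this illustration.
DOCS = {
    "a": "freedom of speech and the free press",
    "b": "speech recognition software for dictation",
    "c": "censorship limits what citizens may read and say",
}

def tokenize(text):
    """Split text into lowercase words (no stemming or stop-word removal)."""
    return text.lower().split()

def boolean_and(query, docs):
    """Return ids of documents that contain every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def tfidf_rank(query, docs):
    """Rank documents by summed TF-IDF weight of the query terms they contain."""
    n = len(docs)
    counts = {d: Counter(tokenize(text)) for d, text in docs.items()}
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for c in counts.values() for term in c)
    scores = {}
    for d, c in counts.items():
        scores[d] = sum(
            c[term] * math.log(n / df[term])
            for term in tokenize(query)
            if term in c
        )
    matched = [d for d in scores if scores[d] > 0]
    return sorted(matched, key=lambda d: -scores[d])

query = "freedom speech censorship"
print(boolean_and(query, DOCS))  # no document contains all three terms
print(tfidf_rank(query, DOCS))   # partial matches, rarer terms weighted higher
```

With this toy data the Boolean search finds nothing, because no single document contains all three words, while the weighted ranking still returns every document that touches the topic and orders them by how distinctive the matching words are. Neither approach uses link popularity, which is a separate ranking signal.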
So not only are we not finding the material we are looking for, but we don't
even know what we are missing. Even in the darkest days of the Soviet Union,
banned publications were circulated by being retyped and photocopied. They may
have reached a much smaller audience than officially sanctioned publications,
but they still existed and influenced the political discourse in the society.
The organization of electronic material is still in its early stages, but if
the promise of the free flow of knowledge is not to be subverted into yet
another tightly controlled method for promoting consumerism, then a serious
discussion needs to take place on how to deal with the limited pathways into
the web.
Part 2
Moral: Freedom of speech doesn't mean much if nobody can hear you.
If you have any comments you would like to add, email me
at robert.feinman@gmail.com
Copyright © 2004 Robert D Feinman
Feel free to use the ideas, but the words are mine.