Abstract
M.Sc. (Information Technology)
The internet has grown exponentially over the last decade. As more information becomes
available on the web, search engines use web crawlers, also known as web bots, to gather
that information and index billions of web pages. This index helps users find relevant
information on the internet.
An extranet is a subset of the internet that restricts access to specific resources to a
selected audience; extranets are therefore also referred to as access-restricted web sites.
Various industries use extranets for different purposes and store different types of
information on them. Some of this information may be confidential, so it is important that
it is adequately secured and not accessible to web bots.
In some cases web bots can accidentally stumble onto poorly secured pages in an extranet and
add these restricted pages to their search indexes. Search engines such as Google, which are
designed to filter through large amounts of data, can accidentally crawl access-restricted
web pages if those pages are not properly secured. Researchers have found that the web
crawlers of well-known search engines can access poorly secured pages in access-restricted
web sites. The risk is that not all web bots are benign; some have malicious intent. These
malicious web bots search for vulnerabilities in extranets and exploit them to access
confidential information.
The main objective of this dissertation is to develop a prototype web bot, called Ferret,
that crawls a web site built by one or more web developers. Ferret attempts to discover and
access poorly secured restricted web pages in the extranet and reports the weaknesses it
finds. Based on the findings of this research, a best-practice guideline will be drafted to
help developers ensure that access-restricted web pages are secured and invisible to web bots.