Focused crawling for both relevance and quality of medical information

ABSTRACT
Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules. These can be expensive to maintain and often provide incomplete coverage of Web resources. On the other hand, health information obtained through whole-of-Web search may not be scientifically based and can be potentially harmful.

To address problems of cost, coverage and quality, we built a focused crawler for the mental health topic of depression, which was able to selectively fetch higher quality relevant information. We found that the relevance of unfetched pages can be predicted based on link anchor context, but the quality cannot. We therefore estimated quality of the entire linking page, using a learned IR-style query of weighted single words and word pairs, and used this to predict the quality of its links. The overall crawler priority was determined by the product of link relevance and source quality.

We evaluated our crawler against baseline crawls using both relevance judgments and objective site quality scores obtained using an evidence-based rating scale. Both a relevance focused crawler and the quality focused crawler retrieved twice as many relevant pages as a breadth-first control. The quality focused crawler was quite effective in reducing the amount of low quality material fetched while crawling more high quality content, relative to the relevance focused crawler.

Analysis suggests that quality of content might be improved by post-filtering a very big breadth-first crawl, at the cost of substantially increased network traffic.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval: information filtering, retrieval

General Terms
experimentation, performance, measurement

Keywords
quality health search, focused crawling, domain-specific search

INTRODUCTION
A survey of US Internet users found that forty percent of respondents used the Internet to find advice or information [...] have shown that medical information on the Internet can be fraudulent, of dubious quality and potentially dangerous [...]

It is desirable that a search service over health web sites should return results which are not only relevant to the query but in accord with evidence-based medical guidelines. Health experts, based on either scientific evidence or accountability criteria, have developed protocols for manual assessment of medical web site quality [12, 8]. [...] there is very little prior work on using automated quality assessments, either in determining what to index or how to rank potential search results. One exception, due to Price and Hersh [21], reranks results from general search engines based on automated ratings of relevance, credibility, absence of bias, content currency and value of links.

ANU's Centre for Mental Health Research operates a website which publishes evidence-based information on depressive illness and also provides integrated search of over 200 depression sites. Currently, the set of indexed sites is manually maintained, using a seed list and URL-based inclusion rules that determine which parts of each site are indexed.
Here we report our experiences in developing a fully automatic alternative, using a focused crawler that takes into account relevance and quality.

BACKGROUND AND RELATED WORK

Focused crawling
First introduced by de Bra et al. [3], and subsequently studied by many others [6, 9, 16], focused crawlers are designed to selectively fetch content relevant to a specified topic of interest using the Web's hyperlink structure.

A focused crawler starts from a seed list of topical URLs. It estimates the likelihood that each subsequent candidate link will lead to further relevant content, and may prioritise crawling order on that basis and/or reject low-likelihood links. Evidence such as link anchor text, URL words and source page relevance is typically exploited in estimating [...]

McCallum et al. [20] used Naive Bayes classifiers to categorise hyperlinks while Diligenti et al. [11] used the context-graph idea to guide a focused crawler. Rather than examining relevant nodes alone, both techniques trained a learner with features collected from paths leading up to the relevant nodes.

Chakrabarti et al. [6], on the other hand, used hypertext graphs including in-neighbours (documents citing the target document) and out-neighbours (documents that the target document cites) as input to some classifiers. According to these authors, a focused crawler can acquire relevant pages steadily while a standard crawler quickly indexes a large number of irrelevant pages and loses its way, even though they started from the same seed lists.

Relevance feedback
Relevance feedback (RF) is a well-known information retrieval approach of 'query by example'. Given example sets of relevant documents, the goal is to find more of the same. In this paper, we use this to identify both depression-relevant pages and high-quality depression-relevant pages. Our specific application of RF is described in more detail in Section 3.1.

We applied Robertson's approach to term selection [24]. In this approach, there are three ways to calculate the selection value for a term: using the probability of the term occurring in a relevant document (r/R), rewarding terms that occur [...] average of these. We used the third approach, computing the [...]

where R is the number of known relevant documents, r is the number of documents in R that contain term t and tf is the frequency of occurrence of the term within a document.

The weight w was calculated using the Robertson-Sparck Jones weight:

$$w = \log \frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)}$$

where N is the number of documents in the collection, n is the collection frequency, that is, the number of documents containing a specific term, and R and r are defined as above.

Document scoring based on relevance feedback
We used the Okapi BM25 weighting function [26] to score documents against the two weighted queries:

[...]

where $tf_d$ is the number of times term t occurs in document d, N is the number of documents in the collection, n is the number of documents containing t, dl is the length of the document and avdl is the average document length.

Scores calculated with BM25 are collection dependent. Rather than assuming a collection of the documents crawled thus far, we chose to assume a more general web context and used values for the collection parameters (N = 2,376,673, avdl = 15,036 and n) which were derived from a large general crawl of Australian educational websites.

The final score was computed using the following equation:

[...]

where $Q_t$ is obtained from equation 1 and $w_t$ from equation 2. These scores represented either quality or relevance [...]

Assessment of the quality of information on medical sites
The ultimate measure of the quality of a health web site is its effect on health outcomes, but it is not usually feasible for website publishers or visitors to obtain that information. Next best would be an assessment of the extent to which the content of the site is consistent with the best available scientific evidence (evidence-based medicine), but [...]

Therefore in the present study, experts rate our crawled sites on a 21-point scale derived by Griffiths and Christensen [13]. These ratings are based on a set of evidence-based depression guidelines published by the Centre for Evidence Based Mental Health [...]

There are also rating schemes for non-experts such as Silberg [27] and DISCERN [8]. They focus on accountability criteria which could be measured by people without extensive medical expertise, such as whether the author is identified and whether the site has been recently updated. However, a study of depression web sites by Griffiths and Christensen [12] found no correlation between Silberg scores and expert [...] related with DISCERN scores [14], but carrying out such manual assessments is a lengthy process.

In the Web search literature, link graph measures such as PageRank [4] have been promoted as indicators of quality, but how this type of quality might correlate with a medical definition has been little studied. A very recent study by Griffiths and Christensen [14] found only a moderate correlation between Google-reported PageRank and the 21-point rating scale. In this study we follow a content-based approach.

CRAWLERS
In this section we introduce three crawlers. First we describe our use of relevance feedback to estimate quality of pages, then our classifier to compute relevance scores for links. Finally we describe the crawlers: breadth-first, relevance-focused and quality-focused.

Relevance feedback for page relevance and quality
A quality-focused crawler needs some way of predicting the quality of uncrawled URLs, to set its priority. We tried various methods to predict this, using as training data quality-judged depression pages from a previous study [28]. We found it impossible to predict the quality of a link target based on its anchor context alone, so we abandoned attempts to score each link separately. Instead we scored the quality of the whole page and applied this equally to the page's links.

Table 1: Examples of terms in the relevance query.

Table 2: Examples of terms in the quality query.

We used relevance feedback to predict page quality. RF was a natural choice here, because a focused crawling framework needs to prioritise the crawling order, and RF gives us scores that can be used in ranking. We also made separate use of relevance feedback in scoring topic relevance for evaluation purposes only. Both quality RF and relevance RF are described in this section. Both use the term selection methods described in Section 2.2 to identify extra query words and phrases. Phrases usually include two adjacent words, but sometimes three words if the middle word is a preposition, for example 'treatment of depression'.
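As a rough illustration of how relevance-feedback scoring of this kind can be put together, the sketch below combines the Robertson-Sparck Jones weight given above with a standard BM25 term-frequency component. It is not the paper's exact formulation: the selection value (equation 1) and the final scoring equation (equation 3) are not reproduced here, the function names are hypothetical, and the `k1` and `b` defaults are conventional values assumed for illustration. Phrases are assumed to be pre-tokenised into single tokens.

```python
import math
from collections import Counter


def rsj_weight(r, R, n, N):
    """Robertson-Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term, R: known relevant documents,
    n: documents in the collection containing the term, N: collection size.
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


def bm25_tf(tf, dl, avdl, k1=1.2, b=0.75):
    """BM25 term-frequency saturation with length normalisation.

    k1 and b are assumed defaults, not values taken from the paper.
    """
    return tf * (k1 + 1) / (tf + k1 * ((1 - b) + b * dl / avdl))


def score_document(tokens, query_weights, avdl):
    """Sum, over query terms present in the document, of weight times tf part."""
    counts = Counter(tokens)
    dl = len(tokens)
    return sum(weight * bm25_tf(counts[term], dl, avdl)
               for term, weight in query_weights.items()
               if counts[term] > 0)
```

With fixed collection statistics (N, n and avdl taken from a general crawl, as described above), a page's score depends only on its own text, so pages can be scored one at a time as they are fetched.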
Using relevance judgments from a previous experiment [28], we selected 347 relevant and 9000 irrelevant documents. We applied the Robertson selection value formula to obtain weights for all the terms in relevant documents. Past research has suggested that the number of terms that could be usefully added to expand a query might range from 20 to 40 [15]. We arbitrarily selected the 20 top weighted single words and the 20 top weighted phrases. See examples in Table 1.

From the same previous experiment we identified 107 documents relevant to depression and of high quality, and another set of 3002 documents which were either irrelevant or [...]

We used the same technique as for the relevance query to produce two candidate term lists: one containing single words and the other containing phrases. However, we used a more sophisticated procedure to choose a term selection cutoff. We first derived a list of words and phrases representing effective depression treatments from [13], dividing multi-word treatments into the type of phrases described above. For example, 'cognitive behaviour therapy' became 'cognitive behaviour' and 'behaviour therapy'. We then located these words and phrases in the candidate lists and cut off the lists just after the lowest-ranked occurrence of an effective treatment term. Surprisingly this gave us the same cutoff (20) for phrases and a similar cutoff for single words (29). Some example terms [...]

Note that the two queries include many terms in common, because both are on the topic of depression. High-quality depression-relevant documents are a subset of depression-relevant documents. The quality query contains more words relating to effective treatment methods such as 'cognitive therapy' or antidepressant medications like 'zoloft' and 'paxil'.

Decision tree for link relevance
In our previous work we developed a classifier for predicting the relevance of a link target, based on features in the link's source page [29]. We evaluated a number of learning algorithms provided by the Weka package [30], such as k-nearest neighbor, Naive Bayes, and C4.5. Since then we have also evaluated Perceptron. The C4.5 decision tree [22] was the best amongst those evaluated.

The classifier is based on words in the anchor text, words in the target URL and words in the 50 characters before and after the link (link context). If we found multiple links to the same URL, we included all available anchor contexts. This is a relatively standard approach [1, 9, 7].

To produce a confidence score at each leaf node of the decision tree we used a Laplace correction formula [19]:

$$\mathit{confidence\_level}_k = \frac{N_k + \lambda_k}{N + \sum_{j=1}^{K} \lambda_j} \qquad (4)$$

where N is the total number of training examples that reach the leaf; $N_k$ is the number of training examples from class k reaching the leaf; K is the number of classes and $\lambda_k$ is the prior for class k, usually set to 1. In our case, K is 2 because we only had two classes, relevance and irrelevance.

Combining quality and relevance scores
We used the quality score of a page (computed using relevance feedback) to predict the quality of its outlinks. If more than one known page linked to the same URL, we took the mean quality score of the linking pages. Relevance scores computed from the decision tree were already aggregated [...]

To order the crawl queue for the quality crawler, we combined the quality and relevance scores. The overall score for
a URL was computed as

$$\mathit{score} = \mathit{confidence\_level}_{rel} \times \frac{1}{m} \sum_{i=1}^{m} DScore_i \qquad (5)$$

where $\mathit{confidence\_level}_{rel}$ is the URL's relevance score (equation 4), $DScore_i$ using the quality query is a linking page's quality score (equation 3), and m is the number of pages linking to that URL.

The decision to multiply rather than add the scores was taken arbitrarily, as combining relevance and quality is a relatively new concept in IR. A side effect of taking the product is that if one of the two scores is zero, the overall score is zero.

Our three crawlers
We evaluated three crawlers: the breadth-first (BF) crawler, the relevance crawler, and the quality crawler. Whenever a crawler encounters a new URL, that URL is added to a crawl queue, and the crawler proceeds by taking URLs from that queue. The crawlers differ in how their crawl queues are ordered.

The BF crawler serves as a baseline for comparison. It traverses the link graph in a breadth-first fashion, placing each newly discovered URL in a FIFO queue. The BF crawler is likely to find some depression pages since we start it from depression-relevant seed pages, but we would expect the relevance of its crawl to fall as the crawl progresses.

The relevance crawler is designed to prefer domain-relevant pages, ordering its crawl queue using the relevance decision tree discussed in Section 3.2. The relevance RF score is not used; we reserve it for use in evaluation. By crawling the highest-scoring URLs first, we would expect the relevance crawler to maintain its overall relevance more successfully than the BF crawler.

The quality crawler is designed to prefer higher-quality domain-relevant pages. Each URL is given a score computed using equation 5. A major focus of this paper is to evaluate whether the quality crawler can successfully prioritise its queue to maintain the overall quality of its crawl and avoid pages with low quality, potentially harmful advice.

EXPERIMENTS AND MEASURES

Relevance experiment
We used our RF relevance score (applying the relevance query in equation 3), and a score threshold to evaluate the overall relevance of our three crawls. The threshold was found using 1000 relevant and 1000 irrelevant pages from our previous study (these were separate from those used to generate the relevance query). A threshold at 25% of the theoretical maximum BM25 score (of 502.88²) minimised the total number of false positives and false negatives, so in our crawls we labeled pages with RF relevance score greater than this threshold as relevant.

² Corresponding to a hypothetical zero-length document containing infinite numbers of each of the query terms.

Using RF scores rather than real relevance judgments allows us to get some idea of relevance without extensive relevance judging. However, to validate the accuracy of our RF-based 'judgments', we employed two relevance assessors to judge the relevance of 300 RF-relevant and 120 RF-irrelevant pages. These pages were randomly selected from all the RF results of all the crawled pages. As for the judging criterion, any page about the mental illness 'depression' was considered relevant.

The level of agreement between the two assessors was high (91.2%), indicating that judging for such a simple topic is easy. The RF-judgments had an accuracy of 89.3%, a 90.9% success rate in predicting the relevance category, and an 84.6% success rate in predicting the irrelevance category. We concluded that these levels were high enough to present some RF-judgment-based results.

Note that this RF classifier was only used in evaluating the relevance of sets of pages returned by the various crawlers. None of these three crawlers used this classifier in deciding which links to follow.

We evaluated relevance of the three crawlers, each starting from a seed set of 160 URLs taken from the DMOZ depression directory³. We evaluated the first 10,000 pages from each crawler according to RF-relevance.

³ http://www.dmoz.org/Health/Mental_Health/

Figure 1: Comparison of the BF, relevance and quality crawlers for relevance using the RF classifier.

Quality experiments
Most of the models for assessing the quality of depression content on the Web refer to entire sites, not individual pages [8, 17]. We therefore grouped all the pages in each crawl into sites. Pages originating from the same host name were considered to be from the same site.

The quality of the sites was evaluated by a research assistant from the Centre for Mental Health Research using a rating scale derived by Griffiths and Christensen [13] from the CEBMH evidence-based clinical guidelines. Each site was assigned a quality score in the range 0 to 20.

Since judging took 4 hours per site on average, we could not use the full 160-page seed list. If we did, a large amount of effort would be needed just to judge seeds, and these are uninformative with respect to crawl strategy. Therefore we randomly selected 18 URLs from the 160 to use as our quality experiment seeds. We cut off each of our three crawls at 3,000 pages. For this small crawl size, we were able to judge the quality of any site with 6 or more crawled pages.

Figure 2: Quality score for each crawl based on all [...]

We propose three measures to compare crawl quality. Note that, in our measures, the quality score of a page is assigned the quality score of the site containing it.

• Quality score using all crawled pages: We first computed the mean value of the quality scores of all the judged sites. We then transformed the site scores by subtracting the mean, giving negative scores to sites [...] was given by the sum of quality scores of all its judged pages (all pages from quality-judged sites). This means that the quality score captures both the quality of the pages and the size of the crawl.
• Quality score using RF-relevant pages: Not all sites with quality judgments are dedicated to depression, and many contain a large number of irrelevant pages. We used our RF-relevance classifier to identify the relevant pages in each crawl, then calculated the total quality score as above using just those pages.

• AAQ and BAQ comparison: We grouped judged sites into three categories: above average quality (denoted as AAQ, the top 25% of the judged sites), average quality (denoted as AQ, the middle 50%) and below average quality (denoted as BAQ, the bottom 25%). In some tests we focused on the number of crawled pages from the 'extreme' AAQ and BAQ categories.

RESULTS AND DISCUSSION

Relevance results
Figure 1 depicts the relevance levels throughout each of our three crawls, based on RF relevance judgments. The relevance and quality crawls each stabilised after 3,000 pages, at about 80% and 88% relevant respectively. The breadth first crawler continued to degrade over time as it got further from the DMOZ depression seeds. At 10,000 pages it was down to 40% relevant and had not yet stabilised.

The quality crawler outperformed the relevance crawler, and this must be due to the incorporation of the quality RF score. Noticing this, we performed an additional crawl using relevance RF in place of quality RF, and achieved comparable results to the quality crawler. This indicates that RF scores can offer a small improvement in crawl relevance, on top of our relevance decision tree, with the caveat that, in this case only, we used RF techniques both to predict which links to follow and to evaluate relevance of crawled pages.

Our overall conclusion on relevance is simply that our focused crawlers succeed in maintaining relevance as crawls progress.

Quality results
The quality scores based on all pages from judged sites are shown in Figure 2. All three crawlers achieved positive quality scores. This means they crawled more pages from higher-quality sites than lower-quality ones. Although this is surprising in the case of the breadth first crawler, it may be because higher-quality sites are simply larger. To explore this, we fully crawled ten AAQ sites and ten BAQ sites, all of which were randomly selected. We found that, on average, a BAQ site had 56.6 pages while an AAQ site had 450.2 pages.

The main finding is that the quality crawler, using the quality RF scores of known link sources to predict the quality of the target, was able to significantly outperform the relevance crawler. Towards the end of the crawls its total quality was over 50% better than that of the relevance crawl.

Figure 3 shows the same total quality scores, but this time only counting pages judged relevant by our RF classifier. The results were similar to the previous figure, particularly for the quality crawler, so we concluded that the presence of irrelevant pages was not a major factor in quality evaluation. The relevance and quality crawlers suffered a little with the elimination of some irrelevant pages from higher-quality sites, whereas the breadth-first crawler benefited from the elimination of irrelevant pages from lower-quality sites.

Now we focus on the AAQ and BAQ categories.

An interesting set of pages are those that are from AAQ sites and are RF-judged to be relevant. These are the pages we would expect to be most useful in our domain-specific engine. Figure 4 shows the number of these pages in each crawl over time. The quality crawler performed very well, with more than 50% of its pages being AAQ and relevant. The other two performed well too, with over 25% of their [...]

Figure 5 shows the number of pages from BAQ sites, regardless of relevance. The breadth first crawler was much worse on this count, with two or three times more BAQ pages than the other two. In the quality crawl, only about 5% of the pages were from BAQ sites, and this in combination with the 50% AAQ result underlines the [...]
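The three quality measures used in these results can be sketched in a few lines. This is an illustrative reading of the definitions in the measures list, not the authors' code: the function names, the `(url, site)` page representation, the rank-based 25% split and the handling of ties are all assumptions.

```python
from statistics import mean


def centre_site_scores(site_scores):
    """Subtract the mean judged-site score so below-mean sites become negative."""
    mu = mean(site_scores.values())
    return {site: score - mu for site, score in site_scores.items()}


def total_quality(pages, centred, relevant_urls=None):
    """Sum centred site scores over crawled pages from judged sites.

    pages: iterable of (url, site) pairs in the crawl.
    relevant_urls: optional set of RF-relevant URLs; if given, only those count.
    """
    total = 0.0
    for url, site in pages:
        if site not in centred:
            continue  # page from an unjudged site
        if relevant_urls is not None and url not in relevant_urls:
            continue  # restrict to RF-relevant pages
        total += centred[site]
    return total


def quality_bands(site_scores):
    """Label the top quarter AAQ, the bottom quarter BAQ and the rest AQ."""
    ranked = sorted(site_scores, key=site_scores.get, reverse=True)
    k = len(ranked) // 4  # assumed rounding for the 25% bands
    bands = {}
    for i, site in enumerate(ranked):
        if i < k:
            bands[site] = "AAQ"
        elif i >= len(ranked) - k:
            bands[site] = "BAQ"
        else:
            bands[site] = "AQ"
    return bands
```

Because every page inherits its site's centred score, a crawl's total rises with each page from an above-mean site and falls with each page from a below-mean site, which is why the measure reflects both crawl quality and crawl size.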
Note that the number of AAQ pages was higher than the number of BAQ pages even in the BF crawl. The BF crawler benefited from the seed list in its early stages (we found that the seed list has 4 BAQ but 7 AAQ URLs) and also from the relative sizes of AAQ and BAQ sites. However, in larger crawls the influence of the seed list would become less, and focus would become increasingly important.

Figure 3: Quality score for each crawl based on relevant [...]

FURTHER QUALITY ANALYSIS
We ran two additional experiments using our quality judgments. One measured the 'quality locality' of linkage between judged sites. The other considered what happens if we post-filter our crawls using our quality scoring formula (equation 3) on the text of the crawled pages, dropping low-quality pages.

Table 3: Quality locality analysis according to the link structure between source sites and target sites
Quality locality analysis
Topic locality experiments described in [10] indicated that pages typically link to pages with similar content. For a quality-focused crawler to function effectively we hope there is also 'quality locality'. More specifically, it would be helpful if higher-quality sites tend to link to each other, making it easier for the crawler to identify more of the same.
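To make the idea concrete, the sketch below shows one way linkage between quality categories could be tabulated: for each target site it collects the distinct judged sites linking to it, then averages the counts per source category over all targets in each category. The AAQ, AQ and BAQ labels are those defined for the quality measures; the function name, the input shapes and the decision to ignore self-links and unjudged sites are assumptions made for illustration.

```python
from collections import defaultdict


def locality_table(site_band, links):
    """Average number of distinct linking sites per target site, by category.

    site_band: {site: "AAQ" | "AQ" | "BAQ"} for judged sites.
    links: iterable of (source_site, target_site) pairs.
    Returns {target_band: {source_band: average count}}.
    """
    # Distinct judged source sites for each judged target site.
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst and src in site_band and dst in site_band:
            sources[dst].add(src)

    # Total linking-site counts grouped by (target band, source band).
    counts = defaultdict(lambda: defaultdict(int))
    for dst, srcs in sources.items():
        for src in srcs:
            counts[site_band[dst]][site_band[src]] += 1

    # Number of judged sites in each band, used as the averaging denominator.
    sizes = defaultdict(int)
    for band in site_band.values():
        sizes[band] += 1

    return {t: {s: counts[t][s] / sizes[t] for s in sizes} for t in sizes}
```

A table in which every row is dominated by its own category would indicate locality in the strict sense; other patterns, such as all categories linking mostly to one category, would show up just as clearly.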
We did a breadth first crawl of 100,000 pages starting from [...] pages, we identified all links between sites, including links to URLs that were not yet crawled. We then analysed linkage between our 114 judged depression sites, in particular calculating the average number of sites of each type linking to sites of other types (Table 3). For example, on average each AAQ site had links from 2.53 AAQ sites, 1.92 AQ sites [...]

Figure 4: Number of relevant and above-average-quality [...]

If quality locality were a direct analogue of topic locality, we might expect to see a cluster of AAQ sites linking to each other and another cluster of BAQ sites. What we observed in the linkage between judged sites was a tendency to link to AAQ sites, even amongst links from BAQ sites. This means that no matter which judged site is crawled, the crawler is most likely to find AAQ-site links. We also observed that higher-quality sites had more outlinks. We conclude that the observed link patterns are favourable for quality-focused crawling.

Post-filtering for quality
Figure 5: Number of all below-average-quality pages [...]

We observed pages from BAQ sites in all three crawls (Figure 5). An alternative way of using our RF quality scores is to post-filter our crawls, removing pages with quality scores below some threshold. The question is whether filtering a
crawl by RF quality score can improve its overall human-[...]

Figure 6: Quality score for each crawl at different [...]

In our first post-filtering experiment we progressively applied a stronger filter to our three main crawls (Figure 6). Because below-the-mean sites received negative scores in our scoring system, we expected an increase in total quality scores at certain thresholds where more low quality pages were filtered out. However, we were unable to improve the quality crawl or the relevance crawl by post-filtering. These crawls already had good overall quality, and our RF quality score was not sufficient to improve on that. We observed some improvement in the breadth first crawl, but it did not [...]

Since the breadth first crawl could be improved by post-filtering, our second experiment filtered successively larger breadth-first crawls, to see if the quality-focused crawl could be surpassed. The quality crawl contained 2,737 pages from judged sites, so for each breadth-first crawl we set the filtering threshold to give us 2,737 pages from judged sites. Note, this threshold also gave us a large number of pages from unjudged sites, adding some uncertainty to the quality [...]

Table 4: A comparison of quality scores between the quality crawl and each of the post-filtering BF crawls of different sizes. The number of judged pages was set to 2737, which was the number of pages from [...]

Table 4 shows the results of the experiment. To surpass the quality rating of the quality crawler we had to increase the breadth-first crawl size to 25,000 pages, compared to 3,000 pages for the quality-focused crawl. This means that if an appropriate threshold can be set and a massive increase in crawl traffic and server load is acceptable, a filtered breadth first crawler is an alternative to a quality-focused crawler. However, certainly at an Australian university that pays over AUD20 per gigabyte of traffic, some focus is desirable.

Finally, there are some experiments we did not perform. We did not consider how the quality score could be incorporated as a ranking feature at query time. We do not have the necessary per-query relevance and quality judgments to do this. Also we did not consider post-filtering using the RF relevance score. Again, we do not have the necessary human judgments to carry out this experiment. Furthermore, standard IR systems are robust to having irrelevant documents in the crawl and the harm caused by retrieving one is low, so we believe quality filtering is the more important case.

CONCLUSIONS AND FUTURE WORK
Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules, which require a lot of human effort in building and maintenance. We have designed and built a fully automatic quality focused crawler for the mental health topic of depression, which was able to selectively crawl higher quality and relevant content. Our work has resulted in four key findings.

First, domain relevance on depression could be well predicted using link anchor context. A relevance-focused crawler based on this information fetched twice as many relevant pages as a breadth-first control. A combination of link anchor context and source-page relevance feedback improved [...]

Second, link anchor context alone was not sufficient to predict quality of Web pages. Instead, the relevance feedback technique proved useful. We used this technique to learn and derive a list of terms representing high quality content from a small set of training data, which was then scored against crawled source pages to predict the quality of the targets. Compared to the relevance and BF crawls, a quality crawl using this approach obtained a much higher total quality score, significantly more relevant pages from high quality sites and fewer pages from low quality sites.

Third, analysis on quality locality suggested that above average quality depression sites tended to have more incoming links and outgoing links compared to other types of site. This observed link pattern is favourable for quality focused crawling, explaining in part why it was able to succeed.

Fourth, quality of content might be improved by post-filtering a very big breadth-first crawl if an appropriate filtering threshold is set. This leads to a trade-off decision between cost and efficiency. The post-filtering approach could be adopted in cases where a massive increase in crawl traffic and server load is acceptable. Although we could not improve our other two crawlers by filtering, it might hypothetically be possible to do so in a larger-scale experiment, and this would be a less wasteful approach than all-out breadth first crawling.

Given the interesting results that we found, there is obvious follow-up work to be done on focused crawling. In particular, it would be interesting to compare our quality crawl with other depression-specific search portals and general search engines in terms of relevance and quality by running queries against these engines and measuring the results.
[14] K. Griffiths, H. Christensen, and S. Blomberg. Website
quality indicators for consumers. In Tromso Telemedicine
Another question would be whether we could improve our
and e-Health Conf., Tromso, Norway, 2004.
[15] D. Harman. Towards interactive query expansion. In Procs.
links on page basis. Possibly, another quality focused crawler
of the 11th annual international ACM SIGIR conference
working on site basis, (by accumulating the quality scores
on Research and development in information retrieval,pages 321–331, New York, NY, USA, 1988. ACM Press.
of all the crawled pages from the same sites, and crawlingnew pages according to the predicted quality score of the
[16] M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pellegb,
site containing them) could achieve even better results.
M. Shtalhaima, and S. Ura. The shark-search algorithm. anapplication: tailored web site mapping. In WWW7, 1998.
Investigation of whether our findings generalise to other
[17] A. R. Jadad and A. Gagliardi. Rating health information
on the internet. JAMA, 279:611–614, 1998.
health domains (characterised by an evidence-based notionof quality) or more generally is left for future work.
[18] R. Kiley. Quality of medical information on the internet. J.
Royal Soc. of Med., 91:369–370, 1998. ACKNOWLEDGMENTS
[19] D. D. Margineantu and T. G. Dietterich. Improved class
We gratefully acknowledge the assistance of Alistair Rendell
probability estimates from decision tree models. In D. D.
and Helen Christensen for seeking financial support for the
Denison, M. H. Hansen, C. C. Holmes, B. Mallick, and
project and the effort of our relevance and quality judges
B. Yu, editors, Lecture Notes in Statistics. Nonlinear
Sonya Welykyj, Michelle Banfield and Alison Neil.
Estimation and Classification, volume 171, pages 169–184,New York, 2002. Springer-Verlag. REFERENCES
[20] A. McCallum, K. Nigam, J. Rennie, and K. Seymore.
Building domain-specific search engines with machine
[1] C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. On the design
learning technique. In Procs. of AAAI Spring Symposium
of a learning crawler for topical resource discovery. ACM
on Intelligents Engine in Cyberspace, 1999.
Trans. Inf. Syst., 19(3):286–309, 2001.
[21] S. L. Price and W. R. Hersh. Filtering web pages for
[2] L. Baker, T. H. Wagner, S. Singer, and M. K. Bundorf. Use
quality indicators: An empirical approach to finding high
of the internet and e-mail for health care information.
quality consumer health information on the world wide
web. In Procs. of the AMIA 1999 Annual Symposium,
[3] P. D. Bra, G. Houben, Y. Kornatzky, and R. Post.
pages 911–915, Washington DC, 1999.
Information retrieval in distributed hypertexts. In Procs. of
[22] J. R. Quinlan. C4.5: programs for machine learning.
the 4th RIAO Conference, pages 481–491, New York, 1994.
Morgan Kaufmann Publishers Inc., San Francisco, CA,
[4] S. Brin and L. Page. The anatomy of a large-scale
hypertextual web search engine. In WWW7, pages 107–117,
[23] A. Risk and J. Dzenowagis. Review of internet health
information quality initiatives. JMIR, 3(4):e28, 2001.
[5] CEBMH. A systematic guide for the management of
depression in primary care: treatment. University of
[24] S. E. Robertson. On term selection for query expansion. J.
cebmh/guidelines/depression/treatment.html, Accessed
[25] S. E. Robertson and K. S. Jones. Relevance weighting of
search terms. Journal of the American Society for
[6] S. Chakrabarti, M. Berg, and B. Dom. Focused crawling: A
Information Science, 27(3):129–146, 1976.
new approach to topic-specific web resource discovery. In
[26] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu,
and M. Gatford. Okapi at trec-3. In Procs. of the Third
[7] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan,
Text REtrieval Conference, pages 109–126, USA, 1996.
D. Gibson, and J. Kleinberg. Automatic resource
[27] W. M. Silberg, G. D. Lundberg, and R. A. Musacchio.
compilation by analyzing hyperlink structure and
Assessing, controlling, and assuring the quality of medical
associated text. In Procs. of the WWW7, pages 65–74,
information on the internet. JAMA, 277:1244–1245, 1997.
Brisbane, Australia, 1998. Elsevier Science Publishers B. V.
[28] T. T. Tang, N. Craswell, D. Hawking, K. M. Griffiths, and
[8] D. Charnock, S. Shepperd, G. Needham, and R. Gann.
H. Christensen. Quality and relevance of domain-specific
Discern: an instrument for judging the quality of written
search: A case study in mental health. To appear in the
consumer health information on treatment choices. J.
Journal of Information Retrieval - Special Issues, 2005.
Epidemiol Community Health, 53:105–111, 1999.
[29] T. T. Tang, D. Hawking, N. Craswell, and R. S.
[9] J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling
Sankaranarayana. Focused crawling in depression portal
through url ordering. In WWW7, 1998.
search: A feasibility study. In Procs. of the Ninth ADCS,
[10] B. D. Davison. Topical locality in the web. In Procs. of the
23rd annual international ACM SIGIR conference on
[30] I. H. Witten and E. Frank. Data Mining: Practical
Research and development in information retrieval, pages
machine learning tools with Java implementations. Morgan
272–279, New York, NY, USA, 2000. ACM Press.
[11] M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles, and
M. Gori. Focused crawling using context graphs. In Procs. of the 26th VLDB Conference, Cairo, Egypt, 2000.
[12] K. Griffiths and H. Christensen. Quality of web based
information on treatment of depression: cross sectionalsurvey. British Medical Journal, 321:1511–1515, 2000. bmj.bmjjournals.com/cgi/content/full/321/7275/1511.
[13] K. Griffiths and H. Christensen. The quality and
accessibility of australian depression sites on the world wideweb. The Medical Journal of Australia, 176:S97–S104, 2002.