david-hawking.net

Focused crawling for both relevance and quality of medical information
ABSTRACT
Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules. These can be expensive to maintain and often provide incomplete coverage of Web resources. On the other hand, health information obtained through whole-of-Web search may not be scientifically based and can be potentially harmful.
To address problems of cost, coverage and quality, we built a focused crawler for the mental health topic of depression, which was able to selectively fetch higher quality relevant information. We found that the relevance of unfetched pages can be predicted based on link anchor context, but the quality cannot. We therefore estimated quality of the entire linking page, using a learned IR-style query of weighted single words and word pairs, and used this to predict the quality of its links. The overall crawler priority was determined by the product of link relevance and source quality.
We evaluated our crawler against baseline crawls using both relevance judgments and objective site quality scores obtained using an evidence-based rating scale. Both a relevance focused crawler and the quality focused crawler retrieved twice as many relevant pages as a breadth-first control. The quality focused crawler was quite effective in reducing the amount of low quality material fetched while crawling more high quality content, relative to the relevance focused crawler.
Analysis suggests that quality of content might be improved by post-filtering a very big breadth-first crawl, at the cost of substantially increased network traffic.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering, retrieval
General Terms
experimentation, performance, measurement
Keywords
quality health search, focused crawling, domain-specific search
INTRODUCTION
A survey of US Internet users found that forty percent of respondents used the Internet to find advice or information [...] have shown that medical information on the Internet can be fraudulent, of dubious quality and potentially dangerous [...].
It is desirable that a search service over health web sites should return results which are not only relevant to the query but in accord with evidence-based medical guidelines. Health experts, based on either scientific evidence or accountability criteria, have developed protocols for manual assessment of medical web site quality [12, 8]. There is very little prior work on using automated quality assessments, either in determining what to index or how to rank potential search results. One exception, due to Price and Hersh [21], reranks results from general search engines based on automated ratings of relevance, credibility, absence of bias, content currency and value of links.
ANU's Centre for Mental Health Research operates a website which publishes evidence-based information on depressive illness and also provides integrated search of over 200 depression sites. Currently, the set of indexed sites is manually maintained, using a seed list and URL-based inclusion rules that determine which parts of each site are indexed. Here we report our experiences in developing a fully automatic alternative, using a focused crawler that takes into account relevance and quality.
BACKGROUND AND RELATED WORK
Assessment of the quality of information on medical sites
The ultimate measure of the quality of a health web site is its effect on health outcomes, but it is not usually feasible for website publishers or visitors to obtain that information. Next best would be an assessment of the extent to which the content of the site is consistent with the best available scientific evidence (evidence-based medicine), but [...]. Therefore in the present study, experts rate our crawled sites on a 21-point scale derived by Griffiths and Christensen [13]. These ratings are based on a set of evidence-based depression guidelines published by the Centre for Evidence Based Mental Health (CEBMH).
There are also rating schemes for non-experts such as Silberg [27] and DISCERN [8]. They focus on accountability criteria which could be measured by people without extensive medical expertise, such as whether the author is identified and whether the site has been recently updated. However, a study of depression web sites by Griffiths and Christensen [12] found no correlation between Silberg scores and expert [...] correlated with DISCERN scores [14], but carrying out such manual assessments is a lengthy process.
In the Web search literature, link graph measures such as PageRank [4] have been promoted as indicators of quality, but how this type of quality might correlate with a medical definition has been little studied. A very recent study by Griffiths and Christensen [14] found only a moderate correlation between Google-reported PageRank and the 21-point rating scale. In this study we follow a content-based approach.
Focused crawling
First introduced by de Bra et al. [3], and subsequently studied by many others [6, 9, 16], focused crawlers are designed to selectively fetch content relevant to a specified topic of interest using the Web's hyperlink structure.
A focused crawler starts from a seed list of topical URLs. It estimates the likelihood that each subsequent candidate link will lead to further relevant content, and may prioritise crawling order on that basis and/or reject low-likelihood links. Evidence such as link anchor text, URL words and source page relevance is typically exploited in estimating [...].
McCallum et al. [20] used Naive Bayes classifiers to categorise hyperlinks while Diligenti et al. [11] used the context-graph idea to guide a focused crawler. Rather than examining relevant nodes alone, both techniques trained a learner with features collected from paths leading up to the relevant nodes.
Chakrabarti et al. [6], on the other hand, used hypertext graphs including in-neighbours (documents citing the target document) and out-neighbours (documents that the target document cites) as input to some classifiers. According to these authors, a focused crawler can acquire relevant pages steadily while a standard crawler quickly indexes a large number of irrelevant pages and loses its way, even though they started from the same seed lists.
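The general focused-crawling loop described above (seed list, per-link likelihood estimate, prioritised order, optional rejection of low-likelihood links) might be sketched as follows. This is only an illustrative sketch, not the crawler used in this paper: the names `Frontier`, `focused_crawl`, `fetch_links`, `score_link` and `min_score` are hypothetical, and giving seeds a priority of 1.0 is an assumption.

```python
import heapq
from itertools import count


class Frontier:
    """Priority queue of uncrawled URLs, highest estimated score first."""

    def __init__(self, min_score=0.0):
        self._heap = []
        self._seen = set()
        self._tie = count()          # keeps insertion order among equal scores
        self.min_score = min_score   # links scoring below this are rejected

    def add(self, url, score):
        if url in self._seen or score < self.min_score:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, next(self._tie), url))
        return True

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def focused_crawl(seeds, fetch_links, score_link, limit, min_score=0.0):
    """Crawl up to `limit` pages, always taking the best-scored URL next.

    fetch_links(url) -> iterable of (target_url, link_context)
    score_link(source_url, target_url, link_context) -> float
    """
    frontier = Frontier(min_score)
    for url in seeds:
        frontier.add(url, 1.0)       # assumption: seeds get top priority
    crawled = []
    while len(frontier) and len(crawled) < limit:
        url, _ = frontier.pop()
        crawled.append(url)
        for target, context in fetch_links(url):
            frontier.add(target, score_link(url, target, context))
    return crawled
```

Replacing the heap with a plain FIFO queue would turn the same loop into a breadth-first crawl, which is the usual baseline for comparison.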
Relevance feedback
Relevance feedback (RF) is a well-known information retrieval approach of 'query by example'. Given example sets of relevant documents, the goal is to find more of the same. In this paper, we use this both to identify depression-relevant pages and high-quality depression-relevant pages. Our specific application of RF is described in more detail in Section 3.1.
We applied Robertson's approach to term selection [24]. In this approach, there are three ways to calculate the selection value for a term: using the probability of the term occurring in a relevant document (r/R), rewarding terms that occur [...] average of these. We used the third approach, computing the [...] where R is the number of known relevant documents, r is the number of documents in R that contain term t and tf is the frequency of occurrence of the term within a document.
The weight w was calculated using the Robertson-Sparck Jones formula:
$w = \log \frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)}$
where N is the number of documents in the collection, n is the collection frequency, which is the number of documents containing a specific term, and R and r are defined as above.
CRAWLERS
In this section we introduce three crawlers. First we describe our use of relevance feedback to estimate quality of pages, then our classifier to compute relevance scores for links. Finally we describe the crawlers: breadth-first, relevance-focused and quality-focused.
Relevance feedback for page relevance and quality
A quality-focused crawler needs some way of predicting the quality of uncrawled URLs, to set its priority. We tried various methods to predict this, using as training data quality-judged depression pages from a previous study [28]. We found it impossible to predict the quality of a link target based on its anchor context alone, so we abandoned attempts to score each link separately. Instead we scored the quality of the whole page and applied this equally to the page's outlinks.
We used relevance feedback to predict page quality. RF was a natural choice here, because a focused crawling framework needs to prioritise the crawling order, and RF gives us scores that can be used in ranking. We also made separate use of relevance feedback in scoring topic relevance for evaluation purposes only. Both quality RF and relevance RF are described in this section. Both use the term selection methods described in Section 2.2 to identify extra query words and phrases. Phrases usually include two adjacent words, but sometimes three words if the middle word is a preposition, for example 'treatment of depression'.
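The phrase rule just described (two adjacent words, or three words when the middle one is a preposition) could be sketched roughly as below. The preposition list, the tokenisation and the function name `candidate_phrases` are assumptions made for illustration; the paper does not specify them, and stopword handling is ignored.

```python
import re

# Assumption: a small illustrative preposition list, not taken from the paper.
PREPOSITIONS = {"of", "in", "for", "with", "on", "to", "by", "from"}


def candidate_phrases(text):
    """Return candidate query phrases from `text`, in order of occurrence.

    Two adjacent non-preposition words form a phrase; a preposition flanked
    by two non-preposition words forms a three-word phrase.
    """
    words = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for i in range(len(words) - 1):
        first, second = words[i], words[i + 1]
        if first in PREPOSITIONS:
            continue
        if second not in PREPOSITIONS:
            phrases.append(f"{first} {second}")
        elif i + 2 < len(words) and words[i + 2] not in PREPOSITIONS:
            phrases.append(f"{first} {second} {words[i + 2]}")
    return phrases
```

Under these assumptions a three-word treatment name is split into two overlapping word pairs, while a phrase with a preposition in the middle is kept as a single three-word unit.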
Decision tree for link relevance
In our previous work we developed a classifier for predicting the relevance of a link target, based on features in the link's source page [29]. We evaluated a number of learning algorithms provided by the Weka package [30], such as k-nearest neighbor, Naive Bayes, and C4.5. Since then we also evaluated Perceptron. The C4.5 decision tree [22] was the best amongst those evaluated.
The classifier is based on words in the anchor text, words in the target URL and words in the 50 characters before and after the link (link context). If we found multiple links to the same URL, we included all available anchor contexts. This is a relatively standard approach [1, 9, 7].
To produce a confidence score at each leaf node of the decision tree we used a Laplace correction formula [19]:
$\mathrm{confidence\_level} = \frac{N_k + \lambda_k}{N + \sum_{j=1}^{K} \lambda_j}$ (4)
where N is the total number of training examples that reach the leaf; N_k is the number of training examples from class k reaching the leaf; K is the number of classes and λ_k is the prior for class k and is usually set to be 1. In our case, K is 2 because we only had two classes, relevance and irrelevance.
Document scoring based on relevance feedback
We used the Okapi BM25 weighting function [26] to score documents against the two weighted queries: [...] where tf_d is the number of times term t occurs in document d, N is the number of documents in the collection, n is the number of documents containing t, dl is the length of the document and avdl is the average document length.
Scores calculated with BM25 are collection dependent. Rather than assuming a collection of the documents crawled thus far, we chose to assume a more general web context and used values for the collection parameters (N = 2,376,673, avdl = 15,036 and n) which were derived from a large general crawl of Australian educational websites.
The final score was computed using the following equation: [...] where Q_t is obtained from equation 1 and w_t from equation 2. These scores represented either quality or relevance [...].
Using relevance judgments from a previous experiment [28], we selected 347 relevant and 9000 irrelevant documents. We applied the Robertson selection value formula to obtain weights for all the terms in relevant documents. Past research has suggested that the number of terms that could be usefully added to expand a query might range from 20 to 40 [15]. We arbitrarily selected the 20 top weighted single words and the 20 top weighted phrases. See examples in Table 1.
Table 1: Examples of terms in the relevance query.
Table 2: Examples of terms in the quality query.
From the same previous experiment we identified 107 documents relevant to depression and of high quality, and another set of 3002 documents which were either irrelevant or [...]. We used the same technique as for the relevance query to produce two candidate term lists: one containing single words and the other containing phrases. However, we used a more sophisticated procedure to choose a term selection cutoff. We first derived a list of words and phrases representing effective depression treatments from [13], dividing multi-word treatments into the type of phrases described above. For example, 'cognitive behaviour therapy' became 'cognitive behaviour' and 'behaviour therapy'. We then located these words and phrases in the candidate lists and cut off the lists just after the lowest-ranked occurrence of an effective treatment term.
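The cutoff procedure for the quality query can be illustrated with a short sketch. It assumes the candidate list is already sorted by descending selection value; the function name and the toy terms in the usage note are hypothetical.

```python
def cutoff_after_last_treatment_term(ranked_terms, treatment_terms):
    """Keep candidates up to and including the lowest-ranked treatment term.

    ranked_terms: candidate words or phrases, best first.
    treatment_terms: set of words/phrases naming effective treatments.
    Returns an empty list if no treatment term appears among the candidates.
    """
    last = -1
    for position, term in enumerate(ranked_terms):
        if term in treatment_terms:
            last = position
    return ranked_terms[: last + 1]
```

For a toy ranked list `["depression", "cognitive behaviour", "mood", "behaviour therapy", "weather"]` and treatment terms `{"cognitive behaviour", "behaviour therapy"}`, the sketch keeps the first four candidates and drops the rest.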
Surprisingly this gave us the same cutoff (20) for phrases and a similar cutoff for single words (29). Some example terms are shown in Table 2.
Note that the two queries include many terms in common, because both are on the topic of depression. High-quality depression-relevant documents are a subset of depression-relevant documents. The quality query contains more words relating to effective treatment methods such as 'cognitive therapy' or antidepressant medications like 'zoloft' and 'paxil'.
Combining quality and relevance scores
We used the quality score of a page (computed using relevance feedback) to predict the quality of its outlinks. If more than one known page linked to the same URL, we took the mean quality score of the linking pages. Relevance scores computed from the decision tree were already aggregated [...].
To order the crawl queue for the quality crawler, we combined the quality and relevance scores. The overall score for a URL was computed as:
$\mathrm{score} = \mathrm{confidence\_level}_{rel} \times \frac{1}{m} \sum_{i=1}^{m} DScore_i$ (5)
where confidence_level_rel is the URL's relevance score (equation 4), DScore_i using the quality query is a linking page's quality score (equation 3), and m is the number of pages linking to the URL.
The decision to multiply rather than add the scores was taken arbitrarily, as combining relevance and quality is a relatively new concept in IR. A side effect of taking the product is that if one of the two scores is zero, the overall score is also zero.
Our three crawlers
We evaluated three crawlers: the breadth-first (BF) crawler, the relevance crawler, and the quality crawler. When a crawler encounters a new URL, that URL is added to a crawl queue, and the crawler proceeds by taking URLs from that queue. The crawlers differ in how their crawl queues are ordered.
The BF crawler serves as a baseline for comparison. It traverses the link graph in a breadth-first fashion, placing each newly discovered URL in a FIFO queue. The BF crawler is likely to find some depression pages since we start it from depression-relevant seed pages, but we would expect the relevance of its crawl to fall as the crawl progresses.
The relevance crawler is designed to prefer domain-relevant pages, ordering its crawl queue using the relevance decision tree discussed in Section 3.2. The relevance RF score is not used; we reserve it for use in evaluation. By crawling the highest-scoring URLs first, we would expect the relevance crawler to maintain its overall relevance more successfully [...].
The quality crawler is designed to prefer higher-quality domain-relevant pages. Each URL is given a score computed using equation 5. A major focus of this paper is to evaluate whether the quality crawler can successfully prioritise its queue to maintain the overall quality of its crawl and avoid pages with low quality, potentially harmful advice.
EXPERIMENTS AND MEASURES
Relevance experiment
We used our RF relevance score (applying the relevance query in equation 3) and a score threshold to evaluate the overall relevance of our three crawls. [...] found using 1000 relevant and 1000 irrelevant pages from our previous study (these were separate from those used to generate the relevance query). A threshold at 25% of the theoretical maximum BM25 score (of 502.88, corresponding to a hypothetical zero-length document containing infinite numbers of each of the query terms) minimised the total number of false positives and false negatives, so in our crawls we labeled pages with RF relevance score greater [...].
Using RF scores rather than real relevance judgments allows us to get some idea of relevance without extensive relevance judging. However, to validate the accuracy of our RF-based 'judgments', we employed two relevance assessors to judge the relevance of 300 RF-relevant and 120 RF-irrelevant pages. These pages were randomly selected from all the RF results of all the crawled pages. As for the judging criterion, any page about the mental illness 'depression' was considered [...]. The level of agreement between the two assessors was high (91.2%), indicating that judging for such a simple topic is easy. The RF-judgments had an accuracy of 89.3%, a 90.9% success rate in predicting the relevance category, and an 84.6% success rate in predicting the irrelevance category. We concluded that these levels were high enough to present some RF-judgment-based results.
Note that this RF classifier was only used in evaluating the relevance of sets of pages returned by the various crawlers. None of these three crawlers used this classifier in deciding [...].
We evaluated relevance of the three crawlers, each starting from a seed set of 160 URLs taken from the DMOZ depression directory (http://www.dmoz.org/Health/Mental_Health/). We evaluated the first 10,000 pages from each crawler according to RF-relevance.
Figure 1: Comparison of the BF, relevance and quality crawlers for relevance using the RF classifier.
Quality experiments
Most of the models for assessing the quality of depression content on the Web refer to entire sites, not individual pages [8, 17]. We therefore grouped all the pages in each crawl into sites. Pages originating from the same host name were considered to be from the same site.
The quality of the sites was evaluated by a research assistant from the Centre for Mental Health Research using a rating scale derived by Griffiths and Christensen [13] from the CEBMH evidence-based clinical guidelines. Each site was assigned a quality score in the range 0 to 20.
Since judging took 4 hours per site on average, we could not use the full 160 page seed list. If we did, a large amount of effort would be needed just to judge seeds, and these are uninformative with respect to crawl strategy. Therefore we randomly selected 18 URLs from the 160 to use as our quality experiment seeds. We cut off each of our three crawls at 3,000 pages. For this small crawl size, we were able to judge the quality of any site with 6 or more crawled pages.
Figure 2: Quality score for each crawl based on all [...]
We propose three measures to compare crawl quality. Note that, in our measures, the quality score of a page is assigned the quality score of the site containing it.
• Quality score using all crawled pages: We first computed the mean value of the quality scores of all the judged sites. We then transformed the site scores by subtracting the mean, giving negative scores to sites [...] was given by the sum of quality scores of all its judged pages (all pages from quality-judged sites). This means that the quality score captures both the quality of the pages and the size of the crawl.
• Quality score using RF-relevant pages: Not all sites with quality judgments are dedicated to depression, and many contain a large number of irrelevant pages. We used our RF-relevance classifier to identify the relevant pages in each crawl, then calculated the total quality score as above using just those pages.
• AAQ and BAQ comparison: We grouped judged sites into three categories: above average quality (denoted as AAQ, the top 25% of the judged sites), average quality (denoted as AQ, the middle 50%) and below average quality (denoted as BAQ, the bottom 25%). In some tests we focused on the number of crawled pages from the 'extreme' AAQ and BAQ categories.
RESULTS AND DISCUSSION
Relevance results
Figure 1 depicts the relevance levels throughout each of our three crawls, based on RF relevance judgments. The relevance and quality crawls each stabilised after 3,000 pages, at about 80% and 88% relevant respectively. The breadth first crawler continued to degrade over time as it got further from the DMOZ depression seeds. At 10,000 pages it was down to 40% relevant and had not yet stabilised.
The quality crawler outperformed the relevance crawler, and this must be due to the incorporation of the quality RF score. Noticing this, we performed an additional crawl using relevance RF in place of quality RF, and achieved comparable results to the quality crawler. This indicates that RF scores can offer a small improvement in crawl relevance, on top of our relevance decision tree, with the caveat that, in this case only, we used RF techniques both to predict which links to follow and to evaluate relevance of crawled pages.
Our overall conclusion on relevance is simply that our focused crawlers succeed in maintaining relevance as crawls [...].
Quality results
The quality scores based on all pages from judged sites are shown in Figure 2. All three crawlers achieved positive quality scores. This means they crawled more pages from higher-quality sites than lower-quality ones. Although this is surprising in the case of the breadth first crawler, it may be because higher-quality sites are simply larger. To explore this, we fully crawled ten AAQ sites and ten BAQ sites, all of which were randomly selected. We found that, on average, a BAQ site had 56.6 pages while an AAQ site had 450.2 [...].
The main finding is that the quality crawler, using the quality RF scores of known link sources to predict the quality of the target, was able to significantly outperform the relevance crawler. Towards the end of the crawls its total quality was over 50% better than that of the relevance crawl.
Figure 3 shows the same total quality scores, but this time only counting pages judged relevant by our RF classifier. The results were similar to the previous figure, particularly for the quality crawler, so we concluded that the presence of irrelevant pages was not a major factor in quality evaluation. The relevance and quality crawlers suffered a little with the elimination of some irrelevant pages from higher-quality sites, whereas the breadth-first crawler benefited from the elimination of irrelevant pages from lower-quality sites.
Figure 3: Quality score for each crawl based on relevant [...]
Now we focus on the AAQ and BAQ categories.
An interesting set of pages are those that are from AAQ sites and are RF-judged to be relevant. These are the pages we would expect to be most useful in our domain-specific engine. Figure 4 shows the number of these pages in each crawl over time. The quality crawler performed very well, with more than 50% of its pages being AAQ and relevant. The other two performed well too, with over 25% of their [...].
Figure 5 shows the number of pages from BAQ sites, regardless of relevance. The breadth first crawler was much worse on this count than the other two, with two or three times more BAQ pages. In the quality crawl, only about 5% of the pages were from BAQ sites, and this in combination with the 50% AAQ result underlines the [...].
Table 3: Quality locality analysis according to the link structure between source sites and target sites
Note that the number of AAQ pages was higher than the number of BAQ pages even in the BF crawl. The BF crawler benefited from the seed list in its early stages (we found that the seed list has 4 BAQ but 7 AAQ URLs) and also from the relative sizes of AAQ and BAQ sites. However, in larger crawls the influence of the seed list would become less, and focus would become increasingly important.
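The total quality measure used in these results (site scores centred on the mean of all judged sites, then summed over every crawled page from a judged site) can be sketched as below. The function name and the example host names are hypothetical, and treating the URL host name as the site identifier follows the grouping rule stated in the quality experiments.

```python
from statistics import mean
from urllib.parse import urlparse


def total_quality_score(crawled_urls, site_scores):
    """Sum mean-centred site quality scores over crawled pages.

    crawled_urls: iterable of page URLs in crawl order.
    site_scores: dict mapping host name -> judged quality score (0 to 20).
    Pages from unjudged sites contribute nothing.
    """
    centre = mean(site_scores.values())
    total = 0.0
    for url in crawled_urls:
        host = urlparse(url).hostname
        if host in site_scores:
            total += site_scores[host] - centre
    return total
```

Because every page adds its site's centred score, a crawl gains from each page of an above-mean site and loses from each page of a below-mean site, so the measure reflects both quality and crawl size. Restricting `crawled_urls` to RF-relevant pages gives the second measure.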
FURTHER QUALITY ANALYSIS
We ran two additional experiments using our quality judgments. One measured the 'quality locality' of linkage between judged sites. The other considered what happens if we post-filter our crawls using our quality scoring formula (equation 3) on the text of the crawled pages, dropping low-[...].
Quality locality analysis
Topic locality experiments described in [10] indicated that pages typically link to pages with similar content. For a quality-focused crawler to function effectively we hope there is also 'quality locality'. More specifically it would be helpful if higher-quality sites tend to link to each other, making it easier for the crawler to identify more of the same.
We did a breadth first crawl of 100,000 pages starting from [...] pages, we identified all links between sites, including links to URLs that were not yet crawled. We then analysed linkage between our 114 judged depression sites, in particular calculating the average number of sites of each type linking to sites of other types (Table 3). For example, on average each AAQ site had links from 2.53 AAQ sites, 1.92 AQ sites [...].
Figure 4: Number of relevant and above-average-[...]
If quality locality were a direct analogue of topic locality, we might expect to see a cluster of AAQ sites linking to each other and another cluster of BAQ sites. What we observed in the linkage between judged sites was a tendency to link to AAQ sites, even amongst links from BAQ sites. This means that no matter which judged site is crawled, the crawler is most likely to find AAQ-site links. We also observed that higher-quality sites had more outlinks. We conclude that the observed link patterns are favourable for quality-focused crawling.
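One way to compute the kind of averages reported in the quality locality analysis is sketched below: for each pair of categories, the average number of distinct source sites of one category linking to a target site of the other. The function name is hypothetical, and ignoring self-links and links involving unjudged sites is an assumption rather than something the paper states.

```python
from collections import defaultdict


def locality_table(category, links):
    """Average number of distinct linking sites, by source and target category.

    category: dict mapping site -> "AAQ", "AQ" or "BAQ" (judged sites only).
    links: iterable of (source_site, target_site) pairs.
    Returns {(source_category, target_category): average per target site}.
    """
    sources = defaultdict(set)
    for src, tgt in links:
        # Assumption: self-links and links touching unjudged sites are ignored.
        if src != tgt and src in category and tgt in category:
            sources[tgt].add(src)

    sites_per_category = defaultdict(int)
    link_counts = defaultdict(int)
    for site, cat in category.items():
        sites_per_category[cat] += 1
        for src in sources.get(site, ()):
            link_counts[(category[src], cat)] += 1

    return {
        pair: n / sites_per_category[pair[1]]
        for pair, n in link_counts.items()
    }
```

A value such as the 2.53 quoted above would correspond to the entry for the pair ("AAQ", "AAQ") in this table.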
Post-filtering for quality
Figure 5: Number of all below-average-quality pages [...]
We observed pages from BAQ sites in all three crawls (Figure 5). An alternate way of using our RF quality scores is to post-filter our crawls, removing pages with quality scores below some threshold. The question is whether filtering a crawl by RF quality score can improve its overall human-[...].
In our first post-filtering experiment we progressively applied a stronger filter to our three main crawls (Figure 6). Because below-the-mean sites received negative scores in our scoring system, we expected an increase in total quality scores at certain thresholds where more low quality pages were filtered out. However, we were unable to improve the quality crawl or the relevance crawl by post-filtering. These crawls already had good overall quality, and our RF quality score was not sufficient to improve on that. We observed some improvement in the breadth first crawl, but it did not [...].
Figure 6: Quality score for each crawl at different [...]
Since the breadth first crawl could be improved by post-filtering, our second experiment filtered successively larger breadth-first crawls, to see if the quality-focused crawl could be surpassed. The quality crawl contained 2,737 pages from judged sites, so for each breadth-first crawl we set the filtering threshold to give us 2,737 pages from judged sites. Note, this threshold also gave us a large number of pages from unjudged sites, adding some uncertainty to the quality [...].
Table 4: A comparison of quality scores between the quality crawl and each of the post-filtering BF crawls of different sizes. The number of judged pages was set to 2,737, which was the number of pages from [...]
Table 4 shows the results of the experiment. To surpass the quality rating of the quality crawler we had to increase the breadth-first crawl size to 25,000 pages, compared to 3,000 pages for the quality-focused crawl. This means that if an appropriate threshold can be set and a massive increase in crawl traffic and server load is acceptable, a filtered breadth first crawler is an alternative to a quality-focused crawler. However, certainly at an Australian university that pays over AUD20 per gigabyte of traffic, some focus is desirable.
Finally, there are some experiments we did not perform. We did not consider how the quality score could be incorporated as a ranking feature at query time. We do not have the necessary per-query relevance and quality judgments to do this. Also we did not consider post-filtering using the RF relevance score. Again, we do not have the necessary human judgments to carry out this experiment. Furthermore, standard IR systems are robust to having irrelevant documents in the crawl and the harm caused by retrieving one is low, so we believe quality filtering is the more important case.
CONCLUSIONS AND FUTURE WORK
Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules, which require a lot of human effort in building and maintenance. We have designed and built a fully automatic quality focused crawler for the mental health topic of depression, which was able to selectively crawl higher quality and relevant content. Our work has resulted in four key findings.
First, domain relevance on depression could be well predicted using link anchor context. A relevance-focused crawler based on this information fetched twice as many relevant pages as a breadth-first control. A combination of link anchor context and source-page relevance feedback improved [...].
Second, link anchor context alone was not sufficient to predict quality of Web pages. Instead, the relevance feedback technique proved useful. We used this technique to learn and derive a list of terms representing high quality content from a small set of training data, which was then scored against crawled source pages to predict the quality of the targets. Compared to the relevance and BF crawls, a quality crawl using this approach obtained a much higher total quality score, significantly more relevant pages from high quality sites and fewer pages from low quality sites.
Third, analysis on quality locality suggested that above average quality depression sites tended to have more incoming links and outgoing links compared to other types of site. This observed link pattern is favourable for quality focused crawling, explaining in part why it was able to succeed.
Fourth, quality of content might be improved by post-filtering a very big breadth-first crawl if an appropriate filtering threshold is set. This leads to a trade-off decision between cost and efficiency. The post-filtering approach could be adopted in cases where a massive increase in crawl traffic and server load is acceptable. Although we could not improve our other two crawlers by filtering, it might hypothetically be possible to do so in a larger-scale experiment, and this would be a less wasteful approach than all-out breadth first crawling.
Given the interesting results that we found, there is obvious follow-up work to be done on focused crawling. In particular, it would be interesting to compare our quality crawl with other depression-specific search portals and general search engines in terms of relevance and quality by running queries against these engines and measuring the results.
Another question would be whether we could improve our [...] links on page basis. Possibly, another quality focused crawler working on a site basis (by accumulating the quality scores of all the crawled pages from the same sites, and crawling new pages according to the predicted quality score of the site containing them) could achieve even better results.
Investigation of whether our findings generalise to other health domains (characterised by an evidence-based notion of quality) or more generally is left for future work.
ACKNOWLEDGMENTS
We gratefully acknowledge the assistance of Alistair Rendell and Helen Christensen for seeking financial support for the project and the effort of our relevance and quality judges Sonya Welykyj, Michelle Banfield and Alison Neil.
REFERENCES
[1] C. C. Aggarwal, F. Al-Garawi, and P. S. Yu. On the design of a learning crawler for topical resource discovery. ACM Trans. Inf. Syst., 19(3):286–309, 2001.
[2] L. Baker, T. H. Wagner, S. Singer, and M. K. Bundorf. Use of the internet and e-mail for health care information. [...]
[3] P. D. Bra, G. Houben, Y. Kornatzky, and R. Post. Information retrieval in distributed hypertexts. In Procs. of the 4th RIAO Conference, pages 481–491, New York, 1994.
[4] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW7, pages 107–117, [...]
[5] CEBMH. A systematic guide for the management of depression in primary care: treatment. University of [...] cebmh/guidelines/depression/treatment.html, Accessed [...]
[6] S. Chakrabarti, M. Berg, and B. Dom. Focused crawling: A new approach to topic-specific web resource discovery. In [...]
[7] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, [...]
[14] K. Griffiths, H. Christensen, and S. Blomberg. Website quality indicators for consumers. In Tromso Telemedicine and e-Health Conf., Tromso, Norway, 2004.
[15] D. Harman. Towards interactive query expansion. In Procs. of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages 321–331, New York, NY, USA, 1988. ACM Press.
[16] M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim, and S. Ur. The shark-search algorithm. An application: tailored web site mapping. In WWW7, 1998.
[17] A. R. Jadad and A. Gagliardi. Rating health information on the internet. JAMA, 279:611–614, 1998.
[18] R. Kiley. Quality of medical information on the internet. J. Royal Soc. of Med., 91:369–370, 1998.
[19] D. D. Margineantu and T. G. Dietterich. Improved class probability estimates from decision tree models. In D. D. Denison, M. H. Hansen, C. C. Holmes, B. Mallick, and B. Yu, editors, Lecture Notes in Statistics. Nonlinear Estimation and Classification, volume 171, pages 169–184, New York, 2002. Springer-Verlag.
[20] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Building domain-specific search engines with machine learning techniques. In Procs. of AAAI Spring Symposium on Intelligent Agents in Cyberspace, 1999.
[21] S. L. Price and W. R. Hersh. Filtering web pages for quality indicators: An empirical approach to finding high quality consumer health information on the world wide web. In Procs. of the AMIA 1999 Annual Symposium, pages 911–915, Washington DC, 1999.
[22] J. R. Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, [...]
[23] A. Risk and J. Dzenowagis. Review of internet health information quality initiatives. JMIR, 3(4):e28, 2001.
[24] S. E. Robertson. On term selection for query expansion. J. [...]
[25] S. E. Robertson and K. S. Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129–146, 1976.
[26] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Procs. of the Third Text REtrieval Conference, pages 109–126, USA, 1996.
D. Gibson, and J. Kleinberg. Automatic resource [27] W. M. Silberg, G. D. Lundberg, and R. A. Musacchio.
compilation by analyzing hyperlink structure and Assessing, controlling, and assuring the quality of medical associated text. In Procs. of the WWW7, pages 65–74, information on the internet. JAMA, 277:1244–1245, 1997.
Brisbane, Australia, 1998. Elsevier Science Publishers B. V.
[28] T. T. Tang, N. Craswell, D. Hawking, K. M. Griffiths, and [8] D. Charnock, S. Shepperd, G. Needham, and R. Gann.
H. Christensen. Quality and relevance of domain-specific Discern: an instrument for judging the quality of written search: A case study in mental health. To appear in the consumer health information on treatment choices. J.
Journal of Information Retrieval - Special Issues, 2005.
Epidemiol Community Health, 53:105–111, 1999.
[29] T. T. Tang, D. Hawking, N. Craswell, and R. S.
[9] J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling Sankaranarayana. Focused crawling in depression portal through url ordering. In WWW7, 1998.
search: A feasibility study. In Procs. of the Ninth ADCS, [10] B. D. Davison. Topical locality in the web. In Procs. of the 23rd annual international ACM SIGIR conference on [30] I. H. Witten and E. Frank. Data Mining: Practical Research and development in information retrieval, pages machine learning tools with Java implementations. Morgan 272–279, New York, NY, USA, 2000. ACM Press.
[11] M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In Procs.
of the 26th VLDB Conference, Cairo, Egypt, 2000.
[12] K. Griffiths and H. Christensen. Quality of web based information on treatment of depression: cross sectionalsurvey. British Medical Journal, 321:1511–1515, 2000.
bmj.bmjjournals.com/cgi/content/full/321/7275/1511.
[13] K. Griffiths and H. Christensen. The quality and accessibility of australian depression sites on the world wideweb. The Medical Journal of Australia, 176:S97–S104, 2002.

Source: http://www.david-hawking.net/pubs/tang_cikm05.pdf
