A Methodology for Evaluation of Web-based Scholarship
David Dailey
Slippery Rock University
Department of Computer Science
Slippery Rock PA
From paper in press at ACM SIGITE 2008
Introduction
citation data (from Science Citation Index to Google)
peer review and "eminence" (quality vs Quality) [1]
turnaround time for publication
Notation
R=S(Q,W,D) refers to the ranking (R) of a given web site, W, by a Search Engine, S, in response to a query, Q, at a given date, D. [2]
N = S(Q,*,D) denotes the total number of results, taken over all web sites W, that S returns for Q at date D.
examples:
Google([html], *, July 4, 2008) = 11,210,000,000
Google([html], en.wikipedia.org/wiki/HTML, July 4, 2008) = 1/11,210,000,000
and
Google([html], www.w3.org/TR/REC-html40, July 4, 2008) = 3/11,210,000,000
or, abbreviating each site by its footnote number (W3, W4):
Google([html], W3, July 4, 2008) = 1/11,210,000,000
and
Google([html], W4, July 4, 2008) = 3/11,210,000,000
I(S(Q,W,D)) = -log2(S(Q,W,D)).
So in the above examples, for D=July 4, 2008
I(Google([html], W3))=33.384
and
I(Google([html], W4))=31.799
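The two values above can be reproduced with a minimal Python sketch. It assumes, following footnote [2], that a ranking is the pair (m, n) and that S(Q,W,D) is read as the ratio m/n. The names Ranking and information are hypothetical, chosen only for this illustration.

```python
import math
from typing import NamedTuple


class Ranking(NamedTuple):
    """Ranking R = (m, n) as described in footnote [2] (hypothetical type name)."""
    m: int  # ordinal rank of page W in the result list
    n: int  # total number of pages returned by S for Q


def information(r: Ranking) -> float:
    """Return I = -log2(m/n), in bits, for a ranking (m, n)."""
    if r.m < 1 or r.n < r.m:
        raise ValueError("expected 1 <= m <= n")
    return -math.log2(r.m / r.n)


# Figures taken from the Google([html], ...) examples for July 4, 2008.
w3 = Ranking(m=1, n=11_210_000_000)
w4 = Ranking(m=3, n=11_210_000_000)

print(f"I(W3) = {information(w3):.3f}")  # 33.384
print(f"I(W4) = {information(w4):.3f}")  # 31.799
```

Moving from rank 1 to rank 3 in the same result set costs log2(3), or about 1.585 bits, which is the gap between the two values.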
Research Relevance
Google Page Rank [5]
Google and DirectHit/Teoma since 2002
raw data on 19 web sites across top five search engines
Potential Problems
Everything is relative to the query
Google([license to derive], W27) = 1/2,010,000
Google(“license to derive”, W27) = 1/1280
Algorithms change
Google([animation svg], *, Oct. 2007) = 1,870,000
Google([animation svg], *, June 2008) = 161,000
Novelty and longevity can both affect rank
Google([JavaScript SVG animation], W29, Aug. 2005) = 3/61,300
Google([JavaScript SVG animation], W29, Jul. 2008) = 1/5,860,000
However,
Google([public domain imagery], W30, Aug. 2002) = 1/52,000
Google([public domain imagery], W30, Oct. 2008) = 3/379,000
Some search algorithms are depth-limited
Impressions of citations rather than citations
Impermanence of web addresses
Linking is not always from academic sites
cross-disciplinary issues in relevance?
factors other than linking now used by search engines
legal issues
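The novelty and longevity examples listed under Potential Problems can be re-expressed in bits using the measure I from the Notation section. The sketch below is illustrative only: it assumes S(Q,W,D) is read as the ratio m/n, and the names information, observations and information_change are hypothetical.

```python
import math


def information(m: int, n: int) -> float:
    """I = -log2(m/n), written as log2(n/m); m is the rank, n the result count."""
    return math.log2(n / m)


# (date, rank m, total results n), copied from the W29 and W30 examples.
observations = {
    "W29": [("Aug. 2005", 3, 61_300), ("Jul. 2008", 1, 5_860_000)],
    "W30": [("Aug. 2002", 1, 52_000), ("Oct. 2008", 3, 379_000)],
}


def information_change(series) -> float:
    """Difference in I, in bits, between the later and the earlier observation."""
    (_, m0, n0), (_, m1, n1) = series
    return information(m1, n1) - information(m0, n0)


for site, series in observations.items():
    print(f"{site}: change in I = {information_change(series):+.2f} bits")
```

Under this reading, I rises for both sites because the result pools grew. That holds even for W30, whose ordinal rank slipped from 1 to 3. This shows how sensitive the measure is to changes in the total result count as well as to rank.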
Conclusions
Footnotes:
[1] Zusne, L., & Dailey, D. P. (1982). History of psychology texts as measuring instruments of eminence in psychology. Revista de Historia de la Psicología, 3, 7-4
[2] This ranking, R, is actually an ordered pair of integers (m,n) where m represents the ordinal rank of the page W and n represents the total number of pages returned by S in response to Q.
[3] en.wikipedia.org/wiki/HTML
[4] www.w3.org/TR/REC-html40
[5] Google, Inc. 2008. Corporate Information: Technology Overview. July 2008. DOI= http://www.google.com/corporate/tech.html.