A Methodology for Evaluation of Web-based Scholarship

 

David Dailey

Slippery Rock University
Department of Computer Science
Slippery Rock PA

From paper in press at ACM SIGITE 2008

Introduction

  1. citation data (from Science Citations Index to Google)

  2. peer review and "eminence" (quality vs Quality) [1]

  3. turnaround time for publication

Notation

  1. R=S(Q,W,D)  refers to the ranking (R) of a given web site, W, by a Search Engine, S,  in response to a query, Q, at a given date, D. [2]

  2. N = S(Q,*,D) denotes the total number of pages, N, returned by S in response to Q on date D, counted over all web sites, W.

  3. examples:

    Google([html], *, July 4, 2008) = 11,210,000,000

    Google([html], en.wikipedia.org/wiki/HTML, July 4, 2008) = 1/11,210,000,000
    and

    Google([html], www.w3.org/TR/REC-html40, July 4, 2008) = 3/11,210,000,000

    or (simplification):

    Google([html], W3, July 4, 2008) = 1/11,210,000,000

    and

    Google([html], W4, July 4, 2008) = 3/11,210,000,000

     

  4. I(S(Q,W,D)) = -log2(S(Q,W,D)).

    So in the above examples, for D=July 4, 2008

    I(Google([html], W3))=33.384               

    and

    I(Google([html], W4))=31.799
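
The two values above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the ranking is held as the ordered pair (m, n) described in footnote [2] and read as the ratio m/n; the function name `information` and the constant `N_HTML` are illustrative and do not come from the paper.

```python
import math


def information(m: int, n: int) -> float:
    """Return I = -log2(m / n) for ordinal rank m among n returned pages."""
    if not (1 <= m <= n):
        raise ValueError("expected 1 <= m <= n")
    return -math.log2(m / n)


# Total pages reported for Google([html], *, July 4, 2008)
N_HTML = 11_210_000_000

print(f"I(Google([html], W3)) = {information(1, N_HTML):.3f}")  # rank 1
print(f"I(Google([html], W4)) = {information(3, N_HTML):.3f}")  # rank 3
```

A page ranked first among n results scores log2(n), and each doubling of the ordinal rank m lowers the score by exactly one.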

Research Relevance

  1. Google Page Rank [5]

  2. Google and DirectHit/Teoma since 2002

  3. others

  4. raw data on 19 web sites across top five search engines

  5. intercorrelations between search engines
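
One way to compute intercorrelations between search engines is a rank correlation over the same set of sites. The outline does not say which coefficient was used, so the sketch below assumes Spearman's rho with average ranks for ties; the function names and the toy inputs in the usage lines are placeholders, not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy scores for four sites under two hypothetical engines (not real data).
engine_a = [1, 2, 3, 4]
engine_b = [10, 20, 30, 40]
print(spearman(engine_a, engine_b))
```

Each list would hold one engine's scores for the same sites in the same order, for example the 19 sites mentioned above.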

Potential Problems

  1. Everything is relative to the query

    Google([license to derive], W27) = 1/2,010,000

    Google(“license to derive”, W27) = 1/1280

  2. Algorithms change

    Google([animation svg], *, Oct. 2007) = 1,870,000

    Google([animation svg], *, June 2008) = 161,000

  3. Novelty and longevity can both affect rank

    Google( [JavaScript SVG animation], W29, Aug. 2005) = 3/61,300

    Google( [JavaScript SVG animation], W29, Jul. 2008) = 1/5,860,000
    However,
    Google([public domain imagery], W30, Aug. 2002)=1/52,000
    Google([public domain imagery], W30, Oct. 2008)=3/379,000

  4. Some search algorithms are depth-limited

  5. Impressions of citations rather than citations

  6. Impermanence of web addresses

  7. Linking is not always from academic sites

  8. cross-disciplinary issues in relevance?

  9. factors other than linking now used by search engines

  10. legal issues
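
To see how strongly the query and the date move the measure, the examples under items 1 and 3 can be passed through I = -log2(m/n). This is a rough sketch, assuming each example is read as the pair (m, n); the dictionary layout and labels are illustrative only.

```python
import math


def information(m: int, n: int) -> float:
    """Return I = -log2(m / n) for ordinal rank m among n returned pages."""
    return -math.log2(m / n)


# (m, n) pairs copied from the examples in the list above.
examples = {
    "[license to derive], W27": (1, 2_010_000),
    '"license to derive", W27': (1, 1_280),
    "[JavaScript SVG animation], W29, Aug. 2005": (3, 61_300),
    "[JavaScript SVG animation], W29, Jul. 2008": (1, 5_860_000),
    "[public domain imagery], W30, Aug. 2002": (1, 52_000),
    "[public domain imagery], W30, Oct. 2008": (3, 379_000),
}

for label, (m, n) in examples.items():
    print(f"{label}: {information(m, n):.2f}")
```

For W27, the bracketed query gives roughly 20.9 and the quoted phrase roughly 10.3, although the page ranks first in both cases. The difference comes entirely from the size of the result set.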

Conclusions


Footnotes:

[1] Zusne, L., & Dailey, D. P. (1982). History of psychology texts as measuring instruments of eminence in psychology. Revista de Historia de la Psicología, 3, 7-4

[2] This ranking, R, is actually an ordered pair of integers (m,n) where m represents the ordinal rank of the page W and n represents the total number of pages returned by S in response to Q.

[3]  en.wikipedia.org/wiki/HTML

[4] www.w3.org/TR/REC-html40

[5]    Google, Inc. 2008. Corporate Information: Technology Overview. July 2008. DOI= http://www.google.com/corporate/tech.html.