Sunday, November 08, 2009

Modeling the World Wide Web

Just so I don't forget, some papers by Filippo Menczer, who appears to be doing work related to an idea I've been mulling over for a while.

Informally, consider the World Wide Web as a graph with web pages as nodes and hyperlinks as edges. Label each node with some value derived from the contents of its web page (e.g. the length of the page, the set of terms in the page, or the term vector for the page), then define the value of an edge as the difference between the labels of the two nodes it connects. This yields a geometric representation of the web graph by mapping it into some metric space, provided the chosen notion of difference behaves like a distance. (If the idea is still fuzzy, imagine a hyperlink as a function from the document that contains it to the document to which it points.)
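To make the model concrete for myself, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three toy pages, the two links, the names `term_vector` and `distance`, and the choice of Euclidean distance between raw term counts as the "difference" (one option among many, not something taken from Menczer's papers).

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: page name -> page text.
pages = {
    "a": "graph theory and web graphs",
    "b": "web graphs and hyperlinks",
    "c": "cooking pasta at home",
}

# Hypothetical hyperlinks, as directed edges (source page, target page).
links = [("a", "b"), ("a", "c")]


def term_vector(text):
    """Label for a node: raw term counts of the page text."""
    return Counter(text.lower().split())


def distance(u, v):
    """Euclidean distance between two term-count vectors.

    A Counter returns 0 for terms it has not seen, so terms that
    appear in only one page still contribute to the sum.
    """
    terms = set(u) | set(v)
    return sqrt(sum((u[t] - v[t]) ** 2 for t in terms))


# Label every node, then give every edge the distance between its endpoints.
labels = {name: term_vector(text) for name, text in pages.items()}
edge_values = {(s, t): distance(labels[s], labels[t]) for s, t in links}

for edge, value in edge_values.items():
    print(edge, round(value, 3))
```

In this toy example the link from "a" to "b" gets a smaller value than the link from "a" to "c", since "a" and "b" share most of their terms while "a" and "c" share none. Swapping in page length or term sets as labels only requires changing `term_vector` and `distance`.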

Now that I've found some already-published work on this model, I'll have to spend next semester learning what people have already done so I can do something new.
