hexencodings: Notes for Information Retrieval Quiz #3

Thursday, November 26, 2009

Some things to review before the Information Retrieval quiz:

Lecture 18, Collaborative Filtering and Recommender Systems

Pearson correlation
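As a refresher, Pearson correlation between two users is usually computed over the items both have rated. A minimal sketch, assuming each user's ratings are stored as a dict from item to score; the function name `pearson`, the zero fallback for degenerate cases, and the sample users are all hypothetical:

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap counts as no correlation
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant rater gets correlation 0
    return cov / sqrt(var_a * var_b)

# Hypothetical ratings, for illustration only.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}

print(pearson(alice, bob))    # same taste, shifted scale
print(pearson(alice, carol))  # opposite taste
```

Because each user's mean is subtracted first, a user who rates everything one point lower than another still correlates perfectly with them.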

Lecture 19, Information Extraction

Named entity recognition: find and classify (i.e. determine the category of) all the named entities in a text. Two approaches to named entity recognition:
Rule-based (regular expressions)

Lists of names
Patterns to match things that look like names
Patterns to match the environments that classes of names tend to occur in
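The three kinds of rules above can be sketched with regular expressions. This is only an illustration: the name list, the two patterns, and the category labels are hypothetical and far too small for real use.

```python
import re

# Hypothetical list of known names and hypothetical patterns.
KNOWN_ORGS = {"IBM", "Stanford University"}
TITLE_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr|Prof)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")
LOCATION_CONTEXT = re.compile(r"\b(?:in|at|near) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")

def find_entities(text):
    """Return a sorted list of (entity, category) pairs found by the rules."""
    found = set()
    # 1. List of names: exact lookup of known organizations.
    for org in KNOWN_ORGS:
        if re.search(r"\b" + re.escape(org) + r"\b", text):
            found.add((org, "ORG"))
    # 2. Looks like a name: an honorific followed by capitalized words.
    for match in TITLE_PATTERN.finditer(text):
        found.add((match.group(1), "PERSON"))
    # 3. Environment: capitalized words right after a locative preposition.
    for match in LOCATION_CONTEXT.finditer(text):
        found.add((match.group(1), "LOC"))
    return sorted(found)

sample = "Dr. Jane Smith joined IBM after a talk in New York."
print(find_entities(sample))
```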

ML-based

Get annotated training data
Extract features
Train systems to replicate the annotation
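The "extract features" step might look like the sketch below, which turns each token into a feature dict that a sequence classifier could be trained on. The feature names and the B-/I-/O label scheme are assumptions, not something fixed by the lecture.

```python
def token_features(tokens, i):
    """Features for token i, drawn from the token itself and its neighbours."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

# Hypothetical annotated sentence (the "annotated training data" step).
tokens = ["Jane", "Smith", "joined", "IBM", "in", "2009"]
labels = ["B-PER", "I-PER", "O", "B-ORG", "O", "O"]

# One (features, label) row per token, ready to hand to a classifier.
training_rows = [(token_features(tokens, i), labels[i]) for i in range(len(tokens))]
```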

Relation analysis consists of two tasks:

determine if two entities are related
if they are, classify the relation

Features in relation analysis (for each of the above tasks) are:

Features of the named entities involved (their types, the concatenation of the types, the headwords of the entities)
Features derived from the words between and around the named entities (the words at positions ±1, ±2, ±3; the bag of words between them)
Features derived from the syntactic environment that governs the two entities (constituent path through the tree from one entity to the other; base syntactic chunk sequence from one to the other; dependency path)
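The first two feature groups can be sketched without a parser. The entity representation (dicts with `start`, `end`, `type`, `head`) and the feature names are hypothetical, and the sketch assumes the first entity precedes the second; the syntactic features are left out because they need a parser.

```python
def relation_features(tokens, ent1, ent2, window=2):
    """Entity and word-context features for a candidate entity pair.

    ent1 and ent2 are dicts with 'start', 'end' (exclusive), 'type', 'head';
    ent1 is assumed to come before ent2 in the sentence.
    """
    between = tokens[ent1["end"]:ent2["start"]]
    before = tokens[max(0, ent1["start"] - window):ent1["start"]]
    after = tokens[ent2["end"]:ent2["end"] + window]
    return {
        "type1": ent1["type"],
        "type2": ent2["type"],
        "type_pair": ent1["type"] + "-" + ent2["type"],  # concatenated types
        "head1": ent1["head"].lower(),
        "head2": ent2["head"].lower(),
        "bag_between": sorted({w.lower() for w in between}),
        "words_before": [w.lower() for w in before],
        "words_after": [w.lower() for w in after],
    }

# Hypothetical sentence and entity spans.
tokens = ["Yesterday", "Jane", "Smith", "joined", "IBM", "as", "director"]
person = {"start": 1, "end": 3, "type": "PER", "head": "Smith"}
org = {"start": 4, "end": 5, "type": "ORG", "head": "IBM"}
features = relation_features(tokens, person, org)
```

The same feature dict can feed both tasks: a binary "related or not" classifier first, then a relation-type classifier for the pairs that pass.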

Template filling

Rules and cascades of rules
Supervised ML as sequence labeling

One sequence classifier per slot
One big sequence classifier
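With one big sequence classifier, every token gets a slot label and the template is read off the labeled sequence. A minimal sketch of that last step, assuming BIO-style labels such as B-SPEAKER / I-SPEAKER / O; the slot names and the example sentence are hypothetical, and the classifier itself is not shown.

```python
def fill_template(tokens, labels):
    """Group labeled tokens into slot fillers: {slot: [filler, ...]}."""
    template = {}
    slot, words = None, []

    def flush():
        # Store the filler collected so far, if any.
        if slot is not None:
            template.setdefault(slot, []).append(" ".join(words))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            slot, words = label[2:], [token]
        elif label.startswith("I-") and slot == label[2:]:
            words.append(token)
        else:
            # "O", or an I- label that does not continue the open slot.
            flush()
            slot, words = None, []
    flush()
    return template

tokens = ["Talk", "by", "Jane", "Smith", "at", "3", "pm", "in", "Room", "101"]
labels = ["O", "O", "B-SPEAKER", "I-SPEAKER", "O",
          "B-TIME", "I-TIME", "O", "B-LOCATION", "I-LOCATION"]
print(fill_template(tokens, labels))
```

With one classifier per slot, the same function could be run once per slot on that classifier's labels and the resulting dicts merged.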

Lecture 20, Sentiment Analysis

Classification in sentiment analysis

Coarse classification of sentiment

Document-level classification according to some simple (usually binary) scheme
- Political bias
- Likes/hates
Fine-grained classification of sentiment-bearing mentions in a text

Positive/negative classifications of opinions about entities mentioned in a text
Perhaps with intensity

Choosing a vocabulary

Essentially feature selection
Previous examples used all words
Can we do better by focusing on a subset of words?
How to find words, phrases, patterns that express sentiment or polarity?
Adjectives

positive: honest important mature large patient
negative: harmful hypocritical inefficient insecure

Verbs

positive: praise, love
negative: blame, criticize

Nouns

positive: pleasure, enjoyment
negative: pain, criticism
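A restricted vocabulary like the one above already gives a crude document-level classifier: count positive and negative words and compare. A sketch using only the words listed in these notes; the function names and the "neutral" fallback are assumptions, and there is no stemming, so "praised" would not match "praise".

```python
import re

POSITIVE = {"honest", "important", "mature", "large", "patient",
            "praise", "love", "pleasure", "enjoyment"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure",
            "blame", "criticize", "pain", "criticism"}

def polarity_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(w in POSITIVE for w in words)
    negative_hits = sum(w in NEGATIVE for w in words)
    return positive_hits - negative_hits

def classify(text):
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties and no-hit documents are neutral

print(classify("An honest and patient review, full of praise."))
print(classify("Inefficient, harmful, and a pain to use."))
```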

Lecture 21, Sentiment Analysis (cont.)

Identifying polarity words

Assume that generating exhaustive lists of polarity words is too hard
Assume contexts are coherent with respect to polarity
Adjectives joined by "and" tend to share polarity: fair and legitimate, corrupt and brutal
But not: fair and brutal, corrupt and legitimate
Example:

Extract all adjectives with frequency > 20 from the WSJ corpus
Label them for polarity
Extract all conjoined adjectives
A supervised learning algorithm builds a graph of adjectives linked by the same or different semantic orientation
A clustering algorithm partitions the adjectives into two subsets
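A much-simplified sketch of the idea, not the procedure from the lecture: instead of a supervised link classifier and a clustering step, it assumes that "and" always links adjectives of the same orientation and "but" links opposite ones, then spreads a label from one seed word across the graph. The adjective list, the sample text, and the function names are hypothetical; a real system would find adjectives with a POS tagger.

```python
import re
from collections import defaultdict, deque

# Hypothetical adjective list standing in for a POS tagger.
ADJECTIVES = {"fair", "legitimate", "corrupt", "brutal"}

def conjoined_pairs(text):
    """Yield (adj1, adj2, same_orientation) for 'X and Y' / 'X but Y'."""
    pattern = r"\b([a-z]+) (and|but) ([a-z]+)\b"
    for first, conj, second in re.findall(pattern, text.lower()):
        if first in ADJECTIVES and second in ADJECTIVES:
            yield first, second, conj == "and"

def propagate(pairs, seed_word, seed_label=1):
    """Spread +1/-1 labels from a seed over the same/different links."""
    graph = defaultdict(list)
    for first, second, same in pairs:
        graph[first].append((second, same))
        graph[second].append((first, same))
    labels = {seed_word: seed_label}
    queue = deque([seed_word])
    while queue:
        word = queue.popleft()
        for neighbour, same in graph[word]:
            if neighbour not in labels:
                labels[neighbour] = labels[word] if same else -labels[word]
                queue.append(neighbour)
    return labels

text = ("The ruling was fair and legitimate. The regime was corrupt and brutal. "
        "The trial was fair but brutal.")
pairs = list(conjoined_pairs(text))
print(propagate(pairs, "fair"))
```

The two sets of words with labels +1 and -1 play the role of the two clusters; which one is "positive" still has to be decided from outside, here by the seed.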

Challenges

Mixed sentiment: The steering is accurate but feels somewhat anesthetized.
Sentiment inverters: ... never seen any RWD cars can handle well on snow even just few inches.
Anaphora and meronymy:

It's a great car for just about anything. The mkVI is pretty much a mkv but ironing out all the small problems.
Hey is the back seat comfortable? In my MkV it feels like you're sitting on a vat of acid.
