Notes for Information Retrieval Quiz #3
Some things to review before the Information Retrieval quiz:
Lecture 18, Collaborative Filtering and Recommender Systems
Pearson correlation
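As a refresher, here is a minimal sketch of Pearson correlation between two users' ratings. The function name `pearson`, the dictionary format, and the sample ratings are made up for illustration. The means are taken over co-rated items only, which is one common convention; some formulations use each user's overall mean instead.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users rated.

    ratings_a, ratings_b: dicts mapping item -> numeric rating (hypothetical format).
    Returns 0.0 when there are fewer than two common items or no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

# Made-up ratings: alice and bob agree on ordering, carol is the reverse of alice.
alice = {"film1": 5, "film2": 3, "film3": 1}
bob = {"film1": 4, "film2": 2, "film3": 0, "film4": 5}
carol = {"film1": 1, "film2": 3, "film3": 5}
```

Here `pearson(alice, bob)` is 1.0 even though bob rates everything one point lower, because the correlation only looks at deviations from each user's mean.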
Lecture 19, Information Extraction
Named entity recognition: find and classify (i.e. determine the category of) all the named entities in a text. Two approaches to named entity recognition:
Rule-based (regular expressions)
- Lists of names
- Patterns to match things that look like names
- Patterns to match the environments that classes of names tend to occur in
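The three kinds of rules above could be sketched roughly as follows. The name list, the two regular expressions, and the sample sentence are all hypothetical and far too small for real use; they only show one rule of each kind.

```python
import re

# Hypothetical list of names (gazetteer), for illustration only.
ORG_LIST = ["Stanford University", "IBM"]

# Looks like a name: capitalized words followed by a corporate suffix.
ORG_SHAPE = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b\.?")

# Environment: a title such as "Dr." tends to precede a person name.
PERSON_CONTEXT = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)")

def find_entities(text):
    """Return a sorted list of (entity_text, category) pairs found by the rules."""
    found = set()
    for name in ORG_LIST:
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add((name, "ORG"))
    for m in ORG_SHAPE.finditer(text):
        found.add((m.group(0).rstrip("."), "ORG"))
    for m in PERSON_CONTEXT.finditer(text):
        found.add((m.group(1), "PERSON"))
    return sorted(found)

text = "Dr. Jane Smith left IBM to join Acme Widgets Inc. last year."
entities = find_entities(text)
```

On the sample sentence, the list rule finds IBM, the shape rule finds Acme Widgets Inc, and the environment rule finds Jane Smith.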
Machine learning (supervised)
- Get annotated training data
- Extract features
- Train systems to replicate the annotation
Relation analysis consists of two tasks:
- Determine if two entities are related
- If they are, classify the relation
Features in relation analysis (for each of the above tasks) are:
- Features of the named entities involved (their types, the concatenation of the types, the headwords of the entities)
- Features derived from the words between and around the named entities (words at positions ±1, 2, 3; bag of words between)
- Features derived from the syntactic environment that governs the two entities (constituent path through the tree from one entity to the other; base syntactic chunk sequence from one to the other; dependency path)
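A rough sketch of the first two feature groups for one candidate entity pair is shown below. The function `relation_features`, the entity dictionaries, and the sample sentence are assumptions made for illustration. The headword is approximated by the last token of the entity, and the syntactic features are left out because they would need a parser.

```python
def relation_features(tokens, e1, e2, window=2):
    """Build a feature dict for a candidate pair of entities.

    tokens: list of word strings.
    e1, e2: dicts with 'start', 'end' (token offsets, end exclusive) and 'type';
            e1 is assumed to precede e2 (hypothetical format).
    """
    feats = {
        "types": e1["type"] + "-" + e2["type"],   # concatenation of the types
        "head1": tokens[e1["end"] - 1],           # headword approximated by last token
        "head2": tokens[e2["end"] - 1],
    }
    # Words just before the first entity and just after the second one.
    for k in range(1, window + 1):
        left = e1["start"] - k
        right = e2["end"] + k - 1
        feats[f"before_{k}"] = tokens[left] if left >= 0 else "<s>"
        feats[f"after_{k}"] = tokens[right] if right < len(tokens) else "</s>"
    # Bag of words between the two entities.
    for w in tokens[e1["end"]:e2["start"]]:
        feats["between=" + w.lower()] = 1
    return feats

# Illustrative sentence, already tokenized on whitespace.
tokens = "American Airlines , a unit of AMR , said".split()
e1 = {"start": 0, "end": 2, "type": "ORG"}
e2 = {"start": 6, "end": 7, "type": "ORG"}
feats = relation_features(tokens, e1, e2)
```

The same feature dict could feed both tasks: first a binary related/unrelated classifier, then a classifier over relation types.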
Template filling
- Rules and cascades of rules
- Supervised ML as sequence labeling
- One sequence classifier per slot
- One big sequence classifier
Classification in sentiment analysis
- Coarse classification of sentiment
- Document-level classification according to some simple (usually binary) scheme
- Political bias
- Likes/hates
- Fine-grained classification of sentiment-bearing mentions in a text
- Positive/negative classifications of opinions about entities mentioned in a text
- Perhaps with intensity
- Essentially feature selection
- Previous examples used all words
- Can we do better by focusing on a subset of words?
- How to find words, phrases, patterns that express sentiment or polarity?
- Adjectives
- positive: honest important mature large patient
- negative: harmful hypocritical inefficient insecure
- Verbs
- positive: praise, love
- negative: blame, criticize
- Nouns
- positive: pleasure, enjoyment
- negative: pain, criticism
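To make the "subset of words" idea concrete, here is a small sketch that uses only the example words listed above as features. The function names, the crude tokenization, and the majority-count decision rule are assumptions for illustration; a real system would feed these counts to a trained classifier.

```python
# Word lists copied from the examples above.
POSITIVE = {"honest", "important", "mature", "large", "patient",
            "praise", "love", "pleasure", "enjoyment"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure",
            "blame", "criticize", "pain", "criticism"}

def polarity_features(text):
    """Count positive and negative lexicon words, ignoring all other words."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    return {
        "pos": sum(t in POSITIVE for t in tokens),
        "neg": sum(t in NEGATIVE for t in tokens),
    }

def coarse_label(text):
    """Hypothetical decision rule: whichever polarity has more matches wins."""
    f = polarity_features(text)
    if f["pos"] > f["neg"]:
        return "positive"
    if f["neg"] > f["pos"]:
        return "negative"
    return "neutral"
```

For example, `coarse_label("An honest, mature film; I love it.")` counts three positive words and no negative ones.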
Identifying polarity words
- Assume that generating exhaustive lists of polarity words is too hard
- Assume contexts are coherent with respect to polarity
- Fair and legitimate, corrupt and brutal
- But not: fair and brutal, corrupt and legitimate
- Example:
- Extract all adjectives with frequency > 20 from the WSJ corpus
- Label them for polarity
- Extract all conjoined adjectives
- A supervised learning algorithm builds a graph of adjectives linked by the same or different semantic orientation
- A clustering algorithm partitions the adjectives into two subsets
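The graph idea in the example above can be sketched in a much simplified form. Note the substitutions: instead of a supervised learner, this sketch assumes "and" always links adjectives of the same orientation and "but" links adjectives of different orientation, and instead of a clustering algorithm it propagates a label from a single seed word. The sentences, the adjective set, and all function names are made up for illustration.

```python
import re
from collections import defaultdict, deque

# Matches "<word> and <word>" or "<word> but <word>" in lowercased text.
CONJ = re.compile(r"\b([a-z]+) (and|but) ([a-z]+)\b")

def build_graph(sentences, adjectives):
    """Link conjoined adjectives; 'and' -> same orientation, 'but' -> different."""
    graph = defaultdict(list)
    for s in sentences:
        for a, conj, b in CONJ.findall(s.lower()):
            if a in adjectives and b in adjectives:
                same = (conj == "and")
                graph[a].append((b, same))
                graph[b].append((a, same))
    return graph

def partition(graph, seed, seed_label=1):
    """Propagate a +1/-1 label from one seed word across the links (breadth-first)."""
    labels = {seed: seed_label}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for other, same in graph[word]:
            if other not in labels:
                labels[other] = labels[word] if same else -labels[word]
                queue.append(other)
    return labels

sentences = [
    "The ruling was fair and legitimate.",
    "The regime was corrupt and brutal.",
    "The process was fair but brutal.",
]
adjectives = {"fair", "legitimate", "corrupt", "brutal"}
labels = partition(build_graph(sentences, adjectives), seed="fair")
```

Starting from "fair" as positive, the two subsets come out as {fair, legitimate} and {corrupt, brutal}. Conflicting links are not handled here; the first label assigned to a word is kept.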
- Mixed sentiment: The steering is accurate but feels somewhat anesthetized.
- Sentiment inverters: ... never seen any RWD cars can handle well on snow even just few inches.
- Anaphora and meronymy:
- It's a great car for just about anything. The mkVI is pretty much a mkv but ironing out all the small problems.
- Hey is the back seat comfortable? In my MkV it feels like you're sitting on a vat of acid.