Thursday, November 26, 2009

Notes for Information Retrieval Quiz #3

Some things to review before the Information Retrieval quiz:

Lecture 18, Collaborative Filtering and Recommender Systems


Pearson correlation
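
As a refresher, the Pearson correlation between two users can be computed over the items both of them have rated. The sketch below assumes ratings are stored as dicts mapping item to numeric rating; the function name `pearson` and the zero-overlap conventions are illustrative choices, not something taken from the lecture.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items (assumed format: dict item -> rating)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap is treated as "no correlation"
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant rater; 0 by convention here
    return cov / sqrt(var_x * var_y)
```

Users whose ratings rise and fall together score close to +1, users with opposite tastes score close to -1.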

Lecture 19, Information Extraction

Named entity recognition: find and classify (i.e. determine the category of) all the named entities in a text. Two approaches to named entity recognition:
Rule-based (regular expressions)

  • Lists of names
  • Patterns to match things that look like names
  • Patterns to match the environments that classes of names tend to occur in
ML-based
  • Get annotated training data
  • Extract features
  • Train systems to replicate the annotation
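
The three rule-based ingredients listed above (name lists, name-shaped patterns, context patterns) can be sketched with regular expressions. Everything named here (`KNOWN_ORGS`, the title list, the preposition list, `find_entities`) is a hypothetical toy, far smaller than anything a real system would use.

```python
import re

# List of names (toy gazetteer; hypothetical entries).
KNOWN_ORGS = {"Stanford University", "Wall Street Journal"}

# Pattern for things that look like names: a title followed by capitalized words.
PERSON_PATTERN = re.compile(
    r"\b(?:Mr|Ms|Mrs|Dr|Prof)\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
)

# Pattern for an environment that locations tend to occur in.
LOCATION_CONTEXT = re.compile(
    r"\b(?:in|near|from)\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
)

def find_entities(text):
    """Return a list of (entity string, category) pairs found by the rules."""
    entities = []
    for org in KNOWN_ORGS:
        for match in re.finditer(re.escape(org), text):
            entities.append((match.group(0), "ORG"))
    for match in PERSON_PATTERN.finditer(text):
        entities.append((match.group(1), "PERSON"))
    for match in LOCATION_CONTEXT.finditer(text):
        entities.append((match.group(1), "LOCATION"))
    return entities
```

Rules like these are precise on the cases they anticipate but miss anything outside the lists and patterns, which is the usual motivation for the ML-based approach.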
Relation analysis consists of two tasks:
  1. determine if two entities are related
  2. if they are, classify the relation

Features in relation analysis (for each of the above tasks) are:

  1. Features of the named entities involved (their types, the concatenation of the types, the headwords of the entities)
  2. Features derived from the words between and around the named entities (words at positions ±1, 2, 3; bag of words between)
  3. Features derived from the syntactic environment that governs the two entities (constituent path through the tree from one entity to the other; base syntactic chunk sequence from one to the other; dependency path)
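
A minimal sketch of the first two feature groups, assuming a pre-tokenized sentence and entity spans given as (start, end, type) with an exclusive end. The function name, the feature names, and the "last token is the headword" shortcut are assumptions for illustration. The syntactic features in group 3 need a parser and are left out.

```python
def relation_features(tokens, e1, e2, window=2):
    """Features for a candidate entity pair; e1 is assumed to precede e2."""
    start1, end1, type1 = e1
    start2, end2, type2 = e2
    return {
        # Group 1: features of the entities themselves.
        "type1": type1,
        "type2": type2,
        "type_pair": type1 + "-" + type2,   # concatenation of the types
        "head1": tokens[end1 - 1],          # assumption: last token is the headword
        "head2": tokens[end2 - 1],
        # Group 2: words between and around the entities.
        "bag_between": sorted({w.lower() for w in tokens[end1:start2]}),
        "left_context": tokens[max(0, start1 - window):start1],
        "right_context": tokens[end2:end2 + window],
    }
```

The same feature dict could feed both classifiers: one deciding whether the pair is related at all, and one assigning the relation type.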
Template filling
  1. Rules and cascades of rules
  2. Supervised ML as sequence labeling
    1. One sequence classifier per slot
    2. One big sequence classifier
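
For the "one big sequence classifier" variant, the classifier tags every token and the template is read off the tags. The sketch below covers only that last decoding step and assumes BIO-style labels (B-SLOT, I-SLOT, O), which is a common convention but not something stated in these notes; the slot names in the usage example are hypothetical.

```python
def fill_template(tokens, labels):
    """Group BIO-labeled tokens into slot fillers: dict slot -> list of strings."""
    template = {}
    current_slot, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_slot is not None:
            template.setdefault(current_slot, []).append(" ".join(current_tokens))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_slot, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_slot == label[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the current span.
            flush()
            current_slot, current_tokens = None, []
    flush()
    return template

tokens = ["United", "Airlines", "raised", "fares", "by", "$6"]
labels = ["B-LEAD_AIRLINE", "I-LEAD_AIRLINE", "O", "O", "O", "B-AMOUNT"]
filled = fill_template(tokens, labels)
```

With one classifier per slot, the same decoding would be run once per classifier, each producing the fillers for a single slot.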
Lecture 20, Sentiment Analysis

Classification in sentiment analysis
  • Coarse classification of sentiment
    • Document-level classification according to some simple (usually binary) scheme
      • Political bias
      • Likes/hates
    • Fine-grained classification of sentiment-bearing mentions in a text
      • Positive/negative classifications of opinions about entities mentioned in a text
      • Perhaps with intensity
Choosing a vocabulary
  • Essentially feature selection
  • Previous examples used all words
  • Can we do better by focusing on a subset of words?
  • How to find words, phrases, patterns that express sentiment or polarity?
  • Adjectives
    • positive: honest important mature large patient
    • negative: harmful hypocritical inefficient insecure
  • Verbs
    • positive: praise, love
    • negative: blame, criticize
  • Nouns
    • positive: pleasure, enjoyment
    • negative: pain, criticism
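
To make the restricted-vocabulary idea concrete, here is a rough sketch that scores a text using only the example words listed above. The scoring rule (positive count minus negative count) and the function name are assumptions for illustration; a real system would learn weights rather than count.

```python
import re

# Word lists copied from the examples above.
POSITIVE = {"honest", "important", "mature", "large", "patient",
            "praise", "love", "pleasure", "enjoyment"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure",
            "blame", "criticize", "pain", "criticism"}

def polarity_score(text):
    """Positive-word count minus negative-word count; all other words are ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg
```

A score above zero suggests positive sentiment and below zero negative; a score of zero means the lexicon found nothing or the evidence cancels out.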
Lecture 21, Sentiment Analysis (cont.)

Identifying polarity words
  • Assume that generating exhaustive lists of polarity words is too hard
  • Assume contexts are coherent with respect to polarity: conjoined adjectives tend to share an orientation
    • Attested: fair and legitimate, corrupt and brutal
    • But not: fair and brutal, corrupt and legitimate
  • Example:
    • Extract all adjectives with frequency > 20 from the WSJ corpus
    • Label them for polarity
    • Extract all conjoined adjectives
    • A supervised learning algorithm builds a graph of adjectives linked by the same or different semantic orientation
    • A clustering algorithm partitions the adjectives into two subsets
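
A heavily simplified sketch of the conjoined-adjective idea. It swaps in two shortcuts that are assumptions, not the method in the example above: "and" is taken to link adjectives of the same orientation and "but" adjectives of different orientation (instead of a supervised link classifier), and labels are propagated from one seed word by breadth-first search (instead of clustering). All names are hypothetical.

```python
import re
from collections import defaultdict, deque

CONJOINED = re.compile(r"\b([a-z]+) (and|but) ([a-z]+)\b")

def build_graph(sentences, adjectives):
    """Link adjectives that occur conjoined; each edge records same/different orientation."""
    graph = defaultdict(list)
    for sentence in sentences:
        for left, conj, right in CONJOINED.findall(sentence.lower()):
            if left in adjectives and right in adjectives:
                same = (conj == "and")
                graph[left].append((right, same))
                graph[right].append((left, same))
    return graph

def partition(graph, seed, seed_label=1):
    """Propagate +1/-1 labels outward from a seed adjective along the links."""
    labels = {seed: seed_label}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor, same in graph[node]:
            if neighbor not in labels:
                labels[neighbor] = labels[node] if same else -labels[node]
                queue.append(neighbor)
    return labels

sentences = [
    "The ruling was fair and legitimate.",
    "The regime was corrupt and brutal.",
    "The trial was fair but brutal.",
]
adjectives = {"fair", "legitimate", "corrupt", "brutal"}
labels = partition(build_graph(sentences, adjectives), seed="fair")
```

Starting from "fair" as positive, the two subsets come out as {fair, legitimate} and {corrupt, brutal}.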
Challenges
  • Mixed sentiment: The steering is accurate but feels somewhat anesthetized.
  • Sentiment inverters: ... never seen any RWD cars can handle well on snow even
    just few inches.
  • Anaphora and meronymy:
    • It's a great car for just about anything. The mkVI is pretty
      much a mkv but ironing out all the small problems.
    • Hey is the back seat comfortable? In my MkV it feels like
      you're sitting on a vat of acid.
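
One common heuristic for sentiment inverters is to flip the polarity of lexicon words that follow a negator, up to the next punctuation mark. The sketch below assumes a tiny hypothetical lexicon and negator list; it does nothing for the mixed-sentiment or anaphora cases above.

```python
import re

NEGATORS = {"never", "not", "no"}                                  # hypothetical list
LEXICON = {"well": 1, "great": 1, "comfortable": 1, "bad": -1}     # hypothetical lexicon
PUNCTUATION = set(".,;!?")

def score_with_inverters(text):
    """Sum lexicon polarities, flipping any that fall inside a negation scope."""
    tokens = re.findall(r"[a-z']+|[.,;!?]", text.lower())
    score, negated = 0, False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif token in NEGATORS:
            negated = True           # scope opens at the negator
        elif token in LEXICON:
            score += -LEXICON[token] if negated else LEXICON[token]
    return score
```

On the inverter example above, "well" falls inside the scope of "never", so it counts as negative rather than positive.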
