ANLP(5): Pronoun Resolution — Hobbs Algorithm

vamshi krishna
5 min readNov 30, 2021

An Algorithmic Approach to NLP

Read the following rhyme:

Can you guess what “his” refers to? is it Jack, or Jill? Obviously, your answer would be Jack, but how would a machine understand this? The answer would be Pronoun Resolution. Pronoun resolution is a crucial task in the field of Natural Language Understanding(NLU), which is a subfield of NLP. It helps in many NLP applications such as question answering, sentiment analysis, machine translation etc.

We already know how syntactic trees are parsed(CYK algorithm). The Hobbs Algorithm uses these parse trees of sentences for Pronoun Resolution.

Consider the sentence: Jack was angry. He hit Jill. “He” could have referred to Jack or Jill, but we know for sure that this is not the case here. This is because “He” occurs before “Jill”. This is a basic assumption in the Hobbs Algorithm. The search for the referent is always restricted to the left of the target(pronoun). Using this property, we can eliminate crown from being a referent.

Another property is the gender agreement, which basically only considers pronouns and nouns of the same gender to be possible referents. In the Initial example: using gender agreement would get rid of the possibilities such as “Jill”, “Hill” or water to be referent.

The Recency property states that pronouns can only go a few sentences back, and entities closer to the referring phrase are more important than those further away. Using these three properties we can say that Jack is the answer for the initial example.

This is how Humans perform pronoun resolution, The Hobbs Algorithm uses similar rules as we will be seeing.

Hobbs Algorithm

Let us consider two sentences:

S1 : Jack is an engineer.

S2 : Jill likes him.

The input: the pronoun to be resolved, the text with pronoun

The syntax parse trees of the sentences are given below:

We start at the target pronoun in the tree, and crawl up the parse tree to the root node(S). It performs a breadth first left to right search of the node’s children to the left of the target for each noun phrase or ‘S’ node it finds. As a result, in our case, the algorithm begins with the parse tree of sentence 2 and works its way up to the root node S2. After that, it performs a breadth-first search to locate the noun phrase (NP). The algorithm discovers its first noun phrase for the word ‘Jill’ here.

Because Jill “likes” the person of the pronoun, the pronoun cannot obviously be Jill. This is explained better through Binding theory.

Binding theory states that: A reflexive can refer to the subject of the most immediate clause in which it appears, whereas a nonreflexive cannot corefer this subject.. Words such as himself, herself, themselves, etc. are known as reflexive.

So, in our scenario, ‘him’ will not refer to Jill because of the binding theory limitation. Even if the branch is investigated, Jill will not be the accepted referent for the pronoun ‘he’ due to the gender agreement requirement. As a result, the algorithm now begins its search in the previous sentence’s syntax tree.

Hobbs distance property states that entities in a subject position are more likely the possible substitute for the pronoun than in the object position.

Following the Hobbs distance property, we perform a breadth first left to right search of the node’s children for each noun phrase it discovers. As a result, the sentence’s subject Jack, who is an engineer, is explored before the object engineer, and Jack is finally the resolved referent for the pronoun him.

The detailed Algorithm is as follows:

Begin at the noun phrase (NP) node immediately dominating the pronoun.

Go up the tree to the first NP or sentence (S) node encountered. Call this node X, and call the path used to reach it p.

Traverse all branches below node X to the left of path p in a left-to-right, breadth-first fashion. Propose as the antecedent any NP node that is encountered which has an NP or S node between it and X.

If node X is the highest S node in the sentence, traverse the surface parse trees of previous sentences in the text in order of recency, the most recent first; each tree is traversed in a left-to-right, breadth-first manner, and when an NP node is encountered, it is proposed as antecedent.If X is not the highest S node in the sentence, continue to step 5.

From node X, go up the tree to the first NP or S node encountered. Call this new node X, and call the path traversed to reach it p.

If X is an NP node and if the path p to X did not pass through the Nominal node that X immediately dominates, propose X as the antecedent.

Traverse all branches below node X to the left of path p in a left-to-right, breadth- first manner. Propose any NP node encountered as the antecedent.

If X is an S node, traverse all branches of node X to the right of path p in a left-to- right, breadth-first manner, but do not go below any NP or S node encountered. Propose any NP node encountered as the antecedent.

Go to Step4

Therefore, We have seen how the Hobbs Tree Search Algorithm performs pronoun resolution.

--

--

vamshi krishna

I'm a research student at IIIT Hyderabad currently studying Computer Science and Computational Linguistics.