Multi-Tagging for Transition-based Dependency Parsing
Creative Commons CC BY 4.0
This project focuses on a modification of a greedy transition based dependency parser. Typically a Part-Of-Speech (POS) tagger models a probability distribution over all the possible tags for each word in the given sentence and chooses one as its best guess. This is then pass on to the parser which uses this information to build a parse tree. The current state of the art for POS tagging is about 97% word accuracy, which seems high but results in a around 56% sentence accuracy. Small errors at the POS tagging phase can lead to large errors down the NLP pipeline and transition based parsers are particularity sensitive to these types of mistakes. A maximum entropy Markov model was trained as a POS multi-tagger passing more than its 1-best guess to the parser which was thought could make a better decision when committing to a parse for the sentence. This has been shown to give improved accuracy in other parsing approaches. We shown there is a correlation between tagging ambiguity and parsers accuracy and in fact the higher the average tags per word the higher the accuracy.