Soft Cross-lingual Syntax Projection for Dependency Parsing
Zhenghua Li, Min Zhang, Wenliang Chen
{zhli13, minzhang, wlchen}@suda.edu.cn
Soochow University, China
Dependency parsing
A bilingual example
[Figure: dependency trees for "I1 eat2 the3 fish4 with5 a6 fork7" and its Chinese translation "我1 用2 叉子3 吃4 鱼5" (literally "I use fork eat fish"), each headed by the pseudo-root $0. English arc labels: root, subj, obj, det, pmod. Chinese arc labels: root, subj, obj, vv.]
Big picture (semi-supervised)
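English Treebank -> English Parser
Bitext (e.g. "I love this game" / "我 爱 这 运动") -> English Parser parses the English side -> project English parse trees into Chinese -> Chinese labeled data with partial trees
Chinese Treebank + Chinese labeled data with partial trees -> larger training data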
Syntax projection
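[Figure: the English dependency tree of "I1 eat2 the3 fish4 with5 a6 fork7" is projected onto "我1 用2 叉子3 吃4 鱼5" through the word alignment (e.g. eat and 吃, fish and 鱼); both sentences are headed by $0.]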
Challenges
Syntactic non-isomorphism across languages
Different annotation choices (guidelines)
Partial (incomplete) parse trees resulting from projection
Parsing errors on the source side
Word alignment errors
Cross-language non-isomorphism
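[Figure: "I1 eat2 the3 fish4 with5 a6 fork7" vs. "我1 用2 叉子3 吃4 鱼5": the English preposition "with" is expressed by the Chinese verb 用 ("use"), so the two trees are not isomorphic; 吃 corresponds to "eat".]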
Different annotation choices
Coordination structure as an example
[Figure: five alternative dependency analyses of the coordination "fish and bird".]
All these factors can lead to bad projections!
Why is it called "soft" projection?
Project fewer but more reliable dependencies; put quality before quantity
Careful/gentle/conservative projection
Wrong projection -> training noise
Big picture (semi-supervised)
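English Treebank -> English Parser
Chinese Treebank -> Chinese Parser
Bitext (e.g. "I love this game" / "我 爱 这 运动") -> English Parser parses the English side -> project English parse trees into Chinese -> filtering with the Chinese Parser -> Chinese labeled data with partial trees
Chinese Treebank + Chinese labeled data with partial trees -> larger training data -> Chinese Parser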
Step 1: word alignment and English parsing on bitext
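[Figure: the English Parser, trained on the English Treebank, parses the English side of the bitext, which is word-aligned to the Chinese side: "I1 eat2 the3 fish4 with5 a6 fork7" / "我1 用2 叉子3 吃4 鱼5", each headed by $0.]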
Step 2: project English tree into Chinese (direct correspondence assumption)
[Figure: the English parse tree is projected through the word alignment onto "我1 用2 叉子3 吃4 鱼5", yielding Chinese labeled data with a partial tree.]
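A minimal Python sketch of projection under the direct correspondence assumption: an English arc h -> m becomes a Chinese arc when both h and m are aligned. Assumptions not taken from the slides: only one-to-one alignment links are trusted, word indices are 1-based with 0 standing for $0, and the function name `project_tree`, the English heads, and the alignment links in the example are hypothetical (with5 is deliberately left unaligned).

```python
from collections import Counter


def project_tree(src_heads, alignment):
    """Project source-side arcs onto the target side (direct correspondence assumption).

    src_heads: dict mapping each source word index (1-based) to its head (0 = $0).
    alignment: iterable of (source_index, target_index) word-alignment links.
    Returns {target_modifier: target_head} holding only the projected arcs;
    target words missing from it stay unattached, i.e. the tree is partial.
    """
    alignment = list(alignment)
    src_count = Counter(s for s, _ in alignment)
    tgt_count = Counter(t for _, t in alignment)
    # Assumption of this sketch: trust only one-to-one links.
    src_to_tgt = {s: t for s, t in alignment if src_count[s] == 1 and tgt_count[t] == 1}
    src_to_tgt[0] = 0  # the pseudo-root $0 corresponds on both sides
    projected = {}
    for mod, head in src_heads.items():
        # An arc survives only if both of its ends have a trusted counterpart.
        if mod in src_to_tgt and head in src_to_tgt:
            projected[src_to_tgt[mod]] = src_to_tgt[head]
    return projected


# Hypothetical parse of "I1 eat2 the3 fish4 with5 a6 fork7" and hypothetical
# links to "我1 用2 叉子3 吃4 鱼5" (with5 left unaligned on purpose).
english_heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 5}
links = [(1, 1), (2, 4), (4, 5), (7, 3)]
print(project_tree(english_heads, links))  # {1: 4, 4: 0, 5: 4}
```

Here 用2 and 叉子3 receive no head, which is the kind of partial tree the later steps have to handle.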
Step 3: filter projected structures with baseline Chinese Parser
Relationship between prob and accuracy
[Figure: the projected arcs on "我1 用2 叉子3 吃4 鱼5" (aligned to "I1 eat2 the3 fish4 with5 a6 fork7") are checked by the baseline Chinese Parser; the words 用 (use) and 吃 (eat) are marked.]
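The filtering step can be sketched as a threshold on the baseline parser's arc marginal probabilities. This is a sketch under assumptions: the function name `filter_projected`, the `{(head, modifier): probability}` format, and the probabilities in the example are hypothetical; in practice the marginals would come from the probabilistic (CRF-based) Chinese parser, e.g. via the inside-outside algorithm.

```python
def filter_projected(projected, marginals, threshold):
    """Keep a projected arc only if the baseline parser finds it likely enough.

    projected: {modifier: head} arcs produced by projection.
    marginals: {(head, modifier): probability} from the baseline target-language parser.
    threshold: minimum marginal probability for an arc to survive.
    Arcs absent from `marginals` are treated as probability 0 and dropped.
    """
    return {mod: head for mod, head in projected.items()
            if marginals.get((head, mod), 0.0) >= threshold}


projected = {1: 4, 4: 0, 5: 4}
# Hypothetical marginals for the arcs 吃4 -> 我1, $0 -> 吃4, 吃4 -> 鱼5.
marginals = {(4, 1): 0.92, (0, 4): 0.85, (4, 5): 0.30}
for threshold in (0.2, 0.5, 0.9):
    print(threshold, filter_projected(projected, marginals, threshold))
# 0.2 {1: 4, 4: 0, 5: 4}
# 0.5 {1: 4, 4: 0}
# 0.9 {1: 4}
```

Raising the threshold keeps fewer but more trustworthy arcs, which is the quality-before-quantity trade-off behind soft projection.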
Step 4: combine the data to train a new Chinese Parser
How to handle data with partial tree annotation
Convert partial tree annotation into forest annotation (ambiguous labelings)
For an unattached word, add links from all other words to it.
[Figure: forest annotation for "$0 我1 用2 叉子3 吃4 鱼5"; the words 用 (use) and 吃 (eat) are marked.]
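Converting a partial tree into forest annotation amounts to giving each word a set of candidate heads. A minimal sketch, assuming 1-based word indices with 0 for $0; the function name `partial_to_forest` and the partial tree in the example (用2 and 叉子3 unattached) are hypothetical.

```python
def partial_to_forest(partial_heads, n_words):
    """Turn a partial tree into forest (ambiguous) annotation.

    partial_heads: {modifier: head} for attached words (1-based, 0 = $0).
    Returns {modifier: set of candidate heads}. An attached word keeps its single
    head; an unattached word may take any other word, including $0, as its head.
    """
    forest = {}
    for mod in range(1, n_words + 1):
        if mod in partial_heads:
            forest[mod] = {partial_heads[mod]}
        else:
            forest[mod] = {h for h in range(n_words + 1) if h != mod}
    return forest


# Hypothetical partial tree over "$0 我1 用2 叉子3 吃4 鱼5".
forest = partial_to_forest({1: 4, 4: 0, 5: 4}, 5)
print(sorted(forest[1]))  # [4]
print(sorted(forest[2]))  # [0, 1, 3, 4, 5]
```

The candidate sets alone do not rule out cycles; the forest is the set of well-formed trees whose arcs all come from these sets, and the parser's tree-constrained inference enforces that.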
Maximize the mixed likelihood of manually labeled data with tree annotation and automatically collected data with forest annotation
Tree annotation can be understood as a special case of forest annotation: a forest that contains a single tree
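One way to write this objective, in notation chosen here for illustration ($\mathcal{D}_{\text{tree}}$ for the treebank, $\mathcal{D}_{\text{forest}}$ for the projected data, $\mathcal{F}$ for a forest, $\mathcal{Y}(\mathbf{x})$ for all trees of $\mathbf{x}$), assuming the CRF parser normalizes exponentiated tree scores:

```latex
\mathcal{L}(\theta)
  = \sum_{(\mathbf{x}, \mathbf{d}) \in \mathcal{D}_{\text{tree}}} \log p(\mathbf{d} \mid \mathbf{x}; \theta)
  + \sum_{(\mathbf{x}, \mathcal{F}) \in \mathcal{D}_{\text{forest}}} \log \sum_{\mathbf{d} \in \mathcal{F}} p(\mathbf{d} \mid \mathbf{x}; \theta),
\qquad
p(\mathbf{d} \mid \mathbf{x}; \theta)
  = \frac{\exp\{\mathrm{Score}(\mathbf{x}, \mathbf{d}; \theta)\}}
         {\sum_{\mathbf{d}' \in \mathcal{Y}(\mathbf{x})} \exp\{\mathrm{Score}(\mathbf{x}, \mathbf{d}'; \theta)\}}
```

Setting $\mathcal{F} = \{\mathbf{d}\}$ turns a forest term into a tree term, which is the sense in which tree annotation is a special case of forest annotation.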
How to train a parser using data with forest annotation?
Train with ambiguous labelings (see Täckström et al., 2013, and several earlier papers)
Maximize the likelihood of the data, i.e. maximize the probability of each forest, i.e. maximize the summed probability of all the trees in the forest
The training problem can be solved with the inside-outside algorithm
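To make "the summed probability of all the trees in the forest" concrete, here is a simplified sketch. It swaps in a first-order, projective, arc-factored model (Eisner's algorithm) instead of the second-order parser used in the experiments, and shows only the inside pass. Arcs outside the forest are masked to -inf, so the same inside pass yields both the forest score and the partition function. The function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Stable log(sum(exp(v))) over an iterable that may contain -inf."""
    values = list(values)
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_partition(scores):
    """Inside pass of Eisner's first-order projective algorithm.

    scores[h][m] is the log-potential of arc h -> m; index 0 is the root $0.
    Returns the log of the summed exp-scores of all projective trees.
    """
    n = len(scores)
    # Span tables: [s][t][0] has its head at t, [s][t][1] has its head at s.
    comp = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    inc = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        comp[s][s] = [0.0, 0.0]
    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            base = logsumexp(comp[s][r][1] + comp[r + 1][t][0] for r in range(s, t))
            # Arc t -> s; no word may take the root ($0 at s == 0) as a modifier.
            inc[s][t][0] = base + (scores[t][s] if s > 0 else NEG_INF)
            inc[s][t][1] = base + scores[s][t]  # arc s -> t
            comp[s][t][0] = logsumexp(comp[s][r][0] + inc[r][t][0] for r in range(s, t))
            comp[s][t][1] = logsumexp(inc[s][r][1] + comp[r][t][1] for r in range(s + 1, t + 1))
    return comp[0][n - 1][1]


def forest_log_prob(scores, candidate_heads):
    """log p(forest | x) = log Z(trees in the forest) - log Z(all trees).

    candidate_heads[m] is the set of allowed heads of word m (m = 1..n-1).
    """
    n = len(scores)
    masked = [[scores[h][m] if m > 0 and h in candidate_heads[m] else NEG_INF
               for m in range(n)] for h in range(n)]
    return log_partition(masked) - log_partition(scores)


# Uniform scores over "$0 w1 w2": 3 projective trees in total. Fixing w1's head
# to $0 leaves a forest of 2 trees, so p(forest) = 2/3.
uniform = [[0.0] * 3 for _ in range(3)]
print(round(math.exp(log_partition(uniform)), 3))                         # 3.0
print(round(math.exp(forest_log_prob(uniform, {1: {0}, 2: {0, 1}})), 3))  # 0.667
```

In training, the gradient of this log-probability is the expected feature counts over the forest minus those over all trees; the outside pass (or automatic differentiation through the inside pass) supplies these expectations.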
Experiments
Data statistics
Parser: second-order dependency parser (McDonald & Pereira, 2006), CRF-based (probabilistic)
SGD training (20K + 1M training data)
Relationship between prob and accuracy
Effect of filtering threshold
Projection ratio: 44%, 31%, 26%