SLIDE 17
- “Tech Specs”
- Works well with other graph libraries
( “Don’t reinvent the wheel.” )
- Oriented toward NLP applications (built-in tokenizer,
stemmer, sentence analyzer…)
- Handles large corpora (e.g. the entire English Wikipedia
corpus, tokens; by using multiprocessing)
- Grid-search friendly (different window sizes,
vocabulary sizes, sentence analyzers…)
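The points above (window size as a tunable parameter, multiprocessing over a corpus, output that other graph libraries can consume) can be illustrated with a minimal sketch. This is not corpus2graph's actual API: the function names `count_pairs` and `build_edges`, the whitespace tokenizer, and the toy two-sentence corpus are all assumptions made for illustration, using only the Python standard library.

```python
from collections import Counter
from itertools import repeat
from multiprocessing import Pool


def count_pairs(sentence, max_window=2):
    """Count ordered (left, right) word pairs at most `max_window` tokens apart."""
    tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next `max_window` tokens to its right.
        for right in tokens[i + 1 : i + 1 + max_window]:
            pairs[(left, right)] += 1
    return pairs


def build_edges(sentences, max_window=2, processes=2):
    """Count pairs per sentence in worker processes, then merge the counts."""
    with Pool(processes) as pool:
        partials = pool.starmap(count_pairs, zip(sentences, repeat(max_window)))
    edges = Counter()
    for partial in partials:
        edges.update(partial)  # Counter.update adds counts together
    return edges


if __name__ == "__main__":
    # Hypothetical toy corpus; a real run would stream sentences from disk.
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    for window in (1, 2):  # a tiny grid over window sizes
        edges = build_edges(corpus, max_window=window)
        print(window, len(edges), edges[("sat", "on")])
```

The merged counter maps each directed word pair to a weight, so it can be handed to an existing graph library instead of a custom graph structure; with networkx, for example, `add_weighted_edges_from` accepts `(u, v, weight)` triples built from it.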
How to generate a large word co-occurrence network within 3 hours?
Results Achieved: corpus2graph*
*Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph, Zheng ZHANG, Ruiqing YIN, Pierre ZWEIGENBAUM, In Proceedings of NAACL 2018 Workshop on Graph-Based Algorithms for Natural Language Processing, New Orleans, US