NLP Tasks
Published on 04 Jan 2018
This post covers several NLP tasks, such as named entity recognition (NER) and parsing, along with semantic evaluation tasks such as word sense disambiguation and sentiment analysis.

There is a workshop dedicated to semantic evaluation, SemEval - [International Workshop on Semantic Evaluation](http://alt.qcri.org/semeval2018/). Several competitions run in similar domains, for example on [CodaLab](https://competitions.codalab.org/competitions/15984).

### Papers

#### 1. Sentiment Analysis

- [Convolutional Neural Networks for Sentence Classification](http://www.people.fas.harvard.edu/~yoonkim/data/sent-cnn.pdf)

### Datasets

#### 1. Sentiment Analysis

- [IMDB Movie Review dataset](http://ai.stanford.edu/~amaas/data/sentiment/): two-class problem (positive and negative)
- [Irony detection in English tweets](https://competitions.codalab.org/competitions/17468#learn_the_details-data-annotation)
- [SemEval-2018 Task 1: Affect in Tweets](https://competitions.codalab.org/competitions/17751)
- Stanford: https://nlp.stanford.edu/sentiment/index.html

#### 2. Word Sense Disambiguation

- A unified dataset is available here: http://lcl.uniroma1.it/wsdeval/home

#### Neural Language Modeling

- See this link: https://machinelearningmastery.com/statistical-language-modeling-and-neural-language-models/

#### Named Entity Recognition

- (The CoNLL 2003 dataset is not directly available)
- [CoNLL 2012](http://conll.cemantix.org/2012/data.html) - again, it has to be built using some scripts
- [Groningen Meaning Bank](http://gmb.let.rug.nl/data.php) - huge dataset, from which CoNLL 2012 is adapted
- [Corpus (CoNLL 2002) annotated with IOB and POS tags - Kaggle](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus) - needs slight reformatting
- http://www.opener-project.eu/documentation/
- [Euronews corpora](https://github.com/EuropeanaNewspapers/ner-corpora) - German, Dutch, etc. No English
- NER using NLTK and scikit-learn classifiers - https://nlpforhackers.io/training-ner-large-dataset/
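
Several of the NER corpora listed above are annotated with IOB tags, so a common first step is to turn per-token tags into entity spans. The snippet below is a minimal sketch of that idea, not code from any of the linked projects. It assumes the IOB2 variant (every entity starts with a `B-` tag), and the function name `iob_to_spans` and the tag names `per` and `geo` are illustrative choices; the actual column layout and label set depend on the corpus you download.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; assumes one tag per token.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_entity() -> None:
        # Store the entity collected so far, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_entity()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open entity,
            # closes the entity; the stray token itself is dropped.
            close_entity()

    close_entity()
    return spans


if __name__ == "__main__":
    example_tokens = ["Barack", "Obama", "visited", "Paris", "."]
    example_tags = ["B-per", "I-per", "O", "B-geo", "O"]
    print(iob_to_spans(example_tokens, example_tags))
```

Dropping mismatched `I-` tags is only one possible policy. Some corpora use the older IOB1 convention, where an entity may begin with `I-`, so check the tagging scheme of the dataset before reusing this logic.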