Simple NLP Part-of-Speech tagger in Python

So yesterday, I decided to learn Python. I've been primarily a .NET guy for the last n years and had people around me working in Python, but I was never inclined to try it out. DUH!!!! Such a nice language. It took a couple of minutes to get my bearings, but I figured…why not! Everyone in the Valley is so anti-MS and so pro-(Python, MySQL, PHP) that one needs to embrace the flow.

For the last couple of years I’ve been using a very simple, yet (I believe) strong POS tagger built by Mark Watson and based on Eric Brill’s work. Written in C#, it gave me a very straightforward paring knife to do tokenization and POS tagging quickly and easily in .NET. Now Monty Tagger and NLTK are definitely incredible resources for NLP in Python, but I wanted something very straightforward and portable, without all the bells and whistles, so I can build on the core myself. Not to mention I wanted something fun for my first outing in Python. Well…ta da! Here it is.

It consists of two (count them, 2) VERY simple source files. The first is the basic hashing and pickling utility, which you only need if you want to make changes to the lexicon (I believe I’m using the same lexicon file as Monty Tagger), and the second is the actual tagger/tokenizer.
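For anyone who wants a feel for how those two pieces fit together before downloading, here is a minimal sketch. It is not the code from the download: the function names, the sample lexicon, and the fallback suffix rules are all made up for illustration, and it assumes the lexicon is a text file with one entry per line in the form "word TAG1 TAG2 ..." with the most likely tag first. It pickles to an in-memory byte string to stay self-contained; a real utility would probably write to a file with pickle.dump instead.

```python
import pickle
import re

# Hypothetical sample data. Assumed format: "word TAG1 TAG2 ...",
# one entry per line, most likely tag first.
SAMPLE_LEXICON = """\
the DT
dog NN
barks VBZ NNS
quickly RB
. .
"""

def build_lexicon(lines):
    """Hash each word to its list of candidate tags."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lexicon[parts[0]] = parts[1:]
    return lexicon

def tokenize(text):
    """Split text into words (keeping contractions) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def tag(tokens, lexicon):
    """Look each token up in the lexicon; guess with simple rules if it is unknown."""
    tagged = []
    for token in tokens:
        candidates = lexicon.get(token) or lexicon.get(token.lower())
        if candidates:
            pos = candidates[0]      # take the most likely tag
        elif token.isdigit():
            pos = "CD"               # cardinal number
        elif token.endswith("ly"):
            pos = "RB"               # probably an adverb
        elif token.endswith("ing"):
            pos = "VBG"              # probably a gerund/participle
        elif token.endswith("ed"):
            pos = "VBN"              # probably a past participle
        else:
            pos = "NN"               # default guess: noun
        tagged.append((token, pos))
    return tagged

# "Utility" step: build the hash and pickle it.
blob = pickle.dumps(build_lexicon(SAMPLE_LEXICON.splitlines()))

# "Tagger" step: unpickle the hash and tag a sentence.
restored = pickle.loads(blob)
result = tag(tokenize("The dog barks loudly."), restored)
print(result)
```

The point of the split is that parsing the big lexicon file happens once, and the tagger only has to unpickle a ready-made dictionary at startup.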

I’ve made some additional tweaks to the versions I run and plan to port some of them to Python as well. If you’re interested in additions, add a comment and I’ll do my best to share/accommodate.

You can download my Python NLP Part-of-Speech Tagger here.

P.S./Caveat/blahblah:
This is my first anything outside of some Hello World stuff in Python. It definitely works, and does so at a decent clip (speed-wise), but I’m sure I could have done some of the operations a little more elegantly. Do leave comments, though, with recommendations/suggestions/!flames.

13 Responses to “Simple NLP Part-of-Speech tagger in Python”

  1. Brandon Says:

    This looks great, Jason… I’m not too familiar with Python, so how do I install this on my Linux Apache server? All I have is FTP access to it.

  2. jasonwiener Says:

    Hi Brandon-

    Thanks. You can run it from PHP if you only have FTP access. Here’s a good link to use as reference. Let me know if you need further help. -J.

    http://www.csh.rit.edu/~jon/projects/pip/

  3. Dan Says:

    Hi Jason,

    Where can I get the .NET (C#) version of the Simple NLP POS?

    Thanks

    Dan

  4. jasonwiener Says:

    Hi-

    It’s available above by clicking on Mark Watson’s name

    J.

  5. Joel Nothman Says:

    Hi Jason. It’s good to hear you’re getting into Python still, but I hope your code doesn’t look like *that* anymore!

    I’ve edited your code to make it a whole lot more pythonesque, and would like to send it to you, but feel that this textarea is not the place =)

  6. rona Says:

    Hi,

    I have downloaded the “Simple NLP Part-of-Speech tagger in Python”. I would like to integrate it in a c#.net program. Can you please guide me?

  7. NLP Exeter Says:

    Wow, very interested in this. I have tried to download it but the website is down. Can you post a new link, please?
    Kind Regards
    Mark
    NLP Exeter

  8. vignesh Says:

    Really fantastic work… I am doing a project on extracting the usefulness of reviews using NLP… Can I use your tool for this work?

  9. tabularasa Says:

    Is it difficult to split Brill_lexicon into two files to get the file-size down? I’m trying to use this on app engine and the file is over the max size limit

  10. David Says:

    This is exactly what I’ve been looking for! I’ve encountered one stumbling block though: I’m not sure what all the tags mean 😦 Some of them are obvious, but it would be great to get a full list. I know there are several different tagging schemes around for POS, so I thought I’d ask you directly: Which scheme does the tagger use? Thanks!

  11. Francis Hates Everything Says:

    […] the Digg stories into a single twitter post was accomplished with a really nice little POS tagger written in python that I came across. It tags all the words in the text I give it with their […]

  12. umaisagu Says:

    Hi Jason,

    I’ve tried your code using Python… what a brilliant program! I am looking for a C# version of this code and was referring to Mark Watson’s blog. Unfortunately, I couldn’t find a link to the Brill tagger he implemented.
