So yesterday, I decided to learn Python. I’ve been primarily a .NET guy for the last n years and had some people working in Python around me, but I was never inclined to try it out. DUH!!!! Such a nice language. It took a couple of minutes to get my bearings, but I figured…why not! Everyone in the Valley is so anti-MS and so pro-(Python, MySQL, PHP) that one needs to embrace the flow.
For the last couple of years I’ve been using a very simple yet (what I believe to be) strong POS tagger built by Mark Watson and based on Eric Brill’s work. Written in C#, it gave me a very straightforward paring knife to do tokenization and POS tagging quickly and easily in .NET. Now Monty Tagger and NLTK are definitely incredible resources for NLP in Python, but I wanted something very straightforward and portable, without all the bells and whistles, so I can build on the core myself. Not to mention I wanted something fun for my first outing in Python. Well…ta da! Here it is.
It’s comprised of two (count them 2) VERY simple source files. The first is the basic hashing and pickling utility if you want to make changes to the lexicon (I believe I’m using the same lexicon file as Monty Tagger), and the second is the actual tagger/tokenizer.
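For readers who want a feel for how those two pieces fit together, here is a rough sketch of the general idea, not the code in the download. It assumes a lexicon laid out as one word per line followed by its candidate tags, most likely tag first; the sample data and the names `build_lexicon`, `tokenize` and `tag` are made up for illustration. It only does the dictionary-lookup step and leaves out the contextual rules a full Brill-style tagger applies afterwards.

```python
import pickle
import re

# Hypothetical sample in the assumed "word tag1 tag2 ..." layout;
# a real lexicon file would have many thousands of such lines.
SAMPLE_LEXICON = """\
the DT
dog NN
barks VBZ NNS
loudly RB
. .
"""

def build_lexicon(text):
    """Map each word to its list of candidate tags, most likely first."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            lexicon[parts[0]] = parts[1:]
    return lexicon

def tokenize(sentence):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag(tokens, lexicon, default="NN"):
    """Tag each token with its first lexicon entry, else the default tag."""
    tagged = []
    for token in tokens:
        # Try the token as written, then lowercased (e.g. "The" -> "the").
        tags = lexicon.get(token) or lexicon.get(token.lower())
        tagged.append((token, tags[0] if tags else default))
    return tagged

# Round-trip through pickle, as a lexicon-building utility might do once
# so the tagger can load the dictionary quickly at startup.
blob = pickle.dumps(build_lexicon(SAMPLE_LEXICON))
lexicon = pickle.loads(blob)

print(tag(tokenize("The dog barks loudly."), lexicon))
```

Falling back to a noun tag for unknown words is just a common default chosen for this sketch; swap in whatever suits your data.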
I’ve made some additional tweaks to the versions I run and plan to port some of them to Python as well. If you’re interested in additions, add a comment and I’ll do my best to share/accommodate.
You can download my Python NLP Part-of-Speech Tagger here.
P.S./Caveat/blahblah:
This is my first anything outside of some Hello World stuff in Python. It definitely works, and does so at a decent clip (speed wise), but I’m sure I could have done some of the operations a little more elegantly. Leave comments though with recommendations/suggestions/!flames.
July 14, 2006 at 12:46 pm |
This looks great Jason… I’m not too familiar with Python, so how do I install this on my Linux Apache server? All I have is FTP access to it.
July 14, 2006 at 1:09 pm |
Hi Brandon-
Thanks. You can run it from PHP if you only have FTP access. Here’s a good link to use as reference. Let me know if you need further help. -J.
http://www.csh.rit.edu/~jon/projects/pip/
February 8, 2007 at 11:25 am |
Hi Jason,
Where can I get the .NET (C#) version of the Simple NLP POS?
Thanks
Dan
February 9, 2007 at 2:45 am |
Hi-
It’s available above by clicking on Mark Watson’s name
J.
April 6, 2008 at 3:44 pm |
Hi Jason. It’s good to hear you’re getting into Python still, but I hope your code doesn’t look like *that* anymore!
I’ve edited your code to make it a whole lot more pythonesque, and would like to send it to you, but feel that this textarea is not the place =)
April 11, 2008 at 12:48 pm |
Hi,
I have downloaded the “Simple NLP Part-of-Speech tagger in Python”. I would like to integrate it into a C# .NET program. Can you please guide me?
November 27, 2008 at 5:39 pm |
Wow, very interested in this. I have tried to download it but the website is down. Can you post a new link, please?
Kind Regards
Mark
NLP Exeter
December 5, 2008 at 4:05 pm |
Really fantastic work… I am doing a project on extracting the usefulness of reviews using NLP… Can I use your tool for this work?
December 10, 2008 at 8:08 pm |
Feel free, but if you plan on releasing the product commercially, please make sure to comply with the LGPL.
December 23, 2008 at 2:12 am |
Is it difficult to split Brill_lexicon into two files to get the file size down? I’m trying to use this on App Engine and the file is over the max size limit.
January 3, 2009 at 11:25 pm |
This is exactly what I’ve been looking for! I’ve encountered one stumbling block though: I’m not sure what all the tags mean
Some of them are obvious, but it would be great to get a full list. I know there are several different tagging schemes around for POS, so I thought I’d ask you directly: Which scheme does the tagger use? Thanks!
January 4, 2009 at 8:15 pm |
[...] the Digg stories into a single twitter post was accomplished with a really nice little POS tagger written in python that I came across. It tags all the words in the text I give it with their [...]