pyporter2
This is an implementation of the Porter2 (English) stemming algorithm in Python. It was born out of some academic work I did on clustering algorithms in the spring of 2008. The Porter stemming algorithm was first published in a 1980 paper and is now one of the most widely known and used stemming algorithms. An implementation of the original Porter stemmer already existed in Python, but not of the updated Porter2 stemmer, so I decided to implement a Python version of Porter2 as an exercise.
Note: Python bindings for the official C version of the Porter2 stemmer exist here. If using these bindings is an option, they will probably be much more efficient than the pure Python implementation here. pyporter2 is useful when the C bindings are not an option (as in Jython, IronPython, Babble or App Engine).
download
pyporter2 is open source, released under an MIT-style license. The latest version of pyporter2 is available here. To check out the source, install Git and run git clone git://github.com/mdirolf/pyporter2.git. The new API matches that of PyStemmer.
usage
Here is an example of how to use pyporter2:
>>> import Stemmer
>>> print Stemmer.algorithms()
['english']
>>> stemmer = Stemmer.Stemmer('english')
>>> print stemmer.stemWord('cycling')
cycl
>>> print stemmer.stemWords(['cycling', 'cyclist'])
['cycl', 'cyclist']
>>> print stemmer.stemWords(['cycling', u'cyclist'])
['cycl', u'cyclist']
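Since pyporter2 grew out of clustering work, here is a minimal sketch of one way to use the stems: grouping words that reduce to the same stem. The helper name group_by_stem is hypothetical and not part of pyporter2. It takes any callable that maps a word to its stem, so the sketch itself does not depend on a stemmer being installed.

```python
from collections import defaultdict


def group_by_stem(words, stem_word):
    """Group words that share a stem.

    group_by_stem is an illustrative helper, not part of pyporter2.
    stem_word is any callable that maps a word to its stem, for
    example the stemWord method of a Stemmer.Stemmer instance.
    """
    groups = defaultdict(list)
    for word in words:
        # Words with the same stem end up in the same list,
        # in the order they were given.
        groups[stem_word(word)].append(word)
    return dict(groups)


# With pyporter2 importable, usage would look like:
#   import Stemmer
#   stemmer = Stemmer.Stemmer('english')
#   group_by_stem(['cycling', 'cyclist'], stemmer.stemWord)
```

Going by the stems shown in the session above, 'cycling' would be filed under 'cycl' and 'cyclist' under 'cyclist', so the two words land in separate groups.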
testing
pyporter2 includes a test suite written with unittest. To run the tests, run python Stemmer.py.
questions
Feel free to contact me with any questions. It'd also be cool to let me know if you find pyporter2 useful for anything.