Thursday, February 15, 2007

Welcome to Morphological Analysis!

I am starting to look into morphological analysis.
The following tools appear to be available.

MALLET: A Machine Learning for Language Toolkit
http://mallet.cs.umass.edu/index.php/Main_Page
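A machine learning toolkit for language processing, developed by Andrew McCallum
Implementation language: Java
Features: CRFs, document classification, clustering, information extraction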

CRF Project Page
http://crf.sourceforge.net/
Developed by Sunita Sarawagi
Implemented in Java

FlexCRF
http://www.jaist.ac.jp/~hieuxuan/flexcrfs/flexcrfs.html
Developed by Xuan-Hieu Phan and Le-Minh Nguyen of the Japan Advanced Institute of Science and Technology (JAIST)

CRF++
http://chasen.org/taku/software/CRF++/
Developed by Taku Kudo

MeCab
http://mecab.sourceforge.jp/
The first morphological analyzer to adopt CRFs
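
To make concrete what a morphological analyzer of this kind does at analysis time, here is a minimal sketch of minimum-cost lattice search (Viterbi) over a toy dictionary. It is not MeCab's actual code or API. The lexicon, the part-of-speech labels, and every cost value below are hypothetical and hand-picked for illustration; in a CRF-based analyzer such costs would be derived from learned weights.

```python
# Toy lexicon: surface form -> (part of speech, word cost).
# All entries and costs are hypothetical, chosen only for illustration.
LEXICON = {
    "すもも": ("noun", 3),
    "もも": ("noun", 2),
    "も": ("particle", 1),
    "の": ("particle", 1),
    "うち": ("noun", 2),
}

# Hypothetical connection costs between adjacent parts of speech.
CONNECTION = {
    ("BOS", "noun"): 0, ("BOS", "particle"): 3,
    ("noun", "particle"): 0, ("particle", "noun"): 0,
    ("noun", "noun"): 2, ("particle", "particle"): 3,
}


def segment(text):
    """Return the lowest-cost list of (word, pos) pairs, or None if no path exists."""
    # best[i][pos] = (total cost, backpointer) of the cheapest path that
    # ends at character offset i with a word tagged pos.
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["BOS"] = (0, None)

    for start in range(len(text)):
        for prev_pos, (prev_cost, _) in best[start].items():
            for end in range(start + 1, len(text) + 1):
                word = text[start:end]
                if word not in LEXICON:
                    continue
                pos, word_cost = LEXICON[word]
                cost = prev_cost + CONNECTION[(prev_pos, pos)] + word_cost
                if pos not in best[end] or cost < best[end][pos][0]:
                    best[end][pos] = (cost, (start, prev_pos, word))

    if not best[-1]:
        return None  # the toy lexicon cannot cover the whole input

    # Follow backpointers from the cheapest final state to the start.
    i = len(text)
    pos = min(best[-1], key=lambda p: best[-1][p][0])
    result = []
    while best[i][pos][1] is not None:
        start, prev_pos, word = best[i][pos][1]
        result.append((word, pos))
        i, pos = start, prev_pos
    result.reverse()
    return result


if __name__ == "__main__":
    for word, pos in segment("すもももももももものうち"):
        print(word, pos)
```

With these toy costs, the sentence すもももももももものうち is split into すもも / も / もも / も / もも / の / うち, because alternating nouns and particles avoids the noun-noun and particle-particle connection penalties.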

Natural language processing software distributed by The Stanford Natural Language Processing Group at Stanford University:
The Stanford Parser
Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser. Includes: Parser FAQ and Online parser demo.
The Stanford POS Tagger
A Java implementation of a maximum-entropy part-of-speech (POS) tagger
The Stanford Named Entity Recognizer
A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition.
Stanford Chinese Word Segmenter
A Java implementation of a CRF-based Chinese Word Segmenter
The Stanford Classifier
A Java implementation of conditional loglinear model classification (a.k.a. maximum entropy or multiclass logistic regression models)
Tregex and Tsurgeon
A Java implementation of a Tgrep2-style utility for matching patterns in trees, and a tree-transformation utility built on top of this matching language.
