Free lexicon for NLP tasks...

Troman...
Posted: Sat Jul 04, 2009 2:03 pm
Guest
Hello, I'm working on a natural language processing engine in Prolog
and I'm in need of a free (or maybe a cheap, but good) lexicon. The
primary task is part-of-speech tagging, but additional lexical
information is always welcome. The format of the lexicon is not very
important, as long as it is possible to manually convert it to some
different format.

The only more or less big free lexicon I have found is the WordNet
lexicon, but it is not well suited for tasks like part-of-speech
tagging. Is anyone aware of a better lexicon?
I would appreciate any help.
 
Ian Parker...
Posted: Sun Jul 05, 2009 12:28 pm
Guest
On 4 July, 15:03, Troman <na.troman... at (no spam) googlemail.com> wrote:
Quote:
> Hello, I'm working on a natural language processing engine in Prolog
> and I'm in need of a free (or maybe a cheap, but good) lexicon. The
> primary task is part-of-speech tagging, but additional lexical
> information is always welcome. The format of the lexicon is not very
> important, as long as it is possible to manually convert it to some
> different format.
>
> The only more or less big free lexicon I have found is the wordnet
> lexicon, but it is not well suited for tasks like part-of-speech
> tagging. Is anyone aware of a better lexicon?
> I would appreciate any help.

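You don't in fact need to know the meaning of words to tag them. The
CLAWS tagger does this task. The links are below, followed by a sample:
your own post as tagged by the free trial service (the tags appear to
be from the C5 tagset).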

http://ucrel.lancs.ac.uk/claws/
http://ucrel.lancs.ac.uk/cgi-bin/claws7.pl Free samples

Hello_ITJ ,_, I_PNP 'm_VBB working_VVG on_PRP a_AT0 natural_AJ0 language_NN1 processing_NN1 engine_NN1 in_PRP Prolog_NP0 and_CJC I_PNP 'm_VBB in_PRP31 need_PRP32 of_PRP33 a_AT0 free_AJ0 (_( or_CJC maybe_AV0 a_AT0 cheap_AJ0 ,_, but_CJC good_AJ0 )_) lexicon_NN1 ._.
The_AT0 primary_AJ0 task_NN1 is_VBZ part-of-speech_AJ0 tagging_NN1 ,_, but_CJC additional_AJ0 lexical_AJ0 information_NN1 is_VBZ always_AV0 welcome_VVB ._.
The_AT0 format_NN1 of_PRF the_AT0 lexicon_NN1 is_VBZ not_XX0 very_AV0 important_AJ0 ,_, as_CJS31 long_CJS32 as_CJS33 it_PNP is_VBZ possible_AJ0 to_TO0 manually_AV0 convert_VVI it_PNP to_PRP some_DT0 different_AJ0 format_NN1 ._.
The_AT0 only_AV0 more_AV0 or_CJC less_AV0 big_AJ0 free_AJ0 lexicon_NN1 I_PNP have_VHB found_VVN is_VBZ the_AT0 wordnet_NN1 lexicon_NN1 ,_, but_CJC it_PNP is_VBZ not_XX0 well_AV0 suited_VVN for_PRP tasks_NN2 like_PRP part-of-speech_AJ0 tagging_NN1 ._.
Is_VBZ anyone_PNI aware_AJ0 of_PRF a_AT0 better_AJC lexicon_NN1 ?_?
I_PNP would_VM0 appreciate_VVI any_DT0 help_NN1 ._.

This is the standard of tagging you will be up against.
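
For anyone who wants to turn tagged output like the above into lexicon
entries, here is a minimal Python sketch. It assumes every token has
the form word_TAG, as in the sample; the names parse_tagged and
build_lexicon are made up for illustration and are not part of CLAWS.

```python
from collections import defaultdict

def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last underscore so a word containing '_' stays intact.
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs

def build_lexicon(pairs):
    """Map each lowercased word to the set of tags it was seen with."""
    lexicon = defaultdict(set)
    for word, tag in pairs:
        lexicon[word.lower()].add(tag)
    return dict(lexicon)

# Two sentences copied from the tagged sample.
sample = (
    "Is_VBZ anyone_PNI aware_AJ0 of_PRF a_AT0 better_AJC lexicon_NN1 ?_? "
    "I_PNP would_VM0 appreciate_VVI any_DT0 help_NN1 ._."
)
pairs = parse_tagged(sample)
lexicon = build_lexicon(pairs)
```

A lexicon built this way only covers words that occur in the tagged
text, so it complements a proper word list rather than replacing one.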

- Ian Parker
 
Ulrich Koch...
Posted: Mon Jul 06, 2009 3:23 pm
Guest
On 4 July, 15:03, Troman <na.troman... at (no spam) googlemail.com> wrote:

Quote:
> Hello, I'm working on a natural language processing engine in Prolog
> and I'm in need of a free (or maybe a cheap, but good) lexicon. The
> primary task is part-of-speech tagging, but additional lexical
> information is always welcome.

Maybe the CELEX lexical database is right for you. Unfortunately,
the project ended; I don't even have a pointer to it anymore.
Perhaps searching for "CELEX" or "Centre for Lexical Information"
(it is/was at the Max Planck Institute for Psycholinguistics at
Nijmegen, The Netherlands) will help.

Regards,
Ulli
--
Ulrich Koch, computer scientist and computational linguist
Universität Koblenz-Landau, Universitätsstraße 1, 56070 Koblenz, Germany
Room A 219, tel. +49 (0)261 287-2616
http://www.uni-koblenz.de/~koch/
 
Bob Bechtel...
Posted: Tue Jul 07, 2009 3:01 am
Guest
Ulrich Koch wrote:
Quote:
> On 4 July, 15:03, Troman <na.troman... at (no spam) googlemail.com> wrote:
>
>> Hello, I'm working on a natural language processing engine in Prolog
>> and I'm in need of a free (or maybe a cheap, but good) lexicon. The
>> primary task is part-of-speech tagging, but additional lexical
>> information is always welcome.
>
> Maybe the CELEX lexical database is right for you. Unfortunately,
> the project ended; I don't even have a pointer to it anymore.
> Perhaps searching for "CELEX" or "Centre for Lexical Information"
> (it is/was at the Max Planck Institute for Psycholinguistics at
> Nijmegen, The Netherlands) will help.
>
> Regards,
> Ulli

The CELEX2 data is available from the Linguistic Data Consortium
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96L14) -
not free, but not terribly expensive either, for what you get (English,
German, and Dutch).

For free, you can get Grady Ward's Moby Part of Speech list for English,
through Project Gutenberg: http://www.gutenberg.org/etext/3203
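
Since the original question was about converting a lexicon into
another format, here is a rough Python sketch that turns lines of a
word list like the Moby one into Prolog facts. Everything specific in
it is an assumption to check against the actual download: that each
line is a word, a one-character delimiter (a backslash here) and a
string of one-letter part-of-speech codes; the code table, which is
written from memory of the Moby documentation; and the predicate name
lex/2, which is hypothetical.

```python
# ASSUMED code table, recalled from the Moby documentation; verify before use.
MOBY_CODES = {
    "N": "noun", "p": "plural", "h": "noun_phrase",
    "V": "verb_participle", "t": "verb_transitive", "i": "verb_intransitive",
    "A": "adjective", "v": "adverb", "C": "conjunction", "P": "preposition",
    "!": "interjection", "r": "pronoun", "D": "definite_article",
    "I": "indefinite_article", "o": "nominative",
}

def prolog_atom(word):
    """Quote a word as a Prolog atom, escaping backslashes and single quotes."""
    escaped = word.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def moby_line_to_facts(line, delimiter="\\"):
    """Turn one 'word<delimiter>codes' line into lex/2 facts (hypothetical predicate)."""
    # Split on the last delimiter; lines without one are skipped.
    word, sep, codes = line.rstrip("\r\n").rpartition(delimiter)
    if not sep or not word:
        return []
    facts = []
    for code in codes:
        pos = MOBY_CODES.get(code)
        if pos is not None:  # unknown codes are ignored rather than guessed
            facts.append("lex(%s, %s)." % (prolog_atom(word), pos))
    return facts

def convert(lines, delimiter="\\"):
    """Yield Prolog facts for every line of an iterable of lexicon lines."""
    for line in lines:
        yield from moby_line_to_facts(line, delimiter)
```

To use it, pass an open file (or any list of lines) to convert and
write the resulting facts to a .pl file. The delimiter and the file
encoding may differ between distributions of the list, which is why
the delimiter is a parameter.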
 
Troman...
Posted: Tue Jul 07, 2009 5:10 pm
Guest
Thanks, everyone. You have helped me a great deal.
 
 
All times are GMT