Towards a new open-source parser

Speaker(s): Agnès Souque
Event type: Conference talk
Level: Advanced
Date: Friday 10 July 2009
Time: 14:20
Duration: 20 minutes
Language: English
Location: Room E202 - Ireste

Open-source tools poorly suited to French

LanguageTool, the open-source grammar checker designed by Daniel Naber that we studied, structures its text-parsing processes in successive layers, as do the other open-source tools of the same type. This creates a vicious circle that leads to an incorrect morphosyntactic analysis of the text and thus to erroneous detection of mistakes. Moreover, it detects mistakes by pattern matching, with the patterns of mistakes described in correction rules. This principle causes an exponential increase in the number of rules that have to be written.
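The growth in the number of rules can be illustrated with a small counting sketch. It is not LanguageTool's rule format: the tag set (two genders, two numbers) and the helper names `error_patterns` and `agrees` are assumptions made for the example. It counts how many distinct tag sequences would have to be listed as mistakes for a group of words that must agree, and contrasts that with a single agreement expectation.

```python
from itertools import product

# Hypothetical, simplified tag set: each word carries a (gender, number) pair.
GENDERS = ("m", "f")
NUMBERS = ("sg", "pl")
TAGS = list(product(GENDERS, NUMBERS))  # 4 possible tags per word


def error_patterns(n_words):
    """List every tag sequence of length n_words in which the words disagree.

    This mimics an approach that enumerates mistakes one pattern at a time.
    """
    return [seq for seq in product(TAGS, repeat=n_words) if len(set(seq)) > 1]


def agrees(seq):
    """Single expectation replacing the whole list: all words share one tag."""
    return len(set(seq)) == 1


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "words:", len(error_patterns(n)), "error patterns to enumerate")
```

With this toy tag set the number of faulty sequences is 4^n - 4 for n agreeing words, whereas the expectation "all words agree" stays a single check regardless of n. Real rule formats can compress some patterns, so the figures only indicate the trend.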

A new "left-to-right" analysis model

To overcome the limits of tools like LanguageTool, we plan, as part of our PhD thesis, to create a "left-to-right" parser that parses the text as it moves through it, from left to right, rather than in successive layered processes. Tagging relies on an adaptation of the open-source lexicon Dicollecte. Inconsistencies are then detected by declaring syntactic expectations, rather than by enumerating the possible mistakes as the systems previously studied do. In parallel processes, sentences are segmented into chunks, which serve as computational windows within and between which unification is calculated. Intelligible feedback then explains the inconsistency detected and what was expected.
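The idea of reading a chunk from left to right while unifying features against expectations can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser described in the talk: the `LEXICON` entries are a hand-written toy (not the Dicollecte format), the chunk is given as an already segmented noun group, and the names `unify` and `check_chunk` are hypothetical.

```python
# Toy lexicon with hypothetical entries; this is NOT the Dicollecte format.
LEXICON = {
    "le": {"cat": "det", "gender": "m", "number": "sg"},
    "la": {"cat": "det", "gender": "f", "number": "sg"},
    "les": {"cat": "det", "number": "pl"},  # gender left unspecified
    "petit": {"cat": "adj", "gender": "m", "number": "sg"},
    "petite": {"cat": "adj", "gender": "f", "number": "sg"},
    "petites": {"cat": "adj", "gender": "f", "number": "pl"},
    "chat": {"cat": "noun", "gender": "m", "number": "sg"},
    "maison": {"cat": "noun", "gender": "f", "number": "sg"},
    "maisons": {"cat": "noun", "gender": "f", "number": "pl"},
}

# Features that must agree inside the chunk.
AGREEMENT = ("gender", "number")


def unify(expected, found):
    """Unify the expectations built so far with the features of a new word.

    Returns (merged, clashes). On a clash the earlier expectation is kept,
    so later words are still compared with what the left context announced.
    """
    merged = dict(expected)
    clashes = []
    for feat in AGREEMENT:
        if feat not in found:
            continue  # the word is unspecified for this feature
        if feat in merged and merged[feat] != found[feat]:
            clashes.append((feat, merged[feat], found[feat]))
        else:
            merged[feat] = found[feat]
    return merged, clashes


def check_chunk(words):
    """Read one chunk left to right and return feedback messages."""
    expected = {}
    messages = []
    for word in words:
        expected, clashes = unify(expected, LEXICON[word])
        for feat, want, got in clashes:
            messages.append(f"'{word}': expected {feat}={want}, found {feat}={got}")
    return messages


if __name__ == "__main__":
    for chunk in (["la", "petite", "maison"], ["les", "petite", "maisons"]):
        print(" ".join(chunk), "->", check_chunk(chunk) or "no inconsistency")
```

In this sketch no list of mistakes is written anywhere: the only declaration is that gender and number must unify inside the chunk, and the feedback states both what was expected and what was found. Unification between chunks, also mentioned above, is not covered here.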

Open source resources for research

The source code of our tool will be open source, as will the language resources, i.e. the different lexica, which will be freely accessible and modifiable, and the linguistic resources, which consist of the different kinds of rules for detecting syntactic inconsistencies. These linguistic resources will be generic enough to suit other languages, giving the tool a multilingual dimension. One of the results of the processes is a syntactic analysis of any text, applicable in many different domains: grammar checking, but also language learning (help in understanding learners' mistakes), information retrieval and extraction, or any text processing that requires a robust preliminary analysis.

Authors: Agnès Souque, Thomas Lebarbé

Speaker: Agnès Souque

Agnès Souque is a second-year PhD student in natural language processing at the LIDILEM laboratory, Grenoble 3. Her thesis is about grammar checking. She aims to build a free grammar checker for French that is as easy as possible to adapt to other languages and that can be plugged into various free software packages, such as OpenOffice.org. Her PhD supervisor, Thomas Lebarbé, is co-author of the abstract.

Attached documents

Slides (text - 235.2 KB)