Set up for Capitalization #16

Open: wants to merge 4 commits into base: master
7 changes: 6 additions & 1 deletion Makefile.am
@@ -12,7 +12,8 @@ TARGETS_COMMON = \
$(LANG1).autogen.att.gz \
$(LANG1).autopgen.bin \
$(LANG1).rlx.bin \
- $(LANG1).lrx.bin
+ $(LANG1).lrx.bin \
+ $(LANG1).crx.bin

# Use this goal for creating .deps, otherwise make -j2 will give problems:
@ap_include@
@@ -42,6 +43,10 @@ $(LANG1).automorf.att.gz: $(LANG1).automorf.bin
$(LANG1).autopgen.bin: $(BASENAME).post-$(LANG1).dix
	lt-comp lr $< $@

$(LANG1).crx.bin: $(BASENAME).$(LANG1).crx
	apertium-validate-crx $<
	apertium-compile-caps $< $@

###############################################################################
## Morph disambiguation rules
###############################################################################
25 changes: 25 additions & 0 deletions apertium-spa.spa.crx
@@ -0,0 +1,25 @@
<?xml version="1.0"?>
<capitalization>
  <rules>
    <rule weight="2.0">
      <or>
        <begin/>
        <match tags="sent"/>
      </or>
      <match select="Aa"/>
    </rule>
    <rule weight="0.5">
      <match tags="np.*" select="Aa"/>
    </rule>
    <rule weight="3.0">
      <or>
        <match trglem="aA*" select="dix"/>
        <match trglem="*aA*" select="dix"/>
        <match trglem="AA" select="dix"/>
      </or>
    </rule>
    <rule weight="0.1">
      <match select="aa"/>
    </rule>
  </rules>
</capitalization>
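For readers unfamiliar with the `.crx` layout, the following is a minimal Python sketch that only lists each rule's weight and the `select` values it contains. It does not reproduce how `apertium-restore-caps` applies or combines the weights; the helper name `summarize_rules` is hypothetical and not part of any Apertium tool.

```python
import xml.etree.ElementTree as ET

# Copy of the rules file added in this PR, embedded so the sketch is self-contained.
CRX = """<?xml version="1.0"?>
<capitalization>
  <rules>
    <rule weight="2.0">
      <or>
        <begin/>
        <match tags="sent"/>
      </or>
      <match select="Aa"/>
    </rule>
    <rule weight="0.5">
      <match tags="np.*" select="Aa"/>
    </rule>
    <rule weight="3.0">
      <or>
        <match trglem="aA*" select="dix"/>
        <match trglem="*aA*" select="dix"/>
        <match trglem="AA" select="dix"/>
      </or>
    </rule>
    <rule weight="0.1">
      <match select="aa"/>
    </rule>
  </rules>
</capitalization>
"""


def summarize_rules(crx_text):
    """Return (weight, [select values]) for each <rule>, in document order.

    Hypothetical helper for inspection only; it ignores rule semantics.
    """
    root = ET.fromstring(crx_text)
    summary = []
    for rule in root.iter("rule"):
        weight = float(rule.get("weight"))
        # Collect the select attribute of every <match> that has one.
        selects = [m.get("select") for m in rule.iter("match") if m.get("select")]
        summary.append((weight, selects))
    return summary


for weight, selects in summarize_rules(CRX):
    print(weight, selects)
```

Listing the rules this way makes it easy to see that the `dix` rule carries the highest weight in the file and the blanket `aa` rule the lowest.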
2 changes: 1 addition & 1 deletion configure.ac
@@ -4,7 +4,7 @@ AC_INIT([Apertium Spanish], [1.3.0], [[email protected]], [ap
AM_INIT_AUTOMAKE
AC_PROG_AWK

- PKG_CHECK_MODULES(APERTIUM, apertium >= 3.8.3)
+ PKG_CHECK_MODULES(APERTIUM, apertium >= 3.9.0)
PKG_CHECK_MODULES(APERTIUM_LEX_TOOLS, apertium-lex-tools >= 0.4.2)
PKG_CHECK_MODULES(LTTOOLBOX, lttoolbox >= 3.7.1)
PKG_CHECK_MODULES(CG3, cg3 >= 1.3.9)
18 changes: 18 additions & 0 deletions modes.xml
@@ -67,4 +67,22 @@
    </pipeline>
  </mode>

  <mode name="spa-caps" install="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="spa.automorf.bin"/>
      </program>
      <program name="apertium-tagger -g $2 -p">
        <file name="spa.prob"/>
      </program>
      <program name="apertium-extract-caps"/>
      <program name="lt-proc -b">
        <file name="spa.autogen.bin"/>
      </program>
      <program name="apertium-restore-caps">
        <file name="spa.crx.bin"/>
      </program>
    </pipeline>
  </mode>

</modes>
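To make the order of stages in the new `spa-caps` mode easier to read, here is a rough Python sketch that flattens the `<pipeline>` element into a list of command strings. It simply concatenates each program name with its file arguments; the real mode generator also resolves file paths and substitutes variables such as `$2`, which this sketch leaves untouched. The function name `pipeline_commands` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The mode definition added in this PR, embedded so the sketch is self-contained.
MODE = """<mode name="spa-caps" install="yes">
  <pipeline>
    <program name="lt-proc -w">
      <file name="spa.automorf.bin"/>
    </program>
    <program name="apertium-tagger -g $2 -p">
      <file name="spa.prob"/>
    </program>
    <program name="apertium-extract-caps"/>
    <program name="lt-proc -b">
      <file name="spa.autogen.bin"/>
    </program>
    <program name="apertium-restore-caps">
      <file name="spa.crx.bin"/>
    </program>
  </pipeline>
</mode>"""


def pipeline_commands(mode_xml):
    """Return one command string per <program>, in pipeline order.

    Hypothetical helper: no path resolution and no variable substitution.
    """
    mode = ET.fromstring(mode_xml)
    commands = []
    for program in mode.iter("program"):
        parts = [program.get("name")]
        parts += [f.get("name") for f in program.findall("file")]
        commands.append(" ".join(parts))
    return commands


print(" | ".join(pipeline_commands(MODE)))
```

Read left to right, the stages are morphological analysis, tagging, case extraction, generation, and case restoration using the compiled `spa.crx.bin`.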
12 changes: 12 additions & 0 deletions test/spa-caps-biltrans-expected.txt
@@ -0,0 +1,12 @@
[5UFZV9O4uqpz#0] [[c:Aa/aa]]^mío<det><pos><mf><sg>/mi$ [[c:aa/aa]]^amigo<n><m><sg>/amigo$ [[c:aa/aa]]^escribir<vblex><pri><p3><sg>/escribe$ [[c:aa/aa]]^carta<n><f><pl>/cartas$ [[c:aa/aa]]^todo<predet><m><pl>/todos$ [[c:aa/aa]]^el<det><def><m><pl>/los$ [[c:aa/aa]]^día<n><m><pl>/días$[[c:aa/aa]]^.<sent>/.$
[/5UFZV9O4uqpz]
[Bz-zHO_KBjFw#0] [[c:aa/aa]]^iPhone<np><al>/iPhone$ [[c:aa/aa]]^ser<vbser><pri><p3><sg>/es$ [[c:aa/aa]]^uno<det><ind><m><sg>/un$ [[c:aa/aa]]^dispositivo<n><m><sg>/dispositivo$ [[c:aa/aa]]^popular<adj><mf><sg>/popular$[[c:aa/aa]]^.<sent>/.$
[/Bz-zHO_KBjFw]
[P8Lww91E9w3h#0] [[c:Aa/Aa]]^Maria<np><ant>/Maria$ [[c:aa/aa]]^visitar<vblex><pri><p3><sg>/visita$ [[c:Aa/Aa]]^Ana<np><ant>/Ana$ [[c:aa/aa]]^todo<predet><m><pl>/todos$ [[c:aa/aa]]^el<det><def><m><pl>/los$ [[c:aa/aa]]^fin<n><m><pl># de semana/fines de semana$[[c:aa/aa]]^.<sent>/.$
[/P8Lww91E9w3h]
[kevUt7u2wQI4#0] [[c:Aa/Aa]]^YouTube<np><al>/YouTube$ [[c:aa/aa]]^permitir<vblex><pri><p3><sg>/permite$ [[c:aa/aa]]^compartir<vblex><inf>/compartir$ [[c:aa/aa]]^video<n><m><pl>/videos$[[c:aa/aa]]^.<sent>/.$
[/kevUt7u2wQI4]
[ssK-AMO_ymJ3#0] [[c:Aa/Aa]]^Linkedin<np><al>/Linkedin$ [[c:aa/aa]]^ser<vbser><pri><p3><sg>/es$ [[c:aa/aa]]^unir<vblex><imp><p3><sg>/una$ [[c:aa/aa]]^red<n><f><sg>/red$ [[c:aa/aa]]^social<adj><mf><sg>/social$[[c:aa/aa]]^.<sent>/.$
[/ssK-AMO_ymJ3]
[xKcQ6y3roz0V#0] [[c:Aa/aa]]^el<det><def><f><sg>/~la$ [[c:AA/AA]]^OMS<n><acr><f><sg>/OMS$ [[c:aa/aa]]^haber<vbhaver><pri><p3><sg>/ha$ [[c:aa/aa]]^lanzar<vblex><pp><m><sg>/lanzado$ [[c:aa/aa]]^unir<vblex><imp><p3><sg>/una$ [[c:aa/aa]]^nuevo<adj><f><sg>/nueva$ [[c:aa/aa]]^campaña<n><f><sg>/campaña$[[c:aa/aa]]^.<sent>/.$
[/xKcQ6y3roz0V]
12 changes: 12 additions & 0 deletions test/spa-caps-decase-expected.txt
@@ -0,0 +1,12 @@
[5UFZV9O4uqpz#0] [[c:Aa/aa]]^mío<det><pos><mf><sg>$ [[c:aa/aa]]^amigo<n><m><sg>$ [[c:aa/aa]]^escribir<vblex><pri><p3><sg>$ [[c:aa/aa]]^carta<n><f><pl>$ [[c:aa/aa]]^todo<predet><m><pl>$ [[c:aa/aa]]^el<det><def><m><pl>$ [[c:aa/aa]]^día<n><m><pl>$[[c:aa/aa]]^.<sent>$
[/5UFZV9O4uqpz]
[Bz-zHO_KBjFw#0] [[c:aa/aa]]^iPhone<np><al>$ [[c:aa/aa]]^ser<vbser><pri><p3><sg>$ [[c:aa/aa]]^uno<det><ind><m><sg>$ [[c:aa/aa]]^dispositivo<n><m><sg>$ [[c:aa/aa]]^popular<adj><mf><sg>$[[c:aa/aa]]^.<sent>$
[/Bz-zHO_KBjFw]
[P8Lww91E9w3h#0] [[c:Aa/Aa]]^Maria<np><ant>$ [[c:aa/aa]]^visitar<vblex><pri><p3><sg>$ [[c:Aa/Aa]]^Ana<np><ant>$ [[c:aa/aa]]^todo<predet><m><pl>$ [[c:aa/aa]]^el<det><def><m><pl>$ [[c:aa/aa]]^fin<n><m><pl># de semana$[[c:aa/aa]]^.<sent>$
[/P8Lww91E9w3h]
[kevUt7u2wQI4#0] [[c:Aa/Aa]]^YouTube<np><al>$ [[c:aa/aa]]^permitir<vblex><pri><p3><sg>$ [[c:aa/aa]]^compartir<vblex><inf>$ [[c:aa/aa]]^video<n><m><pl>$[[c:aa/aa]]^.<sent>$
[/kevUt7u2wQI4]
[ssK-AMO_ymJ3#0] [[c:Aa/Aa]]^Linkedin<np><al>$ [[c:aa/aa]]^ser<vbser><pri><p3><sg>$ [[c:aa/aa]]^unir<vblex><imp><p3><sg>$ [[c:aa/aa]]^red<n><f><sg>$ [[c:aa/aa]]^social<adj><mf><sg>$[[c:aa/aa]]^.<sent>$
[/ssK-AMO_ymJ3]
[xKcQ6y3roz0V#0] [[c:Aa/aa]]^el<det><def><f><sg>$ [[c:AA/AA]]^OMS<n><acr><f><sg>$ [[c:aa/aa]]^haber<vbhaver><pri><p3><sg>$ [[c:aa/aa]]^lanzar<vblex><pp><m><sg>$ [[c:aa/aa]]^unir<vblex><imp><p3><sg>$ [[c:aa/aa]]^nuevo<adj><f><sg>$ [[c:aa/aa]]^campaña<n><f><sg>$[[c:aa/aa]]^.<sent>$
[/xKcQ6y3roz0V]
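When reviewing expected files like the two above, it can help to pull out just the `[[c:X/Y]]` marker attached to each lexical unit. The following Python sketch does that with a regular expression. What the two values in the marker denote is not stated in this PR, so the sketch only extracts them and makes no claim about their meaning; `case_markers` is a hypothetical helper name.

```python
import re

# Matches "[[c:X/Y]]^lemma" and captures X, Y and the lemma (up to the first tag).
TOKEN = re.compile(r"\[\[c:([^/\]]+)/([^\]]+)\]\]\^([^<$]+)")


def case_markers(line):
    """Return (lemma, first_value, second_value) for each marked unit in a line."""
    return [(m.group(3), m.group(1), m.group(2)) for m in TOKEN.finditer(line)]


# Excerpt from the last sentence of the decase expected output.
sample = (
    "[[c:Aa/aa]]^el<det><def><f><sg>$ "
    "[[c:AA/AA]]^OMS<n><acr><f><sg>$ "
    "[[c:aa/aa]]^haber<vbhaver><pri><p3><sg>$"
)

for lemma, first, second in case_markers(sample):
    print(lemma, first, second)
```

Running it over a whole expected file gives a compact per-token view that is easier to diff by eye than the full stream.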
6 changes: 6 additions & 0 deletions test/spa-caps-input.txt
@@ -0,0 +1,6 @@
Mi amigo escribe cartas todos los días.
Maria visita Ana todos los fines de semana.
iPhone es un dispositivo popular.
La OMS ha lanzado una nueva campaña.
LinkedIn es una red social.
YouTube permite compartir videos.
12 changes: 12 additions & 0 deletions test/spa-caps-morph-expected.txt
@@ -0,0 +1,12 @@
[5UFZV9O4uqpz#0] ^Mi/mío<det><pos><mf><sg>$ ^amigo/amigo<adj><m><sg>/amigo<n><m><sg>/amigar<vblex><pri><p1><sg>$ ^escribe/escribir<vblex><pri><p3><sg>/escribir<vblex><imp><p2><sg>$ ^cartas/carta<n><f><pl>$ ^todos/todo<predet><m><pl>/todo<prn><tn><m><pl>$ ^los/el<det><def><m><pl>/lo<prn><pro><p3><m><pl>$ ^días/día<n><m><pl>$^./.<sent>$
[/5UFZV9O4uqpz]
[Bz-zHO_KBjFw#0] ^iPhone/iPhone<np><al>$ ^es/ser<vbser><pri><p3><sg>$ ^un/uno<det><ind><m><sg>$ ^dispositivo/dispositivo<adj><m><sg>/dispositivo<n><m><sg>$ ^popular/popular<adj><mf><sg>$^./.<sent>$
[/Bz-zHO_KBjFw]
[P8Lww91E9w3h#0] ^Maria/Maria<np><ant>/Maria<np><loc>$ ^visita/visita<n><f><sg>/visitar<vblex><pri><p3><sg>/visitar<vblex><imp><p2><sg>$ ^Ana/Ana<np><ant>$ ^todos/todo<predet><m><pl>/todo<prn><tn><m><pl>$ ^los/el<det><def><m><pl>/lo<prn><pro><p3><m><pl>$ ^fines de semana/fin<n><m><pl># de semana$^./.<sent>$
[/P8Lww91E9w3h]
[kevUt7u2wQI4#0] ^YouTube/YouTube<np><al>$ ^permite/permitir<vblex><pri><p3><sg>/permitir<vblex><imp><p2><sg>$ ^compartir/compartir<vblex><inf>$ ^videos/video<n><m><pl>$^./.<sent>$
[/kevUt7u2wQI4]
[ssK-AMO_ymJ3#0] ^LinkedIn/Linkedin<np><al>$ ^es/ser<vbser><pri><p3><sg>$ ^una/uno<num><f><sp>/uno<prn><tn><f><sg>/uno<det><ind><f><sg>/unir<vblex><prs><p3><sg>/unir<vblex><prs><p1><sg>/unir<vblex><imp><p3><sg>$ ^red/red<n><f><sg>/_prefix_re_d<n><f><sg>$ ^social/social<adj><mf><sg>$^./.<sent>$
[/ssK-AMO_ymJ3]
[xKcQ6y3roz0V#0] ^La/el<det><def><f><sg>/lo<prn><pro><p3><f><sg>$ ^OMS/OMS<n><acr><f><sg>$ ^ha/haber<vbhaver><pri><p3><sg>$ ^lanzado/lanzado<adj><m><sg>/lanzar<vblex><pp><m><sg>$ ^una/uno<num><f><sp>/uno<prn><tn><f><sg>/uno<det><ind><f><sg>/unir<vblex><prs><p3><sg>/unir<vblex><prs><p1><sg>/unir<vblex><imp><p3><sg>$ ^nueva/nuevo<adj><f><sg>$ ^campaña/campaña<n><f><sg>$^./.<sent>$
[/xKcQ6y3roz0V]
12 changes: 12 additions & 0 deletions test/spa-caps-recase-expected.txt
@@ -0,0 +1,12 @@
[5UFZV9O4uqpz#0] Mi amigo escribe cartas todos los días.
[/5UFZV9O4uqpz]
[Bz-zHO_KBjFw#0] iPhone es un dispositivo popular.
[/Bz-zHO_KBjFw]
[P8Lww91E9w3h#0] Maria visita Ana todos los fines de semana.
[/P8Lww91E9w3h]
[kevUt7u2wQI4#0] YouTube permite compartir videos.
[/kevUt7u2wQI4]
[ssK-AMO_ymJ3#0] Linkedin es una red social.
[/ssK-AMO_ymJ3]
[xKcQ6y3roz0V#0] ~La OMS ha lanzado una nueva campaña.
[/xKcQ6y3roz0V]
12 changes: 12 additions & 0 deletions test/spa-caps-tagger-expected.txt
@@ -0,0 +1,12 @@
[5UFZV9O4uqpz#0] ^Mi/mío<det><pos><mf><sg>$ ^amigo/amigo<n><m><sg>$ ^escribe/escribir<vblex><pri><p3><sg>$ ^cartas/carta<n><f><pl>$ ^todos/todo<predet><m><pl>$ ^los/el<det><def><m><pl>$ ^días/día<n><m><pl>$^./.<sent>$
[/5UFZV9O4uqpz]
[Bz-zHO_KBjFw#0] ^iPhone/iPhone<np><al>$ ^es/ser<vbser><pri><p3><sg>$ ^un/uno<det><ind><m><sg>$ ^dispositivo/dispositivo<n><m><sg>$ ^popular/popular<adj><mf><sg>$^./.<sent>$
[/Bz-zHO_KBjFw]
[P8Lww91E9w3h#0] ^Maria/Maria<np><ant>$ ^visita/visitar<vblex><pri><p3><sg>$ ^Ana/Ana<np><ant>$ ^todos/todo<predet><m><pl>$ ^los/el<det><def><m><pl>$ ^fines de semana/fin<n><m><pl># de semana$^./.<sent>$
[/P8Lww91E9w3h]
[kevUt7u2wQI4#0] ^YouTube/YouTube<np><al>$ ^permite/permitir<vblex><pri><p3><sg>$ ^compartir/compartir<vblex><inf>$ ^videos/video<n><m><pl>$^./.<sent>$
[/kevUt7u2wQI4]
[ssK-AMO_ymJ3#0] ^LinkedIn/Linkedin<np><al>$ ^es/ser<vbser><pri><p3><sg>$ ^una/unir<vblex><imp><p3><sg>$ ^red/red<n><f><sg>$ ^social/social<adj><mf><sg>$^./.<sent>$
[/ssK-AMO_ymJ3]
[xKcQ6y3roz0V#0] ^La/el<det><def><f><sg>$ ^OMS/OMS<n><acr><f><sg>$ ^ha/haber<vbhaver><pri><p3><sg>$ ^lanzado/lanzar<vblex><pp><m><sg>$ ^una/unir<vblex><imp><p3><sg>$ ^nueva/nuevo<adj><f><sg>$ ^campaña/campaña<n><f><sg>$^./.<sent>$
[/xKcQ6y3roz0V]
4 changes: 4 additions & 0 deletions test/tests.json
@@ -3,6 +3,10 @@
"input": "spa-input.txt",
"mode": "spa-tagger"
},
"spa-caps": {
"input": "spa-caps-input.txt",
"mode": "spa-caps"
},
"grep": {
"input": null,
"command": "bash -x test/test-grep.sh"