rules-applier segmentation fault, stalling #17

Open

MemduhG opened this issue Apr 19, 2019 · 17 comments
@MemduhG
Collaborator

MemduhG commented Apr 19, 2019

I am trying to train the ambiguous system on Ubuntu 18.04, working in the Kyrgyz to Turkish (kir->tur) direction. I am using Wikipedia dump files and following along with the instructions.

When I first ran rules-applier, it output just a 0 and had not stopped after 4-5 hours, at which point I cancelled it.

'/home/memduh/git/apertium-ambiguous/src'/rules-applier 'ky_KG' '../../apertium-tur-kir.kir-tur.t1x' sentences.txt lextor.txt rulesOut.txt
0

Upon running it again, it began to produce output like:

161150
ruleId=0
tokId=0 , out = ^unknown<unknown>{^*Новосибир$}$
tokId=1 , out = ^default<default>{^milli<adj>$}$
tokId=2 , out = ^default<default>{^üniversite<n><px3sp><acc>$}$
tokId=3 , out = ^unknown<unknown>{^*Новосибирск$}$
tokId=4 , out = ^unknown<unknown>{^*ш$}$
tokId=5 , out = ^default<default>{^.<sent>$}$


tokId = 0 : *Новосибир
ruleId = 0; patNum = 1

tokId = 1 : milli
ruleId = 0; patNum = 1

tokId = 2 : üniversite
ruleId = 0; patNum = 1

tokId = 3 : *Новосибирск
ruleId = 0; patNum = 1

tokId = 4 : *ш
ruleId = 0; patNum = 1

tokId = 5 : .
ruleId = 0; patNum = 1

tok=0; rul=0; pat=1 - tok=1; rul=0; pat=1 - tok=2; rul=0; pat=1 - tok=3; rul=0; pat=1 - tok=4; rul=0; pat=1 - tok=5; rul=0; pat=1 - 

However, when I cancelled this and ran it again, it started giving me segmentation faults, like below:

'/home/memduh/git/apertium-ambiguous/src'/rules-applier 'ky_KG' '../../apertium-tur-kir.kir-tur.t1x' sentences.txt lextor.txt rulesOut.txt
0
Makefile:48: recipe for target 'rulesOut.txt' failed
make: *** [rulesOut.txt] Segmentation fault (core dumped)
make: *** Deleting file 'rulesOut.txt'

After waiting for a while and trying again, it now seems to be back in the first situation, stalling with no output. Should it be producing and writing out the output as it goes, or is there some long training period before it does this?

@aboelhamd
Collaborator

Hi Memduh,

I think you are using a copy where we didn't remove the debugging traces, sorry for that.

For the errors on Ubuntu 18, it gave me the same behaviour too, and I think it's because of the compiler version.
While debugging, it complained about some hash map accesses, and since we were working on other issues, I just worked on another machine with Ubuntu 16. I didn't try to downgrade or to solve the problem.
Also, for the results I see, where no rules are applied to any of the tokens, I think there is something wrong with the transfer file. Did you put an id attribute in the rule elements?

For the run time of rules-applier, I think it should take less than 10 minutes for a 1 MB file.
For the segmenter, did you use our script or run it on your own? There are some undesired characters that we also remove with the segmenter script.

@MemduhG
Collaborator Author

MemduhG commented Apr 19, 2019

FYI, my input file is about 114 megabytes. I used the Kazakh pragmatic segmenter for the Kyrgyz wiki (with the .rb script from your repo) and it seemed to work quite alright, though I am not sure whether there are any characters that are causing problems. The rules only have comments, not IDs; they are defined like <rule comment="regla: nom-noflex">. I should add IDs to them in the t1x file if that is necessary.
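For illustration only (the id value here is just a placeholder, not taken from the pair), a rule that currently reads <rule comment="regla: nom-noflex"> would become something like <rule id="nom-noflex" comment="regla: nom-noflex"> once an id is added. Assuming each opening <rule ...> tag sits on its own line, a rough way to count how many rules still lack an id is:

grep -c '<rule ' apertium-tur-kir.kir-tur.t1x            # total number of rule elements
grep -c '<rule[^>]* id="' apertium-tur-kir.kir-tur.t1x   # rule elements that already carry an id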

I tried again with a file containing the first thousand lines of the sentences, and after some attempts the system fairly quickly output results for the whole corpus I had originally tried (all 948096 lines), which makes me wonder whether the program is copying or caching the files it uses somewhere.

All the rules it chooses are "default", but I will try again once I assign IDs.

@sevilaybayatli
Owner

I don't know if the Kazakh segmenter (we prepared the Kazakh segmenter based only on Kazakh data) works well with Kyrgyz too. You should add rule IDs; it is necessary.

@aboelhamd
Collaborator

Yes, it's necessary to add IDs. There is a script that adds IDs to the rules here.

As for the caching question: the program doesn't cache or copy the files it uses. Honestly, I don't know why it outputs results for the original file; maybe you passed the sentences file with 1000 sentences but the lextor file with the 948096 sentences?

Theoretically, the program should work for very large files, but I prefer splitting the input into files of about 10 MB each; they are easier to debug and less prone to faults.
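A quick way to check whether the two input files really have the same number of sentences, and to do the suggested splitting (file names and the chunk size are only illustrative):

wc -l sentences.txt lextor.txt                   # both files should report the same number of lines
split -l 50000 -d sentences.txt sentences.part.  # cut the corpus into numbered chunks of 50000 sentences
split -l 50000 -d lextor.txt lextor.part.        # same -l value keeps the lextor chunks aligned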

If you can wait for an hour or so, I will provide you with another rules-applier file without the debugging traces.

@MemduhG
Collaborator Author

MemduhG commented Apr 19, 2019

Ah, I see. Yes, the makefile I wrote didn't regenerate all of that, of course. I will add IDs, start again with a more reasonable file size, and see what happens.

But when executing rules-applier, should it immediately begin to output the debug traces, or just wait for a long time? I guess it was only working correctly when it was outputting them?

@aboelhamd
Collaborator

Yes, your guess is correct.

@aboelhamd
Collaborator

You can use this file instead. And also use the yasmet-formatter file in the same repo.

They have one change besides removing the debug traces: the sentences file is no longer needed in the input, because it actually has no use.

Try again and keep us updated with your results.

@sevilaybayatli
Owner

sevilaybayatli commented Apr 20, 2019 via email

@sevilaybayatli
Owner

Regarding the rules-applier, it has been updated in this repository.

@MemduhG
Collaborator Author

MemduhG commented Apr 22, 2019

You can use this file instead. And also use the yasmet-formatter file in the same repo.

They have one change besides removing the debug traces: the sentences file is no longer needed in the input, because it actually has no use.

Try again and keep us updated with your results.

I added IDs and got segfaults very often:

'/home/memduh/git/apertium-ambiguous/src'/rules-applier 'ky_KG' '../../apertium-tur-kir.kir-tur.t1x' sentences.txt lextor.txt rulesOut.txt
0
Makefile:48: recipe for target 'rulesOut.txt' failed
make: *** [rulesOut.txt] Segmentation fault (core dumped)
make: *** Deleting file 'rulesOut.txt'

Trying it enough times makes it work eventually, but it seems completely random whether it will work or not. I think I am doing something wrong with the IDs. I have added them, as one can see in https://github.com/apertium/apertium-tur-kir/blob/master/apertium-tur-kir.kir-tur.t1x, but I keep getting the 0 ruleId issue, and the output file is full of "defaults", as you can see here: https://termbin.com/xjiv
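One way to pin down where the crash comes from, assuming gdb is installed, is to run the same command under the debugger and print a backtrace when it faults:

gdb --args '/home/memduh/git/apertium-ambiguous/src'/rules-applier 'ky_KG' '../../apertium-tur-kir.kir-tur.t1x' sentences.txt lextor.txt rulesOut.txt
(gdb) run
(gdb) bt    # prints the call stack at the point of the segmentation fault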

@sevilaybayatli
Owner

sevilaybayatli commented Apr 22, 2019 via email

@sevilaybayatli
Owner

sevilaybayatli commented Apr 22, 2019 via email

@MemduhG
Collaborator Author

MemduhG commented Apr 22, 2019 via email

@sevilaybayatli
Owner

It worked for me without any problem for both language pairs, kaz-tur and spa-eng. As I said, try running rules-applier in the same directory rather than using a path. By the way, rules-applier has also been updated in this repository.

And be sure you are using the right transfer file, the one with IDs.

@aboelhamd
Collaborator

Hi, Memduh.
I still think this arbitrary behaviour is because of the compiler version, since by now you have done everything right.
Maybe you should try downgrading the compiler to Ubuntu 16's version.
Today, God willing, I will try running it on some kir text; maybe there is a bug in generalising to other pairs.
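A minimal sketch of that downgrade on Ubuntu 18.04, assuming g++-5 (the Ubuntu 16.04 default) is available from the standard repositories and that the project's Makefile honours the CXX variable:

sudo apt install g++-5    # GCC/G++ 5 is the default toolchain on Ubuntu 16.04
make clean
make CXX=g++-5            # rebuild rules-applier with the older compiler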

@sevilaybayatli
Owner

sevilaybayatli commented Apr 22, 2019 via email

@aboelhamd
Collaborator

Hi @MemduhG, I tried running rules-applier on the same corpus you have, and I got "default" almost everywhere. In the next few days I will try to debug the code to find the problem. I will keep you updated.
