rules-applier segmentation fault, stalling #17
Comments
Hi Memduh, I think you are using a copy where we didn't remove the debugging traces, sorry for that. As for the errors on Ubuntu 18: it gave me the same behaviour too, and I think it's because of the compiler version. As for the run time of rules-applier, it should take less than 10 minutes for a 1 MB file. |
FYI, my input file is about 114 megabytes. I used the Kazakh pragmatic segmenter for the Kyrgyz wiki (with the .rb script from your repo) and it seemed to work quite well, though I am not sure whether any characters are causing problems. The rules only have comments, not IDs; they are defined like I tried again with a file containing the first thousand lines of the sentences, and after some attempts the system fairly quickly output something with results for all of the corpus I had originally tried (all 948096 lines), which makes me wonder whether the program is copying/caching the files it uses somewhere? All the rules it chooses are "default", but I will try again once I assign IDs. |
I don't know if the Kazakh segmenter (we prepared the Kazakh segmenter based only on Kazakh data) works well with Kyrgyz too; you should add rule IDs, it is necessary. |
Yes, it's necessary to add IDs. There is a script that adds IDs to the rules here. For the caching problem: the program doesn't cache or copy the files it uses; honestly I don't know why it outputs results for the original file. Maybe you entered the sentences file with 1000 sentences but the lextor file with the 948096 sentences? Theoretically the program should work for very large files, but I prefer splitting files into 10 MB pieces or so; it's easier to debug and less prone to faults. If you can wait for an hour or so, I will provide you another rules-applier file without the debugging traces. |
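The "split into ~10 MB pieces" advice above can be sketched with GNU coreutils `split`; the file names and sizes here are illustrative, not the project's actual conventions:

```shell
# Stand-in input so the snippet is self-contained; in practice
# lextor.txt would be the large lextor output file.
printf 'sentence one\nsentence two\n' > lextor.txt

# -C 10M cuts only at line boundaries, so no sentence is broken across
# two chunks; -d gives numeric suffixes: lextor.part.00, lextor.part.01, ...
split -C 10M -d lextor.txt lextor.part.

ls lextor.part.*
```

Each chunk can then be run through rules-applier separately and the outputs concatenated, which also makes it easier to isolate the chunk that triggers a crash.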
Ah I see, yes, the makefile I wrote didn't regenerate all of that, of course. I will add IDs, start again with a more reasonable file size and see what happens. But when executing rules-applier, should it immediately begin to output the debug stuff, or just wait for a long time? I guess it was only working correctly when it was outputting them? |
Yes, your guess is correct. |
You can use this file instead, and also use the yasmet-formatter file in the same repo. They have one change besides removing the debug traces: the sentences file is no longer needed in the input, because it actually has no use. Try again and keep us updated with your results. |
To find out the inputs for each program, just run it on the command line with no arguments, for example ./rules-applier; it will print the required inputs for rules-applier. This works for all of them.
|
As for the rules-applier, it has been updated in this repository. |
I added IDs and got segfaults very often:
'/home/memduh/git/apertium-ambiguous/src'/rules-applier 'ky_KG' '../../apertium-tur-kir.kir-tur.t1x' sentences.txt lextor.txt rulesOut.txt
0
Makefile:48: recipe for target 'rulesOut.txt' failed
make: *** [rulesOut.txt] Segmentation fault (core dumped)
make: *** Deleting file 'rulesOut.txt'
Trying it enough times makes it work eventually, but it seems completely random whether it will work or not. I think I am doing something wrong with the IDs; I have added them as one can see in https://github.com/apertium/apertium-tur-kir/blob/master/apertium-tur-kir.kir-tur.t1x but I keep getting the 0 ruleID issue, and the output file is full of "defaults", as you can see here: https://termbin.com/xjiv |
Your IDs look fine in your transfer file. Which rules-applier file did you use, the one in this repository? If so, did compiling give you any problem? Another option: try running rules-applier in the same directory, without giving any path. |
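Since the thread stresses that each transfer rule needs an ID, here is a hypothetical sketch of what a rule carrying one might look like. The `id` attribute name, its placement, and the numbering scheme are assumptions for illustration; check the id-adding script mentioned above and the linked apertium-tur-kir.kir-tur.t1x for the exact form the tools expect.

```xml
<!-- Hypothetical sketch only: attribute name ("id") and numbering are
     assumptions; verify against the id-adding script and the real file. -->
<rule comment="nom nom" id="1">
  <pattern>
    <pattern-item n="nom"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <!-- transfer actions here -->
  </action>
</rule>
```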
You don't need to give sentences.txt as input either.
It has 4 inputs: localeId, transferFilePath, lextorFilePath, interInFilePath.
|
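Putting together the two hints above (run the tool with no arguments to see what it needs; it takes exactly four inputs), a hypothetical wrapper illustrates the calling convention. The function name, messages, and argument handling are assumptions for illustration, not the real binary's behaviour:

```shell
# Hypothetical sketch of the usage pattern described above: with the
# wrong number of arguments, print the required inputs and fail;
# with exactly four, proceed. The real rules-applier output differs.
run_rules_applier() {
  if [ "$#" -ne 4 ]; then
    echo "usage: rules-applier localeId transferFilePath lextorFilePath interInFilePath" >&2
    return 1
  fi
  # Here the real tool would read the transfer file and lextor output.
  echo "applying rules with locale=$1 transfer=$2 lextor=$3 out=$4"
}

run_rules_applier ky_KG apertium-tur-kir.kir-tur.t1x lextor.txt rulesOut.txt
```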
I didn't swap out the rules-applier with aboelhamd's; I can try that and not give it the sentences. I don't think that's what's causing the zero ID/default problem though.
There were no compilation errors that I can remember when building the ambiguous project.
|
It worked for me without any problem for both language pairs, kaz-tur and spa-eng. As I said, try running rules-applier in the same directory, not using a path; by the way, rules-applier was also updated in this repository. And be sure you are using the right transfer file, the one with IDs. |
Hi, Memduh. I still think this arbitrary behaviour is because of the compiler version, because by now you have done everything right. Maybe you should try downgrading the compiler to the Ubuntu 16 version. Today, God willing, I will try running it on some kir text; maybe there is a bug in the generalisation to other pairs. |
I think the problem is with the Ubuntu version (18); from my previous experience it gave me a segfault too. Now please try it with 16. |
Hi @MemduhG , I tried running rules-applier on the same corpus you have, and I got "default" almost everywhere. In the next few days I will try to debug the code to find what the problem is. I will keep you updated. |
I am trying to train the ambiguous system on Ubuntu 18.04, working in the Kyrgyz to Turkish (kir->tur) direction. I am using Wikipedia dump files and following along the
When I first ran rules-applier, it output just a 0 and did not stop; after 4-5 hours I cancelled it.
Upon running it again, it began to produce output like:
However, when I cancelled this and ran it again, it started to give me segmentation faults, like below:
Waiting for a while and trying again, it now seems to be back in the first situation, stalling with no output. Should it be producing/writing the output as it goes, or is there some long training period before it does this?