Support infinite weights in lt-comp/lt-proc #62

AMR-KELEG · 2019-06-12T05:54:20Z

We need to implement a way to represent infinite weights.
The current outcome is strange: in the example below, the arc compiled with weight inf is printed back by lt-print as -2.000000.

$ cat sample.att
0       1       a       b       2
1       2       b       c       1
1       2       c       d       inf
2       0

$ lt-comp lr sample.att sa.bin
main@standard 3 3

$ lt-print sa.bin
0       1       a       b       1.000000
1       2       b       c       2.000000
1       2       c       d       -2.000000
2       0.000000

flammie · 2019-06-12T16:38:23Z

I think functions like atof and strtod should just work with inf as a string. Inf is not the most useful weight though, given that inf + x is inf for all x. I think at least openfst just decides to bounce when it sees an inf arc (considering it a non-arc); hfst also prints, in xerox mode, +? as the analysis with weight inf, etc.
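To illustrate the strtod point, here is a minimal sketch. The helper name `parse_weight` is hypothetical and is not taken from lttoolbox; it only shows that `std::strtod` accepts `inf` (in any letter case) and that adding a finite value to the result still gives inf.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>

// Hypothetical helper (not an lttoolbox function): parse a weight column
// with std::strtod. Returns true and stores the value only if the whole
// string was consumed as a number.
bool parse_weight(const std::string& text, double& out) {
    const char* begin = text.c_str();
    char* end = nullptr;
    double value = std::strtod(begin, &end);
    if (end == begin || *end != '\0') {
        return false;  // no number found, or trailing garbage after it
    }
    out = value;
    return true;
}
```

Calling `parse_weight("inf", w)` is expected to succeed and leave `w` as positive infinity, so a parser built this way would not need a special case for the last column of the .att file.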

For OOVs it's good enough to have a reasonably high non-inf number; for more advanced implementations one can calculate probability estimates such as https://en.wikipedia.org/wiki/Additive_smoothing, https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing and so forth.

AMR-KELEG · 2019-06-15T20:16:05Z

Well, using Laplace smoothing will solve the problem while ensuring that OOV tokens get the highest -log(P) value.
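A minimal sketch of how such a weight could be computed, assuming add-alpha (Laplace) smoothing over token counts. The function name `smoothed_weight` and its parameters are illustrative and do not come from lttoolbox.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: weight -log(P) under add-alpha (Laplace) smoothing.
// count: how often the token was seen (0 for an OOV token)
// total: number of token occurrences in the training data
// vocab: number of distinct token types, including one slot for OOV
double smoothed_weight(std::size_t count, std::size_t total,
                       std::size_t vocab, double alpha = 1.0) {
    assert(alpha > 0.0 && vocab > 0);
    double p = (count + alpha) / (total + alpha * vocab);
    return -std::log(p);  // finite, because p > 0 even when count == 0
}
```

With these assumptions an unseen token (count 0) gets the largest weight of any token, but the weight stays finite, so no inf needs to be stored in the transducer.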

OTOH, lt-print doesn't seem to show inf weights, as seen above.
I am now convinced that an edge with an infinite weight isn't that useful in most FSTs.

flammie · 2019-06-18T10:20:59Z

> Well, using Laplace smoothing will solve the problem while ensuring that OOV tokens get the highest -log(P) value.

Yes, that should be good.

> OTOH, lt-print doesn't seem to show inf weights, as seen above.
> I am now convinced that an edge with an infinite weight isn't that useful in most FSTs.

Yeah, so infinite weights in the tropical semiring are mainly good for theoretical constructions like graph completion (where every state must have a transition for every symbol). You could check the code to see where the inf parsing/printing/handling goes awry, since theoretically it should be possible to support it, but it's not a high priority at all.
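For readers unfamiliar with the tropical semiring, a small sketch of why inf plays the role of "no arc": path weights are summed and alternatives are resolved by taking the minimum, so a path through an inf arc never wins. All names here are illustrative, and the non-negativity check is an assumption rather than something lttoolbox enforces.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Tropical semiring over doubles: "addition" keeps the cheaper alternative,
// "multiplication" accumulates weights along a path.
const double kTropicalZero = std::numeric_limits<double>::infinity();
const double kTropicalOne = 0.0;

double tropical_plus(double a, double b) { return std::min(a, b); }
double tropical_times(double a, double b) { return a + b; }

// Weight of one path: the tropical product of its arc weights.
// Assumption for this sketch: arc weights are non-negative (inf allowed).
double path_weight(const std::vector<double>& arcs) {
    double w = kTropicalOne;
    for (double a : arcs) {
        assert(a >= 0.0);
        w = tropical_times(w, a);
    }
    return w;
}
```

In graph completion, the arcs added to make every state total would carry `kTropicalZero`, which leaves the best-path result unchanged.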

mr-martian · 2022-07-06T22:34:55Z

I believe the issue here is not with lt-comp but with the way floating point numbers are written in the current file format: the functions used in compression.cc to disassemble doubles give unspecified results when applied to inf (https://en.cppreference.com/w/cpp/numeric/math/frexp).
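A sketch of the frexp pitfall and one way to guard against it. The `Parts` struct and `split_double` function are hypothetical and are not the code in compression.cc; the point is only that the exponent written by `std::frexp` is unspecified for inf, so the value has to be checked before decomposing it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for a decomposed double.
struct Parts {
    bool finite;      // false for inf and NaN
    double mantissa;  // holds the original value when finite is false
    int exponent;
};

Parts split_double(double value) {
    if (!std::isfinite(value)) {
        // std::frexp leaves the exponent unspecified here, so skip it.
        return {false, value, 0};
    }
    int exponent = 0;
    double mantissa = std::frexp(value, &exponent);
    // For finite input the mantissa is 0 or has magnitude in [0.5, 1).
    assert(mantissa == 0.0 ||
           (std::fabs(mantissa) >= 0.5 && std::fabs(mantissa) < 1.0));
    return {true, mantissa, exponent};
}
```

A writer built on something like this would need a separate on-disk representation for the non-finite case, which is what the rest of the thread discusses.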

TinoDidriksen · 2022-07-07T08:22:20Z

We can reserve 0xFFFFFFFF 0xFFFFFFFF as inf. But is -inf meaningful?

flammie · 2022-07-22T11:53:15Z

I think the tropical semiring weight structures we use are only well defined in R+ including positive infinity. They may kind of work with negative values, and I guess one could interpret a path with negative infinity as an unconditional top suggestion...

TinoDidriksen · 2022-07-22T16:14:09Z

Implemented by reserving 0xFFFFFFFF 0xFFFFFFFF as inf and 0xFFFFFFFF 0xFFFFFFFE as -inf.
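The idea of reserving word pairs can be sketched as follows. This is not the code from the commit: the finite-value layout (a 30-bit scaled mantissa followed by the exponent), the reading of the two values as a (mantissa, exponent) pair of 32-bit words, and the names `encode_weight` and `decode_weight` are all assumptions made for illustration. Only the two reserved pairs come from the comment above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Assumed layout: first word = mantissa, second word = exponent.
using Encoded = std::pair<std::uint32_t, std::uint32_t>;

const Encoded kPosInf{0xFFFFFFFFu, 0xFFFFFFFFu};  // reserved for inf
const Encoded kNegInf{0xFFFFFFFFu, 0xFFFFFFFEu};  // reserved for -inf

const double kScale = 1073741824.0;  // 2^30, an assumed mantissa scale

Encoded encode_weight(double value) {
    assert(!std::isnan(value));
    if (std::isinf(value)) {
        return value > 0 ? kPosInf : kNegInf;
    }
    int exponent = 0;
    double mantissa = std::frexp(value, &exponent);
    // |mantissa| is 0 or in [0.5, 1), so the scaled value is never -1 and
    // a finite weight cannot collide with the 0xFFFFFFFF sentinel word.
    auto scaled = static_cast<std::int32_t>(mantissa * kScale);
    return {static_cast<std::uint32_t>(scaled),
            static_cast<std::uint32_t>(static_cast<std::int32_t>(exponent))};
}

double decode_weight(const Encoded& e) {
    if (e == kPosInf) return INFINITY;
    if (e == kNegInf) return -INFINITY;
    auto scaled = static_cast<std::int32_t>(e.first);
    auto exponent = static_cast<std::int32_t>(e.second);
    return std::ldexp(static_cast<double>(scaled) / kScale, exponent);
}
```

In this sketch finite weights keep only about 30 bits of mantissa, so a round trip is exact only for values that fit in that precision; the sentinels round-trip exactly.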

ICU u_sscanf() only supports all-upper INF and -INF, and will print all-upper. So the first quirk was adding a special-case parse for lower-case inf and -inf.

See if that breaks anything.

TinoDidriksen added a commit that referenced this issue Jul 22, 2022

Support infinite weights in lt-comp (see #62)

b860ffa
