Sync dependencies #347

lippserd · 2021-08-12T07:32:37Z

At the moment dependencies are not synchronized in Redis and the database and there is no idea for the storage scheme yet.

Since dependencies represent trees (or even graphs?), They can be difficult to store in the database.

At the moment I can think of the following ways to store the dependencies in the database:

Associate parent
Nested set
Materialized path
Adjacency list / matrix
Closure table

I'm sure there's more.

All schemes must be compared in terms of complexity and performance when writing and reading.

I think it also makes sense to store the parent dependencies directly for the objects. This would allow us to display these in the detail areas in Web without much effort.

julianbrost · 2022-01-26T14:19:24Z

Another idea: don't sync dependencies to the relational database at all but instead only keep them in Redis. Whenever Web wants to display something about dependencies, the required graph algorithms are implemented as Lua scripts and executed on the Redis server.

Only downside with this: you couldn't use the dependency information in filters then (trying to support this would probably require some nasty work merging result sets from Redis and SQL), but things like showing all (reverse) dependencies of a checkable or rendering nice dependency graphs starting from a checkable should work nicely.

There could also be some combined solution where we think about what filtering on dependencies should be supported, write information specifically for these queries to the relational database and use Lua for everything else.

nilmerg · 2024-08-23T12:16:50Z

My proposal, which fits nicely with what is planned in Web, uses primarily recursive common table expressions to query connected nodes in either direction.

Required database versions:

MySQL 8
MariaDB 10.2.2
PostgreSQL 8.4.

The related tables consist of the following structures:

Adjacency Matrix (dependency_edge)

Schema Additions

Tables

dependency
A dependency's config

column	type	null	key	fk	comment
id	binary(20)	❌	primary	-	-
name	text	❌	-	-	-
display_name	text	❌	-	-	-
redundancy_group_id	binary(20)	✅	-	redundancy_group.id	-
disable_checks	bool_enum	❌	-	-	not required by Web, yet
disable_notifications	bool_enum	❌	-	-	not required by Web, yet
ignore_soft_states	bool_enum	❌	-	-	not required by Web, yet
timeperiod_id	binary(20)	✅	-	timeperiod.id	not required by Web, yet
states	tinyuint	❌	-	-	not required by Web, yet

dependency_state
The state of a dependency

column	type	null	key	fk
id	binary(20)	❌	primary	-
dependency_id	binary(20)	❌	-	dependency.id
failed	bool_enum	❌	-	-

redundancy_group
A redundancy group's config

column	type	null	key	fk	comment
id	binary(20)	❌	primary	-	`sha1('redundancy_group\|group-name\|children-names')` (Without the prefix, this might not be unique enough)
name	text	❌	-	-	The value of the `redundancy_group` dependency property suffixed by names of all related children
display_name	text	❌	-	-	The value of the `redundancy_group` dependency property

redundancy_group_state
The state of a redundancy group

column	type	null	key	fk
id	binary(20)	❌	primary	-
redundancy_group_id	binary(20)	❌	-	redundancy_group.id
failed	bool_enum	❌	-	-

dependency_node
A node referred to or referred by another node (i.e one that's part of a dependency relationship)

column	type	null	key	fk
id	binary(20)	❌	primary	-
host_id	binary(20)	✅	unique1	host.id
service_id	binary(20)	✅	unique1	service.id
redundancy_group_id	binary(20)	✅	unique2	redundancy_group.id

dependency_edge
All direct dependency relationships (from = child, to = parent)

column	type	null	key	fk	comment
from_node_id	binary(20)	❌	unique1	dependency_node.id	-
to_node_id	binary(20)	❌	unique1	dependency_node.id	-
dependency_id	binary(20)	✅	-	dependency.id	`NULL` in case `to_node_id` refers to a redundancy_group node

Relations

erDiagram
    dependency_node ||--|| host : is
    dependency_node ||--|| service : is
    dependency_node ||--|| redundancy_group : is
    dependency_edge }|--|{ dependency_node : "refers to / referred by"
    dependency_edge ||--|| dependency : "declared by / declares"
    dependency ||--|| dependency_state : has
    dependency }|--|| redundancy_group : "member of/has members"
    dependency }|--|| timeperiod : "uses / used by"
    redundancy_group ||--|| redundancy_group_state : has

Example Queries

Root Problems

WITH RECURSIVE UnreachableNodes AS (
    SELECT id as node_id,
           0                      as height,
           cast('' as binary(20)) as ref_node_id,
           dependency_node.host_id,
           cast('' as binary(20)) as service_id,
           cast('' as binary(20)) as dependency_id,
           cast('' as binary(20)) as redundancy_group_id
    FROM dependency_node
    WHERE dependency_node.host_id = '<host-id>'
      and dependency_node.service_id is null

    UNION ALL

    SELECT e.to_node_id,
           r.height + 1,
           e.from_node_id,
           tn.host_id,
           tn.service_id,
           e.dependency_id,
           tn.redundancy_group_id
    FROM dependency_edge e
             INNER JOIN UnreachableNodes r ON e.from_node_id = r.node_id
             INNER JOIN dependency_node tn ON e.to_node_id = tn.id
             LEFT JOIN host_state hs on hs.host_id = tn.host_id
             LEFT JOIN service_state ss on ss.service_id = tn.service_id
             LEFT JOIN dependency_state ds ON e.dependency_id = ds.dependency_id
             LEFT JOIN redundancy_group_state rgs on rgs.redundancy_group_id = tn.redundancy_group_id
    WHERE ds.failed = 'y'
       or rgs.failed = 'y'
       or hs.is_reachable = 'n'
       or ss.is_reachable = 'n')
SELECT rn.node_id,
       rn.height,
       rn.ref_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM UnreachableNodes rn
         LEFT JOIN host h on rn.host_id = h.id
         LEFT JOIN service s on rn.service_id = s.id
         LEFT JOIN dependency d on rn.dependency_id = d.id
         LEFT JOIN redundancy_group rg on rn.redundancy_group_id = rg.id;

Affected Children

WITH RECURSIVE UnreachableNodes AS (
    SELECT id as node_id,
           0                      as depth,
           cast('' as binary(20)) as ref_node_id,
           dependency_node.host_id,
           cast('' as binary(20)) as service_id,
           cast('' as binary(20)) as dependency_id,
           cast('' as binary(20)) as redundancy_group_id
    FROM dependency_node
    WHERE dependency_node.host_id = '<host-id>'
      and dependency_node.service_id is null

    UNION ALL

    SELECT e.from_node_id,
           r.depth + 1,
           e.to_node_id,
           fn.host_id,
           fn.service_id,
           e.dependency_id,
           fn.redundancy_group_id
    FROM dependency_edge e
             INNER JOIN UnreachableNodes r ON e.to_node_id = r.node_id
             INNER JOIN dependency_node fn ON e.from_node_id = fn.id
             LEFT JOIN host_state hs on hs.host_id = fn.host_id
             LEFT JOIN service_state ss on ss.service_id = fn.service_id
             LEFT JOIN dependency_state ds ON e.dependency_id = ds.dependency_id
             LEFT JOIN redundancy_group_state rgs on rgs.redundancy_group_id = fn.redundancy_group_id
    WHERE ds.failed = 'y'
       or rgs.failed = 'y'
       or hs.is_reachable = 'n'
       or ss.is_reachable = 'n')
SELECT rn.node_id,
       rn.depth,
       rn.ref_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM UnreachableNodes rn
         LEFT JOIN host h on rn.host_id = h.id
         LEFT JOIN service s on rn.service_id = s.id
         LEFT JOIN dependency d on rn.dependency_id = d.id
         LEFT JOIN redundancy_group rg on rn.redundancy_group_id = rg.id;

Direct Parents

SELECT e.to_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on tn.host_id = h.id
         LEFT JOIN service s on tn.service_id = s.id
         LEFT JOIN dependency d on e.dependency_id = d.id
         LEFT JOIN redundancy_group rg on tn.redundancy_group_id = rg.id
WHERE fn.host_id = '<host-id>'
  and fn.service_id is null;

Direct Children

SELECT e.from_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on fn.host_id = h.id
         LEFT JOIN service s on fn.service_id = s.id
         LEFT JOIN dependency d on e.dependency_id = d.id
         LEFT JOIN redundancy_group rg on fn.redundancy_group_id = rg.id
WHERE tn.host_id = '<host-id>'
  and tn.service_id is null;

julianbrost · 2024-08-23T13:24:46Z

dependency_state The state of the dependency's parent
state int ❌ - -

That's redundant with the host/service state information and I believe it also doesn't provide the information you hope for. I think you might want a failed boolean instead, given that the Icinga 2 Dependency object has attributes like states and ignore_soft_states that affect when a dependency counts as failed, i.e. marking the child object unreachable.

Similarly for redundancy groups. After all, they are just a set of dependencies that counts as failed if all individual dependencies are failed.

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge. Imagine a dependency structure like the following (boxes are hosts, arrows are dependencies), scale up the top and bottom level and the size of this table will grow quadratically in the size of the Icinga 2 config.

graph TD;
    A1-->B;
    A2-->B;
	A3-->B;
	A4-->B;
    B-->C1;
    B-->C2;
	B-->C3;
	B-->C4;

So if this table is necessary, I see three options:

Introduce some new limitation to the Icinga 2 config, something like any object must have at most N total (direct and indirect) parents.
Have a similar check, but don't enforce it for the whole Icinga 2 config, but instead, skip writing rows into this table, i.e. it may be incomplete.
Ignore the problem and if someone builds an unfortunate dependency structure, the whole system will fall over.

All direct dependency relationships (from = child, to = parent)

Why not simply stick to "child" and "parent"?

The related tables consist of the following structures:

Adjacency Matrix

Closure Table

Transitive Closure Table

How do these map to the tables given?

Also, can you please give an example how you'd expect these additional virtual nodes to be generated? At least with redundancy groups, that's not obvious.

nilmerg · 2024-08-23T13:53:47Z

I think you might want a failed boolean instead

Yeah, I thought of this, but figured this would be noted sooner or later, as it happend now ;)

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

How do these map to the tables given?

Eh. There is no simple closure table. Added the names to the other two.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

julianbrost · 2024-08-23T14:11:11Z

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

To be honest, I don't know what would happen exactly, it's quite possible that things will get slower with lots of dependencies. But so far, it does not expand all indirect dependencies, so it's not obvious that this problem already exists.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

Just to be sure: we're still talking about a directed graph here (otherwise, I don't know how this should work)? And the edges are aligned in the direction between a child and a parent, the edge just doesn't necessarily directly connect to something that's a host/service, i.e. the parent/child of a dependency in Icinga 2.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->ParentHostA;
	RedundancyGroup-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;

graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->DependencyA;
	DependencyA-->ParentHostA;
	RedundancyGroup-->DependencyB;
	DependencyB-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;

nilmerg · 2024-08-23T14:24:29Z

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

~~Ah. Option 2 then.~~

yhabteab · 2024-08-26T11:02:03Z

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge.

As Julian have already pointed out, the dependenc_node_parent table is also superfluous to me, as I believe that the information for the Affected Children¹ could also be assembled without this table (from the edges and nodes) by traversing the graph edges.

redundancy_group_state
The state of a redundancy group

I'm not sure at the moment why this is going to be useful, but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves. So, why not just link the redundancy_group table to the dependency table, like it's done for the normal non-redundant dependencies? Alternatively, don't create a direct link at all, as the dependency can be reached via the dependency_node table.

(NOT EXISTS(SELECT 1 FROM dependency_edge WHERE from_node_id = e.to_node_id) and
      (hs.is_problem = 'y' or ss.is_problem = 'y'))) # the top most node is always reachable

Looking at this where clause, I'm wondering why the nodes are being expanded to themselves in the first place! Isn't that an indication of a cyclic graph (dependency), which Icinga 2 clearly prohibits, since Icinga 2 >= 2.14?

It currently only seems to be used in your Affected Children example query, albeit at the moment I'm not sure how this information is supposed to represent affected children as it simply joins the tables and sums up the result, and apart from that, just representing the node edges increases the database storage exponentially, and duplicating that information makes no sense to me. ↩

nilmerg · 2024-08-26T12:37:37Z

the dependency_node_parent table is also superfluous to me

It is not required to re-construct any relationships, yes. But it stores information crucial for a very specific use-case more efficiently. At least compared to work required to fetch the same information by traversing the edges each time. This table is only there to easily aggregate the number of affected children, as in the example. Without this table, this information cannot be shown in the UI in lists. (e.g. host and service list)

but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves.

It's this alternation, which is abstracted away by this table. I don't want to have to express this in a query. Icinga knows about this, and can evaluate/update it in case it changes.

Looking at this where clause

This doesn't check for cyclic dependencies. It's used to know in advance that the recursion will stop with this row.

nilmerg · 2024-08-28T08:48:12Z

I've updated the proposal now with what I've discussed with @julianbrost yesterday. This is mainly:

Dependencies are no longer nodes in the graph. As such they are not referenced by dependency_node anymore. Instead, they are linked with dependency_edge
dependency now has additional columns in order to better reflect this as a new configuration table
The CTEs subquery has been removed, as edges now know their dependency's state which must be failed in case of a top node being able to include as root problem
Redundancy groups are no longer tracked per child relationship, but instead per children (plural) relationship. This reduces the amount of group nodes significantly and improves the visual representation. I can provide a more detailed description in person, or @julianbrost should also know about this
Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

nilmerg · 2024-08-29T15:29:04Z

Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

dependency_node_parent is gone now. In the end, it was only beneficial for a very specific use-case. (i.e the list of affected children.)

The total number of affected children, which Web would like to show in a host's or service's list item, is better be calculated by Icinga (DB) in advance and stored in table host and service respectively. Since this would only serve to indicate a very important node, this can only be an approximation (e.g. 1000+) if such an optimization is necessary.

The list of affected children will either be not shown, be a simple link to the zoomed in map or only be visible in case the parent has a problem while the list then only includes unreachable nodes. This is because, traversing the hierarchy top down (with a CTE) is considered to be very expensive, compared to a bottom up approach which is done when querying root problems.

Models for the new schema additions described here: Icinga/icingadb#347 (comment)

sukhwinder33445 · 2024-10-22T13:16:32Z

The redundancy_group_state should contain an additional column, responsible_node_id. This column contains the dependency_node_id (host/service) which is responsible for the change in failed column.

Responsible Node:

The last UP|OK state member is also DOWN|CRITICAL, which leads to (redundancy_group_state.failed=y).
Any DOWN|CRITICAL state member is UP|OK now, which leads to (redundancy_group_state.failed=n).

Use case:

In the redundancy group details view, we want to display the plugin output of the member that affected the redundancy_group_state.failed column OR has the highest severity.

Problem:

In case, a member (other than the responsible one) becomes DOWN|CRITICAL, this will not affect the state of the group, as some other members are still UP|OK. We will still have the plugin output of the last resposible_node (which may now be OK) even though there is another member with higher severity.

If the group status is also OK|UP, the responsible_node_id must be updated with the member who has the highest severity.
@oxzi Is this easy to handle for icinga or do we have to come up with another solution?

sukhwinder33445 · 2024-10-28T10:42:32Z

Update

We have dropped the idea of displaying the plugin output for redundancy groups. We therefore no longer need the responsible_node_id coulmn.

lippserd added the enhancement New feature or request label Aug 12, 2021

lippserd added this to the v1.0.0 milestone Aug 12, 2021

julianbrost removed this from the 1.0.0 milestone Nov 18, 2021

This was referenced Sep 2, 2024

Dependency schema Icinga/icingadb-web#1049

Merged

schema: Add dependency additions #795

Draft

nilmerg added a commit to Icinga/icingadb-web that referenced this issue Sep 5, 2024

Dependency schema (#1049)

ec1d3d7

Models for the new schema additions described here: Icinga/icingadb#347 (comment)

nilmerg mentioned this issue Oct 16, 2024

Let redundancy groups not just fail Icinga/icinga2#10190

Open

oxzi mentioned this issue Nov 12, 2024

IcingaDB: Synchronize Dependencies and Redundancy Groups into Redis Icinga/icinga2#10227

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync dependencies #347

Sync dependencies #347

lippserd commented Aug 12, 2021

julianbrost commented Jan 26, 2022

nilmerg commented Aug 23, 2024 •

edited by raviks789

Loading

julianbrost commented Aug 23, 2024

nilmerg commented Aug 23, 2024

julianbrost commented Aug 23, 2024

nilmerg commented Aug 23, 2024 •

edited

Loading

yhabteab commented Aug 26, 2024

nilmerg commented Aug 26, 2024

nilmerg commented Aug 28, 2024

nilmerg commented Aug 29, 2024

sukhwinder33445 commented Oct 22, 2024 •

edited

Loading

sukhwinder33445 commented Oct 28, 2024

Sync dependencies #347

Sync dependencies #347

Comments

lippserd commented Aug 12, 2021

julianbrost commented Jan 26, 2022

nilmerg commented Aug 23, 2024 • edited by raviks789 Loading

Schema Additions

Tables

Relations

Example Queries

Root Problems

Affected Children

Direct Parents

Direct Children

julianbrost commented Aug 23, 2024

nilmerg commented Aug 23, 2024

julianbrost commented Aug 23, 2024

nilmerg commented Aug 23, 2024 • edited Loading

yhabteab commented Aug 26, 2024

Footnotes

nilmerg commented Aug 26, 2024

nilmerg commented Aug 28, 2024

nilmerg commented Aug 29, 2024

sukhwinder33445 commented Oct 22, 2024 • edited Loading

Responsible Node:

Use case:

Problem:

sukhwinder33445 commented Oct 28, 2024

Update

nilmerg commented Aug 23, 2024 •

edited by raviks789

Loading

nilmerg commented Aug 23, 2024 •

edited

Loading

sukhwinder33445 commented Oct 22, 2024 •

edited

Loading