Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync dependencies #347

Open
lippserd opened this issue Aug 12, 2021 · 12 comments
Open

Sync dependencies #347

lippserd opened this issue Aug 12, 2021 · 12 comments
Labels
enhancement New feature or request

Comments

@lippserd
Copy link
Member

At the moment dependencies are not synchronized in Redis and the database and there is no idea for the storage scheme yet.

Since dependencies represent trees (or even graphs?), They can be difficult to store in the database.

At the moment I can think of the following ways to store the dependencies in the database:

  • Associate parent
  • Nested set
  • Materialized path
  • Adjacency list / matrix
  • Closure table

I'm sure there's more.

All schemes must be compared in terms of complexity and performance when writing and reading.

I think it also makes sense to store the parent dependencies directly for the objects. This would allow us to display these in the detail areas in Web without much effort.

@lippserd lippserd added the enhancement New feature or request label Aug 12, 2021
@lippserd lippserd added this to the v1.0.0 milestone Aug 12, 2021
@julianbrost julianbrost removed this from the 1.0.0 milestone Nov 18, 2021
@julianbrost
Copy link
Contributor

Another idea: don't sync dependencies to the relational database at all but instead only keep them in Redis. Whenever Web wants to display something about dependencies, the required graph algorithms are implemented as Lua scripts and executed on the Redis server.

Only downside with this: you couldn't use the dependency information in filters then (trying to support this would probably require some nasty work merging result sets from Redis and SQL), but things like showing all (reverse) dependencies of a checkable or rendering nice dependency graphs starting from a checkable should work nicely.

There could also be some combined solution where we think about what filtering on dependencies should be supported, write information specifically for these queries to the relational database and use Lua for everything else.

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

My proposal, which fits nicely with what is planned in Web, uses primarily recursive common table expressions to query connected nodes in either direction.

Required database versions:

  • MySQL 8
  • MariaDB 10.2.2
  • PostgreSQL 8.4.

The related tables consist of the following structures:

  • Adjacency Matrix (dependency_edge)

Schema Additions

Tables

dependency
A dependency's config

column type null key fk comment
id binary(20) primary - -
name text - - -
display_name text - - -
redundancy_group_id binary(20) - redundancy_group.id -
disable_checks bool_enum - - not required by Web, yet
disable_notifications bool_enum - - not required by Web, yet
ignore_soft_states bool_enum - - not required by Web, yet
timeperiod_id binary(20) - timeperiod.id not required by Web, yet
states tinyuint - - not required by Web, yet

dependency_state
The state of a dependency

column type null key fk
id binary(20) primary -
dependency_id binary(20) - dependency.id
failed bool_enum - -

redundancy_group
A redundancy group's config

column type null key fk comment
id binary(20) primary - sha1('redundancy_group|group-name|children-names') (Without the prefix, this might not be unique enough)
name text - - The value of the redundancy_group dependency property suffixed by names of all related children
display_name text - - The value of the redundancy_group dependency property

redundancy_group_state
The state of a redundancy group

column type null key fk
id binary(20) primary -
redundancy_group_id binary(20) - redundancy_group.id
failed bool_enum - -

dependency_node
A node referred to or referred by another node (i.e one that's part of a dependency relationship)

column type null key fk
id binary(20) primary -
host_id binary(20) unique1 host.id
service_id binary(20) unique1 service.id
redundancy_group_id binary(20) unique2 redundancy_group.id

dependency_edge
All direct dependency relationships (from = child, to = parent)

column type null key fk comment
from_node_id binary(20) unique1 dependency_node.id -
to_node_id binary(20) unique1 dependency_node.id -
dependency_id binary(20) - dependency.id NULL in case to_node_id refers to a redundancy_group node

Relations

erDiagram
    dependency_node ||--|| host : is
    dependency_node ||--|| service : is
    dependency_node ||--|| redundancy_group : is
    dependency_edge }|--|{ dependency_node : "refers to / referred by"
    dependency_edge ||--|| dependency : "declared by / declares"
    dependency ||--|| dependency_state : has
    dependency }|--|| redundancy_group : "member of/has members"
    dependency }|--|| timeperiod : "uses / used by"
    redundancy_group ||--|| redundancy_group_state : has
Loading

Example Queries

Root Problems

WITH RECURSIVE UnreachableNodes AS (
    SELECT id as node_id,
           0                      as height,
           cast('' as binary(20)) as ref_node_id,
           dependency_node.host_id,
           cast('' as binary(20)) as service_id,
           cast('' as binary(20)) as dependency_id,
           cast('' as binary(20)) as redundancy_group_id
    FROM dependency_node
    WHERE dependency_node.host_id = '<host-id>'
      and dependency_node.service_id is null

    UNION ALL

    SELECT e.to_node_id,
           r.height + 1,
           e.from_node_id,
           tn.host_id,
           tn.service_id,
           e.dependency_id,
           tn.redundancy_group_id
    FROM dependency_edge e
             INNER JOIN UnreachableNodes r ON e.from_node_id = r.node_id
             INNER JOIN dependency_node tn ON e.to_node_id = tn.id
             LEFT JOIN host_state hs on hs.host_id = tn.host_id
             LEFT JOIN service_state ss on ss.service_id = tn.service_id
             LEFT JOIN dependency_state ds ON e.dependency_id = ds.dependency_id
             LEFT JOIN redundancy_group_state rgs on rgs.redundancy_group_id = tn.redundancy_group_id
    WHERE ds.failed = 'y'
       or rgs.failed = 'y'
       or hs.is_reachable = 'n'
       or ss.is_reachable = 'n')
SELECT rn.node_id,
       rn.height,
       rn.ref_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM UnreachableNodes rn
         LEFT JOIN host h on rn.host_id = h.id
         LEFT JOIN service s on rn.service_id = s.id
         LEFT JOIN dependency d on rn.dependency_id = d.id
         LEFT JOIN redundancy_group rg on rn.redundancy_group_id = rg.id;

Affected Children

WITH RECURSIVE UnreachableNodes AS (
    SELECT id as node_id,
           0                      as depth,
           cast('' as binary(20)) as ref_node_id,
           dependency_node.host_id,
           cast('' as binary(20)) as service_id,
           cast('' as binary(20)) as dependency_id,
           cast('' as binary(20)) as redundancy_group_id
    FROM dependency_node
    WHERE dependency_node.host_id = '<host-id>'
      and dependency_node.service_id is null

    UNION ALL

    SELECT e.from_node_id,
           r.depth + 1,
           e.to_node_id,
           fn.host_id,
           fn.service_id,
           e.dependency_id,
           fn.redundancy_group_id
    FROM dependency_edge e
             INNER JOIN UnreachableNodes r ON e.to_node_id = r.node_id
             INNER JOIN dependency_node fn ON e.from_node_id = fn.id
             LEFT JOIN host_state hs on hs.host_id = fn.host_id
             LEFT JOIN service_state ss on ss.service_id = fn.service_id
             LEFT JOIN dependency_state ds ON e.dependency_id = ds.dependency_id
             LEFT JOIN redundancy_group_state rgs on rgs.redundancy_group_id = fn.redundancy_group_id
    WHERE ds.failed = 'y'
       or rgs.failed = 'y'
       or hs.is_reachable = 'n'
       or ss.is_reachable = 'n')
SELECT rn.node_id,
       rn.depth,
       rn.ref_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM UnreachableNodes rn
         LEFT JOIN host h on rn.host_id = h.id
         LEFT JOIN service s on rn.service_id = s.id
         LEFT JOIN dependency d on rn.dependency_id = d.id
         LEFT JOIN redundancy_group rg on rn.redundancy_group_id = rg.id;

Direct Parents

SELECT e.to_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on tn.host_id = h.id
         LEFT JOIN service s on tn.service_id = s.id
         LEFT JOIN dependency d on e.dependency_id = d.id
         LEFT JOIN redundancy_group rg on tn.redundancy_group_id = rg.id
WHERE fn.host_id = '<host-id>'
  and fn.service_id is null;

Direct Children

SELECT e.from_node_id,
       h.name  as host_name,
       s.name  as service_name,
       d.name  as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on fn.host_id = h.id
         LEFT JOIN service s on fn.service_id = s.id
         LEFT JOIN dependency d on e.dependency_id = d.id
         LEFT JOIN redundancy_group rg on fn.redundancy_group_id = rg.id
WHERE tn.host_id = '<host-id>'
  and tn.service_id is null;

@julianbrost
Copy link
Contributor

dependency_state The state of the dependency's parent
state int ❌ - -

That's redundant with the host/service state information and I believe it also doesn't provide the information you hope for. I think you might want a failed boolean instead, given that the Icinga 2 Dependency object has attributes like states and ignore_soft_states that affect when a dependency counts as failed, i.e. marking the child object unreachable.

Similarly for redundancy groups. After all, they are just a set of dependencies that counts as failed if all individual dependencies are failed.

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge. Imagine a dependency structure like the following (boxes are hosts, arrows are dependencies), scale up the top and bottom level and the size of this table will grow quadratically in the size of the Icinga 2 config.

graph TD;
    A1-->B;
    A2-->B;
	A3-->B;
	A4-->B;
    B-->C1;
    B-->C2;
	B-->C3;
	B-->C4;
Loading

So if this table is necessary, I see three options:

  1. Introduce some new limitation to the Icinga 2 config, something like any object must have at most N total (direct and indirect) parents.
  2. Have a similar check, but don't enforce it for the whole Icinga 2 config, but instead, skip writing rows into this table, i.e. it may be incomplete.
  3. Ignore the problem and if someone builds an unfortunate dependency structure, the whole system will fall over.

All direct dependency relationships (from = child, to = parent)

Why not simply stick to "child" and "parent"?

The related tables consist of the following structures:

  • Adjacency Matrix
  • Closure Table
  • Transitive Closure Table

How do these map to the tables given?

Also, can you please give an example how you'd expect these additional virtual nodes to be generated? At least with redundancy groups, that's not obvious.

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

I think you might want a failed boolean instead

Yeah, I thought of this, but figured this would be noted sooner or later, as it happend now ;)

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

How do these map to the tables given?

Eh. There is no simple closure table. Added the names to the other two.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

@julianbrost
Copy link
Contributor

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

To be honest, I don't know what would happen exactly, it's quite possible that things will get slower with lots of dependencies. But so far, it does not expand all indirect dependencies, so it's not obvious that this problem already exists.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

Just to be sure: we're still talking about a directed graph here (otherwise, I don't know how this should work)? And the edges are aligned in the direction between a child and a parent, the edge just doesn't necessarily directly connect to something that's a host/service, i.e. the parent/child of a dependency in Icinga 2.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->ParentHostA;
	RedundancyGroup-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;
Loading
graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->DependencyA;
	DependencyA-->ParentHostA;
	RedundancyGroup-->DependencyB;
	DependencyB-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;
Loading

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

Ah. Option 2 then.

@yhabteab
Copy link
Member

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge.

As Julian have already pointed out, the dependenc_node_parent table is also superfluous to me, as I believe that the information for the Affected Children1 could also be assembled without this table (from the edges and nodes) by traversing the graph edges.

redundancy_group_state
The state of a redundancy group

I'm not sure at the moment why this is going to be useful, but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves. So, why not just link the redundancy_group table to the dependency table, like it's done for the normal non-redundant dependencies? Alternatively, don't create a direct link at all, as the dependency can be reached via the dependency_node table.

(NOT EXISTS(SELECT 1 FROM dependency_edge WHERE from_node_id = e.to_node_id) and
      (hs.is_problem = 'y' or ss.is_problem = 'y'))) # the top most node is always reachable

Looking at this where clause, I'm wondering why the nodes are being expanded to themselves in the first place! Isn't that an indication of a cyclic graph (dependency), which Icinga 2 clearly prohibits, since Icinga 2 >= 2.14?

Footnotes

  1. It currently only seems to be used in your Affected Children example query, albeit at the moment I'm not sure how this information is supposed to represent affected children as it simply joins the tables and sums up the result, and apart from that, just representing the node edges increases the database storage exponentially, and duplicating that information makes no sense to me.

@nilmerg
Copy link
Member

nilmerg commented Aug 26, 2024

the dependency_node_parent table is also superfluous to me

It is not required to re-construct any relationships, yes. But it stores information crucial for a very specific use-case more efficiently. At least compared to work required to fetch the same information by traversing the edges each time. This table is only there to easily aggregate the number of affected children, as in the example. Without this table, this information cannot be shown in the UI in lists. (e.g. host and service list)

but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves.

It's this alternation, which is abstracted away by this table. I don't want to have to express this in a query. Icinga knows about this, and can evaluate/update it in case it changes.

Looking at this where clause

This doesn't check for cyclic dependencies. It's used to know in advance that the recursion will stop with this row.

@nilmerg
Copy link
Member

nilmerg commented Aug 28, 2024

I've updated the proposal now with what I've discussed with @julianbrost yesterday. This is mainly:

  • Dependencies are no longer nodes in the graph. As such they are not referenced by dependency_node anymore. Instead, they are linked with dependency_edge
  • dependency now has additional columns in order to better reflect this as a new configuration table
  • The CTEs subquery has been removed, as edges now know their dependency's state which must be failed in case of a top node being able to include as root problem
  • Redundancy groups are no longer tracked per child relationship, but instead per children (plural) relationship. This reduces the amount of group nodes significantly and improves the visual representation. I can provide a more detailed description in person, or @julianbrost should also know about this
  • Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

@nilmerg
Copy link
Member

nilmerg commented Aug 29, 2024

Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

dependency_node_parent is gone now. In the end, it was only beneficial for a very specific use-case. (i.e the list of affected children.)

The total number of affected children, which Web would like to show in a host's or service's list item, is better be calculated by Icinga (DB) in advance and stored in table host and service respectively. Since this would only serve to indicate a very important node, this can only be an approximation (e.g. 1000+) if such an optimization is necessary.

The list of affected children will either be not shown, be a simple link to the zoomed in map or only be visible in case the parent has a problem while the list then only includes unreachable nodes. This is because, traversing the hierarchy top down (with a CTE) is considered to be very expensive, compared to a bottom up approach which is done when querying root problems.

nilmerg added a commit to Icinga/icingadb-web that referenced this issue Sep 5, 2024
Models for the new schema additions described here:
Icinga/icingadb#347 (comment)
@sukhwinder33445
Copy link

sukhwinder33445 commented Oct 22, 2024

The redundancy_group_state should contain an additional column, responsible_node_id. This column contains the dependency_node_id (host/service) which is responsible for the change in failed column.

Responsible Node:

  • The last UP|OK state member is also DOWN|CRITICAL, which leads to (redundancy_group_state.failed=y).

  • Any DOWN|CRITICAL state member is UP|OK now, which leads to (redundancy_group_state.failed=n).

Use case:

In the redundancy group details view, we want to display the plugin output of the member that affected the redundancy_group_state.failed column OR has the highest severity.

Problem:

In case, a member (other than the responsible one) becomes DOWN|CRITICAL, this will not affect the state of the group, as some other members are still UP|OK. We will still have the plugin output of the last resposible_node (which may now be OK) even though there is another member with higher severity.

If the group status is also OK|UP, the responsible_node_id must be updated with the member who has the highest severity.
@oxzi Is this easy to handle for icinga or do we have to come up with another solution?

@sukhwinder33445
Copy link

Update

We have dropped the idea of displaying the plugin output for redundancy groups. We therefore no longer need the responsible_node_id coulmn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants