Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Transformation JMES Path Predicates #1086

Merged

Conversation

AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Oct 21, 2024

Description

Implement Transformation JMES Path Predicates

Components:

  • Adds IJsonPredicate/Provider to define a spec for a custom precondition
  • Adds JMESPathPredicate/Provider to define defining preconditions with JMES
  • Adds PredicatesLoader to use service loading for preconditions
  • Adds JsonConditionalTransformerProvider which uses PredicateLoader and TransformationLoader to pull in classes to delegate functionality to
  • Separates Loaders into dedicated module jsonMessageTransformerLoaders

  • Category: New Feature
  • Why these changes are required? Improve Transformation Ux
  • What is the old behavior before changes and new behavior after changes? New ability to specify conditional transformations.

Issues Resolved

MIGRATIONS-2146

Is this a backport? If so, please add backport PR # and/or commits #

Testing

Added unit testing, performed some E2E manual testing in docker.

Example Transformation:

[
  {
    "JsonConditionalTransformerProvider": [
      [
        {
          "JsonJMESPathPredicateProvider": {
            "script": "URI == '/testindex/_search'"
          }
        }
      ],
      [
        {
          "JsonJoltTransformerProvider": {
            "canned": "ADD_GZIP"
          }
        }
      ]
    ]
  },
  {
    "JsonJoltTransformerProvider": {
      "script": {
        "operation": "modify-overwrite-beta",
        "spec": {
          "headers": {
            "newHeader": "newValue"
          }
        }
      }
    }
  }
]

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 61.76471% with 39 lines in your changes missing coverage. Please review.

Project coverage is 80.51%. Comparing base (db0075e) to head (18824d0).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
...ensearch/migrations/transform/PredicateLoader.java 61.22% 15 Missing and 4 partials ⚠️
...tions/transform/JsonJMESPathPredicateProvider.java 40.00% 6 Missing and 3 partials ⚠️
.../transform/JsonConditionalTransformerProvider.java 68.42% 4 Missing and 2 partials ⚠️
...rch/migrations/transform/TransformationLoader.java 25.00% 2 Missing and 1 partial ⚠️
...grations/transform/JsonConditionalTransformer.java 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1086      +/-   ##
============================================
- Coverage     80.66%   80.51%   -0.15%     
- Complexity     2893     2915      +22     
============================================
  Files           383      389       +6     
  Lines         14361    14461     +100     
  Branches        989      998       +9     
============================================
+ Hits          11584    11643      +59     
- Misses         2184     2212      +28     
- Partials        593      606      +13     
Flag Coverage Δ
gradle-test 78.58% <61.76%> (-0.17%) ⬇️
python-test 90.33% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…nLoaders to separate package

Signed-off-by: Andre Kurait <[email protected]>
Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the input to what goes into the transformer. I'd like future readers to see where URI is defined and to understand the general structure...

I have some significant concerns as to the approach now that I'm carefully going through this and thinking of the dev/user experience. Some of the concerns are issues for the next corner (but maybe the very next corner, w/ genAI and stable ids coming into the mix), but it's useful to consider them to think about how this routing feature will interplay as the solution matures.

Concerns

  1. We run transformations w/in a netty event loop. Today, transformations are assumed to be non-blocking and non-preemptive. How would we make them, at least in some modes, run asynchronously.
  2. Today, we pass a java map into a transformation - tomorrow it may need to be something more like a stream (File Channel, etc). I suspect that we’ll have a few layers of an onion here to help to facilitate between java-json interfaces and say purely text ones (Python, etc)
  3. Predicates shouldn’t be able to make changes to the input. It will be a miserable user experience if a predicate DOES make a change (inadvertently) and the next body or predicate is thrown off.

W/ those future changes, how can we minimize their impact to basic high-level control flow? We need to eventually make our interfaces readonly/text/streams…
If we did that, setting up lots of separate parses (potentially) within each predicate function would be wasteful.

I can easily get the parse-passes down to two if I have a Document -> Label function (like a switch statement), but I’m not sure that I can do better w/out making the interfaces much more difficult. This is a tradeoff between the complexity of pulling everything together and defining things in a limited number of places. Running a whole bunch of boolean predicates w/ an entire document could be really expensive for the utility that users are getting. It might also be a lot more for them to manage too - could it be easier if they just defined one routing transform?

TrafficCapture/trafficReplayer/build.gradle Outdated Show resolved Hide resolved

Expression<Object> expression;

public JsonJMESPathPrecondition(BaseRuntime<Object> runtime, String script) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it's pretty minor, but why not extend the existing JsonJMESPathTransformer? That would keep the implementations in sync until we decided that they shouldn't be (which, hopefully, is never).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying to wrap the script as the value in a transformer and get the value back out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was one thought, but I didn't like that one in the end. Another option could be to define both of these methods in the same interface and use the same loading functionality for both. I haven't thought of that too much. I just didn't like seeing the parallel universes developing.

+ "to be a Map<String,Object> or a List<Map<String,Object>>. "
+ "Each of the Maps should have one key-value of \"script\": \"...\". "
+ "Script values should be a fully-formed inlined JsonPath queries encoded as a json value. "
+ "All of the values within a configuration will be concatenated into one chained precondition.";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this last line mean? "all of the values within a configuration..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a JsonCompositePrecondition with a CompositeOperation of ALL. so if a list is passed in, we'll check the AND of all the preconditions



public interface IJsonPrecondition {
boolean evaluatePrecondition(Map<String, Object> incomingJson);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment - drop the Precondition part.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you meaning to have this be boolean evaluate(...?
I intentionally differentiated the name to allow for multiple inheritance of interfaces in the case IJsonTransformer defined `Map<String, Object> evaluate(Map<String, Object> incomingJson);

I'll rename this to predicate

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should follow a Predicate pattern. IJsonPredicate::test(thing) -> boolean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with

public interface IJsonPredicate extends Predicate<Map<String, Object>>


import lombok.NonNull;

public class JsonCompositePrecondition implements IJsonPrecondition {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a good bit of replication and I'm not sure that it's worth it. Still thinking though.

Comment on lines 30 to 35
case ALL:
return preconditions.allMatch(predicate);
case ANY:
return preconditions.anyMatch(predicate);
case NONE:
return preconditions.noneMatch(predicate);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the use case be of using these instead of putting those commands in the predicate code itself?

Copy link
Member Author

@AndreKurait AndreKurait Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have it now, but if we had JOLT Preconditions, then you could chain on both of them needing to hold. If this is a sticking point, i can remove this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could have a jolt precondition, I'd want to redo this entire PR.
A jolt transform to change the input whereas a JMESPath one cannot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put a pin on non-JMESPath predicates and restrict ourselves to this for the meantime

@AndreKurait AndreKurait changed the title Implement Transformation JMES Path Preconditions Implement Transformation JMES Path Predicates Oct 24, 2024
@AndreKurait AndreKurait merged commit 4769ba4 into opensearch-project:main Oct 28, 2024
13 of 14 checks passed
@AndreKurait AndreKurait deleted the JmesPathPreconditions branch October 28, 2024 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants