Check before delete #3209

Open
wants to merge 10 commits into
base: main

Conversation

xinyual
Collaborator

@xinyual xinyual commented Nov 11, 2024

Description

This PR checks all downstream services before deleting an ML model. Two kinds of downstream consumers are checked:

  1. Agent
  2. Pipelines

For agents, each tool factory is required to implement a method that returns the key field names under which that tool stores model IDs. We then build a DSL query with one should clause per key field, like:

{
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "tool.parameters.model_id": ["delete_model_id"]
          }
        },
        {
          "terms": {
            "tool.parameters.embedding_model_id": ["delete_model_id"]
          }
        },
        {
          "terms": {
            "tool.parameters.inference_model_id": ["delete_model_id"]
          }
        }
      ]
    }
  }
}
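
As a rough illustration of how such a query could be assembled from the key fields reported by the tool factories, here is a minimal, dependency-free sketch. The class and method names (AgentModelQuerySketch, buildShouldQuery) are hypothetical, and plain maps stand in for whatever query builder the PR actually uses. The "tool.parameters." prefix is taken from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: builds the query body as nested maps.
public class AgentModelQuerySketch {

    // Produces {"query": {"bool": {"should": [{"terms": {"tool.parameters.<field>": [modelId]}}, ...]}}}
    public static Map<String, Object> buildShouldQuery(List<String> keyFields, String modelId) {
        List<Map<String, Object>> shouldClauses = new ArrayList<>();
        for (String field : keyFields) {
            // One terms clause per key field, each matching the candidate model id.
            Map<String, Object> terms = new LinkedHashMap<>();
            terms.put("tool.parameters." + field, List.of(modelId));
            Map<String, Object> clause = new LinkedHashMap<>();
            clause.put("terms", terms);
            shouldClauses.add(clause);
        }
        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("should", shouldClauses);
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("bool", bool);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("query", query);
        return root;
    }

    public static void main(String[] args) {
        List<String> keyFields = List.of("model_id", "embedding_model_id", "inference_model_id");
        System.out.println(buildShouldQuery(keyFields, "delete_model_id"));
    }
}
```

Any agent matching at least one should clause references the model, so a non-empty search result would be enough to block the deletion.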

For pipelines, we fetch all ingest pipelines and search pipelines and check, for each pipeline, whether its configuration contains the candidate model ID.
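
One way to picture the pipeline check is a recursive walk over each pipeline's configuration map. The sketch below is an assumption about the shape of that check, not the PR's implementation. PipelineModelCheckSketch and containsModelId are hypothetical names, and the sample configuration is only illustrative.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: searches a nested configuration for a model id.
public class PipelineModelCheckSketch {

    // Returns true if modelId appears as a value anywhere in the nested maps and lists.
    public static boolean containsModelId(Object node, String modelId) {
        if (node instanceof Map) {
            for (Object value : ((Map<?, ?>) node).values()) {
                if (containsModelId(value, modelId)) {
                    return true;
                }
            }
            return false;
        }
        if (node instanceof Iterable) {
            for (Object item : (Iterable<?>) node) {
                if (containsModelId(item, modelId)) {
                    return true;
                }
            }
            return false;
        }
        // Leaf value: compare directly (a null leaf never matches).
        return modelId.equals(node);
    }

    public static void main(String[] args) {
        // Illustrative configuration with one processor that references a model.
        Map<String, Object> config = Map.of(
            "processors", List.of(
                Map.of("text_embedding", Map.of(
                    "model_id", "abc123",
                    "field_map", Map.of("text", "embedding")))));
        System.out.println(containsModelId(config, "abc123")); // true
        System.out.println(containsModelId(config, "other"));  // false
    }
}
```

Matching on any value rather than on specific keys avoids maintaining a list of processor types, at the cost of possible false positives when an unrelated field happens to hold the same string.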

Related Issues

#3191
#3087
#3088

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
Signed-off-by: xinyual <[email protected]>
@xinyual xinyual marked this pull request as ready for review November 12, 2024 07:15
Signed-off-by: xinyual <[email protected]>
public class RelatedModelIdHelper {
    private Map<String, List<String>> relatedModelIdMap;

    public RelatedModelIdHelper(Map<String, Tool.Factory> ToolFactories) {
Collaborator

Camel-case parameter names should start with a lowercase letter: ToolFactories should be toolFactories.

import org.opensearch.ml.common.spi.tools.Tool;
import org.opensearch.search.builder.SearchSourceBuilder;

public class RelatedModelIdHelper {
Collaborator

A better name would be AgentModelsSearcher

}

@Override
protected void doExecute(Task task, ActionRequest request, ActionListener<DeleteResponse> actionListener) {
    MLModelDeleteRequest mlModelDeleteRequest = MLModelDeleteRequest.fromActionRequest(request);
    String modelId = mlModelDeleteRequest.getModelId();
    try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {
Collaborator

Let's move the agent model search logic to a centralized place, e.g. AgentModelsSearcher, and encapsulate it so that it returns a boolean indicating whether the model ID is used anywhere.

try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {
    // check whether any agents are using the model
    SearchRequest searchAgentRequest = relatedModelIdHelper.constructQueryRequest(modelId);
    client.search(searchAgentRequest, ActionListener.runBefore(ActionListener.wrap(searchResponse -> {
Collaborator

Instead of querying agents and pipelines sequentially, you can split the checks into separate methods, each responsible for a single search, and execute them in parallel. Each method can hold a reference to a shared CountDownLatch instance to coordinate them. Refer to: https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/action/models/DeleteModelTransportAction.java#L243
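
To make the suggestion concrete, here is a minimal standalone sketch of coordinating independent checks with a CountDownLatch. It is not the code at the linked line. ParallelUsageCheckSketch and isModelInUse are hypothetical names, a thread pool stands in for the asynchronous client searches, and treating a failed check as "in use" is a design assumption so that deletion fails closed.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the PR: runs usage checks concurrently.
public class ParallelUsageCheckSketch {

    // Returns true if any check reports (or fails while determining) that the model is in use.
    public static boolean isModelInUse(List<BooleanSupplier> checks) {
        CountDownLatch latch = new CountDownLatch(checks.size());
        AtomicBoolean inUse = new AtomicBoolean(false);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, checks.size()));
        try {
            for (BooleanSupplier check : checks) {
                pool.execute(() -> {
                    try {
                        if (check.getAsBoolean()) {
                            inUse.set(true);
                        }
                    } catch (RuntimeException e) {
                        inUse.set(true); // assumption: fail closed when a check errors
                    } finally {
                        latch.countDown(); // always signal completion
                    }
                });
            }
            // Wait for every check to finish before deciding.
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("usage checks timed out");
            }
            return inUse.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for usage checks", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BooleanSupplier agentCheck = () -> false;    // stand-in for the agent search
        BooleanSupplier pipelineCheck = () -> true;  // stand-in for the pipeline scan
        System.out.println(isModelInUse(List.of(agentCheck, pipelineCheck))); // true
    }
}
```

With asynchronous searches, the countDown() call would presumably move into each listener's success and failure callbacks instead of a worker thread.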

private <T> Boolean isPipelineContainsModel(
    List<T> pipelineConfigurations,
    String candidateModelId,
    Function<T, Map<String, Object>> getConfigFunction
Collaborator

Is there a simple way to extract the configuration map from the responses, so that this method can accept a Map<String, Object> as its parameter?

@@ -138,6 +139,11 @@ public String getDefaultType() {
public String getDefaultVersion() {
    return null;
}

@Override
public List<String> getRelatedModelIDKeyFields() {
Collaborator

A better name could be getAllModelKeys
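
For context, the per-tool key lists could be gathered into a map keyed by tool type roughly as follows. This is a sketch under assumptions: ToolFactorySketch is a hypothetical stand-in for Tool.Factory that models only the method under discussion (using the suggested name getAllModelKeys), and the tool names in the example are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: collects model key fields per tool type.
public class ModelKeyCollectorSketch {

    // Stand-in for Tool.Factory; every factory must report its model key fields.
    public interface ToolFactorySketch {
        List<String> getAllModelKeys();
    }

    // Maps each tool type to its model key fields, skipping tools that report none.
    public static Map<String, List<String>> collectModelKeys(Map<String, ToolFactorySketch> toolFactories) {
        Map<String, List<String>> keysByTool = new LinkedHashMap<>();
        for (Map.Entry<String, ToolFactorySketch> entry : toolFactories.entrySet()) {
            List<String> keys = entry.getValue().getAllModelKeys();
            if (keys != null && !keys.isEmpty()) {
                keysByTool.put(entry.getKey(), new ArrayList<>(keys));
            }
        }
        return keysByTool;
    }

    public static void main(String[] args) {
        Map<String, ToolFactorySketch> factories = new LinkedHashMap<>();
        factories.put("ToolA", () -> List.of("model_id"));           // placeholder tool name
        factories.put("ToolB", () -> List.of("embedding_model_id")); // placeholder tool name
        factories.put("ToolC", () -> List.of());                     // tool that uses no model
        System.out.println(collectModelKeys(factories));
    }
}
```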

Labels
None yet
Projects
None yet
2 participants