importing 146 GB faiss ivf flat index fails after 40% #87
Comments
Also, the operation fails after the datanode on server1 reaches excessive memory usage and gets evicted.
Hello, I need to confirm the following information; please provide it:
hi
Hello @gland1, please use the following command to capture the memory information of the datanode when its memory usage is high: go tool pprof {datanode_ip}:9091/debug/pprof/heap. After execution you should see a pprof file generated; just provide that generated pprof file.
It will take some time to reach this state, as I am now trying to load the dataset by inserts (btw, this also fails after a while).
I've tried to recreate -
@lentitude2tk please try to reproduce this in-house and see what we can improve.
OK, I will find the relevant personnel to try to reproduce this issue with this data volume in-house.
@gland1 Could you please let us know whether you are using a public wiki dataset? If so, could you provide us with the link? Additionally, could you share the migration.yaml configuration you are using with milvus-migration (sensitive information can be redacted)? We will reproduce the issue locally and work on resolving it.
@lentitude2tk
User feedback: "hanging_at_70%". For version 2.4, 70% indicates that bulkInsert has completed and the index is currently being built. Therefore, the question is why buildIndex is hanging.
Hi, this is the full milvus yaml I'm using:
This is the migration yaml: target: # configs for the target Milvus collection. As for the dataset - we carved 50M vectors from the 88M wiki-all nvidia dataset, available at:
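For readers following the thread: the reporter's full migration.yaml is not reproduced above. Below is a rough sketch of the general shape of a migration.yaml for a faiss source; every key and value here is an illustrative assumption (placeholder names, paths, and dimensions) and should be checked against the milvus-migration README rather than read as the configuration actually used.

```yaml
# Illustrative sketch only - NOT the reporter's config. Key names follow the
# general shape of the milvus-migration faiss examples; verify against the README.
dumper:
  worker:
    workMode: faiss                  # dump from a faiss index file
    limit: 2
source:
  mode: local
  local:
    faissFile: /data/wiki50m_ivf_flat.index   # placeholder path to the faiss index
target:
  create:
    collection:
      name: wiki50m                  # placeholder collection name
      shardsNums: 2
      dim: 768                       # placeholder vector dimension
      metricType: L2
  mode: remote
  milvus2x:
    endpoint: localhost:19530        # placeholder Milvus endpoint
```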
Any idea how I can stop the migration?
The segment maxSize in your configuration is too large; 1024 MB is the recommended size.
If segments are too large, there will be too many binlog files, and some atomic operations cannot be completed. In addition, 80 * 4 = 320 GB+ of memory is required when building the index.
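For reference, a minimal sketch of where this setting lives in milvus.yaml; the path below matches recent default configurations, but verify it against the file shipped with your Milvus version:

```yaml
dataCoord:
  segment:
    # Maximum size of a sealed segment, in MB. 1024 is the recommended value;
    # much larger values lead to very large binlog counts and heavy index builds.
    maxSize: 1024
```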
I'm using a DiskANN index, so it should require less memory to build. Is it possible to stop the migration?
If your target collection is a PoC testing collection, you can choose to delete the collection, which will cause the entire migration task to fail.
@gland1 Why is it necessary to have as few segments as possible when investigating horizontal scaling? Large segments would result in many side effects, as detailed here: milvus-io/milvus#33808 (comment)
Current Behavior
Deployed milvus operator on 3 servers.
Tried to import a faiss IVF flat index (from the 200M wiki dataset), size 146 GB.
Failed due to the max file size limit of 16 GB.
Increased the max file size to 1024 GB (see the config sketch after these steps).
Tried again and it failed after 40% done.
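A minimal sketch of the override referred to in the steps above, assuming the limit hit is the per-file bulk-import cap in milvus.yaml; the key name and the bytes unit follow recent default configurations and should be verified against the deployed Milvus version:

```yaml
common:
  # Per-file size limit for bulk import; the shipped default corresponds to 16 GB.
  # 1099511627776 bytes = 1024 GB, matching the value used in this report.
  ImportMaxFileSize: 1099511627776
```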
This is the error shown:
[2024/05/31 18:46:22.983 +03:00] [ERROR] [dbclient/milvus2x.go:206] ["[Loader] Check Milvus bulkInsertState Error"] [error="rpc error: code = Unknown desc = stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:556 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:570 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/datacoord/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:737 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetImportProgress\n/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:6071 github.com/milvus-io/milvus/internal/proxy.(*Proxy).GetImportProgress\n/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:4649 github.com/milvus-io/milvus/internal/proxy.(*Proxy).GetImportState\n/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:1018 github.com/milvus-io/milvus/internal/distributed/proxy.(*Server).GetImportState\n/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/[email protected]/milvuspb/milvus.pb.go:13136 github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_GetImportState_Handler.func1\n/go/src/github.com/milvus-io/milvus/internal/proxy/connection/util.go:60 github.com/milvus-io/milvus/internal/proxy/connection.KeepActiveInterceptor: empty grpc client: find no available datacoord, check datacoord state"] [stack="github.com/zilliztech/milvus-migration/core/dbclient.(*Milvus2x).WaitBulkLoadSuccess\n\t/home/runner/work/milvus-migration/milvus-migration/core/dbclient/milvus2x.go:206\ngithub.com/zilliztech/milvus-migration/core/loader.(*Milvus2xLoader).loadDataOne\n\t/home/runner/work/milvus-migration/milvus-migration/core/loader/milvus2x_loader.go:198\ngithub.com/zilliztech/milvus-migration/core/loader.(*Milvus2xLoader).loadDataBatch.func1\n\t/home/runner/work/milvus-migration/milvus-migration/core/loader/milvus2x_loader.go:180\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/home/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75"]
load error: rpc error: code = Unknown desc = stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:556 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:570 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/datacoord/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:737 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetImportProgress
/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:6071 github.com/milvus-io/milvus/internal/proxy.(*Proxy).GetImportProgress
/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:4649 github.com/milvus-io/milvus/internal/proxy.(*Proxy).GetImportState
/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:1018 github.com/milvus-io/milvus/internal/distributed/proxy.(*Server).GetImportState
/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/[email protected]/milvuspb/milvus.pb.go:13136 github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_GetImportState_Handler.func1
/go/src/github.com/milvus-io/milvus/internal/proxy/connection/util.go:60 github.com/milvus-io/milvus/internal/proxy/connection.KeepActiveInterceptor: empty grpc client: find no available datacoord, check datacoord state
Expected Behavior
migration should succeed
Steps To Reproduce
Environment
Anything else?
No response