The columns of A don't match the number of elements of x. A: 768, x: 1536 #14368
Comments
thanks @SidWeng, we will look into this

@maziyarpanahi I found the root cause but I'm guessing it is not a bug, please take a look

Hi @SidWeng Yes, that's exactly the root cause. We are working on adding a parameter to …

I totally missed that you are using … Until we implement a simple averaging to put everything together, here are a few options:
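A minimal sketch of that kind of averaging workaround, assuming the per-sentence vectors land in an `array<vector>` column named `features` (the column name and the `finished` DataFrame are illustrative, not from this thread). Since `BertSentenceEmbeddings` emits one 768-dim vector per sentence, a two-sentence document can surface downstream as 1536 elements, which matches the error; averaging collapses each document back to a single 768-dim vector:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{col, udf}

// Element-wise average of the per-sentence vectors, so every document
// contributes exactly one fixed-size (768-dim) vector to the LSH stage.
val averageVectors = udf { vectors: Seq[Vector] =>
  val dim = vectors.head.size
  val sums = new Array[Double](dim)
  vectors.foreach(_.foreachActive((i, x) => sums(i) += x))
  Vectors.dense(sums.map(_ / vectors.size))
}

// `finished` is the DataFrame after EmbeddingsFinisher (illustrative name)
val pooled = finished.withColumn("features", averageVectors(col("features")))
```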
Discussed in #14362
Originally posted by SidWeng August 8, 2024
I use the following pipeline with BioBERT Sentence Embeddings. However, it throws

`The columns of A don't match the number of elements of x. A: 768, x: 1536`

when executing `pipeline.fit()` (full stack trace below). I traced the code and found that the dimension of `randMatrix` used by `BucketedRandomProjectionLSHModel` is determined by `DatasetUtils.getNumFeatures()`. Does this imply something is wrong with the data I feed into `fit()`? The data is a DataFrame with a String column `code` and a String column `text`; the longest `text` has a length of 229.
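For reference, a minimal sketch of a pipeline of this shape (the BioBERT model name, column names, and LSH parameters are illustrative assumptions, not the code from the original post):

```scala
import com.johnsnowlabs.nlp.{DocumentAssembler, EmbeddingsFinisher}
import com.johnsnowlabs.nlp.annotator.{BertSentenceEmbeddings, SentenceDetector}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.BucketedRandomProjectionLSH

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

// Any 768-dim BioBERT sentence model; this name is illustrative
val embeddings = BertSentenceEmbeddings
  .pretrained("sent_biobert_pubmed_base_cased", "en")
  .setInputCols("sentence")
  .setOutputCol("sentence_embeddings")

// Emits one Vector per *sentence*: a multi-sentence document yields
// several 768-dim vectors, not a single one
val finisher = new EmbeddingsFinisher()
  .setInputCols("sentence_embeddings")
  .setOutputCols("features")
  .setOutputAsVector(true)

val nlpPipeline = new Pipeline().setStages(
  Array(documentAssembler, sentenceDetector, embeddings, finisher))

// The LSH stage expects a single fixed-size Vector column
val lsh = new BucketedRandomProjectionLSH()
  .setInputCol("features")
  .setOutputCol("hashes")
  .setBucketLength(2.0)
  .setNumHashTables(3)
```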
```
24/08/08 03:19:13.581 [task-result-getter-3] WARN o.a.spark.scheduler.TaskSetManager - Lost task 7.2 in stage 10.0 (TID 370) (10.0.0.12 executor 4): org.apache.spark.SparkException: Failed to execute user defined function (LSHModel$$Lambda$5263/1056329262: (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => array<struct<type:tinyint,size:int,indices:array<int>,values:array<double>>>)
	at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:177)
	at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.serializefromobject_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
	at org.sparkproject.guava.collect.Ordering.leastOf(Ordering.java:670)
	at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
	at org.apache.spark.rdd.RDD.$anonfun$takeOrdered$2(RDD.scala:1539)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: requirement failed: The columns of A don't match the number of elements of x. A: 768, x: 1536
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.ml.linalg.BLAS$.gemv(BLAS.scala:579)
	at org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction(BucketedRandomProjectionLSH.scala:87)
	at org.apache.spark.ml.feature.LSHModel.$anonfun$transform$1(LSH.scala:99)
	... 22 more
```
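The mismatch itself is easy to reproduce in plain Spark ML: `BucketedRandomProjectionLSH` sizes its `randMatrix` from the first row it sees (via `DatasetUtils.getNumFeatures()`), so a later row whose vector has a different size fails inside `BLAS.gemv` with exactly this message. A self-contained sketch, assuming an active `SparkSession` named `spark`:

```scala
import org.apache.spark.ml.feature.BucketedRandomProjectionLSH
import org.apache.spark.ml.linalg.Vectors

// Row 0 is 3-dim, row 1 is 6-dim: randMatrix is sized from the first
// row, so hashing the second row throws
// "The columns of A don't match the number of elements of x. A: 3, x: 6"
val df = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 2.0, 3.0)),
  (1, Vectors.dense(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
)).toDF("id", "features")

val model = new BucketedRandomProjectionLSH()
  .setInputCol("features")
  .setOutputCol("hashes")
  .setBucketLength(2.0)
  .setNumHashTables(3)
  .fit(df)

model.transform(df).show() // IllegalArgumentException on the 6-dim row
```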