java.lang.NullPointerException: Null value appeared in non-nullable field #1286

Open
wycccccc opened this issue Dec 14, 2022 · 6 comments · May be fixed by #1471
@wycccccc
Collaborator

Below is a problem I am still tracking down. It consistently shows up for certain files, for example 20190619. Is this caused by Scala's Int being non-nullable? I would appreciate any suggestions.

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 90 in stage 1.0 failed 1 times, most recent failure: Lost task 90.0 in stage 1.0 (TID 1061) (9c3c08e01993 executor driver): java.lang.NullPointerException: Null value appeared in non-nullable field:

If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)                                                                                                 
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)                                                                                                 
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)                                                                                                                                             
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)                                                                                   
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)                                                                                                                    
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)                                                                                              
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)                                                                                
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)                                                                                                                                      
        at org.apache.spark.scheduler.Task.run(Task.scala:131)                                                                                                                                                     
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)                                                                                                                        
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)                                                                                                                                       
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)                                                                                                                                   
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)                                                                                                               
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)                                                                                                               
        at java.base/java.lang.Thread.run(Thread.java:829)  

Driver stacktrace:                                                                                                                                                                                                 
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)                                                                                                            
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)                                                                                                                  
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)                                                                                                          
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)                                                                                                                                
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)                                                                                                                               
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)                                                                                                                                      
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:357)
        ... 40 more

Caused by: java.lang.NullPointerException: Null value appeared in non-nullable field:

If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
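For reference, a minimal sketch of the failure mode the error message above describes (this is not the project's code; all names below are made up): a primitive field in a case class cannot hold a null that appears in the data, while Option[_] can.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes, only for illustration.
case class RowWithInt(name: String, value: Int)             // primitive Int cannot hold null
case class RowWithOption(name: String, value: Option[Int])  // Option[Int] can

object NullableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // A DataFrame whose "value" column contains a null.
    val df = Seq(("a", Some(1)), ("b", Option.empty[Int])).toDF("name", "value")

    // Forcing the null into a primitive Int fails at runtime with
    // "Null value appeared in non-nullable field":
    // df.as[RowWithInt].collect()

    // Mapping the null to None works:
    df.as[RowWithOption].collect().foreach(println)

    spark.stop()
  }
}
```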

@chia7712
Contributor

Is this caused by Scala's Int being non-nullable?

That is quite likely. The data currently under test should not have had its types specified yet; for which batch of data did you specify the types when testing?

@wycccccc
Collaborator Author

wycccccc commented Dec 15, 2022

That is quite likely. The data currently under test should not have had its types specified yet; for which batch of data did you specify the types when testing?

All current tests do specify the types. When configuring the properties, the type of each column is specified as follows:
column.name=TIMESTAMP\=timestamp,RECORD\=string,wind_vane1\=integer,wind_vane2\=double,wind_vane3\=double,wind_vane4\=double,Wind_SPD_Hub\=double,wind_speed1\=double,wind_speed2\=double,wind_speed3\=double,wind_speed4\=double
The test data is the batch you gave me; I will look further into the type issue.
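As a hedged illustration only (the helper below is hypothetical, not the project's actual parser), a property in that shape could be mapped to a Spark schema like this; note that every StructField is created with nullable = true, which is Spark's default anyway.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not the project's parser.
object ColumnNameProperty {
  private val typeMap: Map[String, DataType] = Map(
    "timestamp" -> TimestampType,
    "string"    -> StringType,
    "integer"   -> IntegerType,
    "double"    -> DoubleType
  )

  // "TIMESTAMP=timestamp,RECORD=string,..." -> StructType
  def toSchema(property: String): StructType =
    StructType(property.split(",").map { pair =>
      val Array(name, tpe) = pair.split("=", 2)
      StructField(name, typeMap(tpe), nullable = true)
    })
}
```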

@chia7712
Contributor

All current tests do specify the types. When configuring the properties, the type of each column is specified as follows:
column.name=TIMESTAMP=timestamp,RECORD=string,wind_vane1=integer,wind_vane2=double,wind_vane3=double,wind_vane4=double,Wind_SPD_Hub=double,wind_speed1=double,wind_speed2=double,wind_speed3=double,wind_speed4=double
The test data is the batch you gave me; I will look further into the type issue.

Was this error produced with the latest code?

@wycccccc
Collaborator Author

Was this error produced with the latest code?

I think I have pinpointed the error now: it is caused by our own Jackson conversion. That approach requires deserializing the column value, and while an Object can be null, the Scala Long it is deserialized back into cannot. I will open a PR to handle this problem.
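A hedged illustration of the unboxing problem described above (the values and names are made up):

```scala
// A boxed value coming back from a Jackson-style deserialization may be null.
val boxed: java.lang.Long = null

// Forcing it into a primitive scala.Long unboxes and throws a NullPointerException:
// val primitive: Long = boxed

// Going through Option keeps the null representable instead of blowing up:
val safe: Option[Long] = Option(boxed).map(_.longValue)   // None when boxed is null
```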

@wycccccc
Collaborator Author

There is a new question I would like to discuss. Since we convert everything to JSON in the end, I feel that when creating the schema we do not need to create fields according to their types; creating them all as StringType would be enough. That would avoid the problem above, because a Scala String is nullable.
Otherwise, handling the problem above seems to require converting the DataFrame to a Dataset, but that is hard to do when the schema is defined directly from the properties file (there is no way to define the case class required for the Dataset conversion). Or is there a concise way to make the types in the schema optional?

@chia7712
Contributor

Since we convert everything to JSON in the end, I feel that when creating the schema we do not need to create fields according to their types; creating them all as StringType would be enough. That would avoid the problem above, because a Scala String is nullable.

Using strings for everything is fine; this round of acceptance will not handle other types.
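For completeness, a minimal sketch of the all-StringType approach agreed on above (the column names are borrowed from the property shown earlier; everything else is assumed): every field is a nullable string, so nulls never have to be unboxed into a primitive.

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Build the whole schema as nullable strings; the values are written out as JSON anyway.
val columns = Seq("TIMESTAMP", "RECORD", "wind_vane1", "wind_vane2", "wind_vane3",
  "wind_vane4", "Wind_SPD_Hub", "wind_speed1", "wind_speed2", "wind_speed3", "wind_speed4")

val schema: StructType = StructType(columns.map(StructField(_, StringType, nullable = true)))
```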
