-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#26 add create checkpoint function #79
Conversation
3af40ab
to
e56e1ed
Compare
JaCoCo agent module code coverage report - spark:2 - scala 2.12.12
|
JaCoCo server module code coverage report - scala 2.12.12
|
e56e1ed
to
2068feb
Compare
2068feb
to
1d08ad8
Compare
…nt-creation-function # Conflicts: # agent/src/main/scala/za/co/absa/atum/agent/AtumContext.scala # agent/src/test/scala/za/co/absa/atum/agent/AtumContextTest.scala
… unit tests, refactoring & simplification
…nt-creation-function # Conflicts: # agent/src/main/scala/za/co/absa/atum/agent/AtumContext.scala # agent/src/main/scala/za/co/absa/atum/agent/model/Checkpoint.scala
* @param atumContext Contains the calculations to be done and publish the result | ||
* @return | ||
*/ | ||
def createCheckpoint(checkpointName: String, author: String)(implicit atumContext: AtumContext): DataFrame = { | ||
// todo: implement checkpoint creation | ||
def createAndSaveCheckpoint(checkpointName: String, author: String)(implicit atumContext: AtumContext): DataFrame = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the rename? I would just keep the existing name.
The saving is kind of implicit, plus mentioned in the comment.
author: String, | ||
measurements: Seq[Measurement] | ||
): Checkpoint = { | ||
def createAndSaveCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really keep simple. The "saving" can be implied easily from the fact that it doesn't return the checkpoint. And the function name can be interpreted "create checkpoint in DB"
def createAndSaveCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = { | |
def createCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will change this consistently on all places & document
@@ -62,6 +88,16 @@ class AtumContext private[agent] ( | |||
) | |||
} | |||
|
|||
def createAndSaveCheckpointOnProvidedData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def createAndSaveCheckpointOnProvidedData( | |
def createCheckpointFromProvidedValues( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -62,6 +88,16 @@ class AtumContext private[agent] ( | |||
) | |||
} | |||
|
|||
def createAndSaveCheckpointOnProvidedData( | |||
checkpointName: String, author: String, measurements: Seq[Measurement] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non blocking
The Seq[Measurement]
is not the best data type here imho.
- harder to work with
- no check on content (same measures
But let's address it in a separate ticket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to potentially change this once we'll be refactoring the measurement builder & Measure class - all this will be changed to some degree (some changes will be more brutal) in #98
|
||
package za.co.absa.atum.agent.exception | ||
|
||
case class MeasurementProvidedException(msg: String) extends Exception(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor
Would have a base AtumException
class (maybe even in model module?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%
@@ -33,85 +34,118 @@ sealed trait Measure extends MeasurementProcessor with MeasureType { | |||
trait MeasureType { | |||
val measureName: String | |||
val onlyForNumeric: Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this flag. It's kind of problematci.
Can an avg
be run on Date, Timestamp etc? Probably yes.
On the other hand sum on Timestamp doesn't really make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, avg
can be for non-numeric, so it would be false
, and sum
probably doesn't make sense so would be true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so basically if it's false
then no check would be performed, if it's true
then numerical check would be performed-just a general one basically, but if this is a bad idea I can drop it. I was actually inspired by original Atum library, it had this there in a very similar fashion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it's not used, and eventually only part of validation, I would remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh okay, I can remove it here
…nt-creation-function # Conflicts: # agent/src/main/scala/za/co/absa/atum/agent/AtumContext.scala
@@ -33,85 +34,118 @@ sealed trait Measure extends MeasurementProcessor with MeasureType { | |||
trait MeasureType { | |||
val measureName: String | |||
val onlyForNumeric: Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it's not used, and eventually only part of validation, I would remove it.
/** | ||
* When the application/user of Atum Agent provides actual results by himself, the type is precise and we don't need | ||
* to do any adjustments. | ||
*/ | ||
case class MeasurementProvided[T](measure: Measure, resultValue: T, resultType: ResultValueType.ResultValueType) | ||
extends Measurement | ||
|
||
object MeasurementProvided { | ||
|
||
def handleSpecificType[T]( | ||
measure: Measure, resultValue: T, requiredType: ResultValueType.ResultValueType | ||
): MeasurementProvided[T] = { | ||
|
||
val actualType = measure.resultValueType | ||
if (actualType != requiredType) | ||
throw MeasurementProvidedException( | ||
s"Type of a given provided measurement result and type that a given measure supports are not compatible! " + | ||
s"Got $actualType but should be $requiredType" | ||
) | ||
MeasurementProvided[T](measure, resultValue, requiredType) | ||
} | ||
|
||
def apply[T](measure: Measure, resultValue: T): Measurement = { | ||
resultValue match { | ||
case l: Long => | ||
handleSpecificType[Long](measure, l, ResultValueType.Long) | ||
case d: Double => | ||
handleSpecificType[Double](measure, d, ResultValueType.Double) | ||
case bd: BigDecimal => | ||
handleSpecificType[BigDecimal](measure, bd, ResultValueType.BigDecimal) | ||
case s: String => | ||
handleSpecificType[String](measure, s, ResultValueType.String) | ||
|
||
case unsupportedType => | ||
val className = unsupportedType.getClass.getSimpleName | ||
throw MeasurementProvidedException( | ||
s"Unsupported type of measurement for measure ${measure.measureName}: $className " + | ||
s"for provided result: $resultValue" | ||
) | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* When the Atum Agent itself performs the measurements, using Spark, then in some cases some adjustments are | ||
* needed - thus we are converting the results to strings always - but we need to keep the information about | ||
* the actual type as well. | ||
*/ | ||
case class MeasurementByAtum(measure: Measure, resultValue: String, resultType: ResultValueType.ResultValueType) | ||
extends Measurement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these two classes?
If we do, please put them into a companion object to keep the in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they are needed; why - what benefit would it have? I think that in this case it would only slightly complicate their usages - if the only reason is encapsulation, then they are, after all, in their own module 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am old school 👼 , don't like top level classes present in a file named by another class.
Also they are Measurement
descendants, so I see no problem to hide them inside.
Usage is not an issue with import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- code reviewed
- pulled
- built
- run
Work done mostly on Agent side:
functionName
->measureName
to be consistent across the projectCloses #26