Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#26 add create checkpoint function #79

Merged
merged 15 commits into from
Nov 1, 2023

Conversation

cerveada
Copy link
Collaborator

@cerveada cerveada commented Sep 25, 2023

Work done mostly on Agent side:

  • Lot of refactorization and simplifications
  • Adding proper scope (protected / private) of methods and classes
  • Finished Checkpoint creation - on Agent as well as Provided data measurements
  • Adding a result type for each Measure and implementing validation of this, especially for measurements provided by user
  • Slightly changing Agent's internal data model
  • Changed functionName -> measureName to be consistent across the project
  • Lots of unit tests covering all these

Closes #26

@github-actions
Copy link

github-actions bot commented Sep 25, 2023

JaCoCo agent module code coverage report - spark:2 - scala 2.12.12

File Coverage [83.26%] 🍏
MeasurementBuilder.scala 100% 🍏
ConsoleDispatcher.scala 100% 🍏
AtumAgentException.scala 100% 🍏
AtumContext.scala 97.03% 🍏
Measurement.scala 86.67% 🍏
MeasuresBuilder.scala 85.19% 🍏
Measure.scala 84.75% 🍏
AtumAgent.scala 83.22% 🍏
MeasurementProcessor.scala 68.75%
HttpDispatcher.scala 0%
Total Project Coverage 83.26% 🍏

@github-actions
Copy link

JaCoCo server module code coverage report - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage 12.45%

lsulak
lsulak previously approved these changes Oct 10, 2023
salamonpavel
salamonpavel previously approved these changes Oct 10, 2023
@cerveada cerveada dismissed stale reviews from salamonpavel and lsulak via 2068feb October 13, 2023 08:20
@cerveada cerveada force-pushed the feature/26-checkpoint-creation-function branch from e56e1ed to 2068feb Compare October 13, 2023 08:20
@cerveada cerveada force-pushed the feature/26-checkpoint-creation-function branch from 2068feb to 1d08ad8 Compare October 20, 2023 14:24
cerveada and others added 8 commits October 20, 2023 16:49
…nt-creation-function

# Conflicts:
#	agent/src/main/scala/za/co/absa/atum/agent/AtumContext.scala
#	agent/src/test/scala/za/co/absa/atum/agent/AtumContextTest.scala
…nt-creation-function

# Conflicts:
#	agent/src/main/scala/za/co/absa/atum/agent/AtumContext.scala
#	agent/src/main/scala/za/co/absa/atum/agent/model/Checkpoint.scala
@benedeki benedeki changed the title #29 add create checkpoint function #26 add create checkpoint function Oct 27, 2023
* @param atumContext Contains the calculations to be done and publish the result
* @return
*/
def createCheckpoint(checkpointName: String, author: String)(implicit atumContext: AtumContext): DataFrame = {
// todo: implement checkpoint creation
def createAndSaveCheckpoint(checkpointName: String, author: String)(implicit atumContext: AtumContext): DataFrame = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the rename? I would just keep the existing name.
The saving is kind of implicit, plus mentioned in the comment.

author: String,
measurements: Seq[Measurement]
): Checkpoint = {
def createAndSaveCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really keep simple. The "saving" can be implied easily from the fact that it doesn't return the checkpoint. And the function name can be interpreted "create checkpoint in DB"

Suggested change
def createAndSaveCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = {
def createCheckpoint(checkpointName: String, author: String, dataToMeasure: DataFrame): AtumContext = {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will change this consistently on all places & document

@@ -62,6 +88,16 @@ class AtumContext private[agent] (
)
}

def createAndSaveCheckpointOnProvidedData(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def createAndSaveCheckpointOnProvidedData(
def createCheckpointFromProvidedValues(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -62,6 +88,16 @@ class AtumContext private[agent] (
)
}

def createAndSaveCheckpointOnProvidedData(
checkpointName: String, author: String, measurements: Seq[Measurement]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking
The Seq[Measurement] is not the best data type here imho.

  • harder to work with
  • no check on content (same measures

But let's address it in a separate ticket

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to potentially change this once we'll be refactoring the measurement builder & Measure class - all this will be changed to some degree (some changes will be more brutal) in #98


package za.co.absa.atum.agent.exception

case class MeasurementProvidedException(msg: String) extends Exception(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor
Would have a base AtumException class (maybe even in model module?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%

@@ -33,85 +34,118 @@ sealed trait Measure extends MeasurementProcessor with MeasureType {
trait MeasureType {
val measureName: String
val onlyForNumeric: Boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this flag. It's kind of problematci.
Can an avg be run on Date, Timestamp etc? Probably yes.
On the other hand sum on Timestamp doesn't really make sense.

Copy link
Collaborator

@lsulak lsulak Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, avg can be for non-numeric, so it would be false, and sum probably doesn't make sense so would be true

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so basically if it's false then no check would be performed, if it's true then numerical check would be performed-just a general one basically, but if this is a bad idea I can drop it. I was actually inspired by original Atum library, it had this there in a very similar fashion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's not used, and eventually only part of validation, I would remove it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh okay, I can remove it here

@@ -33,85 +34,118 @@ sealed trait Measure extends MeasurementProcessor with MeasureType {
trait MeasureType {
val measureName: String
val onlyForNumeric: Boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's not used, and eventually only part of validation, I would remove it.

Comment on lines 28 to 77
/**
* When the application/user of Atum Agent provides actual results by himself, the type is precise and we don't need
* to do any adjustments.
*/
case class MeasurementProvided[T](measure: Measure, resultValue: T, resultType: ResultValueType.ResultValueType)
extends Measurement

object MeasurementProvided {

def handleSpecificType[T](
measure: Measure, resultValue: T, requiredType: ResultValueType.ResultValueType
): MeasurementProvided[T] = {

val actualType = measure.resultValueType
if (actualType != requiredType)
throw MeasurementProvidedException(
s"Type of a given provided measurement result and type that a given measure supports are not compatible! " +
s"Got $actualType but should be $requiredType"
)
MeasurementProvided[T](measure, resultValue, requiredType)
}

def apply[T](measure: Measure, resultValue: T): Measurement = {
resultValue match {
case l: Long =>
handleSpecificType[Long](measure, l, ResultValueType.Long)
case d: Double =>
handleSpecificType[Double](measure, d, ResultValueType.Double)
case bd: BigDecimal =>
handleSpecificType[BigDecimal](measure, bd, ResultValueType.BigDecimal)
case s: String =>
handleSpecificType[String](measure, s, ResultValueType.String)

case unsupportedType =>
val className = unsupportedType.getClass.getSimpleName
throw MeasurementProvidedException(
s"Unsupported type of measurement for measure ${measure.measureName}: $className " +
s"for provided result: $resultValue"
)
}
}
}

/**
* When the Atum Agent itself performs the measurements, using Spark, then in some cases some adjustments are
* needed - thus we are converting the results to strings always - but we need to keep the information about
* the actual type as well.
*/
case class MeasurementByAtum(measure: Measure, resultValue: String, resultType: ResultValueType.ResultValueType)
extends Measurement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these two classes?
If we do, please put them into a companion object to keep the in this file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are needed; why - what benefit would it have? I think that in this case it would only slightly complicate their usages - if the only reason is encapsulation, then they are, after all, in their own module 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am old school 👼 , don't like top level classes present in a file named by another class.
Also they are Measurement descendants, so I see no problem to hide them inside.
Usage is not an issue with import.

Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • code reviewed
  • pulled
  • built
  • run

@lsulak lsulak merged commit 0942d07 into master Nov 1, 2023
5 of 6 checks passed
@lsulak lsulak deleted the feature/26-checkpoint-creation-function branch November 1, 2023 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create functions to create a checkpoint
5 participants