Mixer validator #215

Merged: 24 commits, merged on Oct 21, 2024

Commits on Sep 10, 2024

  1. Adding script that validates if mixer config is well formatted and has everything in place
    Masha Iureva authored and Masha Iureva committed Sep 10, 2024
    b724d5f

Commits on Sep 13, 2024

  1. Add S3 path validation with boto3 existence check

    Masha Iureva authored and Masha Iureva committed Sep 13, 2024
    1954932
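The S3 check added in 1954932 can be pictured with a sketch like the one below. It is not the merged code: `parse_s3_path` and `s3_path_exists` are hypothetical names, and handling a `*` wildcard by listing objects under the prefix before it is an assumption about how glob-style document paths are treated. The boto3 calls used (`head_object`, `list_objects_v2`) are standard S3 client methods.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(path: str) -> Tuple[str, str]:
    """Split an s3://bucket/key path into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 path: {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def s3_path_exists(path: str) -> bool:
    """Check that an S3 path (optionally containing a '*' wildcard) matches at least one object."""
    # Imported lazily so parse_s3_path works without boto3 installed (hypothetical design choice).
    import boto3
    from botocore.exceptions import ClientError

    bucket, key = parse_s3_path(path)
    s3 = boto3.client("s3")
    if "*" in key:
        # Assumed glob handling: look for any object under the prefix before the first wildcard.
        prefix = key.split("*", 1)[0]
        response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        return response.get("KeyCount", 0) > 0
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        # A missing key (or missing permission) surfaces as a ClientError.
        return False
```

Keeping the format check separate from the network call lets a malformed path be reported without credentials or a round trip to S3.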

Commits on Sep 16, 2024

  1. Adding check of the files, trying to run jq expressions on them and see if both files and jq expressions are valid
    Masha Iureva authored and Masha Iureva committed Sep 16, 2024
    9ebe5f1

Commits on Sep 25, 2024

  1. Add S3 path validation, sampling, and doc-attribute alignment checks

    Masha Iureva authored and Masha Iureva committed Sep 25, 2024
    67adadf
  2. adding logic to split jsonpath expressions into pieces and check them

    Masha Iureva authored and Masha Iureva committed Sep 25, 2024
    4391805
  3. Added JsonPath syntax evaluation, started working on sampling docs and checking their content
    Masha Iureva authored and Masha Iureva committed Sep 25, 2024
    2885e7e
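Splitting a JSONPath filter into pieces (4391805), together with the `!=` operator support added later in 2aca6a3, might look roughly like this. It is a sketch under assumptions: the `$.attributes[?(...)]` shape, the attribute names, and the helper `split_filter` are illustrative, and the `&&`/`||` connectives are discarded because each piece is only checked on its own.

```python
import re
from typing import List, Optional, Tuple

# Comparison operators, two-character ones first so ">=" is not read as ">".
OPERATORS = ("==", "!=", ">=", "<=", ">", "<")


def split_filter(expression: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Split a filter like $.attributes[?(A && B || C)] into (path, operator, value) pieces."""
    match = re.fullmatch(r"\$\.\w+\[\?\((.*)\)\]", expression.strip())
    if match is None:
        raise ValueError(f"unsupported filter syntax: {expression!r}")
    pieces: List[Tuple[str, Optional[str], Optional[str]]] = []
    for clause in re.split(r"\s*(?:&&|\|\|)\s*", match.group(1)):
        for op in OPERATORS:
            if op in clause:
                path, value = clause.split(op, 1)
                pieces.append((path.strip(), op, value.strip()))
                break
        else:
            # A bare path such as "@.exact_dedup" only tests that the field exists.
            pieces.append((clause.strip(), None, None))
    return pieces
```

Each piece can then be validated separately, for example by checking that its path exists in a sampled attribute record.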

Commits on Sep 27, 2024

  1. Adding logic to check if all doc and corresponding attributes files contain correct fields and the same number of lines
    Masha Iureva authored and Masha Iureva committed Sep 27, 2024
    82920f8
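A minimal sketch of the doc/attribute alignment check from 82920f8, assuming both files are JSON Lines, that a document line must carry `id` and `text`, and that an attributes line must carry `id` and `attributes`. The function name `check_alignment` and the required-field lists are hypothetical.

```python
import json
from itertools import zip_longest
from typing import Iterable, List

DOC_FIELDS = ("id", "text")         # assumed required fields in a document line
ATTR_FIELDS = ("id", "attributes")  # assumed required fields in an attributes line


def check_alignment(doc_lines: Iterable[str], attr_lines: Iterable[str]) -> List[str]:
    """Check that a documents file and its attributes file line up one-to-one; return problems found."""
    errors: List[str] = []
    for line_no, (doc_line, attr_line) in enumerate(zip_longest(doc_lines, attr_lines), start=1):
        if doc_line is None or attr_line is None:
            # One file ran out before the other, so their line counts differ.
            errors.append(f"line {line_no}: files have different numbers of lines")
            break
        doc, attrs = json.loads(doc_line), json.loads(attr_line)
        for name, record, fields in (("document", doc, DOC_FIELDS), ("attributes", attrs, ATTR_FIELDS)):
            for field in fields:
                if field not in record:
                    errors.append(f"line {line_no}: {name} record is missing {field!r}")
        if doc.get("id") != attrs.get("id"):
            errors.append(f"line {line_no}: id mismatch ({doc.get('id')!r} vs {attrs.get('id')!r})")
    return errors
```

Real inputs are often gzip-compressed, so the line iterators could come from `gzip.open(path, "rt")`; because `zip_longest` walks both files lazily, neither file has to fit in memory.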

Commits on Oct 3, 2024

  1. Adding functionality to check if filters in config and attribute files match
    Masha Iureva authored and Masha Iureva committed Oct 3, 2024
    8745c8d
  2. updating filter checking logic to focus on filters missing from the mixer config
    Masha Iureva authored and Masha Iureva committed Oct 3, 2024
    564cee6
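The filter comparison from 8745c8d and 564cee6 could be sketched as a set difference between attribute names referenced by the config filters and attribute keys found in the attribute files. The `@.<name>` extraction regex, the record layout, and the name `compare_filters` are assumptions; both directions are returned because either kind of mismatch may be the one worth reporting.

```python
import re
from typing import Iterable, List, Set, Tuple

# Matches attribute references such as "@.exact_dedup" inside a JSONPath filter (assumed syntax).
ATTRIBUTE_REF = re.compile(r"@\.(\w+)")


def compare_filters(filters: Iterable[str], attribute_records: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Return (names referenced by filters but absent from records, names present in records but unused)."""
    referenced: Set[str] = set()
    for expression in filters:
        referenced.update(ATTRIBUTE_REF.findall(expression))
    present: Set[str] = set()
    for record in attribute_records:
        present.update(record.get("attributes", {}).keys())
    return sorted(referenced - present), sorted(present - referenced)
```

Running this over a small sample of attribute records is usually enough to catch a misspelled attribute name before a full mix is launched.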

Commits on Oct 7, 2024

  1. adding logic to run jq and jsonpath filters on a small set of docs to see if they work or fail
    Masha Iureva authored and Masha Iureva committed Oct 7, 2024
    5e7e3d4

Commits on Oct 9, 2024

  1. refactored to use smart_open and added logic to download sample files to a temp folder
    Masha Iureva authored and Masha Iureva committed Oct 9, 2024
    445bfef

Commits on Oct 11, 2024

  1. added logic to sample lines from doc and apply filters to it, refactored main, added logic to download sample files and work with them locally
    Masha Iureva authored and Masha Iureva committed Oct 11, 2024
    83481ac
  2. Adding clean up logic to delete sample files after the run

    Masha Iureva authored and Masha Iureva committed Oct 11, 2024
    890de88
  3. Merge branch 'main' of https://github.com/allenai/dolma into mixer-validator
    Masha Iureva authored and Masha Iureva committed Oct 11, 2024
    50763bd
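Sampling lines and working on a local copy that is deleted afterwards (83481ac, 890de88) can be sketched with reservoir sampling and `tempfile.TemporaryDirectory`. This is a stand-in under assumptions: the PR downloads samples from S3 via smart_open, while this sketch reads a local file, and `sample_lines` and `check_sample` are hypothetical names.

```python
import os
import random
import tempfile
from typing import Callable, List, TypeVar

T = TypeVar("T")


def sample_lines(path: str, k: int, seed: int = 0) -> List[str]:
    """Reservoir-sample up to k lines from a text file in a single pass."""
    rng = random.Random(seed)
    sample: List[str] = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i < k:
                sample.append(line)  # fill the reservoir with the first k lines
            else:
                j = rng.randint(0, i)  # line i survives with probability k / (i + 1)
                if j < k:
                    sample[j] = line
    return sample


def check_sample(source_path: str, k: int, check: Callable[[str], T]) -> T:
    """Write a sample of source_path into a temporary folder, run check on it, then clean up."""
    # TemporaryDirectory deletes the folder and its contents when the block exits, even on errors.
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, os.path.basename(source_path))
        with open(sample_path, "w", encoding="utf-8") as out:
            out.writelines(sample_lines(source_path, k))
        return check(sample_path)
```

Reservoir sampling gives every line the same chance of being kept without knowing the file length in advance, and the context manager makes the cleanup step automatic rather than a separate pass at the end of the run.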

Commits on Oct 14, 2024

  1. adding test configs for mixer validator

    Masha Iureva authored and Masha Iureva committed Oct 14, 2024
    001fd04
  2. fixing bug in test configs

    Masha Iureva authored and Masha Iureva committed Oct 14, 2024
    15a7104

Commits on Oct 16, 2024

  1. addressing comments, splitting script into smaller files, moving test configs to test folder, adding a couple of helper functions
    Masha Iureva authored and Masha Iureva committed Oct 16, 2024
    b740e45

Commits on Oct 17, 2024

  1. adding --verbose flag, support for .env variables

    Masha Iureva authored and Masha Iureva committed Oct 17, 2024
    c1708e2
  2. supporting != operator

    Masha Iureva authored and Masha Iureva committed Oct 17, 2024
    2aca6a3
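The `--verbose` flag and `.env` support from c1708e2 might be wired up along these lines. It is a stdlib-only sketch: the `load_env_file` and `parse_args` helpers, the positional `config` argument, and the logging levels are hypothetical, and the python-dotenv package's `load_dotenv()` is a common off-the-shelf alternative to the hand-written parser.

```python
import argparse
import logging
import os
from typing import Dict, List, Optional


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read KEY=VALUE lines from a .env file into os.environ; return the pairs that were read."""
    values: Dict[str, str] = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip('"').strip("'")
            values[key] = value
            # By default, variables already set in the environment win over the file.
            if override or key not in os.environ:
                os.environ[key] = value
    return values


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; --verbose lowers the logging threshold to DEBUG."""
    parser = argparse.ArgumentParser(description="Validate a mixer config (sketch).")
    parser.add_argument("config", help="path to the mixer config file")
    parser.add_argument("--verbose", action="store_true", help="log every check, not only failures")
    args = parser.parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.WARNING)
    return args
```

Loading the `.env` file before creating any S3 client lets credentials such as AWS keys come from the file without being exported in the shell.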

Commits on Oct 21, 2024

  1. updating types in function definitions, updating Readme

    Masha Iureva authored and Masha Iureva committed Oct 21, 2024
    d10de44
  2. fixing a bug in readme

    Masha Iureva authored and Masha Iureva committed Oct 21, 2024
    e59c64b
  3. adding more error handlers

    Masha Iureva authored and Masha Iureva committed Oct 21, 2024
    07c2367
  4. deleting the initial version of the script

    Masha Iureva authored and Masha Iureva committed Oct 21, 2024
    6941ac6
  5. f019fef