Sync quickstart/index.md with gh-pages/quickstart.md #2891
Codecov Report
Patch coverage has no change and project coverage change: +0.05%
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #2891     +/-   ##
=========================================
+ Coverage   47.34%   47.40%   +0.05%
=========================================
  Files         395      395
  Lines       44045    44041       -4
  Branches      487      487
=========================================
+ Hits        20854    20878      +24
+ Misses      21600    21580      -20
+ Partials     1591     1583       -8
Flags with carried forward coverage won't be shown.
) Signed-off-by: Rich Scott <[email protected]>
* Update simulator * Replace Output with C * Typo * Restore pkg proto * Restore files * Fixing simulator changes (#6) * Fixing simulator changes * Changed to less than or equal Co-authored-by: Mustafa Ilyas <[email protected]> * Simulator Changes (#9) * Add config and dependency injection to scheduler metrics (#2892) * Replace metrics singleton with an injection pattern. * fix * add configuration structures to metrics * add configuration * rename elements * Maker Pulsar ReceiverQueueSize Configurable (#2895) * wip * wip * set receiverQueueSize to 100 * remove old PulsarReceiverQueueSize * revert * subscriptionin api --------- Co-authored-by: Chris Martin <[email protected]> * Add poll_interval (#2805) * Add poll_interval * Add poll_interval * Added poll_interval * update by running tox-e docs --------- Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> * Seperate python script for armada v1 and v2 system diagrams (#2758) * Seperate python script for armada v1 system diagram * removed generate.py so it can be replaced with two seperate files for Armada V1 and Armada V2 * Python script to generate Armada V2 system diagram * generate_v1.py Update #1 * generate_v1.py Update Number:2 * generate.py runs generate_v1.py as well as generate_v2.py and it is consistent with our instructions as 'docs/design/diagrams/relationships' * generate_v1.py Update No:3 * Armada V1 and Armada V2 diagrams * updated relationships_diagram.md to include armada v1 and v2 diagrams --------- Co-authored-by: Adam McArthur <[email protected]> * Add config to use autoupdater on tagged branches (#2905) * #2904 add autoupdate config * #2904 add label config and other options * docs: create README.md for plugins directory (#2897) * Create README.md for plugins directory * Update README.md * Update plugins/README.md Co-authored-by: Kevin Hannon <[email protected]> * Update README.md --------- Co-authored-by: Kevin Hannon <[email protected]> 
Co-authored-by: Adam McArthur <[email protected]> * Enables airflow operator level retry. (#2894) * Update docker stuff for latest airflow 2.7.0 * Use AirflowException instead of AirflowFailException to allow for retries * Remove codecov workflows (#2902) * Upgrade Pulsar Client to v0.11 (#2896) * update * update pulsar client * Fix bug causing server spinning * Abstract out the retry until success logic for testing (#2901) * Respond to review --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> * Sync quickstart/index.md with gh-pages/quickstart.md (#2891) * Log Call Site (#2909) * allow logger to report caller * allow logger to report caller * lint --------- Co-authored-by: Chris Martin <[email protected]> * Add cleaner test output for mage with os/exec.Command (#2907) * feat: Update Semver from version 6.3.0 to 6.3.1 (#2686) Co-authored-by: Adam McArthur <[email protected]> * fix: upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0 (#2743) Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * fix: upgrade @types/react from 16.14.32 to 16.14.43 (#2747) Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump github.com/go-openapi/jsonreference from 0.20.0 to 0.20.2 (#2316) Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2. - [Release notes](https://github.com/go-openapi/jsonreference/releases) - [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2) --- updated-dependencies: - dependency-name: github.com/go-openapi/jsonreference dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Order leased jobs by serial (#2912) This will ensure the job leased first, gets send to the cluster first Currently we just order by postgres default sorting - which often picks the most recently leased - causing the first lease jobs to get stuck - This only occurs when scheduling is faster than leasing * Bump webpack from 5.75.0 to 5.77.0 in /internal/lookout/ui (#2302) Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0. - [Release notes](https://github.com/webpack/webpack/releases) - [Commits](webpack/webpack@v5.75.0...v5.77.0) --- updated-dependencies: - dependency-name: webpack dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump word-wrap from 1.2.3 to 1.2.5 in /internal/lookout/ui (#2806) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5. - [Release notes](https://github.com/jonschlinkert/word-wrap/releases) - [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5) --- updated-dependencies: - dependency-name: word-wrap dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * resolve flaky (#2914) Co-authored-by: Adam McArthur <[email protected]> * fix: upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0 (#2744) Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * fix: upgrade react-router-dom from 6.9.0 to 6.14.1 (#2746) Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump semver from 6.3.0 to 6.3.1 in /internal/lookout/ui (#2661) Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1. - [Release notes](https://github.com/npm/node-semver/releases) - [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md) - [Commits](npm/node-semver@v6.3.0...v6.3.1) --- updated-dependencies: - dependency-name: semver dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Run CodeQL once daily on a schedule (#2918) * Helm chart update: executor (#2917) * Helm chart update: executor At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass. * Patch/dependencies (#2923) * Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7 Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7. - [Release notes](https://github.com/go-openapi/strfmt/releases) - [Commits](go-openapi/strfmt@v0.21.3...v0.21.7) --- updated-dependencies: - dependency-name: github.com/go-openapi/strfmt dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0 Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0. 
- [Release notes](https://github.com/go-openapi/runtime/releases) - [Commits](go-openapi/runtime@v0.24.2...v0.26.0) --- updated-dependencies: - dependency-name: github.com/go-openapi/runtime dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0 Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0. - [Release notes](https://github.com/goreleaser/nfpm/releases) - [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml) - [Commits](goreleaser/nfpm@v2.25.1...v2.29.0) --- updated-dependencies: - dependency-name: github.com/goreleaser/nfpm/v2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1 Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1. - [Release notes](https://github.com/go-playground/validator/releases) - [Commits](go-playground/validator@v10.11.1...v10.14.1) --- updated-dependencies: - dependency-name: github.com/go-playground/validator/v10 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0. - [Release notes](https://github.com/grpc/grpc-dotnet/releases) - [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md) - [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0) --- updated-dependencies: - dependency-name: Grpc.Net.Client dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * fix: upgrade @mui/material from 5.10.17 to 5.13.6 Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade prettier from 2.7.1 to 2.8.8 Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3 Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0 Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0 Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * Trying to update klog * go mod fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Fix bug causing GetJobSetEvents to get stuck (#2903) * Add error message of final job run to JobFailedMessage When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of "Job has been retried too many times, giving up" Now we include the final run error in that message - to make it easier to work out the cause of retries * Fix bug causing GetJobSetEvents to get stuck GetJobSetEvents only increments its fromId variable on sending new messages However now all redis events produce api events that will be sent downstream The issue here is if we get 500 redis events in a row that don't produce api events, then the fromId never gets updated - Meaning the watching gets stuck here To fix this, ReadEvents now returns a lastMessageId. So if there are no messages to process, the fromId should be updated using the lastMessageId * Formatting * Bump @adobe/css-tools from 4.0.1 to 4.3.1 in /internal/lookout/ui (#2931) Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1. - [Changelog](https://github.com/adobe/css-tools/blob/main/History.md) - [Commits](https://github.com/adobe/css-tools/commits) --- updated-dependencies: - dependency-name: "@adobe/css-tools" dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Improved etcd protection (#2925) * Initial commit * Delete unused code * Export metrics collection delay metrics * Add mutex to InMemoryJobRepository * Add tests * Lint * Update internal/executor/configuration/types.go * Lint --------- Co-authored-by: JamesMurkin <[email protected]> * Stop executor requesting more jobs when it still has leased jobs (#2932) * Stop executor requesting more jobs when it still has leased jobs Currently we "queue" jobs to be submitted on the executor - which sit the leased state until they are submitted to kubernetes However this causes 2 issues with our current setup: - It prevents back-pressure from working well on the scheduler side. As it sees all these "Leased" jobs as active, so just keep scheduling more - In the case we are slowing submission due to etcd going over its limit. We "queue" lots of jobs, and as soon as etcd goes under its limit we hit it with potentially thousands of jobs This flow needs further work and thought - however for now this is the minimal fix to prevent bad behaviour Signed-off-by: JamesMurkin <[email protected]> * WIP Signed-off-by: JamesMurkin <[email protected]> * Fix scheduler side tests Signed-off-by: JamesMurkin <[email protected]> * Implement number of requested jobs on executor side Signed-off-by: JamesMurkin <[email protected]> * Remove unused config Signed-off-by: JamesMurkin <[email protected]> * Fixing panic on startup when etcd health monitor not registered Signed-off-by: JamesMurkin <[email protected]> * Enhance logging Signed-off-by: JamesMurkin <[email protected]> * Set more sensible default for maxLeasedJobs Signed-off-by: JamesMurkin <[email protected]> --------- Signed-off-by: JamesMurkin <[email protected]> * Fix race in etcd protections (#2937) * Initial commit * Fix MultiHealthMonitor race * Fix etcd health metric naming conflict (#2939) * Fix metric 
naming conflict * Fix metric names * Fix metrix prefix * Fix label * Bump golang.org/x/sync from 0.1.0 to 0.3.0 (#2946) Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.1.0 to 0.3.0. - [Commits](golang/sync@v0.1.0...v0.3.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add more scheduler metrics (#2906) * Add jobs considered and refactor to counters * Add fair share metrics * Add reset for gauge metrics * format * cycle imports * modify cycle return struct * verbose logging --------- Co-authored-by: Albin Severinson <[email protected]> * Update config.yaml (#2953) * Remove gang job cardinality submit check. Add placeholder for min gang size * Add msumner91 and mustafai to magic list of trusted people (#2956) * Add msumner91 to magic list of trusted people * Update .mergify.yml * Airflow: always set credentials from args in channel ctor (#2952) In the GrpcChannelArguments constructor, always set the credentials_callback_args member from what is given. Add a test to verify serialization round-tripping is complete, and a __eq__ implementation for GrpcChannelArguments. 
Signed-off-by: Rich Scott <[email protected]> * Removed Makefile from repo (#2915) Co-authored-by: Mohamed Abdelfatah <[email protected]> * Add per-queue scheduling rate-limiting (#2938) * Initial commit * Add rate limiters * go mod tidy * Updates * Add tests * Update default config * Update default scheduler config * Whitespace * Cleanup * Docstring improvements * Remove limiter nil checks * Add Cardinality() function on gctx * Fix test * Fix test * Add note about signed commits to Contributor documentation (#2960) * Add note about signed commits to Contributor documentation Signed-off-by: Aviral Singh <[email protected]> * Add note about signed commits to Contributor documentation --------- Signed-off-by: Aviral Singh <[email protected]> * ArmadaContext that includes a logger (#2934) * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * compilation! * rename package * more compilation * rename to Context * embed * compilation * compilation * fix test * remove old ctxloggers * revert design doc * revert developer doc * formatting * wip * tests * don't gen * don't gen * merged master --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Albin Severinson <[email protected]> * Bump armada airflow operator to version 0.5.4 (#2961) * Bump armada airflow operator to version 0.5.4 Signed-off-by: Rich Scott <[email protected]> * Regenerate Airflow Operator Markdown doc. Signed-off-by: Rich Scott <[email protected]> * Fix regenerated Airflow doc error. Signed-off-by: Rich Scott <[email protected]> * Pin versions of all modules, especially around docs generation. 
Signed-off-by: Rich Scott <[email protected]> * Regenerate Airflow docs using Python 3.10 Signed-off-by: Rich Scott <[email protected]> --------- Signed-off-by: Rich Scott <[email protected]> * Simulator Changes Made a number of changes to the simulator and simulator tests, most notably: - Fixed implementation of minSubmitTime setting for workload specifications - Added tests for SchedulingConfigsFromPattern, ClusterSpecsFromPattern, WorkloadFromPattern - Added sample workloads, clusters and scheduling configs - Added tests which simulate per-pool and per-executorGroup scheduling - Implemented further metrics for use in simulator tests, such as a cluster's aggregate resources, number of preemptions and schedules for a given test run - Added optimisation to speed up simulator, whereby the scheduler skips the current schedule event if no eventSequences have been received since the previous schedule. * Simplified TestClusterSpecsFromPattern and TestWorkloadFromPattern tests * Removed unused test * Fixed malformed yaml * Improved metrics for simulations. Improved simulator tests with errorgroups. 
* Removed all simulator test data except basic data necessary for testing * Implementing CLI Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> Signed-off-by: Aviral Singh <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Pradeep Kurapati <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: Shivang Shandilya <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mark Sumner <[email protected]> Co-authored-by: Rich Scott <[email protected]> Co-authored-by: MeenuyD <[email protected]> Co-authored-by: Aviral Singh <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> * Adding verbose flag to simulator CLI, changing logging context in simulator * Improved simulator CLI output, removed redundant features, implemented parallel simulations by addressing mutability of structures inputted into the simulator * Removed unknown logging library * Changing threadSafeLogger Info call to Print. 
Adding separation back between simulation results * Implemented stochastic runtime for jobs using a shifted exponential distribution (#13) * Implemented stochastic runtime for jobs using a shifted exponential distribution * Implemented min submit time from dependency completion (#14) Co-authored-by: Mustafa Ilyas <[email protected]> * Fixed tests * Fixed implementation of shifted exponential distribution * Using FP unrounded parameters to sample from distribution * Modified stochastic runtime definition * Adding logging to simulator Co-authored-by: Mustafa Ilyas <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> Signed-off-by: Aviral Singh <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Pradeep Kurapati <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: Shivang Shandilya <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mark Sumner <[email protected]> Co-authored-by: Rich Scott <[email protected]> Co-authored-by: MeenuyD 
<[email protected]> Co-authored-by: Aviral Singh <[email protected]>
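The commit message above mentions stochastic job runtimes drawn from a shifted exponential distribution. As a rough illustration of that idea only (this is not the simulator's code; the function name `sample_runtime` and its parameters are hypothetical), a runtime can be sampled as a fixed minimum plus an exponentially distributed tail:

```python
import random


def sample_runtime(min_runtime_s: float, tail_mean_s: float, rng: random.Random) -> float:
    """Draw one runtime from a shifted exponential distribution.

    The result is min_runtime_s plus an exponential variate whose mean is
    tail_mean_s, so it is never below min_runtime_s.
    All names here are illustrative assumptions, not Armada identifiers.
    """
    if min_runtime_s < 0 or tail_mean_s < 0:
        raise ValueError("parameters must be non-negative")
    if tail_mean_s == 0:
        # Degenerate case: no random tail, the runtime is deterministic.
        return min_runtime_s
    # random.expovariate expects the rate, i.e. 1 / mean of the tail.
    return min_runtime_s + rng.expovariate(1.0 / tail_mean_s)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same draws
    print([round(sample_runtime(60.0, 30.0, rng), 1) for _ in range(5)])
```

Keeping the parameters as floats until the sample is drawn avoids rounding the distribution's shape, which seems to be what the "FP unrounded parameters" item in the message refers to, though that reading is an assumption.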
* Sync out testsuite changes (#19) * Update simulator * Replace Output with C * Typo * Restore pkg proto * Restore files * Fixing simulator changes (#6) * Fixing simulator changes * Changed to less than or equal Co-authored-by: Mustafa Ilyas <[email protected]> * Simulator Changes (#9) * Add config and dependency injection to scheduler metrics (#2892) * Replace metrics singleton with an injection pattern. * fix * add configuration structures to metrics * add configuration * rename elements * Maker Pulsar ReceiverQueueSize Configurable (#2895) * wip * wip * set receiverQueueSize to 100 * remove old PulsarReceiverQueueSize * revert * subscriptionin api --------- Co-authored-by: Chris Martin <[email protected]> * Add poll_interval (#2805) * Add poll_interval * Add poll_interval * Added poll_interval * update by running tox-e docs --------- Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> * Seperate python script for armada v1 and v2 system diagrams (#2758) * Seperate python script for armada v1 system diagram * removed generate.py so it can be replaced with two seperate files for Armada V1 and Armada V2 * Python script to generate Armada V2 system diagram * generate_v1.py Update #1 * generate_v1.py Update Number:2 * generate.py runs generate_v1.py as well as generate_v2.py and it is consistent with our instructions as 'docs/design/diagrams/relationships' * generate_v1.py Update No:3 * Armada V1 and Armada V2 diagrams * updated relationships_diagram.md to include armada v1 and v2 diagrams --------- Co-authored-by: Adam McArthur <[email protected]> * Add config to use autoupdater on tagged branches (#2905) * #2904 add autoupdate config * #2904 add label config and other options * docs: create README.md for plugins directory (#2897) * Create README.md for plugins directory * Update README.md * Update plugins/README.md Co-authored-by: Kevin Hannon <[email protected]> * Update README.md --------- Co-authored-by: Kevin 
Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> * Enables airflow operator level retry. (#2894) * Update docker stuff for latest airflow 2.7.0 * Use AirflowException instead of AirflowFailException to allow for retries * Remove codecov workflows (#2902) * Upgrade Pulsar Client to v0.11 (#2896) * update * update pulsar client * Fix bug causing server spinning * Abstract out the retry until success logic for testing (#2901) * Respond to review --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> * Sync quickstart/index.md with gh-pages/quickstart.md (#2891) * Log Call Site (#2909) * allow logger to report caller * allow logger to report caller * lint --------- Co-authored-by: Chris Martin <[email protected]> * Add cleaner test output for mage with os/exec.Command (#2907) * feat: Update Semver from version 6.3.0 to 6.3.1 (#2686) Co-authored-by: Adam McArthur <[email protected]> * fix: upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0 (#2743) Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * fix: upgrade @types/react from 16.14.32 to 16.14.43 (#2747) Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump github.com/go-openapi/jsonreference from 0.20.0 to 0.20.2 (#2316) Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2. - [Release notes](https://github.com/go-openapi/jsonreference/releases) - [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2) --- updated-dependencies: - dependency-name: github.com/go-openapi/jsonreference dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Order leased jobs by serial (#2912) This will ensure the job leased first, gets send to the cluster first Currently we just order by postgres default sorting - which often picks the most recently leased - causing the first lease jobs to get stuck - This only occurs when scheduling is faster than leasing * Bump webpack from 5.75.0 to 5.77.0 in /internal/lookout/ui (#2302) Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0. - [Release notes](https://github.com/webpack/webpack/releases) - [Commits](webpack/webpack@v5.75.0...v5.77.0) --- updated-dependencies: - dependency-name: webpack dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump word-wrap from 1.2.3 to 1.2.5 in /internal/lookout/ui (#2806) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5. - [Release notes](https://github.com/jonschlinkert/word-wrap/releases) - [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5) --- updated-dependencies: - dependency-name: word-wrap dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * resolve flaky (#2914) Co-authored-by: Adam McArthur <[email protected]> * fix: upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0 (#2744) Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * fix: upgrade react-router-dom from 6.9.0 to 6.14.1 (#2746) Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Bump semver from 6.3.0 to 6.3.1 in /internal/lookout/ui (#2661) Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1. - [Release notes](https://github.com/npm/node-semver/releases) - [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md) - [Commits](npm/node-semver@v6.3.0...v6.3.1) --- updated-dependencies: - dependency-name: semver dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Run CodeQL once daily on a schedule (#2918) * Helm chart update: executor (#2917) * Helm chart update: executor At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass. * Patch/dependencies (#2923) * Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7 Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7. - [Release notes](https://github.com/go-openapi/strfmt/releases) - [Commits](go-openapi/strfmt@v0.21.3...v0.21.7) --- updated-dependencies: - dependency-name: github.com/go-openapi/strfmt dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0 Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0. 
- [Release notes](https://github.com/go-openapi/runtime/releases) - [Commits](go-openapi/runtime@v0.24.2...v0.26.0) --- updated-dependencies: - dependency-name: github.com/go-openapi/runtime dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0 Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0. - [Release notes](https://github.com/goreleaser/nfpm/releases) - [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml) - [Commits](goreleaser/nfpm@v2.25.1...v2.29.0) --- updated-dependencies: - dependency-name: github.com/goreleaser/nfpm/v2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1 Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1. - [Release notes](https://github.com/go-playground/validator/releases) - [Commits](go-playground/validator@v10.11.1...v10.14.1) --- updated-dependencies: - dependency-name: github.com/go-playground/validator/v10 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0. - [Release notes](https://github.com/grpc/grpc-dotnet/releases) - [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md) - [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0) --- updated-dependencies: - dependency-name: Grpc.Net.Client dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * fix: upgrade @mui/material from 5.10.17 to 5.13.6 Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade prettier from 2.7.1 to 2.8.8 Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3 Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0 Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0 Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * Trying to update klog * go mod fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> * Fix bug causing GetJobSetEvents to get stuck (#2903) * Add error message of final job run to JobFailedMessage When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of "Job has been retried too many times, giving up" Now we include the final run error in that message - to make it easier to work out the cause of retries * Fix bug causing GetJobSetEvents to get stuck GetJobSetEvents only increments its fromId variable on sending new messages However now all redis events produce api events that will be sent downstream The issue here is if we get 500 redis events in a row that don't produce api events, then the fromId never gets updated - Meaning the watching gets stuck here To fix this, ReadEvents now returns a lastMessageId. So if there are no messages to process, the fromId should be updated using the lastMessageId * Formatting * Bump @adobe/css-tools from 4.0.1 to 4.3.1 in /internal/lookout/ui (#2931) Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1. - [Changelog](https://github.com/adobe/css-tools/blob/main/History.md) - [Commits](https://github.com/adobe/css-tools/commits) --- updated-dependencies: - dependency-name: "@adobe/css-tools" dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Improved etcd protection (#2925) * Initial commit * Delete unused code * Export metrics collection delay metrics * Add mutex to InMemoryJobRepository * Add tests * Lint * Update internal/executor/configuration/types.go * Lint --------- Co-authored-by: JamesMurkin <[email protected]> * Stop executor requesting more jobs when it still has leased jobs (#2932) * Stop executor requesting more jobs when it still has leased jobs Currently we "queue" jobs to be submitted on the executor - which sit the leased state until they are submitted to kubernetes However this causes 2 issues with our current setup: - It prevents back-pressure from working well on the scheduler side. As it sees all these "Leased" jobs as active, so just keep scheduling more - In the case we are slowing submission due to etcd going over its limit. We "queue" lots of jobs, and as soon as etcd goes under its limit we hit it with potentially thousands of jobs This flow needs further work and thought - however for now this is the minimal fix to prevent bad behaviour Signed-off-by: JamesMurkin <[email protected]> * WIP Signed-off-by: JamesMurkin <[email protected]> * Fix scheduler side tests Signed-off-by: JamesMurkin <[email protected]> * Implement number of requested jobs on executor side Signed-off-by: JamesMurkin <[email protected]> * Remove unused config Signed-off-by: JamesMurkin <[email protected]> * Fixing panic on startup when etcd health monitor not registered Signed-off-by: JamesMurkin <[email protected]> * Enhance logging Signed-off-by: JamesMurkin <[email protected]> * Set more sensible default for maxLeasedJobs Signed-off-by: JamesMurkin <[email protected]> --------- Signed-off-by: JamesMurkin <[email protected]> * Fix race in etcd protections (#2937) * Initial commit * Fix MultiHealthMonitor race * Fix etcd health metric naming conflict (#2939) * Fix metric 
naming conflict * Fix metric names * Fix metrix prefix * Fix label * Bump golang.org/x/sync from 0.1.0 to 0.3.0 (#2946) Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.1.0 to 0.3.0. - [Commits](golang/sync@v0.1.0...v0.3.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add more scheduler metrics (#2906) * Add jobs considered and refactor to counters * Add fair share metrics * Add reset for gauge metrics * format * cycle imports * modify cycle return struct * verbose logging --------- Co-authored-by: Albin Severinson <[email protected]> * Update config.yaml (#2953) * Remove gang job cardinality submit check. Add placeholder for min gang size * Add msumner91 and mustafai to magic list of trusted people (#2956) * Add msumner91 to magic list of trusted people * Update .mergify.yml * Airflow: always set credentials from args in channel ctor (#2952) In the GrpcChannelArguments constructor, always set the credentials_callback_args member from what is given. Add a test to verify serialization round-tripping is complete, and a __eq__ implementation for GrpcChannelArguments. 
Signed-off-by: Rich Scott <[email protected]> * Removed Makefile from repo (#2915) Co-authored-by: Mohamed Abdelfatah <[email protected]> * Add per-queue scheduling rate-limiting (#2938) * Initial commit * Add rate limiters * go mod tidy * Updates * Add tests * Update default config * Update default scheduler config * Whitespace * Cleanup * Docstring improvements * Remove limiter nil checks * Add Cardinality() function on gctx * Fix test * Fix test * Add note about signed commits to Contributor documentation (#2960) * Add note about signed commits to Contributor documentation Signed-off-by: Aviral Singh <[email protected]> * Add note about signed commits to Contributor documentation --------- Signed-off-by: Aviral Singh <[email protected]> * ArmadaContext that includes a logger (#2934) * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * compilation! * rename package * more compilation * rename to Context * embed * compilation * compilation * fix test * remove old ctxloggers * revert design doc * revert developer doc * formatting * wip * tests * don't gen * don't gen * merged master --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Albin Severinson <[email protected]> * Bump armada airflow operator to version 0.5.4 (#2961) * Bump armada airflow operator to version 0.5.4 Signed-off-by: Rich Scott <[email protected]> * Regenerate Airflow Operator Markdown doc. Signed-off-by: Rich Scott <[email protected]> * Fix regenerated Airflow doc error. Signed-off-by: Rich Scott <[email protected]> * Pin versions of all modules, especially around docs generation. 
Signed-off-by: Rich Scott <[email protected]> * Regenerate Airflow docs using Python 3.10 Signed-off-by: Rich Scott <[email protected]> --------- Signed-off-by: Rich Scott <[email protected]> * Simulator Changes Made a number of changes to the simulator and simulator tests, most notably: - Fixed implementation of minSubmitTime setting for workload specifications - Added tests for SchedulingConfigsFromPattern, ClusterSpecsFromPattern, WorkloadFromPattern - Added sample workloads, clusters and scheduling configs - Added tests which simulate per-pool and per-executorGroup scheduling - Implemented further metrics for use in simulator tests, such as a cluster's aggregate resources, number of preemptions and schedules for a given test run - Added optimisation to speed up simulator, whereby the scheduler skips the current schedule event if no eventSequences have been received since the previous schedule. * Simplified TestClusterSpecsFromPattern and TestWorkloadFromPattern tests * Removed unused test * Fixed malformed yaml * Improved metrics for simulations. Improved simulator tests with errorgroups. 
* Removed all simulator test data except basic data necessary for testing * Implementing CLI Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> Signed-off-by: Aviral Singh <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Pradeep Kurapati <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: Shivang Shandilya <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mark Sumner <[email protected]> Co-authored-by: Rich Scott <[email protected]> Co-authored-by: MeenuyD <[email protected]> Co-authored-by: Aviral Singh <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> * Adding verbose flag to simulator CLI, changing logging context in simulator * Improved simulator CLI output, removed redundant features, implemented parallel simulations by addressing mutability of structures inputted into the simulator * Removed unknown logging library * Changing threadSafeLogger Info call to Print. 
Adding separation back between simulation results * Implemented stochastic runtime for jobs using a shifted exponential distribution (#13) * Implemented stochastic runtime for jobs using a shifted exponential distribution * Implemented min submit time from dependency completion (#14) Co-authored-by: Mustafa Ilyas <[email protected]> * Fixed tests * Fixed implementation of shifted exponential distribution * Using FP unrounded parameters to sample from distribution * Modified stochastic runtime definition * Adding logging to simulator Co-authored-by: Mustafa Ilyas <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> Signed-off-by: Aviral Singh <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Pradeep Kurapati <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: Shivang Shandilya <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mark Sumner <[email protected]> Co-authored-by: Rich Scott <[email protected]> Co-authored-by: MeenuyD 
<[email protected]> Co-authored-by: Aviral Singh <[email protected]> * Add missing brace * Lint * Lint * Lint * Cleanup * Testsuite improvements * Lint * Tidying --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> Signed-off-by: Aviral Singh <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Albin Severinson <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Mustafa Ilyas <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Pradeep Kurapati <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: Shivang Shandilya <[email protected]> Co-authored-by: Kevin Hannon <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Mark Sumner <[email protected]> Co-authored-by: Rich Scott <[email protected]> Co-authored-by: MeenuyD <[email protected]> Co-authored-by: Aviral Singh <[email protected]>
* Enables airflow operator level retry. (#2894) * Update docker stuff for latest airflow 2.7.0 * Use AirflowException instead of AirflowFailException to allow for retries Signed-off-by: Rich Scott <[email protected]> * Remove codecov workflows (#2902) Signed-off-by: Rich Scott <[email protected]> * Upgrade Pulsar Client to v0.11 (#2896) * update * update pulsar client * Fix bug causing server spinning * Abstract out the retry until success logic for testing (#2901) * Respond to review --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Sync quickstart/index.md with gh-pages/quickstart.md (#2891) Signed-off-by: Rich Scott <[email protected]> * Log Call Site (#2909) * allow logger to report caller * allow logger to report caller * lint --------- Co-authored-by: Chris Martin <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Add cleaner test output for mage with os/exec.Command (#2907) Signed-off-by: Rich Scott <[email protected]> * feat: Update Semver from version 6.3.0 to 6.3.1 (#2686) Co-authored-by: Adam McArthur <[email protected]> Signed-off-by: Rich Scott <[email protected]> * fix: upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0 (#2743) Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * fix: upgrade @types/react from 16.14.32 to 16.14.43 (#2747) Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Bump github.com/go-openapi/jsonreference from 0.20.0 to 0.20.2 (#2316) Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2. - [Release notes](https://github.com/go-openapi/jsonreference/releases) - [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2) --- updated-dependencies: - dependency-name: github.com/go-openapi/jsonreference dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Order leased jobs by serial (#2912) This will ensure the job leased first, gets send to the cluster first Currently we just order by postgres default sorting - which often picks the most recently leased - causing the first lease jobs to get stuck - This only occurs when scheduling is faster than leasing Signed-off-by: Rich Scott <[email protected]> * Bump webpack from 5.75.0 to 5.77.0 in /internal/lookout/ui (#2302) Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0. - [Release notes](https://github.com/webpack/webpack/releases) - [Commits](webpack/webpack@v5.75.0...v5.77.0) --- updated-dependencies: - dependency-name: webpack dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Bump word-wrap from 1.2.3 to 1.2.5 in /internal/lookout/ui (#2806) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5. - [Release notes](https://github.com/jonschlinkert/word-wrap/releases) - [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5) --- updated-dependencies: - dependency-name: word-wrap dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * resolve flaky (#2914) Co-authored-by: Adam McArthur <[email protected]> Signed-off-by: Rich Scott <[email protected]> * fix: upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0 (#2744) Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * fix: upgrade react-router-dom from 6.9.0 to 6.14.1 (#2746) Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Bump semver from 6.3.0 to 6.3.1 in /internal/lookout/ui (#2661) Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1. - [Release notes](https://github.com/npm/node-semver/releases) - [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md) - [Commits](npm/node-semver@v6.3.0...v6.3.1) --- updated-dependencies: - dependency-name: semver dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Run CodeQL once daily on a schedule (#2918) Signed-off-by: Rich Scott <[email protected]> * Helm chart update: executor (#2917) * Helm chart update: executor At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass. Signed-off-by: Rich Scott <[email protected]> * Patch/dependencies (#2923) * Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7 Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7. - [Release notes](https://github.com/go-openapi/strfmt/releases) - [Commits](go-openapi/strfmt@v0.21.3...v0.21.7) --- updated-dependencies: - dependency-name: github.com/go-openapi/strfmt dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0 Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0. - [Release notes](https://github.com/go-openapi/runtime/releases) - [Commits](go-openapi/runtime@v0.24.2...v0.26.0) --- updated-dependencies: - dependency-name: github.com/go-openapi/runtime dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0 Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0. - [Release notes](https://github.com/goreleaser/nfpm/releases) - [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml) - [Commits](goreleaser/nfpm@v2.25.1...v2.29.0) --- updated-dependencies: - dependency-name: github.com/goreleaser/nfpm/v2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1 Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1. - [Release notes](https://github.com/go-playground/validator/releases) - [Commits](go-playground/validator@v10.11.1...v10.14.1) --- updated-dependencies: - dependency-name: github.com/go-playground/validator/v10 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0. 
- [Release notes](https://github.com/grpc/grpc-dotnet/releases) - [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md) - [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0) --- updated-dependencies: - dependency-name: Grpc.Net.Client dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * fix: upgrade @mui/material from 5.10.17 to 5.13.6 Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade prettier from 2.7.1 to 2.8.8 Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3 Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0 Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0 Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0. 
See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * Trying to update klog * go mod fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Fix bug causing GetJobSetEvents to get stuck (#2903) * Add error message of final job run to JobFailedMessage When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of "Job has been retried too many times, giving up" Now we include the final run error in that message - to make it easier to work out the cause of retries * Fix bug causing GetJobSetEvents to get stuck GetJobSetEvents only increments its fromId variable on sending new messages However now all redis events produce api events that will be sent downstream The issue here is if we get 500 redis events in a row that don't produce api events, then the fromId never gets updated - Meaning the watching gets stuck here To fix this, ReadEvents now returns a lastMessageId. So if there are no messages to process, the fromId should be updated using the lastMessageId * Formatting Signed-off-by: Rich Scott <[email protected]> * Bump @adobe/css-tools from 4.0.1 to 4.3.1 in /internal/lookout/ui (#2931) Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1. - [Changelog](https://github.com/adobe/css-tools/blob/main/History.md) - [Commits](https://github.com/adobe/css-tools/commits) --- updated-dependencies: - dependency-name: "@adobe/css-tools" dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Rich Scott <[email protected]> * Update CreateJobs Signed-off-by: Rich Scott <[email protected]> * Update CreateJobs to return a SubmitResponse with error details Signed-off-by: Rich Scott <[email protected]> * Update createJobs sub functions to check jobs individually * Update function usage to count errored jobs Signed-off-by: Rich Scott <[email protected]> * Fix grammar Signed-off-by: Rich Scott <[email protected]> * Added updated test cases Signed-off-by: Rich Scott <[email protected]> * Update gang job testing Signed-off-by: Rich Scott <[email protected]> * Merge branch 'master' into feat/create_job_error Signed-off-by: Rich Scott <[email protected]> * Lint fix Signed-off-by: Rich Scott <[email protected]> * Rework gRPC to send JobSubmitResponse over status.details * Add better nil checking Signed-off-by: Rich Scott <[email protected]> * Typo == instead of != Signed-off-by: Rich Scott <[email protected]> * Wrap gRPC SubmitJob function Signed-off-by: Rich Scott <[email protected]> * Create new client function instead of sharing Signed-off-by: Rich Scott <[email protected]> * Change import order Signed-off-by: Rich Scott <[email protected]> * Add a space between imports Signed-off-by: Rich Scott <[email protected]> * Avoid nil pointer deference Signed-off-by: Rich Scott <[email protected]> * Improved etcd protection (#2925) * Initial commit * Delete unused code * Export metrics collection delay metrics * Add mutex to InMemoryJobRepository * Add tests * Lint * Update internal/executor/configuration/types.go * Lint --------- Co-authored-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Stop executor requesting more jobs when it still has leased jobs (#2932) * Stop executor requesting more jobs when it still has leased jobs Currently we "queue" jobs to be submitted on the executor - 
which sit the leased state until they are submitted to kubernetes However this causes 2 issues with our current setup: - It prevents back-pressure from working well on the scheduler side. As it sees all these "Leased" jobs as active, so just keep scheduling more - In the case we are slowing submission due to etcd going over its limit. We "queue" lots of jobs, and as soon as etcd goes under its limit we hit it with potentially thousands of jobs This flow needs further work and thought - however for now this is the minimal fix to prevent bad behaviour Signed-off-by: JamesMurkin <[email protected]> * WIP Signed-off-by: JamesMurkin <[email protected]> * Fix scheduler side tests Signed-off-by: JamesMurkin <[email protected]> * Implement number of requested jobs on executor side Signed-off-by: JamesMurkin <[email protected]> * Remove unused config Signed-off-by: JamesMurkin <[email protected]> * Fixing panic on startup when etcd health monitor not registered Signed-off-by: JamesMurkin <[email protected]> * Enhance logging Signed-off-by: JamesMurkin <[email protected]> * Set more sensible default for maxLeasedJobs Signed-off-by: JamesMurkin <[email protected]> --------- Signed-off-by: JamesMurkin <[email protected]> Signed-off-by: Rich Scott <[email protected]> * Fix race in etcd protections (#2937) * Initial commit * Fix MultiHealthMonitor race Signed-off-by: Rich Scott <[email protected]> * Fix etcd health metric naming conflict (#2939) * Fix metric naming conflict * Fix metric names * Fix metrix prefix * Fix label Signed-off-by: Rich Scott <[email protected]> * lint fix Signed-off-by: Rich Scott <[email protected]> * Return clearer errors for multiple-jobs validation. Signed-off-by: Rich Scott <[email protected]> * Return more detailed submission/validation errors. Generate and return more detailed submission and/or validation errors. 
If there are numerous jobs with errors, just give the number of failed jobs (and the total number originally submitted), and truncate the list of failed jobs errors to just the first 5 (this is defined in a single constant variable, if neededed to change later). Signed-off-by: Rich Scott <[email protected]> --------- Signed-off-by: Rich Scott <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: JamesMurkin <[email protected]> Co-authored-by: Clif Houck <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Co-authored-by: Kanu Mike Chibundu <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Dave Gantenbein <[email protected]> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: JamesMurkin <[email protected]> Co-authored-by: Sarthak Negi <[email protected]> Co-authored-by: owenthomas17 <[email protected]> Co-authored-by: Raajheer1 <[email protected]> Co-authored-by: Raaj Patel <[email protected]> Co-authored-by: Albin Severinson <[email protected]>
quickstart.md has been renamed to quickstart/index.md on master. I am making sure that quickstart/index.md is copied to gh-pages/quickstart, so we only need to push updates to master and the workflow will do the rest.