Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example for Reclient. #286

Conversation

chrisstaite-menlo
Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo commented Sep 15, 2023

This change is Reviewable

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @chrisstaite-menlo)


deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r1 (raw file):

services:
  turbo_cache_local_cas:
    image: allada/turbo-cache:latest

nit: can we use "extends" similar to how it's done here?
https://github.com/TraceMachina/turbo-cache/pull/267/files#diff-419ec2b388f508c75e39af538e03dd9959467d6bea1bf45b8fcda943687d27b3R19


deployment-examples/docker-compose-reclient/docker-compose.yml line 67 at r1 (raw file):

        target: /var/run/docker.sock
    environment:
      RUST_LOG: info

fyi: this will certainly slow everything down due to the amount of logs being generated. I would suggest warn.

It might be worth for me to spend time evaluating what is "info" and what is "debug", right now they are mixed.


deployment-examples/docker-compose-reclient/README.md line 11 at r1 (raw file):

## Roles

There a number of roles required to perform a Reclient set up:

nit: There are


deployment-examples/docker-compose-reclient/README.md line 14 at r1 (raw file):

 - Turbo Cache Scheduler - The Turbo Cache instance that communicates with the
   Goma proxy and handles scheduling of tasks to workers within the Turbo Cache

Doesn't REclient replace goma?


deployment-examples/docker-compose-reclient/README.md line 39 at r1 (raw file):

# Required because the default is projects/rbe-chrome-untrusted/instances/default_instance
# and we do not support instance names with / in currently.
export RBE_instance=main

nit: I'd remove this as it's about to be fixed.


deployment-examples/docker-compose-reclient/README.md line 52 at r1 (raw file):

Now that the environment is set up you simply need to enable it for your
Chromium build by modifying your `args.gn` file to contain:

note: You can also use:

gn gen --args="use_goma=true rbe_cfg_dir=\"../../buildtools/reclient_cfgs/linux\"" out/Default

deployment-examples/docker-compose-reclient/run_in_container.sh line 13 at r1 (raw file):

    WORK_DIRECTORY=$(pwd | cut -c25-94)
    WORKING_DIRECTORY=$(pwd | cut -c95-)
    exec docker run --rm -w /work${WORKING_DIRECTORY} -v ${HOST_ROOT}${WORK_DIRECTORY}:/work $(echo "$CONTAINER" | cut -c10-) /bin/env "$@"

oh the memories.

It turns out this has a high chance of causing spurious errors (from experience). We got around these random docker failures by running a script in docker that would collect the exit code and write it to a file then outside docker read the file. If the file was not written too, it'd kill/restart the worker and inform the scheduler to disconnect all future work (because it means docker, kernel or other system failure if no exit code).

no action, but FYI.


deployment-examples/docker-compose-reclient/run_in_container.sh line 13 at r1 (raw file):

    WORK_DIRECTORY=$(pwd | cut -c25-94)
    WORKING_DIRECTORY=$(pwd | cut -c95-)
    exec docker run --rm -w /work${WORKING_DIRECTORY} -v ${HOST_ROOT}${WORK_DIRECTORY}:/work $(echo "$CONTAINER" | cut -c10-) /bin/env "$@"

nit: Shouldn't it be cut -c9-?


deployment-examples/docker-compose-reclient/worker.json line 55 at r1 (raw file):

        },
        // Need to specify a placeholder here otherwise Priority scheduling does
        // not work.

hmmm, is this a bug? Shouldn't we just be able to pass an empty array?

If not can we file a ticket and reference it?


deployment-examples/docker-compose-reclient/worker_precondition_script.sh line 3 at r1 (raw file):

#!/bin/sh
set -eu

nit: Can we add a comment here stating when and how this file runs?


deployment-examples/docker-compose-reclient/worker_precondition_script.sh line 5 at r1 (raw file):

AVAILABLE_MEMORY_KB=$(awk '/MemAvailable/ { printf "%d \n", $2 }' /proc/meminfo);
# At least 2Gb of RAM currently available

nit: period.

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @allada)


deployment-examples/docker-compose-reclient/docker-compose.yml line 67 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

fyi: this will certainly slow everything down due to the amount of logs being generated. I would suggest warn.

It might be worth for me to spend time evaluating what is "info" and what is "debug", right now they are mixed.

Yeah, this is a lay-over from debugging that I forgot to remove.


deployment-examples/docker-compose-reclient/README.md line 14 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Doesn't REclient replace goma?

Can you tell this was copy-amended?


deployment-examples/docker-compose-reclient/README.md line 39 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: I'd remove this as it's about to be fixed.

It was really just a reminder to me to remove it once it is fixed.


deployment-examples/docker-compose-reclient/run_in_container.sh line 13 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

oh the memories.

It turns out this has a high chance of causing spurious errors (from experience). We got around these random docker failures by running a script in docker that would collect the exit code and write it to a file then outside docker read the file. If the file was not written too, it'd kill/restart the worker and inform the scheduler to disconnect all future work (because it means docker, kernel or other system failure if no exit code).

no action, but FYI.

Yeah, I've been there too... I hope this doesn't come back to bite me


deployment-examples/docker-compose-reclient/run_in_container.sh line 13 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Shouldn't it be cut -c9-?

I don't think so, it appears to be working fine for me.


deployment-examples/docker-compose-reclient/worker.json line 55 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

hmmm, is this a bug? Shouldn't we just be able to pass an empty array?

If not can we file a ticket and reference it?

Yeah, I wasn't sure to be honest how it was supposed to work.

@allada
Copy link
Member

allada commented Sep 26, 2023

Did you forget to push the changes? It looks nearly ready to land.

@chrisstaite-menlo
Copy link
Collaborator Author

Mostly I've been busy sorting other stuff to get this over the line. But also, there's a few glitches with set-up that I wanted to iron out so the README just worked out of the box that I haven't had time to finalise yet.

@chrisstaite-menlo chrisstaite-menlo force-pushed the reclient_example branch 2 times, most recently from 47b96f3 to 980fd05 Compare October 11, 2023 13:56
Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the README such that it actually works now.

Reviewable status: 5 of 7 files reviewed, 4 unresolved discussions (waiting on @allada)


deployment-examples/docker-compose-reclient/README.md line 52 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

note: You can also use:

gn gen --args="use_goma=true rbe_cfg_dir=\"../../buildtools/reclient_cfgs/linux\"" out/Default

Done.

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready now, but I've not tested it in-place. I'd like someone separately to run through the steps in the README and validate it actually works. Could you do that @allada ?

Reviewable status: 3 of 7 files reviewed, 3 unresolved discussions (waiting on @allada)

@allada
Copy link
Member

allada commented Oct 12, 2023

Is it ok to defer this and have @blakehatch or @aaronmondal do it?

They are going to be ramping up over the next week and this would even be better to have completely fresh eyes on it.

@chrisstaite-menlo
Copy link
Collaborator Author

Apparently extends support doesn't exist in docker-compose anymore?? docker/compose#4315

@chrisstaite-menlo chrisstaite-menlo force-pushed the reclient_example branch 2 times, most recently from 90dc9c1 to 3252364 Compare October 19, 2023 12:48
Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r3, 1 of 2 files at r4.
Reviewable status: 5 of 7 files reviewed, all discussions resolved (waiting on @chrisstaite-menlo)

@chrisstaite-menlo chrisstaite-menlo force-pushed the reclient_example branch 2 times, most recently from 67f3869 to 790d785 Compare November 15, 2023 11:24
Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 7 files at r1, 2 of 4 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @chrisstaite-menlo)

@chrisstaite-menlo
Copy link
Collaborator Author

Little bump on this.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about +@adam-singer why don't you take a look at this first, then I'll do one final review. Hopefully this will help you get some knowledge on deployment examples?

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @adam-singer)

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @chrisstaite-menlo)


deployment-examples/docker-compose-reclient/docker-compose.yml line 1 at r4 (raw file):

# Copyright 2023 The Turbo Cache Authors. All rights reserved.

Mentioned a few times below, the project name has recently changed, could we apply those changes to this patch "Turbo Cache" -> "Native Link"


deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r4 (raw file):

services:
  turbo_cache_local_cas:
    image: allada/turbo-cache:latest

@allada do we have a different/canonical location for this image?


deployment-examples/docker-compose-reclient/README.md line 13 at r4 (raw file):

There are a number of roles required to perform a Reclient set up:

 - Turbo Cache Scheduler - The Turbo Cache instance that handles all of the

The project has been renamed to "Native Link", lets word towards that


deployment-examples/docker-compose-reclient/README.md line 49 at r4 (raw file):

fail_early_min_fallback_ratio=0.5
deps_cache_max_mb=256
# TODO(b/276727504) Re-enable once noop build shutdown time bug is fixed

This looks like an internal bug link, can we remove or specific a bit more what the bug is related to?


deployment-examples/docker-compose-reclient/run_in_container.sh line 15 at r4 (raw file):

    # on the host and we need to account for that.  This strips off the
    # /root/.cache/turbo-cache and that is populated with $HOST_ROOT from the
    # worker.json which is then populated by ${TURBO_CACHE_DIR} from the

TURBO_CACHE_DIR -> NATIVE_LINK_DIR


deployment-examples/docker-compose-reclient/run_in_container.sh line 17 at r4 (raw file):

    # worker.json which is then populated by ${TURBO_CACHE_DIR} from the
    # docker-compose.yml.
    WORK_DIRECTORY=$(pwd | cut -c25-94)

for the pipes to cut what kind of assumptions are those character ranges making? It is unclear what they might be


deployment-examples/docker-compose-reclient/worker.json line 13 at r4 (raw file):

      "grpc": {
        "instance_name": "main",
        "endpoints": ["grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"],

Both environment variables are CAS_ENDPOINT, is there a situation where we would have a difference between these endpoints and their logical hosts?


deployment-examples/docker-compose-reclient/worker.json line 21 at r4 (raw file):

        "fast": {
          "filesystem": {
            "content_path": "/root/.cache/turbo-cache/data-worker-test/content_path-cas",

turbo-cache -> native-link, same below


deployment-examples/docker-compose-reclient/worker.json line 45 at r4 (raw file):

      "additional_environment": {
        "CONTAINER": {"Property": "container-image"},
        "HOST_ROOT": {"Value": "${TURBO_CACHE_DIR}"},

TURBO_CACHE_DIR -> NATIVE_LINK_DIR

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @adam-singer and @chrisstaite-menlo)


deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

@allada do we have a different/canonical location for this image?

Yeah this is now TraceMachina/native-link

@chrisstaite-menlo
Copy link
Collaborator Author

deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r4 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Yeah this is now TraceMachina/native-link

FYI: This isn't updated in docker-compose/docker-compose.yml.

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/docker-compose.yml line 1 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Mentioned a few times below, the project name has recently changed, could we apply those changes to this patch "Turbo Cache" -> "Native Link"

Done.


deployment-examples/docker-compose-reclient/README.md line 13 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

The project has been renamed to "Native Link", lets word towards that

Done.


deployment-examples/docker-compose-reclient/README.md line 49 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

This looks like an internal bug link, can we remove or specific a bit more what the bug is related to?

This is a Chromium bug link, those that work on Chromium will understand and there's no reason to be using this if you don't work on Chromium.


deployment-examples/docker-compose-reclient/run_in_container.sh line 15 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

TURBO_CACHE_DIR -> NATIVE_LINK_DIR

Done.


deployment-examples/docker-compose-reclient/run_in_container.sh line 17 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

for the pipes to cut what kind of assumptions are those character ranges making? It is unclear what they might be

There's a whole comment explaining that above.


deployment-examples/docker-compose-reclient/worker.json line 13 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Both environment variables are CAS_ENDPOINT, is there a situation where we would have a difference between these endpoints and their logical hosts?

Yes, but this is simply an example deployment, one wouldn't usually run the worker on the same node as the scheduler, build and CAS.


deployment-examples/docker-compose-reclient/worker.json line 21 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

turbo-cache -> native-link, same below

Done.


deployment-examples/docker-compose-reclient/worker.json line 45 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

TURBO_CACHE_DIR -> NATIVE_LINK_DIR

Done.

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @allada and @chrisstaite-menlo)


deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r4 (raw file):

Previously, chrisstaite-menlo (Chris Staite) wrote…

FYI: This isn't updated in docker-compose/docker-compose.yml.

@chrisstaite-menlo looks like we will have a few more days of breaking changes, do you mind if we leave this open and revisit it next week for landing. At that point we should have stabilized image and binary command name. Thank you for this and patients around breaking changes

@MarcusSorealheis
Copy link
Collaborator

@adam-singer and @allada I think we can look back into getting this working again.

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: publish-image, windows-2022 / beta (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/docker-compose.yml line 19 at r4 (raw file):

Previously, adam-singer (Adam Singer) wrote…

@chrisstaite-menlo looks like we will have a few more days of breaking changes, do you mind if we leave this open and revisit it next week for landing. At that point we should have stabilized image and binary command name. Thank you for this and patients around breaking changes

This image is now now be ghcr.io/tracemachina/nativelink:wm6l3qi72hv9phn44853ir1dxs3q7k65 instead of tracemachina/nativelink:latest. I think it's a drop-in replacement, but I'm not 100% sure. Might need some adjustments to mount paths. The new image already has the cas executable set as entrypoint, so the command to run should just be the path to the config file, i.e.

command: /root/local-storage-cas.json

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcusSorealheis agree.
@chrisstaite-menlo we are ready to review this, pointed out some breaking changes since 0.1.0 release, also will likely need to revert Cargo.lock file.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: publish-image, windows-2022 / beta


deployment-examples/docker-compose-reclient/scheduler.json line 7 at r6 (raw file):

        "instance_name": "main",
        "endpoints": ["grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"],
        "store_type": "CAS"

CAS -> cas


deployment-examples/docker-compose-reclient/scheduler.json line 14 at r6 (raw file):

        "instance_name": "main",
        "endpoints": ["grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"],
        "store_type": "AC"

AC -> ac


deployment-examples/docker-compose-reclient/scheduler.json line 28 at r6 (raw file):

          "simple": {
            "supported_platform_properties": {
              "cpu_count": "Minimum",

Minimum -> minimum


deployment-examples/docker-compose-reclient/scheduler.json line 29 at r6 (raw file):

            "supported_platform_properties": {
              "cpu_count": "Minimum",
              "container-image": "Priority"

Priority -> priority


deployment-examples/docker-compose-reclient/scheduler.json line 72 at r6 (raw file):

    }
  }, {
    "listen_address": "0.0.0.0:50061",

This has changed to the following format https://github.com/adam-singer/native-link/blob/d8239640a9fa26c932a4c234ee2d263837159388/nativelink-config/examples/basic_cas.json#L131

    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50061"
      }
    },

deployment-examples/docker-compose-reclient/scheduler.json line 81 at r6 (raw file):

        "scheduler": "MAIN_SCHEDULER",
      },
      "prometheus": {

prometheus -> experimental_prometheus

Copy link

vercel bot commented Jan 17, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
nativelink-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 16, 2024 11:16am

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, publish-image, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/scheduler.json line 7 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

CAS -> cas

Done.


deployment-examples/docker-compose-reclient/scheduler.json line 28 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Minimum -> minimum

Done.


deployment-examples/docker-compose-reclient/scheduler.json line 29 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Priority -> priority

Done.


deployment-examples/docker-compose-reclient/scheduler.json line 72 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

This has changed to the following format https://github.com/adam-singer/native-link/blob/d8239640a9fa26c932a4c234ee2d263837159388/nativelink-config/examples/basic_cas.json#L131

    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50061"
      }
    },

Done.


deployment-examples/docker-compose-reclient/scheduler.json line 81 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

prometheus -> experimental_prometheus

Done.

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, publish-image, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04 (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/scheduler.json line 14 at r6 (raw file):

Previously, adam-singer (Adam Singer) wrote…

AC -> ac

Done.

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7, 7 of 8 files at r8, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale

Copy link
Member

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisstaite-menlo was able to test this patch against latest main with the following adjustments to the worker.json https://gist.github.com/adam-singer/a7aa64abf6445eb5752f418a457ca25a#file-gistfile0-txt-L28 (breaking changes from previous release)

Followed instructions in the read me, but instead of creating / updating the reproxy.cfg used the following cli to forward all reclient parameters required

RBE_reclient_timeout=60m RBE_exec_timeout=4m RBE_instance=main RBE_service=127.0.0.1:50052 RBE_alsologtostderr=true RBE_service_no_security=true RBE_service_no_auth=true RBE_local_resource_fraction=0.00001 RBE_automatic_auth=false RBE_gcert_refresh_timeout=20 RBE_compression_threshold=-1 RBE_metrics_namespace=main RBE_platform= RBE_experimental_credentials_helper= RBE_experimental_credentials_helper_args= autoninja -v -j 1 -C out/Default unit_tests

Checked out the chromium codebase from a client machine using the following instructions https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md

Can you apply the changes and land this?

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks @chrisstaite-menlo, I am going to try this in a day or so and double check everything works.

We are trying to be a lot more diligent on documentation (due to feedback from users). This quarter we are planning on making compiling chrome our base case, meaning we are going to build a bit of CI and tests and such around this PR that we are going to use as a large integration & performance test.

Thanks again 😄

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale (waiting on @adam-singer)

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale (waiting on @adam-singer)

a discussion (no related file):
Note: I suggest reading all the comments before any actions are done.

Second: Could you rebase, there's some breaking changes.

Last: I did test it and although it was not easy to setup, I did get it to work. We are going to be doing a lot of work here over the next couple months so assume lots of changes (hopefully to make your life better too).

Thanks again, awesome as usual 😄



deployment-examples/docker-compose-reclient/docker-compose.yml line 1 at r8 (raw file):

# Copyright 2023 The Native Link Authors. All rights reserved.

nit: NativeLink


deployment-examples/docker-compose-reclient/README.md line 13 at r8 (raw file):

There are a number of roles required to perform a Reclient set up:

 - Native Link Scheduler - The Native Link instance that handles all of the

nit: NativeLink (and others in this file)


deployment-examples/docker-compose-reclient/README.md line 28 at r8 (raw file):

## Running

To start Native Link simply execute `docker-compose up` from this directory.

Lets add a section telling users to follow the instructions here on how to install and setup chrome:
https://chromium.googlesource.com/chromium/src/+/main/docs/linux/build_instructions.md

Lets also state that we will assume the user's checked out chrome to ~/chromium/src

This way when we talk about things that need to be in ~/chromium so we can explicitly mark it as well as assume we are building on linux.


deployment-examples/docker-compose-reclient/README.md line 32 at r8 (raw file):

port 50052.

Create the file buildtools/reclient_cfgs/reproxy.cfg with the following:

nit: Please mention this change needs to happen in chrome's checkout.


deployment-examples/docker-compose-reclient/README.md line 32 at r8 (raw file):

port 50052.

Create the file buildtools/reclient_cfgs/reproxy.cfg with the following:

It think we should have users modify reproxy.cfg.template instead? This way gclient sync or gclient runhooks does the work for us?


deployment-examples/docker-compose-reclient/README.md line 34 at r8 (raw file):

Create the file buildtools/reclient_cfgs/reproxy.cfg with the following:

instance=main

I suggest modifying template (as stated above), then:

instance=$rbe_instance
service=127.0.0.1:50052
log_format=reducedtext
$auth_flags
fail_early_min_action_count=4000
fail_early_min_fallback_ratio=0.5
deps_cache_max_mb=256
# TODO(b/276727504) Re-enable once noop build shutdown time bug is fixed
# enable_deps_cache=true
async_reproxy_termination=true
use_unified_uploads=true
fast_log_collection=true
depsscanner_address=$depsscanner_address

# Improve upload/download concurrency
max_concurrent_streams_per_conn=50
max_concurrent_requests_per_conn=50
min_grpc_connections=50
cas_concurrency=1000

# TODO(allada) Native Link doesn't yet support compressed blob upload
compression_threshold=-1
use_batches=false

# Metric metadata
metrics_namespace=$rbe_project

# Disables both TLS and authentication
service_no_security=true
# Required to stop autoninja from complaining about authentication despite being
# implied by service_no_security
service_no_auth=true

Then in the readme ask people to modify ~/chromium/.gclient with something like:

solutions = [
  {
    "name": "src",
    "url": "https://chromium.googlesource.com/chromium/src.git",
    "managed": False,
    "custom_deps": {},
    "custom_vars": {
      "rbe_instance": "main",
      "rbe_project": "main",
      "depsscanner_address": "exec:///home/allada/chromium/src/buildtools/reclient/scandeps_server",
    },
  },
]

I'm going to see about pushing an upstream chrome change to make remotebuildexecution.googleapis.com:443 a variable too. This would eventually streamline everything once we enable a few more features.


deployment-examples/docker-compose-reclient/README.md line 65 at r8 (raw file):

Now that the environment is set up you simply need to enable it for your
Chromium build by modifying your `args.gn` file to contain:

Maybe suggest calling gn args out/Default and pasting it in there instead?


deployment-examples/docker-compose-reclient/worker.json line 41 at r8 (raw file):

    }
  },
  "workers": [{

nit corrected for latest main changes:

  "workers": [{
    "local": {
      "worker_api_endpoint": {
        "uri": "grpc://${SCHEDULER_ENDPOINT:-127.0.0.1}:50061",
      },
      "entrypoint": "/root/run_in_container.sh",
      "additional_environment": {
        "CONTAINER": {
          "property": "container-image"
        },
        "HOST_ROOT": {
          "value": "${NATIVE_LINK_DIR}"
        }
      },
      "cas_fast_slow_store": "WORKER_FAST_SLOW_STORE",
      "upload_action_result": {
        "ac_store": "GRPC_LOCAL_AC_STORE"
      },
      "work_directory": "/root/.cache/nativelink/work",
      "platform_properties": {
        "OSFamily": {
          "values": ["Linux"]
        },
        "cpu_count": {
          "query_cmd": "nproc"
        },
        // Need to specify a placeholder here otherwise Priority scheduling does
        // not work.
        "container-image": {
          "values": ["placeholder"]
        }
      },
      "experimental_precondition_script": "/root/worker_precondition_script.sh"
    }
  }],

deployment-examples/docker-compose-reclient/worker.json line 48 at r8 (raw file):

      "entrypoint_cmd": "/root/run_in_container.sh",
      "additional_environment": {
        "CONTAINER": {"Property": "container-image"},

nit: Lets name this NL_CONTAINER_IMAGE for consistency.


deployment-examples/docker-compose-reclient/worker.json line 49 at r8 (raw file):

      "additional_environment": {
        "CONTAINER": {"Property": "container-image"},
        "HOST_ROOT": {"Value": "${NATIVE_LINK_DIR}"},

fyi: I do have a local patch to make this normalized, and will have a proper way to get the worker's root directory.


deployment-examples/docker-compose-reclient/worker.json line 49 at r8 (raw file):

      "additional_environment": {
        "CONTAINER": {"Property": "container-image"},
        "HOST_ROOT": {"Value": "${NATIVE_LINK_DIR}"},

nit: Lets name this NL_HOST_ROOT for consistency.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/README.md line 34 at r8 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I suggest modifying template (as stated above), then:

instance=$rbe_instance
service=127.0.0.1:50052
log_format=reducedtext
$auth_flags
fail_early_min_action_count=4000
fail_early_min_fallback_ratio=0.5
deps_cache_max_mb=256
# TODO(b/276727504) Re-enable once noop build shutdown time bug is fixed
# enable_deps_cache=true
async_reproxy_termination=true
use_unified_uploads=true
fast_log_collection=true
depsscanner_address=$depsscanner_address

# Improve upload/download concurrency
max_concurrent_streams_per_conn=50
max_concurrent_requests_per_conn=50
min_grpc_connections=50
cas_concurrency=1000

# TODO(allada) Native Link doesn't yet support compressed blob upload
compression_threshold=-1
use_batches=false

# Metric metadata
metrics_namespace=$rbe_project

# Disables both TLS and authentication
service_no_security=true
# Required to stop autoninja from complaining about authentication despite being
# implied by service_no_security
service_no_auth=true

Then in the readme ask people to modify ~/chromium/.gclient with something like:

solutions = [
  {
    "name": "src",
    "url": "https://chromium.googlesource.com/chromium/src.git",
    "managed": False,
    "custom_deps": {},
    "custom_vars": {
      "rbe_instance": "main",
      "rbe_project": "main",
      "depsscanner_address": "exec:///home/allada/chromium/src/buildtools/reclient/scandeps_server",
    },
  },
]

I'm going to see about pushing an upstream chrome change to make remotebuildexecution.googleapis.com:443 a variable too. This would eventually streamline everything once we enable a few more features.

corrections:

instance=$rbe_instance
service=127.0.0.1:50051
log_format=reducedtext
# auth_flags # NativeLink chrome demo does not use credentails.
fail_early_min_action_count=4000
fail_early_min_fallback_ratio=0.5
deps_cache_max_mb=256
# TODO(b/276727504) Re-enable once noop build shutdown time bug is fixed
# enable_deps_cache=true
async_reproxy_termination=true
use_unified_uploads=true
fast_log_collection=true
depsscanner_address=$depsscanner_address

# Improve upload/download concurrency
max_concurrent_streams_per_conn=50
max_concurrent_requests_per_conn=50
min_grpc_connections=50
cas_concurrency=1000

# TODO(allada) Native Link doesn't yet support compressed blob upload
compression_threshold=-1
use_batches=false

# Metric metadata
metrics_namespace=$rbe_project

# Disables both TLS and authentication
service_no_security=true
# Required to stop autoninja from complaining about authentication despite being
# implied by service_no_security
service_no_auth=true

And

solutions = [
  {
    "name": "src",
    "url": "https://chromium.googlesource.com/chromium/src.git",
    "managed": False,
    "custom_deps": {},
    "custom_vars": {
      "rbe_instance": "main",
      "rbe_project": "main",
      "depsscanner_address": "exec://%s/buildtools/reclient/scandeps_server",
    },
  },
]

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale (waiting on @adam-singer)


deployment-examples/docker-compose-reclient/scheduler.json line 23 at r8 (raw file):

  },
  "schedulers": {
    "MAIN_SCHEDULER": {

suggest:

    "MAIN_SCHEDULER": {
      "property_modifier": {
        "modifications": [
          {"add": {"name": "cpu_count", "value": "1"}}
        ],
        "scheduler": {
          "simple": {
            "supported_platform_properties": {
              "cpu_count": "minimum",
              "OSFamily": "exact",
              "InputRootAbsolutePath": "priority",
              "container-image": "priority",
              "label:action_default": "priority",
              "label:action_large": "priority"
            }
          }
        }
      }
    }

deployment-examples/docker-compose-reclient/worker.json line 41 at r8 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit corrected for latest main changes:

  "workers": [{
    "local": {
      "worker_api_endpoint": {
        "uri": "grpc://${SCHEDULER_ENDPOINT:-127.0.0.1}:50061",
      },
      "entrypoint": "/root/run_in_container.sh",
      "additional_environment": {
        "CONTAINER": {
          "property": "container-image"
        },
        "HOST_ROOT": {
          "value": "${NATIVE_LINK_DIR}"
        }
      },
      "cas_fast_slow_store": "WORKER_FAST_SLOW_STORE",
      "upload_action_result": {
        "ac_store": "GRPC_LOCAL_AC_STORE"
      },
      "work_directory": "/root/.cache/nativelink/work",
      "platform_properties": {
        "OSFamily": {
          "values": ["Linux"]
        },
        "cpu_count": {
          "query_cmd": "nproc"
        },
        // Need to specify a placeholder here otherwise Priority scheduling does
        // not work.
        "container-image": {
          "values": ["placeholder"]
        }
      },
      "experimental_precondition_script": "/root/worker_precondition_script.sh"
    }
  }],

Corrected again for what I think is better:

  "workers": [{
    "local": {
      "worker_api_endpoint": {
        "uri": "grpc://${SCHEDULER_ENDPOINT:-127.0.0.1}:50061",
      },
      "entrypoint": "/root/run_in_container.sh",
      "additional_environment": {
        "CONTAINER": {
          "property": "container-image"
        },
        "HOST_ROOT": {
          "value": "${NATIVE_LINK_DIR}"
        }
      },
      "cas_fast_slow_store": "WORKER_FAST_SLOW_STORE",
      "upload_action_result": {
        "ac_store": "GRPC_LOCAL_AC_STORE"
      },
      "work_directory": "/root/.cache/nativelink/work",
      "platform_properties": {
        "OSFamily": {
          "values": ["Linux"]
        },
        "cpu_count": {
          "query_cmd": "nproc"
        },
        // Need to specify a placeholder here otherwise Priority scheduling does
        // not work.
        "container-image": {
          "values": ["placeholder"]
        },
        "InputRootAbsolutePath": {
          "values": ["placeholder"]
        },
        "label:action_default": {
          "values": ["placeholder"]
        },
        "label:action_large": {
          "values": ["placeholder"]
        }
      },
      "experimental_precondition_script": "/root/worker_precondition_script.sh"
    }
  }],

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r8, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks, vale (waiting on @adam-singer)

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@aaronmondal
Copy link
Member

This information is now in the chromium example https://github.com/TraceMachina/nativelink/tree/main/deployment-examples/chromium

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants