Ray master #3

Open: wants to merge 215 commits into master

Commits on Oct 1, 2018

  1. [rllib] Simplify sample batch size and num envs config, n_step adjustment (ray-project#2995)
    
    * simplify vec batch requirements
    
    * Update rllib-training.rst
    
    * Update rllib-training.rst
    
    * Update rllib-training.rst
    
    * Update rllib-training.rst
    
    * Update rllib-training.rst
    
    * Update rllib-models.rst
    ericl authored Oct 1, 2018
    Commit 814c35b
  2. [rllib] Default to truncate_episodes and add some more config validators (ray-project#2967)
    
    * update
    
    * link it
    
    * warn about truncation
    
    * fix
    
    * Update rllib-training.rst
    
    * deprecate tests failing
    ericl authored Oct 1, 2018
    Commit e4bea8d
  3. [rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (ray-project#2974)
    
    * fix
    
    * fix
    
    * fix it
    
    * propagate conf to action dist
    
    * move carla example too
    
    * rr
    
    * Update policies.py
    
    * wip
    
    * lint
    ericl authored Oct 1, 2018
    Commit b45bed4
  4. [Java] Fix the required-resources issue of actor member function in Java worker. (ray-project#3002)
    
    This fixes a bug in which Java actor methods inherit the resource requirements of the actor creation task.
    jovany-wang authored and robertnishihara committed Oct 1, 2018
    Commit fcef4ed
  5. [rllib] Remove legacy multiagent support (ray-project#2975)

    * remove legacy
    
    * remove reshaper
    ericl authored Oct 1, 2018
    Commit 2019b41

Commits on Oct 2, 2018

  1. Test dying_worker_get and dying_worker_wait for xray. (ray-project#2997)

    This tests the case in which a worker is blocked in a call to ray.get or ray.wait and then dies. Later, the object that the worker was waiting for becomes available. We need to make sure we do not try to send a message to the dead worker and crash as a result. Related to ray-project#2790.
    robertnishihara authored and richardliaw committed Oct 2, 2018
    Commit 3ce8eb2

Commits on Oct 3, 2018

  1. fix bug: (ray-project#3000)

    Before this fix, `RAY_FUN_CACHE` was only ever read with `get`, so lookups always returned null.
    Fix: `put` the entry into the cache after creating it.
    bibabolynn authored and robertnishihara committed Oct 3, 2018
    Commit 9c606ea
  2. Commit cc7e2ec
  3. Commit d73ee36
  4. Move function/actor exporting & loading code to function_manager.py (ray-project#3003)
    
    Move function/actor exporting & loading code to function_manager.py to prepare the code change for function descriptor for python.
    guoyuhong authored and robertnishihara committed Oct 3, 2018
    Commit 9948e8c
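The `RAY_FUN_CACHE` fix in ray-project#3000 is the standard get-or-create pattern: a cache that is only ever read never gets populated, so every lookup misses. The cache itself appears to live in the Java worker; the following is only a minimal Python sketch of the pattern, and `get_or_create` is a hypothetical helper, not Ray API.

```python
from typing import Callable, Dict, Hashable, TypeVar

V = TypeVar("V")

# Sentinel so that a cached value of None is still treated as a hit.
_MISSING = object()


def get_or_create(cache: Dict[Hashable, V], key: Hashable,
                  create: Callable[[], V]) -> V:
    """Return the cached value for `key`, creating and storing it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = create()
        cache[key] = value  # the "put after create" step the fix adds
    return value
```

Without the `cache[key] = value` line, `create` would run on every call, which matches the "can only get null" symptom described in the commit.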

Commits on Oct 4, 2018

  1. Minor improvements and fixes in Python code. (ray-project#3022)

    This commit fixes some small defects:
    1. Remove a comment that should have been removed in ray-project#3003.
    2. Remove `redis_protected_mode`, which is never used in `ray.init()`.
    3. Pass `object_id_seed` into `ray._init()`, which had been forgotten.
    4. Remove several redundant brackets.
    suquark authored and robertnishihara committed Oct 4, 2018
    Commit f2dbd30
  2. Commit 01bb073
  3. Introduce concept of resources required for placing a task. (ray-project#2837)
    
    * Introduce concept of resources required for placement.
    * Add placement resources to task spec
    * Update java worker
    * Update taskinfo.java
    robertnishihara authored and atumanov committed Oct 4, 2018
    Commit faa31ae
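ray-project#2837 separates the resources a task needs in order to be placed on a node from the resources it holds while running. A minimal sketch of that distinction, assuming a simple dictionary-of-floats resource model; `TaskSpec`, `placement_resources` and `can_place` are hypothetical names, and falling back to the running requirements when no placement requirements are given is an assumption, not the raylet's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Resources = Dict[str, float]


@dataclass
class TaskSpec:
    """Hypothetical task spec separating execution and placement resources."""

    # Resources the task holds while it runs.
    required_resources: Resources
    # Resources a node must have free for the task to be scheduled there;
    # when omitted, this sketch falls back to required_resources.
    placement_resources: Optional[Resources] = None

    def resources_for_placement(self) -> Resources:
        if self.placement_resources is None:
            return self.required_resources
        return self.placement_resources


def can_place(task: TaskSpec, available: Resources) -> bool:
    """Return True if `available` covers every placement requirement."""
    return all(available.get(name, 0.0) >= amount
               for name, amount in task.resources_for_placement().items())


# Needs a free CPU to be placed, but holds no CPU once running.
actor_like = TaskSpec(required_resources={}, placement_resources={"CPU": 1.0})
print(can_place(actor_like, {"CPU": 0.0}))  # False
print(can_place(actor_like, {"CPU": 2.0}))  # True
```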

Commits on Oct 5, 2018

  1. Commit 0651d3b
  2. [core] Improve logging message when plasma store is started. (ray-project#3029)
    
    Improve logging message when plasma store is started.
    richardliaw authored and robertnishihara committed Oct 5, 2018
    Commit ecd8f39

Commits on Oct 6, 2018

  1. Bug/log syncer fails with parentheses (ray-project#2653)

    * Update rsync command
    
    * Escape rsync locations
    
    * Fix the accidental variable move
    
    * Update rsync to use -s flag
    hartikainen authored and ericl committed Oct 6, 2018
    Commit 2d35a97
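The log-syncer fix in ray-project#2653 escapes the rsync locations and adds rsync's `-s` flag. A minimal sketch of building such a command in Python, assuming the paths are interpolated into a shell string; `build_rsync_command` is a hypothetical helper, and the `-avz` flags are an assumption about the syncer's options.

```python
import shlex


def build_rsync_command(source: str, target: str) -> str:
    """Build a shell command that copies `source` to `target` with rsync.

    shlex.quote protects each location from the local shell, and rsync's
    -s (--protect-args) flag keeps the remote side from re-splitting the path.
    """
    return " ".join(["rsync", "-s", "-avz",
                     shlex.quote(source), shlex.quote(target)])


cmd = build_rsync_command("/tmp/ray/exp (1)/", "user@host:/tmp/ray/exp (1)/")
print(cmd)  # rsync -s -avz '/tmp/ray/exp (1)/' 'user@host:/tmp/ray/exp (1)/'
```

Quoting alone only covers the local shell; without `-s`, a path containing spaces or parentheses could still be split again by the remote shell.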

Commits on Oct 8, 2018

  1. [Java] Load driver resources from local path. (ray-project#3001)

    ## What do these changes do?
    1. Add a configuration item `driver.resource-path`.
    2. Load driver resources from the local path specified in `ray.conf`.
    
    Before this change, all driver resources (such as the user's JAR package, dependency packages, and config files) had to be added to the `classpath`.
    
    After this change, driver resources should be placed in the mount path configured in `ray.conf`, and the `classpath` no longer needs to be configured for them.
    
    ## Related issue number
    N/A
    jovany-wang authored and raulchen committed Oct 8, 2018
    Commit 84bf5fc
  2. Commit ef1f2fd

Commits on Oct 10, 2018

  1. [cmake] avoid to re-build pyarrow (ray-project#2963)

    * bugfix: env exists check error
    
    * support avoiding a pyarrow rebuild in the project
    
    * bugfix: adapt gtest for centos lib64
    
    * bugfix: check gtest lib exists in the directory
    
    * bugfix: find gtest with checking all libs exists
    
    * prefix RAY_ to thirdparty env variables to avoid conflicts with other modules
    
    * arrow use glog from ray
    
    * change the glog and gtest install dir
    chuxi authored and pcmoritz committed Oct 10, 2018
    Commit 060891a

Commits on Oct 11, 2018

  1. [Java] Improve some Java code (ray-project#3040)

    This PR improves some Java code and removes some duplicated code.
    jovany-wang authored and robertnishihara committed Oct 11, 2018
    Commit 4a2ed47
  2. [Java] Fix loading driver resources issue. (ray-project#3046)

    ## What do these changes do?
    Fix how driver resources are loaded from a specified path.
    This also addresses review comments from the related PR ray-project#3044.
    
    ## Related PRs:
    ray-project#3044 and ray-project#3001.
    jovany-wang authored and raulchen committed Oct 11, 2018
    Commit 828fe24

Commits on Oct 12, 2018

  1. Commit f9b58d7
  2. Commit 87639b9

Commits on Oct 13, 2018

  1. Commit 473ee4e

Commits on Oct 14, 2018

  1. Commit 866c7a5

Commits on Oct 15, 2018

  1. Commit 4dc78b7
  2. Commit 3c891c6
  3. Commit 6240ccb

Commits on Oct 16, 2018

  1. [tune] Fix (some more) misleading comments in tune/results.py (ray-project#3068)
    
    ## What do these changes do?
    
    Fix the misleading comments in code for:
     - `EPISODES_THIS_ITER`
     - `EPISODES_TOTAL`
    
    I had noted this before and planned to fix it along with some other changes, but it is closely related to ray-project#3058, so I am sending it now.
    praveen-palanisamy authored and richardliaw committed Oct 16, 2018
    Commit 4d8cfc0
  2. Commit 64e5eb3
  3. Commit a9e454f

Commits on Oct 17, 2018

  1. Add password authentication to Redis ports (ray-project#2952)

    * Implement Redis authentication
    
    * Throw exception for legacy Ray
    
    * Add test
    
    * Formatting
    
    * Fix bugs in CLI
    
    * Fix bugs in Raylet
    
    * Move default password to constants.h
    
    * Use pytest.fixture
    
    * Fix bug
    
    * Authenticate using formatted strings
    
    * Add missing passwords
    
    * Add test
    
    * Improve authentication of async contexts
    
    * Disable Redis authentication for credis
    
    * Update test for credis
    
    * Fix rebase artifacts
    
    * Fix formatting
    
    * Add workaround for issue ray-project#3045
    
    * Increase timeout for test
    
    * Improve C++ readability
    
    * Fixes for CLI
    
    * Add security docs
    
    * Address comments
    
    * Address comments
    
    * Address comments
    
    * Use ray.get
    
    * Fix lint
    pschafhalter authored and pcmoritz committed Oct 17, 2018
    Commit a41bbc1

Commits on Oct 18, 2018

  1. Commit 3c0803e
  2. Commit 2c52d9d
  3. Remove Redis protected mode (ray-project#3073)

    Follow-up to ray-project#2925 and ray-project#2952. Removes the Redis protected mode implementation from Ray which was replaced by Redis port authentication.
    pschafhalter authored and robertnishihara committed Oct 18, 2018
    Commit b82fd15
  4. [c++] Refine Log Code (ray-project#2816)

    * Support setting logging level from env variable
    
    * Remove Env Variable related code
    
    * lint
    guoyuhong authored and pcmoritz committed Oct 18, 2018
    Commit 653c5b1

Commits on Oct 19, 2018

  1. Adding Python3.7 wheels support (ray-project#2546)

    * Adding Python3.7 wheels support
    
    * Adding Mac wheels update
    
    * fix
    
    * numpy version
    
    * choose different numpy versions depending on python version
    
    * fix
    devin-petersohn authored and pcmoritz committed Oct 19, 2018
    Commit 8fcdafc
  2. Commit fa46978
  3. [xray] All messages on main asio event loop should be written asynchronously (ray-project#3023)
    
    * copy over ref code
    
    * wip async writes
    
    * compiles
    
    * fix error handling
    
    * add test
    
    * amend
    
    * fix test
    
    * clang fmt
    
    * clang format
    
    * wip
    
    * yapf
    
    * rename format script
    
    * test error
    
    * clangfmt
    
    * add test to list
    
    * warn
    
    * ref test
    
    * fix test
    
    * comment
    
    * add capture
    
    * Update client_connection.cc
    
    * wip
    
    * fix compile
    ericl authored and stephanie-wang committed Oct 19, 2018
    Commit 9d23fa0
  4. [Java] Support dynamically defining resources when submitting task. (ray-project#3070)
    
    ## What do these changes do?
    Before this PR, resources could only be specified statically, through an annotation:
    ```java
    @rayremote(Resources={ResourceItem("CPU", 10)})
    public static void f1() {
    // do sth
    }
    
    @rayremote(Resources={ResourceItem("CPU", 10)})
    class Demo {
    // sth
    }
    ```
    Unfortunately, there was no way to create another actor or task with different resource requirements.
    
    After this PR, resources can be specified when each actor or task is created:
    ```java
    ActorCreationOptions option = new ActorCreationOptions(); 
    option.resources.put("CPU", 4.0);
    RayActor<Echo> echo1 = Ray.createActor(Echo::new, option);
    option.resources.put("Res-A", 4.0);
    RayActor<Echo> echo2 = Ray.createActor(Echo::new, option);

    // If no resources are specified, the default is {"cpu": 0.0}.
    Ray.call(Echo::echo, echo2, 100);
    ```
    
    
    ## Related issue number
    N/A
    jovany-wang authored and raulchen committed Oct 19, 2018
    Commit b410ee0
  5. [java] fix check exception type (ray-project#3093)

    ## What do these changes do?
    Remove `TaskExecutionException` and use `RayException` instead.
    bibabolynn authored and raulchen committed Oct 19, 2018
    Commit 9a5c273
  6. Commit 9a2b533

Commits on Oct 20, 2018

  1. Commit 59901a8

Commits on Oct 21, 2018

  1. Fill driver id into actor notification when finishing assigned task. (ray-project#3080)
    
    ## What do these changes do?
    Fill in the driver ID in the actor notification when an assigned task finishes.
    This also includes some code improvements.
    jovany-wang authored and raulchen committed Oct 21, 2018
    Commit a4db5bb
  2. Commit 40c4148

Commits on Oct 22, 2018

  1. [rllib] switch to python logger (ray-project#3098)

    * log
    
    * set rllib logger
    
    * comment
    
    * info
    
    * rllib
    
    * comment
    
    * add format
    
    * fix lint
    
    * add file info
    
    * update
    
    * add ts
    
    * lint
    
    * better docs
    
    * fix value error
    
    * soft log level
    ericl authored Oct 22, 2018
    Commit 221d166
  2. [tune] Fix SearchAlg finishing early (ray-project#3081)

    * Fix trial search alg finishing early
    
    * Fix lint
    
    * fix lint
    
    * nit fix
    richardliaw authored and ericl committed Oct 22, 2018
    Commit eff7cb4
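ray-project#3098 moves RLlib onto Python's standard `logging` module, with a per-package logger, a format that includes a timestamp and file info, and a "soft" log level. One reading of those bullets, as a sketch only: the format string, the `ray.rllib` logger name, and the interpretation of "soft" (set a level only if the application has not chosen one) are assumptions, and `get_library_logger` is hypothetical.

```python
import logging

# Assumed format: timestamp, level, file and line, then the message.
LOG_FORMAT = "%(asctime)s\t%(levelname)s %(filename)s:%(lineno)s -- %(message)s"


def get_library_logger(name: str = "ray.rllib",
                       default_level: int = logging.INFO) -> logging.Logger:
    """Return a package logger, configuring it at most once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # "Soft" default: only set a level if none has been chosen yet.
    if logger.level == logging.NOTSET:
        logger.setLevel(default_level)
    return logger
```

Using a named logger rather than `print` lets applications raise or lower RLlib's verbosity with `logging.getLogger("ray.rllib").setLevel(...)`.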

Commits on Oct 23, 2018

  1. Retry connections to redis for async and subscribe contexts (ray-project#3105)
    
    This fixes a problem that @devin-petersohn observed on the Windows Subsystem for Linux (WSL).
    
    In theory, Redis should already be up when the async connect happens, so no retries should be needed for it. However, on WSL the async connect was failing even though the synchronous one was working. Windows may have different semantics here than Linux.
    pcmoritz authored and robertnishihara committed Oct 23, 2018
    Commit 8d8b6e5
  2. update (ray-project#3112)

    ericl authored Oct 23, 2018
    Commit 73a092e
  3. Commit 22dd7e0
  4. Commit 9d2e864
    Browse the repository at this point in the history
  5. Use XRay backend by default. (ray-project#3020)

    * Use XRay backend by default.
    
    * Remove irrelevant valgrind tests.
    
    * Fix
    
    * Move tests around.
    
    * Fix
    
    * Fix test
    
    * Fix test.
    
    * String/unicode fix.
    
    * Fix test
    
    * Fix unicode issue.
    
    * Minor changes
    
    * Fix bug in test_global_state.py.
    
    * Fix test.
    
    * Linting
    
    * Try arrow change and other object manager changes.
    
    * Use newer plasma client API
    
    * Small updates
    
    * Revert plasma client api change.
    
    * Update
    
    * Update arrow and allow SendObjectHeaders to fail.
    
    * Update arrow
    
    * Update python/ray/experimental/state.py
    
    Co-Authored-By: robertnishihara
    
    * Address comments.
    robertnishihara authored and pcmoritz committed Oct 23, 2018
    Commit 9c1826e
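The retry behaviour from ray-project#3105 can be sketched as a generic retry loop. This is a Python illustration only: the real change is in Ray's Redis client code, and `connect_with_retries`, its default attempt count, and the fixed delay between attempts are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T], num_attempts: int = 5,
                         delay_s: float = 0.1) -> T:
    """Call `connect` until it succeeds or the attempts run out.

    A transient ConnectionError is retried instead of being treated as
    fatal, even if another connection to the same server already worked.
    """
    last_error: Optional[Exception] = None
    for attempt in range(num_attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt + 1 < num_attempts:
                time.sleep(delay_s)  # give the server time before retrying
    raise ConnectionError(f"gave up after {num_attempts} attempts") from last_error
```

A fixed delay keeps the sketch simple; exponential backoff would be a natural alternative if the failures were caused by load rather than start-up timing.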

Commits on Oct 24, 2018

  1. [Java] support python worker command in raylet (ray-project#3092)

    ## What do these changes do?
    
    Allow the raylet started by the Java `runManager` to start the Python `default_worker.py`.
    
    When locally testing Java code that calls Python tasks, this starts the Python worker automatically.
    chuxi authored and raulchen committed Oct 24, 2018
    Commit 7c1fd19
  2. Commit 55d161b
  3. Commit 5aa2961

Commits on Oct 26, 2018

  1. Commit d34516f
  2. Commit d3148cc
  3. [java] customize path of ray.conf (ray-project#3100)

    Users can specify a custom path for `ray.conf` with `-Dray.config=/path/to/ray.conf`.
    bibabolynn authored and raulchen committed Oct 26, 2018
    Commit b4614ae
  4. Commit 055daf1
  5. Remove legacy Ray code. (ray-project#3121)

    * Remove legacy Ray code.
    
    * Fix cmake and simplify monitor.
    
    * Fix linting
    
    * Updates
    
    * Fix
    
    * Implement some methods.
    
    * Remove more plasma manager references.
    
    * Fix
    
    * Linting
    
    * Fix
    
    * Fix
    
    * Make sure class IDs are strings.
    
    * Some path fixes
    
    * Fix
    
    * Path fixes and update arrow
    
    * Fixes.
    
    * linting
    
    * Fixes
    
    * Java fixes
    
    * Some java fixes
    
    * TaskLanguage -> Language
    
    * Minor
    
    * Fix python test and remove unused method signature.
    
    * Fix java tests
    
    * Fix jenkins tests
    
    * Remove commented out code.
    robertnishihara authored and pcmoritz committed Oct 26, 2018
    Commit 658c142
  6. Commit 6531eed

Commits on Oct 27, 2018

  1. Delete empty pubsub keys (ray-project#3146)

    We found a large number of pub-sub keys with no content (this is worse when the wait ID is used in the key name).
    Legacy Ray deleted empty pub-sub keys from the GCS, but the raylet did not.
    guoyuhong authored and robertnishihara committed Oct 27, 2018
    Commit befbf78
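The cleanup described in ray-project#3146 amounts to deleting a channel's key as soon as its last subscriber is removed. A minimal in-memory sketch, assuming subscribers are tracked per key; `PubSubRegistry` is hypothetical and does not model the GCS or Redis.

```python
from collections import defaultdict
from typing import Dict, Set


class PubSubRegistry:
    """Tracks subscribers per channel key (hypothetical in-memory model)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, key: str, client_id: str) -> None:
        self._subscribers[key].add(client_id)

    def unsubscribe(self, key: str, client_id: str) -> None:
        # .get() does not create an entry, unlike indexing a defaultdict.
        clients = self._subscribers.get(key)
        if clients is None:
            return
        clients.discard(client_id)
        if not clients:
            # Delete the key once it is empty instead of leaving it behind.
            del self._subscribers[key]

    def keys(self) -> Set[str]:
        return set(self._subscribers)
```

The leak grows faster when keys embed a per-request ID (such as a wait ID), because each such key is used briefly and then abandoned.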

Commits on Oct 28, 2018

  1. [sgd] Merge sharded param server based SGD implementation (ray-project#3033)
    
    This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: overall scaling looks OK, with the multi-node results within 5% of the final OSDI numbers. This seems reasonable given that hugepages are not enabled here and the param server shards are placed randomly.
    
    $ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
      --devices-per-worker=M --strategy=<simple|ps> \
      --warmup --object-store-memory=10000000000
    
    Images per second (total):
    
    | GPUs (total)            | simple | ps                        |
    |-------------------------|--------|---------------------------|
    | 1                       | 218    |                           |
    | 2 (1 worker)            | 388    |                           |
    | 4 (1 worker)            | 759    |                           |
    | 4 (2 workers)           | 176    | 623                       |
    | 8 (1 worker)            | 985    |                           |
    | 8 (2 workers)           | 349    | 1031                      |
    | 16 (2 nodes, 2 workers) | 600    | 1661                      |
    | 16 (2 nodes, 4 workers) | 468    | 1712 (OSDI perf was 1817) |
    ericl authored Oct 28, 2018
    Commit af0c117
  2. Commit d6bf890
  3. Commit a404401
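As a worked reading of the ray-project#3033 benchmark, scaling efficiency can be computed as measured throughput divided by the single-GPU throughput times the number of GPUs. The sketch uses only numbers from that table; the efficiency metric is a choice made here, not one the commit reports.

```python
# Images per second copied from the ray-project#3033 benchmark table.
ONE_GPU_IMAGES_PER_SEC = 218.0
OSDI_16_GPU_IMAGES_PER_SEC = 1817.0


def scaling_efficiency(images_per_sec: float, num_gpus: int,
                       baseline: float = ONE_GPU_IMAGES_PER_SEC) -> float:
    """Fraction of perfect linear scaling of the single-GPU throughput."""
    return images_per_sec / (num_gpus * baseline)


results = {
    "8 GPUs (1 worker)": (985.0, 8),
    "8 GPUs (2 workers, ps)": (1031.0, 8),
    "16 GPUs (2 nodes, 4 workers, ps)": (1712.0, 16),
}
for name, (images_per_sec, num_gpus) in results.items():
    print(f"{name}: {scaling_efficiency(images_per_sec, num_gpus):.0%} of linear")

print(f"16-GPU ps vs. OSDI: {1712.0 / OSDI_16_GPU_IMAGES_PER_SEC:.1%}")
```

With these numbers the 16-GPU `ps` run reaches about 49% of linear scaling and about 94% of the OSDI figure of 1817 images per second.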

Commits on Oct 29, 2018

  1. Allow the node manager port and object manager port to be set through… (ray-project#3130)
    
    * Allow the node manager port and object manager port to be set through ray start.
    
    * Linting
    
    * Fix Java test
    
    * Address comments.
    robertnishihara authored and pcmoritz committed Oct 29, 2018
    Commit fd854ff
  2. Commit 08fc9e5
  3. Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. (ray-project#3149)
    
    * Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small.
    
    * Add logging statement and address comments.
    
    * Fix
    robertnishihara authored and pcmoritz committed Oct 29, 2018
    Commit 9868af4
  4. Deprecate num_workers argument to ray.init and ray start. (ray-project#3114)
    
    * Remove num_workers argument.
    
    * Fix
    
    * Fix
    robertnishihara authored and pcmoritz committed Oct 29, 2018
    Commit 32f0d6b
  5. Fix linting. (ray-project#3155)

    robertnishihara authored and ericl committed Oct 29, 2018
    Commit e49839c
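The directory choice in ray-project#3149 can be sketched as follows, assuming the check compares the free space on `/dev/shm` with the requested object store size. `choose_object_store_dir` and its injectable `free_bytes` parameter are hypothetical, and the warning text is illustrative rather than Ray's actual message.

```python
import logging
import shutil
import sys
from typing import Callable


def _free_bytes(path: str) -> int:
    return shutil.disk_usage(path).free


def choose_object_store_dir(object_store_bytes: int,
                            free_bytes: Callable[[str], int] = _free_bytes,
                            platform: str = sys.platform) -> str:
    """Pick a directory for the object store's memory-mapped file.

    On Linux, prefer the RAM-backed /dev/shm; if it cannot hold the requested
    object store, log a warning and fall back to /tmp. Other platforms use /tmp.
    """
    if not platform.startswith("linux"):
        return "/tmp"
    if free_bytes("/dev/shm") >= object_store_bytes:
        return "/dev/shm"
    logging.warning("/dev/shm has less than %d bytes free; using /tmp for the "
                    "object store instead.", object_store_bytes)
    return "/tmp"
```

Passing `free_bytes` as a parameter keeps the decision testable without depending on the size of the machine's real `/dev/shm`.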

Commits on Oct 30, 2018

  1. Commit a221f55
  2. [xray] Implement faster flush policy for lineage cache (ray-project#3071)
    
    * Policy that flushes the lineage stash immediately
    
    * Fix bug where remote tasks in uncommitted lineage weren't getting subscribed to, add reg test
    
    * test
    
    * Fix bug where waiting task was getting subscribed
    
    * Cleanup
    
    * Update src/ray/raylet/lineage_cache.cc
    
    Co-Authored-By: stephanie-wang
    
    * Update src/ray/raylet/lineage_cache.cc
    
    Co-Authored-By: stephanie-wang
    
    * cleanup
    
    * cleanup
    
    * Add another test for task with many parents
    
    * fix, unsubscribe from new waiting tasks
    
    * Unsubscribe as soon as the commit notification is handled
    stephanie-wang authored Oct 30, 2018
    Commit aacbd00
  3. [tune] Modify stop criteria in hyperopt example (ray-project#3102)

    Change the stop criterion from `training_iteration` to `timesteps_total`, because only `timesteps_total` is reported through the reporter.
    stvreumi authored and richardliaw committed Oct 30, 2018
    Commit 9df2e6e

Commits on Oct 31, 2018

  1. Update task_table and object_table API. (ray-project#3161)

    * Update task_table and object_table API.
    
    * Fix
    robertnishihara authored and pcmoritz committed Oct 31, 2018
    Commit 1f29a96

Commits on Nov 1, 2018

  1. [tune] Add Fractional GPU example/docs (ray-project#3169)

    * Add example for fractional GPU support
    
    * Update tune_mnist_keras.py
    
    * Update doc/source/tune-usage.rst
    richardliaw authored Nov 1, 2018
    Commit 2086a57
  2. Commit cd284bb
  3. Commit b2caed9
  4. Commit 60f2804
  5. Commit 57d6e98
  6. Add use_raylet option for backwards compatibility. (ray-project#3176)

    * Add use_raylet option for backwards compatibility.
    
    * Update message.
    robertnishihara authored and devin-petersohn committed Nov 1, 2018
    Commit e612e26

Commits on Nov 2, 2018

  1. Commit 2bef984
  2. Rename get_task -> worker_idle in timeline. (ray-project#3179)

    * Rename get_task -> worker_idle in timeline.
    
    * Fix test.
    robertnishihara authored and pcmoritz committed Nov 2, 2018
    Commit 5822aa2
  3. Commit e495ab5
  4. Commit 8c03683
  5. Fix 'tempfile' docs (ray-project#3180)

    * Fix docs.
    
    * Update doc/source/tempfile.rst
    
    Co-Authored-By: suquark
    
    * Remove doc for raylet socket.
    suquark authored and pcmoritz committed Nov 2, 2018
    Commit 5ce7ed7

Commits on Nov 3, 2018

  1. Commit ca7d4c2
  2. Commit 9a0f0db
  3. Commit 0da15b1

Commits on Nov 4, 2018

  1. Commit 7d69c77
  2. Commit 369cb83
  3. Commit fb6ac28

Commits on Nov 5, 2018

  1. 99bac44
  2. [rllib] Fix rllib rollouts script and add test (ray-project#3211)

    ## What do these changes do?
    
    Clean up the checkpointing to handle the new checkpoint dirs. Add a test for rollout.py
    
    ## Related issue number
    
    ray-project#3206
    ray-project#3204
    ericl authored and richardliaw committed Nov 5, 2018
    813f517
  3. Caching task resource requirements. (ray-project#3231)

    * caching resource requirements
    
    * small fixes
    
    * avoid copying the resource map
    istoica authored and pcmoritz committed Nov 5, 2018
    d8ae9de

Commits on Nov 6, 2018

  1. Increase timeout before reconstruction is triggered (ray-project#3217)

    * Increase timeout to 10s
    
    * Skip eviction reconstruction tests
    
    * Add stress test for many actors to one
    
    * Fix test by shortening it.
    
    * lower number of processes in stress test
    
    * Skip slow test
    stephanie-wang authored Nov 6, 2018
    bf88aa5
  2. 4968cc5
  3. Cap object store memory to 20GB when size is None (ray-project#3243)

    * Update services.py
    
    * Update services.py
    ericl authored and pcmoritz committed Nov 6, 2018
    80f6369
  4. 8356a01
  5. f3efcd2
  6. 725df3a
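Commit 80f6369 caps the object store size only when no size is given. The sketch below illustrates that rule; it is not Ray's services.py code. The function name, the 40% default share of system memory, and reading "20GB" as 20 * 10**9 bytes are all assumptions made for illustration.

```python
# Hypothetical constants: the 20 GB cap comes from the commit title, read
# here as 20 * 10**9 bytes; the 40% default share of system memory is an
# illustrative assumption, not Ray's actual value.
OBJECT_STORE_CAP_BYTES = 20 * 10**9
DEFAULT_PERCENT_OF_SYSTEM_MEMORY = 40


def resolve_object_store_memory(requested_bytes, system_memory_bytes):
    """Return the object store size to use.

    An explicitly requested size is honored as-is; only the default that is
    computed when the size is None gets capped.
    """
    if requested_bytes is not None:
        return requested_bytes
    # Integer arithmetic keeps the result exact.
    default_bytes = system_memory_bytes * DEFAULT_PERCENT_OF_SYSTEM_MEMORY // 100
    return min(default_bytes, OBJECT_STORE_CAP_BYTES)
```

On a large machine the default is clamped to the cap, while an explicit request larger than the cap still goes through unchanged.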

Commits on Nov 7, 2018

  1. 344b4ef
  2. ca58570
  3. 2e04ffe
  4. Cache resources in SchedulingQueue (ray-project#3232)

    * cache resources
    
    * fix
    
    * documentation and remove old code
    
    * fix PR
    
    * update documentation
    
    * linting
    pcmoritz authored and stephanie-wang committed Nov 7, 2018
    4182b85
  5. Enable timeline visualizations of object transfers. (ray-project#3255)

    * Plot object transfers.
    
    * Linting
    robertnishihara authored and pcmoritz committed Nov 7, 2018
    1dd5d92
  6. 29e3362
  7. cf9e838
  8. 43df405
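Commit 4182b85 (Cache resources in SchedulingQueue) caches resource requirements instead of recomputing them. Below is a minimal Python sketch of the idea, assuming a queue that only needs the total demand of its tasks. The aggregate is updated on every add and remove, so a query does not have to re-sum all tasks. The class and method names are hypothetical; the real raylet code is C++.

```python
from collections import Counter


class ResourceCachingQueue:
    """Hypothetical task queue that caches the summed resource demand.

    Instead of re-summing every task's requirements on each query, the
    aggregate is updated incrementally when tasks are added or removed.
    """

    def __init__(self):
        self._tasks = {}          # task_id -> resource requirements
        self._total = Counter()   # cached aggregate demand

    def add(self, task_id, resources):
        self._tasks[task_id] = dict(resources)
        self._total.update(resources)

    def remove(self, task_id):
        resources = self._tasks.pop(task_id)
        self._total.subtract(resources)
        # Forget resource types whose cached demand dropped to zero.
        for name in [n for n, amount in self._total.items() if amount == 0]:
            del self._total[name]

    def total_resources(self):
        return dict(self._total)
```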

Commits on Nov 8, 2018

  1. Expose internal config parameters for starting Ray (ray-project#3246)

    ## What do these changes do?
    
    This PR exposes the command-line option for passing internal config parameters. This is important for letting certain tests (e.g., FT tests that remove nodes) run quickly.
    
    Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.
    
    ray-project#3239 depends on this.
    
    TODO:
     - [x] Add documentation to method arguments before merging.
     - [x] Add test to verify this works?
    
    ## Related issue number
    richardliaw authored Nov 8, 2018
    0bab8ed
  2. Allow multiple threads to call ray.get and ray.wait (ray-project#3244)

    * Handle multiple threads calling ray.get
    
    * Multithreaded ray.wait
    
    * Pass in current task ID in java backend
    
    * Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get
    
    * Fix test
    
    * Some cleanups
    
    * Improve error message
    
    * Add assertion
    
    * Cleanup, throw error in HandleTaskUnblocked if task not actually blocked
    
    * lint
    
    * Fix python worker reset
    
    * Fix references to reconstruct_objects
    
    * Linting
    
    * java lint
    
    * Fix java
    
    * Fix iterator
    stephanie-wang authored Nov 8, 2018
    d950e92
  3. 9b27941
  4. 8894883

Commits on Nov 9, 2018

  1. [autoscaler] Add option to allow private ips only (ray-project#3270)

    * merge
    
    * update
    
    * upd
    
    * Update python/ray/autoscaler/autoscaler.py
    
    Co-Authored-By: ericl
    
    * Update python/ray/autoscaler/autoscaler.py
    
    Co-Authored-By: ericl
    
    * Update python/ray/autoscaler/aws/config.py
    
    Co-Authored-By: ericl
    
    * fix
    ericl authored and pcmoritz committed Nov 9, 2018
    588705b
  2. [tune] Annotated Example Page and showcase Tutorials (ray-project#3267)

    Adds an example page and a link in the codebase.
    
    Closes ray-project#2728.
    richardliaw authored Nov 9, 2018
    22113be
  3. [rllib] rollout.py should reduce num workers (ray-project#3263)

    ## What do these changes do?
    
    Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.
    
    ## Related issue number
    
    Closes ray-project#3260.
    ericl authored and richardliaw committed Nov 9, 2018
    9dd3eed

Commits on Nov 10, 2018

  1. [autoscaler] missing example-full.yaml file in the latest wheel for provider type "local"

    ericl authored Nov 10, 2018
    a51d618
  2. [tune] Support "None" for upload_dir

    richardliaw authored and ericl committed Nov 10, 2018
    29c182d
  3. Speed up task dispatch. (ray-project#3234)

    * speed up task dispatch
    
    * minor changes
    
    * improved comments
    
    * improved comments
    
    * change argument of DispatchTasks to list of tasks
    
    * dispatch only tasks whose dependencies have been fulfilled
    
    * some updated comments
    
    * refactored DispatchQueue() and Assigntask() to avoid the copy of the ready list
    
    * minor fixes
    
    * some more minor fixes
    
    * some more minor fixes
    
    * added more comments
    
    * better comments?
    
    * fixed all feedback comments, minus making the argument of AssignTask() const
    
    * Assigntask() now takes a const argument
    
    * Do the task copy outside of the callback
    
    * fix linting
    istoica authored and pcmoritz committed Nov 10, 2018
    d681893
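Speed up task dispatch (d681893) dispatches only tasks whose dependencies have been fulfilled. The hypothetical sketch below shows that filtering step in isolation: tasks whose required objects are all available locally are handed to an `assign` callback, and the rest stay queued. It assumes dependencies are modeled as sets of object IDs; the raylet's actual data structures differ.

```python
def dispatch_ready_tasks(tasks, available_objects, assign):
    """Dispatch only tasks whose dependencies are all available.

    tasks: list of (task_id, set of required object IDs)
    available_objects: set of object IDs present locally
    assign: callback invoked with the task_id of each dispatched task
    Returns the tasks that must keep waiting, in their original order.
    """
    waiting = []
    for task_id, deps in tasks:
        if deps <= available_objects:  # every dependency is satisfied
            assign(task_id)
        else:
            waiting.append((task_id, deps))
    return waiting
```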

Commits on Nov 11, 2018

  1. 53489d2
  2. 463511f
  3. [rllib] Don't reset envs when possible (ray-project#3290)

    * laz
    
    * better errors
    ericl authored Nov 11, 2018
    49e2085

Commits on Nov 12, 2018

  1. [tune] Fix default handling for timesteps (ray-project#3293)

    This PR fixes an issue where previously if timesteps_this_iter = 0,
    then it would render as "None".
    
    Closes ray-project#3057.
    richardliaw authored Nov 12, 2018
    e37891d
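The timesteps fix (e37891d) concerns a value of 0 being rendered as "None". A common cause of this class of bug is a truthiness check, since 0 is falsy in Python. The sketch below contrasts that pattern with an explicit `is None` test; it illustrates the pitfall and is not Tune's actual code.

```python
def render_timesteps_buggy(result):
    value = result.get("timesteps_this_iter")
    # 0 is falsy, so a legitimate zero is reported as missing.
    return str(value) if value else "None"


def render_timesteps_fixed(result):
    value = result.get("timesteps_this_iter")
    # Only a genuinely absent value is rendered as "None".
    return "None" if value is None else str(value)
```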

Commits on Nov 13, 2018

  1. bd0dbde
  2. ce6e01b
  3. [rllib] Add self-supervised loss to model (ray-project#3291)

    # What do these changes do?
    
    Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
    ericl authored and richardliaw committed Nov 13, 2018
    d90f365
  4. [core] Add Global State Test for multi-node setting (ray-project#3239)

    * add test for adding node
    
    * multinode test fixes
    
    * First pass at allowing updatable values
    
    * Fix compilation issues
    
    * Add config file parsing
    
    * Full initialization
    
    * Wrote a good test
    
    * configuration parsing and stuff
    
    * docs
    
    * write some tests, make it good
    
    * fixed init
    
    * Add all config options and bring back stress tests.
    
    * Update python/ray/worker.py
    
    * Update python/ray/worker.py
    
    * Fix internalization
    
    * some last changes
    
    * Linting and Java fix
    
    * add docstring
    
    * Fix test, add assertions
    
    * pytest ext
    
    * lint
    
    * lint
    richardliaw authored and pschafhalter committed Nov 13, 2018
    c0423db
  5. 6ee7a3b
  6. [tune] Doc: Autofilled, StatusReporter (ray-project#3294)

    * autofill and revise doc page for things
    
    * lint
    
    * comments
    richardliaw authored and ericl committed Nov 13, 2018
    c3a2c7e
  7. 97f4237
  8. d4fad22

Commits on Nov 14, 2018

  1. 65c27c7
  2. 9d4847a
  3. 577c1dd
  4. Kill actor child processes on shutdown (ray-project#3297)

    * example
    
    * add env
    
    * test pg
    
    * change to test
    
    * add atexit test
    
    * Update rllib-env.rst
    
    * comment
    
    * revert unnecessary file
    
    * fix title when actor is idle
    
    * Update python/ray/actor.py
    
    Co-Authored-By: ericl
    ericl authored Nov 14, 2018
    1660c9d
  5. KL Divergence Metrics (ray-project#3300)

    * added KL divergence metrics
    
    * fix
    andrewztan authored and ericl committed Nov 14, 2018
    57c7b42
  6. [rllib] Add test for multi-agent support and fix IMPALA multi-agent (ray-project#3289)
    
    IMPALA support for multi-agent was broken because IMPALA requires batch sizes to be of a certain length, whereas multi-agent envs can create variable-length batches.
    
    Fix this by adding zero-padding as needed (similar to the RNN case).
    ericl authored Nov 14, 2018
    706dc1d
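The IMPALA fix (706dc1d) zero-pads variable-length multi-agent batches up to the required length. Below is a minimal sketch of that padding, assuming a batch is a plain list of per-step values. The returned mask (1 for real steps, 0 for padding) is an assumption about how padded steps could be kept out of the loss, not a detail confirmed by the commit.

```python
def pad_batch(rows, target_len, pad_value=0):
    """Zero-pad a variable-length batch up to target_len.

    Returns (padded_rows, mask), where mask marks real steps with 1 and
    padding with 0.
    """
    if len(rows) > target_len:
        raise ValueError("batch is longer than the target length")
    num_pad = target_len - len(rows)
    padded = list(rows) + [pad_value] * num_pad
    mask = [1] * len(rows) + [0] * num_pad
    return padded, mask
```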

Commits on Nov 15, 2018

  1. Update redis version in setup.py (ray-project#3333)

    * `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0)
    * `ray` is not compatible with this version
    * This PR adds the "compatible release" operator for `redis` version 2.10.6.
    lewisbelcher authored and robertnishihara committed Nov 15, 2018
    5319fd0
  2. b6a12d1
  3. Raise exception if the node is nearly out of memory (ray-project#3323)

    * wip
    
    * add
    
    * comment
    
    * escape hatch
    
    * update
    
    * object store too
    
    * .2
    ericl authored and pcmoritz committed Nov 15, 2018
    5723291
  4. 1be1455
  5. 98edf75
  6. d10cb57
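In 5319fd0, a requirement such as `redis~=2.10.6` (the PEP 440 compatible release operator) is equivalent to `>=2.10.6, ==2.10.*`. It admits later 2.10.x releases but excludes redis-py 3.0.0. The sketch below checks that rule for purely numeric version strings only. Real installers use a full PEP 440 implementation, and the exact specifier string in setup.py is assumed from the commit message.

```python
def parse_release(version):
    """Parse a purely numeric version such as "2.10.6" into a tuple."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(version, spec):
    """Return True if `version` satisfies `~=spec` (numeric subset of PEP 440).

    ~=X.Y.Z means >=X.Y.Z and ==X.Y.*: drop the last component of the spec
    to get the prefix that must match.
    """
    v, s = parse_release(version), parse_release(spec)
    if len(s) < 2:
        raise ValueError("~= requires at least two release components")
    prefix = s[:-1]
    return v[:len(prefix)] == prefix and v >= s
```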

Commits on Nov 16, 2018

  1. Add debug string to raylet (ray-project#3317)

    * initial debug string
    
    * format
    
    * wip debug string
    
    * fix compile
    
    * fix
    
    * update
    
    * finished
    
    * to file
    
    * logs dir
    
    * use temp root
    
    * fix
    
    * override
    ericl authored and pcmoritz committed Nov 16, 2018
    e0bf9d7
  2. Don't unsubscribe dependencies for infeasible tasks. (ray-project#3338)

    * Make scheduling queues RemoveTasks return task states as well.
    
    * Add test
    
    * Don't unsubscribe for infeasible tasks when spilling over.
    
    * Linting
    
    * Address comments.
    robertnishihara authored and pcmoritz committed Nov 16, 2018
    60b22d9

Commits on Nov 17, 2018

  1. ab1e0f5
  2. Suppress duplicate pre-emptive object pushes. (ray-project#3276)

    * Suppress duplicate pre-emptive object pushes.
    
    * Add test.
    
    * Fix linting
    
    * Remove timer and inline recent_pushes_ into local_objects_.
    
    * Improve test.
    
    * Fix
    
    * Fix linting
    
    * Enable retrying pull from same object manager. Randomize object manager.
    
    * Speed up test
    
    * Linting
    
    * Add test.
    
    * Minor
    
    * Lengthen pull timeout and reissue pull every time a new object becomes available.
    
    * Increase pull timeout in test.
    
    * Wait for nodes to start in object manager test.
    
    * Wait longer for nodes to start up in test.
    
    * Small fixes.
    
    * _submit -> _remote
    
    * Change assert to warning.
    robertnishihara authored and pcmoritz committed Nov 17, 2018
    5cbc597
  3. Update stale example links

    ericl authored Nov 17, 2018
    61e3bbb
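Suppressing duplicate pre-emptive pushes (5cbc597) amounts to remembering which pushes were issued recently and skipping repeats. Below is a hypothetical sketch keyed by (object, node) with a fixed time window. The window length and the injectable clock are assumptions. The commit itself notes that the real implementation removed its timer and folded the bookkeeping into `local_objects_`.

```python
import time


class PushDeduplicator:
    """Hypothetical filter that skips re-pushing an object to the same node
    within a time window."""

    def __init__(self, window_s=10.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self._last_push = {}  # (object_id, node_id) -> time of last push

    def should_push(self, object_id, node_id):
        key = (object_id, node_id)
        now = self.clock()
        last = self._last_push.get(key)
        if last is not None and now - last < self.window_s:
            return False  # duplicate within the window: suppress it
        self._last_push[key] = now
        return True
```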

Commits on Nov 19, 2018

  1. e4bb5d8

Commits on Nov 20, 2018

  1. d4dbd27
  2. Add ordered_set container. (ray-project#3352)

    * Add ordered_set container.
    
    * Fix
    
    * Linting
    
    * Constructors
    
    * Remove O(n) call to list.size().
    
    * Fix.
    
    * Add documentation.
    
    * Add iterators to ordered_set container implementation.
    
    * iterator_type -> iterator
    
    * Make typedefs private
    
    * Add const_iterator
    robertnishihara authored and pcmoritz committed Nov 20, 2018
    f2b5500
  3. afc48d7
  4. [rllib] Set ape-x local exploration to 0, also load explorations before training steps (ray-project#3349)

    ## What do these changes do?
    
    This should fix high explorations being used after restore / for rollouts.
    
    ## Related issue number
    
    (dev list issue)
    ericl authored and richardliaw committed Nov 20, 2018
    5972c29
  5. abdc3b5
  6. b0bfd10
  7. Ready queue refactor to make Dispatching tasks more efficient (ray-project#3324)
    
    * put queues outside
    
    * working version, still needs to be optimized
    
    * implement round robin
    
    * proper round robin
    
    * fix spillback
    
    * update
    
    * fix
    
    * cleanup
    
    * more cleanups
    
    * fix
    
    * fix
    
    * add documentation
    
    * explanation for hash combiner
    
    * speed it up
    
    * cleanup and linting
    
    * linting
    
    * comments
    
    * Update scheduling_queue.h
    
    * temp commit
    
    * fixes
    
    * update
    
    * fix
    
    * cleanup
    
    * cleanup
    
    * lint
    
    * more prints
    
    * more prints
    
    * increase sleep
    
    * documentation
    
    * sleep
    
    * fix
    
    * fix
    
    * sleep longer
    
    * update
    
    * fix
    
    * fix
    
    * fix
    
    * Add ordered_set container.
    
    * Fix
    
    * Linting
    
    * Constructors
    
    * Remove O(n) call to list.size().
    
    * fixes
    
    * use ordered set
    
    * Fix.
    
    * Add documentation.
    
    * Add iterators to ordered_set container implementation.
    
    * iterator_type -> iterator
    
    * Make typedefs private
    
    * Add const_iterator
    
    * fix
    
    * fix test
    
    * linting
    
    * lint
    
    * update
    
    * add documentation
    
    * linting
    pcmoritz authored Nov 20, 2018
    d3697ce
  8. c24d87b
  9. Remove uses of std::list::size (ray-project#3358)

    * worker pool and client conn
    
    * Fix linting
    
    * unordered set
    
    * move
    ericl authored and pcmoritz committed Nov 20, 2018
    686cf20
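The `ordered_set` container added in f2b5500 keeps insertion order while supporting fast membership tests and removal; its commit message also mentions iterators and avoiding an O(n) `list.size()` call. Below is a Python sketch of the same behavior that relies on `dict` preserving insertion order. The method names `push_back` and `erase` mirror C++ conventions and are assumptions, not the container's confirmed API.

```python
class OrderedSet:
    """Insertion-ordered set with O(1) average insert, erase, and lookup."""

    def __init__(self, items=()):
        # dict keys keep insertion order; the values are unused.
        self._items = dict.fromkeys(items)

    def push_back(self, item):
        # Re-inserting an existing item keeps its original position.
        self._items[item] = None

    def erase(self, item):
        del self._items[item]  # raises KeyError if the item is absent

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)
```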

Commits on Nov 21, 2018

  1. 1a926c9
  2. Fix failure handling for actor death (ray-project#3359)

    * Broadcast actor death, clean up dummy objects
    
    * Reduce logging and clean up state when failing a task
    
    * lint
    
    * Make actor failure test nicer, reduce node timeout
    stephanie-wang authored Nov 21, 2018
    3e33f6f
  3. [tune] Node Fault Tolerance (ray-project#3238)

    This PR introduces single-node fault tolerance for Tune.
    
    ## Previous behavior:
     - Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources.
    
    ## New behavior:
     - RUNNING trials will be resumed on another node on a best-effort basis (meaning they will run if resources are available).
     - If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued.
     - During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running.
    
    
    Remaining questions:
     - Should `last_result` be consistent during restore?
       Yes, but not for earlier trials (trials that have not yet been checkpointed).
    
     - Waiting for some PRs to merge first (ray-project#3239)
    
    Closes ray-project#2851.
    richardliaw authored Nov 21, 2018
    784a639
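The recovery policy described in 784a639 works as follows. RUNNING trials on the failed node are resumed elsewhere if there is capacity; otherwise they become PENDING. The affected trials are then reported so that schedulers and search algorithms stop waiting on them. Below is a hypothetical sketch, assuming trials are plain dicts and each spare node can take exactly one trial. Tune's real `TrialRunner` logic is more involved.

```python
def recover_trials(trials, failed_node, spare_nodes):
    """Hypothetical node-failure recovery for trials.

    trials: list of dicts with "name", "status", and "node" keys (mutated).
    spare_nodes: node IDs that can each take one more trial (assumption).
    Returns the names of trials that were running on the failed node, so
    schedulers and search algorithms can be notified.
    """
    spare = list(spare_nodes)  # do not mutate the caller's list
    affected = []
    for trial in trials:
        if trial["status"] != "RUNNING" or trial["node"] != failed_node:
            continue
        affected.append(trial["name"])
        if spare:
            trial["node"] = spare.pop(0)  # best effort: resume elsewhere
        else:
            trial["status"] = "PENDING"   # cluster saturated: queue it
            trial["node"] = None
    return affected
```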

Commits on Nov 22, 2018

  1. Fix memory leak in lineage cache (ray-project#3366)

    * Move children_ map inside Lineage
    
    * Update lineage_cache.cc
    
    * Test and fixes
    
    * Remove unused
    stephanie-wang authored Nov 22, 2018
    6b32363
  2. 24bfe8a
  3. When getting a role/profile, catch only exception that indicates the role/profile already exists, allow others to be raised (ray-project#3383)

    GiliR4t1qbit authored and ericl committed Nov 22, 2018
    b9ae5ed
  4. fix py3 (ray-project#3382)

    ericl authored and pcmoritz committed Nov 22, 2018
    41b6b50
  5. [rllib] docs for td3 (ray-project#3381)

    * td3 doc
    
    * Update rllib-env.rst
    ericl authored Nov 22, 2018
    8b76bab
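Commit b9ae5ed narrows exception handling so that only the "already exists" error is swallowed and every other error propagates. The sketch below shows that pattern with a stand-in exception class. `ProvisioningError` and the `EntityAlreadyExists` code are hypothetical placeholders, not the autoscaler's actual AWS calls.

```python
class ProvisioningError(Exception):
    """Hypothetical stand-in for a cloud API error carrying an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def create_if_missing(create, name):
    """Create a resource, tolerating only the 'already exists' error.

    Returns True if it was created, False if it already existed. Any other
    error (e.g. a permissions problem) is re-raised to the caller.
    """
    try:
        create(name)
        return True
    except ProvisioningError as exc:
        if exc.code != "EntityAlreadyExists":
            raise
        return False
```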

Commits on Nov 24, 2018

  1. [rllib] Fix use_lstm option when using custom model with dict space (ray-project#3368)
    
    ## What do these changes do?
    
    This passes in the right obs space to the lstm model wrapper, so that it doesn't attempt to un-flatten the already processed dict observation.
    
    ## Related issue number
    
    Closes ray-project#3367
    ericl authored and richardliaw committed Nov 24, 2018
    55fca82
  2. 18a8dbf

Commits on Nov 25, 2018

  1. Fix incompatibility with most recent version of Redis. (ray-project#3379)
    
    * Fix incompatibility with most recent version of Redis.
    
    * Fix
    
    * Fixes.
    robertnishihara authored and devin-petersohn committed Nov 25, 2018
    3856533
  2. [rllib] Refactor the sampler (ray-project#3387)

    * refactor
    
    * fix test
    
    * add perf test
    
    * Update sampler.py
    ericl authored Nov 25, 2018
    b85e7b4
  3. UI changes, fix the task timeline and add the object transfer timeline to UI. (ray-project#3397)
    
    * Saving
    
    * Fix cmake and remove object/task search boxes.
    
    * Add comment
    robertnishihara authored and pcmoritz committed Nov 25, 2018
    0f0099f

Commits on Nov 27, 2018

  1. aa94d3d
  2. [rllib] PPO doesn't work with fractional num gpus (ray-project#3396)

    * frac ppo
    
    * gpu test
    ericl authored Nov 27, 2018
    e3c088f
  3. Add script for running stress tests. (ray-project#3378)

    * Add script for running stress tests.
    
    * Add an actor tree test where actors die with some probability
    
    * Improve test.
    
    * Small fix
    
    * Update tests.
    
    * Minor change
    robertnishihara authored and pcmoritz committed Nov 27, 2018
    20b8b1d
  4. 0d56fc1

Commits on Nov 28, 2018

  1. c2108ca
  2. f0df97d
  3. 82863b5
  4. 139fbf7
  5. c46ea2f

Commits on Nov 29, 2018

  1. 7e319db
  2. fd7e494
  3. Fault tolerance for actor creation (ray-project#3422)

    * Add regression test
    
    * Request actor creation if no actor location found
    
    * Comments
    
    * Address comments
    
    * Increase test timeout
    
    * Trigger test
    stephanie-wang authored and ericl committed Nov 29, 2018
    48a5935
  4. 4d2010a
  5. [rllib] Support batch norm layers (ray-project#3369)

    * batch norm
    
    * lint
    
    * fix dqn/ddpg update ops
    
    * bn model
    
    * Update tf_policy_graph.py
    
    * Update multi_gpu_impl.py
    
    * Apply suggestions from code review
    
    Co-Authored-By: ericl
    ericl authored Nov 29, 2018
    07d8cbf

Commits on Nov 30, 2018

  1. 447604a

Commits on Dec 1, 2018

  1. 454d3aa
  2. Update readme to contain logo (ray-project#3443)

    * Adding logo to readme
    
    * Updating link
    
    * Add badge
    
    * Addressing comments
    
    * Moving logo
    
    * Change align
    
    * Move image
    devin-petersohn authored and ericl committed Dec 1, 2018
    5751261
  3. 0603e0b

Commits on Dec 2, 2018

  1. abd37df
  2. Upgrade Arrow to include Plasma TensorFlow Op release fix (ray-project#3448)
    
    This includes a fix so the TensorFlow op releases memory properly (apache/arrow#3061) and support for storing Arrow data structures in Plasma (apache/arrow#2832).
    
    ray-project#3404
    pcmoritz authored and robertnishihara committed Dec 2, 2018
    c5b5cda
  3. 13c8ce4
  4. Fix bug in ray.wait (ray-project#3445)

    ray.wait depends on callbacks from the GCS to decide when an object has appeared in the cluster. The raylet crashes if a callback is received for a wait request that has already completed, yet this can actually happen, depending on the order of calls. More precisely:
    
    1. Objects A and B are put in the cluster.
    2. Client calls ray.wait([A, B], num_returns=1).
    3. Client subscribes to locations for A and B. Locations are cached for both, so callbacks are posted for each.
    4. Callback for A fires. The wait completes and the request is removed.
    5. Callback for B fires. The wait request no longer exists and raylet crashes.
    stephanie-wang authored and robertnishihara committed Dec 2, 2018
    4abafd7
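The race described in the ray.wait fix can be illustrated with a minimal Python sketch. It assumes a table of pending wait requests keyed by an integer ID; `WaitManager`, `WaitRequest` and `on_location_found` are hypothetical names for illustration only, and the actual fix lives in the raylet's C++ code, not here. The point shown is that a location callback for a request that has already completed and been removed is ignored instead of being treated as fatal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class WaitRequest:
    object_ids: List[str]
    num_required: int
    on_complete: Callable[[List[str]], None]
    found: Set[str] = field(default_factory=set)


class WaitManager:
    """Hypothetical tracker for pending waits that tolerates late callbacks."""

    def __init__(self) -> None:
        self._pending: Dict[int, WaitRequest] = {}
        self._next_id = 0

    def wait(self, object_ids: List[str], num_returns: int,
             on_complete: Callable[[List[str]], None]) -> int:
        wait_id = self._next_id
        self._next_id += 1
        self._pending[wait_id] = WaitRequest(list(object_ids), num_returns, on_complete)
        return wait_id

    def on_location_found(self, wait_id: int, object_id: str) -> bool:
        """Handle a location callback; return False if the request is already gone."""
        request = self._pending.get(wait_id)
        if request is None:
            # Step 5 of the race: the wait already completed and was removed.
            # Ignore the stale callback instead of crashing.
            return False
        request.found.add(object_id)
        if len(request.found) >= request.num_required:
            # Step 4: the wait is satisfied, so remove it, then notify the caller.
            del self._pending[wait_id]
            ready = [oid for oid in request.object_ids if oid in request.found]
            request.on_complete(ready)
        return True


# Replay the sequence from the commit message.
results: List[List[str]] = []
manager = WaitManager()
wid = manager.wait(["A", "B"], num_returns=1, on_complete=results.append)
manager.on_location_found(wid, "A")  # completes the wait
manager.on_location_found(wid, "B")  # stale callback, safely ignored
print(results)  # [['A']]
```

Removing the request before invoking the completion callback keeps the table consistent even if the callback itself triggers further waits.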

Commits on Dec 3, 2018

  1. 7abfbfd

Commits on Dec 4, 2018

  1. [rllib] Auto clip actions to Box space range; deprecate squash_to_range (ray-project#3426)
    
    * fix clip
    
    * tweak wording
    
    * remove squash entirely
    
    * Update rllib-models.rst
    
    * fix argument order
    
    * Apply suggestions from code review
    
    Co-Authored-By: ericl
    ericl authored Dec 4, 2018
    d820597
  2. Tweak/exec attach info (ray-project#3447)

    * Add custom cluster name to exec info
    
    * Update submit info to match exec info
    hartikainen authored and ericl committed Dec 4, 2018
    be6567e
  3. [rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example (ray-project#3451)
    
    * train step and docs
    
    * debug
    
    * doc
    
    * doc
    
    * fix examples
    
    * fix code
    
    * integration test
    
    * fix
    
    * ...
    
    * space
    
    * instance
    
    * Update .travis.yml
    
    * fix test
    ericl authored Dec 4, 2018
    ce355d1
  4. [tune] Component notification on node failure + Tests (ray-project#3414)

    Changes include:
     - Notify Components on Requeue
     - Slight refactoring of Node Failure handling
     - Better tests
    richardliaw authored Dec 4, 2018
    9d0bd50
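The "auto clip actions to Box space range" change can be pictured with a minimal NumPy sketch. It assumes the space exposes `low` and `high` bound arrays, as a Gym `Box` space does; `clip_to_box` is a hypothetical helper, not RLlib's API. Clipping leaves in-range actions untouched and only pins out-of-range components to the nearest bound, unlike a squashing transform, which rescales every action.

```python
import numpy as np


def clip_to_box(action, low, high):
    """Clip each action component into the [low, high] bounds of a Box-like space."""
    action = np.asarray(action, dtype=np.float32)
    low = np.asarray(low, dtype=np.float32)
    high = np.asarray(high, dtype=np.float32)
    # np.clip broadcasts the per-dimension bounds against the action vector.
    return np.clip(action, low, high)


clipped = clip_to_box([2.5, -3.0, 0.25], low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0])
print(clipped)
```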

Commits on Dec 5, 2018

  1. 93a9d32
  2. 06f6431
  3. Add the extra fallback for serialization (ray-project#3468)

    * Add the extra fallback for serialization.
    
    * Better comments & warnings. quotes.
    
    * Update test/runtest.py
    
    Co-Authored-By: suquark
    
    * Update test/runtest.py
    
    Co-Authored-By: suquark
    
    * linting
    
    * Don't hijack too many errors.
    
    * simplify the test
    
    * Update runtest.py
    
    * simplify
    suquark authored and pcmoritz committed Dec 5, 2018
    2e6f9be
  4. increase container memory and shm to 20G (ray-project#3475)

    * increase container memory and shm to 20G
    
    * variables are POWERFUL
    shaneknapp authored and atumanov committed Dec 5, 2018
    7a79b7f

Commits on Dec 6, 2018

  1. [rllib] fixes from dogfooding multi-agent (ray-project#3456)

    auto wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor in the sampler
    add some Q-learning debug stats
    report min, max of custom metrics
    better errors
    ericl authored Dec 6, 2018
    d864f29
  2. [tune] Deprecate ambiguous function values (use tune.function / tune.sample_from instead) (ray-project#3457)
    
    * wip
    
    * exclude
    ericl authored Dec 6, 2018
    412aaa5
  3. Fix failure of test_free_objects_multi_node (ray-project#3481)

    `test_free_objects_multi_node` could fail intermittently: running it 20 times typically produced at least one failure.
    
    The cause is that the test uses function tasks, and one raylet may create more than one worker to execute them. The flush operations can therefore be spread across several workers and fail to clear all the objects held by the workers' plasma clients.
    
    This PR changes the function tasks to actor tasks, which guarantees that all the tasks run in a single worker on a raylet.
    guoyuhong authored and robertnishihara committed Dec 6, 2018
    b9e1977
  4. [tune/rllib] Use cloudpickle to dump config (ray-project#3462)

    ## What do these changes do?
    
    The JSON logger now also uses cloudpickle to dump the configs, which pickles the functions needed for multi-agent replay.
    
    eugenevinitsky authored and richardliaw committed Dec 6, 2018
    7a7c6e5

Commits on Dec 7, 2018

  1. Removing the check about the size re: ray-project#3450 (ray-project#3464)
    
    * Removing the check about the size re: ray-project#3450
    
    * Addressing comments
    
    * Update services.py
    devin-petersohn authored and pcmoritz committed Dec 7, 2018
    970babf
  2. Experimental asyncio support (ray-project#2015)

    * Init commit for async plasma client
    
    * Create an eventloop model for ray/plasma
    
    * Implement a poll-like selector based on `ray.wait`. Huge improvements.
    
    * Allow choosing workers & selectors
    
    * remove original design
    
    * initial implementation of epoll-like selector for plasma
    
    * Add a param for `worker` used in `PlasmaSelectorEventLoop`
    
    * Allow accepting a `Future` which returns object_id
    
    * Do not need `io.py` anymore
    
    * Create a basic testing model
    
    * fix: `ray.wait` returns tuple of lists
    
    * fix a few bugs
    
    * improving performance & bug fixing
    
    * add test
    
    * several improvements & fixing
    
    * fix relative import
    
    * [async] change code format, remove old files
    
    * [async] Create context wrapper for the eventloop
    
    * [async] fix: context should return a value
    
    * [async] Implement futures grouping
    
    * [async] Fix bugs & replace old functions
    
    * [async] Fix bugs found in tests
    
    * [async] Implement `PlasmaEpoll`
    
    * [async] Make test faster, add tests for epoll
    
    * [async] Fix code format
    
    * [async] Add comments for main code.
    
    * [async] Fix import path.
    
    * [async] Fix test.
    
    * [async] Compatibility.
    
    * [async] less verbose to not annoy the CI.
    
    * [async] Add test for new API
    
    * [async] Allow showing debug info in some of the test.
    
    * [async] Fix test.
    
    * [async] Proper shutdown.
    
    * [async] Lint~
    
    * [async] Move files to experimental and create API
    
    * [async] Use async/await syntax
    
    * [async] Fix names & styles
    
    * [async] comments
    
    * [async] bug fixing & use pytest
    
    * [async] bug fixing & change tests
    
    * [async] use logger
    
    * [async] add tests
    
    * [async] lint
    
    * [async] type checking
    
    * [async] add more tests
    
    * [async] fix bugs on waiting a future while timeout. Add more docs.
    
    * [async] Formal docs.
    
    * [async] Add typing info since these codes are compatible with py3.5+.
    
    * [async] Documents.
    
    * [async] Lint.
    
    * [async] Fix deprecated call.
    
    * [async] Fix deprecated call.
    
    * [async] Implement a more reasonable way for dealing with pending inputs.
    
    * [async] Fix docs
    
    * [async] Lint
    
    * [async] Fix bug: Type for time
    
    * [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`.
    
    * [async] Update test & docs.
    
    * [async] Lint.
    
    * [async] Temporarily print more debug info.
    
    * [async] Use `Poll` as a default option.
    
    * [async] Limit resources.
    
    * new async implementation for Ray
    
    * implement linked list
    
    * bug fix
    
    * update
    
    * support seamless async operations
    
    * update
    
    * update API
    
    * fix tests
    
    * lint
    
    * bug fix
    
    * refactor names
    
    * improve doc
    
    * properly shutdown async_api
    
    * doc
    
    * Change the table on the index page.
    
    * Adjust table size.
    
    * Only keeps `as_future`.
    
    * change how we init connection
    
    * init connection in `ray.worker.connect`
    
    * doc
    
    * fix
    
    * Move initialization code into the module.
    
    * Fix docs & code
    
    * Update pyarrow version.
    
    * lint
    
    * Restore index.rst
    
    * Add known issues.
    
    * Apply suggestions from code review
    
    Co-Authored-By: suquark
    
    * rename
    
    * Update async_api.rst
    
    * Update async_api.py
    
    * Update async_api.rst
    
    * Update async_api.py
    
    * Update worker.py
    
    * Update async_api.rst
    
    * fix tests
    
    * lint
    
    * lint
    
    * replace the magic number
    suquark authored and pcmoritz committed Dec 7, 2018
    c2c501b
  3. 8395523
  4. f6490f9

Commits on Dec 8, 2018

  1. 462e6ef

Commits on Dec 9, 2018

  1. 8b5827b
  2. [rllib] Multi-GPU support for Multi-Agent PPO (ray-project#3479)

    * wip
    
    * fix
    
    * remove check
    
    * fix null
    
    * revert
    
    * lint and kl
    
    * also fix rollout
    ericl authored Dec 9, 2018
    7aec357
  3. Add return value for reconstruction RPC. (ray-project#3493)

    * Add return value for reconstruct RPC.
    
    * Fix comment function name
    guoyuhong authored and ericl committed Dec 9, 2018
    0136af5
  4. Add option to evict keys LRU from the sharded redis tables (ray-project#3499)
    
    * wip
    
    * wip
    
    * format
    
    * wip
    
    * note
    
    * lint
    
    * fix
    
    * flag
    
    * typo
    
    * raise timeout
    
    * fix
    
    * optional get
    
    * fix flag
    
    * increase timeout in test
    
    * update docs
    
    * format
    ericl authored and pcmoritz committed Dec 9, 2018
    cffe8f9
  5. ce606a9
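To make the "evict keys LRU" option concrete, here is a minimal Python sketch of least-recently-used eviction semantics using `collections.OrderedDict`. `LRUTable` is a hypothetical toy, not Ray's or Redis's code; Redis implements LRU eviction approximately (by sampling keys) rather than with an exact ordering like this. Any eviction policy means a later lookup can miss a key that was once written, so callers have to handle absent entries.

```python
from collections import OrderedDict


class LRUTable:
    """Toy key-value table that evicts the least recently used key when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default  # evicted or never written
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        """Insert or update a key; return the evicted key, if any."""
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            evicted, _ = self._data.popitem(last=False)  # oldest entry = least recently used
            return evicted
        return None


table = LRUTable(capacity=2)
table.put("a", 1)
table.put("b", 2)
table.get("a")               # touching "a" makes "b" the LRU entry
evicted = table.put("c", 3)  # evicts "b"
```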