Commit

Update Readme and Minor Fixes (#256)
iesahin authored Jul 15, 2024
1 parent 3a0735d commit fab1536
Showing 56 changed files with 615 additions and 1,021 deletions.
62 changes: 34 additions & 28 deletions README.md
@@ -35,6 +35,14 @@ $ cargo install xvc

[installed]: https://www.rust-lang.org/tools/install

If you want to use Xvc with the Python console and Jupyter notebooks, you can also install it with `pip`:

```shell
$ pip install xvc
```

Note that pip installation doesn't make `xvc` available as a shell command. Please see [xvc.py](https://github.com/iesahin/xvc.py) for usage details.

## 🏃🏾 Quickstart

Xvc seamlessly monitors your files and directories on top of Git. To commence, execute the following command within the repository:
@@ -54,7 +62,7 @@ Include your data files and directories for tracking:
$ xvc file track my-data/ --as symlink
```

This command calculates content hashes for data (using BLAKE-3, by default) and logs them. The changes are committed to Git, and the files are copied to content-addressed directories within `.xvc/b3`. Additionally, read-only symbolic links to these directories are created.
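The idea of content addressing can be sketched in a few lines of Python. This is an illustration, not Xvc's code: the standard library has no BLAKE3, so `hashlib.blake2b` stands in for it, and the `<first 3 hex chars>/<rest>` layout under a hypothetical cache directory is an assumption rather than the actual structure of `.xvc/b3`.

```python
import hashlib
from pathlib import Path


def content_digest(data: bytes) -> str:
    # Stand-in hash: Xvc defaults to BLAKE3, which hashlib does not provide.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def content_address(cache_root: Path, data: bytes) -> Path:
    # Hypothetical layout: the first 3 hex chars name a subdirectory and the
    # rest names the entry, so identical content always maps to one path.
    digest = content_digest(data)
    return cache_root / digest[:3] / digest[3:]


root = Path("cache-sketch")  # hypothetical cache directory; nothing is written
print(content_address(root, b"same bytes") == content_address(root, b"same bytes"))  # True
print(content_address(root, b"same bytes") == content_address(root, b"other bytes"))  # False
```

Because the path depends only on the bytes, identical files in a content-addressed store need only one stored copy, and any number of workspace links can point at it.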

You can specify different [recheck (checkout) methods](https://docs.xvc.dev/ref/xvc-file-recheck/) for files and directories, depending on your use case.
If you need to track model files that change frequently, you can set the recheck method to `--as copy` (the default).
@@ -66,13 +74,13 @@ $ xvc file track my-models/ --as copy
Configure a cloud storage to share the files you added.

```shell
$ xvc storage new s3 --name my-remote --region us-east-1 --bucket-name my-xvc-remote
$ xvc storage new s3 --name my-storage --region us-east-1 --bucket-name my-xvc-remote
```

You can send the files to this storage.

```shell
$ xvc file send --to my-remote
$ xvc file send --to my-storage
```

When you (or someone else) want to access these files later, you can clone the Git repository and get the files from the
@@ -83,7 +91,7 @@ $ git clone https://example.com/my-machine-learning-project
Cloning into 'my-machine-learning-project'...

$ cd my-machine-learning-project
$ xvc file bring my-data/ --from my-remote
$ xvc file bring my-data/ --from my-storage

```

@@ -103,15 +111,15 @@ The script uses the Faker library and this library must be available where you r
$ xvc pipeline step new --step-name install-deps --command 'python3 -m pip install --quiet --user -r requirements.txt'
```

We'll make this step depend on the `requirements.txt` file, so that the step runs whenever the file changes.

```console
$ xvc pipeline step dependency --step-name install-deps --file requirements.txt
```

Xvc allows creating dependencies between pipeline steps. Dependent steps wait for their dependencies to finish successfully.
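The ordering rule (a step runs only after its dependencies have finished successfully) can be sketched with Python's standard `graphlib` module. The step graph mirrors this README's example; `run_pipeline` and `run_step` are hypothetical names, and Xvc's own scheduler may differ, for example in how it runs independent steps.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each step maps to the set of steps it depends on (mirrors the README example).
STEPS: dict[str, set[str]] = {
    "install-deps": set(),
    "generate-data": {"install-deps"},
}


def run_pipeline(
    graph: dict[str, set[str]], run_step: Callable[[str], bool]
) -> tuple[list[str], set[str]]:
    """Run steps in dependency order; skip any step whose dependency failed."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    finished: list[str] = []
    failed: set[str] = set()
    while sorter.is_active():
        for step in sorter.get_ready():
            if graph.get(step, set()) & failed:
                failed.add(step)  # a dependency failed, so this step is skipped
            elif run_step(step):
                finished.append(step)
            else:
                failed.add(step)
            sorter.done(step)  # unblock dependents either way
    return finished, failed


print(run_pipeline(STEPS, lambda step: True))
# (['install-deps', 'generate-data'], set())
```

`sorter.done(step)` is called even for failed or skipped steps so the loop terminates; the failure still propagates because each dependent checks its dependencies against `failed` before running.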

Now we create a step to run the script and make the `install-deps` step a dependency of it.

```console
$ xvc pipeline step new --step-name generate-data --command 'python3 generate_data.py'
@@ -124,40 +132,39 @@ After you define the pipeline, you can run it by:
$ xvc pipeline run
[DONE] install-deps (python3 -m pip install --quiet --user -r requirements.txt)
[OUT] [generate-data] CSV file generated successfully.

[DONE] generate-data (python3 generate_data.py)

```

Xvc allows many kinds of dependencies, like [files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#file-dependencies),
[groups of files and directories defined by globs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob-dependencies),
[regular expression searches in files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex-dependencies),
[line ranges in files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line-dependencies),
[hyper-parameters defined in YAML, JSON or TOML files](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#hyper-parameter-dependencies),
[HTTP URLs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#url-dependencies),
[shell command outputs](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#generic-command-dependencies),
and [other steps](https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#step-dependencies).

Suppose you're only interested in the IQ scores of those with _Dr._ in front of their names and how they differ from the rest in the dataset we created. Let's create a regex search dependency on the data file that will show all _doctors'_ IQ scores.

```console
$ xvc pipeline step new --step-name dr-iq --command 'echo "${XVC_REGEX_ADDED_ITEMS}" >> dr-iq-scores.csv '
$ xvc pipeline step new --step-name dr-iq --command 'echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv '
$ xvc pipeline step dependency --step-name dr-iq --regex-items 'random_names_iq_scores.csv:/^Dr\..*'
```

The first line specifies a command that, when run, writes the `${XVC_REGEX_ADDED_ITEMS}` environment variable to the `dr-iq-scores.csv` file.
The second line specifies the dependency, which also populates the `${XVC_REGEX_ADDED_ITEMS}` environment variable for the command.
The first line specifies a command that, when run, writes the `${XVC_ADDED_REGEX_ITEMS}` environment variable to the `dr-iq-scores.csv` file.
The second line specifies the dependency, which also populates the `${XVC_ADDED_REGEX_ITEMS}` environment variable for the command.
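A step command doesn't have to be `echo`. As a hedged sketch, a Python script could read the same variable and summarize the newly matched rows. It assumes the variable holds one matched line per line (as the `dr-iq-scores.csv` output on this page suggests) in this example's `Name,Score` format; `parse_rows` and `summarize` are hypothetical helpers.

```python
import os
from statistics import mean


def parse_rows(text: str) -> list[tuple[str, int]]:
    """Parse lines like 'Dr. Brian Shaffer,122' into (name, score) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, score = line.rpartition(",")  # split on the last comma only
        rows.append((name, int(score)))
    return rows


def summarize(rows: list[tuple[str, int]]) -> str:
    if not rows:
        return "No new matching rows"
    return f"{len(rows)} new rows, mean IQ {mean(score for _, score in rows):.1f}"


if __name__ == "__main__":
    # Assumption: one matched line per line in XVC_ADDED_REGEX_ITEMS.
    print(summarize(parse_rows(os.environ.get("XVC_ADDED_REGEX_ITEMS", ""))))
```

Saved under a hypothetical name such as `dr_iq_stats.py`, it could be used as the step's `--command` (`python3 dr_iq_stats.py`) instead of the `echo` above.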

Some dependency types, like [regex items],
[line items] and [glob items], inject environment variables into the commands of the steps that depend on them.
For example, if you have two million files specified with a glob but want to run a script only on the files added since the last run, you can use these environment variables.

When you run the pipeline again, a file named `dr-iq-scores.csv` will be created. Note that, as `requirements.txt` didn't change, the `install-deps` step and its dependent `generate-data` step didn't run.

```console
$ xvc pipeline run
[DONE] dr-iq (echo "${XVC_REGEX_ADDED_ITEMS}" >> dr-iq-scores.csv )
[DONE] dr-iq (echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv )

$ cat dr-iq-scores.csv
Dr. Brian Shaffer,122
@@ -166,15 +173,15 @@ Dr. Mallory Payne MD,70
Dr. Sherry Leonard,93
Dr. Susan Swanson,81

````
```

We are using this feature to get lines starting with `Dr.` from the file and write them to another file. When the file changes, e.g., when another record matching the dependency regex is added to the `random_names_iq_scores.csv` file, that record will also be added to the `dr-iq-scores.csv` file.

```console
$ zsh -cl 'echo "Dr. Albert Einstein,144" >> random_names_iq_scores.csv'

$ xvc pipeline run
[DONE] dr-iq (echo "${XVC_REGEX_ADDED_ITEMS}" >> dr-iq-scores.csv )
[DONE] dr-iq (echo "${XVC_ADDED_REGEX_ITEMS}" >> dr-iq-scores.csv )

$ cat dr-iq-scores.csv
Dr. Brian Shaffer,122
@@ -285,7 +292,7 @@ $ cat my-pipeline.json
"outputs": []
},
{
"command": "echo /"${XVC_REGEX_ADDED_ITEMS}/" >> dr-iq-scores.csv ",
"command": "echo /"${XVC_ADDED_REGEX_ITEMS}/" >> dr-iq-scores.csv ",
"dependencies": [
{
"RegexItems": {
@@ -347,8 +354,7 @@ You can edit the file to change commands, add new dependencies, etc. and import
$ xvc pipeline import --file my-pipeline.json --overwrite
```

Lastly, if you find the commands long to type, the `xvc aliases` command prints a set of aliases for them. You can source its output in your `.zshrc` or `.bashrc` and use the shorter commands instead, e.g., `xvc pipeline run` becomes `pvc run`.

```console
$ xvc aliases
@@ -450,7 +456,7 @@ And, biggest thanks to Rust designers, developers and contributors. Although I c
- Star this repo. Every star makes me very happy, and I send my best wishes to you. It's two seconds of your time and a certain win for me. Thanks.
- Use xvc. Tell me how it works for you, read the [documentation](https://docs.xvc.dev), [report bugs](https://github.com/iesahin/xvc/issues), [discuss features](https://github.com/iesahin/xvc/discussions).
- Please note that I don't accept large code PRs. Please open an issue to discuss your idea and write/modify a
reference page before sending a PR. I'm happy to discuss and help you implement your idea. Also, it may require a copyright transfer to me, as there may be cases in which I provide the code under other licenses.

## 📜 License

@@ -461,7 +467,7 @@ Xvc is licensed under the [GNU GPL 3.0 License](https://github.com/iesahin/xvc/b
I'm using Xvc daily and I'm happy with it. Tracking all my files with Git via arbitrary servers and cloud providers is
something I always need. I'm happy to improve and maintain it as long as I use it.

Given that I've been working on this for the last two years for pure technical bliss, you can expect me to keep working on it.

## ⚠️ Disclaimer

2 changes: 1 addition & 1 deletion book/src/ref/xvc-file-send.md
@@ -12,7 +12,7 @@ Arguments:
[TARGETS]... Targets to send/push/upload to storage

Options:
-r, --remote <REMOTE> Storage name or guid to send the files
-r, --storage <REMOTE> Storage name or guid to send the files
--force Force even if the files are already present in the storage
-h, --help Print help

4 changes: 2 additions & 2 deletions book/src/ref/xvc-file-share.md
@@ -6,13 +6,13 @@
$ xvc file share --help
Share a file from S3 compatible storage for a limited time

Usage: xvc file share [OPTIONS] --remote <REMOTE> <TARGET>
Usage: xvc file share [OPTIONS] --storage <REMOTE> <TARGET>

Arguments:
<TARGET> File to send/push/upload to storage

Options:
-r, --remote <REMOTE> Storage name or guid to send the files
-r, --storage <REMOTE> Storage name or guid to send the files
-d, --duration <DURATION> Period to send the files to. You can use s, m, h, d, w suffixes [default: 24h]
-h, --help Print help

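The `--duration` format in the help text above (a number followed by one of the `s`, `m`, `h`, `d`, `w` suffixes, default `24h`) can be illustrated with a small parser. This is a hedged sketch: `parse_duration` is a hypothetical helper that accepts only a single number-plus-suffix form, and Xvc's actual parser may accept other inputs.

```python
import re

# Seconds per suffix listed in the help text: s, m, h, d, w.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a duration like '24h' or '7d' to seconds (sketch, not Xvc's parser)."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhdw])\s*", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_duration("24h"))  # 86400
```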
10 changes: 5 additions & 5 deletions book/src/ref/xvc-pipeline-step-dependency-glob-items.md
@@ -7,7 +7,7 @@ Unlike the glob dependency, the glob items dependency keeps track of the individual file
command run with the list of files from a glob and you want to track added and removed files, use this. Otherwise, if
your command runs for all the files in a glob and you don't need to track which files have changed, use the glob dependency.

This one injects `${XVC_GLOB_ADDED_ITEMS}`, `${XVC_GLOB_REMOVED_ITEMS}`, `${XVC_GLOB_CHANGED_ITEMS}` and `${XVC_GLOB_ALL_ITEMS}` into the command
This one injects `${XVC_ADDED_GLOB_ITEMS}`, `${XVC_REMOVED_GLOB_ITEMS}`, `${XVC_CHANGED_GLOB_ITEMS}` and `${XVC_ALL_GLOB_ITEMS}` into the command
environment.
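How the four glob items variables relate to each other can be sketched as set operations over two snapshots of `{path: content digest}`. This is an illustration, not Xvc's implementation: the digests are placeholder strings, the paths are borrowed from the examples on this page, and joining paths with newlines in `to_env` is an assumed encoding.

```python
from dataclasses import dataclass


@dataclass
class GlobItemsDiff:
    added: list[str]
    removed: list[str]
    changed: list[str]
    all_items: list[str]


def diff_glob_items(previous: dict[str, str], current: dict[str, str]) -> GlobItemsDiff:
    """Compare {path: content digest} snapshots recorded on two runs."""
    prev_paths, curr_paths = set(previous), set(current)
    return GlobItemsDiff(
        added=sorted(curr_paths - prev_paths),
        removed=sorted(prev_paths - curr_paths),
        # Present in both snapshots, but the content digest differs.
        changed=sorted(p for p in prev_paths & curr_paths if previous[p] != current[p]),
        all_items=sorted(curr_paths),
    )


def to_env(diff: GlobItemsDiff) -> dict[str, str]:
    """Assumed encoding: one path per line in each variable."""
    return {
        "XVC_ADDED_GLOB_ITEMS": "\n".join(diff.added),
        "XVC_REMOVED_GLOB_ITEMS": "\n".join(diff.removed),
        "XVC_CHANGED_GLOB_ITEMS": "\n".join(diff.changed),
        "XVC_ALL_GLOB_ITEMS": "\n".join(diff.all_items),
    }


# Placeholder digests; a real tool would hash file contents.
previous = {"dir-0001/file-0001.bin": "d1", "dir-0001/file-0002.bin": "d2"}
current = {"dir-0001/file-0002.bin": "d2-new", "dir-0002/file-0003.bin": "d3"}
print(to_env(diff_glob_items(previous, current)))
```

Keeping a digest per path is what makes the `CHANGED` list possible; a plain glob dependency would only need to know whether anything under the glob changed.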

This command works only in Xvc repositories.
@@ -41,7 +41,7 @@ $ tree
Add a step to list the added files.

```console
$ xvc pipeline step new --step-name files-changed --command 'echo "### Added Files:\n${XVC_GLOB_ADDED_ITEMS}\n### Removed Files:\n${XVC_GLOB_REMOVED_ITEMS}\n### Changed Files:\n${XVC_GLOB_CHANGED_ITEMS}"'
$ xvc pipeline step new --step-name files-changed --command 'echo "### Added Files:\n${XVC_ADDED_GLOB_ITEMS}\n### Removed Files:\n${XVC_REMOVED_GLOB_ITEMS}\n### Changed Files:\n${XVC_CHANGED_GLOB_ITEMS}"'

$ xvc pipeline step dependency --step-name files-changed --glob-items 'dir-*/*'

@@ -63,7 +63,7 @@ dir-0002/file-0003.bin
### Changed Files:

[DONE] files-changed (echo "### Added Files:/n${XVC_GLOB_ADDED_ITEMS}/n### Removed Files:/n${XVC_GLOB_REMOVED_ITEMS}/n### Changed Files:/n${XVC_GLOB_CHANGED_ITEMS}")
[DONE] files-changed (echo "### Added Files:/n${XVC_ADDED_GLOB_ITEMS}/n### Removed Files:/n${XVC_REMOVED_GLOB_ITEMS}/n### Changed Files:/n${XVC_CHANGED_GLOB_ITEMS}")

$ xvc pipeline run

@@ -82,7 +82,7 @@ dir-0001/file-0001.bin
### Changed Files:

[DONE] files-changed (echo "### Added Files:/n${XVC_GLOB_ADDED_ITEMS}/n### Removed Files:/n${XVC_GLOB_REMOVED_ITEMS}/n### Changed Files:/n${XVC_GLOB_CHANGED_ITEMS}")
[DONE] files-changed (echo "### Added Files:/n${XVC_ADDED_GLOB_ITEMS}/n### Removed Files:/n${XVC_REMOVED_GLOB_ITEMS}/n### Changed Files:/n${XVC_CHANGED_GLOB_ITEMS}")

```

@@ -99,6 +99,6 @@ $ xvc pipeline run
### Changed Files:
dir-0001/file-0002.bin
[DONE] files-changed (echo "### Added Files:/n${XVC_GLOB_ADDED_ITEMS}/n### Removed Files:/n${XVC_GLOB_REMOVED_ITEMS}/n### Changed Files:/n${XVC_GLOB_CHANGED_ITEMS}")
[DONE] files-changed (echo "### Added Files:/n${XVC_ADDED_GLOB_ITEMS}/n### Removed Files:/n${XVC_REMOVED_GLOB_ITEMS}/n### Changed Files:/n${XVC_CHANGED_GLOB_ITEMS}")

```
10 changes: 5 additions & 5 deletions book/src/ref/xvc-pipeline-step-dependency-line-items.md
@@ -5,8 +5,8 @@ You can make your steps depend on lines of text files. The lines are defined
When the text in those lines changes, the step is invalidated.

Unlike line dependencies, this dependency type keeps track of the lines in the
file. You can use `${XVC_LINE_ALL_ITEMS}`, `${XVC_LINE_ADDED_ITEMS}`, and
`${XVC_LINE_REMOVED_ITEMS}` environment variables in the command. Please be
file. You can use `${XVC_ALL_LINE_ITEMS}`, `${XVC_ADDED_LINE_ITEMS}`, and
`${XVC_REMOVED_LINE_ITEMS}` environment variables in the command. Please be
aware that for a large set of lines, this dependency can take up considerable
space to keep track of all lines. If you don't need to keep track of changed
lines, you can use the `--lines` dependency.
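The storage trade-off described here can be seen in a small sketch: to report added and removed lines, the previous contents of the range have to be kept and compared with the current ones. This is an illustration rather than Xvc's implementation; `read_line_range` and `line_items_diff` are hypothetical helpers, the 1-based inclusive range is an assumption, and the sample rows are taken from the `people.csv` examples in these pages.

```python
from collections import Counter
from itertools import islice


def read_line_range(path: str, begin: int, end: int) -> list[str]:
    """Read lines begin..end (assumed 1-based and inclusive) of a text file."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, begin - 1, end)]


def line_items_diff(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) lines between two snapshots of the same range."""
    old_counts, new_counts = Counter(old), Counter(new)
    # Counter subtraction keeps only positive counts, so repeated lines are
    # compared by how many times they occur, not just whether they occur.
    added = list((new_counts - old_counts).elements())
    removed = list((old_counts - new_counts).elements())
    return added, removed


old = ['"Elly", "F", 30, 66, 124', '"Hank", "M", 30, 71, 158']
new = ['"Elly", "F", 30, 66, 124', '"Asude", "F", 12, 55, 110']
added, removed = line_items_diff(old, new)
print("Added:", added)
print("Removed:", removed)
```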
@@ -49,7 +49,7 @@ $ cat people.csv
Let's add a step to show the first 10 lines of the file:

```console
$ xvc pipeline step new --step-name print-top-10 --command 'echo "Added Lines:\n ${XVC_LINE_ADDED_ITEMS}\nRemoved Lines:\n${XVC_LINE_REMOVED_ITEMS}"'
$ xvc pipeline step new --step-name print-top-10 --command 'echo "Added Lines:\n ${XVC_ADDED_LINE_ITEMS}\nRemoved Lines:\n${XVC_REMOVED_LINE_ITEMS}"'

```

@@ -77,7 +77,7 @@ $ xvc pipeline run
Removed Lines:

[DONE] print-top-10 (echo "Added Lines:/n ${XVC_LINE_ADDED_ITEMS}/nRemoved Lines:/n${XVC_LINE_REMOVED_ITEMS}")
[DONE] print-top-10 (echo "Added Lines:/n ${XVC_ADDED_LINE_ITEMS}/nRemoved Lines:/n${XVC_REMOVED_LINE_ITEMS}")

``````

@@ -104,7 +104,7 @@ $ xvc pipeline run
Removed Lines:
"Hank", "M", 30, 71, 158
[DONE] print-top-10 (echo "Added Lines:/n ${XVC_LINE_ADDED_ITEMS}/nRemoved Lines:/n${XVC_LINE_REMOVED_ITEMS}")
[DONE] print-top-10 (echo "Added Lines:/n ${XVC_ADDED_LINE_ITEMS}/nRemoved Lines:/n${XVC_REMOVED_LINE_ITEMS}")
```

12 changes: 6 additions & 6 deletions book/src/ref/xvc-pipeline-step-dependency-regex-items.md
@@ -4,7 +4,7 @@ You can specify a regular expression matched against the lines from a file as a
the matched results changed.

Unlike regex dependencies, regex item dependencies keep track of the matched items. You can access them with
`${XVC_REGEX_ALL_ITEMS}`, `${XVC_REGEX_ADDED_ITEMS}`, and `${XVC_REGEX_REMOVED_ITEMS}` environment variables.
`${XVC_ALL_REGEX_ITEMS}`, `${XVC_ADDED_REGEX_ITEMS}`, and `${XVC_REMOVED_REGEX_ITEMS}` environment variables.
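The three regex items variables can be sketched as set operations, with a regular expression deciding which lines count as items. In this illustration, `re.search` (match anywhere in the line), the pattern `"F"`, sorting, and newline-joining are all assumptions rather than Xvc's documented behavior; using sets also means duplicate matching lines collapse into one item. The sample rows are taken from the `people.csv` examples on this page.

```python
import re


def regex_items(lines: list[str], pattern: str) -> set[str]:
    """Lines in which `pattern` matches (searched anywhere in the line)."""
    rx = re.compile(pattern)
    return {line for line in lines if rx.search(line)}


def regex_items_env(old: list[str], new: list[str], pattern: str) -> dict[str, str]:
    """Assumed encoding: matched lines sorted and joined with newlines."""
    before, after = regex_items(old, pattern), regex_items(new, pattern)
    return {
        "XVC_ALL_REGEX_ITEMS": "\n".join(sorted(after)),
        "XVC_ADDED_REGEX_ITEMS": "\n".join(sorted(after - before)),
        "XVC_REMOVED_REGEX_ITEMS": "\n".join(sorted(before - after)),
    }


old = ['"Elly", "F", 30, 66, 124', '"Hank", "M", 30, 71, 158']
new = old + ['"Asude", "F", 12, 55, 110']
env = regex_items_env(old, new, r'"F"')
print(env["XVC_ADDED_REGEX_ITEMS"])  # "Asude", "F", 12, 55, 110
```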

This command works only in Xvc repositories.

@@ -44,8 +44,8 @@ $ cat people.csv
Now, let's add steps to the pipeline to list the newly added males and females in the file:

```console
$ xvc pipeline step new --step-name new-males --command 'echo "New Males:\n ${XVC_REGEX_ADDED_ITEMS}"'
$ xvc pipeline step new --step-name new-females --command 'echo "New Females:\n ${XVC_REGEX_ADDED_ITEMS}"'
$ xvc pipeline step new --step-name new-males --command 'echo "New Males:\n ${XVC_ADDED_REGEX_ITEMS}"'
$ xvc pipeline step new --step-name new-females --command 'echo "New Females:\n ${XVC_ADDED_REGEX_ITEMS}"'
$ xvc pipeline step dependency --step-name new-females --step new-males
```

@@ -77,7 +77,7 @@ $ xvc pipeline run
"Omar", "M", 38, 70, 145
"Quin", "M", 29, 71, 176
[DONE] new-males (echo "New Males:/n ${XVC_REGEX_ADDED_ITEMS}")
[DONE] new-males (echo "New Males:/n ${XVC_ADDED_REGEX_ITEMS}")
[OUT] [new-females] New Females:
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
@@ -87,7 +87,7 @@ $ xvc pipeline run
"Page", "F", 31, 67, 135
"Ruth", "F", 28, 65, 131
[DONE] new-females (echo "New Females:/n ${XVC_REGEX_ADDED_ITEMS}")
[DONE] new-females (echo "New Females:/n ${XVC_ADDED_REGEX_ITEMS}")

``````

@@ -130,6 +130,6 @@ $ xvc pipeline run
[OUT] [new-females] New Females:
"Asude", "F", 12, 55, 110
[DONE] new-females (echo "New Females:/n ${XVC_REGEX_ADDED_ITEMS}")
[DONE] new-females (echo "New Females:/n ${XVC_ADDED_REGEX_ITEMS}")

```
6 changes: 3 additions & 3 deletions config/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "xvc-config"
version = "0.6.7"
version = "0.6.8"
edition = "2021"
description = "Xvc configuration management"
authors = ["Emre Şahin <[email protected]>"]
@@ -20,8 +20,8 @@ debug = true


[dependencies]
xvc-logging = { version = "0.6.7", path = "../logging" }
xvc-walker = { version = "0.6.7", path = "../walker" }
xvc-logging = { version = "0.6.8", path = "../logging" }
xvc-walker = { version = "0.6.8", path = "../walker" }


## Cli and config