-
Notifications
You must be signed in to change notification settings - Fork 15
/
3-2-version-control.qmd
251 lines (152 loc) · 15.8 KB
/
3-2-version-control.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
## Version control
Version control systems (VCS) are essential for managing intricate research projects, tracking changes in code and text systematically. Distributed version control systems (DVCS) like Git preserve different versions of content, offering significant advantages over local or centralized VCS. In Git, every user possesses a complete repository copy, ensuring redundancy and flexibility. This decentralized approach enhances collaboration, enabling multiple contributors to work independently on the same project, ensuring data integrity and maintaining the project's reproducibility. Git's flexibility and redundancy make it an ideal choice for researchers and developers, facilitating organization and preserving the integrity of their work.
- **Reproducible Collaboration**: Git enables collaboration among multiple contributors, ensuring that changes are tracked, documented, and reproducible, promoting transparency and teamwork
- **Traceable Versioning**: Git provides a detailed version history, allowing users to trace back every change made to the project, ensuring transparency and facilitating the reproduction of specific project states
- **Error Reversal**: Git allows for reversal of errors, enabling researchers to quickly identify and correct mistakes, ensuring that the project remains consistent and reproducible
- **Attributable Contributions**: Git records the contributions made by each team member, attributing specific changes to individuals. This transparency aids in understanding who contributed what, ensuring accountability and facilitating collaboration
- **Documentation and Context**: Git commits contain detailed messages, providing context for changes made. This documentation enhances the reproducibility of the project by allowing others to understand the reasoning behind specific alterations
### What is version control?
Let's say you are writing a paper. You will edit your paper and might want to keep different versions of it. A common way to handle that is by using different file names for different versions.
::: captioned-image-container
![http://www.phdcomics.com/comics/archive/phd101212s.gif](http://www.phdcomics.com/comics/archive/phd101212s.gif){fig-alt="http://www.phdcomics.com/comics/archive/phd101212s.gif"}
:::
This way of "version control" is outdated and error-prone. The most common proper version control system today is [Git](https://git-scm.com/), which I'd like to introduce to you now.
### Git for version control
{{< video https://youtu.be/HMqUFlu0gFc?si=bRwXodcwrKhHxLlA >}}
Git is free and open source 😃🙌.
With Git you can track different versions of your paper. For each version you can add a description ("commit message") and you even automatically track who made which change if you are working in a group. You can always go back to old versions.
The way you work with Git is that you have the version database both on our computers and on a server. To get the changes from and to the server you use commands (`pull` = download stuff from server, `push` = upload stuff to server).
::: captioned-image-container
![](images/vc-server.jpg){fig-alt="Reproducibility scale: a scale showing **not reproducible** on one end and **reproducible** on the other end."}
:::
Most researchers use GitLab or GitHub as platforms for working with Git and they also serve as a neat front end for the server. GitLab and GitHub give us some extra neat features for collaboration (e.g. issues, Wiki, ...).
Learning Git can be daunting 🙀. Historically it was developed by software engineers for other software engineers.
That makes it particularly useful for us when writing code, but it can also feel a bit too nerdy in the beginning.
I promise though, once you've got the hang of it, you'll love it!
### The difference between Git, GitHub and GitLab
- **Git**: A distributed version control system enabling local tracking of code changes, allowing collaboration and version management.
- **GitHub**: A web-based platform hosting Git repositories, offering collaborative features like pull requests, issues, and wikis. It provides tools for team collaboration and project management.
- **GitLab**: A Git repository manager providing source code management, CI/CD pipelines, issue tracking, and more. GitLab can be self-hosted or used as a cloud-based service, giving users control over their infrastructure and development process.
You can use GitHub or GitLab alongside Git when you need centralized collaboration, project management, community engagement, remote backup, and automated continuous integration and deployment workflows. But you can also use the command line tool Git independently if you only work locally. In this example, we use Git together with GitHub. However, the process would also be very similar if you use GitLab.
### Install and work with git and GitHub (Video Tutorial)
{{< video https://youtu.be/Y4VwIEVfZsY >}}
### Install Git on your computer 💻
Depending on the system you are working on, there are different ways to install Git.
::: {.panel-tabset}
## Linux
For the latest stable version for Linux Ubuntu run this command in your terminal:
```bash
apt-get install git
```
## Mac
Usually on Mac Git is already installed by default. To see if it's already there you
can open your terminal and type:
```bash
git --version
```
The Output should show you a version of Git.
If not you can use homebrew to install it in your terminal.
Install [homebrew](https://brew.sh/) if you don't already have it, then type:
```bash
brew install git
```
## Windows
If you are working on Windows, I recommend installing Git Bash. This provides you with a terminal where you can work similarly to the terminal on Mac and Linux, using the same commands. Git Bash also automatically installs Git for you. You can download Git for Windows [here](https://gitforwindows.org/).
You can visit [git-scm.com](https://git-scm.com/downloads) to get more information on how you can download Git for your system.
:::
### Create a new Repository on GitHub
**Sign in to GitHub:**
Open your browser, go to [GitHub](https://github.com/), and sign in to your GitHub account. If you don't have an account, you can create one by clicking on the "Sign up for GitHub" button.
**Create a New Repository:**
In the upper-right corner of any page, click on the `+` icon and select "New repository."
Alternatively, on your GitHub profile page, click on the "Repositories" tab, and then click the green "New" button.
**Fill out the Repository Information:**
- Enter a **Repository name**: Choose a name for your repository. This name will be part of your repository's URL.
- Enter a **Description**: Optionally, you can add a description of your repository.
- Choose **Public** or **Private**: Decide whether you want your repository to be public (visible to everyone) or private (accessible only to people you specify).
- Initialize this repository with a **README**: Check this option to create a README file for your repository. A README file provides information about your project. Including a README.md file in your repository provides essential project information such as a brief description, installation instructions, usage guidelines, and contribution guidelines. It acts as a guide for users and contributors, enhancing the understanding and adoption of your project.
- Choose a **.gitignore** file: If your project involves specific programming languages or frameworks, you can select a `.gitignore` template to exclude certain files from version control. The .gitignore file is crucial for excluding specific files or directories from version control. This ensures that sensitive or irrelevant files (like configuration files, logs, or compiled binaries) are not inadvertently shared or committed. It keeps your repository clean, reduces clutter, and avoids potential security risks. **You don't have to it for now.**
- Choose a **license**: If you want to add an open-source license to your repository, you can choose one from the provided options. You don't have to it for now.
**Create the Repository:**
- Click the green "Create repository" button. Your new repository is now created.
> **Note**: Starting from a local repository and pushing it to GitHub later is a common approach. However, it's important to note that for successful pushing, you need to initialize an empty repository on GitHub first to avoid divergence issues when the initial commit history differs between your local repository and the remote repository on GitHub. This ensures seamless syncing and collaboration between local and remote versions of your project.
### Clone a GitHub Repository to your local machine
I will show you how to clone a repository with ssh key. When you are on Github in your repository and select the green button `<> code`, you can choose https or ssh under the cloning points. If you choose https you have to identify yourself later with your GitHub password. The SSH key is another authentication method that is more secure in comparison. If you want to know how to create an SSH key pair, check [here](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent).
**Clone GitHub Repository with SSH URL:**
Open the terminal on your computer.
Use the `git clone` command with the SSH URL of your GitHub repository to clone your repository to your local machine:
```bash
git clone [email protected]:username/repository.git
```
Make sure to replace `username` with your GitHub username and `repository` with your repository name.
Navigate into the newly created directory:
```bash
cd repository
```
**Make Changes:**
Make a change to your README.md file. You can use your favorite editor for this, e.g. Notepad ++, nano, or vim. In the video tutorial you can see how to make the change with the nano editor.
**The command `git status`**
When you run the `git status` command, Git provides you with information about the current state of your repository. Here's what the typical output messages mean:
1. **On branch [branch name]:** Indicates the name of the branch you are currently on. Branches allow you to work on different features or versions of your project simultaneously.
2. **Changes not staged for commit:** Git detects modifications in your working directory that haven't been staged yet. These changes won't be included in the next commit unless you explicitly add them.
3. **Changes to be committed:** Lists the changes that are staged and ready to be included in the next commit. These changes will be part of the commit when you run `git commit`.
4. **Untracked files:** Shows files in your working directory that Git is not tracking. These files are not included in version control until you explicitly add them using `git add`.
5. **Your branch is ahead of 'origin/[branch]' by [number] commits:** Indicates that your local branch has commits that haven't been pushed to the remote repository (origin). It suggests that you might want to push these changes to keep the remote repository up-to-date.
6. **nothing to commit, working tree clean:** This message appears when there are no changes in your working directory or staged changes waiting to be committed. Your repository is in a clean state.
These messages help you understand the status of your repository, indicating which files have been modified, which changes are staged for commit, and if your local branch is ahead or behind the corresponding branch on the remote repository. Understanding `git status` output is essential for effective version control and collaboration.
**Check Status and Stage Changes:**
Check the status of your repository to see which files have been changed:
```bash
git status
```
Your output from `git status` should look something like this:
```bash
> $ git status [±master ●]
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: README.md
no changes added to commit (use "git add" and/or "git commit -a")
```
This message means there are changes in the `README.md` file that have not been staged for commit. You need to use `git add README.md` to stage the changes, and then `git commit -m "Your message"` to commit them.
Stage the changed file:
```bash
git add README.md
```
**Commit Changes:**
Commit the changes with a descriptive message:
```bash
git commit -m "Description of the changes made"
```
Replace `"Description of the changes made"` with a brief description of the changes you made. The `-m` flag in `git commit` is used to add a commit message directly from the command line. When you use `git commit -m "Your message"`, you can include a brief description of the changes you are committing without opening a text editor, making the commit process quicker and more efficient.
**Push Changes to GitHub:**
Push your changes to your GitHub repository:
```bash
git push origin main
```
After this step is successful, your changes are published to your GitHub repository! 🎉
### Most common commands in Git
| Command | Description |
|----------------------|----------------------------------------------------------------------------------|
| `git init` | Initiates a new git repository |
| `git status` | Shows the status of all files in the working directory and the staging area |
| `git add` | Add a new file or changes made to a file to the staging area |
| `git commit` | Creates a new commit that contains all changes from the staging area and adds it to the project's history |
| `git push` | Adds commits from a local repository to a remote repository (e.g. GitHub or GitLab) |
| `git log` | Shows the history of all commits |
| `git diff` | Shows changes made to files since the last commit |
| `git pull` | Adds commits from a remote repository (e.g. GitHub or GitLab) to a local repository |
| `git remote add origin` | Connects a local repository with a remote repository |
| `git clone` | Creates a local copy of a remote repository |
| `git config` | Can be used to show or change general options of a repository |
### Other version control systems
There are many other ways of doing version control out there.
**Subversion:** Simpler systems like Subversion are used less these days as Git offers more flexibility.
**Google docs and friends:** Many online text editors (Google Docs, OneDrive, ...) offer versioning now. It is not as advanced and versatile, but a nice way to work in a [WYSIWYG](https://en.wikipedia.org/wiki/WYSIWYG) (What You See Is What You Get) editor. Git works in its entirety only with plain-text files, such as Markdown, HTML, JSON or LaTeX. For formats that are not plain-text, such as PDF or Word files, some features that git brings with it do not work. You can still commit these files to git, but e.g. the highlighting of line-by-line changes does not work with these files.
**Versioning data:** Version control of data is a difficult task. Let's leave that for another day. See [here](https://the-turing-way.netlify.app/reproducible-research/vcs/vcs-data.html) for more info for now.
### Further reading
- [Version Control](https://the-turing-way.netlify.app/reproducible-research/vcs.html), The Turing Way
- [Version Control with Git](https://swcarpentry.github.io/git-novice/), Software Carpentry
- [Version Control with Git](https://annakrystalli.me/rrresearchACCE20/version-control-with-git.html) (for R users), Anna Krystalli
- [Set up Git with RStudio & GitLab](https://gitlab.com/HeidiSeibold/setup-git-rstudio-gitlab), Heidi Seibold