<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>ggd make-recipe — GGD documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="_static/style.css" />
<link rel="stylesheet" type="text/css" href="_static/font-awesome-4.7.0/css/font-awesome.min.css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="ggd make-meta-recipe" href="make-metarecipe.html" />
<link rel="prev" title="ggd show-env" href="show-env.html" />
<link href="https://fonts.googleapis.com/css?family=Lato|Raleway" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Inconsolata" rel="stylesheet">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="_static/ms-icon-144x144.png">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/css/selectize.bootstrap3.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/datatables/1.10.21/js/jquery.dataTables.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/js/standalone/selectize.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/js/bootstrap.bundle.min.js"></script>
</head><body>
<div class="document">
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo">
<a href="index.html">
<img class="logo" src="_static/logo/GoGetData_name_logo.png" alt="Logo"/>
</a>
</p>
<h3>Navigation</h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="quick-start.html">GGD Quick Start</a></li>
<li class="toctree-l1"><a class="reference internal" href="using-ggd.html">Using GGD</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="GGD-CLI.html">GGD Commands</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="ggd-search.html">ggd search</a></li>
<li class="toctree-l2"><a class="reference internal" href="install.html">ggd install</a></li>
<li class="toctree-l2"><a class="reference internal" href="predict-path.html">ggd predict-path</a></li>
<li class="toctree-l2"><a class="reference internal" href="uninstall.html">ggd uninstall</a></li>
<li class="toctree-l2"><a class="reference internal" href="list.html">ggd list</a></li>
<li class="toctree-l2"><a class="reference internal" href="list-file.html">ggd get-files</a></li>
<li class="toctree-l2"><a class="reference internal" href="pkg-info.html">ggd pkg-info</a></li>
<li class="toctree-l2"><a class="reference internal" href="show-env.html">ggd show-env</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">ggd make-recipe</a></li>
<li class="toctree-l2"><a class="reference internal" href="make-metarecipe.html">ggd make-meta-recipe</a></li>
<li class="toctree-l2"><a class="reference internal" href="check-recipe.html">ggd check-recipe</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="meta-recipes.html">GGD meta-recipes</a></li>
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contribute</a></li>
<li class="toctree-l1"><a class="reference internal" href="private_recipes.html">Private Recipes</a></li>
<li class="toctree-l1"><a class="reference internal" href="workflows.html">Using GGD in Workflows</a></li>
<li class="toctree-l1"><a class="reference internal" href="recipes.html">Available Data Packages</a></li>
</ul>
<ul>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-recipes">ggd-recipes @ Github</a></li>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-cli">ggd-cli @ Github</a></li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="ggd-make-recipe">
<span id="id1"></span><h1>ggd make-recipe<a class="headerlink" href="#ggd-make-recipe" title="Permalink to this headline">¶</a></h1>
<p>[<a class="reference internal" href="index.html#home-page"><span class="std std-ref">Click here to return to the home page</span></a>]</p>
<p>ggd make-recipe is used to create a ggd data recipe from a bash script which contains the information on
extracting and processing the data.</p>
<p>This provides a simple resource to create a recipe where the users need only create the base script and
ggd will generate the remainder of the pieces required for a ggd data recipe.</p>
<ul class="simple">
<li><p><strong>recipe</strong>: A data recipe is a directory containing a set of files that comprise information about the recipe.
This includes: A meta.yaml file, which is the meta data information for the soon to be ggd data package;
a post-link script, which contains the information about file and data management; a recipe script, which
contains the information on how to get the data and how to process it; and a checksum file, which is used
to ensure that the contents of the data files installed from ggd have not changed.</p></li>
<li><p><strong>package</strong>: A data package is created from building/packaging the ggd data recipe. It is a bzip2-compressed tar
file that contains the built data recipe and additional metadata information for conda system handling.</p></li>
</ul>
<p><code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code> takes a bash script created by you and turns it into a data recipe. This data recipe will then be
turned into a data package using <a class="reference internal" href="check-recipe.html#ggd-check-recipe"><span class="std std-ref">ggd check-recipe</span></a>. Finally, the new data package will
be added to the ggd repo and ggd conda channel through an automatic continuous integration system. For more details see
the <a class="reference internal" href="contribute.html#make-data-packages"><span class="std std-ref">contribute</span></a> documentation.</p>
<p>The first step in this process is to create a bash script with instructions on downloading and processing the data,
then use <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code> to create a ggd data recipe.</p>
<div class="section" id="using-ggd-make-recipe">
<h2>Using ggd make-recipe<a class="headerlink" href="#using-ggd-make-recipe" title="Permalink to this headline">¶</a></h2>
<p>Creating a ggd recipe is easy using the <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code> tool.
Running <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span> <span class="pre">-h</span></code> will give you the following help message:</p>
<p>make-recipe arguments:</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 38%" />
<col style="width: 63%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>ggd make-recipe</p></th>
<th class="head"><p>Make a ggd data recipe from a bash script</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-h</span></code>, <code class="docutils literal notranslate"><span class="pre">--help</span></code></p></td>
<td><p>show this help message and exit</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-c</span></code>, <code class="docutils literal notranslate"><span class="pre">--channel</span></code></p></td>
<td><p>(Optional) The ggd channel to use. (Default = genomics)</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-d</span></code>, <code class="docutils literal notranslate"><span class="pre">--dependency</span></code></p></td>
<td><p>(Optional) Any software dependencies (in bioconda, conda-forge) or
data dependencies (in ggd). May be used as many times as needed.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-p</span></code>, <code class="docutils literal notranslate"><span class="pre">--platform</span></code></p></td>
<td><p>(Optional) Whether to use noarch as the platform or the system
platform. If set to ‘none’ the system platform will be
used. (Default = noarch. Noarch means no architecture
and is platform agnostic.)</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-s</span></code>, <code class="docutils literal notranslate"><span class="pre">--species</span></code></p></td>
<td><p><strong>Required</strong> Species recipe is for</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-g</span></code>, <code class="docutils literal notranslate"><span class="pre">--genome-build</span></code></p></td>
<td><p><strong>Required</strong> Genome-build the recipe is for</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">--author</span></code></p></td>
<td><p><strong>Required</strong> The author(s) of the data recipe being created (this recipe)</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-pv</span></code>, <code class="docutils literal notranslate"><span class="pre">--package-version</span></code></p></td>
<td><p><strong>Required</strong> The version of the ggd package. (First time package = 1,
updated package > 1)</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-dv</span></code>, <code class="docutils literal notranslate"><span class="pre">--data-version</span></code></p></td>
<td><p><strong>Required</strong> The version of the data (itself) being downloaded and
processed (Example: dbsnp-127). If there is no data version
apparent we recommend you use the date associated with
the files or something else that can uniquely identify
the ‘version’ of the data</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-dp</span></code>, <code class="docutils literal notranslate"><span class="pre">--data-provider</span></code></p></td>
<td><p><strong>Required</strong> The data provider where the data was accessed.
(Example: UCSC, Ensembl, gnomAD, etc.)</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">--summary</span></code></p></td>
<td><p><strong>Required</strong> A detailed comment describing the recipe</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-k</span></code>, <code class="docutils literal notranslate"><span class="pre">--keyword</span></code></p></td>
<td><p><strong>Required</strong> A keyword to associate with the recipe. May be
specified more than once. Please add enough keywords
to better describe and distinguish the recipe</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">-cb</span></code>, <code class="docutils literal notranslate"><span class="pre">--coordinate-base</span></code></p></td>
<td><p><strong>Required</strong> The genomic coordinate basing for the file(s) in the
recipe. That is, whether the coordinates start at genomic
coordinate 0 or 1, and whether the end coordinate is
inclusive (everything up to and including the end
coordinate) or exclusive (everything up to but not
including the end coordinate). For files that do not have
coordinate basing, like fasta files, specify NA for
not applicable.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">-n</span></code>, <code class="docutils literal notranslate"><span class="pre">--name</span></code></p></td>
<td><p><strong>Required</strong> The sub-name of the recipe being created (e.g. cpg-islands,
pfam-domains, gaps, etc.). This will not be
the final name of the recipe, but will be specific to the data gathered
and processed by the recipe</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">script</span></code></p></td>
<td><p><strong>Required</strong> bash script that contains the commands to obtain and
process the data</p></td>
</tr>
</tbody>
</table>
<div class="section" id="additional-argument-explanation">
<h3>Additional argument explanation:<a class="headerlink" href="#additional-argument-explanation" title="Permalink to this headline">¶</a></h3>
<p>Required arguments:</p>
<ul class="simple">
<li><p><em>-s:</em> The <code class="code docutils literal notranslate"><span class="pre">-s</span></code> flag is used to declare the species of the data recipe.</p></li>
<li><p><em>-g:</em> The <code class="code docutils literal notranslate"><span class="pre">-g</span></code> flag is used to declare the genome-build of the data recipe.</p></li>
<li><p><em>--author:</em> The <code class="code docutils literal notranslate"><span class="pre">--author</span></code> flag is used to declare the author(s) of the ggd data recipe.</p></li>
<li><p><em>-pv:</em> The <code class="code docutils literal notranslate"><span class="pre">-pv</span></code> flag is used to declare the version of the ggd recipe being created. (1 for first time recipe, and 2+ for updated recipes)</p></li>
<li><p><em>-dv:</em> The <code class="code docutils literal notranslate"><span class="pre">-dv</span></code> flag is used to declare the version of the data being downloaded and processed. If a version is not
available for the specific data, use something that can identify the data uniquely, such as the date the data
was created.</p></li>
<li><p><em>-dp:</em> The <code class="code docutils literal notranslate"><span class="pre">-dp</span></code> flag is used to designate where the original data is coming from. Please make sure to indicate the data provider correctly to
both give credit to the data creator/provider and help uniquely identify the data origin.</p></li>
<li><p><em>--summary:</em> The <code class="code docutils literal notranslate"><span class="pre">--summary</span></code> flag is used to provide a summary/description of the recipe. Provide enough information to explain what the data is and
where it is coming from.</p></li>
<li><p><em>-k:</em> The <code class="code docutils literal notranslate"><span class="pre">-k</span></code> flag is used to declare keywords associated with the data and recipe. If there are multiple keywords, the <cite>-k</cite> flag
should be used once for each keyword. (Example: -k ref -k reference)</p></li>
<li><p><em>-cb:</em> The <code class="code docutils literal notranslate"><span class="pre">-cb</span></code> flag designates the coordinate base of the data files created from this recipe. Please follow general genomic file
coordinate standards based on the file format you are creating, and use this flag to indicate the coordinate basing of the
resulting file(s).</p></li>
<li><p><em>-n:</em> <code class="code docutils literal notranslate"><span class="pre">-n</span></code> represents the sub-name of the recipe. Sub-name refers to a portion of the name that will help to uniquely identify the
recipe from all other recipes based on the data the recipe creates. The full name will include the genome build, the data provider, and the
ggd recipe version. <strong>DO NOT</strong> include the genome build, data provider, or ggd recipe version here. Those will be designated with other flags.
The name should be specific to the data being processed or curated by the recipe. (Please provide an identifiable name. Example: cpg-islands)</p></li>
<li><p><em>script:</em> <code class="code docutils literal notranslate"><span class="pre">script</span></code> represents the bash script containing the information on data extraction and processing.</p></li>
</ul>
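<p>A value intended for <code class="code docutils literal notranslate"><span class="pre">-cb</span></code> can be sanity-checked before the recipe is created. The helper below is hypothetical and is not part of ggd.
The list of accepted values is an assumption inferred from the flag description (a start at 0 or 1, an inclusive or exclusive end, or NA), so treat it as a sketch rather than the tool's own validation:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# Hypothetical helper (not part of ggd): check a candidate -cb value.
# The accepted values are assumed from the flag description above.
is_valid_coordinate_base() {
    case "$1" in
        0-based-inclusive|0-based-exclusive|1-based-inclusive|1-based-exclusive|NA)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Try a few candidate values, including one that should be rejected.
for cb in 0-based-inclusive 1-based-exclusive NA zero-based; do
    if is_valid_coordinate_base "$cb"; then
        echo "$cb: ok"
    else
        echo "$cb: not a recognized coordinate base"
    fi
done
</pre></div>
</div>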
<p>Optional arguments:</p>
<ul class="simple">
<li><p><em>-c:</em> The <code class="code docutils literal notranslate"><span class="pre">-c</span></code> flag is used to declare which ggd channel to use. (genomics is the default)</p></li>
<li><p><em>-d:</em> The <code class="code docutils literal notranslate"><span class="pre">-d</span></code> flag is used to declare software dependencies in conda, bioconda, and conda-forge, and data-dependencies in
ggd for creating the package. If there are no dependencies this flag is not needed.</p></li>
<li><p><em>-p:</em> The <code class="code docutils literal notranslate"><span class="pre">-p</span></code> flag is used to choose between the noarch platform and the system platform. By default “noarch” is set, which means the package will be
built and installed with no architecture designation. This means it should be able to build on linux and macOS. If this is not
true you will need to set <code class="code docutils literal notranslate"><span class="pre">-p</span></code> to “none”. The system you are using, linux or macOS, will then take the place of noarch.</p></li>
</ul>
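<p>Because <code class="code docutils literal notranslate"><span class="pre">-d</span></code> and <code class="code docutils literal notranslate"><span class="pre">-k</span></code> may be repeated, it can help to build the argument list step by step and review the full command before running it.
The following is only a sketch: every value is a placeholder, <cite>htslib</cite> is used purely as an illustrative dependency name, and <cite>data_script.sh</cite> is assumed to exist already. The command is printed rather than executed.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# Required arguments (placeholder values).
set -- -s Homo_sapiens -g hg19 --author mjc -pv 1 -dv 27-Apr-2009 -dp UCSC
set -- "$@" --summary 'Assembly gaps from UCSC' -k gaps -k region -cb 0-based-inclusive

# Optional: one -d per software or data dependency.
set -- "$@" -d gsort -d htslib

# Optional: use the system platform instead of the default noarch.
set -- "$@" -p none

# Sub-name and the bash script come last.
set -- "$@" -n gaps data_script.sh

# Print the command for review. To run it for real, call:
#   ggd make-recipe "$@"
cmd="ggd make-recipe $*"
echo "$cmd"
</pre></div>
</div>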
</div>
</div>
<div class="section" id="data-recipe-standards">
<h2>Data recipe standards<a class="headerlink" href="#data-recipe-standards" title="Permalink to this headline">¶</a></h2>
<ol class="arabic simple">
<li><p>The name of the data recipe should be short and simple, but identifiable and unique. For example, if you are creating a recipe that accesses
the cpg-islands track from UCSC you would provide the name <cite>cpg-islands</cite> for the name parameter when running <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code>.
The final recipe name will contain the genome build, the name provided using <code class="code docutils literal notranslate"><span class="pre">-n</span></code>, the data provider, and the version. (<cite>hg19-cpg-islands-ucsc-v1</cite>)</p></li>
<li><p>The data should be named after the recipe name. Make sure that every data file produced by the recipe is named after the recipe name, followed by the file extension(s).</p></li>
<li><p>Please add many keywords. Keywords help to distinguish and describe the data files, so add as many as are useful for that purpose.</p></li>
<li><p>Data files should be labeled and sorted consistently across different genome builds. The data sorting standard for ggd data recipes is regulated by a tool called <cite>gsort</cite>.
Please use <cite>gsort</cite> whenever you need to sort genomic data files. (<cite>gsort</cite> can be installed with conda if it is not on your system now.) The associated genome files used
with gsort can be found at <a class="reference external" href="https://github.com/gogetdata/ggd-recipes/tree/master/genomes">ggd-recipes/genomes</a>. If the desired genome file for a specific genome build
is not available, raise an issue on <a class="reference external" href="https://github.com/gogetdata/ggd-recipes/issues">ggd-recipes::issues</a> and someone from the ggd team will help.
ggd also uses <cite>check-sort-order</cite> for additional QC of the data. If you are unsure about the sort order of your data, please test it with <cite>check-sort-order</cite>.</p></li>
</ol>
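<p>The naming standards above can be sketched in a few lines of shell. The function <cite>recipe_name</cite> is a hypothetical helper, not something ggd provides, and the lowercasing step is inferred from the example names in this documentation
(for instance <cite>GRCh37</cite> and <cite>UW</cite> appearing as <cite>grch37</cite> and <cite>uw</cite>). It is meant only to show how the pieces fit together when choosing file names inside your bash script:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# Hypothetical helper: join genome build, sub-name (-n), data provider,
# and package version, then lowercase the result.
recipe_name() {
    printf '%s-%s-%s-v%s\n' "$1" "$2" "$3" "$4" | tr '[:upper:]' '[:lower:]'
}

name=$(recipe_name hg19 cpg-islands UCSC 1)
echo "$name"

# Data files carry the recipe name, followed by the file extension(s).
echo "${name}.bed.gz"
echo "${name}.bed.gz.tbi"
</pre></div>
</div>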
</div>
<div class="section" id="examples">
<h2>Examples<a class="headerlink" href="#examples" title="Permalink to this headline">¶</a></h2>
<div class="section" id="a-simple-example-of-creating-a-ggd-recipe">
<h3>1. A simple example of creating a ggd recipe<a class="headerlink" href="#a-simple-example-of-creating-a-ggd-recipe" title="Permalink to this headline">¶</a></h3>
<p>data_script.sh:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">genome</span><span class="o">=</span>https://raw.githubusercontent.com/gogetdata/ggd-recipes/master/genomes/Homo_sapiens/hg19/hg19.genome
wget --quiet -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/gap.txt.gz <span class="se">\</span>
<span class="p">|</span> gzip -dc <span class="se">\</span>
<span class="p">|</span> awk -v <span class="nv">OFS</span><span class="o">=</span><span class="s2">"\t"</span> <span class="s1">'BEGIN {print "#chrom\tstart\tend\tsize\ttype\tstrand"} {print $2,$3,$4,$7,$8,"+"}'</span> <span class="se">\</span>
<span class="p">|</span> gsort /dev/stdin <span class="nv">$genome</span> <span class="se">\</span>
<span class="p">|</span> bgzip -c > hg19-gaps-ucsc-v1.bed.gz
tabix hg19-gaps-ucsc-v1.bed.gz
</pre></div>
</div>
<p>ggd make-recipe</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ggd make-recipe -s Homo_sapiens -g hg19 --author mjc -pv <span class="m">1</span> -dv <span class="m">27</span>-Apr-2009 -dp UCSC --summary <span class="s1">'Assembly gaps from UCSC'</span> -k gaps -k region -cb <span class="m">0</span>-based-inclusive -n gaps data_script.sh
:ggd:make-recipe: checking hg19
:ggd:make-recipe: Wrote output to hg19-gaps-ucsc-v1/
:ggd:make-recipe: To <span class="nb">test</span> that the recipe is working, and before pushing the new recipe to gogetdata/ggd-recipes, please run:
$ ggd check-recipe hg19-gaps-ucsc-v1/
</pre></div>
</div>
<p>This code will create a new ggd recipe:</p>
<blockquote>
<div><ul class="simple">
<li><p>Directory Name: <strong>hg19-gaps-ucsc-v1</strong></p></li>
<li><p>Files: <strong>meta.yaml</strong>, <strong>post-link.sh</strong>, <strong>recipe.sh</strong>, and <strong>checksums_file.txt</strong></p></li>
</ul>
</div></blockquote>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The directory <strong>hg19-gaps-ucsc-v1</strong> is the ggd recipe.</p>
</div>
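<p>Before moving on to <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">check-recipe</span></code>, a quick look at the new directory can confirm that the four files listed above were written.
The function below is a hypothetical convenience, not a ggd command, and it only checks that the files exist. It does not validate their contents.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre>
# Hypothetical check (not a ggd command): report which of the four
# expected recipe files are present in the given directory.
check_recipe_dir() {
    missing=0
    for f in meta.yaml post-link.sh recipe.sh checksums_file.txt; do
        if [ -f "$1/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Only run the check if the recipe directory from this example exists.
if [ -d hg19-gaps-ucsc-v1 ]; then
    check_recipe_dir hg19-gaps-ucsc-v1
fi
</pre></div>
</div>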
</div>
<div class="section" id="a-more-complex-ggd-recipe">
<h3>2. A more complex ggd recipe<a class="headerlink" href="#a-more-complex-ggd-recipe" title="Permalink to this headline">¶</a></h3>
<p>data_script.sh:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>wget --quiet http://evs.gs.washington.edu/evs_bulk_data/ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.vcf.tar.gz
<span class="c1"># extract individual chromosome files</span>
tar -zxf ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.vcf.tar.gz
<span class="c1"># combine chromosome files into one</span>
<span class="o">(</span>grep ^# ESP6500SI-V2-SSA137.GRCh38-liftover.chr1.snps_indels.vcf<span class="p">;</span> cat ESP6500SI-V2-SSA137.GRCh38-liftover.chr*.snps_indels.vcf <span class="p">|</span> grep
<span class="c1"># sort the chromosome data according to the .genome file from github</span>
gsort temp.vcf https://raw.githubusercontent.com/gogetdata/ggd-recipes/master/genomes/Homo_sapiens/GRCh37/GRCh37.genome <span class="se">\</span>
<span class="p">|</span> bgzip -c > ESP6500SI.all.snps_indels.vcf.gz
<span class="c1"># tabix it</span>
tabix -p vcf ESP6500SI.all.snps_indels.vcf.gz
<span class="c1"># get handle for reference file</span>
<span class="nv">reference_fasta</span><span class="o">=</span><span class="s2">"</span><span class="k">$(</span>ggd get-files <span class="s1">'grch37-reference-genome-1000g-v1'</span> -s <span class="s1">'Homo_sapiens'</span> -g <span class="s1">'GRCh37'</span> -p <span class="s1">'grch37-reference-genome-1000g-v1.fa'</span><span class="k">)</span><span class="s2">"</span>
<span class="c1"># get the sanitizer script</span>
wget --quiet https://raw.githubusercontent.com/arq5x/gemini/00cd627497bc9ede6851eae2640bdaff9f4edfa3/gemini/annotation_provenance/sanit
<span class="c1"># sanitize</span>
zless ESP6500SI.all.snps_indels.vcf.gz <span class="p">|</span> python sanitize-esp.py <span class="p">|</span> bgzip -c > temp.gz
tabix temp.gz
<span class="c1"># decompose with vt</span>
vt decompose -s temp.gz <span class="p">|</span> vt normalize -r <span class="nv">$reference_fasta</span> - <span class="se">\</span>
<span class="p">|</span> perl -pe <span class="s1">'s/\([EA_|T|AA_]\)AC,Number=R,Type=Integer/\1AC,Number=R,Type=String/'</span> <span class="se">\</span>
<span class="p">|</span> bgzip -c > grch37-esp-variants-uw-v1.vcf.gz
tabix grch37-esp-variants-uw-v1.vcf.gz
<span class="c1"># clean up environment</span>
rm ESP6500SI-V2-SSA137.GRCh38-liftover.snps_indels.vcf.tar.gz
rm ESP6500SI-V2-SSA137.GRCh38-liftover.chr*.snps_indels.vcf
rm ESP6500SI.all.snps_indels.vcf.gz.tbi
rm ESP6500SI.all.snps_indels.vcf.gz
rm temp.gz
rm temp.gz.tbi
rm temp.vcf
rm sanitize-esp.py
</pre></div>
</div>
<p>ggd make-recipe</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ggd make-recipe <span class="se">\</span>
-s Homo_sapiens <span class="se">\</span>
-g GRCh37 <span class="se">\</span>
--author mjc <span class="se">\</span>
-pv <span class="m">1</span> <span class="se">\</span>
-dv ESP6500SI-V2 <span class="se">\</span>
-dp UW <span class="se">\</span>
--summary <span class="s1">'ESP variants (More Info: http://evs.gs.washington.edu/EVS/#tabs-7)'</span> <span class="se">\</span>
-k ESP <span class="se">\</span>
-k vcf-file <span class="se">\</span>
-cb <span class="m">1</span>-based-exclusive <span class="se">\</span>
-d grch37-reference-genome-1000g-v1 <span class="se">\</span>
-d gsort <span class="se">\</span>
-d vt <span class="se">\</span>
-n esp-variants <span class="se">\</span>
data_script.sh
:ggd:make-recipe: checking GRCh37
:ggd:make-recipe: Wrote output to grch37-esp-variants-uw-v1/
:ggd:make-recipe: To <span class="nb">test</span> that the recipe is working, and before pushing the new recipe to gogetdata/ggd-recipes, please run:
$ ggd check-recipe grch37-esp-variants-uw-v1/
</pre></div>
</div>
<p>This code will create a new ggd recipe:</p>
<blockquote>
<div><ul class="simple">
<li><p>Directory Name: <strong>grch37-esp-variants-uw-v1</strong></p></li>
<li><p>Files: <strong>meta.yaml</strong>, <strong>post-link.sh</strong>, <strong>recipe.sh</strong>, and <strong>checksums_file.txt</strong></p></li>
</ul>
</div></blockquote>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The directory <strong>grch37-esp-variants-uw-v1</strong> is the ggd recipe.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
©2016-2021, The GoGetData team.
|
<a href="_sources/make-recipe.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>