how to get dataset #12

Open
Sun-Happy-YKX opened this issue Nov 30, 2022 · 5 comments

Comments

@Sun-Happy-YKX

I'm new to transformers and don't know how to get the dataset used in this project.
Could you please provide a Linux script for it, if possible?

@Gi-gigi

Gi-gigi commented Mar 19, 2023

Hi, did you manage to solve this? Could we discuss it further?

@Luoxiaofan666

Same question here. Please provide a Linux script if you can.

@Exiurs

Exiurs commented Feb 20, 2024

@Shengqi-Kong

https://blog.csdn.net/xunan003/article/details/130110232

The link is dead; it returns 403 Forbidden. No wonder running the code also throws an error: the server itself is down.

@JaceJu-frog

First, download the dataset to your own computer (training, validation, and test archives, in that order):
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz"
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/mmt_task1_test2016.tar.gz"
Then extract them (for example with tar -xzf) into any directory, e.g. "~/Python/DATASETS/Multi30k/".
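
For anyone who prefers to do the download step from Python instead of wget, here is a minimal sketch. The URLs and the target directory are the ones given above; the function names (archive_urls, extract_archive, fetch_multi30k) are made up for illustration and are not part of this project or of torchtext. It assumes the GitHub raw URLs are still reachable.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs taken from the comment above; availability is not guaranteed.
BASE_URL = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/"
ARCHIVES = ["training.tar.gz", "validation.tar.gz", "mmt_task1_test2016.tar.gz"]


def archive_urls(base_url=BASE_URL, names=ARCHIVES):
    """Build the full download URL for each archive."""
    return [base_url + name for name in names]


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and list what dest_dir contains."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


def fetch_multi30k(dest_dir="~/Python/DATASETS/Multi30k/"):
    """Download the three archives and extract them into dest_dir (needs network)."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    for url in archive_urls():
        archive_path = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, archive_path)
        extract_archive(archive_path, dest)
```

Calling fetch_multi30k() once should leave the extracted text files next to the downloaded archives in the target directory.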
Then you can use the TranslationDataset class to load the data and split it:

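import os
# Legacy torchtext API (< 0.9); in 0.9 to 0.11 it lives under torchtext.legacy.datasets
from torchtext.datasets import TranslationDataset

# Expand '~' explicitly, since Python does not do it automatically
ROOT = os.path.expanduser('~/Python/DATASETS/Multi30k/')
# Multi30k.download(ROOT) is left out: the files are already extracted manually,
# and the call would try the original download server again

# srcfield and tgtfield are torchtext Field objects that you define beforehand
(trnset, valset, testset) = TranslationDataset.splits(
    path=ROOT,
    exts=['.en', '.de'],
    fields=[('src', srcfield), ('trg', tgtfield)],
    test='test2016'
)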

ref: pytorch/text#312 (comment)
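
If the splits call fails with a file-not-found error, a quick check like the one below may help. It is only a sketch: missing_split_files is a hypothetical helper, and the "train" and "val" prefixes are assumed to be the legacy torchtext defaults, while "test2016" and the extensions come from the call above.

```python
from pathlib import Path


def missing_split_files(root, exts=(".en", ".de"),
                        splits=("train", "val", "test2016")):
    """Return the expected split files (prefix + extension) not found in root."""
    root = Path(root).expanduser()
    return [
        prefix + ext
        for prefix in splits
        for ext in exts
        if not (root / (prefix + ext)).is_file()
    ]
```

An empty list from missing_split_files('~/Python/DATASETS/Multi30k/') means every file the loader is assumed to look for is in place.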
