Modern large language models for text generation show impressive results: they can compose a poem, change the style of a text, and even write a meaningful essay on a free topic. However, such models can also be used for malicious purposes, such as generating fake news, automated product reviews, and fake political content. This gives rise to a new task: learning to distinguish texts written by humans from texts generated by neural models.
The RuATD (Russian Artificial Text Detection) Dialogue Shared Task is dedicated to the problem of automatic detection of generated texts. We offer two tracks:
- Determine if the text was generated automatically or written by a human (binary classification);
- Determine which model from the list was used to generate this text (multi-class classification).
We provide train and dev splits. Part of the dataset was generated automatically by different generative models: we use language models fine-tuned on various tasks (machine translation, paraphrasing, summarization, simplification, and unconditional text generation) to produce the texts. The human-written texts were collected from open sources in different domains.
Binary annotation consists of the following labels:
- H – a human text
- M – a machine-generated text
Multi-class annotation consists of the following labels:
- OPUS-MT – this text was translated by means of the OPUS model
- ruGPT3-Large – this text was generated with ruGPT3-Large
- etc.
Two files, sample_submit_binary and sample_submit_multiple, present the submission format.
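A minimal sketch of producing a submission file is given below. The column names ("Id", "Class") and file names used here are assumptions for illustration only; please check sample_submit_binary and sample_submit_multiple for the exact format.

```python
import pandas as pd

# Assumed file and column names -- consult the provided
# sample_submit_binary / sample_submit_multiple files for the real format.
test = pd.read_csv("test.csv")           # assumed test file with an "Id" column
predictions = ["H"] * len(test)          # e.g. a trivial all-human baseline

submission = pd.DataFrame({"Id": test["Id"], "Class": predictions})
submission.to_csv("submit_binary.csv", index=False)
```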
A sample from the training dataset is provided below.
H (human-written) | M (machine translation, FR→RU) |
---|---|
Эх, у меня может быть и нет денег, но у меня всё ещё есть гордость. | Может, у меня нет денег, но у меня всегда есть гордость. |
Меня покусали комары. | Меня похитили муски. |
Я не могу чувствовать себя в гостинице как дома. | Я не могу чувствовать себя дома в отеле. |
Эта книга показалась мне интересной. | Я нашёл эту интересную книгу. |
Я был полон решимости помочь ему, даже рискуя собственной жизнью. | Я был готов помочь ему в опасности своей жизни. |
Моя квартира находится меньше чем в пяти минутах пешком от станции. | Моя квартира находится на расстоянии менее пяти минут от станции. |
For the Shared Task evaluation, we use standard classification metrics, such as accuracy.
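For reference, accuracy can be computed with scikit-learn; the label lists below are placeholders:

```python
from sklearn.metrics import accuracy_score

y_true = ["H", "M", "M", "H"]   # gold labels (placeholder values)
y_pred = ["H", "M", "H", "H"]   # system predictions (placeholder values)
print(accuracy_score(y_true, y_pred))  # 0.75
```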
We provide the baselines and publish their code as open source; a sketch of the first baseline is given after the list below. There are two baselines:
- tf-idf + logistic regression
- fine-tuning BERT
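The following is a minimal sketch of the TF-IDF + logistic regression baseline. The file and column names ("train.csv", "dev.csv", "Text", "Class") and the hyperparameters are assumptions; see the released baseline code for the official implementation.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Assumed file and column names; adjust to the released data files.
train = pd.read_csv("train.csv")
dev = pd.read_csv("dev.csv")

# TF-IDF features over word unigrams and bigrams, fed to logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=100_000),
    LogisticRegression(max_iter=1000),
)
model.fit(train["Text"], train["Class"])

dev_pred = model.predict(dev["Text"])
print("dev accuracy:", accuracy_score(dev["Class"], dev_pred))
```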
- Participants are allowed to use any additional materials and any pre-trained models, with the exception of manually annotating the test set and searching for the test texts on the Internet.
- Participants can work individually or in teams on the Shared Task.
- All participants will be invited to submit papers to the Dialogue 2022 proceedings.
- There will be an additional round of evaluation to ensure that participants did not search the Web or other open sources. We will ask every team to publish their solution in open access and to peer-review the published solutions.
- End of December 2021 / start of January 2022 – publication of the train set
- 17 January 2022 – The shared task is open
- 7 March 2022, 9 AM (Moscow time) – The shared task is closed
- 8 March 2022 – preliminary results
- 9-13 March 2022 – model peer review and official results
- 15 March 2022 – article submission
Ekaterina Artemova (HSE, Huawei Noah’s Ark Lab)
Konstantin Nikolaev (HSE)
Vladislav Mikhailov (SberDevices)
Marat Saidov (HSE)
Ivan Smurov (ABBYY, MIPT)
Elena Tutubalina (Sber)
Alena Fenogenova (SberDevices)
Daniil Cherniavskii (Skolkovo Institute of Science and Technology)
Tatiana Shavrina (AIRI, SberDevices)
Tatiana Shamardina (ABBYY)
Anastasiya Valeeva (MIPT)