# ParaBART

Code for our NAACL-2021 paper "Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models".

If you find this repository useful, please consider citing our paper.

```bibtex
@inproceedings{huang2021disentangling,
  title = {Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models},
  author = {Huang, James Y. and Huang, Kuan-Hao and Chang, Kai-Wei},
  booktitle = {NAACL},
  year = {2021}
}
```

## Dependencies

- Python==3.7.6
- PyTorch==1.6.0
- Transformers==3.0.2
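For example, the pinned versions can be installed into an existing Python 3.7 environment with pip (a minimal sketch; `torch==1.6.0` wheels are platform- and CUDA-specific, so adjust for your setup):

```bash
pip install torch==1.6.0 transformers==3.0.2
```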

## Pre-trained Models

Our pre-trained ParaBART model is available here.
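For reference, the sketch below shows one way the downloaded checkpoint might be loaded for inference. The `ParaBART` class, its constructor signature, and the encoder call are assumptions inferred from this repository's file names, not a confirmed API; see `parabart_senteval.py` for the authoritative usage.

```python
import torch
from transformers import BartConfig, BartTokenizer

# Assumption: the ParaBART model class is defined in parabart.py in this repo.
from parabart import ParaBART

# Assumption: the released checkpoint is a ParaBART state dict built on bart-base.
config = BartConfig.from_pretrained("facebook/bart-base")
model = ParaBART(config)  # hypothetical constructor signature
model.load_state_dict(torch.load("./model/model.pt", map_location="cpu"))
model.eval()

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
inputs = tokenizer(["A sentence to embed."], return_tensors="pt")

with torch.no_grad():
    # Hypothetical call: pull the semantic sentence embedding from the encoder;
    # the actual attribute and output format may differ in this repo.
    embedding = model.encoder(input_ids=inputs["input_ids"])
```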

## Training

- Download the dataset and put it under `./data/`.
- Run the following command to train ParaBART:

```bash
python train_parabart.py --data_dir ./data/
```
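On a multi-GPU machine, a specific device can be selected with the standard CUDA environment variable (assuming the script uses the default CUDA device):

```bash
CUDA_VISIBLE_DEVICES=0 python train_parabart.py --data_dir ./data/
```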

## Evaluation

- Download the SentEval toolkit and its datasets (see the setup sketch after this list).
- Name your trained model `model.pt` and put it under `./model/`.
- Run the following command to evaluate ParaBART on semantic textual similarity and syntactic probing tasks:

```bash
python parabart_senteval.py --senteval_dir ../SentEval --model_dir ./model/
```

- Download the QQP-Easy and QQP-Hard datasets here.
- Run the following command to evaluate ParaBART on the QQP datasets:

```bash
python parabart_qqpeval.py
```
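For the SentEval step above, the toolkit and its task data are typically fetched as follows (a sketch assuming SentEval's current repository layout; the download script path may change upstream):

```bash
git clone https://github.com/facebookresearch/SentEval.git ../SentEval
cd ../SentEval/data/downstream/
./get_transfer_data.bash
```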

## Author

James Yipeng Huang / @jyhuang36