Evaluates the capabilities of fine-tuned language models at finding logical similarities
install sentence-transformers
Python 3.6 or higher, PyTorch 1.2.0 or higher and transformers v3.0.2 or higher. The code does not work with Python 2.7.
First, follow the installation for PyTroch you can find here: PyTorch - Get Started. As the provided models can have a high computational overhead, it is recommend to run them on a GPU. See the PyTorch page how to install PyTorch for GPU (CUDA).
Install with pip
Install the sentence-transformers with pip
:
pip install -U sentence-transformers
Install from sources
Alternatively, you can also clone the latest version from the repository and install it directly from the source code:
pip install -e .
Download the pretrained model from here and place it in the folder
Use the new UI
Run
python UI.py
The first query may be very slow as it creates the embedding for all the questions on LeetCode, but queries after that should only takes seconds.
The "Tags" section gives a list similar questions' tags and their frequency.
Run it from terminal
To use it run
python main.py "your question here"
Due to an attempt to clean up the database by stripping away examples and implementation details, several questions became empty strings. So when given a short phrase, the output will be all empty strings.
The leetData.csv file includes pairs of coding questions from leetCode and a similarity score. It is created using my other repo, Leetcode_similar_question_scraper
running
python train.py
will continus to train the model using the included dataset.
The model is created by continuing to train a BERT model built for the sts benchmark using the leetcode dataset.
The initial results seem positive
A simple change of subject which threw off google search and the orginal model trained on sts dataset was succesfullly detected by the new model.
However with a more complicated paraphrase, the model was unable to find the orginal question it was based on. But the similar questions it returned are all solved by dynamic programming, which is the way to solve the problem in the second query.