SPARKNLP 1034 implement starcoder2 for causal lm #14358

prabod · 2024-08-01T05:51:37Z

This PR introduces StarCoder2.

Description

This pull request adds support for the StarCoder2 model in the Spark NLP library. StarCoder2 is a Transformer model designed specifically for code generation and understanding. It features 3 billion parameters and is trained on a diverse dataset including multiple programming languages, making it highly versatile for various coding tasks. The new model provides enhanced functionality for code completion, generation, and understanding.
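As a rough illustration of how the new annotator might be used for code completion from Python, here is a minimal sketch. It assumes the annotator is exposed as `StarCoderTransformer` and follows the same setter conventions as other Spark NLP text-generation annotators; neither the class name nor the settings below are stated in this PR description, and the column names and generation values are hypothetical.

```python
# Hypothetical generation settings for illustration; not taken from the PR.
GENERATION_SETTINGS = {
    "max_output_length": 64,  # cap on the length of the generated output
    "do_sample": False,       # greedy decoding for reproducible completions
}


def build_code_completion_pipeline(settings=GENERATION_SETTINGS):
    """Build a two-stage pipeline: raw text -> document -> generated code."""
    # Imported lazily so this module loads even where Spark NLP is absent.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import StarCoderTransformer  # assumed class name

    document_assembler = (
        DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("documents")
    )

    starcoder = (
        StarCoderTransformer.pretrained()  # default model, downloaded on first use
        .setInputCols(["documents"])
        .setOutputCol("generation")
        .setMaxOutputLength(settings["max_output_length"])
        .setDoSample(settings["do_sample"])
    )

    return Pipeline(stages=[document_assembler, starcoder])


def complete(spark, prompt, settings=GENERATION_SETTINGS):
    """Run one prompt through the pipeline and return the generated strings."""
    data = spark.createDataFrame([[prompt]]).toDF("text")
    model = build_code_completion_pipeline(settings).fit(data)
    rows = model.transform(data).select("generation.result").collect()
    return rows[0][0]
```

With a session created by `sparknlp.start()`, a call such as `complete(spark, "def fibonacci(n):")` would return the model's continuation of the prompt as a list of strings. Greedy decoding is chosen here only to keep the output deterministic; sampling can be enabled through the settings dictionary.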

Motivation and Context

The inclusion of StarCoder2 addresses the need for a robust code-focused language model within Spark NLP. This model will significantly improve the library's capabilities in code generation and understanding tasks, offering developers a powerful tool for software development and data science projects.

How Has This Been Tested?

The StarCoder2 model has been tested using various unit and integration tests to ensure its proper functionality within the Spark NLP framework. Tests were conducted to verify the model's ability to perform code completion and generation tasks accurately. Additionally, performance benchmarks were compared to existing models to ensure its efficacy.

Screenshots (if appropriate):

Types of changes

New feature (non-breaking change which adds functionality)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING page.
I have added tests to cover my changes.
All new and existing tests passed.

prabod requested a review from maziyarpanahi August 1, 2024 05:51

prabod self-assigned this Aug 1, 2024

maziyarpanahi approved these changes Sep 1, 2024

maziyarpanahi changed the base branch from master to release/550-release-candidate September 1, 2024 18:22

prabod added 2 commits September 2, 2024 03:22

Added starcoder scala API

549a904

Added python api

65bac0b

prabod force-pushed the SPARKNLP-1034-Implement-Starcoder2ForCausalLM branch from 26d3612 to 65bac0b Compare September 2, 2024 03:23

maziyarpanahi merged commit 66d94a4 into release/550-release-candidate Sep 2, 2024
1 of 4 checks passed

maziyarpanahi mentioned this pull request Sep 2, 2024

release/550-release-candidate #14389

Merged
