Phi-3-Vision-Microsoft-Multimodal

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft, a multimodal model that brings together language and vision capabilities. the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.

Demo with Huggingface🤗

Hugging Face🤗 : click-1
Hugging Face🤗 : click-2
Hugging Face🤗 : click-3

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
Phi_3_vision_128k_instruct.ipynb		Phi_3_vision_128k_instruct.ipynb
README.md		README.md
image.png		image.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Phi-3-Vision-Microsoft-Multimodal

About

Releases

Packages

Languages

License

divakarkumarp/Phi-3-Vision-MS-Multimodal

Folders and files

Latest commit

History

Repository files navigation

Phi-3-Vision-Microsoft-Multimodal

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages