feat: automatically adapt to current free VRAM state #182

Merged
giladgd merged 52 commits into beta from gilad/bugFixes2 on Apr 4, 2024

Conversation

@giladgd giladgd commented Mar 18, 2024

Description of change

  • feat: read tensor info from gguf files
  • feat: inspect gguf command
  • feat: inspect measure command
  • feat: readGgufFileInfo function (sketched below)
  • feat: GGUF file info on LlamaModel
  • feat: estimate the VRAM usage of the model and context with the given options to adapt to the current VRAM state, and set good defaults for gpuLayers and contextSize; manual configuration of those options is no longer needed to maximize performance (sketched below)
  • feat: JinjaTemplateChatWrapper
  • feat: use the tokenizer.chat_template header from the gguf file when available; use it to resolve a better-fitting specialized chat wrapper, or fall back to JinjaTemplateChatWrapper with it (sketched below)
  • feat: improve resolveChatWrapper
  • feat: simplify generation CLI commands: chat, complete, infill
  • feat: read GPU device names
  • feat: get token type
  • refactor: gguf
  • test: separate gguf tests into model-dependent and model-independent tests
  • test: switch to new vitest test signature
  • fix: use the new llama.cpp CUDA flag
  • fix: improve chat wrappers tokenization
  • fix: bugs
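
A minimal sketch of the new readGgufFileInfo function: it parses the GGUF file header without loading the model into memory. The file path is a placeholder, and the exact property names on the returned object are assumptions based on the feature list above.

```typescript
import {readGgufFileInfo} from "node-llama-cpp";

// Parse the GGUF file header without loading the model into memory.
// "model.gguf" is a placeholder path.
const ggufInfo = await readGgufFileInfo("model.gguf");

console.log(ggufInfo.metadata);   // key-value metadata from the file header
console.log(ggufInfo.tensorInfo); // per-tensor names, shapes and types (assumed property name)
```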
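
With the new VRAM-based defaults, loading a model and creating a context needs no manual gpuLayers or contextSize tuning. A minimal sketch, where the model path and prompt are placeholders:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();

// No gpuLayers option: the number of layers to offload is derived from
// an estimate of the model's VRAM usage and the currently free VRAM.
const model = await llama.loadModel({modelPath: "model.gguf"});

// No contextSize option: a default is chosen to fit the remaining VRAM.
const context = await model.createContext();

const session = new LlamaChatSession({contextSequence: context.getSequence()});
console.log(await session.prompt("Hi there"));
```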
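
And a sketch of the JinjaTemplateChatWrapper fallback described above, assuming dotted GGUF metadata keys (like tokenizer.chat_template) are exposed as nested objects; that lookup path is an assumption:

```typescript
import {readGgufFileInfo, JinjaTemplateChatWrapper} from "node-llama-cpp";

const ggufInfo = await readGgufFileInfo("model.gguf"); // placeholder path

// Assumed location of the `tokenizer.chat_template` header in the parsed metadata
const template = ggufInfo.metadata?.tokenizer?.chat_template;

if (template != null) {
    // Wrap the model's own Jinja chat template in a chat wrapper,
    // for when no specialized chat wrapper matches the template
    const chatWrapper = new JinjaTemplateChatWrapper({template});
    console.log("Using a template-based chat wrapper:", chatWrapper);
}
```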

Fixes #133

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd giladgd requested a review from ido-pluto March 18, 2024 19:40
@giladgd giladgd self-assigned this Mar 18, 2024
@giladgd giladgd marked this pull request as draft March 20, 2024 00:54
@giladgd giladgd changed the title from "test: organize gguf tests" to "feat: read tensor info from gguf files" on Mar 20, 2024
@giladgd giladgd changed the title from "feat: read tensor info from gguf files" to "feat: automatically adapt to current free VRAM state" on Apr 2, 2024
@giladgd giladgd marked this pull request as ready for review April 2, 2024 23:22

@ido-pluto ido-pluto left a comment

LGTM

@giladgd giladgd merged commit 35e6f50 into beta Apr 4, 2024
10 checks passed
@giladgd giladgd deleted the gilad/bugFixes2 branch April 4, 2024 19:25

github-actions bot commented Apr 4, 2024

🎉 This PR is included in version 3.0.0-beta.15 🎉

The release is available on:

  • npm package (@beta dist-tag)
  • GitHub release

Your semantic-release bot 📦🚀

@giladgd giladgd mentioned this pull request Apr 4, 2024

github-actions bot commented Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

  • npm package (@latest dist-tag)
  • GitHub release

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

  • feat: max GPU layers param (#133)