feat: automatically adapt to current free VRAM state #182

Merged
giladgd merged 52 commits into beta from gilad/bugFixes2 on Apr 4, 2024

Conversation

@giladgd giladgd commented Mar 18, 2024

Description of change

  • feat: read tensor info from gguf files
  • feat: inspect gguf command
  • feat: inspect measure command
  • feat: readGgufFileInfo function (sketched below)
  • feat: GGUF file info on LlamaModel
  • feat: estimate the VRAM usage of the model and context with the given options to adapt to the current VRAM state, and set good defaults for gpuLayers and contextSize; manual configuration of those options is no longer needed to maximize performance (sketched below)
  • feat: JinjaTemplateChatWrapper
  • feat: use the tokenizer.chat_template header from the gguf file when available; use it to resolve a better-fitting specialized chat wrapper, or fall back to JinjaTemplateChatWrapper with it (sketched below)
  • feat: improve resolveChatWrapper
  • feat: simplify generation CLI commands: chat, complete, infill
  • feat: read GPU device names
  • feat: get token type
  • refactor: gguf
  • test: separate gguf tests into model-dependent and model-independent tests
  • test: switch to new vitest test signature
  • fix: use the new llama.cpp CUDA flag
  • fix: improve chat wrappers tokenization
  • fix: bugs
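
A minimal sketch of the new readGgufFileInfo function: it parses the GGUF file header without loading the model into memory. The file path is a placeholder, and the exact property names on the returned object are assumptions based on the feature list above.

```typescript
import {readGgufFileInfo} from "node-llama-cpp";

// Parse the GGUF file header without loading the model into memory.
// "model.gguf" is a placeholder path.
const ggufInfo = await readGgufFileInfo("model.gguf");

console.log(ggufInfo.metadata);   // key-value metadata from the file header
console.log(ggufInfo.tensorInfo); // per-tensor names, shapes and types (assumed property name)
```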
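
With the new VRAM-based defaults, loading a model and creating a context needs no manual gpuLayers or contextSize tuning. A minimal sketch, where the model path and prompt are placeholders:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();

// No gpuLayers option: the number of layers to offload is derived from
// an estimate of the model's VRAM usage and the currently free VRAM.
const model = await llama.loadModel({modelPath: "model.gguf"});

// No contextSize option: a default is chosen to fit the remaining VRAM.
const context = await model.createContext();

const session = new LlamaChatSession({contextSequence: context.getSequence()});
console.log(await session.prompt("Hi there"));
```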
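
And a sketch of the JinjaTemplateChatWrapper fallback described above, assuming dotted GGUF metadata keys (like tokenizer.chat_template) are exposed as nested objects; that lookup path is an assumption:

```typescript
import {readGgufFileInfo, JinjaTemplateChatWrapper} from "node-llama-cpp";

const ggufInfo = await readGgufFileInfo("model.gguf"); // placeholder path

// Assumed location of the `tokenizer.chat_template` header in the parsed metadata
const template = ggufInfo.metadata?.tokenizer?.chat_template;

if (template != null) {
    // Wrap the model's own Jinja chat template in a chat wrapper,
    // for when no specialized chat wrapper matches the template
    const chatWrapper = new JinjaTemplateChatWrapper({template});
    console.log("Using a template-based chat wrapper:", chatWrapper);
}
```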

Fixes #133

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run format to apply eslint formatting
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)

@giladgd giladgd requested a review from ido-pluto March 18, 2024 19:40
@giladgd giladgd self-assigned this Mar 18, 2024
@giladgd giladgd marked this pull request as draft March 20, 2024 00:54
@giladgd giladgd changed the title from "test: organize gguf tests" to "feat: read tensor info from gguf files" on Mar 20, 2024
@giladgd giladgd changed the title from "feat: read tensor info from gguf files" to "feat: automatically adapt to current free VRAM state" on Apr 2, 2024
@giladgd giladgd marked this pull request as ready for review April 2, 2024 23:22

@ido-pluto ido-pluto left a comment

LGTM

@giladgd giladgd merged commit 35e6f50 into beta Apr 4, 2024
10 checks passed
@giladgd giladgd deleted the gilad/bugFixes2 branch April 4, 2024 19:25

github-actions bot commented Apr 4, 2024

🎉 This PR is included in version 3.0.0-beta.15 🎉

The release is available on:

  • npm package (@beta dist-tag)
  • GitHub release

Your semantic-release bot 📦🚀

@giladgd giladgd mentioned this pull request Apr 4, 2024

github-actions bot commented Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

  • npm package (@latest dist-tag)
  • GitHub release

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

  • feat: max GPU layers param (#133)