Local HTML cache #899

Open
clemlesne opened this issue Jan 17, 2025 · 3 comments

clemlesne commented Jan 17, 2025

Is your feature request related to a problem? Please describe.

Implement local HTML caching to avoid re-processing every scrape on each run. Scraping is notably slow when recursive depth scraping is used.

Describe the solution you'd like

Add a local cache near:

def handle_web_source(self, state, source):
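
A minimal sketch of what such a cache could look like, assuming a simple on-disk store keyed by a hash of the URL. The names `LocalHtmlCache`, `get_or_fetch`, and the `fetch` callable are hypothetical and not part of Scrapegraph-ai; how this would be wired into `handle_web_source` is left open.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class LocalHtmlCache:
    """Hypothetical on-disk HTML cache keyed by the SHA-256 of the URL."""

    def __init__(self, cache_dir: str, ttl_seconds: Optional[float] = None):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # None means cached entries never expire.
        self.ttl_seconds = ttl_seconds

    def _path_for(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url: str) -> Optional[str]:
        """Return the cached HTML, or None if it is missing or expired."""
        path = self._path_for(url)
        if not path.exists():
            return None
        if self.ttl_seconds is not None:
            age = time.time() - path.stat().st_mtime
            if age > self.ttl_seconds:
                return None
        return path.read_text(encoding="utf-8")

    def set(self, url: str, html: str) -> None:
        self._path_for(url).write_text(html, encoding="utf-8")

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return cached HTML if present, otherwise call fetch and store it."""
        cached = self.get(url)
        if cached is not None:
            return cached
        html = fetch(url)
        self.set(url, html)
        return html
```

With something like this, a recursive scrape would only call the real fetcher once per URL; repeated runs would read the stored HTML from disk until the optional TTL expires.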

Describe alternatives you've considered

  • Implementing it myself: not feasible, since this logic is bundled into many graph implementations.

Related

  • Cache path is declared but never used in the code #898

dosubot bot commented Apr 20, 2025

Hi, @clemlesne. I'm Dosu, and I'm helping the Scrapegraph-ai team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • Proposal to add local HTML caching for performance optimization during recursive depth scrapings.
  • Suggested implementation location in the code, noting complexity due to integration with various graph implementations.
  • Linked to related issue Cache path is declared but never used in the code #898, indicating broader context or dependency.
  • No comments or activity since the issue was created.

Next Steps:

  • Please let me know if this issue is still relevant to the latest version of the Scrapegraph-ai repository by commenting here.
  • If there is no further activity, the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot bot added the "stale" label Apr 20, 2025
clemlesne (Author) commented

Still relevant

dosubot bot removed the "stale" label Apr 24, 2025

dosubot bot commented Apr 24, 2025

@PeriniM, the user has indicated that the proposal for adding local HTML caching for performance optimization is still relevant. Could you please assist them with this issue?
