
Using AthenaDatasetDefinition in Sagemaker processing job as input results in error with missing "sagemaker_processing" database. #5176

Open
leo4ever opened this issue May 13, 2025 · 1 comment
Labels
component: processing Relates to the SageMaker Processing Platform type: bug

Comments

@leo4ever

Describe the bug
I am trying to set up a SageMaker processing job whose input is defined using AthenaDatasetDefinition. When the job runs, it fails with the message below. The job appears to be trying to create a new database named sagemaker_processing. I tried pointing the job at an existing database through the dataset definition parameters and also set the output S3 URI parameter, but neither helped.

{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}

To reproduce

  1. Define a SageMaker processing job that uses AthenaDatasetDefinition as the ProcessingInput.
  2. Execute the job.
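The reproduction steps above can be sketched at the API level. The report used the SageMaker Python SDK, but the snippet below builds the equivalent `ProcessingInputs` entry of a `CreateProcessingJob` request as a plain dictionary, so it runs without AWS credentials. The helper name `athena_processing_input`, its defaults, and every value in the usage lines (query, database, bucket, local path) are hypothetical placeholders, not taken from the report.

```python
from typing import Any, Dict


def athena_processing_input(
    query: str,
    database: str,
    output_s3_uri: str,
    catalog: str = "AwsDataCatalog",
    work_group: str = "primary",
    input_name: str = "input-1",
    local_path: str = "/opt/ml/processing/input/athena",
) -> Dict[str, Any]:
    """Build one ProcessingInputs entry that reads its data from Athena.

    Hypothetical helper: it only assembles the request structure and does
    not call AWS.
    """
    return {
        "InputName": input_name,
        "DatasetDefinition": {
            # Where the query results are made available inside the container.
            "LocalPath": local_path,
            "AthenaDatasetDefinition": {
                "Catalog": catalog,
                # The existing database the query should run against.
                "Database": database,
                "QueryString": query,
                "WorkGroup": work_group,
                # Where Athena writes the query results.
                "OutputS3Uri": output_s3_uri,
                "OutputFormat": "PARQUET",
            },
        },
    }


# Placeholder values for illustration only.
example_input = athena_processing_input(
    query="SELECT * FROM my_table LIMIT 10",
    database="my_existing_database",
    output_s3_uri="s3://my-bucket/athena-results/",
)
```

According to the report, setting the database and the output S3 URI in this way did not stop the job from attempting to create `sagemaker_processing`.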

Expected behavior

  1. Job executes without trying to create a new database.

Screenshots or logs
{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Athena dataset definition specified. Starting athena query execution."}
{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Creating database 'sagemaker_processing' in catalog 'awsdatacatalog' if doesn't exist already."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}
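For triage, the denied action and resource can be pulled out of log output like the above programmatically. This is a minimal sketch that assumes the logs arrive as JSON objects separated by whitespace, each with `level` and `msg` fields as shown in this report. The function names and the regular expression are illustrative and only cover the message wording seen here.

```python
import json
import re
from typing import Dict, Iterator, Optional

# Matches the IAM access-denied wording that appears in the log message.
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def iter_log_records(blob: str) -> Iterator[dict]:
    """Yield each JSON object from a string of whitespace-separated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(blob):
        # Skip any whitespace between consecutive objects.
        while pos < len(blob) and blob[pos].isspace():
            pos += 1
        if pos >= len(blob):
            break
        record, pos = decoder.raw_decode(blob, pos)
        yield record


def find_denied_action(blob: str) -> Optional[Dict[str, str]]:
    """Return principal, action, and resource from the first access-denied error."""
    for record in iter_log_records(blob):
        if record.get("level") != "ERROR":
            continue
        match = DENIED.search(record.get("msg", ""))
        if match:
            return match.groupdict()
    return None
```

Applied to the log lines in this report, the match identifies `glue:CreateDatabase` as the denied action and the account's Glue catalog ARN as the resource.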

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.227.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): ScriptProcessor
  • Framework version:
  • Python version: 3.11.11
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context

@pintaoz-aws pintaoz-aws added type: bug component: processing Relates to the SageMaker Processing Platform labels May 14, 2025
@shah-rukk

It seems your role "arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker" does not have the glue:CreateDatabase permission.

  1. Try granting glue:CreateDatabase, or glue:* temporarily, to your role.
  2. Test it with an admin role.
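As a narrower alternative to glue:*, the first suggestion could look like the identity-based policy sketched below. The region, account ID, and database name come from the error message in this issue. The helper name `create_database_policy`, the `Sid`, and the choice to list the database ARN alongside the catalog ARN are assumptions: the error only names the catalog, so adjust the resources to whatever the next access-denied message reports. Whether the job needs further Glue or Athena permissions after this one is not established in this thread.

```python
import json
from typing import Any, Dict


def create_database_policy(region: str, account_id: str, database: str) -> Dict[str, Any]:
    """Build an IAM policy document allowing glue:CreateDatabase for one database.

    Hypothetical helper: it only produces the JSON document and does not
    attach it to any role.
    """
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateProcessingDatabase",
                "Effect": "Allow",
                "Action": "glue:CreateDatabase",
                # Catalog ARN is the resource named in the error; the database
                # ARN is an assumption added to keep the grant scoped.
                "Resource": [f"{base}:catalog", f"{base}:database/{database}"],
            }
        ],
    }


policy = create_database_policy("us-west-2", "726167300549", "sagemaker_processing")
print(json.dumps(policy, indent=2))
```

The printed document would then be attached to the execution role named in the error (99999-sagemaker-devmanaged-role) by whoever manages that role.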
