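These are my notes on what I considered and investigated in order to detect the completion of a Glue Job and automatically start another Glue Job.
What I considered before implementing
Initial idea
The requirement was that whenever any one of several Glue Jobs finished running, the completion would be detected and a specific Glue Job would be started. I assumed the simplest approach would be to detect the completion with EventBridge and start the Glue Job from there.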
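To make the detection idea concrete, here is a minimal Python sketch of how an EventBridge-style pattern could pick out "one of several jobs succeeded". The `matches` function is a simplified stand-in written for illustration, not the real EventBridge matching engine, and the second job name (`glue-job-test02`) and the `jobRunId` value are made up for the example.

```python
# Simplified stand-in for EventBridge pattern matching: every key in the
# pattern must exist in the event, and each leaf value must appear in the
# pattern's list of allowed values.
def matches(pattern: dict, event: dict) -> bool:
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Pattern: any of several upstream jobs finishing successfully.
# "glue-job-test02" is a hypothetical second job.
PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {
        "jobName": ["glue-job-test01", "glue-job-test02"],
        "state": ["SUCCEEDED"],
    },
}

# Sample event trimmed to the fields used here; the jobRunId is made up.
SAMPLE_EVENT = {
    "source": "aws.glue",
    "detail-type": "Glue Job State Change",
    "detail": {
        "jobName": "glue-job-test01",
        "state": "SUCCEEDED",
        "jobRunId": "jr_0000",
    },
}

if __name__ == "__main__":
    print(matches(PATTERN, SAMPLE_EVENT))
```

Fields present in the event but absent from the pattern (such as `jobRunId`) are ignored, which is why a single pattern can cover several jobs.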
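The idea using a Glue Workflow
However, it appears that EventBridge cannot currently start a Glue Job directly; to run the Glue Job I had to create a Glue Workflow and a Glue Trigger. In common-glue-job, shown in the figure below, I wanted part of the processing to branch depending on which job had fired the event, so I wanted to pass the job name to the Glue Workflow as a parameter. No matter how much I searched, though, I could not find a way to do this.
Final processing flow
Since it seems no parameters can be set when EventBridge kicks off a Glue Workflow, I ended up having to create one EventBridge rule, one Workflow, and one Trigger for every branch I wanted.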
Implementation with CloudFormation
Below is the EventBridge and Glue Workflow implementation for a single job.
GlueJobTest01:
  Type: AWS::Glue::Job
  Properties:
    Name: "glue-job-test01"
    Role: !Ref GlueServiceRole
    GlueVersion: "3.0"
    Command:
      Name: "glueetl"
      PythonVersion: "3"
      # !Sub is needed for ${AWS::AccountId} to be resolved
      ScriptLocation: !Sub "s3://aws-glue-scripts-${AWS::AccountId}-ap-northeast-1/admin/emptyjob.py"
    DefaultArguments:
      --TempDir: !Sub "s3://aws-glue-temporary-${AWS::AccountId}-ap-northeast-1/"
      --enable-continuous-cloudwatch-log: "true"
      --enable-job-insights: "true"
      --enable-metrics: "true"
      --job-bookmark-option: "job-bookmark-disable"
      --job-language: "python"
    ExecutionProperty:
      MaxConcurrentRuns: 1
    WorkerType: "G.2X"
    NumberOfWorkers: 10
    MaxRetries: 0
    Timeout: 2880
CommonGlueJob:
  Type: AWS::Glue::Job
  Properties:
    Name: "common-glue-job"
    Role: !Ref GlueServiceRole
    Command:
      Name: "pythonshell"
      PythonVersion: "3"
      # !Sub is needed for ${AWS::AccountId} to be resolved
      ScriptLocation: !Sub "s3://aws-glue-scripts-${AWS::AccountId}-ap-northeast-1/admin/emptyjob2.py"
    DefaultArguments:
      --TempDir: !Sub "s3://aws-glue-temporary-${AWS::AccountId}-ap-northeast-1/"
      --enable-continuous-cloudwatch-log: "true"
      --enable-job-insights: "true"
      --enable-metrics: "true"
      --job-bookmark-option: "job-bookmark-disable"
      # Default value; each trigger overrides it through its Arguments
      --ORIGINAL_JOB: ""
    ExecutionProperty:
      MaxConcurrentRuns: 10
    MaxRetries: 0
    Timeout: 2880
EventRuleTest01:  # logical ID assumed; it is not shown in the original
  DependsOn:
    - EventBridgeGlueExecutionRole
    - GlueWorkflowTest01
  Type: AWS::Events::Rule
  Properties:
    Name: EventRuleTest01
    EventPattern:
      source:
        - aws.glue
      detail-type:
        - Glue Job State Change
      detail:
        jobName:
          - !Ref GlueJobTest01
        state:
          - SUCCEEDED
    Targets:
      - Arn: !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:workflow/${GlueWorkflowTest01}
        Id: GlueJobTriggersWorkflow
        RoleArn: !GetAtt 'EventBridgeGlueExecutionRole.Arn'
GlueWorkflowTest01:
  Type: AWS::Glue::Workflow
  Properties:
    Name: wf-test01
    Description: Glue workflow triggered by Glue Job Event
GlueTriggerTest01:  # logical ID assumed; it is not shown in the original
  Type: AWS::Glue::Trigger
  DependsOn:
    - GlueJobTest01
  Properties:
    Name: test_01_job_trigger
    Description: Glue trigger which is listening on another Glue Job
    Type: EVENT
    Actions:
      - JobName: !Ref CommonGlueJob
        Arguments:
          --ORIGINAL_JOB: testjob
    WorkflowName: !Ref GlueWorkflowTest01
Setting Arguments under Actions lets the trigger pass parameters to the job.
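On the receiving side, the common job has to read that argument and branch on it. Glue hands job arguments to the script on the command line as `--KEY value` pairs; inside a Glue job, `getResolvedOptions` from `awsglue.utils` is the usual way to read them. The sketch below uses only the standard library so it can also run outside Glue. The function names and the branching in `handle` are hypothetical and only illustrate the idea.

```python
import argparse
import sys


def read_original_job(argv):
    """Return the value of --ORIGINAL_JOB, or "" when it is not supplied."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ORIGINAL_JOB", default="")
    # Glue also passes its own arguments (for example --TempDir),
    # so unknown arguments are ignored rather than treated as errors.
    args, _unknown = parser.parse_known_args(argv)
    return args.ORIGINAL_JOB


def handle(original_job):
    """Hypothetical branching on the job that fired the event."""
    if original_job == "testjob":
        return "run the test01-specific step"
    return "run only the shared steps"


if __name__ == "__main__":
    print(handle(read_original_job(sys.argv[1:])))
```

With the trigger shown above, the script would receive `--ORIGINAL_JOB testjob`; a trigger for another upstream job would pass a different value to the same script.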
EventBridgeGlueExecutionRole:
  Type: AWS::IAM::Role
  DependsOn:
    - GlueWorkflowTest01
  Properties:
    RoleName: !Sub EventBridgeGlueExecutionRole-${AWS::StackName}
    Description: Has permissions to invoke the NotifyEvent API for an AWS Glue workflow.
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - events.amazonaws.com
          Action:
            - sts:AssumeRole
    Path: /
    Policies:
      - PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - glue:NotifyEvent
              Resource:
                - !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:workflow/${GlueWorkflowTest01}
        PolicyName: EventBridgeGlueExecutionRole-policy
GlueServiceRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: GlueServiceRole-workflowtest
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: glue.amazonaws.com
          Action: sts:AssumeRole
    Path: /
    Policies:
      - PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - s3:PutObject
                - s3:GetObject
                - s3:GetBucketLocation
                - s3:ListBucket  # listing objects in a bucket is governed by this action
                - s3:ListAllMyBuckets
                - s3:GetBucketAcl
                - glue:*
              Resource: '*'
            - Effect: Allow
              Action:
                - logs:CreateLogStream
                - logs:PutLogEvents
                - logs:CreateLogGroup
              Resource:
                - arn:aws:logs:*:*:/aws-glue/*
        PolicyName: GlueServiceRole-workflowtest
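Closing thoughts
All I wanted was to run one more job at the end, yet it felt like a great deal of work.
Step Functions apparently can run Glue Jobs directly, so I am considering rebuilding this with it.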
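As a rough sketch of that Step Functions direction (not something I have built yet), the Python below assembles a state machine definition that starts the common job through the Glue service integration. It assumes the state machine is started by an EventBridge rule, so the state input is the Glue Job State Change event itself and the upstream job name can be read from `$.detail.jobName`. The state name `RunCommonJob` and the function `build_definition` are made up for the example.

```python
import json


def build_definition(common_job_name):
    """Build an Amazon States Language definition as a plain dict."""
    return {
        "Comment": "Run the common Glue job after an upstream job succeeds",
        "StartAt": "RunCommonJob",
        "States": {
            "RunCommonJob": {
                "Type": "Task",
                # Glue service integration; ".sync" waits for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": common_job_name,
                    "Arguments": {
                        # ".$" marks the value as a JSONPath into the state input,
                        # assumed here to be the EventBridge event.
                        "--ORIGINAL_JOB.$": "$.detail.jobName",
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition("common-glue-job"), indent=2))
```

If this works as expected, a single rule and a single state machine could replace the per-job Workflow and Trigger pairs, because the job name travels with the event instead of being fixed in each trigger.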