This module creates the entire AWS infrastructure required for Tamr to work with AWS EMR. Currently, this module supports 3 patterns of use:
- Creation of infrastructure for static HBase cluster
- Creation of infrastructure for static Spark cluster
- Creation of infrastructure for ephemeral Spark cluster (the cluster itself is not created)
Fully working examples are provided for each pattern of use. These examples might require additional resources to run.
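As a rough illustration of the static HBase pattern, a minimal sketch of a module call might look like the following. It sets only the inputs documented as required, plus `create_static_cluster` for clarity. The `source` path, the module label, every ID, name and path, and the application name `"HBase"` are assumptions made for illustration, not values taken from this repository.

```hcl
# Minimal sketch of the static HBase pattern. All values are placeholders.
module "emr_hbase" {
  # Hypothetical local path; point this at wherever the module actually lives.
  source = "./modules/emr"

  # true is the documented default; set explicitly to show the pattern choice.
  create_static_cluster = true
  applications          = ["HBase"] # assumed EMR application name

  # Networking (placeholder IDs)
  vpc_id                    = "vpc-0123456789abcdef0"
  subnet_id                 = "subnet-0123456789abcdef0"
  emr_managed_master_sg_ids = ["sg-0123456789abcdef0"]
  emr_managed_core_sg_ids   = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids = ["sg-0123456789abcdef2"]

  # Access (placeholder key pair name)
  key_pair_name = "example-key-pair"

  # Storage and configuration (placeholder bucket names and file path)
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
}
```

The same block with `applications` changed to a Spark application would presumably cover the static Spark pattern; the provided examples remain the authoritative reference.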
This module creates:
- 5 Security Groups
  - One security group for EMR Managed Master instance(s)
  - One security group for EMR Managed Core instance(s)
  - One security group for additional ports for Master instance(s)
  - One security group for additional ports for Core instance(s)
  - One service access security group that can be attached to any instance
- Security group rules. The number of the security group rules varies based on the number of CIDRs or source SGs provided.
- 2 IAM Policies:
  - Minimum required EMR service policy
  - Minimum required EMR EC2 policy
- 2 IAM roles:
  - Tamr EMR service IAM role
  - Tamr EMR EC2 IAM role
- 1 IAM instance profile for EMR EC2 instances
- 1 bucket object with the cluster's JSON configuration in the root directory S3 bucket
If you are creating a static HBase or Spark cluster, this module also creates:
- 1 EMR Cluster and associated EMR Security Configuration
Note: For creating the logs and root directory buckets and/or S3-related permissions, use the terraform-aws-s3 module.
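For the static patterns, the optional inputs documented in this README control how the cluster's instance fleets are sized. The sketch below shows a static Spark cluster whose core fleet mixes on-demand and Spot capacity. It assumes the buckets already exist, uses a hypothetical `source` path and placeholder IDs, and assumes `"Spark"` is the application name. How the module combines the Spot-related inputs internally is not described here, so treat the chosen values as illustrative only.

```hcl
# Sketch of a static Spark cluster with a mixed core fleet. All values are placeholders.
module "emr_spark" {
  # Hypothetical local path; point this at wherever the module actually lives.
  source = "./modules/emr"

  applications  = ["Spark"] # assumed EMR application name
  cluster_name  = "example-spark-cluster"
  release_label = "emr-5.29.0" # the documented default, shown for clarity

  # Core fleet sizing: one on-demand instance plus two Spot instances.
  core_instance_type            = "m4.xlarge"
  core_instance_on_demand_count = 1
  core_instance_spot_count      = 2

  # Spot behaviour: bid 80% of the on-demand price and fall back to
  # on-demand capacity if Spot provisioning takes longer than 10 minutes.
  core_bid_price_as_percentage_of_on_demand_price = 80
  core_timeout_action                             = "SWITCH_TO_ON_DEMAND"
  core_timeout_duration_minutes                   = 10

  # Tags applied to all resources (replaces the deprecated additional_tags).
  tags = {
    Environment = "example"
  }

  # Required inputs (placeholder values)
  vpc_id                         = "vpc-0123456789abcdef0"
  subnet_id                      = "subnet-0123456789abcdef0"
  emr_managed_master_sg_ids      = ["sg-0123456789abcdef0"]
  emr_managed_core_sg_ids        = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids      = ["sg-0123456789abcdef2"]
  key_pair_name                  = "example-key-pair"
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
}
```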
Name | Version |
---|---|
terraform | >= 0.13 |
aws | >= 3.36.0, < 4.0.0 |
No provider.
Name | Description | Type | Default | Required |
---|---|---|---|---|
applications | List of applications to run on EMR | list(string) | n/a | yes |
bucket_name_for_logs | S3 bucket name for cluster logs. | string | n/a | yes |
bucket_name_for_root_directory | S3 bucket name for storing root directory | string | n/a | yes |
emr_config_file_path | Path to the EMR JSON configuration file. Please include the file name as well. | string | n/a | yes |
emr_managed_core_sg_ids | List of EMR managed core security group ids | list(string) | n/a | yes |
emr_managed_master_sg_ids | List of EMR managed master security group ids | list(string) | n/a | yes |
emr_service_access_sg_ids | List of EMR service access security group ids | list(string) | n/a | yes |
key_pair_name | Name of the Key Pair that will be attached to the EC2 instances | string | n/a | yes |
subnet_id | ID of the subnet where the EMR cluster will be created | string | n/a | yes |
vpc_id | VPC ID of the network | string | n/a | yes |
abac_valid_tags | Valid tags for maintaining resources when using ABAC IAM Policies with Tag Conditions. Make sure tags contain a key value specified here. | map(list(string)) | {} | no |
additional_policy_arns | List of policy ARNs to attach to EMR EC2 instance profile. | list(string) | [] | no |
additional_tags | [DEPRECATED: Use tags instead] Additional tags to be attached to the resources created. | map(string) | {} | no |
arn_partition | The partition in which the resource is located. A partition is a group of AWS Regions. Each AWS account is scoped to one partition. The supported partitions are: aws (AWS Regions), aws-cn (China Regions), aws-us-gov (AWS GovCloud (US) Regions). | string | "aws" | no |
bootstrap_actions | Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. | list(object({ | [] | no |
bucket_path_to_logs | Path in logs bucket to store cluster logs e.g. mycluster/logs | string | "" | no |
cluster_name | Name for the EMR cluster to be created | string | "TAMR-EMR-Cluster" | no |
core_bid_price | Bid price for each EC2 instance in the core instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | string | "" | no |
core_bid_price_as_percentage_of_on_demand_price | Bid price as percentage of on-demand price for core instances | number | 100 | no |
core_block_duration_minutes | Duration for core spot instances, in minutes | number | 0 | no |
core_ebs_size | The volume size, in gibibytes (GiB). | string | "500" | no |
core_ebs_type | Type of volumes to attach to the core nodes. Valid options are gp2, io1, standard and st1 | string | "gp2" | no |
core_ebs_volumes_count | Number of volumes to attach to the core nodes | number | 1 | no |
core_instance_fleet_name | Name for the core instance fleet | string | "CoreInstanceFleet" | no |
core_instance_on_demand_count | Number of on-demand instances for the core instance fleet | number | 1 | no |
core_instance_spot_count | Number of spot instances for the core instance fleet | number | 0 | no |
core_instance_type | The EC2 instance type of the core nodes | string | "m4.xlarge" | no |
core_timeout_action | Timeout action for core instances | string | "SWITCH_TO_ON_DEMAND" | no |
core_timeout_duration_minutes | Spot provisioning timeout for core instances, in minutes | number | 10 | no |
create_static_cluster | True if the module should create a static cluster. False if the module should create supporting infrastructure but not the cluster itself. | bool | true | no |
custom_ami_id | The ID of a custom Amazon EBS-backed Linux AMI | string | null | no |
emr_ec2_iam_policy_name | Name for the IAM policy attached to the EMR EC2 role | string | "tamr-emr-ec2-policy" | no |
emr_ec2_instance_profile_name | Name of the new instance profile for EMR EC2 instances | string | "tamr_emr_ec2_instance_profile" | no |
emr_ec2_role_name | Name of the new IAM role for EMR EC2 instances | string | "tamr_emr_ec2_role" | no |
emr_managed_core_sg_name | Name for the EMR managed core security group | string | "TAMR-EMR-Core" | no |
emr_managed_master_sg_name | Name for the EMR managed master security group | string | "TAMR-EMR-Master" | no |
emr_managed_sg_name | Name for the EMR managed security group | string | "TAMR-EMR-Internal" | no |
emr_service_access_sg_name | Name for the EMR Service Access security group | string | "TAMR-EMR-Service-Access" | no |
emr_service_iam_policy_name | Name for the IAM policy attached to the EMR Service role | string | "tamr-emr-service-policy" | no |
emr_service_role_name | Name of the new IAM service role for the EMR cluster | string | "tamr_emr_service_role" | no |
enable_http_port | EMR services like Ganglia run on the http port | bool | false | no |
hadoop_config_path | Path in root directory bucket to upload Hadoop config to | string | "config/hadoop/conf/" | no |
hbase_config_path | Path in root directory bucket to upload HBase config to | string | "config/hbase/conf.dist/" | no |
json_configuration_bucket_key | Key (i.e. path) of JSON configuration bucket object in the root directory bucket | string | "config.json" | no |
master_bid_price | Bid price for each EC2 instance in the master instance group, expressed in USD. Setting this attribute declares the instance group as a Spot Instance group and implicitly creates a Spot request. Leave this blank to use On-Demand Instances. | string | "" | no |
master_bid_price_as_percentage_of_on_demand_price | Bid price as percentage of on-demand price for master instances | number | 100 | no |
master_block_duration_minutes | Duration for master spot instances, in minutes | number | 0 | no |
master_ebs_size | The volume size, in gibibytes (GiB). | string | "100" | no |
master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1 | string | "gp2" | no |
master_ebs_volumes_count | Number of volumes to attach to the master nodes | number | 1 | no |
master_instance_fleet_name | Name for the master instance fleet | string | "MasterInstanceFleet" | no |
master_instance_on_demand_count | Number of on-demand instances for the master instance fleet | number | 1 | no |
master_instance_spot_count | Number of spot instances for the master instance fleet | number | 0 | no |
master_instance_type | The EC2 instance type of the master nodes | string | "m4.xlarge" | no |
master_timeout_action | Timeout action for master instances | string | "SWITCH_TO_ON_DEMAND" | no |
master_timeout_duration_minutes | Spot provisioning timeout for master instances, in minutes | number | 10 | no |
permissions_boundary | ARN of the policy that will be used to set the permissions boundary for all IAM Roles created by this module | string | null | no |
release_label | The release label for the Amazon EMR release. | string | "emr-5.29.0" | no |
require_abac_for_subnet | If abac_valid_tags is specified, choose whether or not to require ABAC also for actions related to the subnet | bool | true | no |
s3_policy_arns | [DEPRECATED] List of policy ARNs to attach to EMR EC2 instance profile. Use 'additional_policy_arns' instead. | list(string) | [] | no |
security_configuration | The name of an EMR Security Configuration | string | null | no |
tags | A map of tags to add to all resources. Replaces additional_tags. | map(string) | {} | no |
utility_script_bucket_key | Key (i.e. path) to upload the utility script to | string | "util/upload_hbase_config.sh" | no |
Name | Description |
---|---|
core_ebs_size | The core EBS volume size, in gibibytes (GiB). |
core_ebs_type | Type of volumes to attach to the core nodes. Valid options are gp2, io1, standard and st1 |
core_ebs_volumes_count | Number of volumes to attach to the core nodes |
core_fleet_instance_count | Number of on-demand and spot core instances configured |
core_instance_type | The EC2 instance type of the core nodes |
emr_ec2_instance_profile_arn | ARN of the EMR EC2 instance profile created |
emr_ec2_instance_profile_name | Name of the EMR EC2 instance profile created |
emr_ec2_role_arn | ARN of the EMR EC2 role created for EC2 instances |
emr_managed_core_sg_ids | List of security group ids of the EMR Core Security Group |
emr_managed_master_sg_ids | List of security group ids of the EMR Master Security Group |
emr_managed_sg_id | Security group id of the EMR Managed Security Group for internal communication |
emr_service_access_sg_ids | List of security group ids of the EMR Service Access Security Group |
emr_service_role_arn | ARN of the EMR service role created |
emr_service_role_name | Name of the EMR service role created |
hbase_config_path | Path in the root directory bucket that HBase config was uploaded to. |
json_config_s3_key | The name of the json configuration object in the bucket. |
log_uri | The path to the S3 location where logs for this cluster are stored. |
master_ebs_size | The master EBS volume size, in gibibytes (GiB). |
master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1 |
master_ebs_volumes_count | Number of volumes to attach to the master nodes |
master_fleet_instance_count | Number of on-demand and spot master instances configured |
master_instance_type | The EC2 instance type of the master nodes |
release_label | The release label for the Amazon EMR release. |
subnet_id | ID of the subnet where EMR cluster was created |
tamr_emr_cluster_id | Identifier for the AWS EMR cluster created. Empty string if only the infrastructure for an ephemeral cluster was set up. |
tamr_emr_cluster_name | Name of the AWS EMR cluster created |
upload_config_script_s3_key | The name of the upload config script object in the bucket. |
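To illustrate how these outputs might be consumed in the ephemeral Spark pattern, the sketch below sets `create_static_cluster = false` and re-exports the role and instance profile that a later cluster launch would need. The module label, `source` path, application name, and all IDs and names are hypothetical placeholders. Only the output names come from the table above.

```hcl
# Sketch of the ephemeral pattern: supporting infrastructure only, no cluster.
module "emr_ephemeral" {
  # Hypothetical local path; point this at wherever the module actually lives.
  source = "./modules/emr"

  create_static_cluster = false
  applications          = ["Spark"] # assumed EMR application name

  # Required inputs (placeholder values)
  vpc_id                         = "vpc-0123456789abcdef0"
  subnet_id                      = "subnet-0123456789abcdef0"
  emr_managed_master_sg_ids      = ["sg-0123456789abcdef0"]
  emr_managed_core_sg_ids        = ["sg-0123456789abcdef1"]
  emr_service_access_sg_ids      = ["sg-0123456789abcdef2"]
  key_pair_name                  = "example-key-pair"
  bucket_name_for_logs           = "example-emr-logs"
  bucket_name_for_root_directory = "example-emr-root"
  emr_config_file_path           = "./emr-config.json"
}

# Re-export the values a later cluster launch would need.
output "emr_service_role_arn" {
  description = "ARN of the EMR service role created by the module"
  value       = module.emr_ephemeral.emr_service_role_arn
}

output "emr_ec2_instance_profile_arn" {
  description = "ARN of the instance profile for EMR EC2 instances"
  value       = module.emr_ephemeral.emr_ec2_instance_profile_arn
}

# Per the outputs table, this is an empty string when no cluster is created.
output "emr_cluster_id" {
  description = "EMR cluster ID, or an empty string in the ephemeral pattern"
  value       = module.emr_ephemeral.tamr_emr_cluster_id
}
```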
This repo is based on:
- Terraform standard module structure
- AWS EMR HBase
- AWS EMR Cluster Terraform Docs
- Default IAM roles for EMR
- Service role for EMR
- EC2 role for EMR (Instance Profile)
- Best Practices for EMR
- AWS EMR Security Groups
- AWS EMR Additional Security Groups
- AWS EMR Security Configuration
- AWS EMR Bootstrap Actions
Run `make terraform/docs` to generate the section of docs around terraform inputs, outputs and requirements.
Run `make lint` to run terraform fmt, in addition to a few other checks to detect whitespace issues.
NOTE: this requires having docker working on the machine running the test
- Update version contained in `VERSION`
- Document changes in `CHANGELOG.md`
- Create a tag in GitHub for the commit associated with the version
Apache 2 Licensed. See LICENSE for full details.