Open issues

Azkaban appears to hold HDFS socket connections open indefinitely/very long time
AZK-57
[security] users can DoS azkaban box trivially
AZK-44
Removing a scheduled job kills one existing job
AZK-137
Azkaban failed to run job about two hours after it started
AZK-130
Delete Jobs/Flows/Uploaded Zips in Azkaban
AZK-54
unrar not working in azkaban in a script.
AZK-138
Azkaban restart of failed workflow restarts existing running jobs
AZK-136
Azkaban crashes on submitting 2 jobs having same name
AZK-135
deadlock
AZK-134
Allow pig.additional.jars for Pig scripts
AZK-133
Separate Job Runner From Rest of Azkaban
AZK-132
Add Alerting for Jobs
AZK-131
Environment variable JOB_OUTPUT_PROP_FILE is not being set by azkaban when running a python job
AZK-128
Azkaban 0.10 breaks variable substitution in environment variables
AZK-127
Fix race condition in AzkabanProcess
AZK-126
JMX MBean and Agent to refresh workflows from disk
AZK-125
Azkaban Web UI should have more standard access logs.
AZK-124
Add jmx hooks for Schedule-Unschedule commands
AZK-123
Queue jobs
AZK-122
LoggingJob leaks log file descriptors
AZK-120
Job details page does not handle properties with long text well
AZK-118
NPE in Azkaban
AZK-117
Azkaban jobs sometimes cannot be cancelled
AZK-116
Resource locks should have a timeout
AZK-115
Azkaban job should fail if it attempts to acquire locks multiple times
AZK-114
Delete jobs feature would be useful
AZK-113
Jobs should be able to override job.failure.email and job.success.email
AZK-111
stream leaks
AZK-107
Authentication / Authorization support with LDAP and Kerberos
AZK-106
UI Updates to cover new functionality
AZK-105
Create RESTful API
AZK-104
Improve scheduler to support more flexible triggers
AZK-103
Persistence layer for saving definitions (workflow, task, data) as well as metrics related to workflow execution and data nodes
AZK-102
JSON for defining workflow nodes, task or workunit nodes and data nodes.
AZK-101
Replace workflow engine with an implementation based on a graph library
AZK-100
conditional job type
AZK-96
hadoop file viewer doesn't display LZO encoded data
AZK-95
More advanced time scheduling system (Cron like)
AZK-92
Job details page throws HTTP Status 500 message for invalid logs
AZK-91
Working.dir & Environment variables do not support the parameter propagation logic.
AZK-89
azkaban does not change classpath when running jobs across multiple packages
AZK-88
parameter substitution without brackets
AZK-87
extensible job types
AZK-86
Cancel Button doesn't cancel a job completely
AZK-85
Scheduler sometimes fires events twice.
AZK-84
Azkaban jsTreeView is too slow. Switch it out with different js package
AZK-82
Logs mixed up
AZK-77
Job triggers (when etl job finishes, run my job)
AZK-76
Show inherited properties (from props file) in job details page
AZK-74
Show jobs that DEPEND on the job in job details page
AZK-73

Azkaban appears to hold HDFS socket connections open indefinitely/very long time

Description

On one of our Azkaban servers, we are seeing over 1400 connections to DataNode RPC ports. Considering that we only have 44 nodes in our HDFS, this seems extremely high. Given a big enough grid, Azkaban will eventually exhaust all ephemeral ports on the server.
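One way to quantify the connection count described above is to tally established TCP connections per remote host for a single remote port. The following is a minimal sketch, not part of the original report: it assumes Linux `netstat -tan` style output, the function name `count_connections` is hypothetical, and the port 50020 in the usage note is only an assumed example value, since the report does not name the DataNode RPC port in use.

```python
from collections import Counter


def count_connections(netstat_output, remote_port, state="ESTABLISHED"):
    """Count connections per remote host for one remote port.

    Expects rows in the Linux `netstat -tan` layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    per_host = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, conn_state = fields[4], fields[5]
        # Split "host:port" on the last colon so the host part stays intact.
        host, _, port = foreign.rpartition(":")
        if port == str(remote_port) and conn_state == state:
            per_host[host] += 1
    return per_host
```

Feeding it the captured output of `netstat -tan` from the Azkaban server, for example `count_connections(output, 50020)`, gives a per-DataNode breakdown; `sum(result.values())` gives the total. For scale, 1400 connections across 44 nodes averages roughly 32 open connections per DataNode.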

Environment

None

Status

Assignee

Richard Park

Reporter

Allen Wittenauer

Labels

None

Priority

Blocker