Releases

Changes in 0.16.0

Check out the 0.16.0 milestone to see new features / bugfixes in detail.

Important

The next release of Singularity (0.17.0) will contain a bump to mesos 1.1.2. Any critical bug fixes will be backported to 0.16 for a short period. See #1571 for more details on the upcoming upgrade.

New Endpoints

1559 Added two additional endpoints to more easily track the lifecycle of a task. Both endpoints return a SingularityTaskState object with basic details about the current state of the task.

  • /track/task/{taskId} - for tracking active tasks which have already been assigned an id
  • /track/run/{requestId}/{runId} - For tracking tasks by runId (e.g. an ON_DEMAND task). This endpoint can also search pending tasks
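
As a rough illustration, the two tracking paths could be assembled as shown below. This is only a sketch: the helper names and the base URL passed in are assumptions made for the example, and only the `/track/...` path shapes come from the notes above.

```python
from urllib.parse import quote


def track_task_url(base_url: str, task_id: str) -> str:
    """Build the URL for tracking a task that already has a task id."""
    # base_url is whatever prefix your Singularity API is served under (assumption).
    return f"{base_url.rstrip('/')}/track/task/{quote(task_id, safe='')}"


def track_run_url(base_url: str, request_id: str, run_id: str) -> str:
    """Build the URL for tracking a task by request id and run id."""
    return (
        f"{base_url.rstrip('/')}/track/run/"
        f"{quote(request_id, safe='')}/{quote(run_id, safe='')}"
    )
```

Either URL can then be fetched with any HTTP client; per the notes above, the response body is a SingularityTaskState object.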

Improvements

  • 1560 - Separate timeouts for health check and task running
  • 1568 - Better bash escaping in docker runner script
  • 1515 - Add flag to trigger run of scheduled task on deploy
  • 1561 - Combine offers to schedule tasks more efficiently
  • 1562 - Retry failed client requests in SingularityClient
  • 1538 - Allow immediate runs in pending queue with deploy
  • 1482 - Better task balancing
  • 1557 - Immediate uploads on executor teardown
  • 1555 - Tailer improvements for tail_of_finished logs
  • 1549 - Baragon 0.5.0
  • 1526 - Add skip lb removal flag to DeleteRequestRequest
  • 1534 - Upgrade docker client
  • 1533 - Add mark as active/inactive endpoints to client
  • 1524 - Easy endpoint to check if user is authorized for request

Bug Fixes

  • 1572 - Fix bug when 'maxTotalHealthcheckTimeout' set
  • 1575 - Clarify expected runtime, add execution time limit in ui
  • 1569 - Remove extra scheduled entries from pending queue
  • 1564 - Fix values for the container type field so they deserialize properly.
  • 1553 - Release lock on bounce when bounced with no running instances
  • 1554 - Refuse to add cleaning tasks to LB
  • 1556 - Catch 404s when there is no task history in run now modal
  • 1535 - Ensure path is set on read endpoint
  • 1491 - Fix maxDeployIdSize and maxRequestIdSize validation

Documentation

  • 1567 - Changes to SMTP documentation & connection
  • 1548 - Fix markdown formatting
  • 1528 - Rephrase flagging host

Changes in 0.15.1

This is a bug fix release!

Check out the 0.15.1 milestone to see updates in detail.

Bug Fixes

  • 1520 - Fix for request stuck in deleting state
  • 1529 - Default cacheOffers to false and add docs
  • 1530 - Fix for Optional request bodies

Documentation

  • 1527 - Typo in documentation
  • 1531 - Bump version numbers in documentation

Changes in 0.15.0

Check out the 0.15.0 milestone to see new features / bugfixes in detail.

New Features

  • 1519 - Spread to all slaves

Adds a best effort attempt to deploy a task on all slaves. You can enable this by adding spreadAllSlavesEnabled: true to your Singularity yaml file. You can then set SPREAD_ALL_SLAVES for the placementStrategy on a SingularityRequest.

  • 1405 - Slave Usage Monitoring UI

In the Admin drop-down on the UI, there is now a Slave Usage monitoring page. This page contains visualizations of resource usage across all slaves. The circular progress meters display the percentage of total resources utilized on all slaves. The resources (cpu and memory) heat map displays the percentage utilized per slave.

  • 1456 - Bump to java 8/Jersey 2/Dropwizard 1

Singularity is now officially on java 8! As part of this upgrade we have also bumped some library versions, most notably upgrading to Jersey 2 and Dropwizard 1.

The Singularity log tailer got a makeover to improve performance and memory usage. Currently you can still toggle between old and new tailer versions, but this will be removed in future releases.

Improvements

  • 1516 - Include deploy marker and oldest deploy step in state
  • 1517 - More flexible match on filename from logrotate
  • 1511 - Add ability to prefix all email subjects
  • 1489 - Fix Task Search navigation and refresh
  • 1510 - Add getTaskByRunIdForRequest to client
  • 1483 - Switch not loaded to loading in first tailer screen
  • 1503 - Don't show logs panel if task never running
  • 1486 - Use web cache for api calls from ui
  • 1478 - Add leader cache
  • 1481 - Allow artifact list to be specified on deploys
  • 1507 - Ability to provide HttpConfig for SingularityClient
  • 1499 - Warn when removing a request with lb configs
  • 1487 - Add a message to the request history when scaling
  • 1474 - Allow expensive endpoints to be disabled for non-admins
  • 1477 - Better zk cleanup for removed requests
  • 1480 - Put a limit on number of slaves to decommission at once
  • 1479 - Reduce timeout on requests to task sandboxes
  • 1472 - Add spacing line at the bottom of logs
  • 1476 - Better zk performance logging
  • 1475 - Singularity scheduler lock logging
  • 1465 - Pass job user as environment variable to task
  • 1473 - More disabled actions for pollers
  • 1466 - Add links between tailer versions
  • 1450 - Allow new tabs from global search page
  • 1452 - Add a task credit system
  • 1400 - Add slave and task usage tracking inside singularity ZK
  • 1447 - Set an optional max number of active tasks for ON_DEMAND requests
  • 1442 - Display previous overridden cleanups
  • 1451 - Additional disaster actions
  • 1453 - Remove the async status update queue

Bug Fixes

  • 1518 - Be sure to close Graphite properly
  • 1505 - Don't include cleaning tasks in instance count
  • 1513 - Write files in subdirectories to splat path
  • 1504 - Don't redirect until done fetching active tasks
  • 1502 - MD5 is case insensitive
  • 1492 - Correctly redirect in ui when no instances are found
  • 1498 - Forbid health checks longer than kill time
  • 1501 - Allow copying from the JSON button dialog
  • 1490 - Mark as not bouncing if paused before bounce completes
  • 1470 - Fix tailer when reloading file
  • 1469 - Add jita access for updating readWriteAccessGroup
  • 1467 - Fix task direct link to logs
  • 1428 - Prevent flapping slave from rejoining cluster
  • 1443 - Ignore 404s for /priority/freeze endpoint

Documentation

  • 1471 - Add in a note about running singularity on docker for mac

Changes in 0.14.1

This is a bug fix release.

Check out the 0.14.1 milestone to see new features / bugfixes in detail.

Bug Fixes

  • 1458 - Remove expiring scale on deploy finish
  • 1457 - Don't allow multiple bounces for the same request
  • 1461 - Interchange argument order in rootComponent
  • 1462 - extract in mesos uri defaults to true, executable defaults to false

Documentation

  • 1459 - fix README formatting

Changes in 0.14.0

Check out the 0.14.0 milestone to see new features / bugfixes in detail.

Important

The next release of Singularity (0.15.0) will include an upgrade to java 8.

Configuration Changes

#1391 includes a rework of some of the S3 settings in Singularity. If you use the SingularityExecutor or SingularityExecutorCleanup modules and use the S3 upload features, you will need to update your configuration. The fields for specifying which files to upload have been moved out of the SingularityExecutor. The example below shows all fields that would move.

Old Configuration (gets removed from SingularityExecutor and SingularityExecutorCleanup yaml files)

executor:
  s3UploaderBucket: my-logs-bucket
  s3UploaderKeyPattern: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3UploaderAdditionalFiles:
    - access.log
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000

New Configuration (if not already present for use with S3 log searching)

# in SingularityExecutorCleanup yaml configuration
executorCleanup:
  defaultS3Bucket: my-logs-bucket
  s3KeyFormat: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000
  s3UploaderAdditionalFiles:
   - filename: access.log
     # The default directory in the executor was set to 'logs', now it must be manually specified
     # If not specified, the directory to search for log files will be the task app directory in the sandbox
     directory: logs

# in SingularityService yaml configuration
s3:
  s3Bucket: my-logs-bucket
  s3KeyFormat: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000
  s3UploaderAdditionalFiles:
   - filename: access.log
     # The default directory in the executor was set to 'logs', now it must be manually specified
     # If not specified, the directory to search for log files will be the task app directory in the sandbox
     directory: logs

NOTE - To upgrade smoothly, it is strongly recommended to deploy SingularityService and the SingularityExecutorCleanup before deploying the SingularityExecutor.
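
The field mapping between the old and new layouts can be sketched as a small conversion helper. This is not a tool shipped with Singularity. The function name `migrate_s3_settings` is hypothetical, and it assumes the configuration has already been loaded into plain dictionaries. It only mirrors the renames visible in the example above.

```python
import copy


def migrate_s3_settings(old_executor: dict, default_directory: str = "logs") -> dict:
    """Map the old `executor` S3 fields onto the new executorCleanup / s3 layout."""
    shared = {}
    if "s3UploaderKeyPattern" in old_executor:
        shared["s3KeyFormat"] = old_executor["s3UploaderKeyPattern"]
    # These two fields keep their names in the new layout.
    for key in ("s3StorageClass", "applyS3StorageClassAfterBytes"):
        if key in old_executor:
            shared[key] = old_executor[key]
    if "s3UploaderAdditionalFiles" in old_executor:
        # Plain file names become objects; the directory that used to default
        # to 'logs' in the executor must now be given explicitly.
        shared["s3UploaderAdditionalFiles"] = [
            {"filename": name, "directory": default_directory}
            for name in old_executor["s3UploaderAdditionalFiles"]
        ]

    cleanup = copy.deepcopy(shared)
    service = copy.deepcopy(shared)
    bucket = old_executor.get("s3UploaderBucket")
    if bucket is not None:
        cleanup["defaultS3Bucket"] = bucket  # SingularityExecutorCleanup yaml
        service["s3Bucket"] = bucket         # SingularityService yaml
    return {"executorCleanup": cleanup, "s3": service}
```

Passing in the `executor` block from the old configuration yields the two new blocks, which can then be written to the respective yaml files.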

New Features

  • 1306 - Smarter Healthchecks

Singularity healthchecks are now split into two phases. Previous settings will continue to function but are deprecated. Health check options are now specified in the healthcheck object on the SingularityDeploy. See the new documentation for more on updates to healthchecks.

Improvements

  • 1435 - Updated mysql index on startedAt
  • 1449 - Improve status logging
  • 1434 - Match mesos artifact defaults
  • 1424 - Rename metric for immediate uploaders
  • 1427 - Ability to specify params on request history in client
  • 1429 - Standardize on toString/hash/equals formats
  • 1402 - Introduce offer cache to allow better resource allocation
  • 1399 - Add ability to upload files immediately to S3
  • 1417 - Ability to specify cache on mesos artifact
  • 1377 - Report currentActiveInstances on SingularityDeployProgress
  • 1410 - Use an ldap cache
  • 1414 - Add style to change display of overflow text in run now dialog
  • 1415 - Replace existing global search with search from requests page
  • 1411 - Logging clean up for mesos protos
  • 1334 - Sanitize file data before parsing with Jackson
  • 1386 - Add a DELETING state for requests being deleted
  • 1388 - send graphite datapoints with optional tags
  • 1391 - S3 Search Improvements
  • 1401 - Ability to override docker workdir
  • 1398 - Clarifications on S3ArtifactSignature
  • 1376 - Update request for 'deploy to unpause' before saving pending deploy
  • 1397 - Shade com.google.thirdparty
  • 1375 - Allow setting S3 storage class at upload time
  • 1373 - customizable --use-compress-program for tar
  • 1392 - Optionally skip addition of extra s3 metadata
  • 1382 - Add maxTasksPerOffer at request level
  • 1367 - Additional settings for history purger
  • 1360 - Bounce updates for placement and scaling
  • 1369 - Additional threshold for deleting task history row data in sql

Bug Fixes

  • 1437 - Defer loading log files till after page load
  • 1445 - use implicit acks when not offloading status updates to another thread
  • 1433 - Properly roll back from an overdue incremental deploy
  • 1426 - Prevent building duplicate immediate uploaders
  • 1412 - fix duplicate exit checker, add longer initial task wait
  • 1423 - Update docker parameters on deploy form
  • 1409 - Properly allow destroy task from ui
  • 1407 - Forbid more characters from request/deploy IDs
  • 1418 - Fixes for Immediate Uploaders
  • 1403 - Import formModal on racks page
  • 1371 - S3Downloader - Block on download only
  • 1387 - No retry for scheduled tasks run on-demand through UI
  • 1380 - Remove compressed log viewing endpoint
  • 1381 - remove all compressed log viewing code
  • 1372 - Update wrapper script for shell commands to read correct pid
  • 1368 - Read group from task data if present

Documentation

  • 1365 - Fix PATH in API annotations

Changes in 0.13.0

Check out the 0.13.0 milestone to see new features / bugfixes in detail.

New Features

1342 - Introduce expiring machine state changes

On the Slaves and Racks pages in the UI, there is now an Expiration field present when initiating a state change. When a duration is specified here, the slave or rack will revert to the specified state after that time has elapsed. Note that if another state change takes place before the expiration time, the expiration will only remain active if the resulting state transition is still valid.

For example, if decommissioning a slave with an expiration of 1 minute and a revertToState of ACTIVE, it is possible the slave will finish decommissioning in that time, so its state would change from DECOMMISSIONING to DECOMMISSIONED. In this case, the expiring action will remain active, because DECOMMISSIONED -> ACTIVE is still a valid state change.

1219 - Starred requests persistence

User settings (so far only starred requests) are now stored server-side instead of client-side in localStorage. If a username is not found, you will be prompted for a username when you first load the ui. Any starred requests you currently have in localStorage will automatically be migrated to the server-side storage.

Improvements

  • 1355 - Update uuid to version 3.0.0
  • 1352 - Check isEmpty on attributes for more accurate message
  • 1354 - Count lost tasks with a Meter
  • 1347 - Nicer format for disaster email
  • 1259 - Alternate compression formats and viewing compressed files in UI
  • 1348 - Add tests for SingularityUI
  • 1344 - Also grab containerId when grabbing directory

Bug Fixes

  • 1353 - To string fix for Deploy and Builder
  • 1351 - Only allow patch versions of webpack
  • 1349 - Fixes for run now dialog
  • 1345 - Fix when bounce alert banner is shown
  • 1341 - Fix custom executor command on new deploy page
  • 1343 - Fix js TypeError on task detail page
  • 1332 - Ensure quotes and new lines are escaped in echo

Documentation

  • 1350 - Docs updates and addition of missing swagger annotations

Changes in 0.12.0

Check out the 0.12.0 milestone to see new features / bugfixes in detail.

Migrations

#1283 (Change deployHistory bytes to a MEDIUMBLOB), #1316 (Expand requestHistory.createdAt column to millisecond precision), and #1319 (Make the history purger query more efficient), contain migrations.

If you have a large number of tasks in your database (e.g. more than 100k), it is possible that the last of these migrations (#1319) could be very slow when run via liquibase. If this is a concern, we recommend using pt-online-schema-change to run your migration.

To run your migration with pt-online-schema-change, use the following command, which is equivalent to liquibase migration 14.

pt-online-schema-change \
  --user=(your db user) \
  --ask-pass \
  --alter "ADD COLUMN purged BOOLEAN NOT NULL DEFAULT false, ADD KEY purged (requestId, purged, updatedAt)" \
  --execute \
  D=(your database name),t=taskHistory

Improvements

  • #1135 Surface taskReconciliationStartedAt in SingularityState object
  • #1217 Request group level actions in the ui
  • #1221 Get count of results for blended history calls
  • #1226 Ability to have multiple readWrite groups
  • #1227 Ability to redeploy from the ui
  • #1244 Add team requests to dashboard
  • #1264 Execution timeout for tasks
  • #1268 Process status updates in separate thread
  • #1284 Show launching tasks separate from active in status
  • #1290 More thorough validation for scale changes
  • #1291 Better messages for sentry reporting
  • #1293 Cache files with md5, add more detail for cache misses
  • #1295 Clean up tests + make travis build more reliable
  • #1298 Updates to slave information and reconciling slaves
  • #1301 Add readWriteGroups to request form ui
  • #1304 More even distribution among racks
  • #1308 Get the SingularityClient up to date
  • #1310 Shortcut to task by instance number in ui
  • #1314 Add global read only groups
  • #1315 Ability to search task history by runId
  • #1320 Ability to run shell command before killing task
  • #1324 Extra filtering on fuzzy match
  • #1325 Refresh task lists appropriately
  • #1330 Show dropdown of previous command line args in run-now modal
  • #1333 Remove OOMKiller and LogWatcher Modules
  • #1338 Surface SingularityPendingRequestParent response from SingularityClient.runSingularityRequest

Bug Fixes

  • #1278 Only send email if the list of active disasters has changed
  • #1281 Make critical task lag require more than a single overdue task
  • #1282 Fix finished log link
  • #1289 Don't rely on Singularity active requests list when searching historical logs
  • #1294 Fix shell command modal file watching
  • #1297 Don't show loading forever on empty log files
  • #1303 Consider tasks with skipped healthchecks in cleaner
  • #1311 Add pending request on failed deploy as well
  • #1317 Use saveTaskCleanup() instead of createTaskCleanup() for deploys
  • #1321 Make deleteTaskHistoryBytes property do what it says it does
  • #1326 Fix js TypeError on task detail page
  • #1328 Fix props and prop types in disasters page
  • #1331 Make task destroy work from the ui

Documentation

  • #1285 Add versions in readme

Changes in 0.11.0

Check out the 0.11.0 milestone to see new features / bugfixes in detail.

New Features

Improvements

  • #1272 - Show seconds in timestamps for healthchecks
  • #1271 - Update files not found message
  • #1262 - Support for setting user in default executor
  • #1248 - Reorganize task label colors
  • #1240 - Also allow DELETE/PUT when using CORS filter
  • #1137 - SingularityService configuration & DC/OS support
  • #1197 - Guarantee durationMillis is present in getExpiringBounce response
  • #1191 - Support for task id var substitution in env
  • #1169 - Support multiple docker parameters and task labels
  • #1100 - Add method for grabbing a snapshot of the master metrics to MesosClient
  • #1091 - Support overriding the log level in tests
  • #1033 - Support task history search on updatedAt
  • #1030 - Introduce the concept of a SingularityRequestGroup

Bug Fixes

  • #1273 - Resolve the correct log path based on taskAppDirectory
  • #1267 - Remove bogus label from file browser actions column
  • #1266 - Select the correct healthcheck for UI alert banner
  • #1263 - Undo click-to-copy changes
  • #1256 - Fix typo in task alerts
  • #1252 - Fix for negative durations
  • #1246 - Make duration fields string inputs again
  • #1176 - Remove the now > start check in RFC5545

Documentation

  • #1265 - Update slave-extras.md

Changes in 0.10.1

This is a bug fix release.

Check out the 0.10.1 milestone to see bugfixes in detail.

UI Fixes

  • #1235 Properly handle deleted requests
  • #1236 UI updates for default config setup. After the release of 0.10.0, several users reported being unable to navigate directly to a page of the Singularity UI. This was due to an unanticipated change in how URLs were routed in the Backbone to React migration. This has been fixed in 0.10.1.
  • #1238 Use the relative path for appRoot
  • #1234 Run now UI service restoration
  • #1239 Add href to nav items
  • #1242 Sort environment variables alphabetically
  • #1232 UI development docs Backbone/Coffeescript => es6/React/Redux

Other Fixes and Improvements

  • #1188 Sort task history updates by ExtendedTaskState enum ordinal, not timestamp
  • #1189 Better cleanup of incremental actions
  • #1194 Deploy failure messages for non-task-specific failures
  • #1212 Allow searching of logs for deleted requests

Changes in 0.10.0

Check out the 0.10.0 milestone to see new features / bugfixes in detail.

UI Rewrite (Backbone/Coffeescript -> React/Redux/ES6)

This rewrite was composed of a number of pull requests, but the consolidated diff can be seen in #1077. Other UI changes include:

  • #1195 - Request form improvements
  • #1223 - Tags input tweaks (for cmd line input)
  • #1192 - Code cleanliness improvements
  • #1225 - Fix the back button when navigating through files on the task detail page
  • #1211 - Render every row on dashboard tables
  • #1210 - Don't show the wait for replacement task option when killing tasks in certain request types
  • #1215 - Check for presence of promise before attempting to catch errors
  • #1207 - UI support for absent deploy field
  • #1209 - Name all the modals that don't already have a name
  • #1208 - Fix duration field overflowing modal in firefox
  • #1206 - Refresh the request detail page after performing actions
  • #1199 - Sentry support
  • #1200 - Aggregate Tailer fix
  • #1190 - Run now fixes
  • #1202 - Add case-sensitive paths plugin
  • #1196 - Request groups page
  • #1164 - Request and Deploy Title Copy Link
  • #1178 - Replace dashes with underscores when searching for matching hosts
  • #1186 - Add ability to show task disk resource if configured to do so
  • #1182 - Add timezone field and kill dropdowns if there are less than 5 options
  • #1185 - Add JSON Button to the Racks and Slaves pages
  • #1179 - Handle 404 properly
  • #1158 - Fix the request detail page bounce messages
  • #1165 - Improvements to task search
  • #1156 - Truncate long tags; Hover trigger to show the whole thing

Improvements

  • #1170 - Add TASK_ID environment variable with the singularity task_id
  • #1150 - Add support for timezone field for scheduled requests
  • #1138 - Docker auth config in custom executor
  • #1006 - Changes to user requested task kills in custom executor

Bug Fixes

  • #1184 - Fail differently if request data not present during deploy check
  • #1171 - Do not remove from LB if not yet added

Changes in 0.9.0

Check out the 0.9.0 milestone to see new features / bugfixes in detail.

Configuration Changes and Deprecations

  • #1000 Tweaks to s3UploaderAdditionalFiles
    • The SingularityExecutor configuration field s3UploaderAdditionalFiles now supports a directory field to specify the directory to search for files to upload. However, the default for this directory is now the task directory (created when using SingularityExecutor). Previously it was the logs directory within the task directory. To mimic previous behavior, add a directory of logs to any existing s3UploaderAdditionalFiles settings.
  • #1166 Change API path "/skipHealthchecks" to "/skip-healthchecks"
    • The API path /skipHealthchecks has been renamed to /skip-healthchecks to be more consistent with the rest of the api and avoid using camel case in paths. The previous endpoint is deprecated and will be removed in an upcoming release.

Improvements

  • #993 / #1020 Add support for disk resource
  • #1048 Updated image tags in compose files
  • #1093 Add downloadUrl field, for explicitly downloading from S3
  • #1099 include offer ID in resourceOffers() logging statement
  • #1110 Add 'extraScript' field to UI configuration for extra analytics/etc
  • #1045 Configurable max metadata message length
  • #1084 Only show wait for replacement checkbox if a WORKER or a SERVICE
  • #1134 Rename pubish_gitbook.sh to publish_gitbook.sh
  • #1118 Always launch docker containers in the correct parent cgroup
  • #1051 Configurable max attempts for docker pull in custom executor
  • #1155 Enable use of docker parameters in custom executor script
  • #1167 Expiring pause deletion responsibility shift
  • #1109 Allow resource override to be set on a run now request

Bug Fixes

  • #1088 Avoid race condition in deploy to unpause
  • #1108 Fix docker compose yaml to get hostname properly
  • #1083 Remove extraneous console.logs
  • #1070 Bump to version 0.28.2 to fix MESOS-5449
  • #1096 Fix filename truncation in sandbox endpoint
  • #1082 Fix broken shell log link
  • #1085 Fix table paging
  • #1145 Fix default hostname in compose yaml
  • #1067 Bump moment minor version to mitigate CVE-2016-4055
  • #1069 Concurrent run once tasks
  • #1081 Handle new lines in docker environment variables correctly
  • #1090 Fix docker integration environment
  • #1161 Request - Null Owner fix
  • #1163 RETRY for scheduled task should keep cmdLineArgs
  • #1175 Remove requirement of BRIDGE mode to specify port map

Changes in 0.8.0

Check out the 0.8.0 milestone to see new features / bugfixes in detail.

New Features

  • #971 Ability to update request data upon successful deploy
  • #1071 Initial support for RFC5545 schedules

Improvements

  • #1021 Convert racks and slaves pages to react
  • #1031 Better request and deploy id validation
  • #1032 Allow glob matching in addition to fuzzy matching on Requests and Tasks page
  • #1039 UI improvements with global search tool
  • #1043 Add support for hourly logrotate
  • #1057 Move back to react-typeahead mainline version instead of HubSpot fork
  • #1059 Set proper Content-Type and Content-Encoding for s3 uploads
  • #1060 Setup babel for ES6 and JSX transformation
  • #1064 Shade guava for SingularityClient
  • #1073 Build the UI as its own module

Bug Fixes

  • #975 Remove cleanup after bounce expire
  • #1044 Only show task killed in message for healthchecks if not in running state
  • #1068 Make sure to remove obsolete pending requests
  • #1078 Typo: "Settingss" -> "Settings" on the Deploy form

Changes in 0.7.1

This is a bug fix release.

Check out the 0.7.1 milestone to see bugfixes in detail.

  • #1034 Change package.json 'vex' dependency to 'vex-js'
  • #1049 Don't set shell if arguments list is empty, ability to override shell
  • #1050 Add polyfills for Object.assign and Promise

Changes in 0.7.0

This release bumps Singularity’s Mesos dependency from version 0.23.0 to 0.28.1. Check out the documentation on the mesos site for more information about upgrading your mesos cluster to 0.28.1.

#994 - Upgrade mesos version to 0.28.1


Changes in 0.6.2

This is a bug fix release.

Bug Fixes

  • #1078 Typo: "Settingss" -> "Settings" on the Deploy form
  • #1068 Make sure to remove obsolete pending requests
  • #975 Remove cleanup after bounce expire

Changes in 0.6.1

This is a bug fix release.

Check out the 0.7.1 milestone to see bugfixes in detail. (Changes for 0.6.1 are the same as 0.7.1)

  • #1034 Change package.json 'vex' dependency to 'vex-js'
  • #1049 Don't set shell if arguments list is empty, ability to override shell
  • #1050 Add polyfills for Object.assign and Promise

Changes in 0.6.0

Check out the 0.6.0 milestone to see new features / bugfixes in detail.

New Features

  • #879 Add ability to post task metadata to a
  • #996 Webhooks UI

Improvements

  • #916 New task checker should respect deploy healthcheck retry settings
  • #933 #979 Email, tail logs from bottom of file, find the first error
  • #950 Allow multiple on demand requests to be queued up
  • #955 Optional redirect in the UI when receiving a 401 from the api
  • #959 Add thread checker types
  • #974 Log tailer rewrite
  • #980 Add template, domains and additionalRoutes lb fields
  • #981 #988 #1017 Improvements to task page UI
  • #985 Better failure message when creating a scheduled request with no schedule
  • #995 Only suggest an even number across racks for SERVICE type
  • #1011 Auto-scroll to bottom of run-now dialog
  • #1012 Make email colors match UI colors
  • #1014 Update layout for 4 logs in aggregate tailer
  • #1015 Show only log name in aggregate tailer title
  • #1016 Don't show over/under provisioned on status page if they are 0
  • #1022 Aggregate tailer tooltip for host and instance number
  • #1023 Dashboard UI improvements
  • #1026 Better handling of logs not found in aggregate tailer
  • #1028 Show TASK_ERROR state as danger in the ui
  • #1036 Add a defaultPortMapping which exposes all mesos provided ports

Bug Fixes

  • #984 Corrected task failure count in deploy failure info
  • #990 Fix slaves link not filtering tasks properly
  • #991 Fix missing user being treated as admin
  • #996 Don't get attributes from missing deployProgress
  • #998 Enable buttons on dashboard paused requests table
  • #1004 Convert SingularityMailer to an interface to avoid errors with missing smtp config
  • #1008 Don't trigger cooldown if TASK_LOST reason is invalid offers
  • #1024 Don't show edit request button if hideNewRequestButton is true
  • #1027 CPU usage bar transitions back to non-error colors
  • #1037 Avoid IndexOutOfBounds error when scaling down during a deploy

Documentation

  • #1005 Fix configuration documentation

Changes in 0.5.0

Check out the 0.5.0 milestone to see new features / bugfixes in detail.

New Features

#839 enables better task history searching via the Singularity UI and API. (Also #890, #932, #935)

As of 0.5.0, Singularity has better support for searching historical tasks. A global task search endpoint was added:

/api/history/tasks -> Retrieve the history sorted by startedAt for all inactive tasks.

The above endpoint as well as /api/history/request/{requestId}/tasks now take additional query parameters:

  • requestId: Optional request id to match (only for /api/history/tasks endpoint as it is already specified in the path for /request/{requestId}/tasks)
  • deployId: Optional deploy id to match
  • host: Optional host (slave host name) to match
  • lastTaskStatus: Optional ExtendedTaskState to match
  • startedAfter: Optionally match only tasks started after this time (13 digit unix timestamp)
  • startedBefore: Optionally match only tasks started before this time (13 digit unix timestamp)
  • orderDirection: Sort direction (by startedAt), can be ASC or DESC, defaults to DESC (newest tasks first)
  • count: Maximum number of items to return, defaults to 100 and has a maximum value of 1000
  • page: Page of items to view (e.g. page 1 is the first count items, page 2 is the next count items), defaults to 1
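
The parameters listed above could be assembled into a query string along these lines. This is a minimal sketch, not part of any Singularity client. The helper name `task_history_query` is hypothetical, and the range check simply restates the documented limit for count.

```python
from urllib.parse import urlencode

# Parameter names taken from the list above, in the same order.
ALLOWED_PARAMS = (
    "requestId", "deployId", "host", "lastTaskStatus",
    "startedAfter", "startedBefore", "orderDirection", "count", "page",
)


def task_history_query(**params) -> str:
    """Return a URL query string for the task history search endpoints."""
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    count = params.get("count")
    if count is not None and not 1 <= count <= 1000:
        raise ValueError("count must be between 1 and 1000")
    direction = params.get("orderDirection")
    if direction is not None and direction not in ("ASC", "DESC"):
        raise ValueError("orderDirection must be ASC or DESC")
    # Emit parameters in a stable order, skipping anything left unset.
    pairs = [(k, params[k]) for k in ALLOWED_PARAMS if params.get(k) is not None]
    return urlencode(pairs)
```

The resulting string would be appended after a `?` to /api/history/tasks, or to /api/history/request/{requestId}/tasks without the requestId parameter.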

For clusters using mysql that have a large number of tasks in the history, a relevant configuration option of taskHistoryQueryUsesZkFirst has been added in the base Singularity Configuration. This option can be used to prefer either efficiency or exact ordering when searching through task history. It defaults to false.

  • When false the setting will prefer correct ordering. This may require multiple database calls, since Singularity needs to determine the overall order of items based on persisted (in mysql) and non-persisted (still in zookeeper) tasks. The overall search may be less efficient, but the ordering is guaranteed to be correct.

  • When true the setting will prefer efficiency. In this case, it will be assumed that all task histories in zookeeper (not yet persisted) come before those in mysql (persisted). This results in faster results and fewer queries, but ordering is not guaranteed to be correct between persisted and non-persisted items.
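
The trade-off between the two settings can be illustrated with a toy model. This is not how Singularity implements the search. It is a sketch that assumes each task is a `(task_id, started_at)` tuple, that both input lists are already sorted newest first, and that the function name `combine_histories` is made up for the example.

```python
def combine_histories(zk_tasks, db_tasks, zk_first, count=100):
    """Combine non-persisted (zookeeper) and persisted (mysql) task histories.

    Each task is a (task_id, started_at) tuple; both lists are newest first.
    """
    if zk_first:
        # Cheap: assume every zookeeper entry precedes every mysql entry.
        combined = list(zk_tasks) + list(db_tasks)
    else:
        # Exact: order all entries together by startedAt, newest first.
        combined = sorted(
            list(zk_tasks) + list(db_tasks), key=lambda task: task[1], reverse=True
        )
    return combined[:count]
```

When a persisted task started later than a non-persisted one, the two modes return different orderings, which is the inexactness described for the true setting.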

Task Search DB Migration

Before deploying task search (release 0.4.12) it is necessary to run liquibase migrations 10 and 11. These migrations add the necessary columns and indexes, and backfill data for those new columns in the taskHistory table so that searching can be done efficiently and on more fields. If you have a large number of tasks in your database (e.g. more than 100k), it is possible that these migrations could be very slow when run via liquibase. If this is a concern, we recommend using pt-online-schema-change to run your migration.

To run your migration with pt-online-schema-change, use the following command, which is equivalent to liquibase migration 10.

pt-online-schema-change \
  --user=(your db user) \
  --ask-pass \
  --alter "ADD KEY startedAt2 (startedAt, requestId), ADD COLUMN host VARCHAR(100) CHARACTER SET ASCII NULL,  ADD COLUMN startedAt TIMESTAMP NULL, DROP KEY deployId, ADD KEY startedAt (requestId, startedAt), ADD KEY lastTaskStatus (requestId, lastTaskStatus, startedAt), ADD KEY deployId (requestId, deployId, startedAt), ADD KEY host (requestId, host, startedAt)" \
  --execute \
  D=(your database name),t=taskHistory

This will complete liquibase migration 10. To bring the liquibase table up to date, you can run the db fast-forward command, which will create the entry in the migrations table for the next migration to run. So, if you previously ran migration 9, it will only create the entry for migration 10.

java -jar SingularityService/target/SingularityService-.jar db fast-forward ./config.yaml --migrations migrations.sql

The update statements to backfill the newly added columns can also possibly be slow when you have a large number of tasks (e.g. more than 100k) in your history. There are a few options you can use to help this migration run more smoothly:

  • Add an additional ADD KEY host2 (host) to the end of the --alter statement before running the pt-online-schema-change above. This index is not necessary for search, but will allow the host field backfill to run much quicker
  • Run each of the update statements in a loop with an additional LIMIT XXXX added, where XXXX is some number (for example 5000). This way you are not trying to update the entire table in a single query (which would lock the table until the query was done), but are updating it in chunks. You can continue running this loop until the migrations are done.
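
The chunked approach from the second option can be sketched roughly as follows. This is a minimal illustration and not part of Singularity: `run_in_chunks` and the `execute` callable are hypothetical names, and `execute` is assumed to run one SQL statement, commit it, and return the number of rows it changed (for example a thin wrapper around a MySQL cursor). The backfill statement itself must come from the migration file. The sketch also assumes the statement's WHERE clause skips rows that are already filled in; otherwise the loop would never see zero affected rows.

```python
from typing import Callable


def run_in_chunks(execute: Callable[[str], int], statement: str, chunk_size: int = 5000) -> int:
    """Repeat `statement` with a LIMIT until it no longer changes any rows.

    `execute` is a hypothetical callable that runs one SQL statement and
    returns the affected row count. Returns the total number of rows updated.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Strip a trailing semicolon so the LIMIT clause can be appended cleanly.
    limited = f"{statement.rstrip().rstrip(';')} LIMIT {chunk_size}"
    total = 0
    while True:
        changed = execute(limited)
        if changed == 0:
            # Nothing left to backfill.
            return total
        total += changed
```

Because each pass touches at most `chunk_size` rows, the table is only locked briefly per pass instead of for the whole backfill.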

If you ran the update for migration 11 manually, you can run the db fast-forward command from above again in order to update the migrations table for liquibase.

As a last note, if there is a gap of time between running these migrations and deploying the new version of Singularity, it is wise to run the backfill queries manually an additional time. If any tasks have persisted to the database between the initial migration run and the time of deploying the new Singularity version, those tasks will not have the host and startedAt columns filled in.

#817 enables incremental deploys. This allows the user to deploy any portion of instances at a time, and either pause for a pre-determined time, or wait for a manual signal to start deploying the next portion of instances.

Incremental Deploys

As of 0.5.0 Singularity supports an incremental deploy for finer-grained control when rolling out new changes. This deploy is enabled via a few extra fields on the SingularityDeploy object when starting a deploy:

  • deployInstanceCountPerStep: Deploy this many instances at a time until the total instance count for the request is reached (Optional, default is all instances at once)
  • deployStepWaitTimeMs: Wait this many milliseconds between deploy steps before continuing to deploy the next deployInstanceCountPerStep instances (Optional, default is 0, i.e. continue immediately)
  • autoAdvanceDeploySteps: Automatically advance to the next target instance count after deployStepWaitTimeMs milliseconds (Optional, defaults to true). If this is false, manual confirmation is needed to move to the next target instance count. This can be done via the UI.

Example

TestService is currently running 3 instances. During the next deploy, you want to replace only 1 of these instances at a time and have Singularity wait at least a minute after deploying one so you can verify that everything works as expected. The following fields can be added to the deploy json to accomplish this:

deployInstanceCountPerStep: 1
deployStepWaitTimeMs: 60000
autoAdvanceDeploySteps: true

When the deploy starts, Singularity will start 1 (deployInstanceCountPerStep) instance from the new deploy (The 3 old instances will still be running). Once the new task is determined to be healthy a few things happen:

  • Singularity will add the instance from the new deploy to the load balancer (if applicable)
  • Singularity will shut down 1 (deployInstanceCountPerStep) of the instances from the old deploy after removing it from the load balancer (if applicable)
  • Singularity will start counting down the 60000 ms until it launches the next deployInstanceCountPerStep instances

Once the deployStepWaitTimeMs of wait time has elapsed, Singularity will start this process again, launching a second task for the new deploy, waiting until it is healthy, then shutting down a task from the old deploy. This will continue until the deploy fails, the deploy is cancelled, or all instances are part of the new deploy and it succeeds.
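
As a rough illustration of the stepping described above, the sketch below lists the new deploy's target instance count after each step. `deploy_step_targets` is a hypothetical helper, not part of Singularity, and capping the final step at the total instance count is an assumption about how an uneven last step behaves.

```python
from typing import List


def deploy_step_targets(total_instances: int, instances_per_step: int) -> List[int]:
    """Return the new deploy's target instance count after each step."""
    if total_instances <= 0 or instances_per_step <= 0:
        raise ValueError("instance counts must be positive")
    targets = []
    current = 0
    while current < total_instances:
        # Assumption: the last step is capped at the request's instance count.
        current = min(total_instances, current + instances_per_step)
        targets.append(current)
    return targets
```

For the TestService example (3 instances, deployInstanceCountPerStep of 1) this gives the targets 1, 2 and 3, with a deployStepWaitTimeMs pause before each later step is launched.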

A few more things to note about the incremental deploy process:

  • If the deploy fails or is cancelled, Singularity replaces any missing instances from the old deploy and makes sure they are healthy before shutting down active/healthy instances from the new deploy. (i.e. you will never be under capacity)
  • At any time, it is possible to advance the deploy to another target instance count via the UI or API. In other words, you can skip the remaining deployStepWaitTimeMs, skip steps of the deploy, or even decrease the instance count to roll back a step.

Improvements

  • #885 Allow destination targets to be specified
  • #891 Use pushState for log line links
  • #892 Cut down on amount of task data stored in ZK
  • #893 Better deploy failure information
  • #894 Task kill checkbox label
  • #906 Suggest an even number across racks for rackSensitive requests (Also, #949)
  • #910 Basic framework auth
  • #912 Upgrade basepom version
  • #923 Goodbye brunch, hello gulp
  • #925 Get memory info from deploy object instead of mesos task resources
  • #934 Don't display healthcheck message for tasks in state TASK_LOST or TASK_FINISHED
  • #936 Add ability to search through tasks table by request type
  • #937 Refactor HistoricalTasks Collections into One
  • #938 Fix Duplicates in Fuzzy Search
  • #939 Add mailhog to compose-dev
  • #943 Don't capture ctrl+t
  • #944 More descriptive error message on http 200 with no json
  • #951 After submitting or editing a request, redirect to that request's page
  • #964 Bump to latest horizon version
  • #965 Warning if singularity has no leader or leader has no mesos connection
  • #966 More consistent terminology and deploy failure on task page
  • #968 Better Slave Decommissioned / Offline Communication
  • #969 Add extra command line args to task finished email
  • #970 Task badge color for TASK_KILLED is now default (grey)
  • #977 Run now dialog improvements

Bug Fixes

  • #902 Fix time display in emails
  • #903 Ensure run once tasks don't get launched at startup
  • #913 Don't display healthcheck notification if task state is failed
  • #914 Include docker exception cause in thread checker
  • #927 Missing an early return for numPorts == 0
  • #930 Remove min-height from fileBrowserSubview
  • #952 Don't Lose Currently Typed Owner/Rack Affinity When Losing Focus
  • #957 Deploy to unpause should also remove any expiring pause
  • #962 Multiple ui fixes
  • #963 Also set shell false if pending task cmdLineArgs are set

Documentation

  • #919 Update config docs for recently added fields
  • #922 Add docs on request and deploy concepts
  • #940 Use gitbook for docs
  • #958 Generate swagger json alongside docs

Changes in 0.4.11

Check out the 0.4.11 milestone to see new features / bugfixes in detail.

Improvements

  • #915 Add mailhog to integration test environment
  • #908 Better sorting with fuzzy search
  • #907 Make SingularityExecutor handle docker volumes when hostPath is absent
  • #900 Store status update reason field in SingularityTaskHistoryUpdate

Changes in 0.4.10

BUG The UI build for this release is known to only contain partial css. Please use the 0.4.11 release instead

Check out the 0.4.10 milestone to see new features / bugfixes in detail.

New Features

  • #887 - Allow deploy to specify a port index for healthchecks and LB api. See custom ports for more details.

Improvements

  • #854 - Add methods to client for unpause and scale requests
  • #856 - Upgrade Horizon
  • #866 - Add a default bounce expiration
  • #871 - Add support for override defaults file
  • #875 - Fix unpause button for expiring pause in UI
  • #881 - Logrotate and S3 upload tweaks
  • #884 - Include shell command name in log filename
  • #889 - Add support for max numbers of objects in ZK when no database is configured
  • #901 - Lock brunch to version 2.2.x

Bug Fixes

  • #788 - Fix router base path when hosted at /
  • #826 - Tweak how ulimit is called
  • #857 - Consider LB when deciding to remove something during bounce
  • #860 - Fix the sorting in the wrong order bug
  • #864 - Cancel futures in thread, avoiding docker deadlocks on cleanup
  • #865 - Better handling of Docker timeouts
  • #904 - Deploy and edit buttons need to be link elements, not button elements

Changes in 0.4.9

Renamed endpoints

This endpoint was renamed:

  • /requests/request/{requestId}/instances --> /requests/request/{requestId}/scale

These endpoints were renamed to fix a typo in the URL:

  • /racks/rack/{rackId}/decomission --> /racks/rack/{rackId}/decommission
  • /slaves/slave/{slaveId}/decomission --> /slaves/slave/{slaveId}/decommission
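
Clients that still build the old paths could be updated with a small rewrite step along these lines. This is only a sketch: `migrate_path` is a hypothetical helper, and it assumes paths are passed without a host or API prefix.

```python
import re

# Old path pattern -> new path template, taken from the renames listed above.
_RENAMES = [
    (re.compile(r"^/requests/request/([^/]+)/instances$"), r"/requests/request/\1/scale"),
    (re.compile(r"^/racks/rack/([^/]+)/decomission$"), r"/racks/rack/\1/decommission"),
    (re.compile(r"^/slaves/slave/([^/]+)/decomission$"), r"/slaves/slave/\1/decommission"),
]


def migrate_path(path: str) -> str:
    """Return the 0.4.9 path for a renamed endpoint, or the path unchanged."""
    for pattern, replacement in _RENAMES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path
```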

Expiring Actions

Released in 0.4.9

Action expiration + additional action metadata

Some actions in Singularity now have the concept of expiration (as in, giving up after a certain period of time). Corresponding endpoints have been updated to accept more information about action expiration and action metadata.

Rack and slave operations

  • /racks/rack/{rackId}/decommission
  • /racks/rack/{rackId}/freeze
  • /racks/rack/{rackId}/activate
  • /slaves/slave/{slaveId}/decommission
  • /slaves/slave/{slaveId}/freeze
  • /slaves/slave/{slaveId}/activate

These URLs accept a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| message | string | optional | A message to show to users about why this action was taken |

Request bounce

  • /requests/request/{requestId}/bounce

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | Instruct replacement tasks for this bounce only to skip healthchecks |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| incremental | boolean | optional | If present and set to true, old tasks will be killed as soon as replacement tasks are available, instead of waiting for all replacement tasks to be healthy |
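
A JSON body for this endpoint could be assembled as in the sketch below. `bounce_body` is a hypothetical helper, not part of SingularityClient; only the field names come from the table above.

```python
import json
from typing import Optional


def bounce_body(
    skip_healthchecks: Optional[bool] = None,
    duration_millis: Optional[int] = None,
    message: Optional[str] = None,
    action_id: Optional[str] = None,
    incremental: Optional[bool] = None,
) -> str:
    """Serialize the optional bounce fields, leaving out any that are unset."""
    fields = {
        "skipHealthchecks": skip_healthchecks,
        "durationMillis": duration_millis,
        "message": message,
        "actionId": action_id,
        "incremental": incremental,
    }
    # Every field is optional, so only send what the caller actually set.
    return json.dumps({key: value for key, value in fields.items() if value is not None})
```

For example, `bounce_body(duration_millis=600000, incremental=True)` describes an incremental bounce that is given up after ten minutes.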

Scheduling a request to run immediately

  • /requests/request/{requestId}/run

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| runId | string | optional | An id to associate with this request which will be associated with the corresponding launched tasks |
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for this task run |
| commandLineArgs | Array[string] | optional | Command line arguments to be passed to the task |
| message | string | optional | A message to show to users about why this action was taken |
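
Since the runId is attached to the launched tasks, a caller may want to generate it up front and keep it for later lookups. The sketch below does that; `run_now_body` is a hypothetical helper, and using a random UUID as the runId is an assumption rather than a format Singularity requires.

```python
import json
import uuid
from typing import Optional, Sequence, Tuple


def run_now_body(command_line_args: Sequence[str], message: Optional[str] = None) -> Tuple[str, str]:
    """Return (runId, JSON body) for an immediate run of a request."""
    # Assumption: any unique string is acceptable as a runId.
    run_id = uuid.uuid4().hex
    body = {"runId": run_id, "commandLineArgs": list(command_line_args)}
    if message is not None:
        body["message"] = message
    return run_id, json.dumps(body)
```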

Unpausing a request

  • /requests/request/{requestId}/unpause

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, instructs new tasks that are scheduled immediately while unpausing to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Exit request cooldown

  • /requests/request/{requestId}/exit-cooldown

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | Instruct new tasks that are scheduled immediately while exiting cooldown to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Deleting a request

  • /requests/request/{requestId}

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Killing a task

  • /tasks/task/{taskId}

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| waitForReplacementTask | boolean | optional | If set to true, treats this task kill as a bounce - launching another task and waiting for it to become healthy |
| override | boolean | optional | If set to true, instructs the executor to attempt to immediately kill the task, rather than waiting gracefully |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Scaling requests

  • /requests/request/{requestId}/scale (previously /requests/request/{requestId}/instances)

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped while scaling this request (only) |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| instances | int | optional | The number of instances to scale to |
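
Because durationMillis is expressed in milliseconds, it can help to derive it from a duration type rather than writing the number by hand. The following sketch builds the body for a temporary scale; `expiring_scale_body` is a hypothetical helper, not an existing client method.

```python
import json
from datetime import timedelta
from typing import Optional


def expiring_scale_body(instances: int, expires_in: timedelta, message: Optional[str] = None) -> str:
    """Build the JSON body for a scale that is reversed after `expires_in`."""
    if instances < 0:
        raise ValueError("instances must not be negative")
    body = {
        "instances": instances,
        # Integer division by one millisecond converts the duration exactly.
        "durationMillis": expires_in // timedelta(milliseconds=1),
    }
    if message is not None:
        body["message"] = message
    return json.dumps(body)
```

For instance, `expiring_scale_body(10, timedelta(hours=1))` asks for 10 instances and lets the scale expire after 3600000 milliseconds.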

Pausing a request

  • /requests/request/{requestId}/pause

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| killTasks | boolean | optional | If set to false, tasks will be allowed to finish instead of killed immediately |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

NOTE: The user field has been removed from this object.

Disabling request healthchecks

  • /requests/request/{requestId}/skip-healthchecks

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for all tasks for this request until reversed |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

New endpoints for cancelling actions

These endpoints were added in order to support cancelling certain actions:

  • DELETE /requests/request/{requestId}/scale -- Cancel an expiring scale
  • DELETE /requests/request/{requestId}/skip-healthchecks -- Cancel an expiring skip healthchecks override
  • DELETE /requests/request/{requestId}/pause -- Cancel (unpause) an expiring pause
  • DELETE /requests/request/{requestId}/bounce -- Cancel a bounce
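
A cancellation call could be prepared with the standard library as sketched below. `cancel_request` and the base URL in the usage note are hypothetical, and the pause and bounce paths are assumed to share the /requests prefix used by the other request endpoints. The function only builds the request object; sending it is left to the caller.

```python
from urllib.parse import quote
from urllib.request import Request

# Actions that have a cancelling DELETE endpoint, per the list above.
_CANCELLABLE = {"scale", "skip-healthchecks", "pause", "bounce"}


def cancel_request(base_url: str, request_id: str, action: str) -> Request:
    """Build (but do not send) the DELETE request that cancels `action`."""
    if action not in _CANCELLABLE:
        raise ValueError(f"no cancel endpoint for action: {action}")
    # Escape the request id so it is safe to embed as a single path segment.
    url = f"{base_url.rstrip('/')}/requests/request/{quote(request_id, safe='')}/{action}"
    return Request(url, method="DELETE")
```

A caller might pass the result to `urllib.request.urlopen`, for example with a base URL such as `http://localhost:7099/singularity/api` (an illustrative value).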

Other Improvements and Fixes

  • #837 - Make sure literal host ports are processed correctly
  • #842 - Task is cleaning default msg fix
  • #849 - Include message with emails
  • #850 - Fix unified tailer
  • #851 - Warning to disable healthchecks for < 1 hour
  • #852 - Page now automatically refreshes even after invalid duration entered
  • #853 - File error fix
  • #859 - Don't show tasks as overdue instantly
  • #863 - Don't show deleted request message if not deleted
  • #868 - Bump to Brunch 2

Changes in 0.4.8

Check out the 0.4.8 milestone to see new features / bugfixes in detail.

Improvements

  • #744 - New log tail UI
  • #774 - allow requests to override their email notification settings
  • #782 - support per-bucket creds for downloading artifacts
  • #801 - Link each column section
  • #805 - Change empty task sandbox message depending on current task state
  • #806 - Add filters on request table for active deploy and no deploy
  • #812 - Clean tasks on decommissioned hosts
  • #813 - Get task info by runId
  • #815 - New fuzzy search algorithm for better results and perf
  • #819 - better launch msg
  • #825 - surface info about the pending deploy in request detail page

Bug Fixes

  • #773 - time-box the docker client to avoid ever getting stuck
  • #779 - Hide log link on task page if the file doesn't exist
  • #821 - fix issue with incorrect parsing of task status
  • #824 - be sure to transfer labels to the task info
  • #827 - be sure to check deploy health for incremental bounce

Changes in 0.4.7

Check out the 0.4.7 milestone to see new features / bugfixes in detail.

Fixed

  • When the 0.4.6 release was built, the static assets for the Web UI weren't properly packaged in the JAR. This is now fixed in 0.4.7.

Improvements

  • #777 - Properly sort requests in the Web UI
  • #820 - Web UI gives blank page after first deploy

Changes in 0.4.6

Check out the 0.4.6 milestone to see new features / bugfixes in detail.

New Features

  • Shell commands using the singularity executor. You can read more about the shell commands feature here
  • Introduce an incremental bounce. An incremental bounce will kill old tasks as new tasks become healthy instead of waiting for all new tasks to be healthy before shutting down old tasks. This is especially useful when running services with many instances on a lean cluster. The option is available in the UI when asking to bounce, or by adding a query param of incremental=true to your POST request to the /requests/request/{requestId}/bounce endpoint.

Improvements

  • #635 - Truncate beginning of S3 log filenames
  • #690 - improve ui uncaught error message
  • #705 - allow for custom switch user commands in SingularityExecutor
  • #694 - Remove paused requests from LB
  • #708 - Show extra cmd line arguments and allow rerunning of tasks
  • #713 - Give meaningful titles to all pages
  • #719 - Add logs section with link to latest
  • #721 - Task alerts
  • #729 - Optionally take start and end time query params for S3 log search
  • #730 - Change checkpoint default to true
  • #732 - Star requests from the request detail page
  • #734 - Change allocated cpu units to floats in the json object
  • #739 - Add request and task links to breadcrumbs on tail view
  • #745 - Only update links instead of rerendering the whole page
  • #748 - Auto-exit s3uploader and s3downloader if no S3 credentials are set
  • #749 - Custom time formats
  • #751 - Use dropwizard-guicier 0.7.1.2
  • #756 - Add support for read-only user groups
  • #759 - Email tweaks
  • #766 - Deny a bounce if there aren't enough resources to complete it
  • #771 - Expose run-task method in client
  • #772 - Bake user query params in to authentication system
  • #775 - If inside a HTTP request, include URL in sentry error
  • #783 - Tweaks to make grepping a file easier
  • #784 - Set a default for s3UploaderKeyPattern
  • #785 - Allow default value for readOnlyGroups
  • #786 - Add support for default healthcheckMaxRetries and healthcheckMaxTotalTimeoutSeconds values in SingularityConfiguration
  • #789 - Surface info about bounces in request detail page
  • #790 - Better executor logging
  • #792 - Skip building web UI via skipSingularityWebUI property
  • #794 - Show paused requests in dashboard view
  • #795 - Add option to also bounce when scaling
  • #796 - Make files table in task view sortable
  • #802 - Expose killed task records
  • #803 - Watch reset and check active tasks manually
  • #809 - Add link to finished service.log to failed healthcheck notification

Bug Fixes

  • #682 - Configuring the Network field should not be predicated on having port mappings configured
  • #717 - Fix thread checker, case where docker container has already stopped
  • #722 - Fix tooltip positioning on edit request page
  • #735 - Make sure tasks are in ZK
  • #743 - Thread pool of 1 if prefixes is 0, avoid IllegalArgument
  • #761 - Fix NPE when check for exception cause
  • #763 - Don't throw thread check exceptions if task was already asked to stop
  • #780 - Stop tailing if scroll to top is clicked
  • #791 - Fix search input
  • #804 - Add tasks from /killed to decom badges
  • #808 - Fixes + tweaks to DECOM badge

Config Changes

  • #750 - Remove old property-style runnable config code. The .properties configuration format is no longer supported for these Singularity slave helpers:

    • SingularityExecutor
    • SingularityExecutorCleanup
    • SingularityS3Downloader
    • SingularityS3Uploader
    • SingularityOOMKiller
    • SingularityLogWatcher

    If you use any of these, please convert your configuration to the .yaml style.


Changes in 0.4.5

Check out the 0.4.5 milestone to see new features / bugfixes in detail.

  • Singularity 0.4.5 bumps its Mesos dependency from 0.21.0 to 0.23.0. #657
  • If upgrading from a version prior to 0.4.4, you will need to run database migrations. Refer to the database docs for how to run migrations before starting the new version of Singularity.

Deprecated

  • The .properties configuration format for Singularity slave helpers:

    • SingularityExecutor
    • SingularityExecutorCleanup
    • SingularityS3Downloader
    • SingularityS3Uploader
    • SingularityOOMKiller
    • SingularityLogWatcher

    If you use any of these, please convert your configuration to the .yaml style.

    .properties support will be removed completely in 0.4.6.