Releases

Changes in 1.2.0

Check out the 1.2.0 milestone to see new features / bugfixes in detail.

New Features

Improvements

  • 2074 - Log extra information about the docker image we are about to use
  • 2071 - Better exposure of LB status in UI
  • 2065 - Quieter offer logging
  • 2062 - Remove unused import in RequestResource
  • 2063 - Debounce UI Request/Task Search
  • 2059 - Do not retry ON_DEMAND tasks which were killed by user request.
  • 2060 - Self Service Shuffle Opt Out
  • 2057 - Slightly Smarter Task Shuffling
  • 2058 - Version SingularityUI assets
  • 2051 - Better crash loop detection
  • 2052 - Carry forward "Task launched because" message when retrying task.
  • 2050 - Better preferential scheduling
  • 2049 - Prioritize memory shuffles over cpu

Bug Fixes

  • 2056 - Validate counts of racks to avoid accidental addition
  • 2072 - Quiet down some exceptions that don't matter
  • 2073 - Fix unauthorized error message
  • 2069 - Fix run now modal not being able to tail task launch
  • 2067 - Remove old static build settings
  • 2068 - Consistently compare memory in GB, not MB/B
  • 2066 - Automatically handle tasks that might be stuck in launching status
  • 2064 - Prevent request action modals from closing upon regaining focus
  • 2061 - Properly reconnect after losing mesos master connection
  • 2054 - fix mismatch in http/https protocol
  • 2048 - Don't mutate args in lb sync comparison
  • 2046 - Process a reconciliation for all tasks with missing state in zk

Changes in 1.1.0

Check out the 1.1.0 milestone to see new features / bugfixes in detail.

Singularity 1.1.0 is mainly focused on scheduler performance. Internally we have also begun testing this version on java 11. Some java 11 notes:

  • While the default GC algorithm for java 11 has been updated to G1, we have seen better scheduler throughput and overall performance for SingularityService using parallelgc. We have not yet extensively tested zgc for Singularity
  • The SingularityExecutor must be upgraded to 1.1.0 to work with java 11. Earlier versions of all other components should still function on java 11, but may not be as performant without the updates in this release
  • If using mysql with Singularity, the ssl-related updates in the updated version of mysql-connector-java will provide much better performance and remove an error that would previously get logged whenever a new connection was created

New Features

  • 2035 - New client method to get active tasks' ids
  • 2036 - Add endpoint to fetch all active task states
  • 2009 - Add ability to override environment in run now modal
  • 1971 - Filtered S3 task sandbox file search (API Only)

Improvements

  • 2039 - Make task history persister not use the request level lock
  • Scheduler Performance Improvements
    • 2034 - More performance updates
    • 2032 - Reconnect, offer, and startup performance improvements
    • 2023 - Cache deploy stats per offer check run
    • 2024 - Log lock hold times at debug level if too long
    • 2007 - More efficient use of state data
  • Java 11 Support Updates
    • 2025 - Upgrade mysql-connector-java
    • 2026 - Add jakarta runtime dep for executor module
    • 2020 - Update ProcessUtils to work with java11
  • 2018 - Add hour to s3 key format options
  • 2014 - Don't immediately clean all history items, let the history purger do it
  • 2008 - Make the request id global search include all but deleted requests
  • 2011 - Retry more TASK_LOST cases on deploy
  • 2012 - Show the runId in task tables on request detail page
  • 2000 - Module + ObjectMapper cleanup

Bug Fixes

  • 2038 - Catch more SQL exceptions in persister queries
  • 2037 - offer executor needs at least 2 threads
  • 2033 - Don't allow LB sync to remove last remaining upstreams
  • 2029 - Race condition fixes
  • 2022 - Just log the task ID in new task checker so we can see the throwable
  • 2019 - Remount RequestDetailPage when jumping between requests
  • 2021 - return early for missing deploy
  • 2016 - Ensure delayed oneoff tasks get rescheduled on new deploy
  • 2013 - Validate number of ports in run now resource overrides
  • 2010 - Clear resource usage after a new deploy
  • 2006 - Fix for overdue missing deploy
  • 2003 - Task history ui updates for sort
  • 2041 - Findbugs fixes

Documentation

  • 2015 - Updated cooldown docs

Changes in 1.0.0

Singularity has been running our production infrastructure at HubSpot for years and the team is happy to announce a (probably long overdue) 1.0.0 release. Check out the 1.0.0 milestone to see new changes in detail.

Breaking Changes

Singularity 1.0.0 includes a pile of tech debt cleanup. The most significant of these is a move from the deprecated guava Optional to the newer java.util.Optional. Any java clients using the SingularityBase or SingularityClient modules will need to update appropriately. While very similar, the two Optionals are not binary compatible. This article briefly explains a few of the differences.

Improvements

  • 1986/1993/1992 - Tech debt cleanup and dependency updates
  • 1994 - Bump bootstrap from 3.3.7 to 3.4.1
  • 1996 - Bump eslint from 2.13.1 to 4.18.2

Bug Fixes

  • 1985 - Fix api reference docs link
  • 1981 - Fix typo on request utilization component
  • 1960 - Fix SingularityClient#killTask() result parsing
  • 1987 - Also check sql for task/directory
  • 1988 - Do not allow user to override STARTED_BY_USER variable

Changes in 0.23.0

Check out the 0.23.0 milestone to see new features / bugfixes in detail. 0.23.0 in general represents a number of performance improvements in relation to Singularity's usage of zookeeper and mysql as well as a mesos version bump.

Migrations

MySQL/Postgres

0.23.0 contains multiple database migrations (https://github.com/HubSpot/Singularity/pull/1928 + https://github.com/HubSpot/Singularity/pull/1956). These must be run BEFORE deploying the new version of SingularityService and are compatible with the running 0.22.0 release. You can check out our docs on migrations to run these with liquibase. If you manage a larger installation of Singularity utilizing mysql (e.g. millions of tasks in task history), we recommend running the migrations using pt-online-schema-change to minimize interruptions. Migrations and ptosc arguments are listed below for convenience:

  • Addition of usage tracking table - This can be run with liquibase since it is a non-blocking migration and is the first changeSet in the new release. To run only a single changeSet in liquibase (e.g. to then run the remaining ones with ptosc), add the --count 1 option when running db migrate
  • Change requestHistory table charset + add json column - --alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN request blob DEFAULT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN requestState ENUM ('CREATED', 'UPDATED', 'DELETING', 'DELETED', 'PAUSED', 'UNPAUSED', 'ENTERED_COOLDOWN', 'EXITED_COOLDOWN', 'FINISHED', 'DEPLOYED_TO_UNPAUSE', 'BOUNCED', 'SCALED', 'SCALE_REVERTED') NOT NULL, MODIFY COLUMN user varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN message varchar(280) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, ADD COLUMN json JSON DEFAULT NULL"
  • Change deployHistory table charset + add json column - --alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN bytes MEDIUMBLOB DEFAULT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN deployId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN user varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN message varchar(280) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, MODIFY COLUMN deployState ENUM ('SUCCEEDED', 'FAILED_INTERNAL_STATE', 'CANCELING', 'WAITING', 'OVERDUE', 'FAILED', 'CANCELED') NOT NULL, ADD COLUMN json JSON DEFAULT NULL"
  • Change taskHistory table charset/enums + add json column - --alter "CHARACTER SET ascii COLLATE ascii_bin, MODIFY COLUMN bytes MEDIUMBLOB DEFAULT NULL, MODIFY COLUMN taskId varchar(200) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL, MODIFY COLUMN lastTaskStatus ENUM ('TASK_LAUNCHED', 'TASK_STAGING', 'TASK_STARTING', 'TASK_RUNNING', 'TASK_CLEANING', 'TASK_KILLING', 'TASK_FINISHED', 'TASK_FAILED', 'TASK_KILLED', 'TASK_LOST', 'TASK_LOST_WHILE_DOWN', 'TASK_ERROR', 'TASK_DROPPED', 'TASK_GONE', 'TASK_UNREACHABLE', 'TASK_GONE_BY_OPERATOR', 'TASK_UNKNOWN') NOT NULL, MODIFY COLUMN runId varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, MODIFY COLUMN deployId varchar(100) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL, ADD COLUMN json JSON DEFAULT NULL, ADD KEY requestDeployUpdated (requestId, deployId, updatedAt), ADD KEY hostUpdated (host, updatedAt)"
  • Change taskUsage table charset - --alter "CHARACTER SET ascii COLLATE ascii_bin ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8, MODIFY COLUMN requestId varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '', MODIFY COLUMN taskId varchar(200) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT ''"

As seen above, these migrations prep Singularity to use mysql's json data type instead of a blob for history storage. All net-new history will be stored in the json format and old blob columns are not yet dropped. Singularity will currently look for either format when fetching individual task histories. You can kick off a backfill of data from blob -> json format by sending an http POST to the /api/history/sql-backfill?batchSize=20 endpoint on SingularityService (batch size is configurable to balance resources vs speed). If Singularity needs to restart, this process is idempotent and can be kicked off as many times as needed, though only one invocation can run at a time.
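
As a rough illustration of the backfill call described above, the following Python sketch builds the POST request without sending it. The endpoint path and the batchSize parameter come from the text; the host name, the helper name build_backfill_request, and the absence of auth headers are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_backfill_request(base_url: str, batch_size: int = 20) -> Request:
    """Build (but do not send) the POST that starts the blob -> json backfill.

    base_url is the address of SingularityService (hypothetical in this sketch).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    query = urlencode({"batchSize": batch_size})
    url = f"{base_url.rstrip('/')}/api/history/sql-backfill?{query}"
    # Empty body; the only argument travels in the query string.
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Hypothetical host; replace with your own SingularityService address.
    req = build_backfill_request("http://singularity.example.com", batch_size=20)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

Because the process is described as idempotent, re-sending the same request after a restart should be safe; a smaller batch size trades speed for lower resource use.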

Zookeeper

On startup, Singularity will run a number of migrations to zookeeper task data. These are aimed at reducing the possible size of any single zookeeper read. You may notice that the first startup of the new Singularity release is slower due to these changes running. This migration is idempotent and will be re-attempted on next startup if it should fail.

Mesos Version Upgrade

Singularity 0.23.0 is built against mesos 1.8, but should be compatible with all earlier 1.x versions of mesos.

New Features

  • 1958 - Configurably request DNS preresolution for load-balanced services
  • 1955 - Pre-resolve upstreams for Singularity-managed BaragonServices.

Performance Improvements

  • 1976 - UI Performance Updates for active/pending tasks pages
  • 1972 - More efficient active tasks call for executor cleanup
  • 1956 - SQL migrations for efficiency (blob -> json + utf8 -> ascii)
  • 1963 - Also purge old deploy and request history from SQL
  • 1920 - Add ability to do sns-based updates instead of webhooks
  • 1922 - The zoo is under new management (zk cleanup)
  • 1938 - Make Agent/Rack Resource use proxy to leader
  • 1932 - Refactor calling of offer evaluation
  • 1939 - Add option to fetch a batch of requests
  • 1928 - MySQL task resource usage storage
  • 1906 - Reduce S3Uploader memory usage during directory scan

General Improvements

  • 1962 - Support searching for request logs with a specified date range via SingularityClient.
  • 1791 - Allow fetching full request data (with deploy data) in SingularityClient
  • 1975 - junit5
  • 1957 - Bump Baragon version to 0.9.0.
  • 1961 - Rework cooldown logic
  • 1954 - UI And Other Improvements
  • 1919 - Configurably skip shell command prefix for Docker tasks only.
  • 1908 - Cleaner failover + update dependencies
  • 1909 - Skip fuser/lsof check when uploader is marked as immediate
  • 1944 - Handle status updates from recovering agents appropriately
  • 1951 - Add builder methods to SingularityRequestBuilder & SingularityDeployBuilder
  • 1945 - Bump to mesos 1.8.0
  • 1946 - Alternative way to specify auth for the mesos scheduler api
  • 1947 - Add zk leader indicator on status ui
  • 1941 - Add ability to disable task shuffle from UI
  • 1942 - Check assigned ports are available in SingularityExecutor
  • 1943 - Upstream validation
  • 1937 - Shuffle tasks on hosts with overutilized memory resources
  • 1930 - Flag for immediate task history persist
  • 1915 - Make Singularity report byte counts to monitor against jute buffer size
  • 1914 - Fix handling of file-based health check failure
  • 1905 - Add token authenticator option

Bug Fixes

  • 1978 - Fix deploy link on requests page + pending tasks table
  • 1979 - Fix missing extension for additional logrotate file
  • 1969 - Calculate max task lag after excluding on demands with instance limit
  • 1970 - Proxy deploy cancellations to leader
  • 1973 - tweak cooldown thresholds and evaluation logic
  • 1974 - Fix task history page size on refresh
  • 1953 - Only clean if the sandbox directory still exists.
  • 1966 - Fix request state filter and task search task state in UI
  • 1952 - Only log an exception if it has one
  • 1949 - Update to newer download endpoint name
  • 1950 - Configurably delete logrotateAdditionalFiles 15 mins after task termination.
  • 1933 - More explicit choice of canSkipZk flag
  • 1940 - Files from S3Downloader should be world readable, not read-writeable
  • 1934 - Explicitly check for UnknownHostExceptions
  • 1936 - Correctly choose a system load metric.
  • 1931 - Change persist strategy
  • 1925 - Always remove from LB, even if add in WAITING
  • 1926 - Also attempt to get TaskHistory from history manager for mail
  • 1927 - Skip offers with null id
  • 1929 - Put a time limit on the uploader check so we don't get stuck
  • 1913 - Clarify logging in offer scheduler
  • 1912 - Also fetch the original file for log snippets in email
  • 1911 - Fix non-paginated fetch of s3 logs for task
  • 1910 - Do the entire deploy history persist under the request lock
  • 1907 - On demand requests with instances shouldn't trigger task lag over instance count
  • 1902 - Fix typo in auth check path
  • 1899 - Add missing enum

Documentation

  • 1901 - New adopter

Changes in 0.22.0

Check out the 0.22.0 milestone to see new features / bugfixes in detail. 0.22.0 represents an upgrade to mesos 1.6.1. This also includes an upgrade to protobuf 3 and a move to a separate fork of mesos-rxjava. Instructions for upgrading mesos can be found here. In internal testing we found communications between old mesos master/upgraded scheduler and vice versa were both backwards compatible back to mesos 1.1.0.

New Features

  • 1873 - Non-http health checks

Improvements

  • 1893 - Use a profile to build for postgresql
  • 1894 - Add trace logging for header passthrough authenticator
  • 1895 - Add option to specify executor cleanup creds in file
  • 1885 - Add guava/jdk8 modules to default SingularityClient object mapper
  • 1887 - Remove all switch user bits from docker wrapper
  • 1880 - Improve ExtendedTaskState readability
  • 1888 - Report uploader/downloader metrics by writing to a file.
  • 1891 - Add ability to set uploader additional files at deploy level
  • 1849 - Agent attribute minimums
  • 1876 - Better alerting for task lag
  • 1877 - Update alert banner to only show when there's a widespread issue for task lag
  • 1878 - Add support for inactive task history filtered by deploy ID
  • 1875 - Ability to rotate at a max service log size
  • 1854 - Ability to open sandbox files from SingularityUI
  • 1867 - Upgrade to mesos 1.6.1 and fork of mesos-rxjava
  • 1869 - Faster run now enqueue
  • 1860 - Make SingularityUI snappier

Bug Fixes

  • 1882 - Be sure to remove expiring bounces when a bounce completes
  • 1847 - Ability to set 'shell' in deploy ui
  • 1883 - Acquire a request-level lock when persisting deploy history.
  • 1879 - Make task history paging less flaky
  • 1881 - Don't count pending tasks limited by instance count in task lag
  • 1874 - Additional updates for backpressure handling
  • 1853 - Failover when we miss Mesos Master heartbeats.
  • 1868 - Email unsubscription bugfix
  • 1866 - Clean up logrotate files sooner
  • 1871 - Add ability to ignore logrotate hourly output
  • 1863 - Grab the request lock when persisting request/deploy history

Documentation

  • 1884 - Fix broken docs link

Changes in 0.21.0

Check out the 0.21.0 milestone to see new features / bugfixes in detail.

New Features

  • 1844 - Add postgres support for JDBIHistory
  • 1843 - Add email unsubscriptions endpoint
  • 1838 - Customizable bash in startup script

Improvements

  • 1852 - Use a lock & a timeout when checking whether an upload target file is open
  • 1851 - lsof alternative for open files check
  • 1835 - UI - Don't autofocus the search
  • 1834 - Option to skip scheduling on agents with missing usage data
  • 1833 - Move directory logging to trace level
  • 1801 - Add deep link page number to task history page
  • 1829 - Prepare a StreamingOutput response when serving file downloads
  • 1832 - Nix the old dashboard and default to requests page
  • 1831 - Only run a single history persister at once
  • 1822 - Prevent new host overloading
  • 1826 - Only require read authorization to view list of agents
  • 1817 - Proxy download from Mesos agent over Singularity. Users no longer require direct mesos agent api access to download files
  • 1814 - Updated logrotate frequency to allow for an override

Bug Fixes

  • 1846 - Logic fix for rebalance racks cleanup
  • 1850 - Make requestUtilization map thread safe.
  • 1841 - Fix required groups auth check on requests endpoints
  • 1839 - Remove sort in getPortByIndex
  • 1840 - Don't try to parse task ids from other frameworks
  • 1836 - Remove second declaration of defaultProps in UITable in SingularityUI.
  • 1837 - Carry forward resource overrides when retrying run-nows.
  • 1825 - Deprioritize STARTUP-type pending requests.
  • 1815 - Clarify global search message
  • 1810 - Task cleanup fixes for decommission and delete
  • 1811 - Create task cleanups for scale down at request update time

Documentation

  • Redo gitbook to include newer swagger ui based on openapi json

Changes in 0.20.1

This is a bug fix release

Bug Fixes


Changes in 0.20.0

Check out the 0.20.0 milestone to see new features / bugfixes in detail.

Configuration Changes

#1784 simplified the configuration for the weighting of different resources when evaluating offers. An old version of the config would look like:

mesos:
  longRunningFreeResourceWeight: 0.5
  longRunningUsedResourceWeight: 0.5
  nonLongRunningFreeResourceWeight: 0.5
  nonLongRunningUsedResourceWeight: 0.5
  scoringStrategy: SPREAD_TASK_USAGE

longRunningUsedCpuWeightForOffer: 0.25
longRunningUsedMemWeightForOffer: 0.65
longRunningUsedDiskWeightForOffer: 0.1
freeCpuWeightForOffer: 0.25
freeMemWeightForOffer: 0.65
freeDiskWeightForOffer: 0.1
defaultOfferScoreForMissingUsage: 0.3
maxNonLongRunningUsedResourceWeight: 0.5
considerNonLongRunningTaskLongRunningAfterRunningForSeconds: 3600

The new simplified config is now (defaults are shown):

mesos:
  allocatedResourceWeight: 0.5
  inUseResourceWeight: 0.5
  cpuWeight: 0.4
  memWeight: 0.4
  diskWeight: 0.2

Default behavior remains the same while eliminating complexity in the scoring system for evaluating offers
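
To make the role of the five weights more concrete, here is a small Python sketch. The default values are taken from the simplified config above, but the formula itself is an assumption: the notes do not spell out how Singularity combines the weights, so this uses a plain weighted sum purely for illustration. The names OfferWeights, resource_score and offer_score are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferWeights:
    # Defaults copied from the simplified config above.
    allocated_resource_weight: float = 0.5
    in_use_resource_weight: float = 0.5
    cpu_weight: float = 0.4
    mem_weight: float = 0.4
    disk_weight: float = 0.2


def resource_score(free_cpu, free_mem, free_disk, w):
    """Weighted sum of per-resource free fractions, each expected in [0, 1]."""
    return (w.cpu_weight * free_cpu
            + w.mem_weight * free_mem
            + w.disk_weight * free_disk)


def offer_score(allocated_free, in_use_free, w=OfferWeights()):
    """Blend a score based on allocated resources with one based on measured usage.

    Both arguments are (cpu, mem, disk) tuples of free fractions.
    ASSUMPTION: a simple weighted sum; the real scoring logic may differ.
    """
    return (w.allocated_resource_weight * resource_score(*allocated_free, w)
            + w.in_use_resource_weight * resource_score(*in_use_free, w))
```

Under this reading, each group of default weights sums to 1, so a completely free host scores close to 1 and a completely full host scores 0.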

New Features

  • Singularity now has new api docs (PR link) powered by an updated version of swagger and using open api 3.0. A page is available in the UI when running Singularity to view the docs for the release you are currently running.
  • 1775 - Add an optional cpu hard limit

Improvements

  • 1800 - Ability to maintain the same path for custom nav bar links
  • 1795 - Add clear all buttons for dead agents and inactive hosts
  • 1783 - Simplify mesos master uri parsing
  • 1796 - Allow a buffer for tasks near the cpu hard limit
  • 1639 - reduce S3 uploader mem usage
  • 1727 - Allow user to configure which load metric is used for offer scoring
  • 1720 - Account for expected usage when scoring offers
  • 1763 - Support SSE S3 in SingularityUploader
  • 1770 - Support a configurable delay for task shuffles.
  • 1776 - Configurably omit offers from hosts that are overloaded
  • 1787 - Add run time column to request item's task history
  • 1769 - Support placeholders in webhook URIs.
  • 1778 - Include current task usage in the task shuffle cleanup messages
  • 1785 - Send email on failing replacement tasks.
  • 1788 - Add ability to set attributes that mark agent for only preemptible tasks

Bug Fixes

  • 1805 - Fix race condition where two tasks decommission at the same time
  • 1797 - Fix UI message for incremental deploy counts
  • 1716 - Remove expiring scale when new scale has no expiration
  • 1724 - Uploader refactoring and additional attempts for immediate uploaders
  • 1732 - Better check for finish of a bounce
  • 1781 - Log the full list of healthy task ids when killing a task
  • 1777 - Do not upload files outside task sandbox
  • 1782 - Proxy run-nows to the leader.
  • 1706 - Smarter table pagination
  • 1773 - Retry lost tasks
  • 1794 - Key run-nows with runId in addition to current epoch millis.

Changes in 0.19.2

Check out the 0.19.2 milestone to see new features / bugfixes in detail.

Improvements

  • 1762 - Allow deploy of paused requests
  • 1758 - Enable framework auth over http api
  • 1754 - Add an offer scoring mode accounting for max historical usage of all tasks
  • 1760 - Sort tasks to shuffle by overusage, not usage

Bug Fixes

  • 1761 - Remove task credit ui calls, no longer used
  • 1762 - Don't require group overrides for s3 logs listing + allow log level override in docker setup
  • 1759 - Fix duration in logging statement
  • 1757 - Don't count pending requests towards underprovisioning
  • 1741 - Fix lag banner in UI
  • 1755 - Catch exceptions in offer scoring and log them appropriately
  • 1748 - Permalink request within group
  • 1751 - Limit the number of tasks that can shuffle for cpu rebalance at once
  • 1752 - Clarified command not found exception
  • 1766 - More usage collection in parallel, less webhooks in parallel
  • 1767 - Periodically flush the queue to make sure batch work does not get stuck
  • 1768 - Keep track of offers not accounted for in SingularityOfferHolders returned by checkOffers

Changes in 0.19.1

This is a bugfix release

Check out the 0.19.1 milestone to see bugfixes in detail.

  • 1740 - Only send healthcheck object from ui when setting all fields

Changes in 0.19.0

Check out the 0.19.0 milestone to see new features / bugfixes in detail.

New Features

  • 1668/1650 Adds support for more overrides in run now requests. You can now override items like environment variables for individual runs of a SingularityRequest by POSTing json in the form of a SingularityRunNowRequest
  • 1690 - Adds initial support for Mesos containers, with volume sources and network mapping. These can be specified in the containerInfo.volumes and containerInfo.mesos.image sections of the SingularityContainerInfo in your SingularityDeploy
  • 1702 - Updates the internal locking scheme for Singularity to allow more parallel processing. As a result, the concurrency of offer and status update processing can now be tuned, with increased concurrency coming with a cost of increased memory/cpu usage by the scheduler. The following parameters in the mesos section of the SingularityConfiguration impact concurrency and tuning of the scheduler:
    • statusUpdateConcurrencyLimit - The number of status updates that can be processed in parallel. Defaults to 500 and is backed by its own cached thread pool
    • maxStatusUpdateQueueSize - A semaphore limits the number of submissions to the status update cached thread pool. If no permits are currently available (i.e. more than statusUpdateConcurrencyLimit status updates are pending), new updates are added to a queue where they wait until more capacity is available. This configuration parameter controls the max size of that queue. It is recommended that this be set a bit above the maximum number of tasks you expect to have active in Singularity at any one time, because during reconciliation a status update for each task is sent in rapid succession.
    • offersConcurrencyLimit - The number of offer scoring calculations and checks to be done in parallel. Defaults to 100. This should generally not need to be updated.
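
The interaction between statusUpdateConcurrencyLimit and maxStatusUpdateQueueSize can be sketched as a permit counter in front of a bounded waiting queue. This Python model is not Singularity's implementation: the class name StatusUpdateGate and the "REJECTED" outcome for a full queue are assumptions, since the notes do not say what happens when the queue overflows.

```python
import threading
from collections import deque


class StatusUpdateGate:
    """Toy model: a fixed number of permits plus a bounded waiting queue."""

    def __init__(self, concurrency_limit, max_queue_size):
        self._free_permits = concurrency_limit   # plays the role of the semaphore
        self._max_queue_size = max_queue_size
        self._waiting = deque()
        self._lock = threading.Lock()             # guards permits and queue together

    def submit(self, update):
        """Return "RUN" if a permit was free, "QUEUED" if parked, "REJECTED" if the queue is full."""
        with self._lock:
            if self._free_permits > 0:
                self._free_permits -= 1
                return "RUN"
            if len(self._waiting) >= self._max_queue_size:
                return "REJECTED"  # ASSUMPTION: overflow behavior is not documented here
            self._waiting.append(update)
            return "QUEUED"

    def finish(self):
        """Call when a running update completes.

        If something is waiting, the permit is handed straight to it and that
        update is returned; otherwise the permit is freed and None is returned.
        """
        with self._lock:
            if self._waiting:
                return self._waiting.popleft()
            self._free_permits += 1
            return None
```

The model also shows why the queue size recommendation matters: during reconciliation one update per active task arrives in a burst, so a queue smaller than the active task count would overflow once all permits are taken.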

Improvements

API/Scheduler

  • 1666 - Set AVAILABILITY_ZONE on the default task environment.
  • 1681 - Provide option to prevent emails for scale events
  • 1653 - Endpoint to allow users to delete pending on-demand tasks
  • 1657 - Include disk resources when scoring offers
  • 1683 - Support usage of DefaultServerFactory to find port
  • 1692 - Wire up overrides for the S3 uploader path.
  • 1616 - Remove swagger jackson version override
  • 1600 - Report disk usage
  • 1690 - Second pass at Mesos containers, with volume sources and network mapping
  • 1682 - Ability to update authorized groups separately from full request
  • 1695 - Allow the cors bundle to be more configurable
  • 1697 - Support uploads to GCS
  • 1703 - Better webhook auth timeouts and exception messages
  • 1699 - add support for overriding S3 endpoint URL for the downloader
  • 1707 - Enable server side encryption params for uploads
  • 1717 - Collect cpu usage for tasks that have just started

UI

  • 1654 - Ability to specify quick links for requests
  • 1636 - New dashboard in Singularity + UI performance improvements
  • 1604 - Disk usage ui
  • 1687 - Correct copy button on InfoBlocks of Task tab. Also, remove clipboard.js.
  • 1704 - Add support for configurable navbar title links.
  • 1701 - Permalinks for bounce and scale modals
  • 1705 - Capitals search on Singularity requests page

Bug Fixes

  • 1609 - Use HostAndPort#getHostText instead of HostAndPort#getHost
  • 1658 - Corrected path for executor download fallback
  • 1659 - Make sandbox logs/ dir world-readable.
  • 1685 - Allow a request in FINISHED state to be redeployed
  • 1688 - Also check pending requests on the track task endpoint
  • 1693 - Properly send task destroy message to executor
  • 1698 - Don't show s3 logs error message as a pop up
  • 1708 - Better catch for statusUpdate exceptions
  • 1718 - Account for task level overrides in usage collection
  • 1719 - Remove unneeded call to unsafeProcessStatusUpdate, fix tasksPerOfferHost check
  • 1711 - Take system usage into account when scoring offers
  • 1696 - Fix nav bar for mobile view
  • 1715 - Use total system cpus, not totalCpus in system calculation
  • 1713 - Fix mobile menu responsiveness
  • 1726 - Don't operate directly on pending tasks during statusUpdate
  • 1721 - Fix cached offer checkin in resourceOffers

Documentation

  • 1700 - Fixed task webhook docs

Changes in 0.18.2

This is a bugfix release

Check out the 0.18.2 milestone to see new features / bugfixes in detail.

  • 1671 - Remove SingularityServiceBase module, no longer needed
  • 1674 - Different order for ui build profile

Changes in 0.18.1

This is a bugfix release

Check out the 0.18.1 milestone to see bugfixes in detail.

  • 1663 - Reimplement getPortByIndex in SingularityTask
  • 1664 - Add REASON_AGENT status update reasons
  • 1661 - Enable auth headers in old tailer
  • 1655 - Easier setup for local dev with SingularityService

Changes in 0.18.0

Check out the 0.18.0 milestone to see new features / bugfixes in detail.

Mesos 1

The release of Singularity 0.18.0 marks an update to mesos 1.x. Singularity will now utilize the mesos http api when connecting to mesos (native libraries no longer need to be present for Singularity to run). Configuration changes for Singularity Service are needed when upgrading so be sure to check out the upgrading to mesos 1 docs.

New Features/Updates

  • 1571 - Mesos 1.1.2
  • 1648 - Remove mesos dependency from SingularityBase

Improvements

  • 1631 - Support subdirectories in the s3 uploader
  • 1618 - Pass resource overrides to environment.
  • 1626 - Expose instance counts for singularity requests
  • 1629 - Add shell commands to the client
  • 1633 - Make the leniency of OPTIMISTIC tunable.
  • 1599 - Add logrotateFrequency field in new deploy form
  • 1647 - S3 Folders UI display

Bug Fixes

  • 1628 - Preserve instance id order when scaling down
  • 1634 - Catch errors in FetchRequestArgHistory
  • 1625 - Fixes to latest log file link
  • 1630 - Mesos backpressure
  • 1621 - look at new scale request before opting to old request
  • 1642 - Shade protobuf for SingularityClient/SingularityBase
  • 1649 - Don't error when ArtifactManager copies duplicate files

Changes in 0.17.1

This is a bugfix release

Check out the 0.17.1 milestone to see bugfixes in detail.

  • 1740 - Only send healthcheck object from ui when setting all fields

Changes in 0.17.0

Check out the 0.17.0 milestone to see new features / bugfixes in detail.

Note: Mesos 1.1.x support didn't quite make it into this release, but those changes are coming soon. In the meantime, here are the new features in Singularity 0.17.0.

New Features

  • 1592 - Resource Usage UI
  • 1576 - Evenly-spread task placement
  • 1570 - resource usage endpoint
  • 1610 - Highlight new files

Improvements

  • 1583 - Track average scheduling delay when accepting tasks.
  • 1578 - Allow WORKER <-> SERVICE type change if not load balanced
  • 1586 - Refactor tailer instance selection dropdown
  • 1615 - Add visibility around getChildren() calls.
  • 1597 - Also relocate guava-retrying in SingularityClient
  • 1598 - timestampSeconds -> timestamp in statistics object
  • 1594 - Leader cache everywhere
  • 1590 - package.json tune-up
  • 1603 - Reorganize request group view
  • 1611 - Add request group filter
  • 1608 - Add Nitro as adopter

Bug Fixes

  • 1591 - Fix flaky testSchedulerPriority test
  • 1593 - More flaky tests
  • 1588 - Force Guava cache maintenance before processing cached offers.
  • 1587 - Fix key error on requests page
  • 1595 - Only write to the leader cache when it's active.
  • 1614 - Fix cron length validation when creating new requests.

Thanks

  • @ssalinas
  • @darcatron
  • @kwm4385

Changes in 0.16.2

This is a bug fix release!

Check out the 0.16.2 milestone to see pull requests in detail.

Bug Fixes

  • 1605 - Deploy IDs should allow '.'

Changes in 0.16.1

This is a bugfix release

Check out the 0.16.1 milestone to see new features / bugfixes in detail.

  • 1580 - Fix mapping of docker fields in 'new deploy' ui

Changes in 0.16.0

Check out the 0.16.0 milestone to see new features / bugfixes in detail.

Important

The next release of Singularity (0.17.0) will contain a bump to mesos 1.1.2. Any critical bug fixes will be backported to 0.16 for a short period. See #1571 for more details on the upcoming upgrade

New Endpoints

1559 Added two additional endpoints to more easily track the lifecycle of a task. Both endpoints return a SingularityTaskState object with basic details about the current state of the task.

  • /track/task/{taskId} - for tracking active tasks which have already been assigned an id
  • /track/run/{requestId}/{runId} - For tracking tasks by runId (e.g. an ON_DEMAND task). This endpoint can also search pending tasks
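
A minimal Python sketch of building URLs for the two tracking endpoints above. The path templates come from the list; the api_base value (including any /api prefix or context path) and the helper names are assumptions for the example.

```python
from urllib.parse import quote


def track_task_url(api_base, task_id):
    """URL for tracking an active task that already has an id."""
    return f"{api_base.rstrip('/')}/track/task/{quote(task_id, safe='')}"


def track_run_url(api_base, request_id, run_id):
    """URL for tracking a task by runId (e.g. an ON_DEMAND run), including pending tasks."""
    return (f"{api_base.rstrip('/')}/track/run/"
            f"{quote(request_id, safe='')}/{quote(run_id, safe='')}")


if __name__ == "__main__":
    # Hypothetical base address and ids.
    base = "http://singularity.example.com/api"
    print(track_task_url(base, "my-task-id"))
    print(track_run_url(base, "my-request", "my-run-id"))
```

Path segments are percent-encoded so that ids containing characters such as "/" or spaces do not change the route.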

Improvements

  • 1560 - Separate timeouts for health check and task running
  • 1568 - Better bash escaping in docker runner script
  • 1515 - Add flag to trigger run of scheduled task on deploy
  • 1561 - Combine offers to schedule tasks more efficiently
  • 1562 - Retry failed client requests in SingularityClient
  • 1538 - Allow immediate runs in pending queue with deploy
  • 1482 - Better task balancing
  • 1557 - Immediate uploads on executor teardown
  • 1555 - Tailer improvements for tail_of_finished logs
  • 1549 - Baragon 0.5.0
  • 1526 - Add skip lb removal flag to DeleteRequestRequest
  • 1534 - Upgrade docker client
  • 1533 - Add mark as active/inactive endpoints to client
  • 1524 - Easy endpoint to check if user is authorized for request

Bug Fixes

  • 1572 - Fix bug when 'maxTotalHealthcheckTimeout' set
  • 1575 - Clarify expected runtime, add execution time limit in ui
  • 1569 - Remove extra scheduled entries from pending queue
  • 1564 - Fix values for the container type field so they deserialize properly.
  • 1553 - Release lock on bounce when bounced with no running instances
  • 1554 - Refuse to add cleaning tasks to LB
  • 1556 - Catch 404s when there is no task history in run now modal
  • 1535 - Ensure path is set on read endpoint
  • 1491 - Fix maxDeployIdSize and maxRequestIdSize validation

Documentation

  • 1567 - Changes to SMTP documentation & connection
  • 1548 - Fix markdown formatting
  • 1528 - Rephrase flagging host

Changes in 0.15.1

This is a bug fix release!

Check out the 0.15.1 milestone to see updates in detail.

Bug Fixes

  • 1520 - Fix for request stuck in deleting state
  • 1529 - Default cacheOffers to false and add docs
  • 1530 - Fix for Optional request bodies

Documentation

  • 1527 - Typo in documentation
  • 1531 - Bump version numbers in documentation

Changes in 0.15.0

Check out the 0.15.0 milestone to see new features / bugfixes in detail.

New Features

  • 1519 - Spread to all agents

Adds a best effort attempt to deploy a task on all agents. You can enable this by adding spreadAllAgentsEnabled: true to your Singularity yaml file. You can then set SPREAD_ALL_AGENTS for the placementStrategy on a SingularityRequest.

  • 1405 - Agent Usage Monitoring UI

In the Admin drop-down on the UI, there is now an Agent Usage monitoring page. This page contains visualizations of resource usage across all agents. The circular progress meters display the percentage of total resources utilized on all agents. The resource (cpu and memory) heat map displays the percentage utilized per agent.

  • 1456 - Bump to java 8/Jersey 2/Dropwizard 1

Singularity is now officially on java 8! As part of this upgrade we have also bumped some library versions, most notably upgrading to Jersey 2 and Dropwizard 1.

The Singularity log tailer got a makeover to improve performance and memory usage. Currently you can still toggle between old and new tailer versions, but this will be removed in future releases.

Improvements

  • 1516 - Include deploy marker and oldest deploy step in state
  • 1517 - More flexible match on filename from logrotate
  • 1511 - Add ability to prefix all email subjects
  • 1489 - Fix Task Search navigation and refresh
  • 1510 - Add getTaskByRunIdForRequest to client
  • 1483 - Switch not loaded to loading in first tailer screen
  • 1503 - Don't show logs panel if task never running
  • 1486 - Use web cache for api calls from ui
  • 1478 - Add leader cache
  • 1481 - Allow artifact list to be specified on deploys
  • 1507 - Ability to provide HttpConfig for SingularityClient
  • 1499 - Warn when removing a request with lb configs
  • 1487 - Add a message to the request history when scaling
  • 1474 - Allow expensive endpoints to be disabled for non-admins
  • 1477 - Better zk cleanup for removed requests
  • 1480 - Put a limit on number of agents to decommission at once
  • 1479 - Reduce timeout on requests to task sandboxes
  • 1472 - Add spacing line at the bottom of logs
  • 1476 - Better zk performance logging
  • 1475 - Singularity scheduler lock logging
  • 1465 - Pass job user as environment variable to task
  • 1473 - More disabled actions for pollers
  • 1466 - Add links between tailer versions
  • 1450 - Allow new tabs from global search page
  • 1452 - Add a task credit system
  • 1400 - Add agent and task usage tracking inside singularity ZK
  • 1447 - Set an optional max number of active tasks for ON_DEMAND requests
  • 1442 - Display previous overridden cleanups
  • 1451 - Additional disaster actions
  • 1453 - Remove the async status update queue

Bug Fixes

  • 1518 - Be sure to close Graphite properly
  • 1505 - Don't include cleaning tasks in instance count
  • 1513 - Write files in subdirectories to splat path
  • 1504 - Don't redirect until done fetching active tasks
  • 1502 - MD5 is case insensitive
  • 1492 - Correctly redirect in ui when no instances are found
  • 1498 - Forbid health checks longer than kill time
  • 1501 - Allow copying from the JSON button dialog
  • 1490 - Mark as not bouncing if paused before bounce completes
  • 1470 - Fix tailer when reloading file
  • 1469 - Add jita access for updating readWriteAccessGroup
  • 1467 - Fix task direct link to logs
  • 1428 - Prevent flapping agent from rejoining cluster
  • 1443 - Ignore 404s for /priority/freeze endpoint

Documentation

  • 1471 - Add in a note about running singularity on docker for mac

Changes in 0.14.1

This is a bug fix release.

Check out the 0.14.1 milestone to see new features / bugfixes in detail.

Bug Fixes

  • 1458 - Remove expiring scale on deploy finish
  • 1457 - Don't allow multiple bounces for the same request
  • 1461 - Interchange argument order in rootComponent
  • 1462 - extract in mesos uri defaults to true, executable defaults to false

Documentation

  • 1459 - fix README formatting

Changes in 0.14.0

Check out the 0.14.0 milestone to see new features / bugfixes in detail.

Important

The next release of Singularity (0.15.0) will include an upgrade to java 8.

Configuration Changes

#1391 includes a rework of some of the S3 settings in Singularity. If you use the SingularityExecutor or SingularityExecutorCleanup modules and use the S3 upload features, you will need to update your configuration. The fields for specifying which files to upload have been moved out of the SingularityExecutor. The example below shows all fields that would move.

Old Configuration (gets removed from SingularityExecutor and SingularityExecutorCleanup yaml files)

executor:
  s3UploaderBucket: my-logs-bucket
  s3UploaderKeyPattern: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3UploaderAdditionalFiles:
    - access.log
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000

New Configuration (if not already present for use with S3 log searching)

# in SingularityExecutorCleanup yaml configuration
executorCleanup:
  defaultS3Bucket: my-logs-bucket
  s3KeyFormat: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000
  s3UploaderAdditionalFiles:
   - filename: access.log
     # The default directory in the executor was set to 'logs', now it must be manually specified
     # If not specified, the directory to search for log files will be the task app directory in the sandbox
     directory: logs

# in SingularityService yaml configuration
s3:
  s3Bucket: my-logs-bucket
  s3KeyFormat: "%requestId/%Y/%m/%taskId_%index-%s-%filename"
  s3StorageClass: "STANDARD_IA"
  applyS3StorageClassAfterBytes: 75000
  s3UploaderAdditionalFiles:
   - filename: access.log
     # The default directory in the executor was set to 'logs', now it must be manually specified
     # If not specified, the directory to search for log files will be the task app directory in the sandbox
     directory: logs

NOTE - To upgrade smoothly, it is strongly recommended to deploy SingularityService and the SingularityExecutorCleanup before deploying the SingularityExecutor

New Features

  • 1306 - Smarter Healthchecks

Singularity healthchecks are now split into two phases. Previous settings will continue to function but are deprecated. Health check options are now specified in the healthcheck object on the SingularityDeploy. See the new documentation for more on updates to healthchecks.

Improvements

  • 1435 - Updated mysql index on startedAt
  • 1449 - Improve status logging
  • 1434 - Match mesos artifact defaults
  • 1424 - Rename metric for immediate uploaders
  • 1427 - Ability to specify params on request history in client
  • 1429 - Standardize on toString/hash/equals formats
  • 1402 - Introduce offer cache to allow better resource allocation
  • 1399 - Add ability to upload files immediately to S3
  • 1417 - Ability to specify cache on mesos artifact
  • 1377 - Report currentActiveInstances on SingularityDeployProgress
  • 1410 - Use an ldap cache
  • 1414 - Add style to change display of overflow text in run now dialog
  • 1415 - Replace existing global search with search from requests page
  • 1411 - Logging clean up for mesos protos
  • 1334 - Sanitize file data before parsing with Jackson
  • 1386 - Add a DELETING state for requests being deleted
  • 1388 - send graphite datapoints with optional tags
  • 1391 - S3 Search Improvements
  • 1401 - Ability to override docker workdir
  • 1398 - Clarifications on S3ArtifactSignature
  • 1376 - Update request for 'deploy to unpause' before saving pending deploy
  • 1397 - Shade com.google.thirdparty
  • 1375 - Allow setting S3 storage class at upload time
  • 1373 - customizable --use-compress-program for tar
  • 1392 - Optionally skip addition of extra s3 metadata
  • 1382 - Add maxTasksPerOffer at request level
  • 1367 - Additional settings for history purger
  • 1360 - Bounce updates for placement and scaling
  • 1369 - Additional threshold for deleting task history row data in sql

Bug Fixes

  • 1437 - Defer loading log files till after page load
  • 1445 - use implicit acks when not offloading status updates to another thread
  • 1433 - Properly roll back from an overdue incremental deploy
  • 1426 - Prevent building duplicate immediate uploaders
  • 1412 - fix duplicate exit checker, add longer initial task wait
  • 1423 - Update docker parameters on deploy form
  • 1409 - Properly allow destroy task from ui
  • 1407 - Forbid more characters from request/deploy IDs
  • 1418 - Fixes for Immediate Uploaders
  • 1403 - Import formModal on racks page
  • 1371 - S3Downloader - Block on download only
  • 1387 - No retry for scheduled tasks run on-demand through UI
  • 1380 - Remove compressed log viewing endpoint
  • 1381 - remove all compressed log viewing code
  • 1372 - Update wrapper script for shell commands to read correct pid
  • 1368 - Read group from task data if present

Documentation

  • 1365 - Fix PATH in API annotations

Changes in 0.13.0

Check out the 0.13.0 milestone to see new features / bugfixes in detail.

New Features

1342 - Introduce expiring machine state changes

On the Agents and Racks pages in the UI, there is now an Expiration field present when initiating a state change. When a duration is specified here, the agent or rack will revert to the specified state after that time has elapsed. Note that if another state change takes place before the expiration time, the expiration will only remain active if the resulting state transition is still valid.

For example, if decommissioning an agent with an expiration of 1 minute and a revertToState of ACTIVE, it is possible the agent will finish decommissioning in that time, so its state would change from DECOMMISSIONING to DECOMMISSIONED. In this case, the expiring action will remain active, because DECOMMISSIONED -> ACTIVE is still a valid state change.
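
The check described above can be sketched roughly as follows. This is a minimal illustration, not Singularity's actual validation logic: the transition table is a hypothetical subset, and only the DECOMMISSIONED -> ACTIVE entry is confirmed by the text (DECOMMISSIONING -> ACTIVE is implied by the example).

```python
# Hypothetical subset of allowed transitions, for illustration only (assumption).
# Only DECOMMISSIONED -> ACTIVE is stated in the notes above; the full rule set
# lives in Singularity itself and is not reproduced here.
ILLUSTRATIVE_VALID_TRANSITIONS = {
    ("DECOMMISSIONING", "ACTIVE"),
    ("DECOMMISSIONED", "ACTIVE"),
}


def expiration_still_applies(current_state, revert_to_state,
                             valid_transitions=ILLUSTRATIVE_VALID_TRANSITIONS):
    """Return True if reverting from current_state to revert_to_state is still allowed."""
    return (current_state, revert_to_state) in valid_transitions


# The agent finished decommissioning before the expiration fired.
print(expiration_still_applies("DECOMMISSIONED", "ACTIVE"))
```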

1219 - Starred requests persistence

User settings (so far only starred requests) are now stored server-side instead of client-side in localStorage. If a username is not found, you will be prompted for a username when you first load the ui. Any starred requests you currently have in localStorage will automatically be migrated to the server-side storage.

Improvements

  • 1355 - Update uuid to version 3.0.0
  • 1352 - Check isEmpty on attributes for more accurate message
  • 1354 - Count lost tasks with a Meter
  • 1347 - Nicer format for disaster email
  • 1259 - Alternate compression formats and viewing compressed files in UI
  • 1348 - Add tests for SingularityUI
  • 1344 - Also grab containerId when grabbing directory

Bug Fixes

  • 1353 - To string fix for Deploy and Builder
  • 1351 - Only allow patch versions of webpack
  • 1349 - Fixes for run now dialog
  • 1345 - Fix when bounce alert banner is shown
  • 1341 - Fix custom executor command on new deploy page
  • 1343 - Fix js TypeError on task detail page
  • 1332 - Ensure quotes and new lines are escaped in echo

Documentation

  • 1350 - Docs updates and addition of missing swagger annotations

Changes in 0.12.0

Check out the 0.12.0 milestone to see new features / bugfixes in detail.

Migrations

#1283 (Change deployHistory bytes to a MEDIUMBLOB), #1316 (Expand requestHistory.createdAt column to millisecond precision), and #1319 (Make the history purger query more efficient) contain migrations.

If you have a large number of tasks in your database (e.g. more than 100k), it is possible that the last of these migrations (#1319) could be very slow when run via liquibase. If this is a concern, we recommend using pt-online-schema-change to run your migration.

To run your migration with pt-online-schema-change, use the following command, which is equivalent to liquibase migration 14.

pt-online-schema-change \
  --user=(your db user) \
  --ask-pass \
  --alter "ADD COLUMN purged BOOLEAN NOT NULL DEFAULT false, ADD KEY purged (requestId, purged, updatedAt)" \
  --execute \
  D=(your database name),t=taskHistory

Improvements

  • #1135 Surface taskReconciliationStartedAt in SingularityState object
  • #1217 Request group level actions in the ui
  • #1221 Get count of results for blended history calls
  • #1226 Ability to have multiple readWrite groups
  • #1227 Ability to redeploy from the ui
  • #1244 Add team requests to dashboard
  • #1264 Execution timeout for tasks
  • #1268 Process status updates in separate thread
  • #1284 Show launching tasks separate from active in status
  • #1290 More thorough validation for scale changes
  • #1291 Better messages for sentry reporting
  • #1293 Cache files with md5, add more detail for cache misses
  • #1295 Clean up tests + make travis build more reliable
  • #1298 Updates to agent information and reconciling agents
  • #1301 Add readWriteGroups to request form ui
  • #1304 More even distribution among racks
  • #1308 Get the SingularityClient up to date
  • #1310 Shortcut to task by instance number in ui
  • #1314 Add global read only groups
  • #1315 Ability to search task history by runId
  • #1320 Ability to run shell command before killing task
  • #1324 Extra filtering on fuzzy match
  • #1325 Refresh task lists appropriately
  • #1330 Show dropdown of previous command line args in run-now modal
  • #1333 Remove OOMKiller and LogWatcher Modules
  • #1338 Surface SingularityPendingRequestParent response from SingularityClient.runSingularityRequest

Bug Fixes

  • #1278 Only send email if the list of active disasters has changed
  • #1281 Make critical task lag require more than a single overdue task
  • #1282 Fix finished log link
  • #1289 Don't rely on Singularity active requests list when searching historical logs
  • #1294 Fix shell command modal file watching
  • #1297 Don't show loading forever on empty log files
  • #1303 Consider tasks with skipped healthchecks in cleaner
  • #1311 Add pending request on failed deploy as well
  • #1317 Use saveTaskCleanup() instead of createTaskCleanup() for deploys
  • #1321 Make deleteTaskHistoryBytes property do what it says it does
  • #1326 Fix js TypeError on task detail page
  • #1328 Fix props and prop types in disasters page
  • #1331 Make task destroy work from the ui

Documentation

  • #1285 Add versions in readme

Changes in 0.11.0

Check out the 0.11.0 milestone to see new features / bugfixes in detail.

New Features

Improvements

  • #1272 - Show seconds in timestamps for healthchecks
  • #1271 - Update files not found message
  • #1262 - Support for setting user in default executor
  • #1248 - Reorganize task label colors
  • #1240 - Also allow DELETE/PUT when using CORS filter
  • #1137 - SingularityService configuration & DC/OS support
  • #1197 - Guarantee durationMillis is present in getExpiringBounce response
  • #1191 - Support for task id var substitution in env
  • #1169 - Support multiple docker parameters and task labels
  • #1100 - Add method for grabbing a snapshot of the master metrics to MesosClient
  • #1091 - Support overriding the log level in tests
  • #1033 - Support task history search on updatedAt
  • #1030 - Introduce the concept of a SingularityRequestGroup

Bug Fixes

  • #1273 - Resolve the correct log path based on taskAppDirectory
  • #1267 - Remove bogus label from file browser actions column
  • #1266 - Select the correct healthcheck for UI alert banner
  • #1263 - Undo click-to-copy changes
  • #1256 - Fix typo in task alerts
  • #1252 - Fix for negative durations
  • #1246 - Make duration fields string inputs again
  • #1176 - Remove the now > start check in RFC5545

Documentation

  • #1265 - Update agent-extras.md

Changes in 0.10.1

This is a bug fix release.

Check out the 0.10.1 milestone to see bugfixes in detail.

UI Fixes

  • #1235 Properly handle deleted requests
  • #1236 UI updates for default config setup. After the release of 0.10.0, several users reported being unable to navigate directly to a page of the Singularity UI. This was due to an unanticipated change in how URLs were routed in the Backbone to React migration. This has been fixed in 0.10.1.
  • #1238 Use the relative path for appRoot
  • #1234 Run now UI service restoration
  • #1239 Add href to nav items
  • #1242 Sort environment variables alphabetically
  • #1232 UI development docs Backbone/Coffeescript => es6/React/Redux

Other Fixes and Improvements

  • #1188 Sort task history updates by ExtendedTaskState enum ordinal, not timestamp
  • #1189 Better cleanup of incremental actions
  • #1194 Deploy failure messages for non-task-specific failures
  • #1212 Allow searching of logs for deleted requests

Changes in 0.10.0

Check out the 0.10.0 milestone to see new features / bugfixes in detail.

UI Rewrite (Backbone/Coffeescript -> React/Redux/JS6)

This rewrite was composed of a number of pull requests, but the consolidated diff can be seen in #1077. Other UI changes include:

  • #1195 - Request form improvements
  • #1223 - Tags input tweaks (for cmd line input)
  • #1192 - Code cleanliness improvements
  • #1225 - Fix the back button when navigating through files on the task detail page
  • #1211 - Render every row on dashboard tables
  • #1210 - Don't show the wait for replacement task option when killing tasks in certain request types
  • #1215 - Check for presence of promise before attempting to catch errors
  • #1207 - UI support for absent deploy field
  • #1209 - Name all the modals that don't already have a name
  • #1208 - Fix duration field overflowing modal in firefox
  • #1206 - Refresh the request detail page after performing actions
  • #1199 - Sentry support
  • #1200 - Aggregate Tailer fix
  • #1190 - Run now fixes
  • #1202 - Add case-sensitive paths plugin
  • #1196 - Request groups page
  • #1164 - Request and Deploy Title Copy Link
  • #1178 - Replace dashes with underscores when searching for matching hosts
  • #1186 - Add ability to show task disk resource if configured to do so
  • #1182 - Add timezone field and kill dropdowns if there are less than 5 options
  • #1185 - Add JSON Button to the Racks and Agents pages
  • #1179 - Handle 404 properly
  • #1158 - Fix the request detail page bounce messages
  • #1165 - Improvements to task search
  • #1156 - Truncate long tags; Hover trigger to show the whole thing

Improvements

  • #1170 - Add TASK_ID environment variable with the singularity task_id
  • #1150 - Add support for timezone field for scheduled requests
  • #1138 - Docker auth config in custom executor
  • #1006 - Changes to user requested task kills in custom executor

Bug Fixes

  • #1184 - Fail differently if request data not present during deploy check
  • #1171 - Do not remove from LB if not yet added

Changes in 0.9.0

Check out the 0.9.0 milestone to see new features / bugfixes in detail.

Configuration Changes and Deprecations

  • #1000 Tweaks to s3UploaderAdditionalFiles
    • The SingularityExecutor configuration field s3UploaderAdditionalFiles now supports a directory field to specify the directory to search for files to upload. However, the default for this directory is now the task directory (created when using SingularityExecutor); previously it was the logs directory within the task directory. To mimic the previous behavior, add a directory of logs to any existing s3UploaderAdditionalFiles settings.
  • #1166 Change API path "/skipHealthchecks" to "/skip-healthchecks"
    • The API path /skipHealthchecks has been renamed to /skip-healthchecks to be more consistent with the rest of the api and avoid using camel case in paths. The previous endpoint is deprecated and will be removed in an upcoming release

Improvements

  • #993 / #1020 Add support for disk resource
  • #1048 Updated image tags in compose files
  • #1093 Add downloadUrl field, for explicitly downloading from S3
  • #1099 include offer ID in resourceOffers() logging statement
  • #1110 Add 'extraScript' field to UI configuration for extra analytics/etc
  • #1045 Configurable max metadata message length
  • #1084 Only show wait for replacement checkbox if a WORKER or a SERVICE
  • #1134 Rename pubish_gitbook.sh to publish_gitbook.sh
  • #1118 Always launch docker containers in the correct parent cgroup
  • #1051 Configurable max attempts for docker pull in custom executor
  • #1155 Enable use of docker parameters in custom executor script
  • #1167 Expiring pause deletion responsibility shift
  • #1109 Allow resource override to be set on a run now request

Bug Fixes

  • #1088 Avoid race condition in deploy to unpause
  • #1108 Fix docker compose yaml to get hostname properly
  • #1083 Remove extraneous console.logs
  • #1070 Bump to version 0.28.2 to fix MESOS-5449
  • #1096 Fix filename truncation in sandbox endpoint
  • #1082 Fix broken shell log link
  • #1085 Fix table paging
  • #1145 Fix default hostname in compose yaml
  • #1067 Bump moment minor version to mitigate CVE-2016-4055
  • #1069 Concurrent run once tasks
  • #1081 Handle new lines in docker environment variables correctly
  • #1090 Fix docker integration environment
  • #1161 Request - Null Owner fix
  • #1163 RETRY for scheduled task should keep cmdLineArgs
  • #1175 Remove requirement of BRIDGE mode to specify port map

Changes in 0.8.0

Check out the 0.8.0 milestone to see new features / bugfixes in detail.

New Features

  • #971 Ability to update request data upon successful deploy
  • #1071 Initial support for RFC5545 schedules

Improvements

  • #1021 Convert racks and agents pages to react
  • #1031 Better request and deploy id validation
  • #1032 Allow glob matching in addition to fuzzy matching on Requests and Tasks page
  • #1039 UI improvements with global search tool
  • #1043 Add support for hourly logrotate
  • #1057 Move back to react-typeahead mainline version instead of HubSpot fork
  • #1059 Set proper Content-Type and Content-Encoding for s3 uploads
  • #1060 Setup babel for ES6 and JSX transformation
  • #1064 Shade guava for SingularityClient
  • #1073 Build the UI as its own module

Bug Fixes

  • #975 Remove cleanup after bounce expire
  • #1044 Only show task killed in message for healthchecks if not in running state
  • #1068 Make sure to remove obsolete pending requests
  • #1078 Typo: "Settingss" -> "Settings" on the Deploy form

Changes in 0.7.1

This is a bug fix release.

Check out the 0.7.1 milestone to see bugfixes in detail.

  • #1034 Change package.json 'vex' dependency to 'vex-js'
  • #1049 Don't set shell if arguments list is empty, ability to override shell
  • #1050 Add polyfills for Object.assign and Promise

Changes in 0.7.0

This release bumps Singularity’s Mesos dependency from version 0.23.0 to 0.28.1. Check out the documentation on the mesos site for more information about upgrading your mesos cluster to 0.28.1.

#994 - Upgrade mesos version to 0.28.1


Changes in 0.6.2

This is a bug fix release.

Bug Fixes

  • #1078 Typo: "Settingss" -> "Settings" on the Deploy form
  • #1068 Make sure to remove obsolete pending requests
  • #975 Remove cleanup after bounce expire

Changes in 0.6.1

This is a bug fix release.

Check out the 0.7.1 milestone to see bugfixes in detail. (Changes for 0.6.1 are the same as 0.7.1)

  • #1034 Change package.json 'vex' dependency to 'vex-js'
  • #1049 Don't set shell if arguments list is empty, ability to override shell
  • #1050 Add polyfills for Object.assign and Promise

Changes in 0.6.0

Check out the 0.6.0 milestone to see new features / bugfixes in detail.

New Features

  • #879 Add ability to post task metadata to a
  • #996 Webhooks UI

Improvements

  • #916 New task checker should respect deploy healthcheck retry settings
  • #933 #979 Email, tail logs from bottom of file, find the first error
  • #950 Allow multiple on demand requests to be queued up
  • #955 Optional redirect in the UI when receiving a 401 from the api
  • #959 Add thread checker types
  • #974 Log tailer rewrite
  • #980 Add template, domains and additionalRoutes lb fields
  • #981 #988 #1017 Improvements to task page UI
  • #985 Better failure message when creating a scheduled request with no schedule
  • #995 Only suggest an even number across racks for SERVICE type
  • #1011 Auto-scroll to bottom of run-now dialog
  • #1012 Make email colors match UI colors
  • #1014 Update layout for 4 logs in aggregate tailer
  • #1015 Show only log name in aggregate tailer title
  • #1016 Don't show over/under provisioned on status page if they are 0
  • #1022 Aggregate tailer tooltip for host and instance number
  • #1023 Dashboard UI improvements
  • #1026 Better handling of logs not found in aggregate tailer
  • #1028 Show TASK_ERROR state as danger in the ui
  • #1036 Add a defaultPortMapping which exposes all mesos provided ports

Bug Fixes

  • #984 Corrected task failure count in deploy failure info
  • #990 Fix agents link not filtering tasks properly
  • #991 Fix missing user being treated as admin
  • #996 Don't get attributes from missing deployProgress
  • #998 Enable buttons on dashboard paused requests table
  • #1004 Convert SingularityMailer to an interface to avoid errors with missing smtp config
  • #1008 Don't trigger cooldown if TASK_LOST reason is invalid offers
  • #1024 Don't show edit request button if hideNewRequestButton is true
  • #1027 CPU usage bar transitions back to non-error colors
  • #1037 Avoid IndexOutOfBounds error when scaling down during a deploy

Documentation

  • #1005 Fix configuration documentation

Changes in 0.5.0

Check out the 0.5.0 milestone to see new features / bugfixes in detail.

New Features

#839 enables better task history searching via the Singularity UI and API. (Also #890, #932, #935)

As of 0.5.0, Singularity has better support for searching historical tasks. A global task search endpoint was added:

/api/history/tasks -> Retrieve the history sorted by startedAt for all inactive tasks.

The above endpoint as well as /api/history/request/{requestId}/tasks now take additional query parameters:

  • requestId: Optional request id to match (only for /api/history/tasks endpoint as it is already specified in the path for /request/{requestId}/tasks)
  • deployId: Optional deploy id to match
  • host: Optional host (agent host name) to match
  • lastTaskStatus: Optional ExtendedTaskState to match
  • startedAfter: Optionally match only tasks started after this time (13 digit unix timestamp)
  • startedBefore: Optionally match only tasks started before this time (13 digit unix timestamp)
  • orderDirection: Sort direction (by startedAt), can be ASC or DESC, defaults to DESC (newest tasks first)
  • count: Maximum number of items to return, defaults to 100 and has a maximum value of 1000
  • page: Page of items to view (e.g. page 1 is the first count items, page 2 is the next count items), defaults to 1
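
To make the parameter list above concrete, here is a minimal sketch that assembles a query URL for the global task search endpoint. The host name and the helper function are hypothetical; only the path and parameter names come from the list above.

```python
from urllib.parse import urlencode

# Query parameters documented above for /api/history/tasks.
ALLOWED_PARAMS = {
    "requestId", "deployId", "host", "lastTaskStatus",
    "startedAfter", "startedBefore", "orderDirection", "count", "page",
}


def task_history_url(base_url, **params):
    """Build a task history search URL (hypothetical helper, not part of SingularityClient)."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported query parameters: {sorted(unknown)}")
    if params.get("count", 100) > 1000:
        raise ValueError("count has a maximum value of 1000")
    if params.get("orderDirection", "DESC") not in ("ASC", "DESC"):
        raise ValueError("orderDirection must be ASC or DESC")
    # Sort the parameters so the generated URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{base_url}/api/history/tasks" + (f"?{query}" if query else "")


# Hypothetical host; timestamps are 13 digit unix timestamps (milliseconds).
print(task_history_url(
    "http://singularity.example.com",
    host="agent1",
    startedAfter=1460000000000,
    orderDirection="ASC",
    count=50,
))
```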

For clusters using a database with a large number of tasks in the history, a relevant configuration option, taskHistoryQueryUsesZkFirst, has been added to the base Singularity Configuration. This option chooses between efficiency and exact ordering when searching through task history. It defaults to false.

  • When false the setting will prefer correct ordering. This may require multiple database calls, since Singularity needs to determine the overall order of items based on persisted (in the database) and non-persisted (still in zookeeper) tasks. The overall search may be less efficient, but the ordering is guaranteed to be correct.

  • When true the setting will prefer efficiency. In this case, it will be assumed that all task histories in zookeeper (not yet persisted) come before those in the database (persisted). This results in faster results and fewer queries, but ordering is not guaranteed to be correct between persisted and non-persisted items.
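
The trade-off between the two settings can be pictured with a simplified model. This is only a sketch of the ordering behavior described above, not how Singularity actually pages through zookeeper and the database; the function and the sample data are hypothetical.

```python
import heapq


def combine_history(zk_tasks, db_tasks, zk_first):
    """Combine two task lists, each already sorted newest first.

    Each task is a (startedAt, taskId) tuple. Simplified model only (assumption).
    """
    if zk_first:
        # Efficient: assume everything still in zookeeper is newer than the database rows.
        return zk_tasks + db_tasks
    # Exact: merge both sources by startedAt, newest first.
    return list(heapq.merge(zk_tasks, db_tasks, key=lambda task: task[0], reverse=True))


zk = [(300, "task-c"), (100, "task-a")]   # not yet persisted
db = [(200, "task-b")]                    # already persisted

print(combine_history(zk, db, zk_first=True))   # task-a appears before the newer task-b
print(combine_history(zk, db, zk_first=False))  # strictly newest first
```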

Task Search DB Migration

Before deploying task search (release 0.4.12) it is necessary to run liquibase migrations 10 and 11. These migrations add the necessary columns and indexes, and backfill data for those new columns in the taskHistory table so that searching can be done efficiently and on more fields. If you have a large number of tasks in your database (e.g. more than 100k), it is possible that these migrations could be very slow when run via liquibase. If this is a concern, we recommend using pt-online-schema-change to run your migration.

To run your migration with pt-online-schema-change, use the following command, which is equivalent to liquibase migration 10.

pt-online-schema-change \
  --user=(your db user) \
  --ask-pass \
  --alter "ADD KEY startedAt2 (startedAt, requestId), ADD COLUMN host VARCHAR(100) CHARACTER SET ASCII NULL,  ADD COLUMN startedAt TIMESTAMP NULL, DROP KEY deployId, ADD KEY startedAt (requestId, startedAt), ADD KEY lastTaskStatus (requestId, lastTaskStatus, startedAt), ADD KEY deployId (requestId, deployId, startedAt), ADD KEY host (requestId, host, startedAt)" \
  --execute \
  D=(your database name),t=taskHistory

This will complete liquibase migration 10. To bring the liquibase table up to date, you can run the db fast-forward command, which creates the entry in the migrations table for the next migration to run. So, if you previously ran migration 9, it will only create the entry for migration 10.

java -jar SingularityService/target/SingularityService-.jar db fast-forward ./config.yaml --migrations migrations.sql

The update statements to backfill the newly added columns can also possibly be slow when you have a large number of tasks (e.g. more than 100k) in your history. There are a few options you can use to help this migration run more smoothly:

  • Add an additional ADD KEY host2 (host) to the end of the --alter statement before running the pt-online-schema-change above. This index is not necessary for search, but will allow the host field backfill to run much faster
  • Run each of the update statements in a loop with an additional LIMIT XXXX added, where XXXX is some number (for example 5000). This way you are not trying to update the entire table in a single query (which would lock the table until the query was done), but are updating it in chunks. You can continue running this loop until the migrations are done.
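
The chunked-update loop from the second option might look roughly like the sketch below. It uses an in-memory SQLite table purely as a stand-in so the example is runnable; the table contents and the 'placeholder-host' value are hypothetical, and the real backfill statements from the migration are not reproduced here. On MySQL the UPDATE statement itself accepts a LIMIT clause, so the rowid subquery would not be needed.

```python
import sqlite3


def backfill_in_chunks(conn, chunk_size):
    """Fill NULL host values chunk_size rows at a time; returns the number of batches run."""
    batches = 0
    while True:
        # SQLite stand-in for "UPDATE ... LIMIT n": restrict the update to a limited set of rowids.
        cursor = conn.execute(
            "UPDATE taskHistory SET host = 'placeholder-host' "
            "WHERE rowid IN (SELECT rowid FROM taskHistory WHERE host IS NULL LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()  # commit each chunk so locks are held only briefly
        if cursor.rowcount == 0:
            break  # nothing left to backfill
        batches += 1
    return batches


# Hypothetical data: 12 rows that still need the host column filled in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taskHistory (taskId TEXT PRIMARY KEY, host TEXT)")
conn.executemany(
    "INSERT INTO taskHistory (taskId) VALUES (?)",
    [(f"task-{i}",) for i in range(12)],
)
print(backfill_in_chunks(conn, 5))  # batches of 5, 5 and 2 rows
```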

If you ran the update for migration 11 manually, you can run the db fast-forward command from above again in order to update the migrations table for liquibase.

As a last note, if there is a gap of time between running these migrations and deploying the new version of Singularity, it is wise to run the backfill queries manually an additional time. If any tasks have persisted to the database between the initial migration run and the time of deploying the new Singularity version, those tasks will not have the host and startedAt columns filled in.

#817 enables incremental deploys. This allows the user to deploy any portion of instances at a time, and either pause for a pre-determined time, or wait for a manual signal to start deploying the next portion of instances.

Incremental Deploys

As of 0.5.0 Singularity supports an incremental deploy for finer-grained control when rolling out new changes. This deploy is enabled via a few extra fields on the SingularityDeploy object when starting a deploy:

  • deployInstanceCountPerStep: Deploy this many instances at a time until the total instance count for the request is reached (Optional, default is all instances at once)
  • deployStepWaitTimeMs: Wait this many milliseconds between deploy steps before continuing to deploy the next deployInstanceCountPerStep instances (Optional, default is 0, i.e. continue immediately)
  • autoAdvanceDeploySteps: Automatically advance to the next target instance count after deployStepWaitTimeMs milliseconds (Optional, defaults to true). If this is false, then manual confirmation will be needed to move to the next target instance count. This can be done via the UI.

Example

TestService is currently running 3 instances. During the next deploy, you want to replace only 1 of these instances at a time and have Singularity wait at least a minute after deploying one so you can verify that everything works as expected. The following fields can be added to the deploy json to accomplish this:

deployInstanceCountPerStep: 1
deployStepWaitTimeMs: 60000
autoAdvanceDeploySteps: true
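As a rough sketch, the three fields could be merged into an existing deploy object before it is serialized. The requestId and id values below are placeholders invented for illustration; only the three incremental fields come from the description above.

```python
import json

# Placeholder deploy; the "requestId" and "id" values are made up for illustration.
deploy = {
    "requestId": "TestService",
    "id": "example-deploy-1",
}

incremental_fields = {
    "deployInstanceCountPerStep": 1,  # replace one instance per step
    "deployStepWaitTimeMs": 60000,    # wait one minute between steps
    "autoAdvanceDeploySteps": True,   # continue without manual confirmation
}

deploy.update(incremental_fields)
deploy_json = json.dumps(deploy, indent=2)
print(deploy_json)
```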

When the deploy starts, Singularity will start 1 (deployInstanceCountPerStep) instance from the new deploy (the 3 old instances will still be running). Once the new task is determined to be healthy, a few things happen:

  • Singularity will add the instance from the new deploy to the load balancer (if applicable)
  • Singularity will shut down 1 (deployInstanceCountPerStep) of the instances from the old deploy after removing it from the load balancer (if applicable)
  • Singularity will start counting down the 60000 ms until it launches the next deployInstanceCountPerStep instances

Once the deployStepWaitTimeMs of wait time has elapsed, Singularity will start this process again, launching a second task for the new deploy, waiting until it is healthy, then shutting down a task from the old deploy. This will continue until the deploy fails, the deploy is cancelled, or all instances are part of the new deploy and it succeeds.
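One way to picture the sequence of steps is as a list of target instance counts for the new deploy. The helper below is a simplified model of the behavior described here, not Singularity's scheduler code; the function name is hypothetical, and capping the last step at the request's total is an assumption based on the field description.

```python
from typing import List


def target_instance_counts(total_instances: int, per_step: int) -> List[int]:
    """Return the new-deploy instance count reached after each deploy step."""
    if total_instances <= 0 or per_step <= 0:
        raise ValueError("total_instances and per_step must be positive")
    targets = []
    current = 0
    while current < total_instances:
        # Never go past the total instance count for the request.
        current = min(current + per_step, total_instances)
        targets.append(current)
    return targets


# The TestService example: 3 instances, replaced 1 at a time.
steps = target_instance_counts(3, 1)
print(steps)
```

With autoAdvanceDeploySteps enabled, each entry after the first would be reached once the previous step is healthy and deployStepWaitTimeMs has elapsed.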

A few more things to note about the incremental deploy process:

  • If the deploy fails or is cancelled, Singularity replaces any missing instances from the old deploy and makes sure they are healthy before shutting down active/healthy instances from the new deploy. (i.e. you will never be under capacity)
  • At any time, it is possible to advance the deploy to another target instance count via the UI or API. In other words, you can skip the remaining deployStepWaitTimeMs, skip steps of the deploy, or even decrease the instance count to roll back a step.

Improvements

  • #885 Allow destination targets to be specified
  • #891 Use pushState for log line links
  • #892 Cut down on amount of task data stored in ZK
  • #893 Better deploy failure information
  • #894 Task kill checkbox label
  • #906 Suggest an even number across racks for rackSensitive requests (Also, #949)
  • #910 Basic framework auth
  • #912 Upgrade basepom version
  • #923 Goodbye brunch, hello gulp
  • #925 Get memory info from deploy object instead of mesos task resources
  • #934 Don't display healthcheck message for tasks in state TASK_LOST or TASK_FINISHED
  • #936 Add ability to search through tasks table by request type
  • #937 Refactor HistoricalTasks Collections into One
  • #938 Fix Duplicates in Fuzzy Search
  • #939 Add mailhog to compose-dev
  • #943 Don't capture ctrl+t
  • #944 More descriptive error message on http 200 with no json
  • #951 After submitting or editing a request, redirect to that request's page
  • #964 Bump to latest horizon version
  • #965 Warning if singularity has no leader or leader has no mesos connection
  • #966 More consistent terminology and deploy failure on task page
  • #968 Better Agent Decommissioned / Offline Communication
  • #969 Add extra command line args to task finished email
  • #970 Task badge color for TASK_KILLED is now default (grey)
  • #977 Run now dialog improvements

Bug Fixes

  • #902 Fix time display in emails
  • #903 Ensure run once tasks don't get launched at startup
  • #913 Don't display healthcheck notification if task state is failed
  • #914 Include docker exception cause in thread checker
  • #927 Missing an early return for numPorts == 0
  • #930 Remove min-height from fileBrowserSubview
  • #952 Don't Lose Currently Typed Owner/Rack Affinity When Losing Focus
  • #957 Deploy to unpause should also remove any expiring pause
  • #962 Multiple ui fixes
  • #963 Also set shell false if pending task cmdLineArgs are set

Documentation

  • #919 Update config docs for recently added fields
  • #922 Add docs on request and deploy concepts
  • #940 Use gitbook for docs
  • #958 Generate swagger json alongside docs

Changes in 0.4.11

Check out the 0.4.11 milestone to see new features / bugfixes in detail.

Improvements

  • #915 Add mailhog to integration test environment
  • #908 Better sorting with fuzzy search
  • #907 Make SingularityExecutor handle docker volumes when hostPath is absent
  • #900 Store status update reason field in SingularityTaskHistoryUpdate

Changes in 0.4.10

BUG: The UI build for this release is known to only contain partial CSS. Please use the 0.4.11 release instead.

Check out the 0.4.10 milestone to see new features / bugfixes in detail.

New Features

  • #887 - Allow deploy to specify a port index for healthchecks and LB api. See custom ports for more details.

Improvements

  • #854 - Add methods to client for unpause and scale requests
  • #856 - Upgrade Horizon
  • #866 - Add a default bounce expiration
  • #871 - Add support for override defaults file
  • #875 - Fix unpause button for expiring pause in UI
  • #881 - Logrotate and S3 upload tweaks
  • #884 - Include shell command name in log filename
  • #889 - Add support for max numbers of objects in ZK when no database is configured
  • #901 - Lock brunch to version 2.2.x

Bug Fixes

  • #788 - Fix router base path when hosted at /
  • #826 - Tweak how ulimit is called
  • #857 - Consider LB when deciding to remove something during bounce
  • #860 - Fix the sorting in the wrong order bug
  • #864 - Cancel futures in thread, avoiding docker deadlocks on cleanup
  • #865 - Better handling of Docker timeouts
  • #904 - Deploy and edit buttons need to be link elements, not button elements

Changes in 0.4.9

Renamed endpoints

This endpoint was renamed:

  • /requests/request/{requestId}/instances --> /requests/request/{requestId}/scale

These endpoints were renamed to fix a typo in the URL:

  • /racks/rack/{rackId}/decomission --> /racks/rack/{rackId}/decommission
  • /agents/agent/{agentId}/decomission --> /agents/agent/{agentId}/decommission

Expiring Actions

Released in 0.4.9

Action expiration + additional action metadata

Some actions in Singularity now have the concept of expiration (as in, giving up after a certain period of time). Corresponding endpoints have been updated to accept more information about action expiration and action metadata.

Rack and agent operations

  • /racks/rack/{rackId}/decommission
  • /racks/rack/{rackId}/freeze
  • /racks/rack/{rackId}/activate
  • /slaves/slave/{agentId}/decommission
  • /slaves/slave/{agentId}/freeze
  • /slaves/slave/{agentId}/activate

These URLs accept a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| message | string | optional | A message to show to users about why this action was taken |

Request bounce

  • /requests/request/{requestId}/bounce

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| skipHealthchecks | boolean | optional | Instruct replacement tasks for this bounce only to skip healthchecks |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| incremental | boolean | optional | If present and set to true, old tasks will be killed as soon as replacement tasks are available, instead of waiting for all replacement tasks to be healthy |
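A minimal sketch of building such a bounce call with the standard library is shown below. The base URL is a hypothetical placeholder, the helper name is invented, and the request is only constructed here, not sent. POST is used because the 0.4.6 notes describe the bounce call as a POST request.

```python
import json
import urllib.request
from urllib.parse import quote

# Field names taken from the table above.
BOUNCE_FIELDS = {"skipHealthchecks", "durationMillis", "message", "actionId", "incremental"}


def build_bounce_request(base_url: str, request_id: str, **options) -> urllib.request.Request:
    """Build (but do not send) a bounce request with a JSON body."""
    unknown = set(options) - BOUNCE_FIELDS
    if unknown:
        raise ValueError(f"unsupported bounce fields: {sorted(unknown)}")
    base = base_url.rstrip("/")
    encoded_id = quote(request_id, safe="")
    url = f"{base}/requests/request/{encoded_id}/bounce"
    return urllib.request.Request(
        url,
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


bounce = build_bounce_request(
    "http://singularity.example.com/singularity/api",  # hypothetical base URL
    "TestService",
    incremental=True,
    message="Rolling restart after config change",
)
print(bounce.get_method(), bounce.full_url)
```

Passing the result to urllib.request.urlopen would actually send it; any authentication your installation requires is left out of this sketch.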

Scheduling a request to run immediately

  • /requests/request/{requestId}/run

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| runId | string | optional | An id to associate with this request which will be associated with the corresponding launched tasks |
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for this task run |
| commandLineArgs | Array[string] | optional | Command line arguments to be passed to the task |
| message | string | optional | A message to show to users about why this action was taken |

Unpausing a request

  • /requests/request/{requestId}/unpause

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| skipHealthchecks | boolean | optional | If set to true, instructs new tasks that are scheduled immediately while unpausing to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Exit request cooldown

  • /requests/request/{requestId}/exit-cooldown

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| skipHealthchecks | boolean | optional | Instruct new tasks that are scheduled immediately while exiting cooldown to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Deleting a request

  • /requests/request/{requestId}

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Killing a task

  • /tasks/task/{taskId}

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| waitForReplacementTask | boolean | optional | If set to true, treats this task kill as a bounce - launching another task and waiting for it to become healthy |
| override | boolean | optional | If set to true, instructs the executor to attempt to immediately kill the task, rather than waiting gracefully |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

Scaling requests

  • /requests/request/{requestId}/scale (previously /requests/request/{requestId}/instances)

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped while scaling this request (only) |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| instances | int | optional | The number of instances to scale to |
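Because durationMillis is expressed in milliseconds, it can help to derive it from a readable duration rather than hard-coding the number. The sketch below builds a body for a temporary scale; the helper name and the example values are illustrative only.

```python
import json
from datetime import timedelta


def duration_millis(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds."""
    return duration // timedelta(milliseconds=1)


# Scale to 5 instances, letting the scale expire after two hours.
scale_body = {
    "instances": 5,
    "durationMillis": duration_millis(timedelta(hours=2)),
    "message": "Temporary scale-up for a traffic spike",
}
scale_json = json.dumps(scale_body)
print(scale_json)
```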

Pausing a request

  • /requests/request/{requestId}/pause

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| killTasks | boolean | optional | If set to false, tasks will be allowed to finish instead of killed immediately |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

NOTE: The user field has been removed from this object.

Disabling request healthchecks

  • /requests/request/{requestId}/skip-healthchecks

This URL accepts a JSON object with this format:

| name | type | required | description |
| --- | --- | --- | --- |
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for all tasks for this request until reversed |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

New endpoints for cancelling actions

These endpoints were added in order to support cancelling certain actions:

  • DELETE /requests/request/{requestId}/scale -- Cancel an expiring scale
  • DELETE /requests/request/{requestId}/skip-healthchecks -- Cancel an expiring skip healthchecks override
  • DELETE /requests/request/{requestId}/pause -- Cancel (unpause) an expiring pause
  • DELETE /requests/request/{requestId}/bounce -- Cancel a bounce
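As an illustration, a cancel call can be built as a DELETE request against the matching action path. This sketch assumes all four cancellable actions share the /requests/request/{requestId} prefix used elsewhere in these notes; the base URL and helper name are hypothetical, and nothing is sent.

```python
import urllib.request
from urllib.parse import quote

CANCELLABLE_ACTIONS = ("scale", "skip-healthchecks", "pause", "bounce")


def build_cancel_request(base_url: str, request_id: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request that cancels an expiring action."""
    if action not in CANCELLABLE_ACTIONS:
        raise ValueError(f"cannot cancel action: {action!r}")
    base = base_url.rstrip("/")
    encoded_id = quote(request_id, safe="")
    url = f"{base}/requests/request/{encoded_id}/{action}"
    return urllib.request.Request(url, method="DELETE")


cancel_scale = build_cancel_request(
    "http://singularity.example.com/singularity/api",  # hypothetical base URL
    "TestService",
    "scale",
)
print(cancel_scale.get_method(), cancel_scale.full_url)
```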

Other Improvements and Fixes

  • #837 - Make sure literal host ports are processed correctly
  • #842 - Task is cleaning default msg fix
  • #849 - Include message with emails
  • #850 - Fix unified tailer
  • #851 - Warning to disable healthchecks for < 1 hour
  • #852 - Page now automatically refreshes even after invalid duration entered
  • #853 - File error fix
  • #859 - Don't show tasks as overdue instantly
  • #863 - Don't show deleted request message if not deleted
  • #868 - Bump to Brunch 2

Changes in 0.4.8

Check out the 0.4.8 milestone to see new features / bugfixes in detail.

Improvements

  • #744 - New log tail UI
  • #774 - allow requests to override their email notification settings
  • #782 - support per-bucket creds for downloading artifacts
  • #801 - Link each column section
  • #805 - Change empty task sandbox message depending on current task state
  • #806 - Add filters on request table for active deploy and no deploy
  • #812 - Clean tasks on decommissioned hosts
  • #813 - Get task info by runId
  • #815 - New fuzzy search algorithm for better results and perf
  • #819 - better launch msg
  • #825 - surface info about the pending deploy in request detail page

Bug Fixes

  • #773 - time-box the docker client to avoid ever getting stuck
  • #779 - Hide log link on task page if the file doesn't exist
  • #821 - fix issue with incorrect parsing of task status
  • #824 - be sure to transfer labels to the task info
  • #827 - be sure to check deploy health for incremental bounce

Changes in 0.4.7

Check out the 0.4.7 milestone to see new features / bugfixes in detail.

Fixed

  • When the 0.4.6 release was built, the static assets for the Web UI weren't properly packaged in the JAR. This is now fixed in 0.4.7.

Improvements

  • #777 - Properly sort requests in the Web UI
  • #820 - Web UI gives blank page after first deploy

Changes in 0.4.6

Check out the 0.4.6 milestone to see new features / bugfixes in detail.

New Features

  • Shell commands using the singularity executor. You can read more about the shell commands feature here
  • Introduce an incremental bounce. An incremental bounce will kill old tasks as new tasks become healthy instead of waiting for all new tasks to be healthy before shutting down old tasks. This is especially useful when running services with many instances on a lean cluster. The option is available in the UI when asking to bounce, or by adding a query param of incremental=true to your POST request to the /requests/request/{requestId}/bounce endpoint.

Improvements

  • #635 - Truncate beginning of S3 log filenames
  • #690 - improve ui uncaught error message
  • #705 - allow for custom switch user commands in SingularityExecutor
  • #694 - Remove paused requests from LB
  • #708 - Show extra cmd line arguments and allow rerunning of tasks
  • #713 - Give meaningful titles to all pages
  • #719 - Add logs section with link to latest
  • #721 - Task alerts
  • #729 - Optionally take start and end time query params for S3 log search
  • #730 - Change checkpoint default to true
  • #732 - Star requests from the request detail page
  • #734 - Change allocated cpu units to floats in the json object
  • #739 - Add request and task links to breadcrumbs on tail view
  • #745 - Only update links instead of rerendering the whole page
  • #748 - Auto-exit s3uploader and s3downloader if no S3 credentials are set
  • #749 - Custom time formats
  • #751 - Use dropwizard-guicier 0.7.1.2
  • #756 - Add support for read-only user groups
  • #759 - Email tweaks
  • #766 - Deny a bounce if there aren't enough resources to complete it
  • #771 - Expose run-task method in client
  • #772 - Bake user query params in to authentication system
  • #775 - If inside a HTTP request, include URL in sentry error
  • #783 - Tweaks to make grepping a file easier
  • #784 - Set a default for s3UploaderKeyPattern
  • #785 - Allow default value for readOnlyGroups
  • #786 - Add support for default healthcheckMaxRetries and healthcheckMaxTotalTimeoutSeconds values in SingularityConfiguration
  • #789 - Surface info about bounces in request detail page
  • #790 - Better executor logging
  • #792 - Skip building web UI via skipSingularityWebUI property
  • #794 - Show paused requests in dashboard view
  • #795 - Add option to also bounce when scaling
  • #796 - Make files table in task view sortable
  • #802 - Expose killed task records
  • #803 - Watch reset and check active tasks manually
  • #809 - Add link to finished service.log to failed healthcheck notification

Bug Fixes

  • #682 - Configuring the Network field should not be predicated on having port mappings configured
  • #717 - Fix thread checker, case where docker container has already stopped
  • #722 - Fix tooltip positioning on edit request page
  • #735 - Make sure tasks are in ZK
  • #743 - Thread pool of 1 if prefixes is 0, avoid IllegalArgument
  • #761 - Fix NPE when check for exception cause
  • #763 - Don't throw thread check exceptions if task was already asked to stop
  • #780 - Stop tailing if scroll to top is clicked
  • #791 - Fix search input
  • #804 - Add tasks from /killed to decom badges
  • #808 - Fixes + tweaks to DECOM badge

Config Changes

  • #750 - Remove old property-style runnable config code. The .properties configuration format is no longer supported for the following Singularity agent helpers:

    • SingularityExecutor
    • SingularityExecutorCleanup
    • SingularityS3Downloader
    • SingularityS3Uploader
    • SingularityOOMKiller
    • SingularityLogWatcher

    If you use any of these, please convert your configuration to the .yaml style.


Changes in 0.4.5

Check out the 0.4.5 milestone to see new features / bugfixes in detail.

  • Singularity 0.4.5 bumps its Mesos dependency from 0.21.0 to 0.23.0. #657
  • If upgrading from a version prior to 0.4.4, you will need to run database migrations. Refer to the database docs for how to run migrations before starting the new version of Singularity.

Deprecated

  • The .properties configuration format for Singularity agent helpers:

    • SingularityExecutor
    • SingularityExecutorCleanup
    • SingularityS3Downloader
    • SingularityS3Uploader
    • SingularityOOMKiller
    • SingularityLogWatcher

    If you use any of these, please convert your configuration to the .yaml style.

    .properties support will be removed completely in 0.4.6.