openproject/app/models/work_packages/scopes/for_scheduling.rb

#-- encoding: UTF-8
#-- copyright
# OpenProject is an open source project management software.
# Copyright (C) 2012-2020 the OpenProject GmbH
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License version 3.
#
# OpenProject is a fork of ChiliProject, which is a fork of Redmine. The copyright follows:
# Copyright (C) 2006-2017 Jean-Philippe Lang
# Copyright (C) 2010-2013 the ChiliProject Team
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
# See docs/COPYRIGHT.rdoc for more details.
#++
#
module WorkPackages::Scopes
class ForScheduling
class << self
# Fetches all work packages that need to be evaluated for potential rescheduling after a related (i.e. follows/precedes
# and hierarchy) work package is modified or created.
#
# The SQL relies on CTEs which first construct the set of all potentially affected work packages and then filter that
# set down to the work packages that are actually affected. The set of potentially affected work packages can be
# reduced by manually scheduled work packages.
#
# The first CTE works recursively to fetch all work packages related to the provided work package and the path of
# intermediate work packages. The work packages can either be connected via a follows relationship, a hierarchy relationship
# or a combination of both.
# E.g. in a graph of
# A <- follows - B <- hierarchy (C is parent of B) - C <- follows D
#
# D would also be subject to reschedule.
#
# At least for hierarchical relationships, we need to follow the relationship in both directions.
# E.g. in a graph of
# A <- follows - B - hierarchy (B is parent of C) -> C <- follows D
#
# D would also be subject to reschedule.
#
# That possible switch in direction means that we cannot simply get all possibly affected work packages with one
# SQL query, which the DAG implementation would otherwise have allowed us to do.
# Additionally, we need to get the whole paths (with all intermediate work packages included). That would be possible
# with DAG, but as we need to rely on a recursive approach already, we do not need to complicate the SQL statement any
# further. Fetching the whole path (at least in one direction) relying on DAG would be faster though,
# so we might revisit this if any performance shortcomings are identified.
# The first CTE returns all work packages with their path. Reusing the example above, the result would be:
# id | path
# A | {A}
# B | {A,B}
# C | {A,B,C}
# D | {A,B,C,D}
# If the graph were to contain multiple paths to one work package, because of multiple follows relationships
# into the same hierarchical tree, the work package would be returned twice with different paths.
#
# The paths are followed until either:
# * no more follows and/or hierarchy relations can be followed
# * a manually scheduled work package is encountered.
#
# So if, in the example above, B were manually scheduled, the first CTE would only return
# id | path
# A | {A}
# B | {A,B}
#
# The interim result, provided by the first CTE, is thus the set of all work packages that are in a direct or transitive
# follows and/or hierarchy relationship up until the point where the relationships end or a manually scheduled work package
# is encountered.
#
# That set needs to be filtered down because of additional constraints on scheduling:
# * Manually scheduled work packages prevent automatic scheduling up the hierarchy chain. So even with an existing follows
#   relationship, work packages might not be scheduled automatically if their children or descendants are manually
#   scheduled. This is only true for a work package if *all* the children are manually scheduled either directly or because
#   their respective children are all scheduled manually. In case of the hierarchy
#   A and B <- hierarchy (C is parent of both A and B) C <- D
#   if A and B are both scheduled manually, C is also scheduled manually and so is D. But if only A is scheduled manually,
#   B, C and D are scheduled automatically.
# * The first constraint might cause gaps in the previously established paths. If a work package follows an automatically
#   scheduled work package, and that preceding work package has children that are manually scheduled, the preceding
#   work package will no longer be automatically scheduled and the same is then true for the following work package.
#
# To visualize the above:
#    A <- follows - B <- follows C
#                   |
#               hierarchy
#                   v
#                   D (manually)
# The first, path fetching CTE will return A, B, C and D. The first constraint will then remove B and D and the second
# constraint will remove C.
#
# The work packages returned are those that are in a direct or transitive relationship with the provided work packages
# and that neither have only manually scheduled children/descendants nor are reachable only via work packages for which
# the aforementioned constraint holds. The provided work packages are always excluded.
#
# @param work_packages WorkPackage[] A set of work packages for which the set of related work packages that might
# be subject to reschedule is fetched.
def fetch(work_packages)
  return WorkPackage.none if work_packages.empty?

  sql = <<~SQL
    WITH
      RECURSIVE
      #{paths_sql(work_packages)},
      #{paths_without_manual_hierarchy_sql},
      #{paths_without_gaps_sql}
    SELECT id FROM eligible_paths_without_gaps
  SQL

  WorkPackage
    .where("id IN (#{sql})")
    .where.not(id: work_packages)
end
private
# This recursive CTE fetches all work packages that are in a direct or transitive follows and/or hierarchy
# relationship with the provided work package.
#
# Hierarchy relationships are followed up as well as down (from and to) but follows relations are only followed
# from the predecessor to the successor (to_id to from_id).
#
# We will need the exact path (meaning all intermediate work packages) for the later filtering so for each
# recursive step the statement only adds all the work packages directly connected to the current step and
# does not make use of the abilities of DAG. Using the transitive relationships provided by DAG should be possible
# but the constraints caused by PostgreSQL's implementation of recursive CTEs (no outer join of, no duplicate
# reference to and no subqueries with the recursive query) make writing it extremely hard.
#
# While using DAG should theoretically be faster, as fewer iterative steps are required, the difference should
# not be noticeable.
#
# The CTE starts from the provided work package and for that returns:
# * the id of the work package
# * the path to that work package which is again the id but this time as a PostgreSQL array
# * again, a path, same as above but referred to as the root_path (explained below)
# * the information that the starting work package is not manually scheduled.
# Whether the starting work package is in fact manually or automatically scheduled makes no
# difference but we need those four columns later on.
#
# For each recursive step, we return all work packages that are directly related to our current set of work
# packages by a hierarchy (up or down) or follows relationship (only successors). For each such work package
# the statement returns:
# * id of the work package that is currently at the end of a path.
# * the path to the added work package. This is the path of the work package the statement extended the path
#   from (joined with) with the added work package appended.
# * the root_path which is the path up to the first work package that is within the current work package
#   hierarchy. Whenever a new hierarchy is reached (indicated by joining a follows relationship), a new
#   root_path is created. If the hierarchy is kept, the root_path is taken from the recursive step before.
#   The root_path is later on used to identify all work packages within the result set that are within
#   the same hierarchy and that might need to be removed because of manual scheduling bubbling up the
#   hierarchy tree. Therefore, follows relationships constructed between members of the same hierarchy are
#   no problem as well.
# * the flag indicating whether the added work package is automatically or manually scheduled.
#
# Paths whose ending work package is marked to be manually scheduled are not joined with any more.
#
# The recursion ends when no more work packages can be added to the set either because:
# * There are no more work packages with a relationship to the current set
# * The current paths all end in manually scheduled work packages
# Both conditions can also stop the recursion together.
def paths_sql(work_packages)
  values = work_packages.map { |wp| "(#{wp.id},ARRAY[#{wp.id}], ARRAY[#{wp.id}], false)" }.join(', ')

  <<~SQL
    clean_paths (id, path, root_path, manually) AS (
      SELECT * FROM (VALUES#{values}) AS t(id, path, root_path, manually)

      UNION ALL

      SELECT
        CASE
          WHEN relations.to_id = clean_paths.id
          THEN relations.from_id
          ELSE relations.to_id
        END id,
        CASE
          WHEN relations.to_id = clean_paths.id
          THEN array_append(path, relations.from_id)
          ELSE array_append(path, relations.to_id)
        END path,
        CASE
          WHEN relations.to_id = clean_paths.id AND relations.follows = 1
          THEN array_append(path, relations.from_id)
          ELSE clean_paths.root_path
        END root_path,
        work_packages.schedule_manually manually
      FROM
        clean_paths
      JOIN
        relations
        ON NOT clean_paths.manually
        AND (#{relations_condition_sql})
        AND
          ((relations.to_id = clean_paths.id AND NOT relations.from_id = any(clean_paths.path))
          OR (relations.from_id = clean_paths.id AND NOT relations.to_id = any(clean_paths.path) AND relations.follows = 0))
      LEFT JOIN work_packages
        ON (CASE
          WHEN relations.to_id = clean_paths.id
          THEN relations.from_id
          ELSE relations.to_id
          END) = work_packages.id
    )
  SQL
end
# Filters a set of paths (as returned by the recursive path constructing CTE above) to only contain
# work packages (and their paths) that are truly automatically scheduled.
# Even though a work package is flagged to be automatically scheduled, a work package can in fact be manually scheduled
# nonetheless if:
# * all of its paths towards their leaves have at least one manually scheduled work package in them.
#
# As the recursive CTE above terminates a path once a manually scheduled work package is identified,
# those manually scheduled work packages are leaves for the sake of the set inserted into this query but might
# very well have children outside of the set.
#
# Identifying all leaves (for the sake of the set) is complicated by the possibility of having multiple
# follows relationships spanning into the same hierarchy tree. E.g. in a graph of
#
#                   C
#                   |
#               hierarchy
#                   |
#                   v
#    A <- follows - B
#    ^              |
#    |          hierarchy
#    |              |
#    |              v
#    |              D (manually)
#    |              |
#    |          hierarchy
#    |              |
#    |              v
#    -- follows -   E
#
# D is excluded directly. But B and C also need to be considered manually scheduled as their descendant D is
# scheduled manually. But E (which is the actual leaf of that hierarchy) is reached via a different follows
# relationship.
#
# Please note that when D has an automatically scheduled sibling F:
#
#                   C
#                   |
#               hierarchy
#                   |
#                   v
#    A <- follows - B - hierarchy -
#    ^              |             |
#    |          hierarchy         |
#    |              |             |
#    |              v             v
#    |              D (manually)  F
#    |              |
#    |          hierarchy
#    |              |
#    |              v
#    -- follows -   E
#
# Neither B nor C are considered manually scheduled any more.
#
# The query works by first joining the paths with themselves and with the relations to identify all paths (calculated by
# the CTE before) that lead to descendants of a work package. Here, the root_path is considered to avoid mixing
# individual follows relationship jumps.
# Next, the paths are joined again to identify those that have no longer paths.
# The result is all paths that lead to descendants of a work package identified in the path and that have no longer
# paths, which, within the set, are the leaves. Of those, only the paths that do not lead to a manually scheduled
# work package are returned.
# This step also removes all work packages that are scheduled manually directly.
def paths_without_manual_hierarchy_sql
  <<~SQL
    paths_without_manual_hierarchy AS (
      SELECT
        paths.id,
        paths.path
      FROM
        clean_paths paths
      LEFT JOIN
        relations
      ON
        relations.from_id = paths.id AND "relations"."follows" = 0 AND (#{relations_condition_sql(transitive: true)})
      LEFT JOIN
        clean_paths to_paths
      ON
        relations.to_id = to_paths.id AND to_paths.root_path = paths.root_path
      LEFT JOIN
        clean_paths longer_paths
      ON
        longer_paths.path[1:array_length(longer_paths.path, 1) - 1] = to_paths.path
        AND to_paths.root_path = longer_paths.root_path
        AND longer_paths.path <> paths.path
      WHERE longer_paths.id IS NULL
      AND NOT (paths.manually OR COALESCE(to_paths.manually, false))
    )
  SQL
end
# Returns all paths that do not include intermediary hops (work packages) that are not within the set of paths
# themselves.
# This serves as a second filter after work packages that are transitively scheduled manually are removed from the set.
# E.g. in a graph of
#    A <- follows - B <- follows C
#                   |
#               hierarchy
#                   v
#                   D (manually)
#
# The recursive CTE will return A, B, C and D, with D flagged as manually scheduled. The first filter will then remove
# D and B from the set. Now, there is no longer a connection between A and C. So the query below removes C from the
# result as well.
def paths_without_gaps_sql
  <<~SQL
    eligible_paths_without_gaps AS (
      SELECT
        *
      FROM
        paths_without_manual_hierarchy
      WHERE
        path <@ (SELECT array_agg(id) FROM paths_without_manual_hierarchy)
    )
  SQL
end
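# Restricts the relations to those made up solely of hierarchy and/or follows relationships by requiring
# all other relation type columns to be 0.
# By default, only direct relationships are matched (the type columns sum up to exactly 1). With transitive: true,
# transitive relationships (sum >= 1) are matched as well.
def relations_condition_sql(transitive: false)
  <<~SQL
    "relations"."relates" = 0 AND "relations"."duplicates" = 0 AND "relations"."blocks" = 0 AND "relations"."includes" = 0 AND "relations"."requires" = 0
    AND (relations.hierarchy + relations.relates + relations.duplicates + relations.follows + relations.blocks + relations.includes + relations.requires #{transitive ? '>' : ''}= 1)
  SQL
end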
end
end
end
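
The three steps described in the comments above (path construction, removal of work packages that are manually scheduled through their children, removal of paths with gaps) can be approximated in plain Ruby to make the expected results easier to reason about. This is only a rough in-memory sketch and not how the scope itself works: `Relation`, `build_paths`, `effectively_manual?` and `reschedule_candidates` are hypothetical names introduced for illustration, and the hierarchy check is a simple recursive predicate that assumes an acyclic hierarchy and ignores the root_path handling of the SQL.

```ruby
require 'set'

# Hypothetical stand-in for a row of the relations table.
# type is :follows (from follows to) or :hierarchy (from is the parent of to).
Relation = Struct.new(:from, :to, :type)

# Step 1: extend every path by one directly related work package per round.
# A work package already on the path is never revisited, and paths ending in a
# manually scheduled work package are not extended any further.
def build_paths(start_ids, relations, manual)
  frontier = start_ids.map { |id| [id] }
  paths = frontier.dup
  until frontier.empty?
    frontier = frontier.flat_map do |path|
      last = path.last
      # The starting work package always counts as automatically scheduled.
      next [] if path.size > 1 && manual.include?(last)

      relations.map do |rel|
        if rel.to == last && !path.include?(rel.from)
          path + [rel.from] # successor (follows) or parent (hierarchy)
        elsif rel.from == last && rel.type == :hierarchy && !path.include?(rel.to)
          path + [rel.to]   # child; follows relations are not walked backwards
        end
      end.compact
    end
    paths.concat(frontier)
  end
  paths
end

# Step 2 (simplified): a work package counts as manually scheduled if it is
# flagged as such or if it has children and all of them count as manual.
def effectively_manual?(id, relations, manual)
  return true if manual.include?(id)

  children = relations.select { |r| r.type == :hierarchy && r.from == id }.map(&:to)
  children.any? && children.all? { |child| effectively_manual?(child, relations, manual) }
end

# Step 3: keep only paths whose intermediate work packages all survived step 2,
# then drop the starting work packages from the result.
def reschedule_candidates(start_ids, relations, manual)
  manual = manual.to_set - start_ids
  paths = build_paths(start_ids, relations, manual)
  kept = paths.reject { |path| effectively_manual?(path.last, relations, manual) }
  kept_ids = kept.map(&:last).to_set
  gapless = kept.select { |path| path.all? { |id| kept_ids.include?(id) } }
  gapless.map(&:last).uniq - start_ids
end

# A <- follows - B <- follows C, with B being the parent of the manually scheduled D
relations = [
  Relation.new('B', 'A', :follows),
  Relation.new('C', 'B', :follows),
  Relation.new('B', 'D', :hierarchy)
]
p build_paths(['A'], relations, Set['D']).map(&:last) # => ["A", "B", "C", "D"]
p reschedule_candidates(['A'], relations, ['D'])      # => []
```

In this example, B and D drop out in the second step and C drops out in the third because its only path runs through B, which matches the outcome described for the gap filter. Without the manual flag on D, the same call returns B, C and D.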