#-- encoding: UTF-8

#-- copyright
# OpenProject is an open source project management software.
# Copyright (C) 2012-2020 the OpenProject GmbH
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License version 3.
#
# OpenProject is a fork of ChiliProject, which is a fork of Redmine. The copyright follows:
# Copyright (C) 2006-2017 Jean-Philippe Lang
# Copyright (C) 2010-2013 the ChiliProject Team
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
# See docs/COPYRIGHT.rdoc for more details.
#++
#

module WorkPackages::Scopes
  class ForScheduling
    class << self
      # Fetches all work packages that need to be evaluated for possible rescheduling after a related (i.e. follows/precedes
      # and hierarchy) work package is modified or created.
      #
      # The SQL relies on CTEs which first construct the set of all potentially affected work packages and then filter
      # that set down to the actually affected work packages. The set of potentially affected work packages can be
      # reduced by manually scheduled work packages.
      #
      # The first CTE works recursively to fetch all work packages related to the provided work package and the path of
      # intermediate work packages.
      # The work packages can either be connected via a follows relationship, a hierarchy relationship
      # or a combination of both.
      # E.g. in a graph of
      #   A <- follows - B <- hierarchy (C is parent of B) - C <- follows D
      #
      # D would also be subject to rescheduling.
      #
      # At least for hierarchical relationships, we need to follow the relationship in both directions.
      # E.g. in a graph of
      #   A <- follows - B - hierarchy (B is parent of C) -> C <- follows D
      #
      # D would also be subject to rescheduling.
      #
      # That possible switch in direction means that we cannot simply get all possibly affected work packages by one
      # SQL query which the DAG implementation would have allowed us to do otherwise.
      # Additionally, we need to get the whole paths (with all intermediate work packages included) which would be possible
      # with DAG but as we need to rely on a recursive approach already we do not need to complicate the SQL statement any
      # further. Fetching the whole path (at least in one direction) relying on DAG would be faster though
      # so we might revisit this if any performance shortcomings are identified.
      # The first CTE returns all work packages with their path so, reusing the example above, the result would be
      #   id  |  path
      #   A   |  {A}
      #   B   |  {A,B}
      #   C   |  {A,B,C}
      #   D   |  {A,B,C,D}
      # If the graph were to contain multiple paths to one work package, because of multiple follows relationships
      # into the same hierarchical tree, the work package would be returned multiple times, once for each path.
      #
      # The paths are followed until either:
      #   * no more follows and/or hierarchy relations can be followed
      #   * a manually scheduled work package is encountered.
      #
      # So if, in the example above, B were manually scheduled, the first CTE would only return
      #   id  |  path
      #   A   |  {A}
      #   B   |  {A,B}
      #
      # The interim result, provided by the first CTE, is thus the set of all work packages that are in a direct or transitive
      # follows and/or hierarchy relationship up until the point where the relationships end or a manually scheduled work package
      # is encountered.
      #
      # That set needs to be filtered down because of additional constraints on scheduling:
      #   * Manually scheduled work packages prevent automatic scheduling up the hierarchy chain. So even with an existing follows
      #     relationship, work packages might not be scheduled automatically if their children or descendants are manually
      #     scheduled. This is only true for a work package if *all* the children are manually scheduled either directly or because
      #     their respective children are all scheduled manually. In case of the hierarchy
      #       A and B <- hierarchy (C is parent of both A and B) C <- D
      #     if A and B are both scheduled manually, C is also scheduled manually and so is D. But if only A is scheduled manually,
      #     B, C and D are scheduled automatically.
      #   * The first constraint might cause gaps in the previously established paths. If a work package follows an automatically
      #     scheduled work package, and that preceding work package has children that are manually scheduled, the preceding
      #     work package will no longer be automatically scheduled and the same is then true for the following work package.
      #
      # To visualize the above:
      #   A <- follows - B <- follows C
      #                  |
      #              hierarchy
      #                  v
      #                  D (manually)
      # The first, path fetching CTE will return A, B, C and D. The first constraint will then remove B and D and the second
      # constraint will remove C.
      #
      # Returned are the work packages that are in a direct or transitive relationship with the provided work packages and
      # that neither have only manually scheduled children/descendants nor are reachable only via work packages for which
      # the aforementioned constraint is true. The provided work package is always excluded.
      #
      # @param work_packages WorkPackage[] A set of work packages for which the set of related work packages that might
      # be subject to rescheduling is fetched.
      def fetch(work_packages)
        return WorkPackage.none if work_packages.empty?

        sql = <<~SQL
          WITH
            RECURSIVE
            #{paths_sql(work_packages)},
            #{paths_without_manual_hierarchy_sql},
            #{paths_without_gaps_sql}

          SELECT id
          FROM eligible_paths_without_gaps
        SQL

        WorkPackage
          .where("id IN (#{sql})")
          .where.not(id: work_packages)
      end

      private

      # This recursive CTE fetches all work packages that are in a direct or transitive follows and/or hierarchy
      # relationship with the provided work package.
      #
      # Hierarchy relationships are followed up as well as down (from and to) but follows relations are only followed
      # from the predecessor to the successor (to_id to from_id).
      #
      # We will need the exact path (meaning all intermediate work packages) for the later filtering so for each
      # recursive step the statement only adds the work packages directly connected to the current step and
      # does not make use of the abilities of DAG. Using the transitive relationships provided by DAG should be possible
      # but the constraints caused by PostgreSQL's implementation of recursive CTEs (no outer join of, no duplicate
      # reference to and no subqueries with the recursive query) make writing it extremely hard.
      #
      # While using DAG should theoretically be faster, as fewer iterative steps are required, the difference should
      # not be noticeable.
      #
      # The CTE starts from the provided work package and for that returns:
      #   * the id of the work package
      #   * the path to that work package which is again the id but this time as a PostgreSQL array
      #   * again, a path, same as above but referred to as the root_path (explained below)
      #   * the information that the starting work package is not manually scheduled.
      # Whether the starting work package is manually scheduled or in fact automatically scheduled makes no
      # difference but we need those four columns later on.
      #
      # For each recursive step, we return all work packages that are directly related to our current set of work
      # packages by a hierarchy (up or down) or follows relationship (only successors). For each such work package
      # the statement returns:
      #   * the id of the work package that is currently at the end of a path.
      #   * the path to the added work package. This is the path of the work package the statement extended the path
      #     from (joined with) with the added work package appended.
      #   * the root_path which is the path up to the first work package that is within the current work package
      #     hierarchy. Whenever a new hierarchy is reached (indicated by joining a follows relationship), a new
      #     root_path is created. If the hierarchy is kept, the root_path is taken from the recursive step before.
      #     The root_path is later on used to identify all work packages within the result set that are within
      #     the same hierarchy and that might need to be removed because of manual scheduling bubbling up the
      #     hierarchy tree. Therefore, follows relationships between members of the same hierarchy are
      #     no problem either.
      #   * the flag indicating whether the added work package is automatically or manually scheduled.
      #
      # Paths whose ending work package is marked to be manually scheduled are not extended any further.
      #
      # The recursion ends when no more work packages can be added to the set either because:
      #   * There is no more work package with a relationship to the current set
      #   * The current paths all end in manually scheduled work packages
      # Both conditions can also stop the recursion together.
      def paths_sql(work_packages)
        values = work_packages.map { |wp| "(#{wp.id},ARRAY[#{wp.id}], ARRAY[#{wp.id}], false)" }.join(', ')

        <<~SQL
          clean_paths (id, path, root_path, manually) AS (
            SELECT * FROM (VALUES#{values}) AS t(id, path, root_path, manually)

            UNION ALL

            SELECT
              CASE
                WHEN relations.to_id = clean_paths.id
                THEN relations.from_id
                ELSE relations.to_id
              END id,
              CASE
                WHEN relations.to_id = clean_paths.id
                THEN array_append(path, relations.from_id)
                ELSE array_append(path, relations.to_id)
              END path,
              CASE
                WHEN relations.to_id = clean_paths.id AND relations.follows = 1
                THEN array_append(path, relations.from_id)
                ELSE clean_paths.root_path
              END root_path,
              work_packages.schedule_manually manually
            FROM
              clean_paths
            JOIN
              relations
              ON NOT clean_paths.manually
              AND (#{relations_condition_sql})
              AND
                ((relations.to_id = clean_paths.id AND NOT relations.from_id = any(clean_paths.path))
                OR (relations.from_id = clean_paths.id AND NOT relations.to_id = any(clean_paths.path) AND relations.follows = 0))
            LEFT JOIN work_packages
              ON (CASE
                WHEN relations.to_id = clean_paths.id
                THEN relations.from_id
                ELSE relations.to_id
                END) = work_packages.id
          )
        SQL
      end

      # Filters a set of paths (as returned by the recursive path constructing CTE above) to only contain
      # work packages (and their paths) that are truly automatically scheduled.
      # Even though a work package is flagged to be automatically scheduled, a work package can in fact be manually scheduled
      # nonetheless if:
      #   * all of its paths towards their leaves have at least one manually scheduled work package in them.
      #
      # As the recursive CTE above terminates a path once a manually scheduled work package is identified,
      # those manually scheduled work packages are leaves for the sake of the set inserted into this query but might
      # very well have children outside of the set.
      #
      # Identifying all leaves (for the sake of the set) is complicated by the possibility of having multiple
      # follows relationships spanning into the same hierarchy tree. E.g. in a graph of
      #
      #                  C
      #                  |
      #              hierarchy
      #                  |
      #                  v
      #   A <- follows - B
      #   ^              |
      #   |          hierarchy
      #   |              |
      #   |              v
      #   |              D (manually)
      #   |              |
      #   |          hierarchy
      #   |              |
      #   |              v
      #    -- follows -  E
      #
      # D is excluded directly. But B and C also need to be considered manually scheduled as their descendant D is
      # scheduled manually. But E (which is the actual leaf of that hierarchy) is reached via a different follows
      # relationship.
      #
      # Please note that when D has an automatically scheduled sibling F:
      #
      #                  C
      #                  |
      #              hierarchy
      #                  |
      #                  v
      #   A <- follows - B - hierarchy -
      #   ^              |              |
      #   |          hierarchy          |
      #   |              |              |
      #   |              v              v
      #   |              D (manually)   F
      #   |              |
      #   |          hierarchy
      #   |              |
      #   |              v
      #    -- follows -  E
      #
      # neither B nor C is considered manually scheduled any more.
      #
      # The query works by joining the paths with themselves and with the relations first to identify all paths (calculated by
      # the CTE before) that lead to descendants of a work package. Here, the root_path is considered to avoid mixing
      # individual follows relationship jumps.
      # Next, the paths are joined again to identify those that are not extended by any longer path.
      # The result is all paths that lead to descendants of a work package identified in the path and that are not extended
      # by a longer path, which, within the set, are the leaves. Of those, only the paths are returned that do not lead to
      # a manually scheduled work package.
      # This step also removes all work packages that are scheduled manually directly.
      def paths_without_manual_hierarchy_sql
        <<~SQL
          paths_without_manual_hierarchy AS (
            SELECT
              paths.id,
              paths.path
            FROM clean_paths paths
            LEFT JOIN relations
              ON relations.from_id = paths.id
              AND "relations"."follows" = 0
              AND (#{relations_condition_sql(transitive: true)})
            LEFT JOIN clean_paths to_paths
              ON relations.to_id = to_paths.id
              AND to_paths.root_path = paths.root_path
            LEFT JOIN clean_paths longer_paths
              ON longer_paths.path[1:array_length(longer_paths.path, 1) - 1] = to_paths.path
              AND to_paths.root_path = longer_paths.root_path
              AND longer_paths.path <> paths.path
            WHERE longer_paths.id IS NULL
            AND NOT (paths.manually OR COALESCE(to_paths.manually, false))
          )
        SQL
      end

      # Returns all paths that do not include intermediary hops (work packages) that are not within the set of paths
      # themselves.
      # This serves as a second filter after work packages that are manually scheduled transitively are removed from the set.
      # E.g. in a graph of
      #   A <- follows - B <- follows C
      #                  |
      #              hierarchy
      #                  v
      #                  D (manually)
      #
      # The recursive CTE will return A, B, C and D, with D flagged as manually scheduled. The first filter will then remove
      # D and B from the set. Now, there is no longer a connection between A and C. So the query below removes C from the
      # result as well.
      def paths_without_gaps_sql
        <<~SQL
          eligible_paths_without_gaps AS (
            SELECT *
            FROM paths_without_manual_hierarchy
            WHERE path <@ (SELECT array_agg(id) FROM paths_without_manual_hierarchy)
          )
        SQL
      end

      # Limits the relations to those made up solely of follows and/or hierarchy hops: all other relation type
      # columns must be 0. The summed relation type columns must equal 1 by default or, with transitive: true,
      # be at least 1.
      def relations_condition_sql(transitive: false)
        <<~SQL
          "relations"."relates" = 0
          AND "relations"."duplicates" = 0
          AND "relations"."blocks" = 0
          AND "relations"."includes" = 0
          AND "relations"."requires" = 0
          AND (relations.hierarchy + relations.relates + relations.duplicates + relations.follows + relations.blocks + relations.includes + relations.requires #{transitive ? '>' : ''}= 1)
        SQL
      end
    end
  end
end
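
# The path-building recursion and the gap filter documented above can be illustrated outside of SQL.
# What follows is a minimal plain-Ruby sketch under stated assumptions, not the implementation used by this
# scope: Relation, build_paths and without_gaps are hypothetical names, relations are reduced to
# from_id/to_id plus a follows flag (false meaning hierarchy), and the intermediate filter for manually
# scheduled hierarchies is left out entirely.

```ruby
# Hypothetical stand-in for a row of the relations table (hierarchy when follows is false).
Relation = Struct.new(:from_id, :to_id, :follows)

# Mimics the recursive CTE: every iteration extends each path by one hop and a path is no longer
# extended once it ends in a manually scheduled work package.
def build_paths(start_ids, relations, manual_ids)
  rows = start_ids.map { |id| { id: id, path: [id], root_path: [id], manually: false } }
  frontier = rows.dup

  until frontier.empty?
    frontier = frontier.reject { |row| row[:manually] }.flat_map do |row|
      relations.filter_map do |rel|
        next_id =
          if rel.to_id == row[:id]
            rel.from_id # follows and hierarchy are both walked from to_id to from_id
          elsif rel.from_id == row[:id] && !rel.follows
            rel.to_id # only hierarchy is walked from from_id to to_id
          end
        next if next_id.nil? || row[:path].include?(next_id)

        path = row[:path] + [next_id]
        new_root = rel.to_id == row[:id] && rel.follows
        { id: next_id, path: path, root_path: new_root ? path : row[:root_path],
          manually: manual_ids.include?(next_id) }
      end
    end
    rows.concat(frontier)
  end

  rows
end

# Mimics the final CTE: keeps only rows whose whole path is still part of the set.
def without_gaps(rows)
  ids = rows.map { |row| row[:id] }
  rows.select { |row| (row[:path] - ids).empty? }
end

# A (1) <- follows - B (2) <- hierarchy (C is parent of B) - C (3) <- follows D (4)
relations = [Relation.new(2, 1, true), Relation.new(3, 2, false), Relation.new(4, 3, true)]

all_paths = build_paths([1], relations, [])
manual_b_paths = build_paths([1], relations, [2])
```

# In this sketch, all_paths holds one row per work package A to D with growing paths, while
# manual_b_paths stops at B because B is passed in as manually scheduled. The root_path only changes
# on the follows hops (A to B and C to D), so B and C share the same root_path.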