A heuristic technique for decomposing multisets of non-negative integers according to the Minkowski sum

We study the following problem. Given a multiset $M$ of non-negative integers, decide whether there exist and, in the positive case, compute two non-trivial multisets whose Minkowski sum is equal to $M$. The Minkowski sum of two multisets A and B is a multiset containing all possible sums of any element of A and any element of B. This problem was proved to be NP-complete when multisets are replaced by sets. This version of the problem is strictly related to the factorization of boolean polynomials that turns out to be NP-complete as well. When multisets are considered, the problem is equivalent to the factorization of polynomials with non-negative integer coefficients. The computational complexity of both these problems is still unknown. The main contribution of this paper is a heuristic technique for decomposing multisets of non-negative integers. Experimental results show that our heuristic decomposes multisets of hundreds of elements within seconds independently of the magnitude of numbers belonging to the multisets. Our heuristic can be used also for factoring polynomials in N[x]. We show that, when the degree of the polynomials gets larger, our technique is much faster than the state-of-the-art algorithms implemented in commercial software like Mathematica and MatLab.


Introduction
The idea of decomposing a mathematical object into the sum (product, or other operations) of smaller ones is definitely not new.A huge literature has been devoted to the factorization of numbers, polynomials, matrices, graphs and many other mathematical objects, including sets and multisets.The basic idea behind factorization is decomposing a complex object into smaller and easier to analyze pieces.Properties satisfied by each piece might shed some light on the properties satisfied by the entire object.As an example, from irreducible factors of a polynomial, we can recover valuable information about its roots.In this paper, we study the decomposition of multisets of non-negative integers according to the Minkowski sum.Multisets are an extension of the notion of sets where, basically, multiple copies of the same element are allowed.The Minkowski sum is a binary operation that can be applied both to sets and multisets.The Minkowski sum of two multisets A and B is a multiset containing all possible sums of any element of A and any element of B.
Given a multiset M of non-negative integers, the decomposition problem asks for computing two non-trivial multisets whose Minkowski sum is equal to M .Multisets theory have applications in many fields Singh et al. (2007), e.g., in combinatorics Anderson (2002); Stanley (2011); Stanley and Fomin (1999), in the theory of relational databases Grumbach and Milo (1996); Henglein et al. (2022); Lamperti et al. (2000), in multigraphs theory DeVos et al. (2013); Dudek et al. (2013) and in computational geometry Emiris et al. (2017).The problem of decomposing multisets of non-negative integers is strictly related to the problem of factoring univariate polynomials with non-negative coefficients (see Section 2.1 for details).Even if this problem arises in a very natural way in a number of different theoretical and practical contexts, it has not been thoroughly studied (see for example Brunotte (2013); Campanini and Facchini (2019); Van de Woestijne (2012)) and its computational complexity is still unknown.To our knowledge, no polynomial time algorithm nor an NP-completeness proof exists.When multisets are replaced by sets, the decomposition problem was proved to be NP-complete Kim and Roush (2005).Other variants of the Minkowski sum decomposition problem have been studied.As an example, in Gao and Lauder (2001) the authors study the Minkowski decomposition of integral convex polytopes proving that the decisional version of this problem is again NP-complete.
The main contribution of this paper is a heuristic technique for decomposing multisets of non-negative integers which, in turn, can be applied to factoring polynomials with non-negative coefficients.
The idea behind our algorithm is to transform the decomposition problem in an optimization problem by introducing a score function for candidate solutions.A candidate solution is an approximation of a solution.The score function measures the quality of candidate solutions, i.e., the similarity to the actual solution (not necessarily unique).The score function reaches its maximum (whose value is known in advance) only at a solution for the problem.Our algorithm starts from a randomly generated candidate solution s 0 and iteratively improves it until it finds a local optimum candidate solution s k according to the score function.If s k reaches the maximum score the algorithms terminates, otherwise it starts over from another initial candidate solution computed starting from s k .The maximum number of iterations is bounded by a predetermined threshold.
We extensively tested our algorithm over randomly generated instances of different size and structure.Experimental results (see Section 4 and Tables in Appendix A and B) show that after a small number of iterations our algorithm almost always finds a solution.
As far as polynomial with non-negative coefficients factorization is concerned, no efficient and specifically designed algorithms are known.A possible natural strategy to solve this problem might consist of factoring the polynomial in Z[x] (this can be done in polynomial time) and then suitably grouping factors in Z[x] in order to get factors in N[x].Unfortunately, there exists no efficient algorithm to perform the grouping of factors whose number can be, in general, exponentially large.In our opinion, this is an interesting problem in itself.Since decomposing multisets of non-negative integers is equivalent (under some conditions we will discuss in Section 2.1) to the problem of factoring polynomials in N[x], the alternative strategy might also be used for decomposing multisets.In Section 5 we make a comparison between our algorithm and the alternative strategy depicted above unrealistically assuming that the grouping of factors can be computed for free.We used built-in functions provided in Wolfram Mathematica language for integer polynomials factorization (similar results have been found using MatLab).
Experimental results clearly show (see Tables 13,14 and 15 in Appendix B) that, when the degree of polynomials increases, our technique is much faster than going through factoring.Reversing the line of reasoning, i.e., using multisets decomposition techniques for factoring polynomials in N[x], our heuristics becomes a serious candidate to be the first effective method for factoring polynomials with non-negative coefficients.
The rest of this paper is organized as follows.In Section 2 we give basic definitions and known results.In Section 3 we describe our heuristics and we provide its pseudocode.In Section 4 we show experimental results.In Section 5 we make a comparison between our algorithm and an alternative strategy for decomposing multisets based on integer polynomial factorization.Section 6 contains conclusions and some ideas for further works.Appendices A and B contain tables with experimental data.

Definitions and Known Results
Let Z be the set of integers and Z[x] be the sets of univariate polynomials with coefficients in Z.Let N be the set of non-negative integers and N[x] be the sets of univariate polynomials with coefficients in N.
Multisets are an extension of the concept of sets.While a set can contain only one occurrence of any given element, a multiset may contain multiple occurrences of the same element.To distinguish multisets from sets, we will represent multisets by using double braces.
As an example M = {{2, 2, 3, 3, 5, 5, 5, 5, 5, 6, 8, 8}} is a multiset.Given a multiset M we denote by µ(x, M ) the number of occurrences (possibly 0) of the element x in M .Sometimes we will represent a multiset M as a set of pairs (element, µ(element, M ).With this notation, the above multiset can be written as M = {(2, 2), (3, 2), (5, 5), (6, 1), (8, 2)}.In what follows, we will consider sets and multisets of numbers.This enable us to define a binary operation on them (denoted by the symbol ⊕) sometimes called Minkowski sum.We will use the symbol ⊕ both for sets and multisets sum inferring the type of operation from the type of operands.
The identity element with respect to the set sum is {0} and the identity element with respect to the multiset sum is {{0}}.
We also define the multiset difference operation (denoted by the \ symbol) as follows.
Definition 3 (Reducible multiset (set)).A multiset (set) M of non-negative integers is reducible if and only if there exist two multisets (sets) A and B, both of them different from the identity element, such that A multiset (set) M of non-negative integers is irreducible (sometimes called prime) if and only if it is not reducible.We are now ready to state the following two problems.
Definition 4 (SET-RED).Given a set S of non-negative integers, decide whether S is reducible or not.
Definition 5 (MULTISET-RED).Given a multiset M of non-negative integers, decide whether M is reducible or not.
The following result was proved in Gao and Lauder (2001).
Unlike SET-RED, the computational complexity of MULTISET-RED is, to our knowledge, still unknown.This leads us to state the following open question.

Question 1. Is MULTISET-RED NP-complete ?
Even if we have defined SET-RED and MULTISET-RED in their decisional version, in the rest of this paper we will refer to them (with a little abuse of notation) as constructive problems, i.e, the problem of effectively computing two multisets (sets) whose Minkowski sum is equal to the multiset (set) received as input.
In the next example we show that the irreducible factorization of non-negative integer multisets is not unique.This makes the problem of factoring multisets even harder, if possible.

Multisets decomposition and polynomials factorization
One of the most studied problem in computer algebra is the problem of factoring polynomials.A huge literature has been devoted to the factorization of polynomials (without claim of exhaustiveness see Hoeij (2002); Lenstra et al. (1982); Kaltofen (1992)).The first polynomial factorization algorithm was published by Theodor Von Schubert in 1793 Schubert (1793).Since then, dozens of papers on the computational complexity of polynomial factorization have been published.In 1982, Arjen K. Lenstra, Hendric W. Lenstra, and László Lovász Lenstra et al. (1982) published the first polynomial time algorithm for factoring polynomials over Q and then over Z.
The problem of factoring polynomials over a ring can be, in a sense, labeled as "well studied" and "efficiently solved".The same cannot be said when rings are replaced by semirings (e.g. the natural numbers).Unlike the case of factoring polynomials over rings, the problem of factoring polynomials over semirings has received far less attention, there are far fewer known results and many interesting unanswered questions.One of them is the following.
As far as we know, for the N-POLY-RED problem, there are neither polynomial algorithms to solve it nor proofs of NP-completeness.N-POLY-RED problem is strictly related to the MULTISET-RED problem.
To any given polynomial p(x) ∈ N[x] it is possible to associate a multiset as follows.Let On the other hand, we can associate to any multiset As a consequence of these properties we have that -M is an irreducible multiset of non-negative integers if and only if P olynomial(M ) is an irreducible polynomial over N[x] and p is an irreducible polynomial over N[x] if and only if M ultiset(p) is an irreducible multiset of nonnegative integers.
Unfortunately, in the general case, the size of M ultiset(p) may be exponentially larger than the size of p.This prevents us from readily translating computational complexity results for MULTISET-RED into equivalent results for N-POLY-RED and viceversa.
Taking advantage of Example 3 we show that the irreducible factorization of polynomials in N[x] is not unique.

The Heuristics
In this section we provide a complete description of our heuristics by using pseudocode (for details see pages from 20 to 22 in Cormen et al. (2009)).
Given a multiset M of n non-negative integer numbers, a candidate solution for M is any multiset A (A = {{0}}) of cardinality m such that A ⊆ M and m divides n.A candidate solution A for M is also a solution for M if and only if there exists another candidate solution B (B = {{0}}) for M such that M = A ⊕ B. Given a candidate solution A for M , deciding whether A is also a solution for M can be done in polynomial time.Given a solution A for M , computing B such that M = A ⊕ B can be done in polynomial time.
Our heuristics starts from an initial candidate solution of a given cardinality and iteratively improves it (according to a given score function) until it finds a solution.The cardinality m of the initial candidate solution is unknown in advance but must divide the cardinality of M .For computing an actual decomposition of a multiset M of cardinality n we have to run our algorithm on all possible factors f of n with f ≤ √ n.We are aware that this leads to an overhead of computation, but luckily, the number of factors of any positive integer n (not exceeding √ n) is very small if compared to n.For every positive integer n, with 100 ≤ n ≤ 100.000, we computed its number of factors divided by n.It turns out that the average of these ratios is 0.00025 and the maximum is 0.058 (higher values are obtained for small numbers).For these reasons, in what follows, we will assume that the target cardinality of solutions is known.
We now give the pseudocode of each function used in our heuristics and a short explanation on how it works.

SCORE(M,
and the elements of mat would give exactly the multiset M . Assume now to run SCORE(M, C).Where C = {{0, 1, 2, 6}} is a candidate solution but not a solution.SCORE(M, C) returns 6.The matrix mat would now have the form The element at row 2 and column 3 (2 + 2 = 4) in mat cannot be found in M (note that we have already removed 0, 1, 2, 6, 2 and 3 from M ) and then SCORE(M, C) stops at line 21 returning 6, i.e., the number of elements correctly placed in mat until that moment.NEWINITIALSOLUTION takes as input a multiset M and a candidate solution S for M and returns a new initial candidate solution.To better understand how NEWINITIALSOLUTION works, we show its behavior on an example.Let A = {{0, 1, 3, 3}}, B = {{0, 2, 2, 6}}, and 2, 3, 3, 3, 3, 5, 5, 5, 5, 6, 7,

Experimental results
We tested our algorithm on an iMac equipped with a 4.2 GHz Intel Core i7 quad-core processor and 32 GB RAM (2400 MHz DDR4 ).Operating System: macOS Monterey Version 12.2.1.Our algorithm has been implemented in Wolfram Mathematica language (Version 12).To make the code more readable even to those unfamiliar with the Mathematica language, we decided to describe it providing a pseudocode version (see Section 3).
Our algorithm has been extensively tested over instances (multisets of non-negative integers) of different size and structure.Instances depend on two parameters, namely structure and range, and have been generated according to the following procedure.
For each structure and range, we tested our algorithms on a large number of instances collecting results in Tables 1 to 12 in Appendix A.
Columns of Tables contain the following data.1. Size: size of the input, i.e., cardinality of the considered multiset to find an algorithm for efficiently computing GROUP(fl)), computes a partition P = {P 1 , P 2 } (if there exists one) of the factor list fl such that the product of all the polynomials in P 1 and the product of all the polynomials in P 2 have non-negative coefficients.
In what follows we will assume that the computational cost of Line 4 is zero.Table 13 to 15 compare running times of ITERATEDSEARCH and ALTERNATIVESTRATEGY for multisets with homogeneous structure and increasing ranges.
For computing the factor list at Line 3 of ALTERNATIVESTRATEGY we make use of the function FACTORLIST provided by Mathematica Language (similar results are obtained by using the function FACTOR of MatLab).
Experimental results (see Tables 13,14 and 15) clearly show that the running time of ITERATEDSEARCH is independent of the magnitude of numbers in the multisets (exponents in the polynomials).ITERATED-SEARCH is much faster than ALTERNATIVESTRATEGY in the case of multisets containing large numbers and small multiplicity.
Doing the reverse path enable us to give a new technique for decomposing polynomials in N[x] based on ITERATEDSEARCH.

N-POLYFACT(p)
We end this section by giving a small multiset M of non-negative integers that ITERATEDSEARCH decomposes in 0.008 seconds.ALTERNATIVESTRATEGY (both using Mathematica and MatLab factorization primitives) called on the same multiset, after 24 hours of computation, was unable to find any solution.
A natural extension of this work is replacing non-negative integers with more complex mathematical objects.It would be of some interest to investigate the case of d dimensional vectors of non-negative integers with d > 1.The problem of decomposing multisets of d dimensional vectors is strictly related to the problem of factoring multivariate polynomials with non-negative coefficients, but also to a number of problems arising, for example, in the field of computational geometry and seems to be more challenging than the 1 dimensional case.
It would be interesting to investigate whether the combination of the results obtained by using our algorithm on single components of the d dimensional object can be of any help for solving the global problem.
Last case.Assume to run SCORE(M, C).Where C = {{0, 2, 2, 5}} is again a candidate solution but not a solution.SCORE(M, C) returns 11.The matrix mat would have now the form The element at row 3 and column 4 (3 + 5 = 8) in mat cannot be found in M and then SCORE(M, C) stops at line 21 returning 11, i.e., the number of elements correctly placed in mat until that moment.NEIGHBORSEARCH(M, S)1 // invariant: S[1] = 0 and S.length divides M.length 2 initial score = SCORE(M, S) 3 alternatives = DELETEDUPLICATES(M \ S) 4 for i = 2 to S.length 5 for j = 1 to alternatives.length6temp = S[i] 7 S[i] =alternatives[j] 8 new score = SCORE(M, S) 9 if new score > initial score 10 return (new score, S) 11 else S[i] = temp 12 return (initial score, S) NEIGHBORSEARCH takes as input a multiset M and a candidate solution S for M and returns a candidate solution N in the neighborhood of S such that SCORE(M, N ) > SCORE(M, S), if any.Returns S, otherwise.Given a multiset M and a candidate solution S for M , a neighbor of S is any candidate solution for M differing from S for exactly 1 element.To speed up the process, NEIGHBORSEARCH returns (line 11) the first improved candidate solution found.FINDLOCALOPT(M, S)1 // invariant: S[1] = 0 2 n = M.length 3 current score = SCORE(M, S) 4 while TRUE 5 (score, S) = NEIGHBORSEARCH(M, S) 6 if score = = n 7 return (TRUE, S) 8 if score = = current score 9 return (FALSE, S) 10 current score = scoreFINDLOCALOPT takes as input a multiset M and a candidate solution S for M and returns a candidate solution N with the property of being the best candidate solution in its neighbor, i.e., a local optimum.To accomplish this task, FINDLOCALOPT keeps on calling NEIGHBORSEARCH on improved solutions until no more improvement is found.Note that the candidate solution N produced by FINDLOCALOPT is not guaranteed to be a solution.ITERATEDSEARCH(M, m, iterations)1 // invariant: m divides M.length 2 current solution = INITIALSOLUTION(M, m) 3 for i = 1 to iterations 4 (f ound, S) = FINDLOCALOPT(M, current solution) 5 if f ound 6 return S 7 current solution = NEWINITIALSOLUTION(M, current solution) 8// note that current solution contains 0 9 return solution not found ITERATEDSEARCH takes as input a multiset M , an integer m > 1 dividing the cardinality of M and an upper bound on the number of iterations and returns a solution of cardinality m, if found.ITERAT-EDSEARCH keeps on calling FINDLOCALOPT with different initial candidate solutions (computed by NEWINITIALSOLUTION) until a solution is found or the maximum number of iterations is exceeded.NEWINITIALSOLUTION(M,S) 1 // invariant: S[1] = 0, all the elements of S are in M and S.length divides M.length 2 col = S.length 3 row = M.length/col 4 // Let mat be an row × col matrix whose entries are set to 0 5 R = M \ S 6 // first row of mat gets S 7 new set = S 8 for i = 2 to row 9 w = MIN(R) 10 R = R \ {w} 11 new set = new set {w} 12 // mat[row, 1] = w 13 for j = 2 to col 14 c = w + S[j] 15 if c ∈ R 16 r = R \ {c} 17 // mat[row, col] = c 18 else return RANDOMSAMPLE(new set, col) 19 // RANDOMSAMPLE(new set, col) must contain 0 20 return RANDOMSAMPLE(new set, col) 9, 9}}Assume to run NEWINITIALSOLUTION(M, C).Where C = {{0, 2, 2, 5}} is a candidate solution but not a solution.The matrix mat, if computed, would have the form C) stops at line 20 returning {{0, 2, 2, 5, 1, 3}}, i.e., the union of the first row of mat and the initial part (first 3 elements) of the first column of mat.Experimental results clearly show that solutions to the problem contains with high probability elements placed in the first row or in the first column of the matrix mat associated to the local optimum candidate solution.
Tab. 1: Range = 5.Number of tested instances for each structure: 1000.Range = 5.Number of tested instances for each structure: 1000.Range = 5.Number of tested instances for each structure: 1000.Range = 5.Number of tested instances for each structure: 1000.For Size = 16384, due to time limits, we reduced the number of instances to 300.Range = 5.Number of tested instances for each structure: 1000.For Size = 15625, due to time limits, we reduced the number of instances to 300.Range = 10000.Number of instances for each structure: 1000.Tab.10: Range = 10000.Number of instances for each structure: 1000.Tab.11: Range = 10000.Number of instances for each structure: 1000.For Size = 16384, due to time limits, we reduced the number of instances to 300.Tab.12: Range = 10000.Number of instances for each structure: 1000.For Size = 3125 and Size = 15625, due to time limits, we reduced the number of instances to 100.Running times for ITERATEDSEARCH and ALTERNATIVESTRATEGY called on multisets with different range values and structure = {5, 5}.Number of instances for each range: 100.Tab.14: Running times for ITERATEDSEARCH and ALTERNATIVESTRATEGY called on multisets with different range values and structure = {10, 10}.Number of instances for each range: 100.Running times for ITERATEDSEARCH and ALTERNATIVESTRATEGY called on multisets with different range values and structure = {2} 12 .Number of instances for each range: 100.