GAP (FactInt) - Chapter 3: The Routines for Specific Factorization Methods

Descriptions of the factoring methods implemented in this package can be found in [Bre89] and [Coh93]. Cohen's book contains also descriptions of the other methods mentioned in the preface.

3.1 Trial division

3.1-1 FactorsTD

‣ FactorsTD( n[, Divisors] ) ( function )

Returns: a list of two lists: The first list contains the prime factors found, and the second list contains remaining unfactored parts of n, if there are any.

This function tries to factor n by trial division. The optional argument Divisors is the list of trial divisors. If not given, it defaults to the list of primes p < 1000.


gap> FactorsTD(12^25+25^12);
[ [ 13, 19, 727 ], [ 5312510324723614735153 ] ]

3.2 Pollard's p-1

3.2-1 FactorsPminus1

‣ FactorsPminus1( n[[, a], Limit1[, Limit2]] ) ( function )

Returns: a list of two lists: The first list contains the prime factors found, and the second list contains remaining unfactored parts of n, if there are any.

This function tries to factor n using Pollard's p-1. It uses a as base for exponentiation, Limit1 as first stage limit and Limit2 as second stage limit. If the function is called with three arguments, these arguments are interpreted as n, Limit1 and Limit2. Defaults are chosen for all arguments which are omitted.

Pollard's p-1 is based on the fact that exponentiation (mod n) can be done efficiently enough to compute a^k! mod n for sufficiently large k in a reasonable amount of time. Assume that p is a prime factor of n which does not divide a, and that k! is a multiple of p-1. Then Lagrange's Theorem states that a^k! ≡ 1 (mod p). If k! is not a multiple of q-1 for another prime factor q of n, it is likely that the factor p can be determined by computing gcd(a^k!-1,n). A prime factor p is usually found if the largest prime factor of p-1 is not larger than Limit2, and the second-largest one is not larger than Limit1. (Compare with FactorsPplus1 (3.3-1) and FactorsECM (3.4-1).)


gap> FactorsPminus1( Factorial(158) + 1, 100000, 1000000 );
[ [ 2879, 5227, 1452486383317, 9561906969931, 18331561438319, 
      4837142997094837608115811103417329505064932181226548534006749213\
4508231090637045229565481657130504121732305287984292482612133314325471\
3674832962773107806789945715570386038565256719614524924705165110048148\
7161609649806290811760570095669 ], [  ] ]
gap> List( last[ 1 ]{[ 3, 4, 5 ]}, p -> Factors( p - 1 ) );
[ [ 2, 2, 3, 3, 81937, 492413 ], [ 2, 3, 3, 3, 5, 7, 7, 1481, 488011 ]
    , [ 2, 3001, 7643, 399613 ] ]

3.3 Williams' p+1

3.3-1 FactorsPplus1

‣ FactorsPplus1( n[[, Residues], Limit1[, Limit2]] ) ( function )

Returns: a list of two lists: The first list contains the prime factors found, and the second list contains remaining unfactored parts of n, if there are any.

This function tries to factor n using Williams' p+1. It tries Residues different residues, and uses Limit1 as first stage limit and Limit2 as second stage limit. If the function is called with three arguments, these arguments are interpreted as n, Limit1 and Limit2. Defaults are chosen for all arguments which are omitted.

Williams' p+1 is very similar to Pollard's p-1 (see FactorsPminus1 (3.2-1)). The difference is that the underlying group here can either have order p+1 or p-1, and that the group operation takes more time. A prime factor p is usually found if the largest prime factor of the group order is at most Limit2 and the second-largest one is not larger than Limit1. (Compare also with FactorsECM (3.4-1).)


gap> FactorsPplus1( Factorial(55) - 1, 10, 10000, 100000 );
[ [ 73, 39619, 277914269, 148257413069 ], 
  [ 106543529120049954955085076634537262459718863957 ] ]
gap> List( last[ 1 ], p -> [ Factors( p - 1 ), Factors( p + 1 ) ] );
[ [ [ 2, 2, 2, 3, 3 ], [ 2, 37 ] ], 
  [ [ 2, 3, 3, 31, 71 ], [ 2, 2, 5, 7, 283 ] ], 
  [ [ 2, 2, 2207, 31481 ], [ 2, 3, 5, 9263809 ] ], 
  [ [ 2, 2, 47, 788603261 ], [ 2, 3, 5, 13, 37, 67, 89, 1723 ] ] ]

3.4 The Elliptic Curves Method (ECM)

3.4-1 FactorsECM

‣ FactorsECM( n[, Curves[, Limit1[, Limit2[, Delta]]]] ) ( function )

‣ ECM( n[, Curves[, Limit1[, Limit2[, Delta]]]] ) ( function )

Returns: a list of two lists: The first list contains the prime factors found, and the second list contains remaining unfactored parts of n, if there are any.

This function tries to factor n using the Elliptic Curves Method (ECM). The argument Curves is the number of curves to be tried. The argument Limit1 is the initial first stage limit, and Limit2 is the initial second stage limit. The argument Delta is the increment per curve for the first stage limit. The second stage limit is adjusted appropriately. Defaults are chosen for all arguments which are omitted.

FactorsECM recognizes the option ECMDeterministic. If set, the choice of the curves is deterministic. This means that in repeated runs of FactorsECM the same curves are used, and hence for the same n the same factors are found after the same number of trials.

The Elliptic Curves Method is based on the fact that exponentiation in the elliptic curve groups E(a,b)/n can be performed fast enough to compute for example g^k! for k large enough (e.g. 100000 or so) in a reasonable amount of time and without using much memory, and on Lagrange's Theorem. Assume that p is a prime divisor of n. Then Lagrange's Theorem states that if k! is a multiple of |E(a,b)/p|, then for any elliptic curve point g, the power g^k! is the identity element of E(a,b)/p. In this situation -- under reasonable circumstances -- the factor p can be determined by taking an appropriate gcd.

In practice, the algorithm chooses in some sense "better" products P_k of small primes rather than k! as exponents. After reaching the first stage limit with P_Limit1, it considers further products P_Limit1q for primes q up to the second stage limit Limit2, which is usually set equal to something like 100 times the first stage limit. The prime q corresponds to the largest prime factor of the order of the group E(a,b)/p.

A prime divisor p is usually found if the largest prime factor of the order of one of the examined elliptic curve groups E(a,b)/p is at most Limit2 and the second-largest one is at most Limit1. Thus trying a larger number of curves increases the chance of factoring n as well as choosing a larger value for Limit1 and/or Limit2. It turns out to be not optimal either to try a large number of curves with very small Limit1 and Limit2 or to try only a few curves with very large limits. (Compare with FactorsPminus1 (3.2-1).)

The elements of the group E(a,b)/n are the points (x,y) given by the solutions of y^2 = x^3 + ax + by in the residue class ring (mod n), and an additional point ∞ at infinity, which serves as identity element. To turn this set into a group, define the product (although elliptic curve groups are usually written additively, I prefer using the multiplicative notation here to retain the analogy to FactorsPminus1 (3.2-1) and FactorsPplus1 (3.3-1)) of two points p_1 and p_2 as follows: If p_1 ≠ p_2, let l be the line through p_1 and p_2, otherwise let l be the tangent to the curve C given by the above equation in the point p_1 = p_2. The line l intersects C in a third point, say p_3. If l does not intersect the curve in a third affine point, then set p_3 := ∞. Define p_1 ⋅ p_2 by the image of p_3 under the reflection across the x-axis. Define the product of any curve point p and ∞ by p itself. This -- more or less obviously, checking associativity requires some calculation -- turns the set of points on the given curve into an abelian group E(a,b)/n.

However, the calculations are done in projective coordinates to have an explicit representation of the identity element and to avoid calculating inverses (mod n) for the group operation. Otherwise this would require using an O((log n)^3)-algorithm, while multiplication (mod n) is only O((log n)^2). The corresponding equation is given by bY^2Z = X^3 + aX^2Z + XZ^2. This form allows even more efficient computations than the Weierstrass model Y^2Z = X^3 + aXZ^2 + bZ^3, which is the projective equivalent of the affine representation y^2 = x^3 + ax + by mentioned above. The algorithm only keeps track of two of the three coordinates, namely X and Z. The curves are chosen in a way that ensures the order of the corresponding group to be divisible by 12. This increases the chance that it is smooth enough to find a factor of n. The implementation follows the description of R. P. Brent given in [Bre96], pp. 5 -- 8. In terms of this paper, for the second stage the "improved standard continuation" is used.


gap> FactorsECM(2^256+1,100,10000,1000000,100);
[ [ 1238926361552897, 
      93461639715357977769163558199606896584051237541638188580280321 ]
    , [  ] ]

3.5 The Continued Fraction Algorithm (CFRAC)

3.5-1 FactorsCFRAC

‣ FactorsCFRAC( n ) ( function )

‣ CFRAC( n ) ( function )

This function tries to factor n using the Continued Fraction Algorithm (CFRAC), also known as Brillhart-Morrison Algorithm. In case of failure an error is signalled.

Caution: The run time of this function depends only on the size of n, and not on the size of the factors. Thus if a small factor is not found during the preprocessing which is done before invoking the sieving process, the run time is as long as if n would have two prime factors of roughly equal size.

The Continued Fraction Algorithm tries to find integers x and y such that x^2 ≡ y^2 (mod n), but not ± x ≡ ± y (mod n). In this situation, taking gcd(x - y,n) yields a nontrivial divisor of n. For determining such a pair (x,y), the algorithm uses the continued fraction expansion of the square root of n. If x_i/y_i is a continued fraction approximation of the square root of n, then c_i := x_i^2 - ny_i^2 is bounded by a small constant times the square root of n. The algorithm tries to find as many c_i as possible which factor completely over a chosen factor base (a list of small primes) or with only one factor not in the factor base. The latter ones can be used if and only if a second c_i with the same "large" factor is found. Once enough values c_i have been factored, as a final stage Gaussian Elimination over GF(2) is used to determine which of the congruences x_i^2 ≡ c_i (mod n) have to be multiplied together to obtain a congruence of the desired form x^2 ≡ y^2 (mod n). Let M be the corresponding matrix. Then the entries of M are given by M_ij = 1 if an odd power of the j-th element of the factor base divides the i-th usable factored value, and M_ij = 0 otherwise. To obtain the desired congruence, it is necessary that the rows of M are linearly dependent. In other words, this means that the number of factored c_i needs to be larger than the rank of M, which is approximately given by the size of the factor base. (Compare with FactorsMPQS (3.6-1).)


gap> FactorsCFRAC( Factorial(34) - 1 );
[ 10398560889846739639, 28391697867333973241 ]

3.6 The Multiple Polynomial Quadratic Sieve (MPQS)

3.6-1 FactorsMPQS

‣ FactorsMPQS( n ) ( function )

‣ MPQS( n ) ( function )

This function tries to factor n using the Single Large Prime Variation of the Multiple Polynomial Quadratic Sieve (MPQS). In case of failure an error is signalled.

The intermediate results of a computation can be saved by interrupting it with [Ctrl][C] and calling Pause(); from the break loop. This causes all data needed for resuming the computation again to be pushed as a record MPQSTmp on the options stack. When called again with the same argument n, FactorsMPQS takes the record from the options stack and continues with the previously computed factorization data. For continuing the factorization process in another session, one needs to write this record to a file. This can be done by the function SaveMPQSTmp(filename). The file written by this function can be read by the standard Read-function of GAP.

The Multiple Polynomial Quadratic Sieve tries to find integers x and y such that x^2 ≡ y^2 (mod n), but not ± x ≡ ± y (mod n). In this situation, taking gcd(x - y,n) yields a nontrivial divisor of n. For determining such a pair (x,y), the algorithm chooses polynomials f_a of the form f_a(r) = ar^2 + 2br + c with suitably chosen coefficients a, b and c which satisfy b^2 ≡ n (mod a) and c = (b^2 - n)/a. The identity a ⋅ f_a(r) = (ar + b)^2 - n yields a congruence (mod n) with a perfect square on one side and a ⋅ f_a(r) on the other. The algorithm uses a sieving technique similar to the Sieve of Eratosthenes over an appropriately chosen sieving interval to search for factorizations of values f_a(r) over a chosen factor base. Any two factorizations with the same single "large" factor which does not belong to the factor base can also be used. Taking more polynomials and hence shorter sieving intervals has the advantage of having to factor smaller values f_a(r) over the factor base.

Once enough values f_a(r) have been factored, as a final stage Gaussian Elimination over GF(2) is used to determine which congruences have to be multiplied together to obtain a congruence of the desired form x^2 ≡ y^2 (mod n). Let M be the corresponding matrix. Then the entries of M are given by M_ij = 1 if an odd power of the j-th element of the factor base divides the i-th usable factored value, and M_ij = 0 otherwise. To obtain the desired congruence, it is necessary that the rows of M are linearly dependent. In other words, this means that the number of usable factorizations of values f_a(r) needs to be larger than the rank of M. The latter is approximately equal to the size of the factor base. (Compare with FactorsCFRAC (3.5-1).)


gap> FactorsMPQS( Factorial(38) + 1 );
[ 14029308060317546154181, 37280713718589679646221 ]

3 The Routines for Specific Factorization Methods

3.1 Trial division

3.1-1 FactorsTD

3.2 Pollard's p-1

3.2-1 FactorsPminus1

3.3 Williams' p+1

3.3-1 FactorsPplus1

3.4 The Elliptic Curves Method (ECM)

3.4-1 FactorsECM

3.5 The Continued Fraction Algorithm (CFRAC)

3.5-1 FactorsCFRAC

3.6 The Multiple Polynomial Quadratic Sieve (MPQS)

3.6-1 FactorsMPQS