Acme::Tools - Lots of more or less useful subs lumped together and exported into your namespace
use Acme::Tools; print sum(1,2,3); # 6 print avg(2,3,4,6); # 3.75 print median(2,3,4,6); # 3.5 print percentile(25, 101..199); # 125 my @list = minus(\@listA, \@listB); # set operation my @list = union(\@listA, \@listB); # set operation print length(gzip("abc" x 1000)); # far less than 3000 writefile("/dir/filename",$string); # convenient my $s=readfile("/dir/filename"); # also convenient print "yes!" if between($PI,3,4); print percentile(0.05, @numbers); my @even = range(1000,2000,2); # even numbers between 1000 and 2000 my @odd = range(1001,2001,2); my $dice = random(1,6); my $color = random(['red','green','blue','yellow','orange']); pushr $arrayref[$num], @stuff; # push @{ $arrayref[$num] }, @stuff ... popr, shiftr, unshiftr print 2**200; # 1.60693804425899e+60 print big(2)**200; # 1606938044258990275541962092341162602522202993782792835301376 ...and much more.
About 120 more or less useful perl subroutines lumped together and exported into your namespace.
Subs created and collected since the mid-90s.
sudo cpan Acme::Tools sudo cpanm Acme::Tools # after: sudo apt-get install cpanminus make # for Ubuntu 12.04
Or to get the very newest:
git clone https://github.com/kjetillll/Acme-Tools.git cd Acme-Tools perl Makefile.PL make test sudo make install
Almost every sub, about 90 of them.
Beware of namespace pollution. But what did you expect from an Acme module?
See "code2num"
num2code() convert numbers (integers) from the normal decimal system to some arbitrary other number system. That can be binary (2), oct (8), hex (16) or others.
num2code()
Example:
print num2code(255,2,"0123456789ABCDEF"); # prints FF print num2code( 14,2,"0123456789ABCDEF"); # prints 0E
...because 255 are converted to hex FF (base length("0123456789ABCDEF") ) which is 2 digits of 0-9 or A-F. ...and 14 are converted to 0E, with leading 0 because of the second argument 2.
length("0123456789ABCDEF")
print num2code(1234,16,"01")
Prints the 16 binary digits 0000010011010010 which is 1234 converted to binary zeros and ones.
To convert back:
print code2num("0000010011010010","01"); #prints 1234
num2code() can be used to compress numeric IDs to something shorter:
$chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_"; print num2code("241274432",5,$chars); # prints EOOv0 print code2num("EOOv0",$chars); # prints 241274432
Numbers in any number system of base between 2 and 36. Using capital letters A-Z for base higher than 10.
base(2,15) # 1111 2 --> binary base(8,4096) # 10000 8 --> octal base(10,4096) # 4096 of course base(16,254) # FE 16 --> hex base(16,254.3) # FE 16 --> hex, can not handle decimal numbers (yet...todo) base(36,123456) # FE 16 --> hex, can not handle decimal numbers (yet...todo) base(36,1234567891011) # FR5HUHC3 base36 using all 0-9 and A-Z as digits, 10+26=36 base(37,1) # die with message 'base not 2-36' base($x,0) # 0 base(16, 14,15,16,17) # list of four elements: E F 10 11
print dec2bin(101); # 1100101 print dec2hex(101); # 65 print dec2oct(101); # 145 print bin2dec(1010011110); # 670 print bin2hex(1010011110); # 29e print bin2oct(1010011110); # 1236 print hex2dec(101); # 257 print hex2bin(101); # 100000001 print hex2oct(101); # 401 print oct2dec(101); # 65 print oct2bin(101); # 1000001 print oct2hex(101); # 41
"The Euclidean algorithm (also called Euclid's algorithm) is an algorithm to determine the greatest common divisor (gcd) of two integers. It is one of the oldest algorithms known, since it appeared in the classic Euclid's Elements around 300 BC. The algorithm does not require factoring."
"
Input: two or more positive numbers (integers, without decimals that is)
Output: an integer
print gcd(12, 8); # prints 4
Because the (prime number) factors of 12 is 2 * 2 * 3 and the factors of 8 is 2 * 2 * 2 and the common ('overlapping') factors for both 12 and 8 is then 2 * 2 and the result becomes 4.
Example two:
print gcd(90, 135, 315); # prints 45 print gcd(2*3*3*5, 3*3*3*5, 3*3*5*7); # prints 45 ( = 3*3*5 which is common to all three args)
Implementation:
sub gcd { my($a,$b,@r)=@_; @r ? gcd($a,gcd($b,@r)) : $b==0 ? $a : gcd($b, $a % $b) }
One way of putting it: Keep replacing the larger of the two numbers with the difference between them until you got two equal numbers. Then thats the answer.
http://en.wikipedia.org/wiki/Greatest_common_divisor
http://en.wikipedia.org/wiki/Euclidean_algorithm
lcm() finds the Least Common Multiple of two or more numbers (integers).
lcm()
Input: two or more positive numbers (integers)
Output: an integer number
Example: 2/21 + 1/6 = 4/42 + 7/42 = 11/42
2/21 + 1/6 = 4/42 + 7/42 = 11/42
Where 42 = lcm(21,6).
print lcm(45,120,75); # prints 1800
Because the factors are:
45 = 2^0 * 3^2 * 5^1 120 = 2^3 * 3^1 * 5^1 75 = 2^0 * 3^1 * 5^2
Take the bigest power of each primary number (2, 3 and 5 here). Which is 2^3, 3^2 and 5^2. Multiplied this is 8 * 9 * 25 = 1800.
sub lcm { my($a,$b,@r)=@_; @r ? lcm($a,lcm($b,@r)) : $a*$b/gcd($a,$b) }
Seems to works with Math::BigInt as well: (lcm of all integers from 1 to 200)
lcm
perl -MAcme::Tools -MMath::BigInt -le'print lcm(map Math::BigInt->new($_),1..200)' 337293588832926264639465766794841407432394382785157234228847021917234018060677390066992000
Resolves an equation by Newtons method.
Input: 1-6 arguments. At least one argument.
First argument: must be a coderef to a subroutine (a function)
Second argument: if present, the target, f(x)=target. Default 0.
Third argument: a start position for x. Default 0.
Fourth argument: a small delta value. Default 1e-4 (0.0001).
Fifth argument: a maximum number of iterations before resolve gives up and carps. Default 100 (if fifth argument is not given or is undef). The number 0 means infinite here. If the derivative of the start position is zero or close to zero more iterations are typically needed.
Sixth argument: A number of seconds to run before giving up. If both fifth and sixth argument is given and > 0, resolve stops at whichever comes first.
resolve
Output: returns the number x for f(x) = 0
x
f(x)
...or equal to the second input argument if present.
The equation x^2 - 4x - 21 = 0 has two solutions: -3 and 7.
x^2 - 4x - 21 = 0
The result of resolve will depend on the start position:
print resolve(sub{ $_**2 - 4*$_ - 21 }); # -3 with $_ as your x print resolve(sub{ my $x=shift; $x**2 - 4*$x - 21 }); # -3 more elaborate call print resolve(sub{ my $x=shift; $x**2 - 4*$x - 21 },0,3); # 7 with start position 3 print "Iterations: $Acme::Tools::Resolve_iterations\n"; # 3 or larger, about 10-15 is normal
The variable $Acme::Tools::Resolve_iterations (which is exported) will be set to the last number of iterations resolve used. Also if resolve dies (carps).
$Acme::Tools::Resolve_iterations
The variable $Acme::Tools::Resolve_last_estimate (which is exported) will be set to the last estimate. This number will often be close to the solution and can be used even if resolve dies (carps).
$Acme::Tools::Resolve_last_estimate
BigFloat-example:
If either second, third or fourth argument is an instance of Math::BigFloat, so will the result be:
use Acme::Tools; my $equation = sub{ $_ - 1 - 1/$_ }; my $gr1 = resolve( $equation, 0, 1 ); # my $gr2 = resolve( $equation, 0, bigf(1) ); # 1/2 + sqrt(5)/2 bigscale(50); my $gr3 = resolve( $equation, 0, bigf(1) ); # 1/2 + sqrt(5)/2 print 1/2 + sqrt(5)/2, "\n"; print "Golden ratio 1: $gr1\n"; print "Golden ratio 2: $gr2\n"; print "Golden ratio 3: $gr3\n";
Output:
1.61803398874989 Golden ratio 1: 1.61803398874989 Golden ratio 2: 1.61803398874989484820458683436563811772029300310882395927211731893236137472439025 Golden ratio 3: 1.6180339887498948482045868343656381177203091798057610016490334024184302360920167724737807104860909804
See:
http://en.wikipedia.org/wiki/Newtons_method
Math::BigFloat
http://en.wikipedia.org/wiki/Golden_ratio
This prints 2:
print resolve_equation "x + 13*(3-x) = 17 - x"
A string containing at least one x is converted into a perl function. Then x is found by using resolve. The string conversion is done by replacing every x with $_ and if a = char is present it converts leftside = rightside into (leftside) - (rightside) = 0 which is the default behaviour of resolve.
=
leftside = rightside
(leftside) - (rightside) = 0
Converts between:
units of measurement
number systems
currencies
Examples:
print conv( 2000, "meters", "miles" ); #prints 1.24274238447467 print conv( 2.1, 'km', 'm'); #prints 2100 print conv( 70,"cm","in"); #prints 27.5590551181102 print conv( 4,"USD","EUR"); #prints 3.20481552905431 (depending on todays rates) print conv( 4000,"b","kb"); #prints 3.90625 (1 kb = 1024 bytes) print conv( 4000,"b","Kb"); #prints 4 (1 Kb = 1000 bytes) print conv( 1000,"mb","kb"); #prints 1024000 print conv( 101010,"bin","roman"); #prints XLII print conv( "DCCXLII","roman","oct"); #prints 1346
Units, types of measurement and currencies supported by conv are:
conv
Note: units starting with the symbol _ means that all metric prefixes from yocto 10^-24 to yotta 10^+24 is supported, so _m means km, cm, mm, µm and so on. And _N means kN, MN GN and so on.
Note2: Many units have synonyms: m, meter, meters ...
acceleration: g, g0, m/s2, mps2 angle: binary_degree, binary_radian, brad, deg, degree, degrees, gon, grad, grade, gradian, gradians, hexacontade, hour, new_degree, nygrad, point, quadrant, rad, radian, radians, sextant, turn area: a, ar, are, ares, bunder, ca, centiare, cho, cm2, daa, decare, decares, deciare, dekar, djerib, m2, dunam, dönöm, earths, feddan, ft2, gongqing, ha ha, hectare, hectares, hektar, jerib, km2, m2, manzana, mi2, mm2, mu, qing, rai, sotka, sqcm, sqft, sqkm, sqm, sqmi, sqmm stremmata, um2, µm2 bytes: Eb, Gb, Kb, KiB, Mb, Pb, Tb, Yb, Zb, b, byte, kb, kilobyte, mb, megabyte, gb, gigabyte, tb, terabyte, pb, petabyte, eb, exabyte, zb, zettabyte, yb, yottabyte charge: As, C, _e, coulomb, e current: A, _A, N/m2 energy: BTU, Btu, J, Nm, W/s, Wh, Wps, Ws, _J, _eV, cal, calorie, calories, eV, electronvolt, BeV, erg, ergs, foot-pound, foot-pounds, ftlb, joule, kWh, MWh, GWh, TWh kcal, kilocalorie, kilocalories, newtonmeter, newtonmeters, th, thermie force: N, _N, dyn, dyne, dynes, lb, newton length: NM, _m, _pc, astronomical unit, au, chain, ft, furlong, in, inch, inches, km, league, lightyear, ls, ly, m, meter, meters, mi, mil, mile, miles, nautical mile, nautical miles, nmi, parsec, pc, planck, yard, yard_imperical, yd, Å, ångstrøm, angstrom mass: Da, _eV, _g, bag, carat, ct, dwt, eV, electronvolt, g, grain, grains, gram, grams, kilo, kilos, kt, lb, lb_av, lb_t, lb_troy, lbs, ounce, ounce_av, ounce_troy, oz, oz_av, oz_t, pennyweight, pound, pound_av, pound_metric, pound_troy, pounds, pwt, seer, sl, slug, solar_mass, st, stone, t, tonn, tonne, tonnes, u, wey mileage: mpg, l/100km, l/km, l/10km, lp10km, l/mil, liter_pr_100km, liter_pr_km, lp100km money: AED, ARS, AUD, BGN, BHD, BND, BRL, BWP, CAD, CHF, CLP, CNY, COP, CZK, DKK, EUR, GBP, HKD, HRK, HUF, IDR, ILS, INR, IRR, ISK, JPY, KRW, KWD, KZT, LKR, LTL, LVL, LYD, MUR, MXN, MYR, NOK, NPR, NZD, OMR, PHP, PKR, PLN, QAR, RON, RUB, SAR, SEK, SGD, THB, TRY, TTD, TWD, USD, VEF, ZAR, BTC, LTC, mBTC, XBT Currency rates are automatically updated from the net at least every 24h since last update (on linux/cygwin). numbers: dec, hex, bin, oct, roman, dozen, doz, dz, dusin, gross, gro, gr, great_gross, small_gross (not supported: decimal numbers) power: BTU, BTU/h, BTU/s, BTUph, GWhpy, J/s, Jps, MWhpy, TWhpy, W, Whpy, _W, ftlb/min, ftlb/s, hk, hp, kWh/yr, kWhpy pressure: N/m2, Pa, _Pa, at, atm, bar, mbar, pascal, psi, torr radioactivity: Bq, becquerel, curie speed: _m/s, km/h, km/t, kmh, kmph, kmt, m/s, mi/h, mph, mps, kn, knot, knots, kt, kts, mach, machs, c, fps, ft/s, ftps temperature: C, F, K, celsius, fahrenheit, kelvin time: _s, biennium, century, d, day, days, decade, dy, fortnight, h, hour, hours, hr, indiction, jubilee, ke, lustrum, m, millennium, min, minute, minutes, mo, moment, mon, month, olympiad, quarter, s, season, sec, second, seconds, shake, tp, triennium, w, week, weeks, y, y365, ySI, ycommon, year, years, ygregorian, yjulian, ysideral, ytropical volume: l, L, _L, _l, cm3, m3, ft3, in3, liter, liters, litre, litres, gal, gallon, gallon_imp, gallon_uk, gallon_us, gallons, pint, pint_imp, pint_uk, pint_us, tsp, tablespoon, teaspoon, floz, floz_uk, therm, thm, fat, bbl, Mbbl, MMbbl, drum, container (or container20), container40, container40HC, container45HC
See: http://en.wikipedia.org/wiki/Units_of_measurement
Converts a number of bytes to something human readable.
Input 1: a number
Input 2: optionally the number of decimals if >1000 B. Default is 2.
Output: a string containing:
the number with a B behind if the number is less than 1000
the number divided by 1024 with two decimals and "kB" behind if the number is less than 1024*1000
the number divided by 1048576 with two decimals and "MB" behind if the number is less than 1024*1024*1000
the number divided by 1073741824 with two decimals and "GB" behind if the number is less than 1024*1024*1024*1000
the number divided by 1099511627776 with two decimals and "TB" behind otherwise
print bytes_readable(999); # 999 B print bytes_readable(1000); # 1000 B print bytes_readable(1001); # 0.98 kB print bytes_readable(1024); # 1.00 kB print bytes_readable(1153433.6); # 1.10 MB print bytes_readable(1181116006.4); # 1.10 GB print bytes_readable(1209462790553.6); # 1.10 TB print bytes_readable(1088516511498.24*1000); # 990.00 TB print bytes_readable(1088516511498.24*1000, 3); # 990.000 TB print bytes_readable(1088516511498.24*1000, 1); # 990.0 TB
Time written as 14h 37m is often more humanly comprehensible than 52620 seconds .
14h 37m
52620 seconds
print sec_readable( 0 ); # 0s print sec_readable( 0.0123 ); # 0.0123s print sec_readable(-0.0123 ); # -0.0123s print sec_readable( 1.23 ); # 1.23s print sec_readable( 1 ); # 1s print sec_readable( 9.87 ); # 9.87s print sec_readable( 10 ); # 10s print sec_readable( 10.1 ); # 10.1s print sec_readable( 59 ); # 59s print sec_readable( 59.123 ); # 59.1s print sec_readable( 60 ); # 1m 0s print sec_readable( 60.1 ); # 1m 0s print sec_readable( 121 ); # 2m 1s print sec_readable( 131 ); # 2m 11s print sec_readable( 1331 ); # 22m 11s print sec_readable(-1331 ); # -22m 11s print sec_readable( 13331 ); # 3h 42m print sec_readable( 133331 ); # 1d 13h print sec_readable( 1333331 ); # 15d 10h print sec_readable( 13333331 ); # 154d 7h print sec_readable( 133333331 ); # 4yr 82d print sec_readable( 1333333331 ); # 42yr 91d
Converts integers to roman numbers.
print int2roman(1234); # prints MCCXXXIV print int2roman(1971); # prints MCMLXXI
(Adapted subroutine from Peter J. Acklam, jacklam(&)math.uio.no)
I = 1 V = 5 X = 10 L = 50 C = 100 (centum) D = 500 M = 1000 (mille)
See also Roman.
See http://en.wikipedia.org/wiki/Roman_numbers for more.
roman2int("MCMLXXI") == 1971
Input: the four decimal numbers of two GPS positions: latutude1, longitude1, latitude2, longitude2
Output: the air distance in meters between the two points
Calculation is done using the Haversine Formula for spherical distance:
a = sin((lat2-lat1)/2)^2 + sin((lon2-lon1)/2)^2 * cos(lat1) * cos(lat2); c = 2 * atan2(min(1,sqrt(a)), min(1,sqrt(1-a))) distance = c * R
With earth radius set to:
R = Re - (Re-Rp) * sin(abs(lat1+lat2)/2)
Where Re = 6378137.0 (equatorial radius) and Rp = 6356752.3 (polar radius).
Re = 6378137.0
Rp = 6356752.3
my @oslo = ( 59.93937, 10.75135); # oslo in norway my @rio = (-22.97673, -43.19508); # rio in brazil printf "%.1f km\n", distance(@oslo,@rio)/1000; # 10431.7 km printf "%.1f km\n", distance(@rio,@oslo)/1000; # 10431.7 km printf "%.1f nmi\n", distance(@oslo,@rio)/1852.000; # 5632.7 nmi (nautical miles) printf "%.1f miles\n",distance(@oslo,@rio)/1609.344; # 6481.9 miles printf "%.1f miles\n",conv(distance(@oslo,@rio),"meters","miles");# 6481.9 miles
See http://www.faqs.org/faqs/geography/infosystems-faq/
and http://mathforum.org/library/drmath/view/51879.html
and http://en.wikipedia.org/wiki/Earth_radius
and Geo::Direction::Distance, but Acme::Tools::distance() is about 8 times faster.
big, bigi, bigf, bigr and bigscale are sometimes convenient shorthands for using Math::BigInt->new(), Math::BigFloat->new() and Math::BigRat->new() (preferably with the GMP for faster calculations). Examples:
Math::BigInt->new()
Math::BigFloat->new()
Math::BigRat->new()
my $num1 = big(3); #returns a new Math::BigInt-object my $num2 = big('3.0'); #returns a new Math::BigFloat-object my $num3 = big(3.0); #returns a new Math::BigInt-object my $num4 = big(3.1); #returns a new Math::BigFloat-object my $num5 = big('2/7'); #returns a new Math::BigRat-object my($i1,$f1,$i2,$f2) = big(3,'3.0',3.0,3.1); #returns the four new numbers, as the above four lines #uses wantarray print 2**200; # 1.60693804425899e+60 print big(2)**200; # 1606938044258990275541962092341162602522202993782792835301376 print 2**big(200); # 1606938044258990275541962092341162602522202993782792835301376 print big(2**200); # 1606938044258990000000000000000000000000000000000000000000000 print 1/7; # 0.142857142857143 print 1/big(7); # 0 because of integer arithmetics print 1/big(7.0); # 0 because 7.0 is viewed as an integer, see bigf below print 1/big('7.0'); # 0.1428571428571428571428571428571428571429 print 1/bigf(7); # 0.1428571428571428571428571428571428571429 print bigf(1/7); # 0.142857142857143 probably not what you wanted print 1/bigf(7); # 0.1428571428571428571428571428571428571429 bigscale(80); # for increased precesion (default is 40) print 1/bigf(7); # 0.14285714285714285714285714285714285714285714285714285714285714285714285714285714
In big() the characters . and / will make it return a Math::BigFloat- and Math::BigRat-object accordingly. Or else a Math::BigInt-object is returned.
big()
.
/
Instead of guessing, use bigi, bigf and bigr to return what you want.
bigi
bigf
bigr
Note: Acme::Tools does not depend on Math::BigInt and Math::BigFloat and GMP, but these four big*-subs do (by require). To use big, bigi, bigf and bigr effectively you should install Math::BigInt::GMP and Math::BigFloat::GMP like this:
require
sudo cpanm Math::BigFloat Math::GMP Math::BingInt::GMP # or sudo cpan Math::BigFloat Math::GMP Math::BingInt::GMP # or sudo yum install perl-Math-BigInt-GMP perl-Math-GMP # on RedHat, RHEL or sudo apt-get install libmath-bigint-gmp-perl libmath-gmp-perl # on Ubuntu or some other way
Unless GMP is installed for perl like this, the Math::Big*-modules will fall back to using similar but slower built in modules. See: https://gmplib.org/
Input: String to be tested on this regexp:
/^ \s* [\-\+]? (?: \d*\.\d+ | \d+ ) (?:[eE][\-\+]?\d+)?\s*$/x
If no argument is given isnum checks $_.
$_
Output: True or false (1 or 0)
use Acme::Tools; my @e=(' +32.12354E-21 ', 2.2, '9' x 99999, ' -123.12', '29,323.31', '29 323.31'); print isnum() ? 'num' : 'str' for @e; #prints num for every element except the last two print $_=~$Re_isnum ? 'num' : 'str' for @e; #same but slighhly faster
Input: Three arguments.
Returns: Something true if the first argument is numerically between the two next. Uses Perls <, >= and <= operators.
<
>=
<=
Like between but instead of assuming numbers it checks all three input args and does alphanumerical comparisons (with Perl operators lt, ge and le) if any of the three input args don't look like a number or look like a number but with one or more leading zeros.
lt
ge
le
btw(1,1,10) #true numeric order since all three looks like number according to =~$Re_isnum btw(1,'02',13) #true leading zero in '02' leads to alphabetical order btw(10, 012,10) #true leading zero here means oct number, 012 = 10 (8*1+2), so 10 is btw 10 and 10 btw('003', '02', '09') #false because '003' lt '02' btw('a', 'b', 'c') #false because 'a' lt 'b' btw('a', 'B', 'c') #true because upper case letters comes before lower case ones in the "ascii alphabet" btw('a', 'c', 'B') #true, btw() and between switches from and to if the first is > the second btw( -1, -2, 1) #true btw( -1, -2, 0) #true
Both between and btw returns undef if any of the three input args are undef (not defined). If you're doing only numerical comparisons, using between is faster than btw.
undef
between
btw
Input: Three arguments: value, minumum, maximum.
Output: Returns the value if its between the given minumum and maximum. Returns minimum if the value is less or maximum if the value is more. Changes the variable if 1st arg is a scalarref.
my $enthusiasm = 11; print curb( $enthusiasm, 1, 20 ); # prints 11, within bounds print curb( $enthusiasm, 1, 10 ); # prints 10 print curb( $enthusiasm, 20, 100 ); # prints 20 print curb(\$enthusiasm, 1, 10 ); # prints 10 and sets $enthusiasm = 10 print $enthusiasm; # prints 10
print log10(1000); # prints 3 print log10(10000*sqtr(10)); # prints 4.5 print log2(16); # prints 4 print logn(4096, 8); # prints 4 (12/3=4) print logn($PI, 2.71828182845905); # same as print log($PI) using perls builtin log()
Returns input string as uppercase or lowercase.
Can be used if Perls build in uc() and lc() for some reason does not convert æøå or other latin1 letters outsize a-z.
uc()
lc()
Converts æøåäëïöüÿâêîôûãõàèìòùáéíóúýñð to and from ÆØÅÄËÏÖÜ?ÂÊÎÔÛÃÕÀÈÌÒÙÁÉÍÓÚÝÑÐ
æøåäëïöüÿâêîôûãõàèìòùáéíóúýñð
ÆØÅÄËÏÖÜ?ÂÊÎÔÛÃÕÀÈÌÒÙÁÉÍÓÚÝÑÐ
See also perldoc -f uc and perldoc -f lc
perldoc -f uc
perldoc -f lc
Removes space from the beginning and end of a string. Whitespace (\s) that is. And removes any whitespace inside the string of more than one char, leaving the first whitespace char. Thus:
\s
trim(" asdf \t\n 123 ") eq "asdf 123" trim(" asdf\t\n 123\n") eq "asdf\t123"
Works on $_ if no argument i given:
print join",", map trim, " please ", " remove ", " my ", " spaces "; # please,remove,my,spaces print join",", trim(" please ", " remove ", " my ", " spaces "); # works on arrays as well my $s=' please '; trim(\$s); # now $s eq 'please' trim(\@untrimmedstrings); # trims array strings inplace @untrimmedstrings = map trim, @untrimmedstrings; # same, works on $_ trim(\$_) for @untrimmedstrings; # same, works on \$_
Left or right pads a string to the given length by adding one or more spaces at the end for rpad or at the start for lpad.
Input: First argument: string to be padded. Second argument: length of the output. Optional third argument: character(s) used to pad. Default is space.
rpad('gomle',9); # 'gomle ' lpad('gomle',9); # ' gomle' rpad('gomle',9,'-'); # 'gomle----' lpad('gomle',9,'+'); # '++++gomle' rpad('gomle',4); # 'goml' lpad('gomle',4); # 'goml' rpad('gomle',7,'xyz'); # 'gomlxy' lpad('gomle',10,'xyz'); # 'xyzxygoml'
Center pads. Pads the string both on left and right equal to the given length. Centers the string. Pads right side first.
cpad('mat',5) eq ' mat ' cpad('mat',4) eq 'mat ' cpad('mat',6) eq ' mat ' cpad('mat',9) eq ' mat ' cpad('mat',5,'+') eq '+mat+' cpad('MMMM',20,'xyzXYZ') eq 'xyzXYZxyMMMMxyzXYZxy'
Input: A string (i.e. a name). And an optional x (see example 2)
Output: A list of this strings trigrams (See examlpe)
Example 1:
print join ", ", trigram("Kjetil Skotheim");
Prints:
Kje, jet, eti, til, il , l S, Sk, Sko, kot, oth, the, hei, eim
Example 2:
Default is 3, but here 4 is used instead in the second optional input argument:
print join ", ", trigram("Kjetil Skotheim", 4);
And this prints:
Kjet, jeti, etil, til , il S, l Sk, Sko, Skot, koth, othe, thei, heim
trigram() was created for "fuzzy" name searching. If you have a database of many names, addresses, phone numbers, customer numbers etc. You can use trigram() to search among all of those at the same time. If the search form only has one input field. One general search box.
trigram()
Store all of the trigrams of the trigram-indexed input fields coupled with each person, and when you search, you take each trigram of you query string and adds the list of people that has that trigram. The search result should then be sorted so that the persons with most hits are listed first. Both the query strings and the indexed database fields should have a space added first and last before trigram()-ing them.
This search algorithm is not includes here yet...
trigram() should perhaps have been named ngram for obvious reasons.
Same as trigram (except there is no default width). Works also with arrayref instead of string.
sliding( ["Reven","rasker","over","isen"], 2 )
Result:
( ['Reven','rasker'], ['rasker','over'], ['over','isen'] )
Splits strings and arrays into chunks of given size:
my @a = chunks("Reven rasker over isen",7); my @b = chunks([qw/Og gubben satt i kveldinga og koste seg med skillinga/], 3);
Resulting arrays:
( 'Reven r', 'asker o', 'ver ise', 'n' ) ( ['Og','gubben','satt'], ['i','kveldinga','og'], ['koste','seg','med'], ['skillinga'] )
chars("Tittentei"); # ('T','i','t','t','e','n','t','e','i')
Synonym for replace().
Return the string in the first input argument, but where pairs of search-replace strings (or rather regexes) has been run.
Works as replace() in Oracle, or rather regexp_replace() in Oracle 10 and onward. Except that this replace() accepts more than three arguments.
replace()
print replace("water","ater","ine"); # Turns water into wine print replace("water","ater"); # w print replace("water","at","eath"); # weather print replace("water","wa","ju", "te","ic", "x","y", # No x is found, no y is returned 'r$',"e"); # Turns water into juice. 'r$' says that the r it wants # to change should be the last letters. This reveals that # second, fourth, sixth and so on argument is really regexs, # not normal strings. So use \ (or \\ inside "") to protect # the special characters of regexes. You probably also # should write qr/regexp/ instead of 'regexp' if you make # use of regexps here, just to make it more clear that # these are really regexps, not strings. print replace('JACK and JUE','J','BL'); # prints BLACK and BLUE print replace('JACK and JUE','J'); # prints ACK and UE print replace("abc","a","b","b","c"); # prints ccc (not bcc)
If the first argument is a reference to a scalar variable, that variable is changed "in place".
my $str="test"; replace(\$str,'e','ee','s','S'); print $str; # prints teeSt
The equivalent of substr on arrays or splice without changing the array. Input: 1) array or arrayref, 2) offset and optionally 3) length. Without a third argument, subarr returns the rest of the array.
substr
splice
@top10 = subarr( @array, 0, 10); # first 10 @last_two = subarr( @array, -2, 2); # last 2 @last_two = subarr( $array_ref, -2); # also last 2 @last_six = subarr $array_ref, -6; # parens are optional
The same can be obtained from @array[$from..$to] but that dont work the same way with negative offsets and boundary control of length.
@array[$from..$to]
Returns the smallest number in a list. Undef is ignored.
@lengths=(2,3,5,2,10,undef,5,4); $shortest = min(@lengths); # returns 2
Note: The comparison operator is perls <> which means empty strings is treated as 0, the number zero. The same goes for max(), except of course > is used instead.
0
max()
>
min(3,4,5) # 3 min(3,4,5,undef) # 3 min(3,4,5,'') # returns the empty string
Returns the largest number in a list. Undef is ignored.
@heights=(123,90,134,undef,132); $highest = max(@heights); # 134
Just as "min", except for strings.
print min(2,7,10); # 2 print mins("2","7","10"); # 10 print mins(2,7,10); # 10
Just as "mix", except for strings.
print max(2,7,10); # 10 print maxs("2","7","10"); # 7 print maxs(2,7,10); # 7
Input: Two or more arrayrefs. A number of equal sized arrays containing numbers, strings or anything really.
Output: An array of those input arrays zipped (interlocked, merged) into each other.
print join " ", zip( [1,3,5], [2,4,6] ); # 1 2 3 4 5 6 print join " ", zip( [1,4,7], [2,5,8], [3,6,9] ); # 1 2 3 4 5 6 7 8 9
zip() creates a hash where the keys are found in the first array and values in the secord in the correct order:
my @media = qw/CD DVD VHS LP Blueray/; my @count = qw/20 12 2 4 3/; my %count = zip(\@media,\@count); # or zip( [@media], [@count] ) print "I got $count{DVD} DVDs\n"; # I got 12 DVDs
Dies (croaks) if the two lists are of different sizes
...or any input argument is not an array ref.
Input: Two or more strings
Output: A number 0 - 1 indicating the similarity between two strings.
Requires String::Similarity where the real magic happens.
sim("Donald Duck", "Donald E. Knuth"); # returns 0.615 sim("Kalle Anka", "Kalle And")' # returns 0.842 sim("Kalle Anka", "Kalle Anka"); # returns 1 sim("Kalle Anka", "kalle anka"); # returns 0.8 sim(map lc, "Kalle Anka", "kalle anka"); # returns 1
Todo: more doc
Input: Two strings
Output: A number 0 - 1 indicating the maximum similarity between two strings tested against all permutations of both strings split on [\s,]+ and where the string with most words (i.e. names) are cut to as many words as the one with least words.
[\s,]+
While sim() is case sensitive, sim_perm() is not.
Name1 Name2 sim() sim_perm() ---------------------------------- ------------------------------------- ----- ---------- Humphrey DeForest Bogart Bogart Humphrey DeForest 0.71 1.00 Humphrey Bogart Humphrey Gump Bogart 0.86 1.00 Humphrey deforest Bogart Bogart DeForest 0.41 1.00 Humfrey DeForest Boghart BOGART HUMPHREY 0.05 0.87 Humphrey Bogart Humphrey 0.70 1.00 Humfrey Deforest Boghart BOGART D. HUMFREY 0.15 0.78 *) Presley, Elvis Aaron Elvis Presley 0.424242 1.00
sim_perm() was written to identify double-profiles in databases: two people with either the same (or similar) email or phone number or zip code and similar enough names are going on the list of probable doubles.
*) Todo: deal with initials better, should be higher than 0.78
Adds one or more element to a numerically sorted array and keeps it sorted.
pushsort @a, 13; # this... push @a, 13; @a = sort {$a<=>$b} @a; # is the same as this, but the former is faster if @a is large
Same as pushsort except that the array is kept sorted alphanumerically (cmp) instead of numerically (<=>). See "pushsort".
pushsort @a, "abc"; # this... push @a, "abc"; @a = sort @a; # is the same as this, but the former is faster if @a is large
Returns the position of an element in a numerically sorted array. Returns undef if the element is not found.
Input: Two, three or four arguments
First argument: the element to find. Usually a number.
Second argument: a reference to the array to search in. The array should be sorted in ascending numerical order (se exceptions below).
Third argument: Optional. Default false.
If true, whether result not found should return undef or a fractional position.
If the third argument is false binsearch returns undef if the element is not found.
If the third argument is true binsearch returns 0.5 plus closest position below the searched value.
Returns last position + 0.5 if the searched element is greater than all elements in the sorted array.
last position + 0.5
Returns -0.5 if the searched element is less than all elements in the sorted array.
-0.5
Fourth argument: Optional. Default sub { $_[0] <=> $_[1] }.
sub { $_[0] <=> $_[1] }
If present, the fourth argument is either:
a code-ref that alters the way binsearch compares two elements, default is sub{$_[0]<=>$_[1]}
sub{$_[0]<=>$_[1]}
a string that works as a hash key (column name), see example below
binsearch(10,[5,10,15,20]); # 1 binsearch(10,[20,15,10,5],undef,sub{$_[1]<=>$_[0]}); # 2 search arrays sorted numerically in opposite order binsearch("c",["a","b","c","d"],undef,sub{$_[0]cmp$_[1]}); # 2 search arrays sorted alphanumerically binsearchstr("b",["a","b","c","d"]); # 1 search arrays sorted alphanumerically my @data=( map { {num=>$_, sqrt=>sqrt($_), square=>$_**2} } grep !$_%7, 1..1000000 ); my $i = binsearch( {num=>913374}, \@data, undef, sub {$_[0]{num} <=> $_[1]{num}} ); my $i = binsearch( {num=>913374}, \@data, undef, 'num' ); #same as previous line my $found_hashref = defined $i ? $data[$i] : undef;
Same as binsearch except that the arrays is sorted alphanumerically (cmp) instead of numerically (<=>) and the searched element is a string, not a number. See "binsearch".
Input: Two or three arguments. N and an arrayref for the list to look at.
In scalar context: Returns the nth smallest number in an array. The array doesn't have to be sorted.
In array context: Returns the n smallest numbers in an array.
To return the n(th) largest number(s) instead of smallest, just negate n.
An optional third argument can be a sub that is used to compare the elements of the input array.
my $second_smallest = rank(2, [11,12,13,14]); # 12 my @top10 = rank(-10, [1..100]); # 100, 99, 98, 97, 96, 95, 94, 93, 92, 91 my $max = rank(-1, [101,102,103,102,101]); #103 my @contest = ({name=>"Alice",score=>14},{name=>"Bob",score=>13},{name=>"Eve",score=>12}); my $second = rank(2, \@contest, sub{$_[1]{score}<=>$_[0]{score}})->{name}; #Bob
Just as rank but sorts alphanumerically (strings, cmp) instead of numerically.
rank
Extended grep.
Works like grep but with more insight: local vars $i, $n, $prev, $next, $prevr and $nextr are available:
$i is the current index, starts with 0, ends with the length of the input array minus one
$n is the current element number, starts with 1, $n = $i + 1
$prev is the previous value (undef if current is first)
$next is the next value (undef if current is last)
$prevr is the previous value, rotated so that the previous of the first element is the last element
$nextr is the next value, rotated so that the next of the last element is the first element
$_ is the current value, just as with Perls built-in grep
my @a = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20); # 1..20 my @r = egrep { $_ % 3 == 0 } @a; # @r is 3, 6, 9, 12, 15, 18. Plain grep could have been used here my @r = egrep { $i==1 or $next==12 or $prev==14 } @a; # @r is now 2, 11, 15 my @a=2..44; egrep { $prev =~/4$/ or $next =~/2$/ } @a; # 5, 11, 15, 21, 25, 31, 35, 41 egrep { $prevr=~/4$/ or $nextr=~/2$/ } @a; # 2, 5, 11, 15, 21, 25, 31, 35, 41, 44 egrep { $i%7==0 } @a; # 2, 9, 16, 23, 30, 37, 44 egrep { $n%7==0 } @a; # 8, 15, 22, 29, 36, 43
Input: Two or more references to arrays.
Output: True (1) or false (0) for whether or not the arrays are numerically and alphanumerically equal. Comparing each element in each array with both == and eq .
==
eq
eqarr([1,2,3],[1,2,3],[1,2,3]); # 1 (true) eqarr([1,2,3],[1,2,3],[1,2,4]); # 0 (false) eqarr([1,2,3],[1,2,3,4]); # undef (different size, false) eqarr([1,2,3]); # croak (should be two or more arrays) eqarr([1,2,3],1,2,3); # croak (not arraysrefs)
Return true if the input array is numerically sorted.
@a=(1..10); print "array is sorted" if sorted @a; #true
Optionally the last argument can be a comparison sub:
@person=({Rank=>1,Name=>'Amy'}, {Rank=>2,Name=>'Paula'}, {Rank=>3,Name=>'Ruth'}); print "Persons are sorted" if sorted @person, sub{$_[0]{Rank}<=>$_[1]{Rank}};
Return true if the input array is alphanumerically sorted.
@a=(1..10); print "array is sorted" if sortedstr @a; #false @a=("01".."10"); print "array is sorted" if sortedstr @a; #true
Returns all 2^n-1 combinatory subarrays of an array where each element of input array participates or not. Note: The empty array is not among the returned arrayrefs unless an empty input is given.
my @a = subarrays( 'a', 'b', 'c' ); # same as: my @a = ( ['a' ], [ 'b'], ['a','b'], [ 'c'], ['a', 'c'], [ 'b','c'], ['a','b','c'] ); sub subarrays { map { my $n = 2*$_; [ grep {($n/=2)%2} @_ ] } 1 .. 2**@_-1 } #implemented as
Input: A code-ref and a list
Output: Two array-refs
Like grep but returns the false list as well. Partitions a list into two lists where each element goes into the first or second list whether the predicate (a code-ref) is true or false for that element.
grep
my( $odd, $even ) = part {$_%2} (1..8); print for @$odd; #prints 1 3 5 7 print for @$even; #prints 2 4 6 8
(Works like partition() in the Scala programming language)
partition()
Like part but returns any number of lists. Not just two. Sort of like group by in SQL.
part
Output: A hash where the returned values from the code-ref are keys and the values are arrayrefs to the list elements which gave those keys.
my %hash = parth { uc(substr($_,0,1)) } ('These','are','the','words','of','this','array'); print serialize(\%hash);
%hash = ( T=>['These','the','this'], A=>['are','array'], O=>['of'], W=>['words'] )
Like parth but returns an array of lists where the predicate returns an index number.
my @a = parta { length } qw/These are the words of this array/;
@a = ( undef, undef, ['of'], ['are','the'], ['this'], ['These','words','array'] )
Two undefs at first (index positions 0 and 1) since there are no words of length 0 or 1 in the input array.
Returns true or false (1 or 0) if the argument is an arrayref, hashref, scalarref, ref to an array of arrays, ref to an array of hashes
my $ref_to_array = [1,2,3]; my $ref_to_hash = {1,100,2,200,3,300}; my $ref_to_scalar = \"String"; print "arrayref" if ref($ref_to_array) eq 'ARRAY'; #true print "hashref" if ref($ref_to_hash) eq 'HASH'; #true print "scalarref" if ref($ref_to_scalar) eq 'SCALAR'; #true print "arrayref" if refa($ref_to_array); #also true, without: eq 'ARRAY' print "hashref" if refh($ref_to_hash); #also true, without: eq 'HASH' print "scalarref" if refs($ref_to_scalar); #also true, without: eq 'SCALAR' my $ref_to_array_of_arrays = [ [1,2,3], [2,4,8], [10,100,1000] ]; my $ref_to_array_of_hashes = [ {1=>10, 2=>100}, {first=>1, second=>2} ]; my $ref_to_hash_of_arrays = { alice=>[1,2,3], bob=>[2,4,8], eve=>[10,100,1000] }; my $ref_to_hash_of_hashes = { alice=>{a=>22,b=>11}, bob=>{a=>33,b=>66} }; print "aa" if refaa($ref_to_array_of_arrays); #true print "ah" if refah($ref_to_array_of_hashes); #true print "ha" if refha($ref_to_hash_of_arrays); #true print "hh" if refhh($ref_to_hash_of_hashes); #true
In Perl versions 5.12 - 5.22 push, pop, shift, unshift, splice, keys, values and each handled references to arrays and references to hashes just as if they where arrays and hashes. Examples:
my $person={name=>'Gaga', array=>[1,2,3]}; push $person{array} , 4; #works in perl 5.12-5.22 but not before and after push @{ $person{array} }, 4; #works in all perl5 versions pushr $person{array} , 4; #use Acme::Tools and this should work in perl >= 5.8 popr $person{array}; #returns 4
Input: a pile size s and a list
Output: A list of lists of length s or the length of the remainer in the last list. Piles together the input list in lists of the given size.
my @list=(1,2,3,4,5,6,7,8,9,10); my @piles = pile(3, @list ); # ([1,2,3], [4,5,6], [7,8,9], [10]) my $i=0; my @piles = parta {$i++/3} @list; # same as above pile(3, @list)
my @oceania=( {Area=>undef, Capital=>'Pago Pago', Code=>'AS', Name=>'American Samoa', Population=>54343}, {Area=>7686850, Capital=>'Canberra', Code=>'AU', Name=>'Australia', Population=>22751014}, {Area=>undef, Capital=>'West Island', Code=>'CC', Name=>'Cocos (Keeling) Islands', Population=>596}, {Area=>240, Capital=>'Avarua', Code=>'CK', Name=>'Cook Islands', Population=>9838}, {Area=>undef, Capital=>'Flying Fish Cove', Code=>'CX', Name=>'Christmas Island', Population=>1530}, {Area=>18270, Capital=>'Suva', Code=>'FJ', Name=>'Fiji', Population=>909389}, {Area=>702, Capital=>'Palikir', Code=>'FM', Name=>'Micronesia, Federated States of', Population=>105216}, {Area=>549, Capital=>'Hagatna (Agana)', Code=>'GU', Name=>'Guam', Population=>161785}, {Area=>811, Capital=>'Tarawa', Code=>'KI', Name=>'Kiribati', Population=>105711}, {Area=>181.3, Capital=>'Majuro', Code=>'MH', Name=>'Marshall Islands', Population=>72191}, {Area=>19060, Capital=>'Noumea', Code=>'NC', Name=>'New Caledonia', Population=>271615}, {Area=>undef, Capital=>'Kingston', Code=>'NF', Name=>'Norfolk Island', Population=>2210}, {Area=>21, Capital=>'Yaren District', Code=>'NR', Name=>'Nauru', Population=>9540}, {Area=>260, Capital=>'Alofi', Code=>'NU', Name=>'Niue', Population=>1190}, {Area=>268680, Capital=>'Wellington', Code=>'NZ', Name=>'New Zealand', Population=>4438393}, {Area=>undef, Capital=>'Papeete', Code=>'PF', Name=>'French Polynesia', Population=>282703}, {Area=>462840, Capital=>'Port Moresby', Code=>'PG', Name=>'Papua New Guinea', Population=>6672429}, {Area=>undef, Capital=>'Adamstown', Code=>'PN', Name=>'Pitcairn', Population=>48}, {Area=>458, Capital=>'Melekeok', Code=>'PW', Name=>'Palau', Population=>21265}, {Area=>28450, Capital=>'Honiara', Code=>'SB', Name=>'Solomon Islands', Population=>622469}, {Area=>undef, Capital=>undef, Code=>'TK', Name=>'Tokelau', Population=>1337}, {Area=>26, Capital=>'Funafuti', Code=>'TV', Name=>'Tuvalu', Population=>10869}, {Area=>12200, Capital=>'Port-Vila', Code=>'VU', Name=>'Vanuatu', Population=>272264}, {Area=>undef, Capital=>'Mata-Utu', Code=>'WF', Name=>'Wallis and Futuna', Population=>15500}, {Area=>2944, Capital=>'Apia', Code=>'WS', Name=>'Samoa (Western)', Population=>197773} ); print aoh2sql(\@oceania,{ name=>'country', drop=>2, #number=>'numeric', #default #varchar=>'varchar', #default, change to varchar2 if Oracle #date=>'date', #default, perhaps change to 'timestamp with time zone' if postgres #varchar_maxlen=>4000, #default, 4000 (used to be?) is max in Oracle #create=>1, #default, use 0 to dont include create table #drop=>0, #default 0: dont include drop table x; 1: drop table x; 2: drop table if exists x; #end=>"commit;\n", #begin=>"begin;\n", #fix_colnames=>0, });
Returns:
begin; drop table if exists country; create table country ( Area numeric(9,1), Capital varchar(16), Code varchar(2) not null, Name varchar(36) not null, Population numeric(9) ); insert into country values (null,'Pago Pago','AS','American Samoa',54343); insert into country values (7686850,'Canberra','AU','Australia',22751014); insert into country values (null,'West Island','CC','Cocos (Keeling) Islands',596); insert into country values (240,'Avarua','CK','Cook Islands',9838); insert into country values (null,'Flying Fish Cove','CX','Christmas Island',1530); insert into country values (18270,'Suva','FJ','Fiji',909389); insert into country values (702,'Palikir','FM','Micronesia, Federated States of',105216); insert into country values (549,'Hagatna (Agana)','GU','Guam',161785); insert into country values (811,'Tarawa','KI','Kiribati',105711); insert into country values (181.3,'Majuro','MH','Marshall Islands',72191); insert into country values (19060,'Noumea','NC','New Caledonia',271615); insert into country values (null,'Kingston','NF','Norfolk Island',2210); insert into country values (21,'Yaren District','NR','Nauru',9540); insert into country values (260,'Alofi','NU','Niue',1190); insert into country values (268680,'Wellington','NZ','New Zealand',4438393); insert into country values (null,'Papeete','PF','French Polynesia',282703); insert into country values (462840,'Port Moresby','PG','Papua New Guinea',6672429); insert into country values (null,'Adamstown','PN','Pitcairn',48); insert into country values (458,'Melekeok','PW','Palau',21265); insert into country values (28450,'Honiara','SB','Solomon Islands',622469); insert into country values (null,null,'TK','Tokelau',1337); insert into country values (26,'Funafuti','TV','Tuvalu',10869); insert into country values (12200,'Port-Vila','VU','Vanuatu',272264); insert into country values (null,'Mata-Utu','WF','Wallis and Futuna',15500); insert into country values (2944,'Apia','WS','Samoa (Western)',197773); commit;
Returns the sum of a list of numbers. Undef is ignored.
print sum(1,3,undef,8); # 12 print sum(1..1000); # 500500 print sum(undef); # undef
Returns the average number of a list of numbers. That is sum / count
sum / count
print avg( 2, 4, 9); # 5 (2+4+9) / 3 = 5 print avg( [2, 4, 9] ); # 5 pass by reference, same result but faster for large arrays
Also known as arithmetic mean.
Pass by reference: If one argument is given and it is a reference to an array, this array is taken as the list of numbers. This mode is about twice as fast for 10000 numbers or more. It most likely also saves memory.
Returns the geometric average (a.k.a geometric mean) of a list of numbers.
print geomavg(10,100,1000,10000,100000); # 1000 print 0+ (10*100*1000*10000*100000) ** (1/5); # 1000 same thing print exp(avg(map log($_),10,100,1000,10000,100000)); # 1000 same thing, this is how geomavg() works internally
Returns the harmonic average (a.k.a geometric mean) of a list of numbers. http://en.wikipedia.org/wiki/Harmonic_mean
print harmonicavg(10,11,12); # 3 / ( 1/10 + 1/11 + 1/12) = 10.939226519337
variance = ( sum (x[i]-Average)**2)/(n-1)
Standard_Deviation = sqrt(variance)
Standard deviation (stddev) is a measurement of the width of a normal distribution where one stddev on each side of the mean covers 68% and two stddevs 95%. Normal distributions are sometimes called Gauss curves or Bell shapes. https://en.wikipedia.org/wiki/Standard_deviation
stddev(4,5,6,5,6,4,3,5,5,6,7,6,5,7,5,6,4) # = 1.0914103126635 avg(@testscores) + stddev(@testscores) # = the score for one stddev above avg, 115 avg(@testscores) - stddev(@testscores) # = the score for one stddev below avg, 85
Relative stddev = stddev / avg
Returns the median value of a list of numbers. The list do not have to be sorted.
Example 1, list having an odd number of numbers:
print median(1, 100, 101); # 100
100 is the middlemost number after sorting.
Example 2, an even number of numbers:
print median(1005, 100, 101, 99); # 100.5
100.5 is the average of the two middlemost numbers.
Returns one or more percentiles of a list of numbers.
Percentile 50 is the same as the median, percentile 25 is the first quartile, 75 is the third quartile.
Input:
First argument is your wanted percentile, or a refrence to a list of percentiles you want from the dataset.
If the first argument to percentile() is a scalar, this percentile is returned.
If the first argument is a reference to an array, then all those percentiles are returned as an array.
Second, third, fourth and so on argument are the numbers from which you want to find the percentile(s).
This finds the 50-percentile (the median) to the four numbers 1, 2, 3 and 4:
print "Median = " . percentile(50, 1,2,3,4); # 2.5
This:
@data=(11, 5, 3, 5, 7, 3, 1, 17, 4, 2, 6, 4, 12, 9, 0, 5); @p = map percentile($_,@data), (25, 50, 75);
Is the same as this:
@p = percentile([25, 50, 75], @data);
But the latter is faster, especially if @data is large since it sorts the numbers only once internally.
Data: 1, 4, 6, 7, 8, 9, 22, 24, 39, 49, 555, 992
Average (or mean) is 143
Median is 15.5 (which is the average of 9 and 22 who both equally lays in the middle)
The 25-percentile is 6.25 which are between 6 and 7, but closer to 6.
The 75-percentile is 46.5, which are between 39 and 49 but close to 49.
Linear interpolation is used to find the 25- and 75-percentile and any other x-percentile which doesn't fall exactly on one of the numbers in the set.
Interpolation:
As you saw, 6.25 are closer to 6 than to 7 because 25% along the set of the twelve numbers is closer to the third number (6) than to he fourth (7). The median (50-percentile) is also really interpolated, but it is always in the middle of the two center numbers if there are an even count of numbers.
However, there is two methods of interpolation:
Example, we have only three numbers: 5, 6 and 7.
Method 1: The most common is to say that 5 and 7 lays on the 25- and 75-percentile. This method is used in Acme::Tools.
Method 2: In Oracle databases the least and greatest numbers always lay on the 0- and 100-percentile.
As an argument on why Oracles (and others?) definition is not the best way is to look at your data as for instance temperature measurements. If you place the highest temperature on the 100-percentile you are sort of saying that there can never be a higher temperatures in future measurements.
A quick non-exhaustive Google survey suggests that method 1 here is most used.
The larger the data sets, the less difference there is between the two methods.
Extrapolation:
In method one, when you want a percentile outside of any possible interpolation, you use the smallest and second smallest to extrapolate from. For instance in the data set 5, 6, 7, if you want an x-percentile of x < 25, this is below 5.
5, 6, 7
If you feel tempted to go below 0 or above 100, percentile() will die (or croak to be more precise)
percentile()
Another method could be to use "soft curves" instead of "straight lines" in interpolation. Maybe B-splines or Bezier curves. This is not used here.
For large sets of data Hoares algorithm would be faster than the simple straightforward implementation used in percentile() here. Hoares don't sort all the numbers fully.
Differences between the two main methods described above:
Data: 1, 4, 6, 7, 8, 9, 22, 24, 39, 49, 555, 992 Percentile Method 1 Method 2 (Acme::Tools::percentile (Oracle) and others) ------------- ----------------------------- --------- 0 -2 1 1 -1.61 1.33 25 6.25 6.75 50 (median) 15.5 15.5 75 46.5 41.5 99 1372.19 943.93 100 1429 992
Found like this:
perl -MAcme::Tools -le 'print for percentile([0,1,25,50,75,99,100], 1,4,6,7,8,9,22,24,39,49,555,992)'
And like this in Oracle-databases:
select percentile_cont(0.00) within group(order by n) per0, percentile_cont(0.01) within group(order by n) per1, percentile_cont(0.25) within group(order by n) per25, percentile_cont(0.50) within group(order by n) per50, percentile_cont(0.75) within group(order by n) per75, percentile_cont(0.99) within group(order by n) per99, percentile_cont(1.00) within group(order by n) per100 from ( select 0+regexp_substr('1,4,6,7,8,9,22,24,39,49,555,992','[^,]+',1,i) n from dual,(select level i from dual connect by level <= 12) );
(Oracle also provides a similar function: percentile_disc where disc is short for discrete, meaning no interpolation is taking place. Instead the closest number from the data set is picked.)
percentile_disc
Input: One or two arguments.
If two integer arguments: returns a random integer between the integers in argument one and two.
If the first argument is an arrayref: returns a random member of that array without changing the array.
If the first argument is an arrayref and there is a second arg: return that many random members of that array
If the first argument is an hashref and there is no second arg: return a random key weighted by the values of that hash
If the first argument is an hashref and there is a second arg: return that many random keys weighted by the values of that hash
If there is no second argument and the first is an integer, a random integer between 0 and that number is returned. Including 0 and the number itself.
$dice=random(1,6); # 1, 2, 3, 4, 5 or 6 $dice=random([1..6]); # same as previous @dice=random([1..6],10); # 10 dice tosses $dice=random({1=>1, 2=>1, 3=>1, 4=>1, 5=>1, 6=>2}); # weighted dice with 6 being twice as likely as the others @dice=random({1=>1, 2=>1, 3=>1, 4=>1, 5=>1, 6=>2},10); # 10 weighted dice tosses print random({head=>0.4999,tail=>0.4999,edge=>0.0002}); # coin toss (sum 1 here but not required to be) print random(2); # prints 0, 1 or 2 print 2**random(7); # prints 1, 2, 4, 8, 16, 32, 64 or 128 @dice=map random([1..6]), 1..10; # as third example above, but much slower perl -MAcme::Tools -le 'print for random({head=>0.499,tail=>0.499,edge=>0.002},10000);' | sort | uniq -c
Returns an pseudo-random number with a Gaussian distribution instead of the uniform distribution of perls rand() or random() in this module. The algorithm is a variation of the one at http://www.taygeta.com/random/gaussian.html which is both faster and better than adding a long series of rand().
rand()
random()
Uses perls rand function internally.
rand
Input: 0 - 3 arguments.
First argument: the average of the distribution. Default 0.
Second argument: the standard deviation of the distribution. Default 1.
Third argument: If a third argument is present, random_gauss returns an array of that many pseudo-random numbers. If there is no third argument, a number (a scalar) is returned.
random_gauss
Output: One or more pseudo-random numbers with a Gaussian distribution. Also known as a Bell curve or Normal distribution.
my @I=random_gauss(100, 15, 100000); # produces 100000 pseudo-random numbers, average=100, stddev=15 #my @I=map random_gauss(100, 15), 1..100000; # same but more than three times slower print "Average is: ".avg(@I)."\n"; # prints a number close to 100 print "Stddev is: ".stddev(@I)."\n"; # prints a number close to 15 my @M=grep $_>100+15*2, @I; # those above 130 print "Percent above two stddevs: ".(100*@M/@I)."%\n"; #prints a number close to 2.2%
my $num=1e6; my @h; $h[$_/2]++ for random_gauss(100,15, $num); $h[$_] and printf "%3d - %3d %6d %s\n", $_*2,$_*2+1,$h[$_],'=' x ($h[$_]*1000/$num) for 1..200/2;
...prints an example of the famous Bell curve:
44 - 45 70 46 - 47 114 48 - 49 168 50 - 51 250 52 - 53 395 54 - 55 588 56 - 57 871 58 - 59 1238 = 60 - 61 1807 = 62 - 63 2553 == 64 - 65 3528 === 66 - 67 4797 ==== 68 - 69 6490 ====== 70 - 71 8202 ======== 72 - 73 10577 ========== 74 - 75 13319 ============= 76 - 77 16283 ================ 78 - 79 20076 ==================== 80 - 81 23742 ======================= 82 - 83 27726 =========================== 84 - 85 32205 ================================ 86 - 87 36577 ==================================== 88 - 89 40684 ======================================== 90 - 91 44515 ============================================ 92 - 93 47575 =============================================== 94 - 95 50098 ================================================== 96 - 97 52062 ==================================================== 98 - 99 53338 ===================================================== 100 - 101 52834 ==================================================== 102 - 103 52185 ==================================================== 104 - 105 50472 ================================================== 106 - 107 47551 =============================================== 108 - 109 44471 ============================================ 110 - 111 40704 ======================================== 112 - 113 36642 ==================================== 114 - 115 32171 ================================ 116 - 117 28166 ============================ 118 - 119 23618 ======================= 120 - 121 19873 =================== 122 - 123 16360 ================ 124 - 125 13452 ============= 126 - 127 10575 ========== 128 - 129 8283 ======== 130 - 131 6224 ====== 132 - 133 4661 ==== 134 - 135 3527 === 136 - 137 2516 == 138 - 139 1833 = 140 - 141 1327 = 142 - 143 860 144 - 145 604 146 - 147 428 148 - 149 275 150 - 151 184 152 - 153 111 154 - 155 67
Mixes an array in random order. In-place if given an array reference or not if given an array.
mix() could also have been named shuffle(), as in shuffling a deck of cards.
mix()
shuffle()
print mix("a".."z"),"\n" for 1..3;
...could write something like:
trgoykzfqsduphlbcmxejivnwa qycatilmpgxbhrdezfwsovujkn ytogrjialbewcpvndhkxfzqsmu
This: @a=mix(@a) is the same as: mix(\@a).
@a=mix(@a)
mix(\@a)
Note that an input-array which COINCIDENTLY SOME TIMES has one element (but more other times), and that element is an array-ref, you will probably not get the expected result.
To check distribution:
perl -MAcme::Tools -le 'print mix("a".."z") for 1..26000'|cut -c1|sort|uniq -c|sort -n
The letters a-z should occur around 1000 times each.
Shuffles a deck of cards: (s=spaces, h=hearts, c=clubs, d=diamonds)
perl -MAcme::Tools -le '@cards=map join("",@$_),cart([qw/s h c d/],[2..10,qw/J Q K A/]); print join " ",mix(@cards)'
(Uses "cart", which is not a typo, see further down here)
Note: List::Util::shuffle() is approximately four times faster. Both respects the Perl built-in srand().
List::Util::shuffle()
srand()
Generates random passwords.
Input: 0-n args
* First arg: length of password(s), default 8
* Second arg: number of passwords, default 1
* Third arg: string containing legal chars in password, default A-Za-z0-9,-./&%_!
* Fourth to n'th arg: list of requirements for passwords, default if the third arg is false/undef (so default third arg is used) is:
sub{/^[a-zA-Z0-9].*[a-zA-Z0-9]$/ and /[a-z]/ and /[A-Z]/ and /\d/ and /[,-.\/&%_!]/}
...meaning the password should: * start and end with: a letter a-z (lower- or uppercase) or a digit 0-9 * should contain at least one char from each of the groups lower, upper, digit and special char
To keep the default requirement-sub but add additional ones just set the fourth arg to false/undef and add your own requirements in the fifth arg and forward (examples below). Sub pwgen uses perls own rand() internally.
$Acme::Tools::Pwgen_max_sec and $Acme::Tools::Pwgen_max_trials can be set to adjust for how long pwgen tries to find a password. Defaults for those are 0.01 and 10000. Whenever one of the two limits is reached, a first generates a croak.
$Acme::Tools::Pwgen_max_sec
$Acme::Tools::Pwgen_max_trials
my $pw=pwgen(); # a random 8 chars password A-Z a-z 0-9 ,-./&%!_ (8 is default length) my $pw=pwgen(12); # a random 12 chars password A-Z a-z 0-9 ,-./&%!_ my @pw=pwgen(0,10); # 10 random 8 chars passwords, containing the same possible chars my @pw=pwgen(0,1000,'A-Z'); # 1000 random 8 chars passwords containing just uppercase letters from A to Z pwgen(3); # dies, defaults require chars in each of 4 group (see above) pwgen(5,1,'A-C0-9', qr/^\D{3}\d{2}$/); # a 5 char string starting with three A, B or Cs and endring with two digits pwgen(5,1,'ABC0-9',sub{/^\D{3}\d{2}$/}); # same as above
Examples of adding additional requirements to the default ones:
my @pwreq = ( qr/^[A-C]/ ); pwgen(8,1,'','',@pwreq); # use defaults for allowed chars and the standard requirements # but also demand that the password must start with A, B or C push @pwreq, sub{ not /[a-z]{3}/i }; pwgen(8,1,'','',@pwreq); # as above and in addition the password should not contain three # or more consecutive letters (to avoid "offensive" words perhaps)
Returns the values of the input list, sorted alfanumerically, but only one of each value. This is the same as "uniq" except uniq does not sort the returned list.
print join(", ", distinct(4,9,3,4,"abc",3,"abc")); # 3, 4, 9, abc print join(", ", distinct(4,9,30,4,"abc",30,"abc")); # 30, 4, 9, abc note: alphanumeric sort
Returns 1 (true) if first argument is in the list of the remaining arguments. Uses the perl-operator eq.
Otherwise it returns 0 (false).
print in( 5, 1,2,3,4,6); # 0 print in( 4, 1,2,3,4,6); # 1 print in( 'a', 'A','B','C','aa'); # 0 print in( 'a', 'A','B','C','a'); # 1
I guess in perl 5.10 or perl 6 you could use the ~~ operator instead.
~~
Just as sub "in", but for numbers. Internally uses the perl operator == instead of eq .
print in(5000, '5e3'); # 0 print in(5000, 5e3); # 1 since 5e3 is converted to 5000 before the call print in_num(5000, 5e3); # 1 print in_num(5000, '+5.0e03'); # 1
Input: Two arrayrefs. (Two lists, that is)
Output: An array containing all elements from both input lists, but no element more than once even if it occurs twice or more in the input.
Example, prints 1,2,3,4:
perl -MAcme::Tools -le 'print join ",", union([1,2,3],[2,3,3,4,4])' # 1,2,3,4
Input: Two arrayrefs.
Output: An array containing all elements in the first input array but not in the second.
perl -MAcme::Tools -le 'print join " ", minus( ["five", "FIVE", 1, 2, 3.0, 4], [4, 3, "FIVE"] )'
Output is five 1 2.
five 1 2
Input: Two arrayrefs
Output: An array containing all elements which exists in both input arrays.
perl -MAcme::Tools -le 'print join" ", intersect( ["five", 1, 2, 3.0, 4], [4, 2+1, "five"] )' # 4 3 five
Output: 4 3 five
4 3 five
Output: An array containing all elements member of just one of the input arrays (not both).
perl -MAcme::Tools -le ' print join " ", not_intersect( ["five", 1, 2, 3.0, 4], [4, 2+1, "five"] )'
The output is 1 2.
1 2
Input: An array of strings (or numbers)
Output: The same array in the same order, except elements which exists earlier in the list.
Same as "distinct" but distinct sorts the returned list, uniq does not.
my @t=(7,2,3,3,4,2,1,4,5,3,"x","xx","x",02,"07"); print join " ", uniq @t; # prints 7 2 3 4 1 5 x xx 07
Beware of using sort like the following because sort will see uniq as the subroutine for comparing elements! Which you most likely didnt mean. This has nothing to do with the way uniq is implemented. It's Perl's sort.
sort
uniq
print sort uniq('a','dup','z','dup'); # will return this four element array: a dup z dup print sort(uniq('a','dup','z','dup')); # better, probably what you meant print distinct('a','dup','z','dup')); # same, distinct includes alphanumeric sort
Copies a subset of keys/values from one hash to another.
Input: First argument is a reference to a hash. The rest of the arguments are a list of the keys of which key/value-pair you want to be copied.
Output: The hash consisting of the keys and values you specified.
%population = ( Norway=>5000000, Sweden=>9500000, Finland=>5400000, Denmark=>5600000, Iceland=>320000, India => 1.21e9, China=>1.35e9, USA=>313e6, UK=>62e6 ); %scandinavia = subhash( \%population , 'Norway', 'Sweden', 'Denmark' ); # this and %scandinavia = (Norway=>5000000,Sweden=>9500000,Denmark=>5600000); # this is the same print "Population of $_ is $scandinavia{$_}\n" for keys %scandinavia;
...prints the populations of the three scandinavian countries.
Note: The values are NOT deep copied when they are references. (Use Storable::dclone() to do that).
Storable::dclone()
Note2: For perl versions >= 5.20 subhashes (hash slices returning keys as well as values) is built in like this:
%scandinavia = %population{'Norway','Sweden','Denmark'};
Input: a reference to a hash of hashes
Output: a hash like the input-hash, but matrix transposed (kind of). Think of it as if X and Y has swapped places.
%h = ( 1 => {a=>33,b=>55}, 2 => {a=>11,b=>22}, 3 => {a=>88,b=>99} ); print serialize({hashtrans(\%h)},'v');
Gives:
%v=( 'a'=>{'1'=>'33','2'=>'11','3'=>'88'}, 'b'=>{'1'=>'55','2'=>'22','3'=>'99'} );
Input: array of arrays
Output: array of hashes
Transforms an array of arrays (arrayrefs) to an array of hashes (hashrefs).
my @h = a2h( ['Name', 'Age', 'Gender'], #1st row become keys ['Alice', 20, 'F'], ['Bob', 30, 'M'], ['Eve', undef, 'F'] );
Result array @h:
( {Name=>'Alice', Age=>20, Gender=>'F'}, {Name=>'Bob', Age=>30, Gender=>'M'}, {Name=>'Eve', Age=>undef, Gender=>'F'}, );
Input: array of hashes
Output: array of arrays
Opposite of "a2h"
"zipb64", "unzipb64", "zipbin", "unzipbin", "gzip", and "gunzip" compresses and uncompresses strings to save space in disk, memory, database or network transfer. Trades time for space. (Beware of wormholes)
Compresses the input (text or binary) and returns a base64-encoded string of the compressed binary data. No known limit on input length, several MB has been tested, as long as you've got the RAM...
Input: One or two strings.
First argument: The string to be compressed.
Second argument is optional: A dictionary string.
Output: a base64-kodet string of the compressed input.
The use of an optional dictionary string will result in an even further compressed output in the dictionary string is somewhat similar to the string that is compressed (the data in the first argument).
If x relatively similar string are to be compressed, i.e. x number automatic of email responses to some action by a user, it will pay of to choose one of those x as a dictionary string and store it as such. (You will also use the same dictionary string when decompressing using "unzipb64".
The returned string is base64 encoded. That is, the output is 33% larger than it has to be. The advantage is that this string more easily can be stored in a database (without the hassles of CLOB/BLOB) or perhaps easier transfer in http POST requests (it still needs some url-encoding, normally). See "zipbin" and "unzipbin" for the same without base 64 encoding.
Example 1, normal compression without dictionary:
$txt = "Test av komprimering, hva skjer? " x 10; # ten copies of this norwegian string, $txt is now 330 bytes (or chars rather...) print length($txt)," bytes input!\n"; # prints 330 $zip = zipb64($txt); # compresses print length($zip)," bytes output!\n"; # prints 65 print $zip; # prints the base64 string ("noise") $output=unzipb64($zip); # decompresses print "Hurra\n" if $output eq $txt; # prints Hurra if everything went well print length($output),"\n"; # prints 330
Example 2, same compression, now with dictionary:
$txt = "Test av komprimering, hva skjer? " x 10; # Same original string as above $dict = "Testing av kompresjon, hva vil skje?"; # dictionary with certain similarities # of the text to be compressed $zip2 = zipb64($txt,$dict); # compressing with $dict as dictionary print length($zip2)," bytes output!\n"; # prints 49, which is less than 65 in ex. 1 above $output=unzipb64($zip2,$dict); # uses $dict in the decompressions too print "Hurra\n" if $output eq $txt; # prints Hurra if everything went well
Example 3, dictionary = string to be compressed: (out of curiosity)
$txt = "Test av komprimering, hva skjer? " x 10; # Same original string as above $zip3 = zipb64($txt,$txt); # hmm print length($zip3)," bytes output!\n"; # prints 25 print "Hurra\n" if unzipb64($zip3,$txt) eq $txt; # hipp hipp ...
zipb64() and zipbin() is really just wrappers around Compress::Zlib and inflate() & co there.
inflate()
zipbin() does the same as zipb64() except that zipbin() does not base64 encode the result. Returns binary data.
zipbin()
zipb64()
See "zip" for documentation.
Opposite of "zipb64".
First argument: A string made by "zipb64"
Second argument: (optional) a dictionary string which where used in "zipb64".
Output: The original string (be it text or binary).
See "zipb64".
unzipbin() does the same as "unzip" except that unzipbin() wants a pure binary compressed string as input, not base64.
unzipbin()
See "unzipb64" for documentation.
Input: A string or reference to a string you want to compress. Text or binary.
Output: The binary compressed representation of that input string.
gzip() is really just a wrapper for Compress:Zlib::memGzip() and uses the same compression algorithm as the well known GNU program gzip found in most unix/linux/cygwin distros. Except gzip() does this in-memory. (Both using the C-library zlib).
gzip()
Compress:Zlib::memGzip()
zlib
writefile( "file.gz", gzip("some string") );
Input: A binary compressed string or a reference to such a string. I.e. something returned from gzip() earlier or read from a .gz file.
.gz
Output: The original larger non-compressed string. Text or binary.
gunzip() is a wrapper for Compress::Zlib::memGunzip()
gunzip()
print gunzip( gzip("some string") ); #some string
Same as "gzip" and "gunzip" except with a different compression algorithm (compresses more but is slower). Wrapper for Compress::Bzip2::memBzip.
Compared to gzip/gunzip, bzip2 compression is much slower, bunzip2 decompression not so much.
See also Compress::Bzip2, man Compress::Bzip2, man bzip2, man bunzip2.
man Compress::Bzip2
man bzip2
man bunzip2
writefile( "file.bz2", bzip2("some string") ); print bunzip2( bzip2("some string") ); #some string
Decompressed something compressed by bzip2() or data from a .bz2 file. See "bzip2".
.bz2
Input: an IP-number
Output: either an IP-address machine.sld.tld or an empty string if the DNS lookup didn't find anything.
perl -MAcme::Tools -le 'print ipaddr("129.240.8.200")' # prints www.uio.no
Uses perls gethostbyaddr internally.
gethostbyaddr
ipaddr() memoizes the results internally (using the %Acme::Tools::IPADDR_memo hash) so only the first loopup on a particular IP number might take some time.
ipaddr()
%Acme::Tools::IPADDR_memo
Some few DNS loopups can take several seconds. Most is done in a fraction of a second. Due to this slowness, medium to high traffic web servers should probably turn off hostname lookups in their logs and just log IP numbers by using HostnameLookups Off in Apache httpd.conf and then use ipaddr afterwards if necessary.
HostnameLookups Off
httpd.conf
ipnum() does the opposite of ipaddr()
ipnum()
Does an attempt of converting an IP address (hostname) to an IP number. Uses DNS name servers via perls internal gethostbyname(). Return empty string (undef) if unsuccessful.
gethostbyname()
print ipnum("www.uio.no"); # prints 129.240.13.152
Does internal memoization via the hash %Acme::Tools::IPNUM_memo.
%Acme::Tools::IPNUM_memo
Input: (optional)
Zero or one input argument: A string of the same type often found behind the first question mark (?) in URLs.
?
This string can have one or more parts separated by & chars.
&
Each part consists of key=value pairs (with the first = char being the separation char).
key=value
Both key and value can be url-encoded.
key
value
If there is no input argument, webparams uses $ENV{QUERY_STRING} instead.
webparams
$ENV{QUERY_STRING}
If also $ENV{QUERY_STRING} is lacking, webparams() checks if $ENV{REQUEST_METHOD} eq 'POST'. In that case $ENV{CONTENT_LENGTH} is taken as the number of bytes to be read from STDIN and those bytes are used as the missing input argument.
webparams()
$ENV{REQUEST_METHOD} eq 'POST'
$ENV{CONTENT_LENGTH}
STDIN
The environment variables QUERY_STRING, REQUEST_METHOD and CONTENT_LENGTH is typically set by a web server following the CGI standard (which Apache and most of them can do I guess) or in mod_perl by Apache. Although you are probably better off using CGI. Or $R->args() or $R->content() in mod_perl.
$R->args()
$R->content()
webparams() returns a hash of the key/value pairs in the input argument. Url-decoded.
If an input string has more than one occurrence of the same key, that keys value in the returned hash will become concatenated each value separated by a , char. (A comma char)
,
use Acme::Tools; my %R=webparams(); print "Content-Type: text/plain\n\n"; # or rather \cM\cJ\cM\cJ instead of \n\n to be http-compliant print "My name is $R{name}";
Storing those four lines in a file in the directory designated for CGI-scripts on your web server (or perhaps naming the file .cgi is enough), and chmod +x /.../cgi-bin/script and the URL http://some.server.somewhere/cgi-bin/script?name=HAL will print My name is HAL to the web page.
chmod +x /.../cgi-bin/script
My name is HAL
http://some.server.somewhere/cgi-bin/script?name=Bond&name=+James+Bond will print My name is Bond, James Bond.
My name is Bond, James Bond
Input: a string
Output: the same string URL encoded so it can be sent in URLs or POST requests.
In URLs (web addresses) certain characters are illegal. For instance space and newline. And certain other chars have special meaning, such as +, %, =, ?, &.
+
%
These illegal and special chars needs to be encoded to be sent in URLs. This is done by sending them as % and two hex-digits. All chars can be URL encodes this way, but it's necessary just on some.
$search="Østdal, Åge"; my $url="http://machine.somewhere.com/search?q=" . urlenc($search); print $url;
Prints http://machine.somewhere.com/search?q=%D8stdal%2C%20%C5ge
http://machine.somewhere.com/search?q=%D8stdal%2C%20%C5ge
Opposite of "urlenc".
Example, this returns ' ø'. That is space and ø.
ø
urldec('+%C3')
ht2t is short for html-table to table.
ht2t
This sub extracts an html-<table>s and returns its <tr>s and <td>s as an array of arrayrefs. And strips away any html inside the <td>s as well.
<table>
<tr>s
<td>s
my @table = ht2t($html,'some string occuring before the <table> you want');
First argument: the html where a <table> is to be found and converted.
Second argument: (optional) If the html contains more than one <table>, and you do not want the first one, applying a second argument is a way of telling ht2t which to capture: the one with this word or string occurring before it.
Output: An array of arrayrefs.
ht2t() is a quick and dirty way of scraping (or harvesting as it is also called) data from a web page. Look too HTML::Parse to do this more accurate.
ht2t()
use Acme::Tools; use LWP::Simple; my $url = "http://en.wikipedia.org/wiki/List_of_countries_by_population"; for( ht2t( get($url), "Countries" ) ) { my($rank, $country, $pop) = @$_; $pop =~ s/,//g; printf "%3d | %-32s | %9d\n", @$_ if $pop>0; }
1 | China | 1367740000 2 | India | 1262090000 3 | United States | 319043000 4 | Indonesia | 252164800 5 | Brazil | 203404000
...and so on.
Justification:
Perl needs three or four operations to make a file out of a string:
open my $FILE, '>', $filename or die $!; print $FILE $text; close($FILE);
This is way simpler:
writefile($filename,$text);
Sub writefile opens the file i binary mode (binmode()) and has two usage modes:
binmode()
Input: Two arguments
First argument is the filename. If the file exists, its overwritten. If the file can not be opened for writing, a die (a croak really) happens.
Second input argument is one of:
Either a scaler. That is a normal string to be written to the file.
Or a reference to a scalar. That referred text is written to the file.
Or a reference to an array of scalars. This array is the written to the file element by element and \n is automatically appended to each element.
\n
Alternativelly, you can write several files at once.
Example, this:
writefile('file1.txt','The text....tjo'); writefile('file2.txt','The text....hip'); writefile('file3.txt','The text....and hop');
...is the same as this:
writefile([ ['file1.txt','The text....tjo'], ['file2.txt','The text....hip'], ['file3.txt','The text....and hop'], ]);
Automatic compression:
writefile('file.txt.gz','my text is compressed by /bin/gzip before written to the file');
Extentions .gz, .bz2 and .xz are recognized for compression. See also readfile() and openstr().
.xz
readfile()
openstr()
Output: Nothing (for the time being). die()s (croak($!) really) if something goes wrong.
die()
croak($!)
Just as with "writefile" you can read in a whole file in one operation with readfile(). Instead of:
open my $FILE,'<', $filename or die $!; my $data = join"",<$FILE>; close($FILE);
This is simpler:
my $data = readfile($filename);
More examples:
Reading the content of the file to a scalar variable: (Any content in $data will be overwritten)
$data
my $data; readfile('filename.txt',\$data);
Reading the lines of a file into an array:
my @lines; readfile('filnavn.txt',\@lines); for(@lines){ ... }
Note: Chomp is done on each line. That is, any newlines (\n) will be removed. If @lines is non-empty, this will be lost.
@lines
Sub readfile is context aware. If an array is expected it returns an array of the lines without a trailing \n. The last example can be rewritten:
for(readfile('filnavn.txt')){ ... }
With two input arguments, nothing (undef) is returned from readfile().
Automatic decompression:
my $txt = readfile('file.txt.gz'); #uses /bin/gunzip to decompress content
Extentions .gz, .bz2 and .xz are recognized for decompression. See also writefile() and openstr().
writefile()
Name of a directory.
A list of all files in it, except of . and .. (on linux/unix systems, all directories have a . and .. directory).
..
The names of all types of files are returned: normal files, directories, symbolic links, pipes, semaphores. That is every thing shown by ls -la except . and ..
ls -la
readdirectory do not recurce down into subdirectories (but see example below).
readdirectory
my @files = readdirectory("/tmp");
Why readdirectory?
Sometimes calling the built ins opendir, readdir and closedir seems a tad tedious, since this:
opendir
readdir
closedir
my $dir="/usr/bin"; opendir(D,$dir); my @files=map "$dir/$_", grep {!/^\.\.?$/} readdir(D); closedir(D);
my @files=readdirectory("/usr/bin");
See also: File::Find
Why not readdirectory?
On huge directories with perhaps tens or houndreds of thousands of files, readdirectory() will consume more memory than perls opendir/readdir. This isn't usually a concern anymore for modern computers with gigabytes of RAM, but might be the rationale behind Perls more tedious way created in the 80s. The same argument goes for file slurping. On the other side it's also a good practice to never assume to much on available memory and the number of files if you don't know for certain that enough memory is available whereever your code is run or that the size of the directory is limited.
How to get all files in the /tmp directory including all subdirectories below of any depth:
/tmp
my @files=("/tmp"); map {-d $_ and unshift @files,$_ or push @files,$_} readdirectory(shift(@files)) while -d $files[0];
...or to avoid symlinks and only get real files:
map {-d and !-l and unshift @files,$_ or -f and !-l and push @files,$_} readdirectory(shift(@files)) while -d $files[0];
The basename and dirname functions behaves like the *nix shell commands with the same names.
Input: One or two arguments: Filename and an optional suffix
Output: Returns the filename with any directory and (if given) the suffix removed.
basename('/usr/bin/perl') # returns 'perl' basename('/usr/local/bin/report.pl','.pl') # returns 'report' since .pl at the end is removed basename('report2.pl','.pl') # returns 'report2' basename('report2.pl','.\w+') # returns 'report2.pl', probably not what you meant basename('report2.pl',qr/.\w+/) # returns 'report2', use qr for regex
Input: A filename including path
Output: Removes the filename path and returns just the directory path up until but not including the last /. Return just a one char . (period string) if there is no directory in the input.
dirname('/usr/bin/perl') # returns '/usr/bin' dirname('perl') # returns '.'
Returns the current linux/unix username, for example the string root
print username(); #just (getpwuid($<))[0] but more readable perhaps
Deletes a file by "wiping" it on the disk. Overwrites the file before deleting. (May not work properly on SSDs)
Input: * Arg 1: A filename * Optional arg 2: number of times to overwrite file. Default is 3 if omitted, 0 or undef * Optional arg 3: keep (true/false), wipe() but no delete of file
Output: Same as the unlink() (remove file): 1 for success, 0 or false for failure.
unlink()
See also: https://www.google.com/search?q=wipe+file, http://www.dban.org/
Does chmod + utime + chown on one or more files.
Returns the number of files of which those operations was successful.
Mode, uid, gid, atime and mtime are set from the array ref in the first argument.
The first argument references an array which is exactly like an array returned from perls internal stat($filename) -function.
stat($filename)
my @stat=stat($filenameA); chall( \@stat, $filenameB, $filenameC, ... ); # by stat-array chall( $filenameA, $filenameB, $filenameC, ... ); # by file name
Copies the chmod, owner, group, access time and modify time from file A to file B and C.
See perldoc -f stat, perldoc -f chmod, perldoc -f chown, perldoc -f utime
perldoc -f stat
perldoc -f chmod
perldoc -f chown
perldoc -f utime
Works like perls mkdir() except that makedir() will create nesessary parent directories if they dont exists.
mkdir()
makedir()
First input argument: A directory name (absolute, starting with / or relative).
Second input argument: (optional) permission bits. Using the normal 0777^umask() as the default if no second input argument is provided.
0777^umask()
makedir("dirB/dirC")
...will create directory dirB if it does not already exists, to be able to create dirC inside dirB.
dirB
dirC
Returns true on success, otherwise false.
makedir() memoizes directories it has checked for existence before (trading memory and for speed). Thus directories removed during running the script is not discovered by makedir.
See also perldoc -f mkdir, man umask
perldoc -f mkdir
man umask
Input: a filename (or a scalar ref to a string, see below)
Output: a string of 32 hexadecimal chars from 0-9 or a-f.
Example, the md5sum gnu/linux command without options could be implementet like this:
use Acme::Tools; print eval{ md5sum($_)." $_\n" } || $@ for @ARGV;
This sub requires Digest::MD5, which is a core perl-module since version 5.?.? It does not slurp the files or spawn new processes.
If the input argument is a scalar ref then the MD5 of the string referenced is returned in hex.
Returns the first executable program in $ENV{PATH} paths (split by : colon) with the given name.
echo $PATH perl -MAcme::Tools -le 'print which("gzip")' # maybe prints /bin/gzip
First argument: A file name or a reference to a string with settings in the format described below.
Second argument, optional: A reference to a hash. This hash will have the settings from the file (or stringref). The hash do not have to be empty beforehand.
Returns a hash with the settings as in this examples:
my %conf = read_conf('/etc/your/thing.conf'); print $conf{sectionA}{knobble}; #prints ABC if the file is as shown below print $conf{sectionA}{gobble}; #prints ZZZ, the last gobble print $conf{switch}; #prints OK here as well, unsectioned value print $conf{part2}{password}; #prints oh:no= x
File use for the above example:
switch: OK #before first section, the '' (empty) section [sectionA] knobble: ABC gobble: XYZ #this gobble is overwritten by the gobble on the next line gobble: ZZZ [part2] password: oh:no= x #should be better text: { values starting with { continues until reaching a line with }
Everything from # and behind is regarded comments and ignored. Comments can be on any line. To keep a # char, put a \ in front of it.
A : or = separates keys and values. Spaces at the beginning or end of lines are ignored (after removal of #comments), as are any spaces before and after : and = separators.
:
Empty lines or lines with no : or = is also ignored. Keys and values can contain internal spaces and tabs, but not at the beginning or end.
Multi-line values must start and end with { and }. Using { and } keep spaces at the start or end in both one-line and multi-line values.
Sections are marked with [sectionname]. Section names, keys and values is case sensitive. Key:values above the first section or below and empty [] is placed both in the empty section in the returned hash and as top level key/values.
[sectionname]
Key:values
[]
read_conf can be a simpler alternative to the core module Config::Std which has its own hassles.
read_conf
$Acme::Tools::Read_conf_empty_section=1; #default 0 (was 1 in version 0.16) my %conf = read_conf('/etc/your/thing.conf'); print $conf{''}{switch}; #prints OK with the file above print $conf{switch}; #prints OK here as well
# returned from openstr: open my $FH, openstr("fil.txt") or die; # fil.txt open my $FH, openstr("fil.gz") or die; # zcat fil.gz | open my $FH, openstr("fil.bz2") or die; # bzcat fil.bz2 | open my $FH, openstr("fil.xz") or die; # xzcat fil.xz | open my $FH, openstr(">fil.txt") or die; # > fil.txt open my $FH, openstr(">fil.gz") or die; # | gzip > fil.gz open my $FH, openstr(">fil.bz2") or die; # | bzip2 > fil.bz2 open my $FH, openstr(">fil.xz") or die; # | xz > fil.bz2
Environment variable PATH is used. So in the examples above, /bin/gzip is returned instead of gzip if /bin is the first directory in $ENV{PATH} containing an executable file gzip. Dirs /usr/bin, /bin and /usr/local/bin is added to PATH in openstr(). They are checked even if PATH is empty.
See also writefile() and readfile() for automatic compression and decompression using openstr.
openstr
Redirects print and printf from STDOUT to a string which is returned.
print
printf
my $p = printed { print "hello!" }; # now $p eq 'hello!' my $p = printed { some_sub() }; # now $p contains whatever was printed by some_sub() and the subs call from it
Timestring, works somewhat like the Gnu/Linux date command and Oracle's to_char()
date
to_char()
Converts timestamps to more readable forms of time strings.
Converts seconds since epoch and time strings on the form YYYYMMDD-HH24:MI:SS to other forms.
YYYYMMDD-HH24:MI:SS
Input: One, two or three arguments.
First argument: A format string.
Second argument: (optional) An epock time() number or a time string of the form YYYYMMDD-HH24:MI:SS or YYYYMMDDTHH:MI:SS or YYYY-MM-DDTHH:MI:SS (in which T is litteral and HH is the 24-hour version of hours) or YYYYMMDD. Uses the current time() if the second argument is missing.
time()
TODO: Formats with % as in man date (%Y%m%d and so on)
man date
%Y%m%d
Third argument: (optional True or false. If true and first argument is eight digits: Its interpreted as a date like YYYYMMDD time string, not an epoch time. If true and first argument is six digits its interpreted as a date like DDMMYY (not YYMMDD!).
Output: a date or clock string on the wanted form.
Prints 3. july 1997 if thats the dato today:
3. july 1997
perl -MAcme::Tools -le 'print timestr("D. month YYYY")' print tms("HH24:MI"); # prints 23:55 if thats the time now tms("HH24:MI",time()); # ...same,since time() is the default tms("HH:MI",time()-5*60); # 23:50 if that was the time 5 minutes ago tms("HH:MI",time()-5*60*60); # 18:55 if thats the time 5 hours ago tms("Day Month Dth YYYY HH:MI"); # Saturday July 1st 2004 23:55 (big S, big J) tms("Day D. Month YYYY HH:MI"); # Saturday 8. July 2004 23:55 (big S, big J) tms("DAY D. MONTH YYYY HH:MI"); # SATURDAY 8. JULY 2004 23:55 (upper) tms("dy D. month YYYY HH:MI"); # sat 8. july 2004 23:55 (small s, small j) tms("Dy DD. MON YYYY HH12:MI am"); # Sat 08. JUL 2004 11:55 pm (HH12, am becomes pm if after 12) tms("DD-MON-YYYY"); # 03-MAY-2004 (mon, english)
The following list of codes in the first argument will be replaced:
YYYY Year, four digits YY Year, two digits, i.e. 04 instead of 2004 yyyy Year, four digits, but nothing if its the current year YYYY|HH:MI Year if its another year than the current, a time in hours and minutes elsewise MM Month, two digits. I.e. 08 for August DD Day of month, two digits. I.e. 01 (not 1) for the first day in a month D Day of month, one digit. I.e. 1 (not 01) HH Hour. From 00 to 23. HH24 Same as HH. HH12 12 becomes 12 (never 00), 13 becomes 01, 14 02 and so on. Note: 00 after midnight becomes 12 (am). Tip: always include the code am in a format string that uses HH12. MI Minutt. Fra 00 til 59. SS Sekund. Fra 00 til 59. am Becomes am or pm pm Same AM Becomes AM or PM (upper case) PM Same Month The full name of the month in English from January to December MONTH Same in upper case (JANUARY) month Same in lower case (january) Mont Jan Feb Mars Apr May June July Aug Sep Oct Nov Dec Mont. Jan. Feb. Mars Apr. May June July Aug. Sep. Oct. Nov. Dec. (always four chars) Mon Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec (always three chars) Day The full name of the weekday. Sunday to Saturday Dy Three letters: Sun Mon Tue Wed Thu Fri Sat DAY Upper case DY Upper case Dth 1st 2nd 3rd 4th 5th ... 11th 12th ... 20th 21st 22nd 23rd 24th ... 30th 31st WW Week number of the year 01-53 according to the ISO8601-definition (which most countries uses) WWUS Week number of the year 01-53 according to the most used definition in the USA. Other definitions also exists. epoch Converts a time string from YYYYMMDD-HH24:MI:SS, YYYYMMDD-HH24:MI:SS, YYYYMMDDTHH:MI:SS, YYYY-MM-DDTHH:MI:SS or YYYYMMDD to the number of seconds since January 1st 1970. Commonly known as the Unix epoch. JDN Julian day number. Integer. The number of days since the day starting at noon on January 1 4713 BC JD Same as JDN but a float accounting for the time of day
Third argument: (optional) Is_date. False|true, default false. If true, the second argument is interpreted as a date of the form YYYYMMDD, not as a number of seconds since epoch (January 1st 1970).
Convert strings to "time pieces". Example:
my($dd,$mm,$yyyy,$str) = s2t("18/february/2019:13:53","DD","MM","YYYY","YYYYMMDD-HH24:MI:SS") print "dd: $dd mm: $mm yyyy: $yyyy str: $str\n"; # dd: 18 mm: 02 yyyy: 2019 str: 20190218-13:53:00
Input: A year (a four digit number)
Output: array of two numbers: day and month of Easter Sunday that year. Month 3 means March and 4 means April.
sub easter { use integer;my$Y=shift;my$C=$Y/100;my$L=($C-$C/4-($C-($C-17)/25)/3+$Y%19*19+15)%30; (($L-=$L>28||($L>27?1-(21-$Y%19)/11:0))-=($Y+$Y/4+$L+2-$C+$C/4)%7)<4?($L+28,3):($L-3,4) }
...is a "golfed" version of Oudins algorithm (1940) http://astro.nmsu.edu/~lhuber/leaphist.html (see also http://www.smart.net/~mmontes/ec-cal.html )
Valid for any Gregorian year. Dates repeat themselves after 70499183 lunations = 2081882250 days = ca 5699845 years. However, our planet will by then have a different rotation and spin time...
( $day, $month ) = easter( 2012 ); # $day == 8 and $month == 4
my @e=map sprintf("%02d%02d", reverse(easter($_))), 1800..300000; print "First: ".min(@e)." Last: ".max(@e)."\n"; # First: 0322 Last: 0425
Note: The Spencer Jones formula differs Oudins used in easter() in some years before 1498. However, in that period the Julian calendar with a different formula was used anyway. Countries introduced the current Gregorian calendar between 1583 and 1926.
easter()
No input arguments.
Return the same number as perls time() except with decimals (fractions of a second, _fp as in floating point number).
print time_fp(),"\n"; print time(),"\n";
Could write:
1116776232.38632
...if that is the time now.
Or just:
1116776232
...from perl's internal time() if Time::HiRes isn't installed and available.
Time::HiRes
sleep_fp() work as the built in sleep() but also accepts fractional seconds:
sleep()
sleep_fp(0.020); # sleeps for 20 milliseconds
Sub sleep_fp do a require Time::HiRes, thus it might take some extra time the first call. To avoid that, add use Time::HiRes to your code. Sleep_fp should not be trusted for accuracy to more than a tenth of a second. Virtual machines tend to be less accurate (sleep longer) than physical ones. This was tested on VMware and RHEL (Linux). See also Time::HiRes.
require Time::HiRes
use Time::HiRes
sleep_fp(0.020); #sleeps for 20 milliseconds sleeps(0.020); #sleeps for 20 milliseconds, sleeps() is a synonym to sleep_fp() sleepms(20); #sleeps for 20 milliseconds sleepus(20000); #sleeps for 20000 microseconds = 20 milliseconds sleepns(20000000); #sleeps for 20 million nanoseconds = 20 milliseconds
Estimated time of arrival (ETA).
for(@files){ ...do work on file... my $eta = eta( ++$i, 0+@files ); # file now, number of files print "" . localtime($eta); }
TODO: eta is borken and out of wack, good idea?: http://en.wikipedia.org/wiki/Kalman_filter
...NOT YET
sleep_until(0.5) sleeps until half a second has passed since the last call to sleep_until. This example starts the next job excactly ten seconds after the last job started even if the last job lasted for a while (but not more than ten seconds):
for(@jobs){ sleep_until(10); print localtime()."\n"; ...heavy job.... }
Might print:
Thu Jan 12 16:00:00 2012 Thu Jan 12 16:00:10 2012 Thu Jan 12 16:00:20 2012
...and so on even if the ...heavy job...-part takes more than a second to complete. Whereas if sleep(10) was used, each job would spend more than ten seconds in average since the work time would be added to sleep(10).
...heavy job...
Note: sleep_until() will remember the time of ANY last call of this sub, not just the one on the same line in the source code (this might change in the future). The first call to sleep_until() will be the same as sleep_fp() or Perl's own sleep() if the argument is an integer.
Input: A year. A four digit number.
Output: True (1) or false (0) of whether the year is a leap year or not. (Uses current calendar even for periods before leapyears was used).
print join(", ",grep leapyear($_), 1900..2014)."\n"; 1904, 1908, 1912, 1916, 1920, 1924, 1928, 1932, 1936, 1940, 1944, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012
Note: 1900 is not a leap year, but 2000 is. Years divided by 100 is a leap year only if it can be divided by 400.
The no value function (or null value function)
nvl() takes two or more arguments. (Oracles nvl-function take just two)
nvl()
Returns the value of the first input argument with length() > 0.
Return undef if there is no such input argument.
In perl 5.10 and perl 6 this will most often be easier with the // operator, although nvl() and // treats empty strings "" differently. Sub nvl here considers empty strings and undef the same.
//
""
See "decode".
decode() and decode_num() works just as Oracles decode().
decode()
decode_num()
decode() and decode_num() accordingly uses perl operators eq and == for comparison.
my $a=123; print decode($a, 123,3, 214,4, $a); # prints 3 print decode($a, 123=>3, 214=>4, $a); # prints 3, same thing since => is synonymous to comma in Perl
The first argument is tested against the second, fourth, sixth and so on, and then the third, fifth, seventh and so on is returned if decode() finds an equal string or number.
In the above example: 123 maps to 3, 124 maps to 4 and the last argument $a is returned elsewise.
my $a=123; print decode($a, 123=>3, 214=>7, $a); # also 3, note that => is synonym for , (comma) in perl print decode($a, 122=>3, 214=>7, $a); # prints 123 print decode($a, 123.0 =>3, 214=>7); # prints 3 print decode($a, '123.0'=>3, 214=>7); # prints nothing (undef), no last argument default value here print decode_num($a, 121=>3, 221=>7, '123.0','b'); # prints b
Sort of:
decode($string, %conversion, $default);
The last argument is returned as a default if none of the keys in the keys/value-pairs matched.
A more perl-ish and often faster way of doing the same:
{123=>3, 214=>7}->{$a} || $a # (beware of 0)
Input: An array of values to be used to test againts for existence.
Output: A reference to a regular expression. That is a qr//
qr//
The regex sets $1 if it match.
my @list=qw/ABc XY DEF DEFG XYZ/; my $filter=qrlist("ABC","DEF","XY."); # makes a regex of it qr/^(\QABC\E|\QDEF\E|\QXY.\E)$/ my @filtered= grep { $_ =~ $filter } @list; # returns DEF and XYZ, but not XYZ because the . char is taken literally
Note: Filtering with hash lookups are WAY faster.
Source:
sub qrlist (@) { my $str=join"|",map quotemeta, @_; qr/^($str)$/ }
Perhaps easier to use than Term::ANSIColor ?
Input: One argument. A string where the char ¤ have special meaning and is replaced by color codings depending on the letter following the ¤.
¤
Output: The same string, but with ¤letter replaced by ANSI color codes respected by many types terminal windows. (xterm, telnet, ssh, telnet, rlog, vt100, cygwin, rxvt and such...).
¤letter
Codes for ansicolor():
¤r red ¤g green ¤b blue ¤y yellow ¤m magenta ¤B bold ¤u underline ¤c clear ¤¤ reset, quits and returns to default text color.
print ansicolor("This is maybe ¤ggreen¤¤?");
Prints This is maybe green? where the word green is shown in green.
If Term::ANSIColor is not installed or not found, returns the input string with every ¤ including the following code letters removed. (That is: ansicolor is safe to use even if Term::ANSIColor is not installed, you just don't get the colors).
See also Term::ANSIColor.
Checks if a Credit Card number (CCN) has correct control digits according to the LUHN-algorithm from 1960. This method of control digits is used by MasterCard, Visa, American Express, Discover, Diners Club / Carte Blanche, JCB and others.
A credit card number. Can contain non-digits, but they are removed internally before checking.
Something true or false.
Or more accurately:
Returns undef (false) if the input argument is missing digits.
Returns 0 (zero, which is false) is the digits is not correct according to the LUHN algorithm.
Returns 1 or the name of a credit card company (true either way) if the last digit is an ok control digit for this ccn.
The name of the credit card company is returned like this (without the ' character)
'
Returns (wo '') Starts on Number of digits ------------------------------ ------------------------ ---------------- 'MasterCard' 51-55 16 'Visa' 4 13 eller 16 'American Express' 34 eller 37 15 'Discover' 6011 16 'Diners Club / Carte Blanche' 300-305, 36 eller 38 14 'JCB' 3 16 'JCB' 2131 eller 1800 15
And should perhaps have had:
'enRoute' 2014 eller 2149 15
...but that card uses either another control algorithm or no control digits at all. So enRoute is never returned here.
enRoute
If the control digits is valid, but the input does not match anything in the column starts on, 1 is returned.
starts on
(This is also the same control digit mechanism used in Norwegian KID numbers on payment bills)
The first digit in a credit card number is supposed to tell what "industry" the card is meant for:
MII Digit Value Issuer Category --------------------------- ---------------------------------------------------- 0 ISO/TC 68 and other industry assignments 1 Airlines 2 Airlines and other industry assignments 3 Travel and entertainment 4 Banking and financial 5 Banking and financial 6 Merchandizing and banking 7 Petroleum 8 Telecommunications and other industry assignments 9 National assignment
...although this has no meaning to Acme::Tools::ccn_ok().
Acme::Tools::ccn_ok()
The first six digits is Issuer Identifier, that is the bank (probably). The rest in the "account number", except the last digits, which is the control digit. Max length on credit card numbers are 19 digits.
Checks if a norwegian KID number has an ok control digit.
To check if a customer has typed the number correctly.
This uses the LUHN algorithm (also known as mod-10) from 1960 which is also used internationally in control digits for credit card numbers, and Canadian social security ID numbers as well.
The algorithm, as described in Phrack (47-8) (a long time hacker online publication):
"For a card with an even number of digits, double every odd numbered digit and subtract 9 if the product is greater than 9. Add up all the even digits as well as the doubled-odd digits, and the result must be a multiple of 10 or it's not a valid card. If the card has an odd number of digits, perform the same addition doubling the even numbered digits instead."
Input: A KID-nummer. Must consist of digits 0-9 only, otherwise a die (croak) happens.
- Returns undef if the input argument is missing.
- Returns 0 if the control digit (the last digit) does not satify the LUHN/mod-10 algorithm.
- Returns 1 if ok
See also: "ccn_ok"
One or more numeric arguments:
First: x (first returned element)
Second: y (up to y but not including y)
Third: step, default 1. The step between each returned element
If a fourth, fifth and so on arguments are given, they change the step for each returned element. As first derivative, second derivative.
If one argument: returns the array (0 .. x-1)
(0 .. x-1)
If two arguments: returns the array (x .. y-1)
(x .. y-1)
If three arguments: The default step is 1. Use a third argument to use a different step.
print join ",", range(11); # prints 0,1,2,3,4,5,6,7,8,9,10 (but not 11) print join ",", range(2,11); # 2,3,4,5,6,7,8,9,10 (but not 11) print join ",", range(11,2,-1); # 11,10,9,8,7,6,5,4,3 print join ",", range(2,11,3); # 2,5,8 print join ",", range(11,2,-3); # 11,8,5 print join ",", range(11,2,+3); # prints nothing print join ", ",range(2,11,1,0.1); # 2, 3, 4.1, 5.3, 6.6, 8, 9.5 adds 0.1 to step each time print join ", ",range(2,11,1,0.1,-0.01); # 2, 3, 4.1, 5.29, 6.56, 7.9, 9.3, 10.75
Note: In the Python language and others, range is a build in iterator (a generator), not an array. This saves memory for large sets and sometimes time. Use range in List::Gen to get a similar lazy generator in Perl.
range
Works like and uses Perls builtin glob() but adds support for ranges with {from..to} and {from..to..step}. Like brace expansion in bash.
glob()
{from..to}
{from..to..step}
my @arr = glob "X{a,b,c,d}Z"; # @arr now have four elements: XaZ XbZ XcZ XdZ my @arr = globr "X{a,b,c,d}Z"; # same as above my @arr = globr "X{a..d}Z"; # same as above my @arr = globr "X{a..f..2}"; # step 2, returns array: Xa Xc Xe my @arr = globr "{aa..bz..13}Z"; # aaZ anZ baZ bnZ my @arr = globr "{1..12}b"; # 1b 2b 3b 4b 5b 6b 7b 8b 9b 10b 11b 12b my @arr = globr "{01..11}b"; # 01b 02b 03b 04b 05b 06b 07b 08b 09b 10b 11b (keep leading zero) my @arr = globr "{01..12..3}b"; # 01b 04b 07b 10b
How many ways (permutations) can six people be placed around a table:
One person: one way Two persons: two ways (they can swap places) Three persons: 6 Four persons: 24 Five persons: 120 Six persons: 720
The formula is x! where the postfix unary operator !, also known as faculty is defined as: x! = x * (x-1) * (x-2) ... * 1. Example: 5! = 5 * 4 * 3 * 2 * 1 = 120.Run this to see the 100 first n!
x!
!
x! = x * (x-1) * (x-2) ... * 1
5! = 5 * 4 * 3 * 2 * 1 = 120
n!
perl -MAcme::Tools -le'$i=big(1);print "$_!=",$i*=$_ for 1..100' 1! = 1 2! = 2 3! = 6 4! = 24 5! = 120 6! = 720 7! = 5040 8! = 40320 9! = 362880 10! = 3628800 . . . 100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
permutations() takes a list and return a list of arrayrefs for each of the permutations of the input list:
permutations()
permutations('a','b'); #returns (['a','b'],['b','a']) permutations('a','b','c'); #returns (['a','b','c'],['a','c','b'], # ['b','a','c'],['b','c','a'], # ['c','a','b'],['c','b','a'])
Up to five input arguments permutations() is probably as fast as it can be in this pure perl implementation (see source). For more than five, it could be faster. How fast is it now: Running with different n, this many time took that many seconds:
n times seconds -- ------- --------- 2 100000 0.32 3 10000 0.09 4 10000 0.33 5 1000 0.18 6 100 0.27 7 10 0.21 8 1 0.17 9 1 1.63 10 1 17.00
If the first argument is a coderef, that sub will be called for each permutation and the return from those calls with be the real return from permutations(). For example this:
print for permutations(sub{join"",@_},1..3);
...will print the same as:
print for map join("",@$_), permutations(1..3);
...but the first of those two uses less RAM if 3 has been say 9. Changing 3 with 10, and many computers hasn't enough memory for the latter.
The examples prints:
123 132 213 231 312 321
If you just want to say calculate something on each permutation, but is not interested in the list of them, you just don't take the return. That is:
my $ant; permutations(sub{$ant++ if $_[-1]>=$_[0]*2},1..9);
...is the same as:
$$_[-1]>=$$_[0]*2 and $ant++ for permutations(1..9);
...but the first uses next to nothing of memory compared to the latter. They have about the same speed. (The examples just counts the permutations where the last number is at least twice as large as the first)
permutations() was created to find all combinations of a persons name. This is useful in "fuzzy" name searches with String::Similarity if you can not be certain what is first, middle and last names. In foreign or unfamiliar names it can be difficult to know that.
print @$_,"\n" for perm("a".."c"); # prints six lines: abc acb bac bca cab cba
my $c = permute { print @_,"\n" } "a".."c"; # prints six lines: abc acb bac bca cab cba print "count: $c\n"; # prints 6 = 3*2*1 = 3!
The permute BLOCK needs to return true (which print does) for permute to continue:
my $c = permute { print @_,"\n"; rand()<.5 } "a".."d"; # probably prints less than 24 strings print "count: $c\n"; # prints random number up to 24 = 4*3*2*1 = 4!
my @abc = ("a", "b", "c"); my @start = ("b", "a", "c"); # starting sequence to continue from my $c = permute_continue { print @_,"\n" } \@abc, \@start; # prints four lines: bac bca cab cba my $c = permute { print @_,"\n" } \@abc, \@start; # same, =permute_continue when coreref+arrayref+arrayref print "count: $c\n"; # prints 6-2 = 3*2*1-2 = 3!-2
Cartesian product
Easy usage:
Input: two or more arrayrefs with accordingly x, y, z and so on number of elements.
Output: An array of x * y * z number of arrayrefs. The arrays being the cartesian product of the input arrays.
It can be useful to think of this as joins in SQL. In select statements with more than one table behind from, but without any where condition to join the tables.
select
from
where
Advanced usage, with condition(s):
- Either two or more arrayrefs with x, y, z and so on number of elements.
- Or coderefs to subs containing condition checks. Somewhat like where conditions in SQL.
Output: An array of x * y * z number of arrayrefs (the cartesian product) minus the ones that did not fulfill the condition(s).
This of is as joins with one or more where conditions as coderefs.
The coderef input arguments can be placed last or among the array refs to save both runtime and memory if the conditions depend on arrays further back.
Examples, this:
for(cart(\@a1,\@a2,\@a3)){ my($a1,$a2,$a3) = @$_; print "$a1,$a2,$a3\n"; }
Prints the same as this:
for my $a1 (@a1){ for my $a2 (@a2){ for my $a3 (@a3){ print "$a1,$a2,$a3\n"; } } }
This: with a condition: the sum of the first two should be divisible by 3:
for( cart( \@a1, \@a2, sub{sum(@$_)%3==0}, \@a3 ) ) { my($a1,$a2,$a3)=@$_; print "$a1,$a2,$a3\n"; }
for my $a1 (@a1){ for my $a2 (@a2){ next if 0==($a1+$a2)%3; for my $a3 (@a3){ print "$a1,$a2,$a3\n"; } } }
Examples, from the tests:
my @a1 = (1,2); my @a2 = (10,20,30); my @a3 = (100,200,300,400); my $s = join"", map "*".join(",",@$_), cart(\@a1,\@a2,\@a3); ok( $s eq "*1,10,100*1,10,200*1,10,300*1,10,400*1,20,100*1,20,200" ."*1,20,300*1,20,400*1,30,100*1,30,200*1,30,300*1,30,400" ."*2,10,100*2,10,200*2,10,300*2,10,400*2,20,100*2,20,200" ."*2,20,300*2,20,400*2,30,100*2,30,200*2,30,300*2,30,400"); $s=join"",map "*".join(",",@$_), cart(\@a1,\@a2,\@a3,sub{sum(@$_)%3==0}); ok( $s eq "*1,10,100*1,10,400*1,20,300*1,30,200*2,10,300*2,20,200*2,30,100*2,30,400");
Example, hash-mode:
Returns hashrefs instead of arrayrefs:
my @cards=cart( # @card gets 5200 hashrefs, 100 decks of 52 cards deck => [1..100], rank => [qw(2 3 4 5 6 7 8 9 10 J Q K A)], suit => [qw(heart diamond club star)], ); for my $card ( mix(@cards) ) { print "From deck number $$card{deck} we got $$card{rank} $$card{suit}\n"; }
Note: using sub-ref filters do not work (yet) in hash-mode. Use grep on result instead.
From: Why Functional Programming Matters: http://www.md.chalmers.se/~rjmh/Papers/whyfp.pdf http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html
http://www.md.chalmers.se/~rjmh/Papers/whyfp.html
DON'T TRY THIS AT HOME, C PROGRAMMERS.
sub reduce (&@) { my ($proc, $first, @rest) = @_; return $first if @rest == 0; local ($a, $b) = ($first, reduce($proc, @rest)); return $proc->(); }
Many functions can then be implemented with very little code. Such as:
sub mean { (reduce {$a + $b} @_) / @_ }
Resembles the pivot table function in Excel.
pivot() is used to spread out a slim and long table to a visually improved layout.
pivot()
For instance spreading out the results of group by-selects from SQL:
group by
pivot( arrayref, columnname1, columnname2, ...) pivot( ref_to_array_of_arrayrefs, @list_of_names_to_down_fields )
The first argument is a ref to a two dimensional table.
The rest of the arguments is a list which also signals the number of columns from left in each row that is ending up to the left of the data table, the rest ends up at the top and the last element of each row ends up as data.
top1 top1 top1 top1 left1 left2 left3 top2 top2 top2 top2 ----- ----- ----- ---- ---- ---- ---- data data data data data data data data data data data data
my @table=( ["1997","Gerd", "Weight", "Summer",66], ["1997","Gerd", "Height", "Summer",170], ["1997","Per", "Weight", "Summer",75], ["1997","Per", "Height", "Summer",182], ["1997","Hilde","Weight", "Summer",62], ["1997","Hilde","Height", "Summer",168], ["1997","Tone", "Weight", "Summer",70], ["1997","Gerd", "Weight", "Winter",64], ["1997","Gerd", "Height", "Winter",158], ["1997","Per", "Weight", "Winter",73], ["1997","Per", "Height", "Winter",180], ["1997","Hilde","Weight", "Winter",61], ["1997","Hilde","Height", "Winter",164], ["1997","Tone", "Weight", "Winter",69], ["1998","Gerd", "Weight", "Summer",64], ["1998","Gerd", "Height", "Summer",171], ["1998","Per", "Weight", "Summer",76], ["1998","Per", "Height", "Summer",182], ["1998","Hilde","Weight", "Summer",62], ["1998","Hilde","Height", "Summer",168], ["1998","Tone", "Weight", "Summer",70], ["1998","Gerd", "Weight", "Winter",64], ["1998","Gerd", "Height", "Winter",171], ["1998","Per", "Weight", "Winter",74], ["1998","Per", "Height", "Winter",183], ["1998","Hilde","Weight", "Winter",62], ["1998","Hilde","Height", "Winter",168], ["1998","Tone", "Weight", "Winter",71], );
my @reportA=pivot(\@table,"Year","Name"); print "\n\nReport A\n\n".tablestring(\@reportA);
Will print:
Report A Year Name Height Height Weight Weight Summer Winter Summer Winter ---- ----- ------ ------ ------ ------ 1997 Gerd 170 158 66 64 1997 Hilde 168 164 62 61 1997 Per 182 180 75 73 1997 Tone 70 69 1998 Gerd 171 171 64 64 1998 Hilde 168 168 62 62 1998 Per 182 183 76 74 1998 Tone 70 71
my @reportB=pivot([map{$_=[@$_[0,3,2,1,4]]}(@t=@table)],"Year","Season"); print "\n\nReport B\n\n".tablestring(\@reportB);
Report B Year Season Height Height Height Weight Weight Weight Weight Gerd Hilde Per Gerd Hilde Per Tone ---- ------ ------ ------ ----- ----- ------ ------ ------ 1997 Summer 170 168 182 66 62 75 70 1997 Winter 158 164 180 64 61 73 69 1998 Summer 171 168 182 64 62 76 70 1998 Winter 171 168 183 64 62 74 71
my @reportC=pivot([map{$_=[@$_[1,2,0,3,4]]}(@t=@table)],"Name","Attributt"); print "\n\nReport C\n\n".tablestring(\@reportC);
Report C Name Attributt 1997 1997 1998 1998 Summer Winter Summer Winter ----- --------- ------ ------ ------ ------ Gerd Height 170 158 171 171 Gerd Weight 66 64 64 64 Hilde Height 168 164 168 168 Hilde Weight 62 61 62 62 Per Height 182 180 182 183 Per Weight 75 73 76 74 Tone Weight 70 69 70 71
my @reportD=pivot([map{$_=[@$_[1,2,0,3,4]]}(@t=@table)],"Name"); print "\n\nReport D\n\n".tablestring(\@reportD);
Report D Name Height Height Height Height Weight Weight Weight Weight 1997 1997 1998 1998 1997 1997 1998 1998 Summer Winter Summer Winter Summer Winter Summer Winter ----- ------ ------ ------ ------ ------ ------ ------ ------ Gerd 170 158 171 171 66 64 64 64 Hilde 168 164 168 168 62 61 62 62 Per 182 180 182 183 75 73 76 74 Tone 70 69 70 71
Options:
Options to sort differently and show sums and percents are available. (...MORE DOC ON THAT LATER...)
See also Data::Pivot
Input: a reference to an array of arrayrefs -- a two dimensional table of strings and numbers
Output: a string containing the textual table -- a string of two or more lines
The first arrayref in the list refers to a list of either column headings (scalar) or ... (...more later...)
In this output table:
- the columns will not be wider than necessary by its widest value (any <html>-tags are removed in every internal width-calculation)
- multi-lined cell values are handled also
- and so are html-tags, if the output is to be used inside <pre>-tags on a web page.
- columns with just numeric values are right justified (header row excepted)
print tablestring([ [qw/AA BB CCCC/], [123,23,"d"], [12,23,34], [77,88,99], ["lin\nes",12,"asdff\nfdsa\naa"],[0,22,"adf"] ]);
Prints this string of 11 lines:
AA BB CCCC --- -- ----- 123 23 d 12 23 34 77 8 99 lin 12 asdff es fdsa aa 10 22 adf
As you can see, rows containing multi-lined cells gets an empty line before and after the row to separate it more clearly.
Returns a data structure as a string. See also Data::Dumper (serialize was created long time ago before Data::Dumper appeared on CPAN, before CPAN even...)
Data::Dumper
Input: One to four arguments.
First argument: A reference to the structure you want.
Second argument: (optional) The name the structure will get in the output string. If second argument is missing or is undef or '', it will get no name in the output.
Third argument: (optional) The string that is returned is also put into a created file with the name given in this argument. Putting a > char in from of the filename will append that file instead. Use '' or undef to not write to a file if you want to use a fourth argument.
''
Fourth argument: (optional) A number signalling the depth on which newlines is used in the output. The default is infinite (some big number) so no extra newlines are output.
Output: A string containing the perl-code definition that makes that data structure. The input reference (first input argument) can be to an array, hash or a string. Those can contain other refs and strings in a deep data structure.
Limitations:
- Code refs are not handled (just returns sub{die()})
sub{die()}
- Regex, class refs and circular recursive structures are also not handled.
$a = 'test'; @b = (1,2,3); %c = (1=>2, 2=>3, 3=>5, 4=>7, 5=>11); %d = (1=>2, 2=>3, 3=>\5, 4=>7, 5=>11, 6=>[13,17,19,{1,2,3,'asdf\'\\\''}],7=>'x'); print serialize(\$a,'a'); print serialize(\@b,'tab'); print serialize(\%c,'c'); print serialize(\%d,'d'); print serialize(\("test'n roll",'brb "brb"')); print serialize(\%d,'d',undef,1);
Prints accordingly:
$a='test'; @tab=('1','2','3'); %c=('1','2','2','3','3','5','4','7','5','11'); %d=('1'=>'2','2'=>'3','3'=>\'5','4'=>'7','5'=>'11','6'=>['13','17','19',{'1'=>'2','3'=>'asdf\'\\\''}]); ('test\'n roll','brb "brb"'); %d=('1'=>'2', '2'=>'3', '3'=>\'5', '4'=>'7', '5'=>'11', '6'=>['13','17','19',{'1'=>'2','3'=>'asdf\'\\\''}], '7'=>'x');
Areas of use:
- Debugging (first and foremost)
- Storing arrays and hashes and data structures of those on file, database or sending them over the net
- eval earlier stored string to get back the data structure
Be aware of the security implications of evaling a perl code string stored somewhere that unauthorized users can change them! You are probably better of using YAML::Syck or Storable without enabling the CODE-options if you have such security issues. More on decompiling Perl-code: Storable or B::Deparse.
eval
Debug-serialize, dumping data structures for you to look at.
Same as serialize() but the output is given a newline every 80th character. (Every 80th or whatever $Acme::Tools::Dserialize_width contains)
serialize()
$Acme::Tools::Dserialize_width
Synonym to "serialize", but remove unnecessary single quote chars around \w+-keys and number values (except numbers with leading zeros). Example:
\w+
serialize:
%s=('action'=>{'del'=>'0','ins'=>'0','upd'=>'18'},'post'=>'1348','pre'=>'1348', 'updcol'=>{'Laerestednr'=>'18','Studietypenr'=>'18','Undervisningssted'=>'7','Url'=>'11'}, 'where'=>'where 1=1');
srlz:
%s=(action=>{del=>0,ins=>0,upd=>18},post=>1348,pre=>1348, updcol=>{Laerestednr=>18,Studietypenr=>18,Undervisningssted=>7,Url=>11}, where=>'where 1=1');
Todo: update "serialize" to do the same, but in the right way. (For now srlz runs the string from serialize() through two s///, this will break in certain cases). "srlz" will be kept as a synonym (or the other way around).
s///
my %nordic_country_population=(Norway=>5214890,Sweden=>9845155,Denmark=>5699220,Finland=>5496907,Iceland=>331310); print cnttbl(\%nordic_country_population); Iceland 331310 1.25% Norway 5214890 19.61% Finland 5496907 20.67% Denmark 5699220 21.44% Sweden 9845155 37.03% SUM 26587482 100.00%
Todo: Levels...:
my %sales=( Toyota=>{Prius=>19,RAV=>12,Auris=>18,Avensis=>7}, Volvo=>{V40=>14, XC90=>4}, Nissan=>{Leaf=>19,Qashqai=>17}, Tesla=>{ModelS=>8} ); print cnttbl(\%sales); Toyota SUM 56 Volvo SUM 18 Nissan SUM 36 Tesla SUM 8 SUM SUM 56 100%
NOT IMPLEMENTED
Same as ref, but goes deeper.
print ref_deep( { 10=>[1,'ten'], 100=>[2,'houndred'], 1000=>[3,'thousand'] } ); # prints HASH_of_ARRAYS print ref_deep( { 10=>'ten', 100=>[2,'houndred'], 1000=>[3,'thousand'] } ); # prints same (mixed, deepest) print ref_deep( { 1=>[{a=>3,b=>6},{a=>1,b=>8}], 5=>[{a=>2,b=>5},{a=>7,b=>1}] } ); # HASH_of_ARRAYS_of_HASHES
(Todo, not supported: circular, alternatives for mixed)
print 14.3 - 14.0; # 0.300000000000001 print 34.3 - 34.0; # 0.299999999999997 print nicenum( 14.3 - 14.0 ); # 0.3 print nicenum( 34.3 - 34.0 ); # 0.3
Call instead of system if you want die (Carp::croak) when something fails.
system
die
sub sys($){ my$s=shift; my$r=system($s); $r==0 or croak"ERROR: system($s)==$r ($!) ($?)" }
Returns true or false (actually 1 or 0) depending on whether the current sub has been called by itself or not.
sub xyz { xyz() if not recursed; }
String editor commands
literals: a-z 0-9 space move cursor: FBAEPN MF MB ME delete: D Md up/low/camelcase word U L C backspace: - search: S return/enter: R meta/esc/alt: M shift: T cut to eol: K caps lock: C yank: Y start and end: < > macro start/end/play: { } ! times for next cmd: M<number> (i.e. M24a inserts 24 a's)
(TODO: alfa...and more docs needed)
while(<>){ my $line=$_; print "\n" if changed(/^\d\d\d\d-\d\d-(\d\d)/); print "\n" if changed(substr($_,8,2)); }
Returns undef, 0 or 1. Undef if its the first time changed is called on that perl line. 0 if not the first time and the parameters differ from the last call on that line. 1 if not the first time and the parameters is the exact same as they where on the previous call on that line of perl source code.
changed
Input: one or two arguments
First argument: a string, source code of the brainfu language. String containing the eight charachters + - < > [ ] . , Every other char is ignored silently.
Second argument: if the source code contains commas (,) the second argument is the input characters in a string.
Output: The resulting output from the program.
print brainfu(<<""); #prints "Hallo Verden!\n" ++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>---.+++++++++++..+++.>++.<<++++++++++++++ .>----------.+++++++++++++.--------------.+.+++++++++.>+.>.
See http://en.wikipedia.org/wiki/Brainfuck
Just as "brainfu" but instead it return the perl code to which the brainfu code is translated. Just eval() this perl code to run.
eval()
print brainfu2perl('>++++++++[<++++++++>-]<++++++++.>++++++[<++++++>-]<---.');
Prints this string:
my($c,$o,@b)=(0); sub out{$o.=chr($b[$c]) for 1..$_[0]||1} ++$c;++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c]; while($b[$c]){--$c;++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c]; ++$b[$c];++$c;--$b[$c];}--$c;++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c]; ++$b[$c];++$b[$c];out;++$c;++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c]; while($b[$c]){--$c;++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$b[$c];++$c;--$b[$c];} --$c;--$b[$c];--$b[$c];--$b[$c];out;$o;
Just as "brainfu2perl" but optimizes the perl code. The same example as above with brainfu2perl_optimized returns this equivalent but shorter perl code:
$b[++$c]+=8;while($b[$c]){$b[--$c]+=8;--$b[++$c]}$b[--$c]+=8;out;$b[++$c]+=6; while($b[$c]){$b[--$c]+=6;--$b[++$c]}$b[--$c]-=3;out;$o;
Bloom filters can be used to check whether an element (a string) is a member of a large set using much less memory or disk space than other data structures. Trading speed and accuracy for memory usage. While risking false positives, Bloom filters have a very strong space advantage over other data structures for representing sets.
In the example below, a set of 100000 phone numbers (or any string of any length) can be "stored" in just 91230 bytes if you accept that you can only check the data structure for existence of a string and accept false positives with an error rate of 0.03 (that is three percent, error rates are given in numbers larger than 0 and smaller than 1).
You can not retrieve the strings in the set without using "brute force" methods and even then you would get slightly more strings than you put in because of the error rate inaccuracy.
Bloom Filters have many uses.
See also: http://en.wikipedia.org/wiki/Bloom_filter
See also: Bloom::Filter
Initialize a new Bloom Filter:
my $bf = bfinit( error_rate=>0.01, capacity=>100000 );
The same:
my $bf = bfinit( 0.01, 100000 );
since two arguments is interpreted as error_rate and capacity accordingly.
bfadd($bf, $_) for @phone_numbers; # Adding strings one at a time bfadd($bf, @phone_numbers); # ...or all at once (faster)
Returns 1 on success. Dies (croaks) if more strings than capacity is added.
my $phone_number="99999999"; if ( bfcheck($bf, $phone_number) ) { print "Yes, $phone_number was PROBABLY added\n"; } else { print "No, $phone_number was DEFINITELY NOT added\n"; }
Returns true if $phone_number exists in @phone_numbers.
$phone_number
@phone_numbers
Returns false most of the times, but sometimes true*), if $phone_number doesn't exists in @phone_numbers.
*) This is called a false positive.
Checking more than one key:
@bools = bfcheck($bf, @keys); # or ... @bools = bfcheck($bf, \@keys); # better, uses less memory if @keys is large
Returns an array the same size as @keys where each element is true or false accordingly.
Same as bfcheck except it returns the keys that exists in the bloom filter
bfcheck
@found = bfgrep($bf, @keys); # or ... @found = bfgrep($bf, \@keys); # better, uses less memory if @keys is large, or ... @found = grep bfcheck($bf,$_), @keys; # same but slower
Same as bfgrep except it returns the keys that do NOT exists in the bloom filter:
bfgrep
@not_found = bfgrepnot($bf, @keys); # or ... @not_found = bfgrepnot($bf, \@keys); # better, uses less memory if @keys is large, or ... @not_found = grep !bfcheck($bf,$_), @keys); # same but slower
Deletes from a counting bloom filter.
To enable deleting be sure to initialize the bloom filter with the numeric counting_bits argument. The number of bits could be 2 or 3*) for small filters with a small capacity (a small number of keys), but setting the number to 4 ensures that even very large filters with very small error rates would not overflow.
counting_bits
*) Acme::Tools do not currently support counting_bits => 3 so 4 and 8 are the only practical alternatives where 8 is almost always overkill.
counting_bits => 3
my $bf=bfinit( error_rate => 0.001, capacity => 10000000, counting_bits => 4 # power of 2, that is 2, 4, 8, 16 or 32 ); bfadd( $bf, @unique_phone_numbers); bfdelete($bf, @unique_phone_numbers);
Example: examine the frequency of the counters with 4 bit counters and 4 million keys:
my $bf=bfinit( error_rate=>0.001, capacity=>4e6, counting_bits=>4 ); bfadd($bf,[1e3*$_+1 .. 1e3*($_+1)]) for 0..4000-1; # adding 4 million keys one thousand at a time my %c; $c{vec($$bf{filter},$_,$$bf{counting_bits})}++ for 0..$$bf{filterlength}-1; printf "%8d counters = %d\n",$c{$_},$_ for sort{$a<=>$b}keys%c;
The output:
28689562 counters = 0 19947673 counters = 1 6941082 counters = 2 1608250 counters = 3 280107 counters = 4 38859 counters = 5 4533 counters = 6 445 counters = 7 46 counters = 8 1 counters = 9
Even after the error_rate is changed from 0.001 to a percent of that, 0.00001, the limit of 16 (4 bits) is still far away:
47162242 counters = 0 33457237 counters = 1 11865217 counters = 2 2804447 counters = 3 497308 counters = 4 70608 counters = 5 8359 counters = 6 858 counters = 7 65 counters = 8 4 counters = 9
In algorithmic terms the number of bits needed is ln of ln of n. Thats why 4 bits (counters up to 15) is "always" good enough except for extremely large capasities or extremely small error rates. (Except when adding the same key many times, which should be avoided, and Acme::Tools::bfadd do not check for that, perhaps in future versions).
ln of ln of n
Bloom filters of the counting type are not very space efficient: The tables above shows that 84%-85% of the counters are 0 or 1. This means most bits are zero-bits. This doesn't have to be a problem if a counting bloom filter is used to be sent over slow networks because they are very compressable by common compression tools like gzip or Compress::Zlib and such.
Deletion of non-existing keys makes bfdelete die (croak).
bfdelete
Deletes from a counting bloom filter:
bfdelete($bf, @keys); bfdelete($bf, \@keys);
Returns $bf after deletion.
$bf
Croaks (dies) on deleting a non-existing key or deleting from an previouly overflown counter in a counting bloom filter.
Adds another bloom filter to a bloom filter.
Bloom filters has the proberty that bit-wise OR-ing the bit-filters of two filters with the same capacity and the same number and type of hash functions, adds the filters:
my $bf1=bfinit(error_rate=>0.01,capacity=>$cap,keys=>[1..500]); my $bf2=bfinit(error_rate=>0.01,capacity=>$cap,keys=>[501..1000]); bfaddbf($bf1,$bf2); print "Yes!" if bfgrep($bf1, 1..1000) == 1000;
Prints yes since bfgrep now returns an array of all the 1000 elements.
Croaks if the filters are of different dimensions.
Works for counting bloom filters as well (counting_bits=>4 e.g.)
counting_bits=>4
Returns the number of 1's in the filter.
my $percent=100*bfsum($bf)/$$bf{filterlength}; printf "The filter is %.1f%% filled\n",$percent; #prints 50.0% or so if filled to capacity
Sums the counters for counting bloom filters (much slower than for non counting).
Input, two numeric arguments: Capacity and error_rate.
Outputs an array of two numbers: m and k.
m = - n * log(p) / log(2)**2 # n = capacity, m = bits in filter (divide by 8 to get bytes) k = log(1/p) / log(2) # p = error_rate, uses perls internal log() with base e (2.718)
...that is: m = the best number of bits in the filter and k = the best number of hash functions optimized for the given capacity (n) and error_rate (p). Note that k is a dependent only of the error_rate. At about two percent error rate the bloom filter needs just the same number of bytes as the number of keys.
Storage (bytes): Capacity Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate Error-rate 0.000000001 0.00000001 0.0000001 0.000001 0.00001 0.0001 0.001 0.01 0.02141585 0.1 0.5 0.99 ------------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 10 54.48 48.49 42.5 36.51 30.52 24.53 18.53 12.54 10.56 6.553 2.366 0.5886 100 539.7 479.8 419.9 360 300.1 240.2 180.3 120.4 100.6 60.47 18.6 0.824 1000 5392 4793 4194 3595 2996 2397 1798 1199 1001 599.6 180.9 3.177 10000 5.392e+04 4.793e+04 4.194e+04 3.594e+04 2.995e+04 2.396e+04 1.797e+04 1.198e+04 1e+04 5991 1804 26.71 100000 5.392e+05 4.793e+05 4.193e+05 3.594e+05 2.995e+05 2.396e+05 1.797e+05 1.198e+05 1e+05 5.991e+04 1.803e+04 262 1000000 5.392e+06 4.793e+06 4.193e+06 3.594e+06 2.995e+06 2.396e+06 1.797e+06 1.198e+06 1e+06 5.991e+05 1.803e+05 2615 10000000 5.392e+07 4.793e+07 4.193e+07 3.594e+07 2.995e+07 2.396e+07 1.797e+07 1.198e+07 1e+07 5.991e+06 1.803e+06 2.615e+04 100000000 5.392e+08 4.793e+08 4.193e+08 3.594e+08 2.995e+08 2.396e+08 1.797e+08 1.198e+08 1e+08 5.991e+07 1.803e+07 2.615e+05 1000000000 5.392e+09 4.793e+09 4.193e+09 3.594e+09 2.995e+09 2.396e+09 1.797e+09 1.198e+09 1e+09 5.991e+08 1.803e+08 2.615e+06 10000000000 5.392e+10 4.793e+10 4.193e+10 3.594e+10 2.995e+10 2.396e+10 1.797e+10 1.198e+10 1e+10 5.991e+09 1.803e+09 2.615e+07 100000000000 5.392e+11 4.793e+11 4.193e+11 3.594e+11 2.995e+11 2.396e+11 1.797e+11 1.198e+11 1e+11 5.991e+10 1.803e+10 2.615e+08 1000000000000 5.392e+12 4.793e+12 4.193e+12 3.594e+12 2.995e+12 2.396e+12 1.797e+12 1.198e+12 1e+12 5.991e+11 1.803e+11 2.615e+09 Error rate: 0.99 Hash functions: 1 Error rate: 0.5 Hash functions: 1 Error rate: 0.1 Hash functions: 3 Error rate: 0.0214158522653385 Hash functions: 6 Error rate: 0.01 Hash functions: 7 Error rate: 0.001 Hash functions: 10 Error rate: 0.0001 Hash functions: 13 Error rate: 0.00001 Hash functions: 17 Error rate: 0.000001 Hash functions: 20 Error rate: 0.0000001 Hash functions: 23 Error rate: 0.00000001 Hash functions: 27 Error rate: 0.000000001 Hash functions: 30
Storing and retrieving bloom filters to and from disk uses Storables store and retrieve. This:
store
retrieve
bfstore($bf,'filename.bf');
It the same as:
use Storable qw(store retrieve); ... store($bf,'filename.bf');
my $bf=bfretrieve('filename.bf');
Or this:
my $bf=bfinit('filename.bf');
Is the same as:
use Storable qw(store retrieve); my $bf=retrieve('filename.bf');
Deep copies the bloom filter data structure. (Which btw is not very deep, two levels at most)
my $bfc = bfclone($bf);
Works just as:
use Storable; my $bfc=Storable::dclone($bf);
use Acme::Tools; my $bf=new Acme::Tools::BloomFilter(0.1,1000); # the same as bfinit, see bfinit above print ref($bf),"\n"; # prints Acme::Tools::BloomFilter $bf->add(@keys); $bf->check($keys[0]) and print "ok\n"; # prints ok $bf->grep(\@keys)==@keys and print "ok\n"; # prints ok $bf->store('filename.bf'); my $bf2=bfretrieve('filename.bf'); $bf2->check($keys[0]) and print "ok\n"; # still ok $bf2=$bf->clone();
To instantiate a previously stored bloom filter:
my $bf = Acme::Tools::BloomFilter->new( '/path/to/stored/bloomfilter.bf' );
The o.o. interface has the same methods as the bf...-subs without the bf-prefix in the names. The bfretrieve is not available as a method, although bfretrieve, Acme::Tools::bfretrieve and Acme::Tools::BloomFilter::retrieve are synonyms.
bf...
bf
bfretrieve
Acme::Tools::bfretrieve
Acme::Tools::BloomFilter::retrieve
The internal hash-functions are md5( "$key$salt" ) from Digest::MD5.
md5( "$key$salt" )
Since md5 returns 128 bits and most medium to large sized bloom filters need only a 32 bit hash function, the result from md5() are split (unpack-ed) into 4 parts 32 bits each and are treated as if 4 hash functions was called at once (speedup). Using different salts to the key on each md5 results in different hash functions.
md5
unpack
Digest::SHA512 would have been even better since it returns more bits, if it werent for the fact that it's much slower than Digest::MD5.
String::CRC32::crc32 is faster than Digest::MD5, but not 4 times faster:
time perl -e'use Digest::MD5 qw(md5);md5("asdf$_") for 1..10e6' #5.56 sec time perl -e'use String::CRC32;crc32("asdf$_") for 1..10e6' #2.79 sec, faster but not per bit time perl -e'use Digest::SHA qw(sha512);sha512("asdf$_") for 1..10e6' #36.10 sec, too slow (sha1, sha224, sha256 and sha384 too)
Md5 seems to be an ok choice both for speed and avoiding collitions due to skewed data keys.
http://www.internetmathematics.org/volumes/1/4/Broder.pdf
http://blogs.sun.com/jrose/entry/bloom_filters_in_a_nutshell
http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
See also Scaleable Bloom Filters: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf (not implemented in Acme::Tools)
...and perhaps http://intertrack.naist.jp/Matsumoto_IEICE-ED200805.pdf
sudo perl -MAcme::Tools -e install_acme_command_tools Wrote executable /usr/local/bin/conv Wrote executable /usr/local/bin/due Wrote executable /usr/local/bin/xcat Wrote executable /usr/local/bin/freq Wrote executable /usr/local/bin/deldup Wrote executable /usr/local/bin/ccmd Wrote executable /usr/local/bin/z2z Wrote executable /usr/local/bin/2gz Wrote executable /usr/local/bin/2gzip Wrote executable /usr/local/bin/2bz2 Wrote executable /usr/local/bin/2bzip2 Wrote executable /usr/local/bin/2xz Wrote executable /usr/local/bin/resubst
Examples of commands then made available:
conv 1 USD EUR #might show 0.88029 if thats the current currency rate. Uses conv() conv .5 in cm #reveals that 1/2 inch is 1.27 cm, see doc on conv() for all supported units due [-h] /path/1/ /path/2/ #like du, but show statistics on file extentions instead of subdirs xcat file #like cat, zcat, bzcat or xzcat in one. Uses file extention to decide. Uses openstr() freq file #reads file(s) or stdin and view counts of each byte 0-255 ccmd grep string /huge/file #caches stdout+stderr for 15 minutes (default) for much faster results later ccmd "sleep 2;echo hello" #slow first time. Note the quotes! ccmd "du -s ~/*|sort -n|tail" #ccmd store stdout+stderr in /tmp files (default) z2z [-pvk1-9oe -t type] files #convert from/to .gz/bz2/xz files, -p progress, -v verbose (output result), #-k keep org file, -o overwrite, 1-9 compression degree, -e for xz does "extreme" #compressions, very slow. For some data types this reduces size significantly #2xz and 2bz2 depends on xz and bzip2 being installed on system 2xz #same as z2z with -t xz 2bz2 #same as z2z with -t bz2 2gz #same as z2z with -t gz rttop trunc file(s) wipe file(s)
The commands 2xz, 2bz2 and 2gz are just synonyms for z2z with an implicitly added option -t xz, -t xz or -t gz accordingly.
2xz
2bz2
2gz
z2z
-t xz
-t gz
z2z [-p -k -v -o -1 -2 -3 -4 -5 -6 -7 -8 -9 ] files
Converts (recompresses) files from one compression type to another. For instance from .gz to .bz2 Keeps uid, gid, mode (chmod) and mtime.
-p Show a progress meter using the pv program if installed -k Keeps original file -v Verbose, shows info on degree of compression and file number if more than one file is being converted -o Overwrites existing result file, otherwise stop with error msg -1 .. -9 Degree of compression, -1 fastest .. -9 best -e With -t xz (or 2xz) passes -e to xz (-9e = extreme compression) -L rate With -p. Slow down, ex: -L 200K means 200 kilobytes per second -D sec With -p. Only turn on progress meter (pv) after x seconds -i sec With -p. Info update rate -l With -p. Line mode -I With -p. Show ETA as time of arrival as well as time left -q With -p. Quiet. Useful with -L to limit rate, but no output
The options -L -D -i -l -I -q implicitly turns on -p. Those options are passed through to pv. See: man pv.
Like du command but views space used by file extentions instead of dirs. Options:
du
due [-options] [dirs] [files] due -h View bytes "human readable", i.e. C<8.72 MB> instead of C<9145662 b> (bytes) due -k | -m View bytes in kilobytes | megabytes (1024 | 1048576) due -K Like -k but uses 1000 instead of 1024 due -z View two extentions if .z .Z .gz .bz2 .rz or .xz (.tar.gz, not just .gz) due -M Also show min, medium and max date (mtime) of files, give an idea of their age due -C Like -M, but create time instead (ctime) due -A Like -M, but access time instead (atime) due -P Also show 10, 50 (medium) and 90 percentile of file date due -MP Both -M and -P, shows min, 10p, 50p, 90p and max due -a Sort output alphabetically by extention (default order is by size) due -c Sort output by number of files due -i Ignore case, .GZ and .gz is the same, output in lower case due -t Adds time of day to -M and -P output due -e 'regex' Exclude files (full path) matching regex. Ex: due -e '\.git' TODO: due -l TODO: Exclude hardlinks (dont count "same" file more than once, "man du") ls -l | due Parses output of ls -l, find -ls, tar tvf for size+filename and reports find | due List of filenames from stdin produces same as just command 'due' ls | due Reports on just files in current dir without recursing into subdirs
Find duplicate files. Three steps to speed this up in case of many large files: 1) Find files of same size, 2) of those: find files with the same first 8 kilobytes, 3) of those: find duplicate files by finding the MD5sums of the whole files.
finddup [-d -s -h] paths/ files/* ... #reports (+deletes with -d) duplicate files #-s for symlinkings dups, -h for hardlink finddup <files> # print duplicate files, <files> might be filenames and directories finddup -a <files> # print duplicate files, also print the first file finddup -d <files> # delete duplicate files, use -v to also print them before deletion finddup -s <files> # make symbolic links of duplicate files finddup -h <files> # make hard links of duplicate files finddup -v ... # verbose, print before -d, -s or -h finddup -n -d <files> # dry run: show rm commands without actually running them finddup -n -s <files> # dry run: show ln commands to make symlinks of duplicate files todo:NEEDS FIX! finddup -n -h <files> # dry run: show ln commands to make hard links of duplicate files finddup -q ... # quiet finddup -k o # keep oldest with -d, -s, -h, consider newer files duplicates finddup -k n # keep newest with -d, -s, -h, consider older files duplicates finddup -k O # same as -k o, just use access time instead of modify time finddup -k N # same as -k n, just use access time instead of modify time finddup -0 ... # use ascii 0 instead of the normal \n, for xargs -0 finddup -P n # use n bytes from start of file in 1st md5 check (default 8192) finddup -p # view progress in last and slowest of the three steps
Default ordering of files without -k n or -k o is the order they are mentioned on the command line. For directory args the order might be random: use dir/* to avoid that (but then dot files are not included).
-k n
-k o
dir/*
Parses command line options and arguments:
my %opt; my @argv=Acme::Tools::args('i:nJ123',\%opt,@ARGV); #returns remaining command line elements after C<-o ptions> are parsed into C<%opt>.
Uses Getopt::Std::getopts(). First arg names the different one char options and an optional : behind the letter or digit marks that the switch takes an argument.
Getopt::Std::getopts()
Uses DBI. Comming soon...
$Dbh dlogin dlogout drow drows drowc drowsc dcols dpk dsel ddo dins dupd ddel dcommit drollback
Update Acme::Tools to newest version quick and dirty:
function pmview(){ ls -ld `perl -M$1 -le'$m=shift;$mi=$m;$mi=~s,::,/,g;print $INC{"$mi.pm"};warn"Version ".${$m."::VERSION"}."\n"' $1`;} pmview Acme::Tools #view date and version before sudo perl -MAcme::Tools -e Acme::Tools::self_update #update to newest version pmview Acme::Tools #view date and version after
Does cd to where Acme/Tools.pm are and then wget -N https://raw.githubusercontent.com/kjetillll/Acme-Tools/master/Tools.pm
cd
TODO: cmd_acme_tools_self_update, accept --no-check-certificate to use on curl
Release history
0.27 Feb 2020 Small fixes for some platforms 0.26 Jan 2020 Convert subs: base bin2dec bin2hex bin2oct dec2bin dec2hex dec2oct hex2bin hex2dec hex2oct oct2bin oct2dec oct2hex Array subs: joinr perm permute permute_continue pile sortby subarrays Other subs: btw in_iprange ipnum_ok iprange_ok opts s2t 0.24 Feb 2019 fixed failes on perl 5.16 and older 0.23 Jan 2019 Subs: logn, egrep, which. More UTF-8 "oriented" (lower, upper, ...) Commands: zsize, finddup, due (improved), conv (improved, [MGT]?Wh and many more units), due -M for stdin of filenames. 0.22 Feb 2018 Subs: subarr, sim, sim_perm, aoh2sql. command: resubst 0.21 Mar 2017 Improved nicenum() and its tests 0.20 Mar 2017 Subs: a2h cnttbl h2a log10 log2 nicenum rstddev sec_readable throttle timems refa refaa refah refh refha refhh refs eachr globr keysr popr pushr shiftr splicer unshiftr valuesr Commands: 2bz2 2gz 2xz z2z 0.172 Dec 2015 Subs: curb openstr pwgen sleepms sleepnm srlz tms username self_update install_acme_command_tools Commands: conv due freq wipe xcat (see "Commands") 0.16 Feb 2015 bigr curb cpad isnum parta parth read_conf resolve_equation roman2int trim. Improved: conv (numbers currency) range ("derivatives") 0.15 Nov 2014 Improved doc 0.14 Nov 2014 New subs, improved tests and doc 0.13 Oct 2010 Non-linux test issue, resolve. improved: bloom filter, tests, doc 0.12 Oct 2010 Improved tests, doc, bloom filter, random_gauss, bytes_readable 0.11 Dec 2008 Improved doc 0.10 Dec 2008
https://github.com/kjetillll/Acme-Tools
Kjetil Skotheim, <kjetil.skotheim@gmail.com>
2008-2020, Kjetil Skotheim
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Acme::Tools, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Acme::Tools
CPAN shell
perl -MCPAN -e shell install Acme::Tools
For more information on module installation, please visit the detailed CPAN module installation guide.