A perl script to look up Bible references and search for verses.
Usage: ref [bible] [book] [chapter:[verse[-verse|,verse]]] {/pattern/in}*
You can specify any of the bible, book, chapter, or a list or range of verses, and/or one or more patterns. The BIBLES environment variable gives the directory where bible versions are stored. The BIBLE variable is the name of the default bible to search.
Examples:
    ref John 3:16            print this famous verse
    ref all John 3:16        print this verse from every available bible
    ref jn 3:                print all of chapter 3 of John's gospel
    ref 3:16                 list chapter 3 verse 16 in each book that has one
    ref 3 john               print a whole book
    ref 3 john 2             some books don't have chapter divisions
    ref AV 1 jn /Jesus/      list verses in 1 John in the AV which contain Jesus
    ref RSV /Jesus wept/     find the shortest verse in the RSV!
    ref /Adam/ /Eve/         list verses which mention both Adam and Eve
    ref /adam/i              case-insensitive search, finds Adam, adamant etc.
    ref RSV /baalmeon/in     finds the name Ba'al-me'on, ignoring the punctuation
    ref                      print the whole of the default bible
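The chapter and verse syntax in the usage line can be illustrated with a small parser. This is a minimal sketch in Python, not code from the ref script itself: the function name `parse_verse_spec` is hypothetical, and it assumes that ranges and comma-separated verses may be mixed in one specification.

```python
import re

def parse_verse_spec(spec):
    """Parse 'chapter:[verse[-verse|,verse]]' into (chapter, verses).

    verses is a sorted list of verse numbers, or None for the whole chapter.
    """
    match = re.fullmatch(r"(\d+):([\d,\-]*)", spec)
    if match is None:
        raise ValueError(f"not a chapter:verse specification: {spec!r}")
    chapter = int(match.group(1))
    verse_part = match.group(2)
    if not verse_part:
        return chapter, None  # e.g. "3:" means all of chapter 3
    verses = set()
    for item in verse_part.split(","):
        if "-" in item:
            first, last = item.split("-")
            verses.update(range(int(first), int(last) + 1))
        else:
            verses.add(int(item))
    return chapter, sorted(verses)
```

For example, `parse_verse_spec("3:16")` yields chapter 3 with the single verse 16, while `parse_verse_spec("3:")` yields chapter 3 with no verse restriction.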
Bible etext files formatted for ref (download and unzip to the BIBLES directory):
Matthew Thorley has added a few features and made his version available on github.
This perl script can read and decode wav files recorded from cassette tapes in Kansas City format, as used by the Compukit UK101, Ohio Superboard and many other home computers.
It has also been used by several people as an FSK decoder program to decode Yamaha and Roland synthesiser tapes.
The program uses two perl modules, Audio::Wav and Math::FFT, which can be downloaded from CPAN.
By default, a '0' bit is represented as four cycles of a 1200 Hz sine wave, and a '1' bit as eight cycles of 2400 Hz. This gives a data rate of 300 baud. The carrier wave is a stream of 1 bits (2400Hz). Each frame starts with one start bit (a '0') followed by eight data bits (least significant bit first) followed by two stop bits ('1's). So each frame is 11 bits, for a data rate of 27 bytes per second.
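The default framing and the resulting byte rate can be worked through in a few lines. This is a sketch in Python for illustration only (the script itself is perl); the function names are hypothetical.

```python
def frame_bits(byte, data_bits=8, stop_bits=2):
    """Return the bits of one frame: start bit, data bits (LSB first), stop bits."""
    bits = [0]                                           # start bit is a '0'
    bits += [(byte >> i) & 1 for i in range(data_bits)]  # least significant bit first
    bits += [1] * stop_bits                              # stop bits are '1's
    return bits

def bytes_per_second(baud=300, data_bits=8, stop_bits=2):
    """Bytes per second once the start and stop bits are accounted for."""
    frame_length = 1 + data_bits + stop_bits
    return baud / frame_length

print(frame_bits(0x41))              # the letter 'A' as an 11-bit frame
print(round(bytes_per_second(), 2))  # 300 / 11
```

With the defaults each byte costs 11 bits on tape, so 300 baud works out at 300 / 11, or roughly 27 bytes per second.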
These defaults can be changed via the options. For example, a CUTS tape represents a '1' bit as one cycle of a 1200 Hz sine wave and a '0' bit as half a cycle of a 600 Hz sine wave, giving a data rate of 1200 baud.
The program uses Fourier analysis to determine the points where the signal changes from low to high frequency, and vice versa.
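The idea of locating frequency changes can be sketched as follows. Note that this is a simplified stand-in and not the script's method: instead of a full FFT via Math::FFT it evaluates the Fourier sum at just the two frequencies of interest and compares them. All function names are hypothetical.

```python
import cmath
import math

def tone_power(samples, sample_rate, freq):
    """Squared magnitude of the Fourier sum of samples at a single frequency."""
    total = sum(
        s * cmath.exp(-2j * math.pi * freq * n / sample_rate)
        for n, s in enumerate(samples)
    )
    return abs(total) ** 2

def classify_bit(samples, sample_rate, lo=1200, hi=2400):
    """Return 1 if the high frequency dominates the window, else 0."""
    high = tone_power(samples, sample_rate, hi)
    low = tone_power(samples, sample_rate, lo)
    return 1 if high > low else 0

def find_transitions(samples, sample_rate, window, step):
    """Slide a window along the signal; report where the classification flips."""
    transitions = []
    previous = None
    for start in range(0, len(samples) - window + 1, step):
        bit = classify_bit(samples[start:start + window], sample_rate)
        if previous is not None and bit != previous:
            transitions.append(start)
        previous = bit
    return transitions

def sine(freq, sample_rate, count):
    """Generate count samples of a unit sine wave (test signal only)."""
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)]
```

At a 44100 Hz sample rate one 300 baud bit is 147 samples, so a window of 147 samples holds exactly four cycles of 1200 Hz or eight cycles of 2400 Hz.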
The options are:
    hi=N         High frequency (1 bit/carrier/stop bit) (default=2400Hz)
    lo=N         Low frequency (0 bit/start bit) (default=1200Hz)
    baud=N       Baud rate (default=300)
    CUTS         CUTS format (short for: hi=1200 lo=600 baud=1200)
    frame=Nxy    Format: N=data bits, x=parity (E/O/N), y=stop bits (default=8N2)
    max=N        Stop after reading N samples from the file
    steps=N      Compute N Fast Fourier Transform steps per bit (default=10)
    window=xxx   FFT window function (none/bartlett/welch/hann) (default=hann)
    resample=N   Resample wav file so that one bit is N samples (default=0)
    keep=Y/N     Keep all data, including short isolated sections? (default=N)
    graph=Y/N    Plot a graph of the frequency spectrum against time (default=N)
    channel=x    Channel to use (L=Left, R=Right, A=Average) (default=A)
If you have a poor quality recording, or a high bit rate recording (eg a CUTS tape) try resampling to, say, 128 or 256 samples per bit using the option resample=128 (the number of samples should be a power of two), and set the number of steps to 8 or 16.
Download some sample wav files here: sample-wav-files.zip.
1200TARG.wav is a CUTS format file, decode it with:
    perl tape-read CUTS 1200TARG.wav
or equivalently:
    perl tape-read baud=1200 lo=600 hi=1200 1200TARG.wav
This should produce a binary file 1200TARG-001.txt which should be identical to the file 1200TARG.txt in the archive.
All the other files are in the (default) UK101 format: 300 baud, lo=1200, hi=2400. The script automatically detects the sample rate of the wav file.
In theory, the lowest possible sample rate for a 2400Hz signal is 4800Hz (the Nyquist limit). In practice, sampling at 4800Hz causes the 2400Hz signal to disappear periodically (when the sample points coincide with the zero crossings of the signal). But you can get quite close to the limit.
What this means in practice is:
MP3 encoding is OK for good quality 300 baud recordings, but will destroy 1200 baud recordings. If you need to compress the wav files, use a "lossless" compression format, such as FLAC.
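The effect of sampling a 2400 Hz tone exactly at the Nyquist limit can be checked numerically. This is an illustrative Python sketch with hypothetical helper names; it shows the worst case where every sample point lands on a zero crossing, and how the tone reappears when the phase shifts or the sample rate is raised slightly.

```python
import math

def sample_sine(freq, sample_rate, count, phase=0.0):
    """Sample a unit sine wave of the given frequency and starting phase."""
    return [
        math.sin(2 * math.pi * freq * n / sample_rate + phase)
        for n in range(count)
    ]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

# Sampled at exactly 4800 Hz, starting on a zero crossing: the tone vanishes.
print(peak(sample_sine(2400, 4800, 48)))
# Same rate, shifted by a quarter cycle: every sample lands on a crest.
print(peak(sample_sine(2400, 4800, 48, phase=math.pi / 2)))
# Slightly above the limit, the samples drift through the waveform.
print(peak(sample_sine(2400, 5000, 100)))
```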
CUTS tapes can be hard to decode because the zero bit is only half a cycle of the low frequency. The script has to analyse a two bit wide window. Using the welch window function can help since it gives more weighting to the centre of the window. For poor quality CUTS tapes the following method has worked successfully:
tape-read hi=1120 lo=560 baud=1120 resample=128 steps=32 window=welch file.wav
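For reference, the Welch window weights each sample by a parabola that peaks in the middle of the window and falls to zero at both ends. The sketch below uses the common textbook definition in Python; whether the script normalises by N-1 or by N is not stated in the text, so treat the exact formula as an assumption.

```python
def welch_window(size):
    """Parabolic (Welch) window: 1 at the centre, 0 at both ends (size >= 2)."""
    half = (size - 1) / 2
    return [1 - ((n - half) / half) ** 2 for n in range(size)]

def apply_window(samples, window):
    """Multiply each sample by its window weight before the transform."""
    return [s * w for s, w in zip(samples, window)]

print(welch_window(5))
```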
Weber Kai has created a modified version of the program for processing MSX tapes which is available on GitHub. He has used this to extract the MSX code from an old Brazilian radio programme.
This perl script will read a wav file and create a gif file with a plot of the frequency spectrum.
This can be useful for analysing an unknown computer tape format.
A collection of my Compukit UK101 BASIC and assembler programs, recovered from tapes which have lain in my attic for the last 30 years.
Tim Baldwin has written an excellent Compukit UK101 simulator/emulator which is implemented in Java and runs on Windows, Linux and Mac systems.
Some of my programs use the "enhanced" 48x32 character screen: these will need modified ROMs which can be downloaded here along with a suitable properties file for the UK101 simulator.
My Real Time Star Trek game now has its own page (copied from Tim's sourceforge site).
A perl module and sample scripts for filtering email using several popular mail filters. The module presents a uniform interface for passing a message through each filter and determining which filters consider the message to be spam.
The spamcheck script passes a copy of the given message to each filter and counts how many filters consider it to be spam. It adds an X-SPAM-Votes: header with the total.
I currently delete everything with four or more votes and quarantine everything with one to three votes using these procmail rules:
    :0fw: spamcheck.lock
    | spamcheck

    # Record the votes in the procmail log file:
    :0
    * ^X-Spam-Votes: \/.*$
    { LOG="Spam-Votes: ${MATCH}" }

    # Junk anything that 4 or more scanners give a positive result on.
    :0
    * ^X-Spam-Votes: [456789]
    /dev/null

    # Filter anything which any scanner considers to be spam:
    :0
    * ^X-Spam-Votes: [123]
    SPAM.ASSASSIN
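The voting scheme and the thresholds used in the rules above can be sketched as follows. This is an illustration in Python rather than the perl module's interface: the function names are hypothetical, and each filter is modelled as a plain callable that returns true for spam.

```python
def count_votes(message, filters):
    """Count how many filter callables judge the message to be spam."""
    return sum(1 for is_spam in filters if is_spam(message))

def add_votes_header(message, votes):
    """Append an X-Spam-Votes header to the end of the header block."""
    headers, separator, body = message.partition("\n\n")
    return f"{headers}\nX-Spam-Votes: {votes}{separator}{body}"

def disposition(votes):
    """Mirror the thresholds in the procmail rules: 4+ junked, 1-3 quarantined."""
    if votes >= 4:
        return "delete"
    if votes >= 1:
        return "quarantine"
    return "deliver"
```

A message flagged by two out of three filters would gain the header `X-Spam-Votes: 2` and be quarantined rather than deleted.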
The isspam and notspam scripts can be used to train your filters. Any spam message which is missed by any filter can be passed to isspam while false positives should be passed to notspam.
The spam filters it currently knows about are:
Email martin@gkc.org.uk with codes for any additional filters you know about!
A perl script for copying a directory tree to another location (eg a separate hard drive for backups). It looks at the size and modification times of the files to decide whether to copy them or not. As a result, after the first "clone", keeping the copy up to date is a very quick operation.
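The size and modification time test can be sketched like this. It is a minimal Python illustration under stated assumptions, not the perl script: the function names are hypothetical, modification times are compared to the nearest second, and files deleted from the source are not removed from the copy.

```python
import os
import shutil

def needs_copy(source, target):
    """Copy when the target is missing or differs in size or modification time."""
    if not os.path.exists(target):
        return True
    src, dst = os.stat(source), os.stat(target)
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)

def clone_tree(source_root, target_root):
    """Copy changed files from source_root to target_root; return what was copied."""
    copied = []
    for directory, _subdirs, files in os.walk(source_root):
        relative = os.path.relpath(directory, source_root)
        target_dir = os.path.join(target_root, relative)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(directory, name)
            target = os.path.join(target_dir, name)
            if needs_copy(source, target):
                # copy2 preserves the modification time, so the next run skips this file
                shutil.copy2(source, target)
                copied.append(os.path.normpath(os.path.join(relative, name)))
    return copied
```

Running `clone_tree` a second time on an unchanged tree copies nothing, which is why keeping the clone up to date is quick.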
Note: If you keep a backup of your Windows partition on a Linux partition, or if you use NTBackup or XCOPY to back up your Windows partition, then you should also see sfn-fix (see below).
The Windows FAT16 and FAT32 file systems don't really have long file names: the long names are hacked on top of the "real" file name which has to keep to the old 8.3 format. When files get backed up and restored by many backup programs (including my clonedir above and Microsoft's NTBackup and XCOPY) they can end up with different short file names. This wouldn't be so bad if everybody always referred to files by their long names, but the Windows Registry is stuffed full of references to files by their short names.
ITS Systems has an article on the subject (plugging their own backup software) as does PC World.
Microsoft's workaround is to "Adhere to a pure 8.3 short file naming convention...". They don't say what to do about directories such as "My Documents" or even "Program files" which don't adhere to the convention!
My solution, sfn-fix, is a perl script which uses a saved copy of the output of "mdir -/ C:" (a Linux utility for listing a directory on a Windows filesystem) to give the restored files their original short file names. An mdir listing includes both long and short names, and the -/ option does a recursive directory listing.
Note that Windows ME and Linux use different file name mangling conventions for creating short file names after the first nine files in a directory with the same first six characters and extension. Only short file names of the form xxxxxx~n.xxx (where n is a digit) can be restored. But this should be enough to keep the Registry happy.
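The restorable form xxxxxx~n.xxx can be expressed as a pattern. This is a hedged Python sketch, not taken from sfn-fix: the name `is_restorable_short_name` is hypothetical, and it assumes that the extension is optional (as for directories) and that the stem before the tilde may be shorter than six characters.

```python
import re

# Up to six stem characters, a tilde, exactly one digit, optional 1-3 character extension.
SHORT_NAME = re.compile(r"^[^.~\s]{1,6}~\d(\.[^.\s]{1,3})?$")

def is_restorable_short_name(name):
    """True if name has the xxxxxx~n.xxx form with a single digit n."""
    return SHORT_NAME.match(name) is not None
```

Under these assumptions PROGRA~1 and LONGFI~2.TXT qualify, while a name with a two-digit suffix such as ABCDE~10.TXT does not.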
Text::Reflow v1.04 is a perl module which takes some ASCII text in a file, string or array (with paragraphs separated by blank lines) and reflows the paragraphs. If two or more lines in a row are "indented" then they are assumed to be a quoted poem and are passed through unchanged. It uses Knuth's paragraphing algorithm (the same algorithm used by TeX) to choose optimal line breaks based on keeping the lines the same length while avoiding breaks within a proper noun or after certain connectives ("a", "the", etc.) and encouraging breaks at punctuation.
The result is a file with a fairly "ragged" right margin but which is easier to read than a file with a strict right margin since it is less likely that phrases are broken across the line.
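The flavour of optimal line breaking can be shown with a much reduced version of the idea. The Python sketch below is not Text::Reflow's algorithm: it only minimises the squared trailing space on every line but the last, using dynamic programming, and has none of the penalties for proper nouns, connectives or punctuation. The function name is hypothetical.

```python
def reflow(words, width):
    """Break words into lines minimising the total squared trailing space."""
    count = len(words)
    # best[i] = (cost, next_break) for the cheapest layout of words[i:]
    best = [None] * (count + 1)
    best[count] = (0, count)
    for start in range(count - 1, -1, -1):
        length = -1
        for end in range(start + 1, count + 1):
            length += 1 + len(words[end - 1])
            if length > width and end > start + 1:
                break  # line too long; a single overlong word is still allowed
            slack = max(width - length, 0)
            # The last line is not penalised for being short.
            cost = (0 if end == count else slack ** 2) + best[end][0]
            if best[start] is None or cost < best[start][0]:
                best[start] = (cost, end)
    lines, start = [], 0
    while start < count:
        end = best[start][1]
        lines.append(" ".join(words[start:end]))
        start = end
    return lines
```

With a width of 6, the words aaa, bb, cc, ddddd are laid out as "aaa" / "bb cc" / "ddddd", whereas a greedy fill would produce "aaa bb" / "cc" / "ddddd" and leave a very short middle line.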
The package includes a simple perl script for reflowing files.
The -skipindented option causes all indented lines to be passed through unchanged.
The -veryslow option reflows each paragraph 16 times with different optimal line widths and picks the "best" result--with this option the paragraphs are easier to read, but the line width may vary from one paragraph to the next.
The -skipto pattern option skips to the first line which starts with the given pattern: this is to avoid reflowing header material such as the Project Gutenberg header.
Most of the text files on my G. K. Chesterton site are reflowed with this script.
Cdiff will compare two text files, ignoring differences in layout, and produce an output file which shows the differences. It can also be used to merge the two files into a single version, choosing between variant readings in the files based on a dictionary of words.
See this separate page for full documentation.
Check-punct is another perl script which checks an ASCII file for bad spacing around punctuation and other errors such as mismatched quotes and parentheses. It is particularly useful for checking scanned documents.
Another perl script which uses a dictionary to find words which have been broken by an end-of-line hyphen and deletes the hyphen and line break. This also fixes most "Larson" encodes, _*emphasis_ and simple HTML codes (<i>, <b> and accented characters such as é).
The -head option will try to delete page headers from scanned documents.
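The dictionary test for end-of-line hyphens can be sketched as below. This is a Python illustration with a hypothetical function name, not the perl script: it assumes the dictionary is a set of lower-case words and that the rejoined word stays on the first of the two lines.

```python
import re

def join_hyphenated(text, dictionary):
    """Remove an end-of-line hyphen when the joined word is in the dictionary."""
    def repair(match):
        head, tail, punctuation = match.groups()
        if (head + tail).lower() in dictionary:
            # Keep the word and any trailing punctuation on the first line.
            return head + tail + punctuation + "\n"
        return match.group(0)  # e.g. a genuine hyphenated compound: leave it alone
    return re.sub(r"(\w+)-\n(\w+)([^\w\s]*)[ \t]*", repair, text)
```

A broken word such as "implemen-" / "tation" is rejoined only if "implementation" is in the dictionary, so a true compound like "well-known" split at its hyphen is left untouched.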
Convert paragraph breaks from indentation to a blank line. The default is to treat a line starting with a tab character as a new paragraph. The -n option looks for n spaces at the start of a line.
The -skipto pattern option skips to the first line which starts with the given pattern: this is to avoid reflowing header material such as the Project Gutenberg header.
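The conversion from indented paragraphs to blank-line paragraphs can be sketched as follows. This Python version is illustrative only: the function name is hypothetical, and it assumes the leading tab or spaces are removed once the blank line has been inserted, which the text does not state.

```python
def indent_to_blank_lines(lines, spaces=None):
    """Start a new paragraph at each indented line.

    spaces=None treats a leading tab as the paragraph marker; otherwise a line
    starting with that many spaces starts a new paragraph.
    """
    marker = "\t" if spaces is None else " " * spaces
    output = []
    for line in lines:
        if line.startswith(marker):
            if output and output[-1] != "":
                output.append("")  # blank line separates the paragraphs
            output.append(line[len(marker):])
        else:
            output.append(line)
    return output
```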
A perl script to download selected news from selected groups based on regular expressions. Based on "newscan" by John F. McGowen but completely rewritten to use the Net::NNTP module from CPAN and to be vastly more efficient over a slow link such as a modem.
The .newsgreprc file in your home directory lists which groups you are interested in and which articles to select.
A sample .newsgreprc file:
    # Sample .newsgreprc file (this line is a comment)

    # List your nntp server
    NNTP nntphost.at.your.isp

    # Mailbox where you want the news to be stored:
    MBOX ~/NEWS

    # SELECT command selects a news group
    # WHERE/REQUIRE/UNLESS commands select articles from the newsgroup
    # based on the given perl regular expressions
    # Any UNLESS match means that we don't want this article
    # All REQUIRE patterns must match.
    # If there are no WHERE patterns, then we want everything that is left.
    # Otherwise, at least one WHERE pattern must match.

    SELECT comp.lang.perl.announce
    REQUIRE /^Approved:/

    SELECT comp.compilers
    WHERE /^Approved:/
    UNLESS /^Subject:.* Frequently Asked Questions/

    SELECT rec.games.bridge
    # I am interested in articles that mention the word gib or GIB or Gib...
    WHERE /\bgib\b/i

    # The script will record which articles have already been seen
    # at the end of this file:
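The selection rules described in the comments of the sample file can be written as one small function. This is a sketch in Python rather than the script's perl: the name `wanted` is hypothetical, and it assumes patterns are matched against the whole article text with `^` anchoring at the start of any line.

```python
import re

def wanted(article, unless=(), require=(), where=()):
    """Apply the UNLESS / REQUIRE / WHERE rules to one article."""
    if any(pattern.search(article) for pattern in unless):
        return False  # any UNLESS match rejects the article
    if not all(pattern.search(article) for pattern in require):
        return False  # every REQUIRE pattern must match
    if not where:
        return True   # no WHERE patterns: keep everything that is left
    return any(pattern.search(article) for pattern in where)

# The comp.compilers rules from the sample file, as compiled patterns.
approved = re.compile(r"^Approved:", re.MULTILINE)
faq = re.compile(r"^Subject:.* Frequently Asked Questions", re.MULTILINE)
```

For the comp.compilers group this would be called as `wanted(article, unless=[faq], where=[approved])`.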