True, he did not say that he was going to sort on each field, but
let's say that you have two people of the same name - the logical
thing to do then is to follow up with a sort on the next field,
which might be address. Or it might be phone number. Who knows?
Who cares?
As to multiple key sorts, there are examples right here in the
forums, including examples involving ARRAY SORT and algorithms
such as QuickSort and Binary Sort. But for someone just getting
into the topic, who has rather extensive requirements, I would
suggest he or she not attempt to master the art of creating a sort
engine just yet. And for a DOS-based application, the amount of
available memory can quickly become an issue. External sorts can
do very well and circumvent all that for you.
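The "follow up with a sort on the next field" idea can be sketched quickly. This is a Python illustration, not the PB/DOS code from the thread, and the names and addresses are invented; the point is that a compound key compares field by field, so ties on the first field fall through to the second:

```python
# Sort records by name, breaking ties on a second field (address here,
# chosen only for illustration). A tuple key compares field by field.
people = [
    ("Smith, John", "9 Oak St"),
    ("Adams, Ann",  "2 Elm St"),
    ("Smith, John", "1 Main St"),
]

by_name_then_address = sorted(people, key=lambda rec: (rec[0], rec[1]))
# The two "Smith, John" entries end up ordered by address.
```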
------------------
Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired
-
A lot depends upon how you intend to sort. You indicate 8 fields, but if you intend to sort on two or more fields, you cannot do that with a simple [ARRAY] SORT.
If you go to the Information Management Systems web site at http://www.infoms.com and navigate to the BASICally Speaking archives for Volume 6 Issue 8 May 1997 you'll find the public domain PB/DOS code which accompanied my article "Multi-Key Sorting Revisited" showing you how to do it.
(The article text is not available at that site).
Maybe I'll release that text sometime over the next couple of weeks and post it on my site.
-
A lot depends upon how you intend to sort. You indicate 8 fields,
but if you intend to sort on two or more fields, you may have an
issue of retaining prior sort order - you don't want the next
field sort to wipe out the results of the preceding ones. You
cannot do that with a simple ARRAY SORT, because you can only
have one TAGARRAY, and that would have to serve as an index to
retain sort order for each subsequent sort.
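The tag-array idea - rearranging an index rather than the records, and preserving an earlier ordering across a later sort - can be sketched in Python (the data values are invented for illustration, and Python's stable sort stands in for the PB/DOS mechanics). Applying stable sorts from the least significant key to the most significant leaves the earlier orderings intact among ties:

```python
# Records: (last_name, city). Goal: order by last_name, ties by city.
records = [
    ("Jones", "Boston"),
    ("Smith", "Albany"),
    ("Jones", "Akron"),
]

# 'tags' plays the role of the single tag array: it indexes the
# records and is the only thing rearranged; the records never move.
tags = list(range(len(records)))

# Stable sorts, least significant key first: city, then last_name.
tags.sort(key=lambda i: records[i][1])   # secondary key
tags.sort(key=lambda i: records[i][0])   # primary key; stability
                                         # keeps the city order among
                                         # equal last names

ordered = [records[i] for i in tags]
```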
Actually, if you are striving for an unlimited number of fields,
say each one could represent a separate contact or branch of a
family tree from a preceding one, then you do not really have a
sortable structure, because not all members are constructed alike.
The suggestion to use PowerTree to arrange all this into a
navigable database would make sense, because the search criteria
can work very fast in a B-tree structure.
Sorting is only useful when you want a series of alike items to
be organized into a sequenced structure. By "alike", I don't
mean all nails or all hammers, but I might mean all bin-able or
shelved items. Or I might mean all properties listed for sale,
rent, or lease. I would not include items that are not consistent
with the category that I have chosen to quantify, organize, and
measure, because doing so lessens the value of the result.
On the Internet, you can find a sort program called RPSort 102,
meaning version 1.02. It sorts files on disk, and works very
well, with many options. It is also very fast. Bob Zale also
provided a sample sort program in one of the examples. I found
it easy enough to modify it to also sort external disk files,
and was very pleased with the results.
The technique used by the sort process is not always known.
Each employs at least one approach; some use several at different
stages in the process. Some may involve multiple recursive
calls to the same process, which quickly eats up stack space,
and others may use arrays to retain partial values that simulate
a much larger stack structure, but also demand memory space.
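The array-that-simulates-a-stack alternative to recursive calls can be sketched like this - a minimal Python quicksort for illustration only, not the technique any particular sort product is known to use:

```python
def quicksort_iterative(a):
    """Quicksort using an explicit list of (lo, hi) ranges instead of
    recursive calls, trading stack frames for array space."""
    stack = [(0, len(a) - 1)]          # simulated call stack
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]                  # Lomuto partition, last element
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        stack.append((lo, i - 1))      # sub-ranges replace the
        stack.append((i + 1, hi))      # recursive calls
    return a
```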
A number of tools and studies exist to determine the
number of steps needed to sort data. As more and more data is
involved, the time to sort and the number of operations required
increase - but is it a linear increase, or an exponential one?
One of the issues here is how many times do we have to compare
the same items? Another issue is how many times do we have to
move the same information? The trick is to try and project
where the data should likely go, and how to exclude some of the
data from being retested unnecessarily.
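The comparison-count question can be made concrete with a small experiment - a rough Python sketch, instrumenting a simple insertion sort (my choice for illustration, not anything named in the thread). On reversed input, doubling the data roughly quadruples the comparisons, i.e. the growth is quadratic, not linear:

```python
def insertion_sort_comparisons(a):
    """Sort a copy of a; return how many comparisons were made."""
    a = list(a)
    count = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            count += 1                  # one comparison of neighbors
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return count

# Worst case (reversed input): n(n-1)/2 comparisons.
small = insertion_sort_comparisons(range(100, 0, -1))
large = insertion_sort_comparisons(range(200, 0, -1))
```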
So there are no easy answers to your question. Just go with
what works, and if you want to handle an unlimited range of
data, you are going to have to consider using an external sort
involving disk files so as not to overtask your memory and
affect other programs and processes that may need to run as well.
Remember, DOS has memory limits that prevent you from making
full use of available memory. It may allocate extra memory to you
as either extended or expanded memory, or let you define a RAM
drive that you could copy real disk files to and from, but with
the risk that you could lose your work if the system shuts down
on you.
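An external sort of the kind described - sorting chunks that fit in memory, then merging the runs from disk - can be sketched in Python. The function name, chunk size, and file handling here are my own illustrative choices, not how RPSort or any other product actually works:

```python
import heapq
import os
import tempfile

def external_sort(in_path, out_path, chunk_lines=1000):
    """Sort a text file line by line without holding it all in
    memory: sort fixed-size chunks into temporary run files on disk,
    then do a streaming k-way merge of the runs."""
    runs = []
    with open(in_path) as f:
        while True:
            chunk = [line for _, line in zip(range(chunk_lines), f)]
            if not chunk:
                break
            chunk.sort()                       # in-memory sort of one chunk
            run = tempfile.NamedTemporaryFile(
                "w", delete=False, suffix=".run")
            run.writelines(chunk)
            run.close()
            runs.append(run.name)
    with open(out_path, "w") as out:
        files = [open(r) for r in runs]
        out.writelines(heapq.merge(*files))    # streaming k-way merge
        for f in files:
            f.close()
    for r in runs:
        os.remove(r)
```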
------------------
Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired
-
I'd go with the disk sort myself (and I have made available a disk "quicksort" for pb/dos) but..
You can calculate the space necessary to store the array data no problem, but what you can't calculate is any workspace required by the compiler to actually run an ARRAY SORT statement. And it may be you can't actually calculate it 'in advance' because the amount of work space varies with the initial 'sortedness' of the data.
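That workspace-varies-with-sortedness point can be demonstrated by tracking how deep a quicksort's pending-work stack grows on different inputs. This is only a Python sketch of the general effect - ARRAY SORT's internals aren't documented in this thread - and the always-recurse-left order is chosen deliberately to expose the worst case (real implementations push the smaller side first to cap the stack):

```python
def quicksort_max_stack(a):
    """Quicksort with an explicit stack; return the peak number of
    pending ranges, a proxy for workspace used during the sort."""
    a = list(a)
    stack = [(0, len(a) - 1)]
    peak = 1
    while stack:
        peak = max(peak, len(stack))
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]                  # last-element pivot: degrades
        i = lo                         # badly on pre-sorted input
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        stack.append((i + 1, hi))      # right side parked on the stack
        stack.append((lo, i - 1))      # left side processed first
    return peak

import random
rng = random.Random(1)                 # fixed seed for repeatability
shuffled = list(range(200))
rng.shuffle(shuffled)

sorted_peak = quicksort_max_stack(list(range(200)))   # ~n deep
shuffled_peak = quicksort_max_stack(shuffled)         # far shallower
```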
Or, if you are a PowerTree(tm) licensee, this would be a real good place to use it, as you would not even need to think much about available memory and I don't think disk space should be a problem. (Even Mr. Stone Age here has about 30 GB available.)
-
Can't you sort it on disk? Using modern machinery it usually gets
into the cache, so it's more rapid than you might expect.
------------------
-
Max array size determination vs ability?
Is there a way (short of testing on every configuration possible) to determine a maximum "length" array based on ability to carry out the sort?
I have data files of varying lengths, but each line contains eight
"pieces" of data on ancestors. Some people will have a file that
is 50K (50,000 people, meaning the array will be at least of size
People1(50000,8)); others will have just 300 people, and there is
always that one person who will try for one million people.
Can I determine ahead of time the maximum number of people a user's
computer can work with, so rather than let the program fail, I can
count the people and see if it is manageable?
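(For what such a pre-flight count might look like: the array's own storage is easy to bound, even if the sort's workspace is not. The bytes-per-element figure and the memory ceiling below are illustrative assumptions, not real DOS figures - they depend on the data type and configuration:

```python
def array_bytes(people, fields=8, bytes_per_element=4):
    """Storage for a people x fields array of fixed-size elements."""
    return people * fields * bytes_per_element

# Hypothetical ceiling: conventional DOS memory is well under 640 KB
# once the program itself is loaded; 400 KB is an assumed figure.
LIMIT = 400 * 1024

def fits_in_memory(people):
    return array_bytes(people) <= LIMIT
```

With those assumed numbers, 300 people easily fit while 50,000 would not - which is the kind of up-front check the question asks about.)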
Thank you.
Robert Carneal
------------------