No announcement yet.

Max array size determination vs ability?

  • Filter
  • Time
  • Show
Clear All
new posts

  • Max array size determination vs ability?

    Is there a way (short of testing on every configuration possible) to determine a maximum "length" array based on ability to carry out the sort?

    I have data file of varying lengths, but each line contains eight
    "pieces" of data on ancestors. Some people will have a file that
    is 50K (50K people, meaning the array will be at least of size
    People1(50000,8), other will have just 300 people, and there is
    always that one person who will try for one million people.

    Can I determine ahead of time the maximum number of people a user's
    computer can work with, so rather than let the program fail, I can
    count the people and see if it is managable?

    Thank you.

    Robert Carneal


  • #2
    Can't you sort it on disk? Using modern machinery it usually gets
    into the cache, so it's more rapid than you might expect.



    • #3
      I'd go with the disk sort myself (and I have made available a disk "quicksort" for pb/dos) but..

      You can calculate the space necessary to store the array data no problem, but what you can't calculate is any workspace required by the compiler to actually run an ARRAY SORT statement. And it may be you can't actually calculate it 'in advance' because the amount of work space varies with the initial 'sortedness' of the data.

      Or, if you are a PowerTree(tm) licensee, this would be a real good place to use it, as you would not even need to think much about available memory and I don't think disk space should be a problem. (Even Mr. Stone Age here as about 30 Gb available).

      Michael Mattias
      Tal Systems (retired)
      Port Washington WI USA
      [email protected]


      • #4
        A lot depends upon how you intend to sort. You indicate 8 fields,
        but if you intend to sort on two or more fields, you may have an
        issue of retainint prior sort order - you don't want the next
        field sort to wipe out the results of the preceeding ones. You
        cannot do that with a simple ARRAYT SORT, because you can only
        have one TAGARRAY, and that would have to serve as an index to
        retain sort order for each subsequent sort.

        Actually, if you are striving for an unlimited number of fields,
        say each one could represent a separate contact or branch of a
        family tree from a preceeding one, then you do not really have a
        sortable structure, because not all members are constructed alike.
        The suggestion to use PowerTree to arrange all this into a
        navigatable database would make sence, because the search criteria
        can work very fast in a b-tree (binary tree) structure.

        Sorting is only useful when you want a series of alike items to
        be organized into a sequenced structure. By "alike", I don't
        mean all nails or all hammers, but I might mean all bin-able or
        shelved items. Or I might mean all properties listed for sale,
        rent, or lease. I would not include items that are not consistent
        with the category that I have chosen to quantify, organize, and
        measure, because to do so, lessens the value of the result.

        On the Internet, you can find a sort program called RPSort 102,
        meaning version 1.02. It sorts files on disk, and works very
        well, with many options. It is also very fast. Bob Zale also
        provided a sample sort program in one of the examples. I found
        it easy enough to modify it to also sort external disk files,
        and was very pleased with the results.

        The technique used by the sort process is not always known.
        Each employs at least one approach, some use several at different
        stages in the process. Some may involve multiple recursive
        calls to the same process, which quickly eats up stack space,
        and others may use arrays to retain partial values that simulate
        a much larger stack structure, but also demand memory space.

        There are a number of tools and efforts made to determine the
        number of steps needed to sort data. As more and more data is
        involved, the time to sort and the number of processes required
        increases - but is it a linear increase, or an expodential one?
        One of the issues here is how many times do we have to compare
        the same items? Another issue is how many times do we have to
        move the same information? The trick is to try and project
        where the data should likely go, and how to exclude some of the
        data from being retested unnecessarily.

        So there are no easy answers to your question. Just go with
        what works, and if you want to handle an unlimited range of
        data, you are going to have to consider using an external sort
        involving disk files so as not to overtask your memory and
        effect other programs and process that may need to run as well.

        Remember, DOS has memory limits that prevent you from making
        full use of available memory, allocating it to you as either
        extended or expanded memory, or letting you define a RAM drive
        that would let you copy real disk files to and from, but with
        the risk that you could lose your work if the system shuts down
        on you.

        Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired


        • #5
          A lot depends upon how you intend to sort. You indicate 8 fields ..but if you intend to sort on two or more fields.... You cannot do that with a simple [ARRAY] SORT..
          He didn't say he has eight sort fields, but even if he did, yes you can do multikey sorts with ARRAY SORT.

          If you go to the Information Management Systems web site at and navigate to the BASICcally Speaking archives for Volume 6 Issue 8 May 1997 you'll find the public domain PB/DOS code which accompainied my article "Multi-Key Sorting Revisited" showing you how to do it.

          (The article text is not available at that site).

          Maybe I'll release that text sometime over the next couple of weeks and post it on my site.

          Michael Mattias
          Tal Systems (retired)
          Port Washington WI USA
          [email protected]


          • #6
            True, he did not say that he was going to sort on each field, but
            let's say that you have two people of the same name - the logical
            thing to do then is to follow up with a sort on the next field,
            which might be address. Or it might be phone number. Who knows?
            Who cares?

            As to multiple key sorts, there are examples right here in the
            forums, including examples involving ARRAY SORT and algorythms
            such as QuickSort and Binary Sort. But for someone just getting
            into the topic, who has rather extensive requirements, I would
            suggest he or she not attempt to master the art of creating a sort
            engine just yet. And for a DOS-based application, the amount of
            available memory can quickly become an issue. External sorts can
            do very well and circumvent all that for you.

            Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired