Announcement

Collapse
No announcement yet.

A Fast Method for Data/Information Retrieval?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • A Fast Method for Data/Information Retrieval?

    Here is a problem I am having. I need to store general information (of a larger data set) which I need to retrieve (for a smaller data set).

    For example, suppose you are working with a subset of employees for a company, you need to access information regarding any employee you are working with at the time from some sort of table/list (I am using an Array) of information from all the employees. As long as you know the Array Index, information retrieval is fast, i.e.: Salary = Data(232).Salary, assuming I know that the employee, whose information I am working with is stored in slot # 232. My question is what is the best method for determining that the index you want is #232?

    The most straightforward method is simply to loop through all the records until you find the one you are looking for. It is simple and direct, but very time consuming. A faster approach is to use a BiSection search method to locate the desired record, but the caveat there is that the original list MUST be sorted in sequential Ascending or Descending order for what ever value you are searching for. That is not the case in most cases.

    A friend of mine who is a C/C++ programmer suggested that I use a Hash table. He said that this is the most efficient and fastest war to retrieve information. The problem is I have no idea what a Hash table is, how to construct/use one, or if PowerBasic even supports this.

    I am hoping that someone here has already experienced this problem and come up with an effective (and fast) solution. I am looking forward to hearing your thoughts/ideas/suggestions on this.

    Best regards,

    Andrew Peskin
    [email protected]

    ------------------

  • #2
    Andrew,

    have a look at PB's (lightning fast) ARRAY statements, ARRAY SCAN and ARRAY SORT should give you good/fast tools to build your retrieval functions.

    Knuth

    ------------------
    http://www.softAware.de

    Comment


    • #3
      hi andrew

      alternatively, i posted dictionary (hash table) code some time
      ago. you'll find a link to the code at:
      http://www.powerbasic.com/support/pb...ad.php?t=27788

      cheers

      florent

      ------------------

      Comment


      • #4
        Andrew...

        As Knuth stated above, I would use PB's ARRAY SORT command to put your array elements into sorted order and
        then do a binary search, bisecting the list with each < and > compare until a match is found. The worst case
        scenario in a list of 1,000,000 array elements is about 20 compares (averaging around 10). Not too shabby.

        However, if that just won't work for you, and you want to learn about hashing techniques, try...

        http://www.cs.usask.ca/research/research_groups/aries/projects/applets/tut orials/hashing/index.html?

        Good luck!

        Timm

        [This message has been edited by Timm Motl (edited October 07, 2000).]
        mailto:[email protected]om
        Tsunami Record Manager

        Comment


        • #5
          andrew,

          why don't you use b-tree indexing tools like powertree or dvbtree.

          dvbtree is provided with full source code and data compression and
          you can try it before buy.
          see below:
          http://www.powerbasic.com/support/pb...ad.php?t=25413

          ------------------
          patrice terrier
          mailto[email protected][email protected]</a>
          Patrice Terrier
          www.zapsolution.com
          www.objreader.com
          Addons: GDImage.DLL 32/64-bit (Graphic library), WinLIFT.DLL 32/64-bit (Skin Engine).

          Comment


          • #6
            Andrew,

            To help you understand what a hash table is, in plain english
            It is a method of putting your data into more manageable size
            chunks.

            The hashing algorithm will decide which 'chunk' to put the
            record into and each chunk is basically a linked list. The hashing
            algorithm can be very complex or very simple and as Florent has
            posted some code I strongly suggest you go and look at it.

            The algorithm is normally based around a modulo operation and the
            modulo will be the deciding factor on how many linked lists you
            will have. Obviously, the more lists you have the smaller each list
            will be. Choosing a prime number for the modulo helps as well

            It is as simple in theory as that. If you use a hashing algorithm
            to put your data into the table, and use the same algorithm to
            find it you can speed up your search by a factor the same as the
            modulo. i.e Modulo 7 = 7 times faster and more memory needed,
            modulo 1001 = 1001 times faster and even more memory needed. That
            is the balancing act required. Obviously if you have one and a half
            million items in the list, then you need a hefty modulo but you
            don't if for instance you only have one and a half thousand. In
            the mainframe world (where I currently work) the system is taken
            down a couple of times a year for what is called 'file resizing'
            which is based on recalculating the modulo required as a file grows.

            As I said, the theory is simple, and this is a fairly basic overview.

            I hope it helps you to decide which method to use.

            By the way, for 'Linked List' you can read 'Array' the theory is
            the same.

            ------------------

            Comment

            Working...
            X