No announcement yet.

A Fast Method for Data/Information Retrieval?

  • Filter
  • Time
  • Show
Clear All
new posts

  • Trevor Lane

    To help you understand what a hash table is, in plain english
    It is a method of putting your data into more manageable size

    The hashing algorithm will decide which 'chunk' to put the
    record into and each chunk is basically a linked list. The hashing
    algorithm can be very complex or very simple and as Florent has
    posted some code I strongly suggest you go and look at it.

    The algorithm is normally based around a modulo operation and the
    modulo will be the deciding factor on how many linked lists you
    will have. Obviously, the more lists you have the smaller each list
    will be. Choosing a prime number for the modulo helps as well

    It is as simple in theory as that. If you use a hashing algorithm
    to put your data into the table, and use the same algorithm to
    find it you can speed up your search by a factor the same as the
    modulo. i.e Modulo 7 = 7 times faster and more memory needed,
    modulo 1001 = 1001 times faster and even more memory needed. That
    is the balancing act required. Obviously if you have one and a half
    million items in the list, then you need a hefty modulo but you
    don't if for instance you only have one and a half thousand. In
    the mainframe world (where I currently work) the system is taken
    down a couple of times a year for what is called 'file resizing'
    which is based on recalculating the modulo required as a file grows.

    As I said, the theory is simple, and this is a fairly basic overview.

    I hope it helps you to decide which method to use.

    By the way, for 'Linked List' you can read 'Array' the theory is
    the same.


    Leave a comment:

  • Patrice Terrier

    why don't you use b-tree indexing tools like powertree or dvbtree.

    dvbtree is provided with full source code and data compression and
    you can try it before buy.
    see below:

    patrice terrier
    mailto[email protected][email protected]</a>

    Leave a comment:

  • Timm Motl

    As Knuth stated above, I would use PB's ARRAY SORT command to put your array elements into sorted order and
    then do a binary search, bisecting the list with each < and > compare until a match is found. The worst case
    scenario in a list of 1,000,000 array elements is about 20 compares (averaging around 10). Not too shabby.

    However, if that just won't work for you, and you want to learn about hashing techniques, try... orials/hashing/index.html?

    Good luck!


    [This message has been edited by Timm Motl (edited October 07, 2000).]

    Leave a comment:

  • Florent Heyworth
    hi andrew

    alternatively, i posted dictionary (hash table) code some time
    ago. you'll find a link to the code at:




    Leave a comment:

  • Knuth Konrad

    have a look at PB's (lightning fast) ARRAY statements, ARRAY SCAN and ARRAY SORT should give you good/fast tools to build your retrieval functions.



    Leave a comment:

  • A Fast Method for Data/Information Retrieval?

    Here is a problem I am having. I need to store general information (of a larger data set) which I need to retrieve (for a smaller data set).

    For example, suppose you are working with a subset of employees for a company, you need to access information regarding any employee you are working with at the time from some sort of table/list (I am using an Array) of information from all the employees. As long as you know the Array Index, information retrieval is fast, i.e.: Salary = Data(232).Salary, assuming I know that the employee, whose information I am working with is stored in slot # 232. My question is what is the best method for determining that the index you want is #232?

    The most straightforward method is simply to loop through all the records until you find the one you are looking for. It is simple and direct, but very time consuming. A faster approach is to use a BiSection search method to locate the desired record, but the caveat there is that the original list MUST be sorted in sequential Ascending or Descending order for what ever value you are searching for. That is not the case in most cases.

    A friend of mine who is a C/C++ programmer suggested that I use a Hash table. He said that this is the most efficient and fastest war to retrieve information. The problem is I have no idea what a Hash table is, how to construct/use one, or if PowerBasic even supports this.

    I am hoping that someone here has already experienced this problem and come up with an effective (and fast) solution. I am looking forward to hearing your thoughts/ideas/suggestions on this.

    Best regards,

    Andrew Peskin
    [email protected]