Andrew,
To help you understand what a hash table is, in plain english
It is a method of putting your data into more manageable size
chunks.
The hashing algorithm will decide which 'chunk' to put the
record into and each chunk is basically a linked list. The hashing
algorithm can be very complex or very simple and as Florent has
posted some code I strongly suggest you go and look at it.
The algorithm is normally based around a modulo operation and the
modulo will be the deciding factor on how many linked lists you
will have. Obviously, the more lists you have the smaller each list
will be. Choosing a prime number for the modulo helps as well
It is as simple in theory as that. If you use a hashing algorithm
to put your data into the table, and use the same algorithm to
find it you can speed up your search by a factor the same as the
modulo. i.e Modulo 7 = 7 times faster and more memory needed,
modulo 1001 = 1001 times faster and even more memory needed. That
is the balancing act required. Obviously if you have one and a half
million items in the list, then you need a hefty modulo but you
don't if for instance you only have one and a half thousand. In
the mainframe world (where I currently work) the system is taken
down a couple of times a year for what is called 'file resizing'
which is based on recalculating the modulo required as a file grows.
As I said, the theory is simple, and this is a fairly basic overview.
I hope it helps you to decide which method to use.
By the way, for 'Linked List' you can read 'Array' the theory is
the same.
------------------
Announcement
Collapse
No announcement yet.
A Fast Method for Data/Information Retrieval?
Collapse
X
-
andrew,
why don't you use b-tree indexing tools like powertree or dvbtree.
dvbtree is provided with full source code and data compression and
you can try it before buy.
see below:
http://www.powerbasic.com/support/pb...ad.php?t=25413
------------------
patrice terrier
mailto[email protected][email protected]</a>
Leave a comment:
-
Andrew...
As Knuth stated above, I would use PB's ARRAY SORT command to put your array elements into sorted order and
then do a binary search, bisecting the list with each < and > compare until a match is found. The worst case
scenario in a list of 1,000,000 array elements is about 20 compares (averaging around 10). Not too shabby.
However, if that just won't work for you, and you want to learn about hashing techniques, try...
http://www.cs.usask.ca/research/research_groups/aries/projects/applets/tut orials/hashing/index.html?
Good luck!
Timm
[This message has been edited by Timm Motl (edited October 07, 2000).]
Leave a comment:
-
hi andrew
alternatively, i posted dictionary (hash table) code some time
ago. you'll find a link to the code at:
http://www.powerbasic.com/support/pb...ad.php?t=27788
cheers
florent
------------------
Leave a comment:
-
Andrew,
have a look at PB's (lightning fast) ARRAY statements, ARRAY SCAN and ARRAY SORT should give you good/fast tools to build your retrieval functions.
Knuth
------------------
http://www.softAware.de
Leave a comment:
-
A Fast Method for Data/Information Retrieval?
Here is a problem I am having. I need to store general information (of a larger data set) which I need to retrieve (for a smaller data set).
For example, suppose you are working with a subset of employees for a company, you need to access information regarding any employee you are working with at the time from some sort of table/list (I am using an Array) of information from all the employees. As long as you know the Array Index, information retrieval is fast, i.e.: Salary = Data(232).Salary, assuming I know that the employee, whose information I am working with is stored in slot # 232. My question is what is the best method for determining that the index you want is #232?
The most straightforward method is simply to loop through all the records until you find the one you are looking for. It is simple and direct, but very time consuming. A faster approach is to use a BiSection search method to locate the desired record, but the caveat there is that the original list MUST be sorted in sequential Ascending or Descending order for what ever value you are searching for. That is not the case in most cases.
A friend of mine who is a C/C++ programmer suggested that I use a Hash table. He said that this is the most efficient and fastest war to retrieve information. The problem is I have no idea what a Hash table is, how to construct/use one, or if PowerBasic even supports this.
I am hoping that someone here has already experienced this problem and come up with an effective (and fast) solution. I am looking forward to hearing your thoughts/ideas/suggestions on this.
Best regards,
Andrew Peskin
[email protected]
------------------
Tags: None
Leave a comment: