Because of the various discussion going about about thread downloading methods, I've chosen to gbBuilder into two parts - gbDownLoader, that downloads and formats threads and gbIndexer, that creates the gbThreads index files. This posts provides the code for gbDownloader.
gbDownLoader uses the WinHttpRequest API to download threads. I've found that to be significantly faster than the URLDownloadToFile and URLOpenBlockingStream API.
But for updating a few hundred threads, I think any of the three provide acceptable download times.
The new gbDownLoader has a couple of new features that were not available in gbBuilder:
gbdownloader.zip has the EXE, source code and other needed files.
Discussion is here.
Here's the code for gbdownloader.bas:
gbDownLoader uses the WinHttpRequest API to download threads. I've found that to be significantly faster than the URLDownloadToFile and URLOpenBlockingStream API.
But for updating a few hundred threads, I think any of the three provide acceptable download times.
The new gbDownLoader has a couple of new features that were not available in gbBuilder:
- Implements multiple threads to speed up downloads.
- Captures multiple page threads without saving individual pages to file (incoming data is kept in memory until all pages are captured)
gbdownloader.zip has the EXE, source code and other needed files.
Discussion is here.
Here's the code for gbdownloader.bas:
Code:
'Compilable Example: #Compile Exe "gbdownloader.exe" #Dim All #Debug Error On #Debug Display On %Unicode = 1 %ThreadCount = 4 %URLsPerThread = 13000 #Resource Icon logo,"_icons\download.ico" #Include "Win32API.inc" #Include "WinInet.inc" #Include "AfxTime.inc" #Include Once "httprequest.inc" #Include Once "ole2utils.inc" Declare Function DeleteUrlCacheEntryW Import "WININET.DLL" Alias "DeleteUrlCacheEntryW" ( ByRef lpszUrlName As WStringZ) As Long ' BOOL %MaxThreadCount = 55000 $Ver = "1.0" Union QuadFileTime ft As FileTime q As Quad End Union Type ForumIndex status As Long 'whether or not is checked forumname As StringZ * 110 'forum name short As StringZ * 20 'short name shortL As StringZ * 20 'short name tabs As Long tcount As Long 'temp value - number of threads of that forum type pcount As Long 'temp value - number of posts of that forum type End Type Enum Equates Singular IDC_Stop = 500 IDC_Start IDC_Format IDC_Localize IDC_StatusBar IDC_User IDC_Password IDC_OverWriteImages End Enum Global hDlg, hThread, ghHook As Dword, StopAction, OverWriteImages As Long Global URLS(), StartTime As String Global CommonImageList$, CNDFetchImageList$, HTMLHeader$ Global Forums() As ForumIndex Global VBUser, VBpsw, ValidWordCharacters, BadChar As WStringZ * %Max_Path Global AttachExt, AttachImg, AttachZip As WStringZ * %Max_Path Global qFreq, qStart, qStop As Quad Function PBMain() As Long Local i As Long Dialog Default Font "Tahoma",12, 0 Dialog New Pixels, 0, "gbDownLoader v" + $Ver,300,300,1000,150, %WS_OverlappedWindow To hDlg Dialog Set Icon hDlg, "logo" Control Add Button, hDlg, %IDC_Start,"Download", 10,15,110,20 Control Add Button, hDlg, %IDC_Format,"Format", 130,15,110,20 Control Add Button, hDlg, %IDC_Localize,"Localize", 250,15,110,20 Control Add CheckBox, hDlg, %IDC_OverWriteImages, "OverWrite Images", 250,40,150,25 Control Add Button, hDlg, %IDC_Stop,"Stop", 370,15,100,20 Control Add TextBox, hDlg, %IDC_User, "", 500, 10, 250, 25 Control Add TextBox, hDlg, %IDC_Password, "", 500, 40, 620, 25 Control Add Statusbar, hDlg, %IDC_StatusBar, "", 0,0,0,0 Statusbar Set Parts hDlg, %IDC_StatusBar, 75,75,75,75,75,75,75,75,75,75,75,75,75,75,75 'max of 15 threads For i = 1 To %ThreadCount : Statusbar Set Text hDlg, %IDC_Statusbar, i, 0, Str$(i) : Next i Dialog Show Modal hDlg Call DlgProc End Function CallBack Function DlgProc() As Long Local i As Long Select Case Cb.Msg Case %WM_InitDialog QueryPerformanceFrequency qFreq CreateSubFolders Settings_INI "get" GetPassword Control Set Check hDlg, %IDC_OverWriteImages, OverWriteImages Control Set Text hDlg, %IDC_User, VBUser Control Set Text hDlg, %IDC_Password, VBPsw LoadCommonImageList LoadURLs LoadForumNames BadChar = Remove$(Chr$(1 To 255), Any ValidWordCharacters) Dialog Set Text hDlg, "gbDownLoader " + $Ver + " " + Format$(UBound(URLS), "##,##0") Case %WM_Destroy Settings_INI "save" Case %WM_Help ? URLS(1) + $CrLf + URLS(2) Case %WM_Command Select Case Cb.Ctl Case %IDC_OverWriteImages Control Get Check hDlg, %IDC_OverWriteImages To OverWriteImages Case %IDC_Stop StopAction = 1 Case %IDC_Start If MsgBoxX("Are you sure?", %MB_OkCancel + %MB_IconQuestion + %MB_TaskModal, "gbDownLoader " + $Ver) = %IdCancel Then Exit Select StopAction = 0 StartTime = Time$ QueryPerformanceCounter qStart Dialog Set Text hDlg, "Downloading Threads ... " + Format$(UBound(URLS), "##,##0") + " " + StartTime ''''''NOTE: MUST KILL "threads" manually if starting from scratch 'If Len(Dir$("threads\*.htm")) Then FastFolderKiller("threads") For i = 1 To %ThreadCount Thread Create DownloadThreads(i) To hThread : Thread Close hThread To hThread Next i Case %IDC_Format If MsgBoxX("Are you sure?", %MB_OkCancel + %MB_IconQuestion + %MB_TaskModal, "Format Threads") = %IdCancel Then Exit Select StopAction = 0 StartTime = Time$ QueryPerformanceCounter qStart Dialog Set Text hDlg, "Formating Threads ... " + Format$(UBound(URLS), "##,##0") + " " + StartTime BuildHTMLHeader If Len(Dir$("threads_formatted\*.htm")) Then FastFolderKiller("threads_formatted") For i = 1 To %ThreadCount Thread Create FormatThreads(i) To hThread : Thread Close hThread To hThread Next i Case %IDC_Localize If MsgBoxX("Are you sure?", %MB_OkCancel + %MB_IconQuestion + %MB_TaskModal, "Localize Images") = %IdCancel Then Exit Select StopAction = 0 StartTime = Time$ QueryPerformanceCounter qStart Dialog Set Text hDlg, "Localizing Images ... " + Format$(UBound(URLS), "##,##0") + " " + StartTime If Len(Dir$("threads_localimages\*.htm")) Then FastFolderKiller("threads_localimages") For i = 1 To %ThreadCount Thread Create LocalizeImages(i) To hThread : Thread Close hThread To hThread Next i End Select End Select End Function Sub LoadURLs Local temp$, Delimiter$ Open "filesB\release_threads.txt" For Binary As #1 : Get$ #1, Lof(1), temp$ : Close #1 If InStr(temp$,$CrLf) Then Delimiter$ = $CrLf Else Delimiter$ = $Lf ReDim URLs(1 To ParseCount(temp$,Delimiter$)) Parse temp$, URLs(), Delimiter$ End Sub Sub RemoveTodayYesterday(gTemp$) Local Today$, Yesterday$, Y, M, D, JD, Changed As Long, wD, wM, wY As Word, TheDate$ TheDate$ = Date$ Y = Val(Mid$(TheDate$,7 To 10)) M = Val(Left$(TheDate$,2)) D = Val(Mid$(TheDate$,4 To 5)) Today$ = Format$(D,"#0") + $Spc + Left$(MonthName$(M),3)+ $Spc + Format$(Y,"0000") + "," JD = AfxAstroDay(D, M, Y) JD = AfxDateAdd(D, M, Y, -1) AfxJulianToDate(JD, wD, wM, wY) Yesterday$ = Format$(wD,"#0") + $Spc + Left$(MonthName$(wM),3)+ $Spc + Format$(wY,"0000") + "," If InStr(gTemp$,">Today,") Then Replace ">Today," With ">" + Today$ In gTemp$ : Changed = 1 If InStr(gTemp$,">Yesterday,") Then Replace ">Yesterday," With ">" + Yesterday$ In gTemp$ : Changed = 1 End Sub Sub CreateSubFolders If IsFalse IsFolder("temp") Then MkDir "temp" If IsFalse IsFolder("filesB") Then MkDir "filesB" If IsFalse IsFolder("images_common") Then MkDir "images_common" If IsFalse IsFolder("images_fetch") Then MkDir "images_fetch" If IsFalse IsFolder("images_local") Then MkDir "images_local" If IsFalse IsFolder("threads") Then MkDir "threads" If IsFalse IsFolder("threads_formatted") Then MkDir "threads_formatted" If IsFalse IsFolder("threads_localimages") Then MkDir "threads_localimages" End Sub Sub Settings_INI(Task$) Local xResult, yResult, tempZ, INIFileName As WStringZ * %Max_Path, WinPla As WindowPlacement 'set ini filename INIFileName = Exe.Path$ + "gbdownloader.ini" If Task$ = "get" Then 'get dialog width/height from INI file and use to set Dialog size GetPrivateProfileString "All", "Width", "1140", xResult, %Max_Path, INIFileName GetPrivateProfileString "All", "Height", "150", yResult, %Max_Path, INIFileName Dialog Set Size hDlg,Val(xResult), Val(yResult) 'width/height 'default position should be centered on screen Local DefaultX, DefaultY, wDeskTop, hDeskTop As Long Desktop Get Client To wDeskTop, hDeskTop DefaultX = (wDeskTop-Val(xResult))/2 DefaultY = (hDeskTop-Val(yResult))/2 'get dialog top/left from INI file and use to set Dialog location Getprivateprofilestring "All", "Left", Str$(DefaultX), xResult, %Max_Path, INIFileName Getprivateprofilestring "All", "Top", Str$(DefaultY), yResult, %Max_Path, INIFileName Dialog Set Loc hDlg, Val(xResult), Val(yResult) 'left/top 'get value for string variables Getprivateprofilestring "All", "ValidWordCharacters", Chr$("-.'!?#$%&@\^", 48 To 57, 65 To 90, 95, 97 To 122), ValidWordCharacters, %Max_Path, INIFileName '-.!?#$%&@\^0123456789 _ A-Z a-z Getprivateprofilestring "All", "VBUser","User", VBUser, %Max_Path, INIFileName Getprivateprofilestring "All", "VBPsw", "PSW", VBPsw, %Max_Path, INIFileName Getprivateprofilestring "All", "AttachExt", ".bas .txt .hlp .au .avi .bat .cab .cfm .cpl .hlp .inc .ini .rtf .swf .tlb .chm " + _ ".csv .dat .dll .doc .docx .exe .gbs .h .pdf .pptx .wav .wmv .xls .xlsx .log .msi .mid .mov .mp3 .pblib .pbr ", AttachExt, %Max_Path, INIFileName Getprivateprofilestring "All", "AttachImg", ".gif .jpg .jpeg .bmp .png .ico .svg .cur ", AttachImg, %Max_Path, INIFileName Getprivateprofilestring "All", "AttachZip", ".zip .tar .rar ", AttachZip, %Max_Path, INIFileName 'get value for numeric variables Getprivateprofilestring "All", "OverWriteImages", "0", tempz, %Max_Path, INIFileName : OverWriteImages = Val(tempz) End If If Task$ = "save" Then 'If Len(Dir$(INIFileName)) Then Kill INIFileName 'save dialog size/location unless minimized or maximized WinPla.length = SizeOf(WinPla) GetWindowPlacement hDlg, WinPla WritePrivateProfileString "All", "Left", Str$(WinPla.rcNormalPosition.nLeft), INIFileName WritePrivateProfileString "All", "Top", Str$(WinPla.rcNormalPosition.nTop), INIFileName WritePrivateProfileString "All", "Width", Str$(WinPla.rcNormalPosition.nRight - WinPla.rcNormalPosition.nLeft), INIFileName WritePrivateProfileString "All", "Height", Str$(WinPla.rcNormalPosition.nBottom - WinPla.rcNormalPosition.nTop), INIFileName WritePrivateProfileString "All", "MinMaxState", Str$(WinPla.showCmd), INIFileName 'save string variables WritePrivateProfileString "All", "ValidWordCharacters",ValidWordCharacters, INIFileName WritePrivateProfileString "All", "VBUser",VBUser, INIFileName WritePrivateProfileString "All", "VBPsw",VBPsw, INIFileName 'save numeric variables WritePrivateProfileString "All", "OverWriteImages", Str$(OverWriteImages), INIFileName End If End Sub Function DeskTopFolder() As String Local sPath As WStringZ * %Max_Path SHGetFolderPath(0, %CSIDL_DeskTop, 0, 0, sPath) Function = sPath End Function Sub GetPassword Local temp$, fName$ fName$ = "login.txt" If IsFile(fName$) Then Open fName$ For Input As #1 Line Input #1, temp$ : VBUser = temp$ Line Input #1, temp$ : VBPsw = temp$ Close #1 End If End Sub Sub LoadCommonImageList Local temp$ temp$ = Dir$("images_common\*.*") While Len(temp$) CommonImageList$ += "/" + temp$ temp$ = Dir$(Next) Wend End Sub Sub sBeep WinBeep(250,300) End Sub Function FastFolderKiller(Folder$) As Long Local i,iReturn As Long, NewFolder$ 'valid string format Folder$ = Trim$(Folder$) 'remove spaces on either end If Right$(Folder$,1) <> "\" Then Folder$ += "\" 'must have a trailing \ If IsFalse IsFolder(Folder$) Then Exit Function 'not a folder 'safety against deleting important folders If LCase$(Folder$) = "c:\program files\" Then Exit Function 'safety If LCase$(Folder$) = "c:\program files (x86)\" Then Exit Function 'safety If LCase$(Folder$) = "c:\windows\" Then Exit Function 'safety If LCase$(Folder$) = "c:\users\" Then Exit Function 'safety 'folder constraints If Dir$(Folder$ + "*.*", Only %SubDir) <> "" Then Exit Function '(optional) has subfolders, which this function does not handle If Dir$(Folder$ + "*.*") = "" Then RmDir Folder$ : Exit Function 'no files, so remove with RmDir. No further action required. 'get a new name for the temporary folder For i = 0 To 9999 NewFolder$ = "folder_being_killed_" + Format$(i,"0000") + "\" If IsFalse IsFolder(NewFolder$) Then Exit For Next i Name Folder$ As NewFolder$ 'get filled folder out of the way MkDir Folder$ 'create an empty Folder$, as though it's files had been deleted 'DOS CMD to delete renamed folder iReturn = Shell ("cmd /C RmDir " + NewFolder$ + " /s/Q" , 0) 'background elimination of renamed/full folder End Function Sub AdjustCharacters(temp$) temp$ = Utf8ToChr$(temp$) Replace Chr$(246) With "o" In temp$ Replace Chr$(232) With "e" In temp$ Replace Chr$(233) With "e" In temp$ Replace "</A>" With "" In temp$ End Sub Function ShortForumName(temp$) As String Local i As Long For i = 1 To UBound(Forums) If InStr(temp$,Forums(i).forumname) Then Function = Forums(i).short : Exit Function Next i Function = "Unknown Forum" End Function Sub CleanThreadTitle(threadtitle$) If threadtitle$ = "" Then threadtitle$ = "Unknown" : Exit Sub threadtitle$ = Utf8ToChr$(threadtitle$) Replace """ With $Spc In threadtitle$ Replace "&" With "&" In threadtitle$ Replace Chr$(34) With $Spc In threadtitle$ threadtitle$ = Shrink$(threadtitle$) End Sub Function ConvertDate(temp$) As String Local mnth$, pm$, yr$, dy$, mn$, hr$, tmp$ '24 Nov 2014, 08:45AM '2011 07-30 14:20 'Desired Output yr$ = Parse$(temp$, Any " ,:",3) dy$ = Format$(Val(Parse$(temp$," ",1)),"00") mnth$ = Format$(( InStr("JanFebMarAprMayJunJulAugSepOctNovDec",Parse$(temp$," ",2)) -1)/3+1, " 00-") pm$ = Parse$(temp$," ",5) hr$ = Format$(Val(Parse$(temp$," ",4))," 00") If (pm$="PM") And (Val(hr$)<12) Then hr$ = Format$(Val(hr$) + 12," 00") ElseIf (pm$="AM") And (Val(hr$)=12) Then hr$ = Format$(Val(hr$) - 12," 00") End If mn$ = Right$(Parse$(temp$," ",4),3) tmp$ = Build$(yr$, mnth$, dy$, hr$, mn$) If Val(tmp$) = 0 Then Function = Mid$(Date$, 4 To 5) + $Spc + Mid$(Date$,7 To 10) + "-" + Left$(Date$,2) + " 00:00" Else Function = Build$(yr$, mnth$, dy$, hr$, mn$) End If End Function Function FormatContent(temp$, CodeCount As Long) As String Local tmp$, i, iCount, iPos, QuoteOn, CodeOn As Long 'remove signature iPos = InStr(temp$,"------------------<br />") If iPos Then temp$ = Left$(temp$, iPos-1) 'put content into array for processing line by line iCount = ParseCount(temp$,$CrLf) ReDim C(1 To iCount) As String Parse temp$, C(), $CrLf For i = 1 To UBound(C) If InStr(C(i),"<div class=""js-post__content") Then Iterate For If InStr(C(i),"<div class=""bbcode_quote_container") Then Iterate For If InStr(C(i),"<div class=""bbcode_description") Then Iterate For If InStr(C(i),"<div class=""quote_container") Then QuoteOn = 1 tmp$ += "<br><table width=100%><tr style=background-color:#FFF8DC><td><startofquote><font size=3><b>Quote:</b></font><br>" + $CrLf ElseIf InStr(C(i),"<pre class=""bbcode_code") Then If InStr(C(i),"</pre>") Then Incr CodeCount tmp$ += $CrLf + "<br><table width=100%><tr style=background-color:#E0FFFF><td><startofcode><pre id=code" + Format$(CodeCount,"0000") + "><b>Code:</b><input type=checkbox id=" + Format$(CodeCount,"000")+ "><p>" tmp$ += C(i) + $CrLf tmp$ += "</pre><endofcode></table><br>" + $CrLf Else CodeOn = 1 Incr CodeCount tmp$ += $CrLf + "<br><table width=100%><tr style=background-color:#E0FFFF><td><startofcode><pre id=code" + Format$(CodeCount,"0000") + "><b>Code:</b><input type=checkbox id=" + Format$(CodeCount,"000")+ "><p>" End If ElseIf QuoteOn And InStr(C(i),"</div>") Then tmp$ += C(i) + $CrLf tmp$ += "<endofquote></table><br>" + $CrLf QuoteOn = 0 ElseIf CodeOn And InStr(C(i),"</pre>") Then tmp$ += C(i) + $CrLf tmp$ += "</pre><endofcode></table><br>" + $CrLf CodeOn = 0 Else tmp$ += C(i) + $CrLf End If Next i Function = tmp$ End Function Function RemoveEmailProtection(temp$) As String Local iPosA, iPosB As Long, encrypt1$, encrypt2$ iPosA = InStr(temp$, "-protection") iPosB = InStr(iPosA, temp$, $Dq) encrypt1$ = Mid$(temp$,iposA+12 To iPosB-1) encrypt1$ = DecryptString(encrypt1$) iPosA = InStr(iPosB, temp$, "-cfemail") iPosB = InStr(iPosA+10, temp$, $Dq) encrypt2$ = Mid$(temp$, iPosA+10 To iPosB-1) encrypt2$ = DecryptString(encrypt2$) Function = "<a href=""mailto:" + encrypt1$ + """>" + encrypt2 + "</a>" End Function Function RemoveCodeProtection(temp$) As String Local iPosA, iPosB As Long, tmp$ iPosA = InStr(-1, temp$, "cfemail") iPosB = InStr(iPosA+9, temp$, $Dq) tmp$ = Mid$(temp$, iPosA+9 To iPosB-1) Function = DecryptString(tmp$) End Function Function DecryptString(tmp$) As String Local decoded$, i,k As Long k = Val("&H"+Left$(tmp$,2)) For i = 3 To Len (tmp$) Step 2 decoded += Chr$(Val("&H"+ Mid$(tmp$, I,2))Xor k) Next I Function = decoded End Function Thread Function LocalizeImages(ByVal x As Long) As Long Local i,iCount, iPosA, iPosB As Long, ThreadName$, temp$ 'use threads_formatted to get threads_localimages StopAction = 0 Statusbar Set Text hDlg, %IDC_Statusbar, x, 0, "" For i = 1 + (x-1)*%URLsPerThread To x*%URLsPerThread If i > UBound(URLs) Then Exit For If StopAction Then Dialog Set Text hDlg, "Convert to Local URLs Stopped!" : Exit For iPosA = InStr(-1, URLs(i), "/") iPosB = InStr(iPosA+1, URLs(i), "-") temp$ = Trim$(Mid$(URLs(i), iPosA+1 To iPosB-1)) ThreadName$ = Format$(Val(temp$),"0000000") + ".htm If IsFalse IsFile("threads_formatted\" + ThreadName$) Then Iterate For Incr iCount ConvertSingleThreadToLocalImageURLs(ThreadName$, x) 'convert image URL to Local URLs + download online image to local image ConvertSingleThreadToLocalAttachmentURLs(ThreadName$, x) 'convert attachment URL to Local URLs + download attachments to local image 'If iCount Mod 250 = 0 Then Statusbar Set Text hDlg, %IDC_StatusBar, x, 0, Format$(iCount,"##,##0") Statusbar Set Text hDlg, %IDC_StatusBar, x, 0, Format$(iCount,"##,##0") Sleep 0 Next i Statusbar Set Text hDlg, %IDC_StatusBar,x,0, Format$(iCount,"##,##0") + " x" Dialog Set Text hDlg, "Images Converted ... " + Format$(iCount,"##,##0") + " " + StartTime$ + " " + Time$ sBeep End Function Sub ConvertSingleThreadToLocalAttachmentURLs(Threadname$, x As Long) 'contert attachment URLs to Local URls + download attachments to local image End Sub Sub ConvertSingleThreadToLocalImageURLs(Threadname$, x As Long) 'convert image URL to Local URLs + download online image to local image 'images URLs can be src="http OR src="filedata/fetch? Local temp$, iPosA, iPosB, iPosC, iResult As Long Local OnlineImageURL$, LocalImageName$, ShortName$, LocalImagePrefix$ Open "threads_formatted\" + ThreadName$ For Binary As #x : Get$ #x, Lof(x), temp$ : Close #x 'Get HTTP images iPosA = InStr(temp$,"src=""http") 'only online images 'if no image, just copy the file If iPosA = 0 Then 'Or ThreadName$ = "0748953.htm" Or ThreadName$ = "0795382.htm" Then Open "threads_localimages\" + ThreadName$ For Output As #x : Print #x, temp$; : Close #x Exit Sub End If While iPosA iPosB = InStr(iPosA+9,temp$,$Dq) OnlineImageURL$ = Mid$(temp$, iPosA+5 To iPosB-1) OnlineImageURL$ = Trim$(OnlineImageURL$, Any $Spc + $CrLf) If InStr(CNDFetchImageList$,OnlineImageURL$) Then Replace OnlineImageURL$ With "" In temp$ iPosA = InStr(iPosB, temp$,"src=""http") Iterate Loop End If iPosC = InStr(iPosB-Len(temp$), temp$, "/") ShortName$ = Mid$(temp$, iPosC+1 To iPosB-1) 'may be fetch info If InStr(CommonImageList$, "/" + ShortName$) Then 'cool.gif, eek.png, smile.gif, ... etc. LocalImageName$ = "..\images_common\" + ShortName$ ElseIf InStr(OnlineImageURL$, "fetch?id") Then LocalImagePrefix$ = "images_fetch\" + PathName$(Name, ThreadName$) + "_" If DownloadUrlPBForum (OnlineImageURL$, LocalImagePrefix$, LocalImageName$) Then LocalImageName$ = "..\" + LocalImageName$ Else LocalImageName$ = "..\images_common\404.jpg" End If Else 'image URL is clearly visible in the thread file LocalImageName$ = Exe.Path$ + "images_local\" + PathName$(Name, ThreadName$) + "_" + ShortName$ '"0054990_bob.jpg" If OverWriteImages Or IsFalse IsFile(LocalImageName$) Then DeleteURLCacheEntryW((OnlineImageURL$)) iResult = URLDownloadToFile(Nothing, (OnlineImageURL$), (LocalImageName$), 0, Nothing) If iResult <> %S_Ok Then CNDFetchImageList$ += $CrLf + OnlineImageURL$ + " ::: " + ThreadName$ LocalImageName$ = "..\images_common\404.jpg" End If End If End If Replace OnlineImageURL$ With LocalImageName$ In temp$ iPosA = InStr(iPosB, temp$,"src=""http") 'start at top because Replace will change the size of temp$ Wend Open "threads_localimages\" + ThreadName$ For Output As #x : Print #x, temp$; : Close #x End Sub Function DownloadUrlPBForum(OnlineImageURL As String, LocalImagePrefix As String, LocalImageName As String) As Long Local pWHttp As IWinHttpRequest Local vSTream As Variant Local pIStream As IStream Local Buffer As String * 8192 Local strBuffer As String Local cbRead As Dword Local iSucceeded As Integer Local wsHeaders, wTemp1, wFilename As WString Local lCount1, lCount2 As Long Static wsCookies, vb_UserID, vb_PassHash As WString ' these are the magic cookies that ID you Control Get Text hDlg, %IDC_User To vb_UserID Control Get Text hDlg, %IDC_Password To vb_PassHash Function = %false ' Creates an instance of the HTTP service pWHttp = NewCom "WinHttp.WinHttpRequest.5.1" If IsNothing(pWHttp) Then Exit Function Try 'OK we have what we need now fetch the image pWHttp.Open "GET", OnlineImageURL , %false pWHttp.setRequestHeader "Cookie", wsCookies + "; " + vb_UserID + "; " + vb_PassHash pWHttp.Send ' Wait for response with a timeout of 5 seconds iSucceeded = pWHttp.WaitForResponse(5) If iSucceeded Then wsHeaders = Format$(pWHttp.Status) + " " + pWHttp.Statustext + $CrLf wsHeaders += pWHttp.GetAllResponseHeaders If pWHttp.Status <> 200 Then Exit Function ' failed to get what we asked for. For lCount1 = 1 To ParseCount(wsHeaders, $CrLf) ' extract Filename from header wTemp1 = Parse$(wsHeaders, $CrLf, lCount1) If Tally(wTemp1, "filename=") = 1 Then For lCount2 = 1 To ParseCount(wTemp1, ";") wFilename = Parse$(wTemp1, ";", lCount2) If Tally (wFilename, "filename=") = 1 Then wFilename = Remove$(wFilename, "filename=") : Exit For : Exit For Next lCount2 End If Next lCount wFilename = Trim$(wFilename,Any $Dq+$Spc) LocalImageName = LocalImagePrefix + wFileName If OverWriteImages = 0 And IsFile (LocalImageName) Then Function = %true : Exit Function ' Get the response as a stream vStream = pWHttp.ResponseStream If VariantVT(vStream) = %VT_Unknown Then pIStream = vStream vStream = Empty ' Read the stream in chunks Do pIStream.Read VarPtr(buffer), SizeOf(buffer), cbRead If cbRead = 0 Then Exit Do If cbRead < SizeOf(buffer) Then strBuffer = strBuffer & Left$(buffer, cbRead) Else strBuffer = strBuffer & buffer End If Loop pIStream = Nothing If Len(strBuffer) Then ' Save the buffer into a file LocalImageName = LocalImagePrefix + wFileName If IsFile (LocalImageName) Then Kill LocalImageName Open LocalImageName For Binary As #1 Put #1, 1, strBuffer SetEof #1 Close #1 Function = %true Else MsgBox "Buffer is empty" End If End If End If Catch 'OleShowErrorInfo ObjResult End Try End Function Function MsgBoxX(msg$,Flag As Long,caption$) As Long ghHook = SetWindowsHookEx(%WH_CBT, CodePtr(SBProc), GetModuleHandle(""), GetCurrentThreadId) Function = MessageBox( 0, (msg$), (caption$), Flag) End Function Function SBProc(ByVal lMsg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long Local x,y,DeskTopW, DeskTopH As Long, rc As rect 'MsgBoxX is approximately 200x200 CenterOverDialog 370,170,x,y If lMsg = %HCBT_ACTIVATE Then Desktop Get Size To DesktopW, DesktopH GetWindowRect(wParam,rc) If y + (rc.nBottom-rc.nTop) > DeskTopH Then y = DeskTopH - (rc.nBottom - rc.nTop)- 50 End If SetWindowPos wParam, 0, x,y, 0, 0, %SWP_NOSIZE Or %SWP_NOACTIVATE Or %SWP_NOZORDER UnhookWindowsHookEx ghHook End If End Function Sub CenterOverDialog(BoxX As Long, BoxY As Long, CenterX As Long, CenterY As Long) 'gets top/left position to center Box over Dialog 'you need to know, or at least estimate, the Box size Local x As Long, y As Long, w As Long, h As Long Dialog Get Loc hDlg To x,y Dialog Get Size hDlg To w,h CenterX = x + (w-BoxX)/2 CenterY = y + 110 '(h-BoxY)/2 End Sub Sub BuildHTMLHeader HTMLHeader$ = "--> HTMLHeader$ += $CrLf + "<head>" HTMLHeader$ += $CrLf + "<style type=text/css>" HTMLHeader$ += $CrLf + "<!--" HTMLHeader$ += $CrLf + ".tborder { border: 1px solid #808080; }" HTMLHeader$ += $CrLf + "h1 { color:#8B0000; }" HTMLHeader$ += $CrLf + "h2 { color:#8B0000; }" HTMLHeader$ += $CrLf + "h3 { color:#8B0000; }" HTMLHeader$ += $CrLf + "-->" HTMLHeader$ += $CrLf + "</style></head>" HTMLHeader$ += $CrLf + "<body><font face=""Calibri"" size=1><img src=..\images_common\banner.png>" End Sub Thread Function FormatThreads(ByVal x As Long) As Long Local i,iCount, iPosA, iPosB As Long, ThreadName$, temp$ StopAction = 0 Statusbar Set Text hDlg, %IDC_Statusbar, x, 0, "" For i = 1 + (x-1)*%URLsPerThread To x*%URLsPerThread If i > UBound(URLs) Then Exit For If StopAction Then Dialog Set Text hDlg, "Format Threads stopped!" : Exit For iPosA = InStr(-1, URLs(i), "/") iPosB = InStr(iPosA+1, URLs(i), "-") temp$ = Trim$(Mid$(URLs(i), iPosA+1 To iPosB-1)) ThreadName$ = Format$(Val(temp$),"0000000") + ".htm If IsFalse IsFile("threads\" + ThreadName$) Then Iterate For Incr iCount FormatSingleThread(ThreadName$, x) If iCount Mod 250 = 0 Then Statusbar Set Text hDlg, %IDC_StatusBar, x, 0, Format$(iCount,"##,##0") Next Statusbar Set Text hDlg, %IDC_StatusBar,x,0, Format$(iCount,"##,##0") + " x" QueryPerformanceCounter qStop Dialog Set Text hDlg, "Threads Formatted ... " + Format$(UBound(URLS), "##,##0") + " " + StartTime + " " + Time$ + " " + Format$((qStop-qStart)/qFreq,"###.0") & "s" sBeep End Function Sub FormatSingleThread(ThreadName$, x As Long) Local tmp$, temp$, tempp$, iPosA, iPosB, iDivA, iDivB, iLen, divCount, CodeCount As Long Local threadURL$, author$, postdate$, lastdate$, forum$, postcount$, firstdate$, threadtitle$, postcontent$, HTMLOut$ Open "threads\" + ThreadName$ For Binary As #x Get$ #x, Lof(x), temp$ Close #x 'content corrections AdjustCharacters(temp$) iLen = Len(temp$) iPosA = InStr(temp$, "url"" content=""https") iPosB = InStr(iPosA+14, temp$, $Dq) threadURL$ = Mid$(temp$, iPosA+14 To iPosB-1) 'full thread URL$ forum$ = ShortForumName(threadURL$) 'short forum name iPosA = InStr(temp$,"class=""main-title") iPosA = InStr(iPosA+1, temp$,"class=""main-title") iPosA = InStr(iPosA,temp$,">") iPosB = InStr(iPosA,temp$,"<") threadtitle$ = Trim$(Mid$(temp$, iPosA+1 To iPosB-1)) 'threadtitle$ CleanThreadTitle(threadtitle$) iPosA = InStr(temp$,"<div class=""author") iPosB = InStr(iPosA,temp$,"</strong>") If Mid$(temp$,iPosB-4 To iPosB-1) = "</a>" Then iPosB = iPosB-4 iPosA = InStr(iPosB-iLen,temp$,">") author$ = Mid$(temp$,iPosA+1 To iPosB-1) 'author$ author$ = Utf8ToChr$(author$) author$ = MCase$(Author$) iPosA = InStr(iPosB, temp$,"<time") iPosB = InStr(iPosA,temp$,"</time") iPosA = InStr(iPosB-iLen,temp$,">") firstdate$ = Mid$(temp$,iPosA+1 To iPosB-1) 'firstdate 'Replace "Today," With Today In firstDate$ 'Replace "Yesterday," With YesterDay In firstDate$ firstdate$ = ConvertDate(firstDate$) iPosA = InStr(-1, temp$,"<time") iPosB = InStr(iPosA,temp$,"</time") iPosA = InStr(iPosB-iLen,temp$,">") lastdate$ = Mid$(temp$,iPosA+1 To iPosB-1) 'lastdate 'Replace "Today," With Today In lastDate$ 'Replace "Yesterday," With YesterDay In lastDate$ lastdate$ = ConvertDate(lastDate$) 'tbdData - placeholder HTMLOut$ = "<html> HTMLOut$ += $CrLf + "<!--" HTMLOut$ += $CrLf + threadURL$ HTMLOut$ += $CrLf + forum$ HTMLOut$ += $CrLf + threadtitle$ HTMLOut$ += $CrLf + author$ HTMLOut$ += $CrLf + firstdate$ HTMLOut$ += $CrLf + lastdate$ HTMLOut$ += $CrLf + Trim$(Str$(ParseCount(temp$,"<time")-1)) HTMLOut$ += $CrLf + Date$ + ":::" + Time$ 'until tbdData$ is determined ? attachmentcount ::: zip count ::: other HTMLOut$ += $CrLf + HTMLHeader$ HTMLOut$ += $CrLf + "<p><font size=+2><b>" + forum$ + "</b></font><p>" HTMLOut$ += $CrLf + "<h1>" + threadtitle$ + "</h1><p>" iPosA = InStr(temp$,"<div class=""author") While iPosA iPosB = InStr(iPosA,temp$,"</strong>") If Mid$(temp$,iPosB-4 To iPosB-1) = "</a>" Then iPosB = iPosB-4 iPosA = InStr(iPosB-iLen,temp$,">") HTMLOut$ += $CrLf + "<table class=tborder cellpadding=6 cellspacing=1 border=0 width=100%><td>" author$ = Mid$(temp$,iPosA+1 To iPosB-1) author$ = MCase$(author$) HTMLOut$ += $CrLf + "<font size=+1><b>" + author$ + "</b></font>" 'author iPosA = InStr(iPosB, temp$, "class=""b-post__count") iPosA = InStr(iPosA, temp$, ">") iPosB = InStr(iPosA, temp$, "<") postcount$ = Mid$(temp$, iPosA+1 To iPosB-1) 'post number iPosA = InStr(iPosB, temp$,"<time") iPosB = InStr(iPosA,temp$,"</time") iPosA = InStr(iPosB-iLen,temp$,">") postdate$ = Mid$(temp$,iPosA+1 To iPosB-1) 'postdate HTMLOut$ += $CrLf + "<td align=right>" + postdate$ + "<td align=right>" + postcount$ iPosA = InStr(iPosB, temp$,"<div class=""js-post__content-text") iPosB = iPosA divCount = 1 Do iDivA = InStr(iPosB+1, temp$, "<div") iDivB = InStr(iPosB+1, temp$, "</div>") If iDivA < iDivB Then Incr divCount iPosB = iDivA Else Decr DivCount iPosB = iDivB End If Loop Until divCount = 0 HTMLOut$ += $CrLf + "<tr><td colspan=3><hr><tr><td colspan=3>" postcontent$ = Mid$(temp$, iPosA To iPosB-1) postcontent$ = FormatContent(postcontent$, CodeCount) HTMLOut$ += $CrLf + postcontent$ 'post content HTMLOut$ += $CrLf + "</table></table><p>" iPosA = InStr(iPosB, temp$,"<div class=""author") Wend HTMLOut$ += $CrLf + "</body></html>" 'correction of EMAIL Protection iPosA = InStr(HTMLOut$,"<a href=""/cdn-cgi/") While iPosA iPosB = InStr(iPosA,HTMLOut$,"</a>") 'Code: has one, emails have two temp$ = Mid$(HTMLOut$,iPosA To iPosB + 3) If InStr(temp$, "target=") Then tmp$ = RemoveEmailProtection(temp$) Replace temp$ With tmp$ In HTMLOut$ 'this invalidates iPosB Else tmp$ = RemoveCodeProtection(temp$) Replace temp$ With tmp$ In HTMLOut$ 'this invalidates iPosB End If iPosA = InStr(iPosA+1, HTMLOut$,"<a href=""/cdn-cgi/") 'probably should be iPosA + Len(tmp$) Wend 'post-formatting corrections Replace $Tab With "" In HTMLOut$ 'Replace ">Today," With ">" + Today In HTMLOut$ 'Replace ">Yesterday," With ">" + Yesterday In HTMLOut$ tempp$ = $CrLf + $CrLf tmp$ = $CrLf While InStr(HTMLOut$, tempp$) 'no blank lines Replace tempp$ With tmp$ In HTMLOut$ Wend Replace "&type=thumb" With "" In HTMLOut$ 'so that full image will be downloaded tempp$ = "filedata/fetch" 'fetch is cgi to specify server target file tmp$ = "https://forum.powerbasic.com/filedata/fetch" 'full URL link to server target file Replace tempp$ With tmp$ In HTMLOut$ tempp$ = "http://www.powerbasic.com/support/pbforums/showthread.php" 'old forum link to thread tmp$ = "https://forum.powerbasic.com/showthread.php" 'new forum link to thread Replace tempp$ With tmp$ In HTMLOut$ tempp$ = "http://www.powerbasic.com/support/pb...read.php" 'abbreviated URL tmp$ = "https://forum.powerbasic.com/showthread.php" 'corrected abbreviated URL Replace tempp$ With tmp$ In HTMLOut$ tempp$ = "https://forum.powerbasic.com/https://forum.powerbasic.com/" 'remnant of formatting tmp$ = "https://forum.powerbasic.com/" 'corrected remnant Replace tempp$ With tmp$ In HTMLOut$ tempp$ = "( <a href=""mailto:[email protected]"">[email protected]</a>[email protected] )<br />" 'remnant of formatting tmp$ = "<a href=""mailto:[email protected]"">[email protected]</a>" 'corrected remnant Replace tempp$ With tmp$ In HTMLOut$ tempp$ = "<a href=""mailto:[email protected]"">[email protected]</a>[email protected]" 'remnant of formatting tmp$ = "<a href=""mailto:[email protected]"">[email protected]</a>" 'corrected remnant Replace tempp$ With tmp$ In HTMLOut$ 'save the result Open "threads_formatted\" + ThreadName$ For Output As #x Print #x, HTMLOut$; Close #x End Sub Thread Function DownloadThreads(ByVal x As Long) As Long Local i, iCount, iPosA, iPosB, Flag As Long, temp$, tmp$, fName$, Msg$, gTemp$, SimpleURL$ StopAction = 0 Statusbar Set Text hDlg, %IDC_Statusbar, x, 0, "" For i = 1 + (x-1)*%URLsPerThread To x*%URLsPerThread If i > UBound(URLs) Then Exit For If StopAction Then Statusbar Set Text hDlg, %IDC_StatusBar, x, 0, "Stop" : Exit For iPosA = InStr(-1, URLs(i), "/") iPosB = InStr(iPosA+1, URLs(i), "-") If iPosB = 0 Then 'is shortened URL temp$ = Trim$(Mid$(URLs(i), iPosA+1)) Msg$ = Format$(Val(temp$),"0000000") fName$ = "threads\" + Msg$ + ".htm" SimpleURL$ = URLs(i) Else 'is full URL temp$ = Trim$(Mid$(URLs(i), iPosA+1 To iPosB-1)) Msg$ = Format$(Val(temp$),"0000000") fName$ = "threads\" + Msg$ + ".htm" SimpleURL$ = Left$(URLs(i), iPosB-1) End If If GetThread(SimpleURL$, gTemp$, 1) = 200 Then Incr iCount RemoveTodayYesterday gTemp$ Open fName$ For Output As #x : Print #x, gTemp$; : Close #x End If If iCount Mod 10 = 0 Then Statusbar Set Text hDlg, %IDC_Statusbar, x, 0, Format$(iCount,"##,##0") Next i Statusbar Set Text hDlg, %IDC_StatusBar,x,0, Format$(iCount,"##,##0") + " x" sBeep Dialog Set Text hDlg, "Downloaded Threads " + Format$(UBound(URLS), "##,##0") + " " + StartTime + " " + Time$ End Function Function GetThread(URL$, gTemp$, GetExtra As Long) As Long On Error GoTo GetX Local pWHttp As IWinHttpRequest, iResult As Integer Local nBytes As Long, vTemp As Variant Dim vByteArray(0) As Byte pWHttp = NewCom "WinHttp.WinHttpRequest.5.1" If IsNothing(pWHttp) Then Exit Function pWHttp.Open "GET", URL$ pWHttp.Send iResult = pWHttp.WaitForResponse(5) ' Wait for response with a timeout of 5 seconds If pWHttp.Status = 200 Then ' 200 OK vTemp = pWHttp.ResponseBody ' this keeps it unchanged vByteArray() = vTemp nBytes = UBound(vByteArray) - LBound(vByteArray) + 1 ' Convert the array of bytes to a string If nBytes Then gTemp$ = Peek$(VarPtr(vByteArray(0)), nBytes) If GetExtra Then GetExtraPages gTemp$ End If End If ' 200 OK Function = pWHttp.status pWHttp = Nothing Exit Function GetX: Open "temp\download_failures.txt" For Append As #7 : Print #7, URL$ : Close #7 sBeep : sBeep : sBeep End Function Sub GetExtrapages(gTemp$) Local i, iResult, iCount, iPosA, iPosB As Long, extraURL$, baseURL$, tmp$, temp$ If InStr(gTemp$, ">window.loc") Then Exit Sub 'is redirector. no action taken If InStr(gTemp$, "<span class=""pagetotal"">1</span>") Then Exit Sub 'no extra pages 'get full baseURL iPosA = InStr(gTemp$, "baseurl=") iPosB = InStr(iPosA, gTemp$, ">") baseURL$ = Mid$(gTemp$, iPosA+9 To iPosB-2) 'get only baseURL through thread# iPosA = InStr(-1, baseURL$, "/") iPosB = InStr(iPosA+1, baseURL$, "-") If iPosB = 0 Then baseURL$ = Mid$(baseURL$, iPosA+1) 'is shortened URL If iPosB Then baseURL$ = Left$(baseURL$, iPosB-1) 'is full URL 'get count of how many pages there are iPosA = InStr(gTemp$, "<span class=""pagetotal"">") iPosA = InStr(iPosA, gTemp$, ">") iPosB = InStr(iPosA, gTemp$, "<") tmp$ = Mid$(gTemp$, iPosA+1 To iPosB-1) iCount = Val(tmp$) 'get all of the extra pages ... 2 through iCount For i = 2 To iCount extraURL$ = baseURL$ + "/page" + Trim$(Str$(i)) iResult = GetThread(extraURL$, temp$, 0) If iResult = 200 Then gTemp$ += $CrLf + "--gbThreadBuilder--" + Format$(i,"0000") + $CrLf + temp$ Next i End Sub Sub LoadForumNames Local i,FI As Long, temp$ FI = FreeFile ReDim Forums(1 To 100) 'zero Open Exe.Path$ + "filesB\forums.txt" For Input As #FI While IsFalse Eof(FI) Line Input #FI, temp$ Incr i Forums(i).short = Trim$(Parse$(temp$,":",1)) Forums(i).shortL = LCase$(Forums(i).short) Forums(i).forumname = Trim$(Parse$(temp$,":",2)) Forums(i).status = Val(Parse$(temp$,":",3)) Forums(i).tabs = Val(Parse$(temp$,":",4)) Forums(i).tcount = Val(Parse$(temp$,":",5)) Forums(i).pcount = Val(Parse$(temp$,":",6)) Wend Close #FI ReDim Preserve Forums(1 To i) End Sub