General :  K-Meleon Web Browser Forum
General discussion about K-Meleon 
Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 10, 2023 09:09PM

I would appreciate any feedback. I am trying to save pages one-by-one from http://www.otzar.org/book.asp?104414. My K-Meleon could do that before but no longer. I upgraded to the latest version 76.4.8 on Goanna 3.5.0 (build 20230805). It cannot even load the page. And in other browsers - even with JS disabled - the right-click menu never gives an option to save pics. How do I fetch the images out then?



Edited 2 time(s). Last edit at 08/11/2023 11:22PM by JohnHell.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 10, 2023 10:38PM

Rather impossible.

The code they have implemented obfuscates the images to the point of serving incomplete PNGs, among other things.

Don't bother, with any browser.

Well, at least I won't bother. Too much hassle even for a couple of images.


Disabling JS in other browsers won't work because they don't actually disable JavaScript the way K-Meleon does. They have to reload the page for the change to take effect, and then the page won't load, because it relies on JavaScript.



Edited 1 time(s). Last edit at 08/11/2023 11:23PM by JohnHell.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 11, 2023 08:02PM

JohnHell
When you mention “incomplete PNGs,” why do they still show up on screen? I have been under the impression that you can save stuff as long as browsers show it. By the way, I know that JS itself can be instantly disabled in almost any browser.



Edited 1 time(s). Last edit at 08/11/2023 11:23PM by JohnHell.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 11, 2023 09:46PM

Quote
Shaina Peretz
JohnHell
When you mention “incomplete PNGs,” why do they still show up on screen?

Because the image is reconstructed by a JavaScript function. The data returned by the HTTP request for each scanned page is an incomplete PNG, either missing its headers or encrypted (I didn't dig that deep).

EDIT:
To give an example, if you download the first page (or rather, the response to the request for that image):
https://tablet.otzar.org/api/images/104414/S0001?c=a70bc483d47b6e8d5ae36f25955d3038

You'll find that the end of the response data is that of a PNG, but the headers aren't those of a real PNG. Without the header it is just useless data.

The "c" parameter might be an MD5 or SHA-1 hash of the actual image, I guess (at 32 hex characters, it has the length of an MD5 digest).
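One way to check a saved response for this is to compare its first and last bytes with the fixed values every PNG carries: the 8-byte signature at the start and the empty IEND chunk at the end. A minimal sketch, not taken from the site's code; the function names and the sample bytes are made up for illustration.

```javascript
// Every PNG file starts with this fixed 8-byte signature...
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// ...and ends with an empty IEND chunk: length 0, type "IEND", fixed CRC.
const PNG_TRAILER = [0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82];

// True if `bytes` begins with the byte values in `pattern`.
function startsWith(bytes, pattern) {
  return bytes.length >= pattern.length && pattern.every((b, i) => bytes[i] === b);
}

// True if `bytes` ends with the byte values in `pattern`.
function endsWith(bytes, pattern) {
  const offset = bytes.length - pattern.length;
  return offset >= 0 && pattern.every((b, i) => bytes[offset + i] === b);
}

// Report which of the two PNG markers a downloaded response carries.
function inspectPng(bytes) {
  return {
    hasSignature: startsWith(bytes, PNG_SIGNATURE),
    hasIendTrailer: endsWith(bytes, PNG_TRAILER),
  };
}

// Made-up sample: a valid trailer, but the first 8 bytes are not the signature.
const sample = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, ...PNG_TRAILER]);
console.log(inspectPng(sample));
```

In Node.js, a saved response could be checked with inspectPng(require('fs').readFileSync(path)), where path is wherever the response was saved. A result with hasSignature false and hasIendTrailer true would match the description above.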



Quote
Shaina Peretz
I have been under the impression that you can save stuff as long as browsers show it.

Sometimes you need to invest time, too much time; it is not simply that what you see is what you can download.

Web developers are trying harder and harder to limit the user options.

Quote
Shaina Peretz
By the way, I know that JS itself can be instantly disabled in almost any browser.

As you can see, not out of the box the way K-Meleon can.

On the other hand, that is about Pale Moon. So maybe there is a bit of light.

You could try the developer console in other browsers (I mean Firefox, I don't care about Chrome-based ones), as the Reddit thread describes, and try to catch the actual images from there, if it catches them. I haven't tried.


It is all a matter of how much time you want to invest in getting the images one by one.

If you don't mind, I'll split this "subthread", as it is a bit off-topic here :)



Edited 4 time(s). Last edit at 08/11/2023 11:24PM by JohnHell.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 11, 2023 10:09PM

I’ve seen that method of obfuscation at some Manga sites, where the image gets reconstructed in the browser itself. But in the end it is still reconstructed in a way we can see it. Thus, there must be a way to save it. I am a staunch user of K-Meleon, but I tried instantly disabling JS in other browsers; even so, right clicks still never give any option to save pics. I tried Offline Explorer and some Manga downloaders... nothing has worked so far 😿



Edited 1 time(s). Last edit at 08/11/2023 11:24PM by JohnHell.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 11, 2023 11:27PM

On K-Meleon, even with this level of obfuscation, it would be quite simple if the site could load in it, but that is no help in this case.

Other browsers behave differently, and without macros or an extension to help, it can be a hassle.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 11, 2023 11:42PM

I could not get the latest K-Meleon to even open the site. Other browsers can. But even when the page loads, and JS is instantly disabled, right click never gives an option to save pics.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 12, 2023 01:31AM

The site simply uses very recent JavaScript APIs, so K-Meleon isn't the right choice here. Your best bet is to search, search and search until you find some method for other browsers.

Maybe I was too broad when I said "Rather impossible". Some way might exist, but you need to find it. I should have said "rather difficult", maybe.

Sorry, I can't help with that :(

EDIT:
From what I have seen now, the pages are drawn to canvas elements. You should look for a canvas-to-image converter extension.



Edited 2 time(s). Last edit at 08/12/2023 01:51AM by JohnHell.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 15, 2023 06:59PM

May I ask for additional tips? Here is the page loaded with JS disabled, therefore, right click is available. There is no option, as you can see, to save any image or convert canvas:



I tried uploading the PNG you provided to some fix-png-online sites; however, none of them was able to restore the image. I’ve also encountered several “canvas blockers” online, but I am unsure they can help here. I think they cannot. In addition, I searched for some handy extensions to perform canvas to image conversion, but found none so far. I also tried (as a last resort) various screenshot grabbers that allow capturing the entire scrolling area, and none of them worked here because otzar.org has something that prevents scroll capturing...

The funny thing about otzar.org is that although they pose as pious and God-fearing men, the majority of books they offer are either under no copyright because they were printed hundreds of years ago and should be free-for-all (as in my example), or otzar.org books still have copyright because they were published just a few years ago by other people, but computer-illiterate ghetto authors and publishers who produced them may not even know their works have been pirated. I won’t even mention that OCR’d pages at otzar.org are full of errors. Yet otzar.org has a hefty price for any stuff they sell! As Ecclesiastes says, he that loves silver shall not be satisfied with silver.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 15, 2023 09:18PM

Quote
Shaina Peretz
May I ask for additional tips? Here is the page loaded with JS disabled, therefore, right click is available. There is no option, as you can see, to save any image or convert canvas:

I know. I didn't mean to say (nor do I know) that there was an option out of the box :)


You have this extension for Chrome based browsers:
https://chrome.google.com/webstore/detail/canvas2png/jinndaoajfaliklhhkgmmmpdhikeicaj

Or you have this "dirty" bookmarklet:
https://stackoverflow.com/questions/923885/capture-html-canvas-as-gif-jpg-png-pdf#comment-34787091


So there are options after all; I found these in a search just now.


The latter might work if adapted to select every canvas and open a window with the new image for each one.


Quote
Shaina Peretz
The funny thing about otzar.org is that although they pose as pious and God-fearing men, the majority of books they offer are either under no copyright because they were printed hundreds of years ago and should be free-for-all (as in my example), or otzar.org books still have copyright because they were published just a few years ago by other people, but computer-illiterate ghetto authors and publishers who produced them may not even know their works have been pirated. I won’t even mention that OCR’d pages at otzar.org are full of errors. Yet otzar.org has a hefty price for any stuff they sell! As Ecclesiastes says, he that loves silver shall not be satisfied with silver.


Even though you are right, they have the right to request payment for their work on scanning the books and making them accessible to the public.

I guess they are not charging for the book, but for the work behind it.

I'm not judging whether it is fair or abusive, just describing what could be happening.



Edited 1 time(s). Last edit at 08/15/2023 09:19PM by JohnHell.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 16, 2023 09:09PM


Hmm... Both have not worked. And with the bookmarklet it just opens a blank window, at least with this code:
Quote

javascript:void(window.open().location = document.getElementsByTagName("canvas")[0].toDataURL("image/png"))
I am usually clueless when it comes to programming code, yet I tried changing [0] above to other numerals, but got blank pages anyhow.

Quote

Even though you are right, they have the right to request payment for their work on scanning the books and making them accessible to the public.
I guess they are not charging for the book, but for the work behind it.
I'm not judging whether it is fair or abusive, just describing what could be happening.

I hate being judgemental; however, I have not been happy with otzar.org’s capitalistic approach. I’ll elaborate: Bar Ilan 31+ (also known as the “world’s largest” database) is currently sold for $1,389.95, while Otzar HaHochma School Version 21 is being sold for $2,995.95. As you can see, the diff is $1,606! Bar Ilan’s database has been produced since the second part of the 20th century, whereas Otzar HaHochma is a relatively new critter. Texts at otzar.org are quickly scanned (and some pages are even missing) and OCR’d for pennies by college students working for Big Bosses. On the other hand, Bar Ilan is a much respected university that pays decent salaries to people hired. Besides, loads of OCR errors at otzar.org make any search ineffective. Well, even the quick comparison chart shows Bar Ilan’s superiority. Of course, anyone may legally make money if no laws are broken, still... consumers who buy from otzar.org are mostly poor people who aspire to that ancient wisdom (these aren’t millions who go to watch Barbie these days). Thus, is it right to charge so much even for a “School Version” when the original investment has been fully recompensed many times long ago? If I were in charge of that enterprise, I would be more people-friendly, but that is just me:


Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 16, 2023 09:45PM

Quote
Shaina Peretz

Hmm... Both have not worked. And with the bookmarklet it just opens a blank window, at least with this code:
Quote

javascript:void(window.open().location = document.getElementsByTagName("canvas")[0].toDataURL("image/png"))
I am usually clueless when it comes to programming code, yet I tried changing [0] above to other numerals, but got blank pages anyhow.
javascript:(function(){var%20canvas%20=%20document.querySelectorAll('.img-canvas');%20if(canvas.length%20>%200){for(var%20x=0;%20x%20<%20canvas.length;%20x++){%20window.open().location%20=%20canvas[x].toDataURL('image/png');}%20}%20})();

Try the above, but keep 2 things in mind:

- it is very CPU- and RAM-intensive;
- I found that the page has only 3 active canvases, so it handles only 3 pages at a time.


As I previously said, all depends on the time you want to invest.

If you can't copy it correctly, quote this message and copy from the editor.


About the other thing, as I said, "even though you are right", I was just clarifying the possible reasons.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 16, 2023 10:18PM

Cannot make it work. I created a button with your code, but nothing happened when clicking on it. I added alert('over') after your code, and the message showed up, meaning that the script gets executed correctly.

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 17, 2023 01:55AM

It worked on roytam's New Moon but, as I said, it is extremely CPU- and RAM-intensive (the image has to be encoded to base64 and then decoded back).
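
To put the base64 cost in rough numbers: base64 turns every 3 bytes into 4 characters, so the data URL handed to the new window is about a third larger than the PNG itself, and the browser then has to decode it again. A small sketch of the arithmetic; the 3,000,000-byte page size is a made-up figure, not a measurement from otzar.org.

```javascript
// Number of base64 characters needed to encode `byteCount` raw bytes (padding included).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Length of a complete "data:image/png;base64,..." URL for a PNG of that size.
function dataUrlLength(byteCount) {
  return 'data:image/png;base64,'.length + base64Length(byteCount);
}

// Hypothetical example: one scanned page of 3,000,000 bytes.
const pngBytes = 3000000;
console.log('PNG bytes:', pngBytes, 'data URL characters:', dataUrlLength(pngBytes));
```

With three canvases converted per click, that cost is paid three times over.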

On Gecko/Goanna engine browsers, temporarily disable CSP, just in case (in about:config, filter by security.csp.enable and set it to false before opening the site).

If it still doesn't work, I have no clue.

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 17, 2023 07:07PM

Do you know how to add it to Pale Moon, not New Moon? I lack much JS knowledge to write the button code. In my case, at least, it must start with “Components.interfaces” to be able to tweak the contents of a browser window below, and I am unsure how to do that. Yet if I take a static page with canvas and add your code like this
Quote

<html>
<head>
<title>GitHub</title>
<style>.img-canvas{background-color:white;}body{background-color:white;}button{background-color:lightgreen;margin-left:20pt}</style>
</head>
<body>
<canvas class='img-canvas' id='img-canvas'></canvas><br>
<script>
var canvas,savnac;canvas=document.getElementById('img-canvas');savnac=canvas.getContext('2d');savnac.strokeStyle='lightgreen';savnac.lineWidth=36;savnac.beginPath();savnac.arc(70,70,50,0,Math.PI*2);savnac.stroke();
function canvas2img(){
var canvas = document.querySelectorAll('.img-canvas'); if(canvas.length > 0){for(var x=0; x < canvas.length; x++){window.open().location = canvas[x].toDataURL('image/png');}}
}
</script>
<button onClick='canvas2img()'>canvas2img</button>
</body>
</html>
then it indeed opens a new window with a converted image. Since so far I could not make the button code work, I tried to manually add the code to loaded pages of otzar.org through browser’s web inspector. But then no new window opens up and, besides, all canvas images in the original page instantly disappear. Strange!

Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 17, 2023 10:35PM

It is easier than that ;)

A bookmarklet is just that: a bookmark with a line of JavaScript. Create a new bookmark and put the full "javascript:blablabla" line in its URL field.

Then, just click on it.

It will open 3 tabs/windows with the canvases available at that moment. The site recycles them while you "scroll down" the pages.
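
For anyone building such a bookmark from readable code, a small helper can do the wrapping and escaping. This is only a sketch: toBookmarklet is a made-up name, and the snippet passed to it is the loop posted earlier in this thread.

```javascript
// Turn a snippet of JavaScript into a one-line bookmarklet URL.
// The code is wrapped in an immediately-invoked function so its variables
// stay out of the page's global scope, then percent-encoded.
function toBookmarklet(source) {
  return 'javascript:' + encodeURIComponent('(function(){' + source + '})();');
}

// The canvas loop from this thread, written out readably.
const source =
  "var cx = document.querySelectorAll('.img-canvas');" +
  "for (var x = 0; x < cx.length; x++) {" +
  "  window.open().location = cx[x].toDataURL('image/png');" +
  "}";

// Paste the printed line into the bookmark's URL field.
console.log(toBookmarklet(source));
```

The percent-encoding matters because spaces and some punctuation can get mangled when a raw line is pasted into a bookmark field.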

Re: Help download images from otzar.org
Posted by: Shaina Peretz
Date: August 18, 2023 09:00PM

Thank you, John, it has worked! Of course, bookmarklets, and they are cross-browser compatible too! My head was surely somewhere else — for I never thought in terms of them but browser APIs and script injections. You are the best! I slightly updated your code to prevent its accidental launch because once initiated, it really takes a loooooooooong time to complete:
javascript:(()=>{if(confirm('Would%20you%20like%20to%20perform%20canvas\nto%20image%20conversion%20for%20otzar.org%3F')==true){var%20cx=document.querySelectorAll('.img-canvas');if(cx.length>0){for(var%20x=0;x<cx.length;x++){window.open().location=cx[x].toDataURL('image/png')}}}})();
To all who see this post by the time greedy otzar.org bosses update their CSS to tweak “img-canvas” class: Just open the page source code, see what they have changed the class name to, and update the script above accordingly. One note: The script indeed works on three images at a time, however, it may process pages in an odd way. For example, if you scroll down to page 7 and launch the script, you may get pages 7, 8, and... 5! That, I assume, is because of the browser’s current cache. So, watch out! Of course, it would be much better to have some fast auto downloading script for all pages, but John’s is a piece of cake too!
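
As a step toward the auto-downloading script wished for above, the same loop can hand each canvas to the browser as a file download instead of opening a window per image. This is only a sketch and untested against otzar.org: saveCanvases and the file-name prefix are made-up names, the '.img-canvas' selector is the one from this thread, and it still only reaches the canvases that exist at that moment.

```javascript
// Save every canvas matching `selector` as a PNG download instead of opening
// a window per image. `doc` is the page's document; `prefix` names the files.
function saveCanvases(doc, selector, prefix) {
  var canvases = doc.querySelectorAll(selector);
  var saved = [];
  for (var i = 0; i < canvases.length; i++) {
    var link = doc.createElement('a');
    link.href = canvases[i].toDataURL('image/png');
    link.download = prefix + '-' + (i + 1) + '.png';
    // Some browsers only honour click() on links attached to the page.
    doc.body.appendChild(link);
    link.click();
    doc.body.removeChild(link);
    saved.push(link.download);
  }
  // Return the file names that were requested, in canvas order.
  return saved;
}

// In the browser console: saveCanvases(document, '.img-canvas', 'otzar');
```

The files are numbered in canvas order, not page order, so the odd ordering described above still applies. The browser may also ask for permission before downloading several files at once.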



Re: Help download images from otzar.org
Posted by: JohnHell
Date: August 18, 2023 09:48PM

Credits to the original author at stackoverflow, not me.

I assume the page-loading behaviour depends on the power of your machine and the network bandwidth on both ends when retrieving new pages.

As I understand it, they try to keep the current, previous and next pages available to mimic "smooth scroll reading", and if something is slow, some canvases might not get recycled in time.


Oddities of overprotection.


