SaltyCrane Blog — Notes on JavaScript and web development

Calling JavaScript from Python to de-CloudFlare scraped content

Yesterday I wrote a script to scrape my own web page because I screwed up the CSV export feature and Product needed the data. One problem was that the CloudFlare CDN obfuscated the email addresses on the page. My solutioncrazy hack: running a Node.js script to de-obfuscate the email from my Python scraping script.

Example obfuscated email stuff from CloudFlare:

<a href="/cdn-cgi/l/email-protection#d4b7b5a6b194a4b1a0e7e2e4fab7bbb9">
    <span class="__cf_email__" data-cfemail="0162607364417164753237312f626e6c">[email&#160;protected]</span>
    <script cf-hash='f9e31' type="text/javascript">
     /* <![CDATA[ */!function(){try{var t="currentScript"in document?document.currentScript:function(){for(var t=document.getElementsByTagName("script"),e=t.length;e--;)if(t[e].getAttribute("cf-hash"))return t[e]}();if(t&&t.previousSibling;){var e,r,n,i,c=t.previousSibling,a=c.getAttribute("data-cfemail");if(a){for(e="",r=parseInt(a.substr(0,2),16),n=2;a.length-n;n+=2)i=parseInt(a.substr(n,2),16)^r,e+=String.fromCharCode(i);e=document.createTextNode(e),c.parentNode.replaceChild(e,c)}}}catch(u){}}();/* ]]> */
    </script>
</a>

Using jsbeautifier.org, I adapted the JavaScript from above into this Node.js script, decloudflare.js.

var e, r, n, i, a = process.argv[2];
for (e = "", r = parseInt(a.substr(0, 2), 16), n = 2; a.length - n; n += 2) i = parseInt(a.substr(n, 2), 16) ^ r, e += String.fromCharCode(i);
console.log(e);

Example usage:

$ node decloudflare.js 0162607364417164753237312f626e6c
[email protected]

I used the Naked library (thanks to Sweetmeat) to call the Node.js script. (Though probably I could've just used the subprocess module.)

$ pip install Naked 
from Naked.toolshed.shell import muterun_js

def decloudflare_email(cfemail):
    resp = muterun_js('decloudflare.js', cfemail)
    return resp.stdout.rstrip()

cfemail = '0162607364417164753237312f626e6c'
print 'cfemail from python: ' + cfemail
email = decloudflare_email(cfemail)
print 'email from python: ' + email
cfemail from python: 0162607364417164753237312f626e6c
email from python: [email protected]

Comments


#1 chintan suthar commented on :

Is it possible in C# without javascript?

disqus:3586963801