infosec miscellany programming python technical

Bing IP Search via Python

Some time ago I wrote a small Python script to run an IP: search on Bing and parse out the results. You can get it from the GitHub repo here. I had to work around some odd/broken behaviour from Bing along the way.

The Bing IP: operator lets you search for an IP address and returns results from any sites sharing that IP. It can be useful, but it was clunky to do manually, so I wrote a script to do it all in the console.

I haven’t used it in years, but recently thought to dust it off and publish it on GitHub, so after searching through some old folders I dug out the most recent version I could find. Updating it to Python 3 was a simple matter of converting a few print statements to print() calls, but I noticed some other problems which took longer to debug.

Firstly, the script worked but the result set was really short. After some manual testing I discovered this was due to some apparent bugs/faults in Bing itself which had developed since I first wrote the script.

In general, the Bing IP search seems to be quite neglected: if you visit the default search form at bing.com and enter, for example, ip:204.79.197.200 (bing.com’s own address), it returns an empty page (really empty – a blank, 0 KB HTTP response body). Removing all but the actual query parameter from the URL causes the page to render, so you end up with a URL like this, which works:

https://www.bing.com/search?q=ip%3A204.79.197.200

Which is what the script uses.

The main problem, however, was the truncated result set. The script tries to load more pages, but beyond the first page they are empty – even with the parameter-stripping hack that works on the first page, the rest come back as an empty response. It turns out that loading additional result pages requires the extra URL parameter ‘first’ (eg first=11 to start from result 11), and it appears that adding any parameter besides q when ‘ip:’ is the only term in the query breaks the site.

I confirmed this behaviour with a simple test which resulted in an empty response:

https://www.bing.com/search?q=ip%3A204.79.197.200&foo=bar

So something is definitely breaking on the Bing backend, and it happens when IP: is used alone as the query together with additional GET parameters. However, I found that everything worked as expected as long as IP: was not the only search term in the query parameter itself. I still wanted just the results from the IP search without any modifiers/filters, so I tested out some Bing search operators and came up with these workarounds, which got the additional pages loading as advertised:

  • () ip:204.79.197.200
  • ip:204.79.197.200 OR ip:204.79.197.200
  • +ip:204.79.197.200

The () operator means ‘include these search terms’. I left it empty to see how it behaved: it partially worked around the page-loading bug. Same with the OR operator, although both of these returned fewer results than the last one I tried (+), so I stuck with that.

The (+) operator simply means ‘this term must be included’. Seems redundant with only one search term, and I wasn’t sure how it would behave when applied to another search operator, but it worked and returned more results than the previous attempts so I settled on that.
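To make the workaround concrete, here is a minimal sketch of how the page URLs could be built. It is not the script’s actual code: the function name bing_ip_url is hypothetical, and it assumes 10 results per page (so page 2 starts at first=11, matching the example above).

```python
from urllib.parse import urlencode

BING_SEARCH = "https://www.bing.com/search"

def bing_ip_url(ip, page=0, per_page=10):
    """Hypothetical helper: build a results-page URL for an IP: search.

    Uses the '+ip:' form so that adding the 'first' parameter on later
    pages does not trigger the empty-response bug. Assumes per_page
    results per page, so page 1 (the second page) starts at first=11.
    """
    params = {"q": f"+ip:{ip}"}
    if page > 0:
        params["first"] = page * per_page + 1
    # urlencode percent-encodes '+' as %2B and ':' as %3A
    return f"{BING_SEARCH}?{urlencode(params)}"

for page in range(3):
    print(bing_ip_url("204.79.197.200", page))
```

Note that the literal + has to be percent-encoded as %2B; left bare in a query string, it would be decoded as a space and the operator would be lost.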

You can check it out on github.

programming technical

A time traveller’s primer to Web Development in the 21st Century

This post started as an email aimed at untangling some concepts around modern web development for a friend. Like myself, he had spent quite some time in infrastructure roles away from the web development space and found understanding the new scene to be a bit of a hurdle.

Understanding how non-traditional web apps work became necessary for me as a pentester, and I had been digging into the topic for a while by this point, so I wrote this primer mostly as a brain-dump and a way to organise some of the concepts I knew about.

Caveats: opinionated; I am not a developer, so I very likely get things wrong. May be vaguely insulting to front end devs, but we are all points on the same curve and I really, honestly bear you no ill will. I think modern web apps are wondrous.

So, you’ve just arrived in the future.

You just stepped out of the time machine after arriving from the late 90’s, or early 2000’s. You have, or had, some coding skills, maybe ASP or probably PHP. You even did some development, maybe hacking a few WordPress themes and plugins, or creating some custom websites (this was before they were called ‘web apps’). You set up Windows servers with IIS, or Linux with Apache. The LAMP stack was the main scene. JavaScript and CSS were just starting to be more of a thing. You mostly understood how it all worked.

Then you spent the next 15 years happily doing infrastructure or some other non web-dev role. This was your time machine, and now you’ve emerged, you’re dipping your foot back in the web development scene, and things are… different.

Culture shock is a thing

If you’re just getting (re)started, there’s a big gap between how things were and the current state of the art, and it never stops moving, so it’s no wonder it can be disorienting getting back on board. The new scene can seem very different. There is help and there are friendly faces everywhere, and quite a few old school people here too, but the current generation has brought a whole new language and culture to navigate.

The good news is, after you’ve learned to interpret the new colloquialisms and peeled back the labels on what’s trendy to see what’s underneath, what you’ll find is still good old-fashioned programming (and server) paradigms. The stuff you learned back in class doing C, PHP, even Java is still relevant, even if only by translating the concepts to the current languages.

Frameworks for old people

After being used to coding my own stuff with the blinkers on for a long time, I started out being fairly old-man suspicious of frameworks. ‘I don’t want no layer of hippie abstraction getting in between me and the real code, get off my lawn’ and so on. This was based in ignorance, and once I realised what frameworks are and how they actually work, I got over that fast. 

Simply, a framework is a set of libraries – pre-written chunks of code – to do common and useful jobs in your app. A framework includes a whole set of functions, classes and so on done in the language of choice, so when you start making a (web) app, you don’t have to reinvent really common wheels, taking your base language and writing something to do the same job, except badly. Also if it’s an open source framework, it benefits from all the community eyes on the code, especially in areas like efficiency and security.

Effectively, if you make a web app *without* using an existing framework, you’re going to be writing parts of your own framework. Except it probably won’t be as good as what’s out there already.

This quote springs to mind and I think is a good analogy when comparing any particular language to a decent framework written in it:

“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

Some examples of common jobs a framework might take care of for you:

  • Handle database connections to whatever server you want (mysql, mssql, postgres, etc), generate schema, and interface with your code via a nice API. So you don’t have to write a bunch of low level database calls, construct an SQL query out of some variables sent over as part of a web request, and introduce some horrible security bugs in the process.
  • Take some field names and create a html form in the page. Or create a form straight out of a database record with everything setup so you can hit ‘submit’ and update the data in the backend.
  • Handle authentication, so you just tell it that ‘this part of the app’ or ‘access to URLs matching that pattern’ needs a valid user logged in, and it will deal with storing credentials in the database, prompting the user for username and password, and doing magical things with browser cookies, so you the coder don’t have to. (Unless you want to).
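As an illustration of the first point, here is a small sketch using Python’s built-in sqlite3 module. It compares a hand-rolled query built by pasting request data into the SQL string with a parameterised query, which is roughly what framework database layers (ORMs) typically generate under the hood. The table, users and functions are made up for the example.

```python
import sqlite3

# Throwaway in-memory database with two hypothetical users
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

def find_user_unsafe(name):
    # Hand-rolled: request data pasted straight into the SQL (injectable)
    sql = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()

def find_user_safe(name):
    # Parameterised: the driver treats `name` strictly as data, never as SQL
    sql = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()

malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # leaks every row in the table
print(find_user_safe(malicious))    # no user has that literal name, so []
```

A framework saves you from having to remember this every single time, which is exactly where hand-rolled code tends to slip.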

So frameworks are good.

API, oh my

Last time you heard the term “API” was back in your Computer Science class, right? But you’re safe, because it has to do with systems programming or something you’re unlikely to be doing. Actually, no. In modern web apps, APIs are a big thing.

Quick refresher: an API (Application Programming Interface) is a high level term for a relatively friendly layer between your code and the actual work being done. A programming language or framework could have an API in the form of function calls within the code. So instead of, for example, having to work out how to convince the runtime environment (eg operating system, web browser, etc) to draw a circle on the screen via a highly granular, low level method doing something horrific and error prone like directly manipulating video memory, there would be a higher level API exposed which you could call instead, something like: draw_shape('circle', 'large', 'green', 'middle'), which would draw a large green circle in the middle of the screen.
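A toy sketch of that idea in Python. Everything here is hypothetical: the ‘video memory’ is just a grid of characters, _poke stands in for the low level, error prone primitive, and draw_shape is the friendly API layered on top of it.

```python
# Hypothetical "video memory": a grid of characters standing in for pixels
WIDTH, HEIGHT = 21, 11
video_memory = [[" "] * WIDTH for _ in range(HEIGHT)]

def _poke(x, y, value):
    """Low level primitive: write one 'pixel' directly into video memory."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        video_memory[y][x] = value

# Lookup tables translating friendly words into low level details
SIZES = {"small": 2, "large": 4}
COLOURS = {"green": "g", "red": "r"}
POSITIONS = {"middle": (WIDTH // 2, HEIGHT // 2)}

def draw_shape(shape, size, colour, position):
    """High level API: callers never touch video_memory or _poke directly."""
    if shape != "circle":
        raise ValueError(f"unsupported shape: {shape}")
    r = SIZES[size]
    cx, cy = POSITIONS[position]
    # Fill every pixel whose distance from the centre is within the radius
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                _poke(x, y, COLOURS[colour])

draw_shape("circle", "large", "green", "middle")
print("\n".join("".join(row) for row in video_memory))
```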

In the same way, a modern web application uses Web APIs in the form of dedicated URLs to get things done in the app. One such URL might look like: https://my.web.site/api/search?query=test which is an interface to a search function on the site, which might be expected to return some useful data. Modern web apps make heavy use of Web APIs, which we will talk about a bit more soon.
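Here is a minimal sketch of what could sit behind a URL like /api/search?query=test, written as a plain WSGI app using only the Python standard library. The endpoint path matches the example URL; the ITEMS data and the substring matching are assumptions made up for illustration.

```python
import json
from urllib.parse import parse_qs

# Hypothetical data standing in for whatever the app would really search
ITEMS = ["test page", "another test", "unrelated"]

def app(environ, start_response):
    """A tiny WSGI app exposing one Web API endpoint: /api/search?query=..."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/api/search":
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0]
    # Naive substring search; a real app would query a database here
    results = [item for item in ITEMS if query and query in item]
    body = json.dumps({"query": query, "results": results}).encode("utf-8")
    start_response("200 OK", headers)
    return [body]
```

The response is data (JSON) rather than a rendered page, which is the key difference from a traditional server-rendered site: whatever calls the API decides how to present it.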

Time for some higher level design abstraction

Next, let’s talk about the term ‘MVC’ (Model-View-Controller). MVC is a design pattern used to separate logical functionality in the code. It’s a useful idea and lots of frameworks are ‘MVC’ these days (or MVP, or MVVM – but let’s just stick with getting the idea of MVC for now). This is just a way of saying the framework conforms to the de facto standard of providing roughly three main chunks of functionality:

  • Database handling (model)
  • User interface stuff (view)
  • Everything else (controller)

‘Model’ deals with functions to access the database. ‘View’ handles input from the user and creates interfaces. The third element ‘Controller’ is code you write to actually do things in your app, especially stuff which might not be directly related to the framework.

For example, I might make a simple app to check remote RSS feeds for me and render them to an HTML table. I write some controller code which does some web requests on demand to download the feeds from a remote site and pass the data to an API on the model interface to shove it in the database. Then when a user comes along to use the app, some view code generates a welcome screen, followed by an ‘update remote feeds’ button the user can push which calls the controller code, followed by a nice HTML presentation of the data (which it pulled back out of the database via the model interface).

Clear. As. Mud. So that’s MVC.
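If it helps, here is the RSS example squeezed into a framework-free Python sketch. The class and function names are hypothetical, and to keep it runnable offline the controller is handed the feed XML as a string instead of downloading it from a remote site.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

class FeedModel:
    """Model: the only code that touches the database."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, link TEXT)")

    def save_items(self, items):
        self.conn.executemany("INSERT INTO items (title, link) VALUES (?, ?)", items)

    def all_items(self):
        return self.conn.execute("SELECT title, link FROM items ORDER BY rowid").fetchall()

def update_feed(model, rss_text):
    """Controller: app logic. The real app would fetch rss_text over HTTP."""
    root = ET.fromstring(rss_text)
    items = [(item.findtext("title", ""), item.findtext("link", ""))
             for item in root.iter("item")]
    model.save_items(items)
    return len(items)

def render_table(rows):
    """View: turns model data into HTML; knows nothing about RSS or SQL."""
    cells = "".join(f'<tr><td><a href="{escape(link)}">{escape(title)}</a></td></tr>'
                    for title, link in rows)
    return f"<table>{cells}</table>"

SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>https://example.com/1</link></item>
<item><title>Second &amp; last</title><link>https://example.com/2</link></item>
</channel></rss>"""

model = FeedModel(sqlite3.connect(":memory:"))
update_feed(model, SAMPLE_RSS)       # what the 'update remote feeds' button would trigger
print(render_table(model.all_items()))
```

Each piece can be swapped out independently: a different database only touches FeedModel, a new page design only touches render_table.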

Back to the tools

Next, nuts and bolts: languages, web servers, and so on. There are a lot of labels on everything, so brace yourself. React! Angular! Rails! NGINX! Python! DotNet! Node! Java! Full Stack! Webpack! Ruby! Javascript! CoffeeScript! JSX! JSON! And that’s just the start. It’s difficult to cleanly categorise; it’s more of a Venn diagram, but here goes:

Frameworks and Servers

Frameworks and webservers – surprisingly, the lines blur. A framework can be ‘full stack’ and implement its own webserver – after all it’s just code, it can do whatever it wants, including serving HTTP requests. Usually, though, there is dedicated code which is much better at being a webserver than the framework itself, often a server like Apache, NGINX or IIS, so most often the framework just hangs off a ‘real’ webserver, which has an interface (eg WSGI, in the Python world) to talk nicely with your app/framework so they can live together and each part can get on with doing what it does best.

But, it’s useful to keep in mind the framework can be the webserver as well. Django has its own development server built in, for example (with a dedicated webserver recommended for production). When you run this you’re basically running a python script which fires up some (python) webserver code which relays web requests back and forth to the django code. It’s functionally the same as you’d experience in production but a lot easier to run and debug, so that’s where you do your development.
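To see how thin that layer can be, here is a sketch using Python’s built-in wsgiref server, a bare-bones cousin of a framework’s development server (it is not Django’s actual runserver code). The app function stands in for the framework’s WSGI entry point; request_once is a hypothetical helper that serves a single request so the round trip can be seen end to end.

```python
import json
import threading
from urllib.request import urlopen
from wsgiref.simple_server import make_server

def app(environ, start_response):
    """Stand-in for a framework's WSGI entry point."""
    body = json.dumps({"path": environ["PATH_INFO"]}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

def run_dev_server(port=8000):
    """Blocks forever, relaying each HTTP request to app() and the response back."""
    with make_server("127.0.0.1", port, app) as httpd:
        print(f"Serving on http://127.0.0.1:{port}/ (Ctrl+C to stop)")
        httpd.serve_forever()

def request_once(path):
    """Serve exactly one request on a free port and return the decoded JSON."""
    with make_server("127.0.0.1", 0, app) as httpd:
        worker = threading.Thread(target=httpd.handle_request)
        worker.start()
        with urlopen(f"http://127.0.0.1:{httpd.server_port}{path}") as resp:
            data = json.loads(resp.read())
        worker.join()
    return data
```

Calling run_dev_server() and browsing to http://127.0.0.1:8000/anything should echo the path back. In production, the same app callable could be handed to a proper WSGI server instead, with no change to the app itself.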

(I’m going to talk about JavaScript frameworks in a minute as a few rules are broken, but we’re getting there).

Languages and Frameworks

Languages. You have a bunch, I guess you can make a framework in anything you like, but here are some of the key players:

  • Microsoft: You may have heard of them. They have a phat application framework called .NET. The subset of this which is used for web apps is called ASP.NET. .NET supports creating applications in a bunch of languages (which it compiles to an intermediate language called CIL – Common Intermediate Language). ‘C#’ is one of these, and is often used as the coding language for .NET. C# is Microsoft’s mutated/evolved/‘inspired by’ version of C++ and Java.
  • Ruby: Open source general purpose language. Mostly known due to the hugely popular Ruby on Rails (RoR) web framework. RoR was ‘extracted’ from the work done on a commercial Ruby web app (Basecamp), when the author realised he could publish a lot of the app code he had written as reusable code in the form of a web framework. Voila; RoR was born and made many web devs happy.
  • Python: Open source general purpose language, very popular. A bunch of frameworks are written in Python, but probably the best known / most popular / de facto standard is ‘Django’. As far as web frameworks go, RoR and Django are very similar aside from obviously using different languages and having some different philosophical quirks about how things get done. It’s a complex topic, no doubt the subject of many an internet forum holy war, but the difference between Django and Rails basically boils down to how much ‘magic’ there is in each framework – Django expects you to get your hands a bit greasier with the nuts and bolts while RoR is more focused on making decisions for you in order to get things working out of the box.
  • Java: Yes, Java is used for more than just slow performing desktop apps and browser applets from the 90’s with strange non-native user interface widgets. It’s popular in backend applications as well, including web frameworks, and in my experience particularly for Enterprise apps. I’m really not sure why; possibly there are a lot of venerable old Java devs out there keeping it ticking over while the young folks use newer tech? Spring seems to be one of the more popular frameworks, alongside Tomcat as the server of choice most of the time.
  • PHP: The venerable 90’s beast of yore. Word on the internet is PHP doesn’t seem to be considered a real contender anymore (if only from a security standpoint) – cue uproar from PHP devs – sorry about that. Let me clarify: The vibe I get from within the cybersecurity and dev community in general is, PHP has some problems, and its competitors (Ruby, Python) are both more secure and much nicer to use. But hey, don’t let me stop you with a lot of misinformed negative talk – a lot of sites still run PHP, it is actively developed, I have fond memories of it from the early days, and if it is your language of choice, go for it. There are a heap of PHP web frameworks (eg Laravel, CakePHP) – I imagine the same general paradigms apply.

The browser scripting language that could

Ok. JavaScript gets its very own section.

JavaScript is obviously the language traditionally associated with running cutesy bits of user interface stuff inside a browser, like making a button pop when clicked, or some cool looking tooltip. JavaScript is an OO language which looks kind of like C and mostly nothing like Java. It’s colloquially considered to be a bad language, but as long as it was confined to being used to create UI widgets by harmless ‘web designers’, limited damage could be done. And it was probably going to stay that way, and who could blame it: it was a de facto standard written in ten days by a guy at Netscape in the 90s and it ran in BROWSERS; nobody expected anyone was in danger of creating Skynet with this thing anytime soon.

Then Google came along and basically strapped a rocket engine onto JavaScript by creating a hugely powerful and optimised engine called V8, mainly to make webpages faster in Chrome. V8 is Skynet. It compiles JavaScript to native machine code at runtime, and reoptimises it dynamically. This is how you can get entire virtual machines implemented in JavaScript in the freaking browser. Google, what darned heck hast thou wrought.

So, it became a feasible thing to run big chunks of JavaScript in the browser nice and fast. Front end coders rejoiced.

Which brings us to now

Entire MVC frameworks have thus been implemented in JS. Functionally, when the whole MVC framework is in the browser and you hit a site, it effectively dumps all or part of the app down the tube to your browser in a bunch of JavaScript files. After that, the webserver falls back to being an HTTP-talking database API endpoint, and the JavaScript app running in the browser talks to it by exchanging JSON or XML over HTTP requests. You don’t necessarily even have to be online for the web app to keep working, just like when you go offline and your Gmail session is still kind of functional – it’s still running everything locally in the browser.

Gmail, by the way, is built on Google’s own Closure JavaScript library, and Google also sponsors the open source JavaScript MVC framework ‘AngularJS’. Doing your web app this way makes heaps of sense if you have a huge user base, as it effectively pushes all the interface load off to the client.

Now that JavaScript runs fast, there is a lot more of it in use. The web interface designers have rebranded as ‘frontend developers’, and why not, since the language they code in can sure enough be used to make full apps, even ‘native’ desktop and mobile apps. There are a lot more open source JS projects available, running the gamut from simple DOM and widget libraries like jQuery, to user interface libraries like React (“The ‘V’ in MVC”), to full MVC frameworks like Angular.

On a side note, the promotion of JavaScript to major league status has coincided with an explosion in the number of JavaScript based frameworks and tools, of reportedly wildly varying stability, development and support wise. There are a handful of major ones which can be relied on – jQuery, Angular (Google), and React (Facebook) are a couple which spring to mind, backed by the big players, and sound advice seems to be: use those, at least when starting out, because they’re more likely to stick around.

Enter JavaScript on the Backend

After all this though, you’re still running JavaScript in the browser which is served by a ‘real’ backend webserver and database. 

With JavaScript on the V8 engine running like a ferret on meth, the real Pandora’s box was opened when a developer took V8 out of the browser and built a server-side JavaScript runtime around it, complete with the building blocks for a webserver, called NodeJS.

Disruptive generation gap

And depending on which development circles you run in, opinion on Node can be polarised. Node is a webserver with big ambitions, using a crap language made powerful by Google’s engine, populated by a new generation and community of developers, creating applications used by millions of people daily.

The arrival of Node made possible the incursion of interface designers into server side development, an event which may have ruffled some feathers among the old guard, to whom JavaScript was best constrained to the browser, but then when did technological progress ever obey the rules? It’s called ‘disruptive’ because it disrupts.

Some tech notes on Node

My understanding of Node is limited but I’ll sketch it out.

Node runs as the web server, and is the framework as well (a bit like Django in developer mode): a Node app typically starts up its own HTTP server from JavaScript. The Node engine itself is actually mostly implemented in C and C++, and it consumes JavaScript as its web app language. It is commonly paired with ‘NoSQL’ databases, which is another topic, but effectively these are non-relational databases which can run quite fast, with the tradeoff of being, well, non-relational. (I hear that NoSQL may also be seen as suspiciously hippie by some traditional DBAs).

Blocking? What blocking?

Node has a major advantage with certain types of applications when compared to apps written in more traditional languages: it is built to run code asynchronously around a ‘non-blocking event loop’. This effectively means it has a single thread which keeps spinning, processing whatever it can, rather than being blocked while it waits for some job to complete, like a network connection or a database call. So it can be very fast/efficient for apps which juggle a lot of concurrent I/O, such as many simultaneous connections. A bunch of big companies have jumped on the bandwagon, Node became flavour of the month, and a huge market sprang up for Node devs.
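The same single-threaded, non-blocking idea exists in Python as asyncio, so here is a sketch of it in Python rather than Node (a different runtime, same concept). The fake_io coroutine is hypothetical and just sleeps to stand in for waiting on a database or socket; three 0.2 second waits finish in roughly 0.2 seconds total rather than 0.6, all on one thread.

```python
import asyncio
import time

async def fake_io(name, delay):
    # 'await' hands control back to the event loop while this task waits,
    # so the loop can make progress on other tasks in the meantime
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # Start all three "I/O jobs" and wait for them together
    results = await asyncio.gather(
        fake_io("db", 0.2), fake_io("http", 0.2), fake_io("disk", 0.2)
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The catch is the flip side of the same design: a long CPU-bound calculation never hits an await, so it holds up everything else on that single thread.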

Packages packages everywhere

As a true badge of membership in the big leagues, Node has ‘NPM’ – node package manager (like apt or rpm, but for JavaScript modules). It is driven from the command line (just like apt-get), making it easy to use and popular with devs. Lots of code is shared via NPM, and interesting and critical chains of dependencies can eventuate, with bad consequences. Some of the Node packages in common use have been considered to reflect poorly on the developers publishing and using them, as these discrete libraries do very simple jobs which a good coder should generally be able to implement themselves with low effort. Padding strings, for example. (See also: ‘npmgate’).
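For a sense of scale, here is roughly what such a string-padding dependency does, sketched in Python. The left_pad function is written here for illustration and is not the npm package’s code; Python happens to ship the same behaviour built in as str.rjust.

```python
def left_pad(s, width, fill=" "):
    """Pad s on the left with `fill` until it is at least `width` characters long."""
    s = str(s)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    # max(0, ...) leaves strings that are already long enough untouched
    return fill * max(0, width - len(s)) + s

print(left_pad(5, 3, "0"))   # 005
print("5".rjust(3, "0"))     # 005, using the built-in
```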

Despite (or because of) its differences to traditional frameworks / engines, Node is very popular.

And that’s about it for now.

So there’s a brief overview of a few of the more popular moving parts in contemporary web application development. There is a lot more to dive into and the landscape evolves constantly (and I realise I haven’t even mentioned stuff like websockets, WebGL, asm.js, and so on). So there’s a lot to digest if you’re new, but hopefully this gives you the beginning of the lay of the land and goes some way to untangling things.

On a personal front, I’ve been having fun tinkering with Python and JavaScript, with a general aim of implementing my backends in Django and Flask, with ReactJS on the frontend. And with game dev always an enticing side track, and advancements like V8, HTML5 and WebGL, JavaScript is now a real contender for making games. A bunch of commercial game development tools (combo IDEs + frameworks) now offer the ability to compile directly to HTML5/JavaScript/WebGL for browser publishing.

A good example is the JS game framework Phaser (http://phaser.io/), which looks like a great vector for anyone who wants to do some light game dev and learn more about JS at the same time.

Thanks for reading!

infosec python technical

Parsing OpenVAS reports in Python

I was using OpenVAS to do some network auditing, and accessing report results via the Greenbone Security Assistant web interface often felt slow and clunky. The report is downloadable as an XML file though, and I’ve recently been getting familiar with parsing nmap XML files in Python, so a bit of scripting later and voila! GOXParse (Glens OpenVAS XML Parser) – a command line tool to quickly search/filter through OpenVAS scan results.
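The core of a tool like this can be small. Below is a sketch, not goxparse’s actual code, of parsing results and applying host/CVSS/threat filters with ElementTree. It assumes a simplified report layout where each <result> holds <host>, <port>, <threat> and an <nvt> with <name> and <cvss_base>; real Greenbone reports carry many more elements and vary between versions, so the paths may need adjusting. The sample data is made up.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified sample in the assumed layout
SAMPLE = """<report><report><results>
  <result><host>10.0.0.5</host><port>22/tcp</port><threat>High</threat>
    <nvt><name>Weak SSH key exchange</name><cvss_base>7.5</cvss_base></nvt></result>
  <result><host>10.0.0.5</host><port>80/tcp</port><threat>Medium</threat>
    <nvt><name>Missing security headers</name><cvss_base>5.0</cvss_base></nvt></result>
  <result><host>10.0.0.9</host><port>general/tcp</port><threat>Log</threat>
    <nvt><name>OS detection</name><cvss_base>0.0</cvss_base></nvt></result>
</results></report></report>"""

def parse_results(xml_text):
    """Flatten every <result> element into a simple dict."""
    root = ET.fromstring(xml_text)
    findings = []
    for result in root.iter("result"):
        findings.append({
            # .text of <host> holds the IP even if child elements follow it
            "host": (result.findtext("host") or "").strip(),
            "port": result.findtext("port", ""),
            "threat": result.findtext("threat", "").upper(),
            "name": result.findtext("nvt/name", ""),
            "cvss": float(result.findtext("nvt/cvss_base") or 0),
        })
    return findings

def filter_results(findings, host=None, cvssmin=None, cvssmax=None, threat=None):
    """Filters in the spirit of -host / -cvssmin / -cvssmax / -threatlevel."""
    matched = []
    for f in findings:
        if host is not None and f["host"] != host:
            continue
        if cvssmin is not None and f["cvss"] < cvssmin:
            continue
        if cvssmax is not None and f["cvss"] > cvssmax:
            continue
        if threat is not None and f["threat"] != threat.upper():
            continue
        matched.append(f)
    return matched

for f in filter_results(parse_results(SAMPLE), cvssmin=5):
    print(f["host"], f["port"], f["cvss"], f["name"])
```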

As an added bonus, you can output a .csv file from an nmap scan using gnxparse.py and feed it to goxparse.py to provide an inline comparison of open ports.
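The match file uses the HOSTIP,port1,port2,port3 format described in the help output, so loading it and checking an OpenVAS port string like 80/tcp against it might look like the following sketch. The helper names are hypothetical, and ports that are not plain numbers (such as OpenVAS’s general/tcp) are treated as never matching.

```python
import csv
import io

def load_matchfile(fileobj):
    """Read lines of HOSTIP,port1,port2,... into {host: {port, ...}}."""
    open_ports = {}
    for row in csv.reader(fileobj):
        if not row or not row[0].strip():
            continue  # skip blank lines
        host, *ports = (cell.strip() for cell in row)
        open_ports.setdefault(host, set()).update(int(p) for p in ports if p.isdigit())
    return open_ports

def port_is_open(open_ports, host, openvas_port):
    """openvas_port looks like '80/tcp'; 'general/tcp' never matches."""
    number = openvas_port.split("/", 1)[0]
    return number.isdigit() and int(number) in open_ports.get(host, set())

sample = io.StringIO("10.0.0.5,22,80\n10.0.0.9,443\n")
ports = load_matchfile(sample)
print(ports)
print(port_is_open(ports, "10.0.0.5", "80/tcp"))
```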

$ ./goxparse.py --help
usage: goxparse.py filename.xml [OPTIONS]

Glens OpenVas XML Parser (goxparse)

positional arguments:
  file  File containing OpenVAS XML report

optional arguments:
  -h, --help                show this help message and exit
  -i, -ips                  Output unfiltered list of scanned ipv4 addresses
  -host [HOSTIP]            Host to generate a report for
  -cvssmin [CVSSMIN]        Minimum CVSS level to report
  -cvssmax [CVSSMAX]        Maximum CVSS level to report
  -threatlevel [THREAT]     Threat Level to match, LOG/LOW/MEDIUM/HIGH/CRITICAL
  -matchfile [MATCHFILE]        .csv file from which to match open ports, in format HOSTIP,port1,port2,port3
  -v, --version         show program's version number and exit

usage examples:
        goxparse.py ./scan.xml -ips
        goxparse.py ./scan.xml -host <HOSTIP>
        goxparse.py ./scan.xml -cvssmin 5 -cvssmax 8
        goxparse.py ./scan.xml -threatlevel HIGH

 

You can get goxparse from the Bitbucket repo here.