Author Archives: Glen

Glen

Glen is an IT person who also plays games and occasionally writes things. He has a website at glenscott.net and tweets @memoryresident.

A time traveller’s primer to Web Development in the 21st Century

This post started as an email aimed at untangling some concepts around modern web development for a friend. Like myself, he had spent quite some time in infrastructure roles away from the web development space and found understanding the new scene to be a bit of a hurdle.

Understanding how non-traditional web apps worked became necessary for me as a pentester, and I had been on the topic for a while at this point, so I wrote this primer mostly as a brain-dump and a way to organise some of the concepts I knew about.

Caveats: opinionated; I am not a developer and very likely get things wrong. May be vaguely insulting to front end devs, but we are all points on the same curve and I really, honestly bear you no ill will. I think modern web apps are wondrous.

So, you’ve just arrived in the future.

You just stepped out of the time machine after arriving from the late 90’s or early 2000’s. You have, or had, some coding skills, maybe ASP or probably PHP. You even did some development, maybe hacking a few WordPress themes and plugins, or creating some custom websites (this was before they were called ‘web apps’). You set up Windows servers with IIS, or Linux with Apache. The LAMP stack was the main scene. JavaScript and CSS were just starting to be more of a thing. You mostly understood how it all worked.

Then you spent the next 15 years happily doing infrastructure or some other non web-dev role. This was your time machine, and now you’ve emerged, you’re dipping your foot back in the web development scene, and things are… different.

Culture shock is a thing.

If you’re just getting (re)started, there’s a big gap between how things were and the current state of the art, and it never stops moving, so it’s no wonder it can be disorienting getting back on board. The new scene can seem very different. There’s help and there are friendly faces everywhere, and quite a few old-school people here too, but the current generation has brought a whole new language and culture to navigate.

The good news is, after you’ve learned to interpret the new colloquialisms and peeled back the labels on what’s trendy to see what’s underneath, what you’ll find is still good old-fashioned programming (and server) paradigms. The stuff you learned back in class doing C, PHP, even Java is still relevant, even if only as concepts to translate into the current languages.

Frameworks for old people

After being used to coding my own stuff with the blinkers on for a long time, I started out being fairly old-man suspicious of frameworks. ‘I don’t want no layer of hippie abstraction getting in between me and the real code, get off my lawn’ and so on. This was based in ignorance, and once I realised what frameworks are and how they actually work, I got over that fast. 

Simply, a framework is a set of libraries – pre-written chunks of code – to do common and useful jobs in your app. A framework includes a whole set of functions, classes and so on written in the language of choice, so when you start making a (web) app, you don’t have to reinvent really common wheels by taking your base language and writing something to do the same job, except badly. Also, if it’s an open source framework, it benefits from all the community eyes on the code, especially in areas like efficiency and security.

Effectively, if you make a web app *without* using an existing framework, you’re going to be writing parts of your own framework. Except it probably won’t be as good as what’s out there already.

This quote springs to mind and I think is a good analogy when comparing any particular language to a decent framework written in it:

“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

Some examples of common jobs a framework might take care of for you:

  • Handle database connections to whatever server you want (MySQL, MSSQL, PostgreSQL, etc), generate schema, and interface with your code via a nice API. So you don’t have to write a bunch of low level database calls, construct an SQL query out of some variables sent over as part of a web request, and introduce some horrible security bugs in the process.
  • Take some field names and create an HTML form in the page. Or create a form straight out of a database record, with everything set up so you can hit ‘submit’ and update the data in the backend.
  • Handle authentication, so you just tell it that ‘this part of the app’ or ‘access to URLs matching that pattern’ needs a valid user logged in, and it will deal with storing credentials in the database, prompting the user for a username and password, and doing magical things with browser cookies, so you, the coder, don’t have to (unless you want to).
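To make the first bullet a bit more concrete, here is a rough Python sketch of the bug a framework’s database layer saves you from. It uses the standard library’s sqlite3 module as a stand-in for whatever database you’d actually run, and the users table and its contents are made up for illustration. The unsafe version pastes a request value straight into the SQL text; the safe version passes it as a bound parameter, which is roughly what a framework does for you under the hood.

```python
import sqlite3

# Hypothetical in-memory database standing in for the app's real backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])

def find_user_unsafe(name):
    # DON'T do this: the request value becomes part of the SQL text itself,
    # so input like "x' OR '1'='1" rewrites the query (SQL injection).
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def find_user_safe(name):
    # Bound parameter: the value is sent separately from the SQL text,
    # so it is only ever treated as data, never as part of the query.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

malicious = "x' OR '1'='1"
print(find_user_unsafe(malicious))  # the WHERE clause is always true: every row
print(find_user_safe(malicious))    # no user has that literal name: no rows
```

Passing values as parameters rather than gluing strings together is the standard defence against SQL injection, and a decent framework’s database API does it by default.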

So frameworks are good.

API, oh my

Last time you heard the term “API” was back in your Computer Science class, right? But you’re safe, because it has to do with systems programming or something you’re unlikely to be doing. Actually, no. In modern web apps, APIs are a big thing.

Quick refresher: an API (Application Programming Interface) is a high level term for a relatively friendly layer between your code and the actual work being done. A programming language or framework could have an API in the form of function calls within the code. So instead of, for example, having to work out how to convince the runtime environment (eg operating system, web browser, etc) to draw a circle on the screen via a highly granular, low level method doing something horrific and error prone like directly manipulating video memory, there would be a higher level API exposed which you could call instead, with something like draw_shape(‘circle’, ‘large’, ‘green’, ‘middle’), which would draw a large green circle in the middle of the screen.
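To illustrate the layering, here’s a toy Python sketch. The screen is just a grid of characters standing in for video memory, set_pixel() is the ‘low level’ call, and draw_shape() is the friendlier API built on top of it. All of these names are hypothetical, and this draw_shape() only understands circles in the middle, but the point is that the caller never has to think about individual pixels.

```python
# A toy "video memory": a grid of characters, where "." is an empty pixel.
WIDTH, HEIGHT = 21, 11
screen = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]

def set_pixel(x, y, colour):
    """Low-level call (hypothetical): write one 'pixel' directly into memory."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        screen[y][x] = colour[0].upper()  # store the colour's initial, e.g. "G"

SIZES = {"small": 2, "medium": 3, "large": 5}  # radius in pixels

def draw_shape(shape, size, colour, position):
    """High-level API (hypothetical): say *what* you want, not how to draw it."""
    if shape != "circle" or position != "middle":
        raise NotImplementedError("this sketch only draws circles in the middle")
    radius = SIZES[size]
    cx, cy = WIDTH // 2, HEIGHT // 2
    # The pixel-by-pixel work is hidden behind the API.
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                set_pixel(x, y, colour)

draw_shape("circle", "large", "green", "middle")
print("\n".join("".join(row) for row in screen))
```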

In the same way, a modern web application uses Web APIs in the form of dedicated URLs to get things done in the app. One might look like https://my.web.site/api/search?query=test, which is an interface to a search function on the site and might be expected to return some useful data. Modern web apps make heavy use of Web APIs, which we will talk about a bit more soon.
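Here’s a minimal sketch of what might sit behind a URL like that, using only Python’s standard library. The /api/search path and query parameter come from the example URL above; the PAGES data, the search() function, and the JSON shape of the response are assumptions for illustration. The script starts the server on a free local port in a background thread, calls the API once the way a browser-side app would, and shuts down again. In a real app, a framework would handle the routing and JSON serialisation for you.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Made-up content standing in for whatever the site actually searches.
PAGES = ["test page", "another test", "about us"]

def search(query):
    """The 'real work' behind the API: a naive case-insensitive substring search."""
    return [page for page in PAGES if query.lower() in page.lower()]

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/api/search":
            self.send_error(404)
            return
        # "query=test" parses to {"query": ["test"]}; take the first value.
        query = parse_qs(url.query).get("query", [""])[0]
        body = json.dumps({"query": query, "results": search(query)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

# Port 0 lets the OS pick a free port; serve from a background thread.
server = HTTPServer(("127.0.0.1", 0), ApiHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Call the API the way a browser-side app would, then decode the JSON reply.
port = server.server_address[1]
with urlopen(f"http://127.0.0.1:{port}/api/search?query=test") as resp:
    data = json.loads(resp.read())
print(data)

server.shutdown()
server.server_close()
```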

Time for some higher level design abstraction

Next, let’s talk about the term ‘MVC’ (Model-View-Controller). MVC is a design pattern used to separate logical functionality in the backend. It’s a useful idea and lots of frameworks are ‘MVC’ these days (or MVP, or MVVM – but just stick with getting the idea of MVC for now). This is just a way of saying the framework conforms to the de facto standard of providing roughly three main chunks of functionality:

  • Database handling (model)
  • User interface stuff (view)
  • Everything else (controller)

‘Model’ deals with functions to access the database. ‘View’ handles input from the user and creates interfaces. The third element, ‘Controller’, is code you write to actually do things in your app, especially stuff which might not be directly related to the framework.

For example, I might make a simple app to check remote RSS feeds for me and render them to an HTML table. I write some controller code which does some web requests on demand to download the feeds from a remote site and passes the data to an API on the model interface to shove it in the database. Then when a user comes along to use the app, some view code generates a login screen, followed by an ‘update remote feeds’ button the user can push which calls the controller code, followed by a nice HTML presentation of the data (which it pulled back out of the database via the model interface).
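Here’s a very cut-down Python sketch of that RSS app, just to show where the MVC lines fall. The class and function names are made up, the login screen is left out, and fetch_feed() returns a canned RSS document instead of downloading anything, so treat it as an illustration of the split rather than a real app.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# --- Model: everything that touches the database --------------------------
class FeedModel:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (title TEXT, link TEXT)")

    def save_items(self, items):
        self.db.executemany("INSERT INTO items VALUES (?, ?)", items)

    def all_items(self):
        return self.db.execute("SELECT title, link FROM items").fetchall()

# --- Controller: the app's own logic (fetching and parsing feeds) ---------
def fetch_feed(url):
    # Hypothetical stand-in for a real HTTP download (e.g. urllib.request);
    # returns a tiny canned RSS document so the sketch runs offline.
    return """<rss><channel>
      <item><title>Hello &amp; welcome</title><link>https://example.com/1</link></item>
      <item><title>Second post</title><link>https://example.com/2</link></item>
    </channel></rss>"""

def update_feeds(model, urls):
    for url in urls:
        root = ET.fromstring(fetch_feed(url))
        items = [(i.findtext("title"), i.findtext("link")) for i in root.iter("item")]
        model.save_items(items)

# --- View: turn model data into something the user sees -------------------
def render_table(model):
    rows = "".join(
        f"<tr><td><a href='{escape(link)}'>{escape(title)}</a></td></tr>"
        for title, link in model.all_items()
    )
    return f"<table>{rows}</table>"

model = FeedModel()
update_feeds(model, ["https://example.com/feed.xml"])  # the 'update remote feeds' button
print(render_table(model))
```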

Clear. As. Mud. So that’s MVC.

Back to the tools

Next, the nuts and bolts: languages, web servers, and so on. There are a lot of labels on everything, so brace yourself. React! Angular! Rails! NGINX! Python! DotNet! Node! Java! Full Stack! Webpack! Ruby! JavaScript! CoffeeScript! JSX! JSON! And that’s just the start. It’s difficult to cleanly categorise; it’s more of a Venn diagram, but here goes:

Frameworks and Servers

Frameworks and webservers – surprisingly, the lines blur. A framework can be ‘full stack’ and implement its own webserver – after all it’s just code, it can do whatever it wants, including serving HTTP requests. Usually, though, there is dedicated software which is much better at being a webserver than the framework itself: servers like Apache, NGINX and IIS. So most often the framework just hangs off a ‘real’ webserver, which has functionality (eg WSGI, in the Python world) to interface nicely with your app/framework so they can live together and each part can get on with doing what it does best.

But it’s useful to keep in mind the framework can be the webserver as well. Django has its own development server built in, for example (with a dedicated webserver recommended for production). When you run this you’re basically running a Python script which fires up some Python webserver code which relays web requests back and forth to the Django code. It’s functionally the same as what you’d experience in production but a lot easier to run and debug, so that’s where you do your development.
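For the Python case, ‘hanging off a webserver’ means the framework exposes a WSGI callable, and Django projects expose one too. Here’s a minimal sketch of a bare WSGI app served by wsgiref, the reference server in Python’s standard library, which plays roughly the role of a development server. The app itself and the /feeds path are just for illustration. In production you’d point something like Apache with mod_wsgi at the same kind of callable instead.

```python
import threading
from urllib.request import urlopen
from wsgiref.simple_server import make_server

def app(environ, start_response):
    """A complete (if tiny) WSGI application.

    The webserver hands us the request as a dict (environ) plus a callback
    (start_response); we hand back the status, headers and body bytes.
    """
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from the app, you asked for {path}\n".encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# wsgiref is fine for development, not production. Port 0 means any free port.
server = make_server("127.0.0.1", 0, app)
port = server.server_port
worker = threading.Thread(target=server.handle_request)  # serve exactly one request
worker.start()

with urlopen(f"http://127.0.0.1:{port}/feeds") as resp:
    text = resp.read().decode()
worker.join()
server.server_close()
print(text)
```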

(I’m going to talk about JavaScript frameworks in a minute as a few rules are broken, but we’re getting there).

Languages and Frameworks

Languages. You have a bunch (I guess you can make a framework in anything you like), but here are some of the key players:

  • Microsoft: You may have heard of them. They have a phat application framework called .NET. The subset of this which is used for web apps is called ASP.NET. .NET supports creating applications in a bunch of languages (which it compiles to an intermediate language called Common Intermediate Language (CIL)). ‘C#’ is one of these, and is often used as the coding language for .NET. C# is Microsoft’s mutated/evolved version of C and Java.
  • Ruby: Open source general purpose language. Mostly known due to the hugely popular Ruby on Rails (RoR) web framework. RoR was ‘extracted’ from the work done on a commercial Ruby web app (Basecamp), when the author realised he could publish a lot of the app code he had written as reusable code in the form of a web framework. Voila; RoR was born and made many devs happy.
  • Python: Open source general purpose language, very popular. A bunch of frameworks are written in Python, but probably the best known / most popular / de facto standard is ‘Django’. As far as web frameworks go, RoR and Django are very similar aside from obviously using different languages and having some different philosophical quirks about how things get done. It’s a complex topic, no doubt the subject of many a forum holy war, but the difference between Django and Rails basically boils down to how much ‘magic’ there is in each framework – Django expects you to get your hands a bit greasier with the nuts and bolts, while RoR is more focused on making decisions for you in order to get things working out of the box.
  • Java: Yes, Java is used for more than just slow-performing desktop apps and browser applets from the 90’s with strange non-native user interface widgets. It’s popular in backend applications as well, including web frameworks, and in my experience particularly for Enterprise apps. I’m really not sure why; possibly there are a lot of venerable old Java devs out there keeping it ticking over while the young folks use newer tech? Spring seems to be one of the more popular frameworks, and Tomcat seems to be the server of choice most of the time.
  • PHP: The venerable 90’s beast of yore. Word on the internet is PHP doesn’t seem to be considered a real contender anymore (if only from a security standpoint) – cue uproar from PHP devs – sorry about that. Let me clarify: The vibe I get from within the cybersecurity and dev community in general is, PHP has some problems, and its competitors (Ruby, Python) are both more secure and much nicer to use. But hey, don’t let me stop you with a lot of misinformed negative talk – a lot of sites still run PHP, it is actively developed, I have fond memories of it from the early days, and if it is your language of choice, go for it. There are a heap of PHP web frameworks (eg Laravel, CakePHP) – I imagine the same general paradigms apply.

The browser scripting language that could

Ok. JavaScript gets its very own section.

JavaScript is obviously the language traditionally associated with running cutesy bits of user interface stuff inside a browser, like making a button pop or some cool looking tooltip. JavaScript is an OO language which looks kind of like C and mostly nothing like Java. It’s colloquially considered to be a bad language, but as long as it was confined to being used to create UI widgets by harmless ‘web designers’, limited damage could be done. And it was probably going to stay that way, and who could blame it: it was a de facto standard written in ten days by a guy at Netscape in the 90s and it ran in BROWSERS, so nobody expected anyone was in danger of creating Skynet with this thing anytime soon.

Then Google came along and basically strapped a rocket engine onto JavaScript by creating a hugely powerful and optimised engine called V8, mainly to make webpages faster in Chrome. V8 is Skynet. It compiles JavaScript to native machine code at runtime, and reoptimises it dynamically. This is how you can get entire virtual machines implemented in JavaScript in the freaking browser. Google, what darned heck hast thou wrought.

So, it became a feasible thing to run big chunks of JavaScript in the browser nice and fast. Front end coders rejoiced.

Which brings us to now

Entire MVC frameworks have thus been implemented in JS. Functionally, when the whole MVC framework is in the browser and you hit a site, it effectively dumps all or part of the app down the tube to your browser in a bunch of JavaScript files. After that, the webserver falls back to being a database API endpoint, and the JavaScript app running in the browser talks to it via JSON or XML over HTTP requests. You don’t necessarily even have to be online for the web app to keep working, just like when you go offline and your Gmail session is still kind of functional – it’s still running everything in the browser.

Gmail, by the way, is built on Google’s own Closure JavaScript tools, and Google also sponsors the open source JavaScript MVC framework ‘AngularJS’. Doing your web app this way makes heaps of sense if you have a huge user base, as it effectively pushes all the interface load off to the client.

Now that JavaScript runs fast, there is a lot more of it in use. The web interface designers have rebranded as ‘frontend developers’, and why not, since the language they code in can sure enough be used to make full apps. Even ‘native’ desktop and mobile apps. There are a lot more open source JS projects available, running the gamut from simple widget libraries like jQuery, to user interface libraries like React (“The ‘V’ in MVC”), to full MVC frameworks like Angular.

On a side note, the promotion of JavaScript to major league status has coincided with an explosion in the number of JavaScript-based frameworks and tools, of reportedly wildly varying reliability. There are a handful of major ones which can be relied on (jQuery, Angular (Google), and React (Facebook) are a few which spring to mind, backed by the big players), and sound advice seems to be: use those, at least when starting out, because they’re more likely to stick around.

Enter JavaScript on the Backend

After all this though, you’re still running JavaScript in the browser which is served by a ‘real’ backend webserver and database. 

With JavaScript on the V8 engine running like a ferret on meth, the real Pandora’s box was opened when a developer took V8 out of the browser and built NodeJS around it: a runtime for JavaScript on the server, in which your JS code can be the webserver and the app all in one.

Disruptive generation gap

And depending on which development circles you run in, opinion on Node can be polarised. Node is a webserver with big ambitions, using a crap language made powerful by Google’s engine, populated by a new generation and community of developers, creating applications used by millions of people daily.

The arrival of Node made possible the incursion of interface designers into server side development, an event which may have ruffled feathers amongst some of the old guard, to whom JavaScript was best confined to the browser; but then, when did technological progress ever obey the rules?

Some tech notes on node

My understanding of node is limited but I’ll sketch it out.

Node runs as the web server, and your app runs inside it too (like Django in developer mode), with frameworks such as Express commonly layered on top. Node itself is mostly implemented in C and C++, and it consumes JavaScript as its web app language. It is often paired with ‘NoSQL’ databases such as MongoDB. NoSQL is another topic, but is effectively a non-relational database type which can run quite fast, with the tradeoff of it being, well, non-relational. (I hear that NoSQL may also be seen as suspiciously hippie by some traditional DBAs.)

Blocking? What blocking?

Node has a major advantage with certain types of applications when compared to apps written in more traditional languages: it is built to run code asynchronously around a ‘non-blocking event loop’. This effectively means it has a single thread which keeps spinning, processing whatever it can, rather than being blocked while it waits for some job to complete, like a network connection or a database call. So it can be very fast and efficient for apps which juggle lots of simultaneous connections and I/O, though CPU-heavy work still ties up that single thread. A bunch of big companies jumped on the wagon, Node became flavor of the month, and a huge market sprang up for Node devs.
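Node does this with JavaScript, but the idea isn’t Node-specific, so here’s a rough analogue using Python’s asyncio, which is built around the same kind of single-threaded event loop. fake_db_call() is a hypothetical stand-in for a slow network or database request. Three of them are started together, and because each one hands control back to the loop while it waits, the whole lot finishes in about the time of one, all on the main thread.

```python
import asyncio
import threading
import time

async def fake_db_call(name, seconds):
    # Hypothetical stand-in for waiting on a network or database reply.
    # 'await' yields to the event loop instead of blocking the thread.
    await asyncio.sleep(seconds)
    return f"{name} done on thread {threading.current_thread().name}"

async def handle_requests():
    # Start three slow jobs at once; the single loop interleaves them.
    return await asyncio.gather(
        fake_db_call("query A", 0.2),
        fake_db_call("query B", 0.2),
        fake_db_call("query C", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(handle_requests())
elapsed = time.perf_counter() - start

for line in results:
    print(line)
# The three waits overlap, so the total is close to 0.2 s rather than 0.6 s.
print(f"elapsed: {elapsed:.2f}s")
```

Note the catch: this only helps while code is waiting. A long CPU-bound calculation inside one of those functions would hold up everything else on the loop, which is why event-loop servers like Node suit I/O-heavy apps best.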

Packages packages everywhere

As a true badge of membership to the big leagues, Node has ‘NPM’ – the Node package manager (like apt or rpm, but for JavaScript modules). This can be managed on the command line of a Node server (just like apt-get), making it easy to use and popular with devs. Lots of code is shared via NPM, and interesting and critical chains of dependencies can eventuate, with bad consequences. Some of the Node packages in common use have been considered to reflect poorly on the developers publishing and using them, as these discrete libraries do very simple jobs which are generally thought to be stuff a good coder should be able to implement themselves with low effort. Padding strings, for example. (See also: ‘npmgate’.)
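For a sense of scale, here’s roughly what a string-padding helper amounts to, sketched in Python. The npm package at the centre of that episode was called left-pad; this is not its code, just the same idea. Python happens to ship the same thing built in as str.rjust.

```python
def left_pad(text, length, fill=" "):
    """Pad `text` on the left with `fill` until it is `length` characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return fill * max(0, length - len(text)) + text

print(left_pad(5, 3, "0"))   # 005
print(left_pad("abc", 2))    # abc (already long enough, left unchanged)
# Python's built-in str.rjust does the same job:
print("5".rjust(3, "0"))     # 005
```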

Despite (or because of) its differences to traditional frameworks / engines, Node is very popular.

And that’s about it for now.

So there’s a brief overview of a few of the more popular moving parts in contemporary web application development. There is a lot more to dive into, and the landscape evolves constantly (I realise I haven’t even mentioned stuff like websockets, WebGL, asm.js, and so on). So there’s a lot to digest if you’re new, but hopefully this gives you the beginning of the lay of the land and goes some way to untangling things.

On a personal front, I’ve been having fun tinkering with Python and JavaScript, with a general aim of implementing my backends in Django and Flask, with ReactJS on the frontend. And with game dev always an enticing side track, and advancements like V8, HTML5 and WebGL, JavaScript is a real contender now to make games. A bunch of commercial game development tools (combo IDEs + frameworks) now offer the ability to compile directly to HTML5/JavaScript/WebGL for browser publishing.

A good example is the JS game framework called Phaser (http://phaser.io/) as it looks like a great vector for gamedev and aligns well if anyone wants to do some light game dev and learn more about JS at the same time.

Thanks for reading!

Parsing OpenVAS reports in Python

I was using OpenVAS to do some network auditing, and accessing report results via the Greenbone Security Assistant web interface often seemed slow and clunky. The report is downloadable as an XML file, though, and I’ve recently been getting familiar with parsing nmap XML files in Python, so a bit of scripting later and voila! GOXParse (Glen’s OpenVAS XML Parser) – a command line tool to quickly search/filter through the OpenVAS scan results.

As an added bonus, you can output a .csv file from an nmap scan using gnxparse.py and feed it to goxparse.py (via -matchfile) to provide an inline comparison of open ports.

$ ./goxparse.py --help
usage: goxparse.py filename.xml [OPTIONS]

Glens OpenVas XML Parser (goxparse)

positional arguments:
  file  File containing OpenVAS XML report

optional arguments:
  -h, --help                show this help message and exit
  -i, -ips                  Output unfiltered list of scanned ipv4 addresses
  -host [HOSTIP]            Host to generate a report for
  -cvssmin [CVSSMIN]        Minimum CVSS level to report
  -cvssmax [CVSSMAX]        Maximum CVSS level to report
  -threatlevel [THREAT]     Threat Level to match, LOG/LOW/MEDIUM/HIGH/CRITICAL
  -matchfile [MATCHFILE]    .csv file from which to match open ports, in format HOSTIP,port1,port2,port3
  -v, --version             show program's version number and exit

usage examples:
        goxparse.py ./scan.xml -ips
        goxparse.py ./scan.xml -host <HOSTIP>
        goxparse.py ./scan.xml -cvssmin 5 -cvssmax 8
        goxparse.py ./scan.xml -threatlevel HIGH

 

You can get goxparse from the Bitbucket repo here.
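This isn’t goxparse’s actual code, but here’s a rough Python sketch of the core idea using ElementTree: load the results, filter them the way the -host, -cvssmin/-cvssmax and -threatlevel options do, and cross-check ports against a -matchfile style CSV. The report layout in SAMPLE (result elements with host, port, name, threat and severity children) is a simplified assumption about the OpenVAS export format and may differ between versions, so check it against a real report before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ASSUMED, simplified layout of an OpenVAS results export; real reports carry
# many more fields, and element names may vary between OpenVAS versions.
SAMPLE = """<report><results>
  <result><host>10.0.0.5</host><port>22/tcp</port>
    <name>Weak SSH ciphers</name><threat>Medium</threat><severity>5.3</severity></result>
  <result><host>10.0.0.5</host><port>443/tcp</port>
    <name>Outdated TLS</name><threat>High</threat><severity>7.5</severity></result>
  <result><host>10.0.0.9</host><port>general/tcp</port>
    <name>OS detection</name><threat>Log</threat><severity>0.0</severity></result>
</results></report>"""

def load_results(xml_text):
    for r in ET.fromstring(xml_text).iter("result"):
        yield {
            "host": (r.findtext("host") or "").strip(),
            "port": r.findtext("port", ""),
            "name": r.findtext("name", ""),
            "threat": r.findtext("threat", "").upper(),
            "cvss": float(r.findtext("severity", "0") or 0),  # assumed CVSS score
        }

def filter_results(results, host=None, cvssmin=None, cvssmax=None, threat=None):
    for r in results:
        if host and r["host"] != host:
            continue
        if cvssmin is not None and r["cvss"] < cvssmin:
            continue
        if cvssmax is not None and r["cvss"] > cvssmax:
            continue
        if threat and r["threat"] != threat.upper():
            continue
        yield r

def load_matchfile(csv_text):
    """Parse HOSTIP,port1,port2,... lines into {host: {ports}}."""
    return {row[0]: set(row[1:]) for row in csv.reader(io.StringIO(csv_text)) if row}

open_ports = load_matchfile("10.0.0.5,22,80\n")  # as if produced by gnxparse.py
for r in filter_results(load_results(SAMPLE), cvssmin=5):
    seen = r["port"].split("/")[0] in open_ports.get(r["host"], set())
    print(f'{r["host"]:<10} {r["port"]:<9} {r["cvss"]:>4} {r["threat"]:<6} '
          f'nmap-open={seen} {r["name"]}')
```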

Parsing and Merging Nmap XML Report Files in Python

Here are a couple of tools I wrote in Python to parse and merge/join Nmap .xml report files.

TL;DR:

  • gnxparse.py outputs discovered host and port info from nmap .xml, optionally in the form of nmap command(s) to re-scan hosts.
  • gnxmerge.py glues multiple (<host> sections from) nmap XML reports together.
  • You can download them from the git repo here.

Problem:

Nmap is great for network auditing. Scanning from an internal, privileged, and/or fast network location (eg inside your firewall) is straightforward and fast, but doesn’t give you the whole picture: which of the discovered hosts and services are exposed to a different network, eg the external/public internet.

To get this info you could do a firewall config audit, but if you don’t have that access, or just want to do a functional test of the firewall, you need to run another scan. For the same accuracy you’ll want a full range (1-65535) portscan, and this takes time. This kind of scan is also noisy and may generate a lot of firewall/IPS logs. Lastly, if you traverse an IPS/IDS with such a noisy scan, it may drop you as malicious, and the rest of the results are lost.

Solution:

An alternative approach is to do a full scan internally, and use the results to make a much quieter external scan targeting only known live hosts and services. gnxparse.py can generate nmap ‘rescan’ commands to run from outside the firewall, and gnxmerge.py helps tidy up the results by merging the multiple output files back into a single report.
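Here’s a rough Python sketch of the ‘rescan’ idea, not gnxparse.py’s actual code. It reads an nmap -oX report with ElementTree, collects the open TCP ports on hosts that were up, and writes out one nmap command per host. The SAMPLE report is made up, and the choice of flags (-Pn to skip host discovery, -p for the known ports, -oX for a per-host XML report) is an assumption about a sensible rescan, not necessarily what gnxparse.py emits. UDP is ignored for brevity.

```python
import shlex
import xml.etree.ElementTree as ET

# A made-up, cut-down nmap -oX report; real files carry many more attributes.
SAMPLE = """<nmaprun>
  <host><status state="up"/><address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/></port>
      <port protocol="tcp" portid="80"><state state="open"/></port>
      <port protocol="tcp" portid="8080"><state state="filtered"/></port>
    </ports></host>
  <host><status state="up"/><address addr="192.0.2.20" addrtype="ipv4"/>
    <ports/></host>
</nmaprun>"""

def open_tcp_ports(xml_text):
    """Return {ip: [open tcp ports]} for hosts that were up."""
    found = {}
    for host in ET.fromstring(xml_text).iter("host"):
        status = host.find("status")
        if status is None or status.get("state") != "up":
            continue
        addr = host.find("address[@addrtype='ipv4']")
        if addr is None:
            continue
        ports = []
        for port in host.iter("port"):
            state = port.find("state")
            if port.get("protocol") == "tcp" and state is not None \
                    and state.get("state") == "open":
                ports.append(port.get("portid"))
        if ports:  # hosts with nothing open are skipped entirely
            found[addr.get("addr")] = ports
    return found

def rescan_script(found):
    # One targeted nmap run per host, each writing its own XML report.
    lines = ["#!/bin/bash"]
    for ip, ports in found.items():
        cmd = ["nmap", "-Pn", "-p", ",".join(ports), "-oX", f"{ip}.xml", ip]
        lines.append(shlex.join(cmd))
    return "\n".join(lines)

print(rescan_script(open_tcp_ports(SAMPLE)))
```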

The workflow goes something like this:

  1. Perform a thorough scan of your publicly routable subnets from a location inside the firewall. (Using some fairly well tuned host discovery options in nmap, it takes me about 3 hours to scan the full port range on ~1000 hosts on a fast internal network.)
  2. Run gnxparse.py with the ‘rescan’ option on the .xml file generated from the internal nmap scan. This will output a bash script with individual nmap commands to probe only those hosts and services found to be up.
  3. Copy the script to an external host with nmap and run it. (For all discovered services on the ~1000 hosts scanned earlier, it takes me only about five minutes to do a re-scan.)
  4. Run gnxmerge.py on the folder of individual .xml reports generated (one per host) if you need to produce a single Nmap XML report file for any reason, for example to load up in Zenmap[1] and review which of your services are exposed externally.
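The merging step can be sketched the same way. This is a rough illustration rather than gnxmerge.py itself: it parses each report with ElementTree and appends every host element into a single new nmaprun root, reusing the first report’s top-level attributes. It ignores scan-level metadata such as runstats, which a more careful merge would probably want to rebuild. The folder name in the usage comment is hypothetical.

```python
import xml.etree.ElementTree as ET

def merge_nmap_reports(xml_docs):
    """Glue the <host> sections of several nmap XML reports into one <nmaprun>.

    xml_docs may be str or bytes. Scan-level metadata (e.g. <runstats>) is
    deliberately dropped in this sketch.
    """
    roots = [ET.fromstring(doc) for doc in xml_docs]
    # Reuse the first report's top-level attributes (scanner, args, ...).
    merged = ET.Element("nmaprun", roots[0].attrib)
    for root in roots:
        for host in root.findall("host"):
            merged.append(host)
    return ET.tostring(merged, encoding="unicode")

reports = [
    '<nmaprun scanner="nmap"><host><address addr="192.0.2.10" addrtype="ipv4"/></host></nmaprun>',
    '<nmaprun scanner="nmap"><host><address addr="192.0.2.11" addrtype="ipv4"/></host></nmaprun>',
]
merged = merge_nmap_reports(reports)
print(merged)

# With real files (hypothetical folder name), something like:
#   from pathlib import Path
#   docs = [p.read_bytes() for p in sorted(Path("rescans").glob("*.xml"))]
#   Path("merged.xml").write_text(merge_nmap_reports(docs))
```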

Hopefully someone else finds these tools useful. You can download gnxparse.py and gnxmerge.py from my GNXTools repo on Bitbucket.

 

[1] I am aware Zenmap can also load multiple nmap report files for viewing, though it does not merge or save them.