rockym93 dot net


What was all that about?

12 January 2017, 10:16AM · neural-networks

This post was written by a neural network.

A network of units, connected together.

A neural network is basically a collection of little units, all chained together in kind of a grid. Or, well, a network. These units are very simple computer programs. They get signals from the units behind them, weight those signals according to some settings, and if they meet a threshold, they signal the next units along - which do the same thing, all the way through the network.

So, for any given input, you get some units activating and passing on signals, and some not. Eventually, you get a result out the other end which depends on the input you put in, but in a pretty complex and convoluted way.
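To make that concrete, here's a minimal sketch of a unit and a tiny network in Python. The weights and thresholds are made up for illustration (they happen to make the network answer "is exactly one of the two inputs on?"), and the function names are mine, not from any library.

```python
# A toy unit: weight the incoming signals, and fire (1) if the total
# meets the threshold, otherwise stay quiet (0).
def unit(signals, weights, threshold):
    total = sum(s * w for s, w in zip(signals, weights))
    return 1 if total >= threshold else 0

# A layer is just several units looking at the same incoming signals.
def layer(signals, settings):
    return [unit(signals, weights, threshold) for weights, threshold in settings]

# Hand-picked (hypothetical) settings: each entry is (weights, threshold).
hidden = [([1.0, 1.0], 1.0),   # fires if either input is on
          ([1.0, 1.0], 2.0)]   # fires only if both inputs are on
output = [([1.0, -1.0], 1.0)]  # fires if the first hidden unit is on but not the second

def network(inputs):
    # Signals pass from the inputs, through the hidden layer, to the output.
    return layer(layer(inputs, hidden), output)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", network([a, b]))
```

No single unit here knows what the whole network is for. The answer only falls out of which units fire along the way.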

A network of units, activating.

You adjust a network by changing the weightings that the units give their incoming signals, and the thresholds that control whether or not they send signals. Different settings in different parts of the network mean that different bits of the network - different paths through it - will light up and transmit signals from end to end. And different signals at the end combine to produce a different result.

Units with different weightings and thresholds.

You assess how close the results are to the ones you want by using a cost function. If the network is spitting out the wrong results, the cost will be higher. If the result is right, the cost will be lower.

Cost is the mathematical difference between what the network is doing and what we want it to do - an error value. We pass this error value back through the network, and adjust the network's internal weights and thresholds to try and make the error smaller next time. And so, over many iterations, the network gets better at doing what it's supposed to be doing.
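As a rough sketch of that loop, here's a single unit being trained in Python. A few assumptions to flag: the unit here is a plain weighted sum with no hard threshold (so that a small change in the settings gives a small change in the cost), the cost is the squared error, and the examples, step count and learning rate are all made up for illustration.

```python
# Average squared difference between what the unit says and what we want.
def cost(examples, weight, bias):
    return sum((weight * x + bias - target) ** 2 for x, target in examples) / len(examples)

def train(examples, steps=2000, rate=0.05):
    weight, bias = 0.0, 0.0  # start with settings that know nothing
    for _ in range(steps):
        for x, target in examples:
            result = weight * x + bias
            error = result - target
            # The cost for this example is error ** 2. Its slope is
            # 2 * error * x for the weight and 2 * error for the bias,
            # so nudge each setting a little way downhill.
            weight -= rate * 2 * error * x
            bias -= rate * 2 * error
    return weight, bias

# Made-up examples that follow the rule: target = 2 * x + 1
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

print("cost before:", cost(examples, 0.0, 0.0))
weight, bias = train(examples)
print("cost after: ", cost(examples, weight, bias))
print("learned weight and bias:", weight, bias)
```

The unit is never told the rule. It only ever sees how wrong it was, and the settings drift towards values that make it less wrong.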

These values are too different. And also two different!

After enough training, all of those little settings are finely tuned, through trial and error on a massive scale. You'll have a program that gives you results very close to the ones you trained it to give.

The advantage that this has over programming a computer the traditional way, with explicit instructions, is that a neural network can learn pretty abstract processes or concepts or patterns. It can learn things that would be very difficult to describe to a computer explicitly, logically or mathematically - but as long as you can reliably check that the result is correct, you can train a network to do it.

The result coming from the network could be a number. You could train it to do maths - any maths, in theory. Or it could be a category (probably expressed as a number) so you could train it to sort things. Or it could be a word (again, probably also expressed as a number), so you could train it to 'describe' things. As long as you can check it against something, you can train a network to do it.

This makes neural networks really good at processing large amounts of real-world data, where yes, there are patterns - but programming computers to detect them is hard. Things like image recognition, or human language, or speech processing. Neural networks let computers be a bit better at the kind of things that brains are usually good at.

Still with me? Good.

Our little network, with its inputs and results, is all well and good if you want just one answer for one question at a time, but what if - like many complex problems - your problem depends on context? This is where recurrent neural networks come in.

Recurrent networks accept some level of input from themselves. The result of one run of the program can depend on the result of the previous run. And the weighting of that result as input is one of the adjustments we can make, so we can attempt to change the output of the loop as a whole by changing how much influence the previous result has.

It's like an infinite chain of programs stuck together, and while we're getting a result from each individual one, we're also feeding that back into another copy of the program.
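Here's a minimal sketch of that feedback loop in Python, assuming a single unit with a smooth tanh squashing function instead of a hard threshold. The names `w_in` and `w_back` are hypothetical labels for "how much the new input counts" and "how much the previous result counts".

```python
import math

# One run of the program: combine the new input with the previous result.
def step(x, previous, w_in, w_back):
    return math.tanh(w_in * x + w_back * previous)

# Chain the runs together, feeding each result into the next run.
def run(inputs, w_in=1.0, w_back=0.0):
    state = 0.0
    results = []
    for x in inputs:
        state = step(x, state, w_in, w_back)
        results.append(state)
    return results

# A single blip of input, followed by silence.
signal = [1.0, 0.0, 0.0, 0.0]

forgetful = run(signal, w_back=0.0)    # ignores its own previous result
remembering = run(signal, w_back=0.9)  # previous result carries a lot of weight

print("forgetful:  ", forgetful)
print("remembering:", remembering)
```

With `w_back` at zero the blip is gone by the next run. Turn it up and the blip lingers, fading a little each time, which is the "context" the network gets to work with.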

A network of networks, connected together.

So essentially, what this network does is output a single letter - the single most likely letter to come next, given the letter that came before it as input - and before that, and before that, and before that too. It knows which letters are likely because it was trained to minimise the end result's difference from the text on this blog.

For example, it might see that in that text full stops appear in two places - sometimes in web addresses, and sometimes at the end of sentences. If it was just looking at one input and using overall likelihood, it'd get this wrong a lot - but it doesn't.

It knows, perhaps because the previous characters contained a space or something else non-address-y, that what comes after this full stop probably isn't a 'c' (to be followed by an 'o' and an 'm'), but another space. And as well as outputting that space as a result, it also passes it along. The next neural network along will know that a space is likely to be followed by a capital letter, given there was a full stop two characters back. Basically, it doesn't just know how to write letters - it knows how to write letters in context. It knows that it's not in the middle of 'writing a web address', because statistically they don't have spaces.
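To show why context matters for the full stop, here's a small Python sketch. To be clear, this is not a neural network and not how the network above works internally: it's just a lookup table that counts which character followed each stretch of text. The sample text is made up, and the function names are mine.

```python
from collections import Counter, defaultdict

# For every stretch of `context_length` characters, count what came next.
def count_followers(text, context_length):
    table = defaultdict(Counter)
    for i in range(context_length, len(text)):
        context = text[i - context_length:i]
        table[context][text[i]] += 1
    return table

# The single most common next character for a given context.
def most_likely(table, context):
    return table[context].most_common(1)[0][0]

# Made-up sample text with full stops in web addresses and at sentence ends.
text = "go to example.com now. it works. try example.com again. it broke. "

one = count_followers(text, 1)    # only sees the full stop itself
three = count_followers(text, 3)  # sees the two characters before it as well

print(repr(most_likely(one, ".")))      # always the same guess after a full stop
print(repr(most_likely(three, "le.")))  # full stop inside 'example.com'
print(repr(most_likely(three, "ks.")))  # full stop at the end of 'it works.'
```

Looking at the full stop alone, the table can only ever make one guess, so it gets every web address wrong. Given a couple of extra characters it can tell the two situations apart. A recurrent network gets at something similar, but it learns what to remember instead of having the context length fixed in advance.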

At least I assume that's how it does it - one of the weird things about neural nets is that you're never quite sure how they arrive at the conclusion they did, only that they work.

But even though it understands context, this is still a pretty simple network. It's only two layers deep, with a few thousand units. Which is why it's produced writing that looks and acts and feels a lot like mine, but that doesn't mean anything. Intuiting actual meaning from context is probably about a bazillion layers deeper - Google or Facebook might get close, but not this dinky little thing I ran on my laptop.

Why would you do that?

Cheers to Morgan for advice, links and fact-checking.
