I don’t feel that I know enough yet.
I’ve seen too many friends run around in circles jumping from project to project, failure to failure. 37signals likes to cite a Stanford (?) study that founders who have failed are statistically no more likely to succeed than first-time founders. Only successful founders have a statistical leg up when starting a new venture.
It makes sense, when you think about it. Of the near-infinite set of actions and combinations of actions you can take when starting a company, maybe 1% are right. The other 99% of the things you can possibly do at any one moment in a startup put you down the road to failure.
Failing only teaches you that the 1 in 1,000,000 way that you tried is wrong.
I don’t believe that a large-scale, I’m-gonna-go-out-and-raise project necessarily teaches you anything. It’s sort of like the big show - you should have rehearsed beforehand.
For example - my friend Hursh Agrawal has a kickass project called Roundtable, which recently was frontpaged on Hacker News. I talked to him yesterday and he was all over the place - putting out fires left and right on things that were broken, not working correctly, leads that needed to be followed up on. We talked briefly about learning resources we were working through - I’ve recently finished reading The Pragmatic Programmer and a book on Agile methodologies, and I’m starting on The Heart Aroused soon. Hursh remarked that he didn’t have any time to pick up anything new.
I’m not saying CEOs never improve or have time to learn - but there’s sort of a myth amongst non-founders that you’ll be able to learn as you go. That hasn’t been my experience. Usually, the learning comes after - after your startup is a huge, flaming wreck most likely. This is a mistake. Deming’s Plan-Do-Check-Act cycle is only 50% doing and acting. You have two options for minimizing this risk:
- Learn more up front.
- Ruthlessly protect your time, making sure you have time to evaluate and learn.
That said, that doesn’t mean I’m not learning. I think you can learn a ton from hacks/little projects, like Kyle Bragger’s tinyproj. Having users for a product does not a startup make, but I think you can learn a lot from that experience. I’m also working my way through a massive stack of management theory/software development/hacking books that I know I’ll never read once I actually start a company.
A lot of successful founders I know (Sean Ellis, for example) had significant work histories at startups or otherwise before starting something. I don’t have that yet.
Disclaimer: This is way, way different than waiting to launch your product. The only way to learn about your market and your customers is to start the Big Show immediately. This post has been about rehearsing for the Big Show as a person, as a founder. Not as a company.
Earlier this year, I wrote about Aspects, Intercepting Filters, Signal Slots, and Events, comparing these similar approaches to handling both asynchronous programming and cross-cutting application concerns in a cohesive way.
I took the research I did for that article, and applied it to what was then a "SignalSlot" implementation within Zend Framework 2, and refactored that work into a new "EventManager" component. This article is intended to get you up and running with it.
I love Graphite. I think it’s amazing, and I specifically love that it’s essentially Stats as a Service for your network, since you can get hold of the raw data to integrate into other tools.
I’ve started pushing more and more things to it on my network like all my Munin data as per my previous blog post.
What’s missing though is a very simple to manage dashboard. Work is ongoing by the Graphite team on this and there’s been a new release this week that refines their own dashboard even more.
I wanted a specific kind of dashboard though:
- The graph descriptions should be files that you can version control
- Graphs should have metadata that’s visible to people looking at the graphs for context. The image below shows a popup that is activated by hovering over a graph.
- Easy bookmarkable URLs
- Works in common browsers and resolutions
- Allow graphs to be added/removed/edited on the fly without any heavy restarts required using something like Puppet/Chef – graphs are just text files in a directory
- Dashboards and graphs should be separate files that can be shared and reused
I wrote such a dashboard with the very boring name – GDash – that you can find in my GitHub. It only needs Sinatra and uses the excellent Twitter bootstrap framework for the visual side of things.
The project is set up to be hosted in any Rack server like Passenger, but it will also just work on Heroku. If you host it on Heroku, it will create URLs to your private Graphite install. To get it going on Heroku, just follow their QuickStart Guide. Their free tier should be enough for a decent-sized dashboard. Deploying the app to Heroku once you are signed up and set up locally is just 2 commands.
You should only need to edit the config.ru file to point it at your Graphite, give it a name, and optionally enable authentication. After that you can add graphs; the example one that creates the above image is in the sample directory.
More detail about the graph DSL used to describe graphs can be found at GitHub. I know the docs for the DSL need to be improved and will do so soon.
I have a few plans for the future:
- As I am looking to replace Munin I will add a host view that will show common data per host. It will show all the data there and you can give it display hints using the same DSL
- Add a display mode suitable for big monitors – wider layout, no menu bar
- Some more configuration options for example to set defaults that apply to all graphs
- Add a way to use dygraphs to display Graphite data
Ideas, feedback and contributions welcome!
I have thought long and hard about writing this post, especially here. So if I do post it, I obviously came to the conclusion that it was a good idea, although I am pretty sure it will be a selfish post, one that is like expelling a breath, rather than having a conversation.
I have lost a few people through death but I have only blogged about one dead friend here: http://mulqueeny.wordpress.com/2009/12/23/my-mate-mick/ and I love the fact that I did, because his death has affected my family more than any, and I like the fact that I can go to Google and write: Emma Mulqueeny My Mate Mick and I get the post I wrote when he died. At the time I wrote it just to get the pain out of me, now I read it very, very occasionally when I just want to indulge my memories, or grieve. I have taken this post on and offline fairly regularly.
So now Harry…
I never met Harry Moseley. But he died aged 11 on Saturday night at 11.10pm in his Mum’s arms, with his family, at home. He had an inoperable brain tumour diagnosed at age 7 and had spent the rest of his life speaking publicly, going to school and hospital, meeting his heroes, and keeping us all updated through twitter. He also raised money – he raised money for his family and to raise awareness for brain tumours, and this horrific cancer that comes sometimes with the unconscionable words: inoperable.
I found Harry through a stray tweet in my timeline from Duncan Ballantyne, tweeting a photo of him and Harry. By this time Harry was already in the coma he would never wake up from. I read his story and made sure that I donated, I also shared some tweets in my own twitter stream, sent his Mum some messages of support and checked his twitter account daily to see how he was. His Mum kept his twitter stream updated…
When it became apparent that he was going home to die:
Saddest day of our lives. 8.30am tomorrow we take Harry home to spend his last hours/days with us. So so sad my boys journey is ending x
Of course this day was inevitable, he was never going to survive this tumour – but he chose twitter to keep us all updated with his various activities, most now lost to the twitter mists of time. If you look at the photos he tweeted, then you get the story.
On the Friday night, knowing that he was home and that Georgie, his Mum, was sleeping with him after an evening of watching children’s films Harry loved with all of his family – I could not sleep. And I found lots of other people could not sleep either, similarly keeping silent vigil with his Mum and him. Georgie broadcast this message:
Thanks for all messages of support. Harry all comfy. Settling down through the night and going to snuggle in with my h all night long x
Harry survived that first night and Georgie gave us a few updates during the day, but then the message came through twitter:
My brave inspirational boy fell asleep in my arms at 11.10pm. Suddenly our world is a very dark and cruel place
And we all fell into the inevitable cycle of grief for this boy, alongside this family many of us have never met.
I have lost a few people to cancer; too many families close to me over my 40 years have suffered, either as children, parents or friends, as well as my own. Grief starts at diagnosis and the journey is a relentless march. Harry chose twitter to share his story and to do what he could to keep his own and his family’s spirits up.
One thing I do know, as many of you too will be aware, is that the patient – the person with the diagnosis – has to battle with the responsibility of knowing that they will inevitably cause huge pain by dying; so not only do they have to cope with their own mortality but also the responsibility of those they love and leave behind.
Harry chose to extend his own family through twitter, to spread the pain of his inevitable death and huge pain for his family and close friends. He did this to raise awareness of people in similar crisis, but I believe he also did this to build the staple support for his Mum, his Dad, his brother and his sister – so that his ‘twitfam’, as he liked to refer to everyone who followed him, would be united in supporting his family when his death came and inevitably caused an avalanche of unhappiness that just can’t be tweeted.
I grieve and weep for Harry, a truly kind and brave child, that I did not meet but know so well. But I know that he wanted his twitter family to support his real family when he died. He worked so hard to make sure we knew him, knew his life and his family. So, I think that we should now do what I imagine he wanted:
- support Georgie through his old twitter stream @harry_moseley
- of course send cash to http://www.helpharryhelpothers.com/ – buy a bracelet, donate, do whatever
- never ever take your family for granted
Wanted to leave u wiv this guys http://twitpic.com/63su7r getting ready to go diwn. Check out my gown . Tweet u soon
I chose to post this, but I would ask that you don’t comment here; I am sure you understand that comments are not appropriate on such a blog post.
The last day of the OpenStack Design Summit and Conference it was announced that OpenStack would now be run as a foundation, rather than as a corporate subsidiary of Rackspace. I believe this is an important step in the growth and stability of this project, and am very excited about the plans.
The day of the announcement there was also a governance town hall meeting. The meeting was conducted with attendees sitting in a circle, discussing the foundation formation as a group. It felt the way a community discussion should feel: warm, open, but with a little bit of critical questioning on occasion as well. During this meeting a number of good ideas were put forward about who we should be getting advice from, possible structures, and most importantly, how we’ll create the foundation as a community over the next year. I think this discussion was a great starting point for the rest of the process.
In the next couple of weeks we’ll hear how to actively participate in the creation of the foundation. I’m sure that, however the process works, it’ll be transparent and fair. I also have a strong feeling that the outcome of this process will be a foundation that cannot easily be coerced by a single vendor.
A few topics that I brought up during the town hall discussion concerned the possibility of the foundation being controlled by a single vendor. The first topic was about how roles in the foundation would be filled. Would the foundation employ all of the roles, or would they be appointed from community members? In the latter case, a single vendor can control the foundation by being the one that continuously occupies all of the positions. In the former case, the foundation would control the roles, but would require far more money to operate. My second topic was regarding control and money. How would donations be handled? Would it be possible for one vendor to control the foundation through being the primary sponsor? My third topic regarded the current situation with community roles. Nearly every role is currently filled by Rackspace. Should we limit the number of appointments any one specific organization can have?
I was very happy with the responses to my questions. Initial thoughts about employees vs appointments were that appointments will likely lead to a stronger community and it’s likely best to not have a sprawling foundation. The money question likely comes down to how much can legally be donated. Lastly, the thoughts on limiting appointments were that it is likely unnecessary and that it may hinder community involvement. Limiting appointments was initially tried on the software development policy side of things and no one liked it. It was also mentioned that we should be encouraging participation from every vendor in a way that is ongoing, rather than one-off.
My questions revolved around control. The project is, and will likely still be, mostly led and controlled by Rackspace for the near future. Of course, these questions are also purely theoretical. Rackspace has done an amazing job leading this project and encouraging participation and growth so far, and they haven’t used the project to benefit themselves over other community members. They just happen to currently be the organization that has the largest commitment to the project, since the project is still so young. That’s a great thing. I’m hoping to see the same level of commitment, if not even more, from Rackspace after the foundation is formed. I’m also hoping to see much greater participation from other members of the community. I think that diversity will naturally occur over time as more community members add resources from their organizations.
I’d like to thank Rackspace for forming the foundation, and I’d like to thank in advance all the community members that are going to be working together to make this happen. Let’s make the project strong together!
Since joining Rackspace to help out with OpenStack, one of the hot topics of conversation I’ve been involved in has been extensibility and versioning.
I think most of my readers (yes, all six of you) are fairly familiar with, if not tired of (hi, Dave!) the various arguments and counter-arguments in this space. However, there is one new-ish bit; how to do distributed extensibility in JSON.
That’s because OpenStack’s API allows vendors to add extensions in various ways, in an uncoordinated fashion. And while that’s a well-understood (if still somewhat tricky) problem in XML, it hasn’t been approached at all in JSON, which has fast become the format of choice for data-bearing APIs.
JSON has a head start in that it embodies the mustIgnore rule; if you put extra data in a JSON document (for example, an extra property on an object), all implementations will just ignore it. Great. However, the problem comes in when multiple people want to extend a document, but avoid collisions.
For example, given this straw-man JSON document:
{
  "foo": "bar",
  "version": 1
}
if both FooCorp and BarProject add a “widget” property, they’ll be fighting over who owns it. Bad luck.
So, some way to coordinate these parties and assure that they don’t conflict is necessary. In XML, this is done with Namespaces in XML, and so solutions to this problem are generally called Namespaces too, even though they don’t have to look or work the same way.
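To make the collision concrete, here is a minimal Python sketch. The vendor names come from the example above; the `apply_extension` helper and the property values are made up for illustration. It suggests that mustIgnore keeps unknown members harmless for consumers, but does nothing to stop two uncoordinated extensions from claiming the same name.

```python
import json

base = json.loads('{"foo": "bar", "version": 1}')

def apply_extension(doc, extension):
    """Return a copy of doc with the extension's members merged in,
    plus the list of member names that were already taken."""
    merged = dict(doc)
    clashes = [name for name in extension if name in merged]
    merged.update(extension)
    return merged, clashes

# FooCorp extends the document first; nothing clashes yet.
doc, clashes = apply_extension(base, {"widget": "foocorp-widget"})
print(clashes)          # []

# BarProject picks the same name without coordinating.
doc, clashes = apply_extension(doc, {"widget": {"id": 7}})
print(clashes)          # ['widget']
print(doc["widget"])    # {'id': 7}  (FooCorp's value is gone)

# A mustIgnore consumer only reads the members it knows about.
known = {k: doc[k] for k in ("foo", "version")}
print(known)            # {'foo': 'bar', 'version': 1}
```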
Prior Art
I’m not the first person to wonder in this direction, of course.
Yaron made the first proposal, as far as I can tell. His approach looks like this:
{
  "org.goland.schemas.projectFoo.specProposal" : {
    "title": "JSON Extensions",
    "author": {
      "firstName": "Yaron",
      "com.example.schemas.middleName": "Y",
      "org.goland.schemas.projectFoo.lastName": "Goland"
    }
  }
}
It’s sort of a Java-ish approach, based on DNS, like URIs, but without the syntactic awkwardness of putting URIs in JSON. He also states that there’s an implicit namespace for descendants; e.g., here, “title” is also in the org.goland.schemas.projectFoo namespace.
There was another proposal in the JSON-schema mailing list in 2008. It looks very, very similar to XML schemas, except that the namespaces, as far as I can figure out, are bound inside the schema itself, rather than the document. It seems to have been shot down, because it required schema parsing to be able to identify things; never a good idea, especially in the JSON world.
Some Observations
Starting with the obvious, I’d say that if you can use JSON without namespaces, you really, really should. In other words, if you really need distributed extensibility, you need something like namespaces, but for all other purposes, they should be avoided like the plague; they make it too complex, and simplicity is the name of the game in JSON.
A bit more subtly, I think this isn’t just a document-by-document decision, but a node-by-node one in the document. I.e., you should identify the specific places in a document that need extensibility and allow namespaces there, but they shouldn’t pollute the rest of the document, if they aren’t needed there.
I suppose what I’m saying is that namespaces should be a purely syntactic convention to avoid collisions where distributed extensibility is allowed, rather than some magical thing that allows you to uniquely and globally identify every bit of data in the document. I know that’s going to rile up some of the linked data and semweb folks, but we’re talking JSON here, not Turtle or RDF.
This implies that Yaron’s inheritance is unnecessary; the very fact that the “title” property is a member of “org.goland.schemas.projectFoo.specProposal” is sufficient to assure lack of collisions (unless he wants to allow extensibility at that level too, in which case they should be explicit at that level).
Another Straw-Man
Given all of that, I wonder if the problem can be simplified enough to make some progress. I think Yaron’s proposal makes a certain amount of sense, with a few modifications:
- JSON-based formats need to define which objects require namespaced members explicitly. I.e., it’s opt-in and constrained to only those nodes nominated for distributed extensibility.
- No inheritance is assumed.
- Non-namespaced property names won't have the delimiter character in them (here, '.')
- Prefixes are defined by the format; they can either be in the DNS-based style that Yaron advocates, or if there’s some level of coordination, you could set up a registry (JSON-based, of course :) of shorter prefixes.
This would tweak Yaron’s sample to something like (assuming that a registry were used):
{
  "FOO.specProposal" : {
    "title": "JSON Extensions",
    "author": {
      "firstName": "Yaron",
      "EXAMPLE.middleName": "Y",
      "lastName": "Goland"
    }
  }
}
I like this because it’s not very painful, it doesn’t require schema to process, and it gets the job done; it allows distributed extensibility. The important thing is to stop looking at namespaces as something you should slather over your format like butter — more is better! — and start seeing them as a specialised tool that should only be used when it can do some good.
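As a rough illustration of how little machinery the straw-man rules above need, here is a small Python sketch of a checker. Everything in it is an assumption made for the example: the `REGISTRY` contents, the `split_member` and `check_object` helpers, and the idea that the format tells the checker which objects are extensible. It splits on the last delimiter so that both short registry prefixes and DNS-style prefixes work.

```python
DELIMITER = "."
# Hypothetical registry of short prefixes; a real format would define its own.
REGISTRY = {"FOO", "EXAMPLE"}

def split_member(name):
    """Split 'PREFIX.local' into (prefix, local); plain names get prefix None."""
    prefix, sep, local = name.rpartition(DELIMITER)
    return (prefix, local) if sep else (None, name)

def check_object(obj, extensible):
    """Return a list of problems for one object. No inheritance: each
    object is judged only by its own `extensible` flag."""
    problems = []
    for name in obj:
        prefix, local = split_member(name)
        if prefix is None:
            continue  # core member, no delimiter, nothing to check
        if not extensible:
            problems.append(f"{name}: namespaced member not allowed here")
        elif prefix not in REGISTRY:
            problems.append(f"{name}: unknown prefix {prefix}")
    return problems

doc = {
    "FOO.specProposal": {
        "title": "JSON Extensions",
        "author": {
            "firstName": "Yaron",
            "EXAMPLE.middleName": "Y",
            "lastName": "Goland",
        },
    }
}

spec = doc["FOO.specProposal"]
print(check_object(doc, extensible=True))             # []
print(check_object(spec, extensible=False))           # []
print(check_object(spec["author"], extensible=True))  # []
print(check_object({"BAR.widget": 1}, extensible=True))
```

Note that no schema is parsed anywhere; the only inputs are the member names and a per-object flag supplied by the format definition.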
The lovely Aleks Krotosky chairs a live podcast that was recorded on Monday night (before the Newsnight thing Rory did) and published today. I am on the panel alongside David Willetts MP, Dan Crow from Songkick and Prof Jeff Magee from Imperial College London.
Go and check it out and listen here
The Newsnight thing that Rory did is online now and is defo worth a watch if you are interested in this subject, it is available here
The Mozilla thing I was referring to is this and here is an event you could get involved with if you want to do more with Mozilla
The resources I mentioned using with Amy for introducing kids to code are here
And tonight I am hosting a BarCamp at the Guardian on Kids and Coding with Caper and Katy Beale, bringing communities together. Tickets are all gone, but please do register as there may be a chance that you will be able to get in. The hashtag to follow is #codingforkids, and we will be setting up a Google Group
Last but by no means least, there is a grass-roots community of IT teachers and those who want to help support IT in education called Computing at School – it is incredible, do go and sign up for the Google group, and please do see what the teaching community is talking about – it’s great.
Unfortunately MySQL Proxy was no good source of inspiration today. MySQL Proxy can do many wonderful things which you can do with C based mysqlnd plugins as well. But not with PECL/mysqlnd_uh. PECL/mysqlnd_uh lets you write “plugins” in PHP. Given my desire to demo the power of mysqlnd plugins at the upcoming webinar Succeed with Plugins using PHP examples, I had to extend PECL/mysqlnd_uh to allow result set manipulation. Five brand new lines of magic.
class __mysqlnd_result extends MysqlndUhResult {
public function fetchInto($res, &$rows, $flags, $extension) {
$rows = array("Your mysqlnd has been hacked!");
}
}
mysqlnd_uh_set_result_proxy(new __mysqlnd_result());
The new, yet undocumented and untested built-in class MysqlndUhResult maps to mysqlnd’s internal result class. It is responsible for fetching the data of a result set. It consists of some 20 methods. To get started, I’ve exported MysqlndUhResult::fetchInto, which is supposed to read the data of all rows of a result set into the rows variable passed to it by reference. For faking a result set, one assigns an array to the variable. Note that only the data, not the metadata, is manipulated.
$mysqli = new mysqli("localhost", "root", "", "test");
$res = $mysqli->query("SELECT 'Enjoy your weekend!' FROM DUAL");
var_dump($res->fetch_assoc());
nixnutz@linux-fuxh:~/php/php-src/branches/PHP_5_4> sapi/cli/php fake.php
array(1) {
[0]=>
string(29) "Your mysqlnd has been hacked!"
}
A typical use case for injecting or manipulating a result is a cache. Please, do not start developing a cache using PECL/mysqlnd_uh. We did that already for you in C. Check out PECL/mysqlnd_qc…
Happy hacking!
Would you like to see the EXPLAIN output for all MySQL queries of any PHP application without changing the application much? Easy-peasy: compile PHP to use the mysqlnd library, install PECL/mysqlnd_uh and paste 22 lines of evil code into your auto_prepend_file .
class conn_proxy extends MysqlndUhConnection {
  public function query($conn, $query, $self = false) {
    if (!$self) {
      $this->query($conn, "EXPLAIN " . $query, true);
      if ($this->getFieldCount($conn)) {
        printf("\tAuto EXPLAIN for '%s'\n", $query);
        $res = $this->storeResult($conn);
        $r = new MysqlndUhResult();
        do {
          $row = NULL;
          $r->fetchInto($res, $row, 2, 1);
          if (is_array($row))
            printf("\t\t%s\n", implode(" ", $row));
        } while (!empty($row));
        $r->freeResult($res, false);
      }
    }
    return parent::query($conn, $query);
  }
}
mysqlnd_uh_set_connection_proxy(new conn_proxy());
Not being a PHP hero, I don’t have a PHP application in the cloud to demo how it works. Thus, I wrote a very basic application which serves no other purpose than creating a table, inserting some rows and fetching them, to demo that the auto EXPLAIN SQL insertion works with any PHP MySQL API: PDO_MySQL, mysqli, mysql. All of them.
$mysqli = new mysqli("localhost", "root", "", "test");
$mysqli->query("DROP TABLE IF EXISTS test");
$mysqli->query("CREATE TABLE test(id INT)");
$mysqli->query("INSERT INTO test(id) VALUES (1), (2), (3)");
$res = $mysqli->query("SELECT t1.id, 'foo', t2.id + 1 FROM test AS t1, test AS t2");
printf("MySQLi has found %d results\n", $res->num_rows);
$pdo = new PDO("mysql:host=localhost;dbname=test", "root", "");
$stmt = $pdo->query("SELECT t1.id, 'foo', t2.id + 1 FROM test AS t1, test AS t2");
printf("PDO has found %d results\n", $stmt->rowCount());
Auto EXPLAIN for 'SELECT t1.id, 'foo', t2.id + 1 FROM test AS t1, test AS t2'
1 SIMPLE t1 ALL 3
1 SIMPLE t2 ALL 3 Using join buffer (BNL, incremental buffers)
MySQLi has found 9 results
Auto EXPLAIN for 'SELECT t1.id, 'foo', t2.id + 1 FROM test AS t1, test AS t2'
1 SIMPLE t1 ALL 3
1 SIMPLE t2 ALL 3 Using join buffer (BNL, incremental buffers)
PDO has found 9 results
If you want to learn more about PECL/mysqlnd_uh, please browse the blog archive and check out the manual. Also, don’t miss the MySQL “Succeed with Plugins” webinar next week. Credits go to Mayflower OpenSource Labs for the original development of this mysqlnd plugin!
Unfortunately I am cheating a bit when writing “easy-peasy”. The basics are really easy. But… please note that the above is bloody, fresh meat. To toy around with SQL injection that requires fetching result sets, you have to build the development version of PECL/mysqlnd_uh. Please also keep in mind that PECL/mysqlnd_uh exposes the internal mysqlnd C library interface to the PHP user. The mysqlnd library has been written to be used by C developers. Giving PHP hackers, who are used to garbage collection and other safety belts, access to it was never planned. Inappropriate use may cause memory leaks and crashes.
Use PECL/mysqlnd_uh for debugging and prototyping mysqlnd plugins. Test well before using in mission critical environments. Generally speaking: if your proxy script works once, it should always work. Good enough for your development machine. In production environments, you will usually want to write your mysqlnd plugins in C for performance reasons anyway. Using C has the additional advantage of gaining access to all features, not just those made available by an Internet Super Hero to have nice examples for a web seminar.
Having said all this, who is the first to identify the line that needs to be changed to make the example leak memory? Give me one line, dear reader. Just one line.
Happy hacking!
More than ten years ago, I was working at Akamai and got involved in the specification of Edge Side Includes (ESI), sort of a templating language for intermediaries.
In that time, interest in ESI has grown, waned and been reborn. As far as I can tell, it's implemented not only by Akamai and Oracle (the main forces behind it), but also in Varnish, Squid, and lots of other places too.
Back then, I had a strong suspicion that it'd die because people would see it as locking them into Akamai (or some other vendor). Why, then, is this limited, funny, embarrassingly simple little templating language still around?
In a word, it's concurrency.
In the last couple of years, it's become hot to build massively scalable Web servers by re-thinking how they handle concurrency; often using asynchronous, non-blocking single-process servers, rather than threads or multiple processes.
The benefits of this approach have been known for a long time; way before Dan Kegel wrote the C10K page, Web proxy servers like Squid (and its predecessor, Harvest) were using this approach because it's the only sensible way to scale for them.
However, as folks are finding out when they use newer tools that implement these methods (e.g., Twisted, Node.JS), writing event-driven code is something you either love or hate. Many developers can't stand it, especially for debugging (personally, I love it, but that's just me).
So, ESI is a way to offer the massive concurrency of non-blocking, asynchronous servers in a way that's easy to digest. Since fetching a URI doesn't block, the only overhead is in stitching the page together, and you can control the overhead of that by limiting the language's capability.
This makes ESI a great tool for building highly scalable dynamic Web sites without writing and debugging new code. Win.
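To give a feel for why this is cheap, here is a small Python sketch of the idea: fetch all included fragments concurrently, then stitch the page together. It is only an illustration, not how any real ESI processor is written. The `FRAGMENTS` table stands in for origin servers, the `fetch` and `render` helpers are made up, and the regular expression only recognises the simplest form of an include.

```python
import asyncio
import re

# Stand-in for an origin: URI -> (simulated latency in seconds, body).
FRAGMENTS = {
    "/header": (0.05, "<h1>Hello</h1>"),
    "/news": (0.05, "<p>News</p>"),
    "/footer": (0.05, "<small>Bye</small>"),
}

INCLUDE = re.compile(r'<esi:include src="([^"]+)"\s*/>')

async def fetch(uri):
    """Pretend to fetch a fragment without blocking the event loop."""
    delay, body = FRAGMENTS[uri]
    await asyncio.sleep(delay)
    return body

async def render(template):
    """Fetch every included fragment concurrently, then stitch the page."""
    uris = INCLUDE.findall(template)
    bodies = await asyncio.gather(*(fetch(u) for u in uris))
    parts = iter(bodies)
    return INCLUDE.sub(lambda match: next(parts), template)

template = (
    '<esi:include src="/header"/>'
    '<esi:include src="/news"/>'
    '<esi:include src="/footer"/>'
)
page = asyncio.run(render(template))
print(page)
```

Because the three simulated fetches overlap, the total wait is roughly that of the slowest fragment rather than the sum of all three; the page author never sees the event loop.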
Making ESI Better
ESI is, as mentioned, more than a decade old, and the Web has changed a lot in the intervening time. Even putting that aside, ESI isn't exactly what we'd call Web-friendly. We can do better.
Over that time, I've had a number of thoughts about how to improve ESI as a language, which I've shared with some interested people privately. One of my back-burner projects has been to implement this, but I have to admit that this isn't going to happen soon, since I'm busy doing several other things.
Instead, I'm going to dump those ideas here, and hope someone runs with them. Here are a few:
The biggest single way I can see to improve ESI is to make it possible to source variables from a URI. In other words, it should be possible to fetch a URI, parse the response (probably in JSON), and then reference the data returned when evaluating the template.
This would enable some really exciting things. Because variables are now just state, you can do things like cache user preferences -- using plain old HTTP caching -- and have that state be local to where it's needed. When you update that state, it can be invalidated. ESI expressions now can have arbitrary, application-relevant input, instead of being limited to a few paltry request headers.
This could be what it looks like:
<!-- … -->
Here, you see some JSON being loaded into the user_prefs variable, from a URI that's templated using a cookie that identifies the user, to drive how the page loads. This is very similar to a set of techniques I discussed a while back for composing services "RESTfully", and it still works.
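As a rough model of the idea in Python (this is not ESI syntax, proposed or otherwise), the flow might look like the sketch below. The `RESPONSES` table stands in for HTTP, and the URI, the cookie name, the preference fields and both helper functions are hypothetical.

```python
import json
from string import Template

# Stand-in for HTTP: URI -> JSON response body (hypothetical data).
RESPONSES = {
    "http://example.com/prefs/alice": '{"theme": "dark", "lang": "en"}',
}

def load_vars(uri_template, cookies):
    """Expand the URI from request cookies, fetch it, and parse the JSON."""
    uri = uri_template.format(**cookies)
    return json.loads(RESPONSES[uri])

def evaluate(page_template, variables):
    """Substitute the loaded variables into the page template."""
    return Template(page_template).substitute(variables)

cookies = {"user": "alice"}
user_prefs = load_vars("http://example.com/prefs/{user}", cookies)
page = evaluate('<body class="$theme" lang="$lang">', user_prefs)
print(page)
```

Since the preferences come from an ordinary URI, the response could be cached and invalidated like any other HTTP resource.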
JSON also presents a way to clean up the variable model generally; instead of the random collection of variables, ESI 2.0 could instantiate a request object, with appropriate members like .method, .cookie, .headers, and so forth. It also brings about the possibility of making response attributes available as well, at least in the context of an include.
Going even further, JavaScript presents an opportunity to rally around a common, well-understood syntax for things like variable references, operators, and even common functions (e.g., string manipulation).
ESI:include desperately needs a timeout parameter, and a sensible means of specifying fallback content (probably as a child of the include element).
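In Python terms, the behaviour I have in mind for a timeout with fallback content is roughly the following. This is a sketch only; the `include` helper, its parameters and the two stand-in fragments are invented for the example.

```python
import asyncio

async def slow_fragment():
    """Stand-in for an include whose origin is slow to answer."""
    await asyncio.sleep(1.0)
    return "<p>Fresh content</p>"

async def fast_fragment():
    """Stand-in for an include that answers quickly."""
    await asyncio.sleep(0.01)
    return "<p>Fresh content</p>"

async def include(fetch, timeout, fallback):
    """Return the fetched fragment, or the fallback if it takes too long."""
    try:
        return await asyncio.wait_for(fetch(), timeout)
    except asyncio.TimeoutError:
        return fallback

async def main():
    fallback = "<p>Temporarily unavailable</p>"
    ok = await include(fast_fragment, timeout=0.5, fallback=fallback)
    late = await include(slow_fragment, timeout=0.05, fallback=fallback)
    return ok, late

ok, late = asyncio.run(main())
print(ok)
print(late)
```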
Deeper integration with HTTP is necessary; not only should it be possible to access arbitrary aspects of the incoming request, but it should be possible to affect more of the outgoing response; e.g., the status code. Likewise, finer-grained control over outgoing requests (generated by include as well as load) would be good (e.g., via attributes on the element).
There are lots of smaller, easier wins. Not requiring valid XML is an obvious one; integrating URI Templates is likewise a no-brainer. Cleaning up some of the cruft in the syntax would be nice; there are some elements that people just don't need in there (e.g., esi:inline, the alt attribute).
Anybody up for it?
Yesterday I received this email from a YRSer – they are happy for this to be published but I have removed all references in the email that might identify this person. I am publishing it for several reasons:
1. to get the help asked for from a wider community than just me
2. to show the kind of dilemma our students are facing
3. to give Universities/Google a chance to snap this person up
Email copy starts here:
I am studying Biology Chemistry Physics and Maths with Mechanics at A level (Year 13- upper sixth).
This year, I attended YRS and won a prize, I also won an award at the Cambridge Chemistry Challenge.
At GCSE I received A*A*A*AAAAAAABB. Computing in general was only ever really a hobby for me – I decided against a degree in the area after taking DiDa at GCSE, which was a real trainwreck of a GCSE course, focusing more on secretarial skills than what I was interested in. I left with an A in the subject and the assumption that I had misconceptions about IT as a career. I had tried to really show my skills through the course’s website topic where candidates produced a web-page (though not hosted) to log their work, but the course wasn’t looking for the skills I had. At the time I knew VB, Javascript, C++, HTML, some PHP, and basic Python.
I was always interested in the sciences, and after taking some work experience, decided firmly on Medicine as a future career.
I didn’t do as well in my AS levels as I was expected to. I have a short history of underperforming relative to my skills in a given subject, but was naïve enough to assume it wouldn’t affect my AS results, though I think this can be remedied. I took Biology, Maths, Chemistry and Physics respectively. I believe these grades can be remedied, and after sorting myself out and really applying myself, I believe I can achieve A*AAB (or similar) at A level, please forgive me if I sound arrogant – I am really intending to work hard this academic year through retakes in January and Christmas. This, of course has the potential to get me into medicine once I have picked myself up, but after this I see things in a different light and am having second thoughts for medicine as a career.
YRS was one of the most enjoyable things I have ever undertaken – before this I felt all IT jobs (aside from the legendary Google jobs) were writing simple, static programs for big companies in C, or inaccessible to me. I was really looking for a job where I could be challenged with problems to solve (which drew me to medical diagnostics), but I met lots of interesting people who were working on equally interesting problems including an IBM employee working on a web spidering project who I discussed Machine Learning with (I am taking an online Introduction to Machine Learning course at Stanford University), and the man who wrote the very popular National Rail iPhone application, and made a similar train tracker.
As things are going I will probably end up at a fork in a road when I reapply to university next year, but having never considered this area as a future career, and not knowing anyone who works in this area I am lost. I understand there are many computer related IT courses, of which I know Computer Science, Software Architecture and Computing, but I don’t know which one is what I would like to go into, or even if I have what is necessary to get into the business.
I would be very thankful if you could answer a few questions I have:
- What IT related course should I take, or how do I decide on one?
- How difficult is it to work at Google? What path would I take?
- What should I do to increase my chances of admission?
- Do I have what it takes to do a course, if not, what should I do?
- What areas, in your opinion, would I be interested in?
- What sort of work would I be doing?
The free Mysqlnd replication and load balancing plugin now offers load balancing and lazy connections independent of read write splitting. This makes the plugin attractive for MySQL Cluster users. All nodes participating in a MySQL Cluster can serve all requests; they all accept read and write requests. No statement redirection needs to be done. An application using MySQL Cluster has only one task: load balance requests over MySQL frontends (SQL Nodes).
| | | Client | | |
|---|---|---|---|---|
| | | \| | | |
| MySQL frontend (SQL Node) | | | | MySQL frontend (SQL Node) |
| | | \| | | |
| Cluster Node | <> | Cluster Node | <> | Cluster Node |
If using the new configuration setting mysqlnd_ms.disable_rw_split=1, the plugin will load balance requests over the list of configured master servers. The term master is borrowed from the primary usage scenario of the plugin, which is MySQL Replication. Don’t get confused by it. Master, slave, MySQL frontend - after all, those are just servers participating in a certain kind of database cluster. All three terms refer to nodes in a MySQL database cluster. The closest analogy to a MySQL Cluster SQL node is a MySQL replication master, thus you have to configure a list of "masters" if you want to use load balancing without read write splitting. We didn’t introduce a new name for the server list, that’s all. Remember that MySQL replication is the primary focus of the plugin.
The example plugin configuration file has two MySQL Cluster SQL nodes configured in the "master" list. The "slave" list is intentionally left empty. The load balancing policy is round robin. The plugin will iterate over the server list whenever a statement is to be executed.
lb_only_for_mysql_cluster.ini
{
  "myapp": {
    "master": {
      "master_0": {"host": "localhost"},
      "master_1": {"host": "192.168.78.136"}
    },
    "slave": {
    },
    "filters": {
      "roundrobin": []
    }
  }
}
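To make the round robin policy concrete, here is a minimal Python sketch of the selection logic only. It is not the plugin's implementation: the function name `round_robin_hosts` is made up for illustration, and the sketch simply assumes that servers are visited in the order they appear in the "master" list.

```python
import itertools
import json

# Same shape as lb_only_for_mysql_cluster.ini above.
CONFIG = """
{
  "myapp": {
    "master": {
      "master_0": {"host": "localhost"},
      "master_1": {"host": "192.168.78.136"}
    },
    "slave": {},
    "filters": {"roundrobin": []}
  }
}
"""


def round_robin_hosts(config_text, section):
    """Return an endless iterator over the configured master hosts, in order.

    Hypothetical helper: it only illustrates the policy, not the plugin code.
    """
    masters = json.loads(config_text)[section]["master"]
    hosts = [entry["host"] for entry in masters.values()]
    return itertools.cycle(hosts)


picker = round_robin_hosts(CONFIG, "myapp")
# One host is picked per statement, wrapping around at the end of the list.
first_four = [next(picker) for _ in range(4)]
print(first_four)
```

With two configured nodes, consecutive statements alternate between them, which is the behaviour the test script further down relies on.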
Because the plugin’s primary usage scenario is MySQL replication, we have to set two PHP configuration settings to make it accept more than one master and to disable read write splitting.
mysqlnd_ms.enable=1
mysqlnd_ms.ini_file=lb_only_for_mysql_cluster.ini
mysqlnd_ms.multi_master=1
mysqlnd_ms.disable_rw_split=1
PECL/mysqlnd_ms can be used together with any PHP MySQL API (extension) compiled to use the mysqlnd library. The test script shows the use with the two recommended choices, which are mysqli and PDO_MySQL.
lb.php
<?php
// "myapp" is the section name from the plugin configuration file, not a real host.
printf("\nUsing mysqli\n\n");
$mysqli = new mysqli("myapp", "root", "", "test");
// The same statement is run twice; round robin sends each run to a different SQL node.
var_dump($mysqli->query("SELECT VERSION()")->fetch_assoc());
var_dump($mysqli->query("SELECT VERSION()")->fetch_assoc());
printf("\nUsing PDO\n\n");
$pdo = new PDO("mysql:host=myapp;dbname=test", "root", "");
var_dump($pdo->query("SELECT VERSION()")->fetchAll());
var_dump($pdo->query("SELECT VERSION()")->fetchAll());
Finally, putting things together to see them in action…
nixnutz@linux-fuxh:~/php/php-src/branches/PHP_5_3> sapi/cli/php -dmysqlnd_ms.multi_master=1 -d mysqlnd_ms.disable_rw_split=1 -dmysqlnd_ms.enable=1 -dmysqlnd_ms.ini_file=lb_only.ini lb.php
Using mysqli
array(1) {
["VERSION()"]=>
string(16) "5.1.45-debug-log"
}
array(1) {
["VERSION()"]=>
string(12) "5.6.2-m5-log"
}
Using PDO
array(1) {
[0]=>
array(2) {
["VERSION()"]=>
string(16) "5.1.45-debug-log"
[0]=>
string(16) "5.1.45-debug-log"
}
}
array(1) {
[0]=>
array(2) {
["VERSION()"]=>
string(12) "5.6.2-m5-log"
[0]=>
string(12) "5.6.2-m5-log"
}
}
This is the final feature addition before the production-ready 1.1.x release, which is still planned for this week. The new feature has been documented, but it will take a couple of days until the php.net reference manual shows the latest edits.
Free MySQL web seminar Building High Performance and High Traffic PHP Applications with MySQL - Part 3: Succeed with Plugins on Wednesday, October 26, 2011.
A lot of bits have been used over on the OpenStack list recently about versioning the HTTP APIs they provide.
This over-long and rambling post summarises my current thoughts on the topic, both as background for that discussion, as well as for review in the wider community.
The Warm-up: Software vs. Web Versioning
Developers are used to software versioning; e.g., for every release, you bump an identifier. There are usually major versions, minor versions, and sometimes things like package identifiers.
This fine level of granularity is useful to both developers and users; each of these things has precise semantics that helps in figuring out compatibility and debugging.
For example, on my Fedora box, I can do:
cloud:~> yum -q list installed httpd
Installed Packages
httpd.x86_64 2.2.17-1.fc14 @updates
… and I’ll know that Apache httpd version 2.2.17 is installed, and it’s the first package of that version for Fedora 14.
This lets me know that any modules I want to use with the server will need to work with Apache 2.2; and, that if there are security bugs found in httpd 2.2.15, I’m safe. Furthermore, when I install software that depends upon Apache, it can specify a specific version — and even packaging — to require, so that if it wants to avoid specific bugs, or require specific features, it can.
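As a rough illustration of those checks, here is a small Python sketch. It is not how yum or RPM actually compare versions; it assumes plain dotted numeric versions, and it assumes (as the paragraph above does) that a bug reported against 2.2.15 is fixed in later 2.2.x releases. The helper name `parse_package_version` is made up.

```python
def parse_package_version(text):
    """Split '2.2.17-1.fc14' into ((2, 2, 17), '1.fc14').

    Hypothetical helper: real package managers use more elaborate rules.
    """
    version, _, release = text.partition("-")
    return tuple(int(part) for part in version.split(".")), release


installed, release = parse_package_version("2.2.17-1.fc14")

# Modules need to target the same major.minor series as the server.
series = installed[:2]

# Tuples compare element by element, so this is a simple "newer than" check.
vulnerable = (2, 2, 15)
newer_than_vulnerable = installed > vulnerable

print(series, release, newer_than_vulnerable)
```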
These are good and useful things to use software versioning for; it’s evolved into best practice that’s pretty well-understood. See, for example, Fedora’s package versioning guidelines.
However, they don’t directly apply to versioning on the Web. While there are similar use cases — e.g., maintaining compatibility, enabling debugging, dependency control — the mechanisms are completely different.
For example, if you throw such a version identifier into your URI, like this:
http://api.example.com/v2.2.17-1.fc14/things/foo
then every time you make a minor change to your software, you’ll be minting an entire new set of resources on the Web;
http://api.example.com/v2.2.17-2.fc14/things/foo
Moreover, you’ll need to still support the old ones for old clients, so you’ll have a massive footprint of URIs to support. Now consider what this does to caches in the middle; they have to maintain duplicates of the same thing — because it’s unlikely that foo has changed, but it can’t be sure — and your cache hit rate goes down.
Likewise, anybody holding onto a link from the previous version of the API has to decide what to do with it going forward; while they can guess that there’ll be compatibility between the two versions, they can’t really be sure, and they’ll still need to be rewriting a bunch of APIs.
In other words, just sticking software versions into Web URLs removes a lot of the value we get from using HTTP, and if you do this, you might as well be using a ‘dumb’ RPC protocol.
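The effect on caches can be seen with a toy model. The sketch below is an assumption-laden simplification: `UriCache` is a made-up class that keys stored responses by URI only, which is roughly how an HTTP cache treats them, and the unversioned URI is hypothetical.

```python
class UriCache:
    """A toy cache keyed by URI, roughly how an HTTP intermediary keys responses."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, uri, fetch):
        if uri in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[uri] = fetch(uri)
        return self.store[uri]


def origin(uri):
    # The representation of foo is the same no matter which URI names it.
    return "representation of foo"


# Four requests for one stable URI: one miss, then three hits.
stable = UriCache()
for _ in range(4):
    stable.get("http://api.example.com/things/foo", origin)

# Four requests spread over four package releases: every request is a miss,
# and the cache ends up holding four copies of the same thing.
versioned = UriCache()
for package in ("1", "2", "3", "4"):
    uri = "http://api.example.com/v2.2.17-%s.fc14/things/foo" % package
    versioned.get(uri, origin)

print(stable.hits, stable.misses, versioned.hits, versioned.misses)
```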
So what does work, on the Web?
The answer is that there is no one answer; there are lots of different mechanisms in HTTP to meet the goals that people have for versioning.
However, there is an underlying principle to almost any kind of versioning on the Web: not breaking existing clients.
The reasoning is simple; once you publish a Web API, people are going to start writing software that relies upon it, and every time you introduce a change, you introduce the potential to break them. That means that changes have to happen in predictable and well-understood ways.
For example, if you start using the Foo HTTP header, you can’t change its semantics or syntax afterwards. Even fixing bugs in how it works can be tricky, because clients will start to work around the bugs, and when you change things, you break the workarounds.
In other words, good mechanisms are extensible, so that you can introduce change without wiping the slate clean, and it means that any change that doesn’t fit into an extension needs to use a new identifier, so it doesn’t confuse clients expecting the old behaviour.
So, if you want to change the semantics of that Foo header, you can either take advantage of extensibility (if it allows it; see the Cache-Control header’s extensibility policy for a great example), or you have to introduce another header, e.g., Foo2.
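Cache-Control's policy works because recipients ignore directives they don't recognise. A minimal Python sketch of that behaviour follows; it is a simplification that splits on commas (so it would mishandle quoted values containing commas), and `example-ext` is a made-up extension directive.

```python
# Directives this hypothetical recipient was written to understand.
KNOWN_DIRECTIVES = {"max-age", "no-cache", "no-store", "private", "public"}


def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, argument = item.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def understood(directives):
    """Keep only known directives; unknown ones are ignored, not treated as errors."""
    return {name: arg for name, arg in directives.items() if name in KNOWN_DIRECTIVES}


parsed = parse_cache_control('max-age=3600, private, example-ext="abc"')
acted_upon = understood(parsed)
print(parsed)
print(acted_upon)
```

An old recipient keeps working when a new directive shows up, while a newer one can start acting on it, which is the kind of change that doesn't need a new header name.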
This approach extends to lots of other things, whether they be media types, URI parameters, and potentially URIs themselves (see below).
Because of this, versioning is something that should not take place often, because every time you change a version identifier, you’re potentially orphaning clients who “speak” that language.
The fundamental principle is that you can’t break existing clients, because you don’t know what they implement, and you don’t control them. In doing so, you need to turn a backwards-incompatible change into a compatible one.
This implies that API versioning absolutely cannot be tied to software versioning in any way; doing so will needlessly limit (and often break) your clients, and generally upset people.
There’s an interesting effect to observe here, by the way; this approach to versioning is inherently non-linear. In other words, every time you mint a new identifier, you’re minting a fundamentally new thing, whether it be a HTTP header, a format identified by a media type, or a URI. You might as well use “foo” and “bar” as “v1” and “v2”. In some ways, that’s preferred, because people read so much into numbers (especially when there are decimal points involved).
The tricky part, as we’ll see in a bit, is what identifiers you nominate to pivot interoperability around.
An Aside: Debugging with Product Tokens
So, if you don’t put minor version information into URIs, media types and other identifiers, how do you debug when you have an implementation-specific problem? How do you track these minor changes?
HTTP’s answer to this is product tokens. They appear in things like the User-Agent, Server and Via headers, and allow software to identify itself, without surfacing minor versioning and packaging information into the protocol’s “core” identifiers (whether it’s a URI, a media type, a HTTP header, or whatever).
These sorts of versions are free (or even encouraged, security considerations aside) to contain fine-grained identifiers for what version, package, etc. of software is running. It’s what they’re for.
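For debugging, pulling product tokens out of such a header is straightforward. The Python sketch below is a simplified reading of the product/version syntax: it drops parenthesised comments (without handling nested ones), and the header value is only an illustrative example, not output from a real server.

```python
import re

# Characters allowed in an HTTP token.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PRODUCT = re.compile("(" + TOKEN + ")(?:/(" + TOKEN + "))?")


def product_tokens(header_value):
    """Return (product, version) pairs from a Server or User-Agent style value.

    Simplified sketch: comments in parentheses are removed first,
    and nested comments are not handled.
    """
    without_comments = re.sub(r"\([^()]*\)", " ", header_value)
    return [(m.group(1), m.group(2)) for m in PRODUCT.finditer(without_comments)]


tokens = product_tokens("Apache/2.2.17 (Fedora) mod_ssl/2.2.17")
print(tokens)
```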
The Main Event: Resource Versioning
All of that said, the question remains of how to manage change in your Web application’s interface. These changes can be divided into two rough categories; representation format changes and resource changes.
Representation format changes have been covered fairly well by others (e.g., Dave), and they’re both simple and maddeningly complex. In a nutshell, don’t make backwards-incompatible changes, and if you do, change the media type.
JSON makes this easier than XML, because it has both a simpler metamodel, as well as a default mustIgnore rule.
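Here is what that mustIgnore behaviour looks like from the client side, as a small Python sketch. The field names (`name`, `size`, `colour`) and the helper `read_thing` are invented for the example.

```python
import json


def read_thing(body):
    """Read only the members this client knows about; anything else is ignored."""
    document = json.loads(body)
    return {"name": document["name"], "size": document.get("size", 0)}


# The original format, and a later revision that adds a member.
old_format = read_thing('{"name": "foo", "size": 3}')
new_format = read_thing('{"name": "foo", "size": 3, "colour": "red"}')

# The old client sees no difference, so adding "colour" is backwards-compatible.
print(old_format == new_format)
```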
Resource changes are what I’m more interested in here. This is doing things like adding new methods, changing the URIs that clients use (including query parameters and their semantics), and so forth.
Again, many (if not most) changes to resources can be accommodated by turning them into backwards-compatible changes. For example, rather than bumping a version when you want to modify how a resource handles query parameters, you mint a new, sibling resource with a different name that takes the alternate query parameters.
However, there comes a time when you need to “wipe the slate clean.” Perhaps it’s because your API has become overburdened with such add-on resources, or you’ve got some new insights into your problem that benefit from a fresh sheet. Then, it’s time to introduce a new API version (which again, shouldn’t happen often). The question is, “how?”
In this Corner: URI Versioning
The most widely accepted way to version the resources of Web APIs currently is in the URI. A typical example might be:
http://api.example.com/v1/things/foo
Here, the first path segment is a major version identifier, and when it changes, everything under it does as well. Therefore, the client needs to decide what version of the API it wants to interact with; there isn’t any correlation between URIs between v1 and v2, for example.
So, even if you have:
http://api.example.com/v2/things/foo
There isn’t necessarily any correlation between the two URIs. This is important, because it gives you that clean slate; if there were correlation between v1 and v2 URIs, you’d be tying your hands in terms of what you could do in v2 (and beyond).
You can see evidence of this in lots of popular Web APIs out there; e.g., Twitter and Yahoo.
However, it’s not necessary to have that version number in there. Consider Facebook; their so-called old REST API has been deprecated in favour of their new Graph API. Neither has “v1” or “v2” in them; rather, they just use the hostname to name space the different interfaces (“api.facebook.com” vs. “graph.facebook.com”). Old clients are still supported, and new clients can get new functionality; they just called their new version something less boring than “v2”.
Fundamentally, this is how the Web works, and there’s nothing wrong with this approach, whether you use “v1” and “v2” or “foo” and “bar” — although I think there’s less confusion inherent in the latter approach.
The Contender: HATEOAS
However, there is one lingering concern that gets tied up into this; people assume — very reasonably — that when you document a set of URIs and ship them as a version of an interface, clients can count on those URIs being useful.
This violates a core REST principle called “Hypertext As The Engine of Application State”, or HATEOAS for short.
RESTafarians have long searched for signs of HATEOAS in Web APIs, and Roy has lamented its absence in the majority of them.
Tying your clients into a pre-set understanding of URIs tightly couples the client implementation to the server; in practice, this makes your interface fragile, because any change can inadvertently break things, and people tend to like to change URIs over time.
In a HATEOAS approach to an API, you’d define everything in terms of media types (what formats you accept and produce) and link relations (how the resources producing those representations are related).
This means that your first interaction with an interface might look like this:
GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json
HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close
{
"account": "http://accounts.example.com/{account_id}",
"server": "/servers/{server_id}",
"image": "https://images.example.com/{image_id}"
}
Please don’t read too much into this representation; it’s just a sketch. The important thing is that the client uses information from the server to dynamically generate URIs at runtime, rather than baking them into the implementations.
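On the client side, that could look something like the Python sketch below. It is only an illustration under several assumptions: the entry point document is hard-coded rather than fetched, only simple `{name}` placeholders are expanded (not a full URI Template implementation), and the helper `resolve` is made up.

```python
import json
import re
from urllib.parse import quote, urljoin

ENTRY_POINT = "http://api.example.com/"

# The body of the response sketched above; a real client would fetch it.
LINK_TEMPLATES = json.loads("""
{
  "account": "http://accounts.example.com/{account_id}",
  "server": "/servers/{server_id}",
  "image": "https://images.example.com/{image_id}"
}
""")


def resolve(relation, **variables):
    """Build a URI for a link relation from the server-supplied template."""
    template = LINK_TEMPLATES[relation]
    expanded = re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(variables[match.group(1)]), safe=""),
        template,
    )
    # Relative templates are resolved against the entry point's URI.
    return urljoin(ENTRY_POINT, expanded)


server_uri = resolve("server", server_id="42")
image_uri = resolve("image", image_id="a b")
print(server_uri)
print(image_uri)
```

The client code only mentions the relation names "server" and "image"; where those resources live is entirely up to the server.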
All of the semantics are baked into those link relations — they should probably be URIs if they’re not registered, by the way — and in the formats produced. URIs are effectively semantic-free.
This gives a LOT of flexibility in the implementation; the client can choose which resources to use based upon the link relations it understands, and changes are introduced by adding new link relations, rather than new URIs (although that’s likely to be a side effect). The URIs in use are completely under control of the server, and can be arranged at will.
In this manner, you don’t need a different URI for your interface, ever, because the entry point is effectively used for agent-driven content negotiation.
The downsides? This approach requires clients to make requests to discover URIs, and not to take shortcuts. It’s therefore chatty — a fairly damning condemnation.
However, notice the all-important Cache-Control header in that response; it may be chatty without caching, but if the client caches, it’s not that bad at all.
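A client that honours that max-age only needs to remember when its copy expires. The Python sketch below is a bare-bones illustration, not a conforming HTTP cache: `EntryPointCache` is a made-up class, and the fetch function and clock are faked so the example runs without a network.

```python
import time


class EntryPointCache:
    """Keep the entry point document until its max-age runs out."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch  # callable returning (document, max_age_seconds)
        self._clock = clock
        self._document = None
        self._expires_at = 0.0
        self.requests_made = 0

    def get(self):
        now = self._clock()
        if self._document is None or now >= self._expires_at:
            self._document, max_age = self._fetch()
            self._expires_at = now + max_age
            self.requests_made += 1
        return self._document


# Fake clock and fake server response with Cache-Control: max-age=3600.
fake_now = [0.0]


def fake_fetch():
    return {"server": "/servers/{server_id}"}, 3600


cache = EntryPointCache(fake_fetch, clock=lambda: fake_now[0])

for _ in range(100):
    cache.get()
requests_within_hour = cache.requests_made

fake_now[0] = 3601.0  # just over an hour later
cache.get()
requests_after_expiry = cache.requests_made

print(requests_within_hour, requests_after_expiry)
```

A hundred lookups within the hour cost one request to the server; the second request only happens once the cached copy has expired.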
The main issues with going HATEOAS for your API, then, are the requirements it places upon clients. If client-side HTTP tools were more widely capable, this wouldn’t be a big deal, but currently you can only assume a very low-level, bare HTTP API without caching, so it does place a lot of responsibility on your client developers’ shoulders. That is not a good thing, since there are usually many more of them than there are server-side.
So, there are arguments for and against HATEOAS, and one could say the trade-offs are somewhat balanced; both are at least reasoned positions. However, there’s one more thing…
Enter Extensibility
Extensibility and Versioning are the peanut butter and jelly of protocol engineering. Sure, my kids’ cohort in Australian primary schools are horrified by this combination, but stay with me.
OpenStack has an especially nasty extensibility problem; they allow vendors to add pretty much arbitrary things to the protocol, from new resources to new representations, as well as extensions inside their existing formats.
Allowing such freedom with “baked-in” URIs is hard. You have to carve out extension prefixes to avoid collisions, and then hope that that’s good enough. For example, what if an API uses URIs like this:
http://api.example.com/users/{userid}
and HP wants to add a new subresource to the users collection? Does it become
http://api.example.com/users/hp
? No, that’s bad, because then no userid can be “hp”, and special cases are evil, especially when they’re under the control of others.
You could do:
http://api.example.com/users/ext/hp
and special-case only one thing, “ext”, but that’s pretty nasty too, especially when you can still potentially add “hp” to any point in the URI tree.
Instead, if you take a HATEOAS approach, you push extensibility into link relations, so that you have something like:
GET / HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.link_templates+json
HTTP/1.1 200 OK
Content-Type: application/vnd.example.link_templates+json
Cache-Control: max-age=3600
Connection: close
{
"users": "http://api.example.com/users/{userid}",
"hp-user-stuff": "http://api.example.com/users/{userid}/stuff"
}
Now, the implementation has full control over the URIs used for extensions, and it’s responsible for avoiding collisions. All that HP (or anyone else wanting an extension) has to do is mint a new link relation type, and describe what it points to (using existing or new media types).
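From the client's point of view, an extension is just one more link relation that it either understands or skips. A short Python sketch, with the made-up helper `usable_links` and the entry point body hard-coded:

```python
# The body of the response sketched above.
ENTRY = {
    "users": "http://api.example.com/users/{userid}",
    "hp-user-stuff": "http://api.example.com/users/{userid}/stuff",
}


def usable_links(link_templates, understood_relations):
    """Keep the links whose relation type this client knows how to use."""
    return {
        relation: template
        for relation, template in link_templates.items()
        if relation in understood_relations
    }


# A client written before the extension existed keeps working unchanged.
plain_client = usable_links(ENTRY, {"users"})

# A client that knows the extension finds it without guessing at URI layout.
hp_client = usable_links(ENTRY, {"users", "hp-user-stuff"})

print(sorted(plain_client), sorted(hp_client))
```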
This isn’t the whole extensibility story, of course; format extensions are independent of URIs, for example. However, the freedom of extensibility that taking a HATEOAS approach gives you is too good to pass up, in my estimation.
The key insight here, I think, is that URIs are used for so many things (persistent identifiers, cache keys, bases for relative resolution, bookmarks) that overloading them with versioning and extensibility information as well makes them worse for all of their various purposes. By pushing these concerns into link relations and media types using HATEOAS, you end up with a flexible, future-proof system that can evolve in a controllable way, without giving up the benefits of using HTTP (never mind REST).















