Unlike most DNS servers (such as BIND and NSD), which “compile” (i.e. check) the data they will be serving before successfully loading a zone, PowerDNS has to make do with what it finds in one of its sundry back-end databases. And let me tell you that it sometimes has to cope with very weird data. Here are some examples I’ve been finding recently:

  • Zones with missing SOA records
  • SOA records with impossible data in them (e.g. an mname with a colon in it), some of the SOA timers with a 0 in them, etc.
  • Zones without NS records
  • CNAME records with other data, which is forbidden
  • Domain names with non-ASCII characters in them
  • Domain names with white space in the names
  • Impossible IP addresses for A (e.g. 10.1.2) and AAAA (e.g. 10.1.2.1) records
  • Illegal RR types in the records table
  • Differing case of domain names in the records table associated with a single zone
  • Unqualified domain names in records table
  • Qualified mname and rname fields in SOA records

These are just some of the highlights.
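Several of the conditions in that list are mechanical enough to test for in a few lines. What follows is only a rough sketch, not what was run at the client: it assumes the rows of one zone are available as (name, type, content) tuples, and the function name `lint_zone` is made up for illustration.

```python
from collections import defaultdict


def lint_zone(zone, rows):
    """Return a sorted list of problems found in one zone.

    rows: iterable of (name, type, content) tuples, a simplified and
    hypothetical view of the PowerDNS records table.
    """
    problems = set()
    types_by_name = defaultdict(set)   # lower-cased owner name -> RR types seen
    spellings = defaultdict(set)       # lower-cased owner name -> spellings seen

    for name, rrtype, _content in rows:
        key = name.lower()
        types_by_name[key].add(rrtype.upper())
        spellings[key].add(name)
        if any(ch.isspace() for ch in name):
            problems.add(f"white space in name: {name!r}")
        if not name.isascii():
            problems.add(f"non-ASCII character in name: {name!r}")

    # SOA and NS are expected at the zone apex.
    apex_types = types_by_name.get(zone.lower(), set())
    if "SOA" not in apex_types:
        problems.add(f"zone {zone} has no SOA record")
    if "NS" not in apex_types:
        problems.add(f"zone {zone} has no NS records")

    for key, types in types_by_name.items():
        # Simplification: DNSSEC record types that may legally sit next
        # to a CNAME are not taken into account here.
        if "CNAME" in types and len(types) > 1:
            problems.add(f"CNAME and other data at {key}")
        if len(spellings[key]) > 1:
            problems.add(f"differing case for {key}")

    return sorted(problems)
```

Fed the rows of a single zone, it returns human-readable complaints which a clean-up job could log or act upon.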

PowerDNS is lenient when you query it and it finds bad data: it tries to fix some of the stuff on the fly. On the one hand that is good, because the program helps you around your bad data; on the other hand it is really bad when you want that data served by a different brand of DNS server.

Case in point is the following scenario, where PowerDNS is configured as a hidden master server to an NSD, Yadifa, Knot, or BIND server:

PowerDNS as a hidden master

The slave servers on the right of the diagram will obtain their data using zone transfers (AXFR) from PowerDNS, and PowerDNS will gladly give them what it has. However, as BIND, say, checks the validity of incoming data, it may refuse to load the zone.

I was recently at a client who had a lot of incorrect data in their database. Let me repeat that: a lot. Several tens of thousands of zones that BIND would have either refused to load or which would have caused it to go a bit bonkers. (Look at the list above, and you may find some reasons why that would happen.) I’ll show you a simple example. Consider the following row in the records table:

mysql> SELECT name, type, content, ttl FROM records WHERE name like 'jp1.%';
+-------------+------+---------+------+
| name        | type | content | ttl  |
+-------------+------+---------+------+
| jp1.exam.aa | A    | 1.2.3   |   10 |
+-------------+------+---------+------+

Suppose you query PowerDNS for that A resource record. What would you expect to see? The answer is

;; ANSWER SECTION:
jp1.exam.aa.		10	IN	A	0.1.2.3

which is not what I would have expected. Be that as it may, it’s rubbish: nobody noticed when it was inserted years ago, and it hasn’t hurt anybody (because nobody queried that particular record, or nobody cared about the result), but it’s rubbish nonetheless. This single entry isn’t particularly painful, but imagine an SOA refresh timer set to a few seconds, on thousands of zones … I’ll let your imagination wander. :-)

We urgently needed to clean up the data in the back-end database, which we did. We concocted all sorts of programs which ran through and repaired stuff. End of story. Everybody is happy.

Almost everybody; I am not.

I’m convinced that this a) is going to happen to other people, and b) can be handled. But how?

Let me re-iterate: this is absolutely no fault of PowerDNS; it’s the fault of the “provisioning” systems which allow incorrect data into whatever PowerDNS uses as a back-end database. Some people have provisioning systems that perform very careful checks, but others don’t: they might use a command-line interface (mysql) to add data, or maybe they use a flashy Web interface that doesn’t do enough checking before issuing the final SQL INSERT INTO table ... or UPDATE table SET ....

Be that as it may, bad data (from a DNS perspective) can all too easily make it into a bunch of database tables.

A longish example

Let me show you a few records for a (not so fictitious) zone called a1.aa, taken directly from a database which is being served by PowerDNS. I’ve recreated the records to protect the innocent.

SELECT id, name, type, content FROM records WHERE domain_id =
               (SELECT id FROM domains WHERE name = 'a1.aa');
+----+------------------+------+------------------------------------------+
| id | name             | type | content                                  |
+----+------------------+------+------------------------------------------+
| 36 | a1.aa            | SOA  | a1.NEt nOC.example.de 1 60 90 86400 3600 |
| 37 | a1.aa            | NS   | 10.0.12.1                                |
| 38 | a1.aa            | NS   | dns3..a1.aa                              |
| 39 | a1.aa            | NS   | ns.a1.aa                                 |
| 40 | a1.aa            | NS   | dns4.a1.aa.                              |
| 41 | a1.aa            | A    | 10.1.1                                   |
| 42 | a1.aa            | A    |  192.168.1.106                           |
| 43 | local host.a1.aa | A    |  127.0.0.1                               |
| 44 | l6.a1.aa         | AAAA | 192.168.69.14                            |
| 45 | l6.a1.aa         | AAAA | 2001::de8:8412:1a90::12                  |
| 46 | l6.a1.aa         | AAAA | fe80::a00:27ff:fe7a:63db                 |
| 47 | info..a1.aa      | TXT  | "hello world"                            |
| 48 | info.a1.aa.      | TXT  | another bit of prose                     |
+----+------------------+------+------------------------------------------+
13 rows in set (0.00 sec)

How many pitfalls do you see there? Two? Five? More? There are quite a few, even if some are hard to spot.

As a first step, let us ask PowerDNS some questions. Remember please, I know things are going to break. Actually it’s rather cool how PowerDNS handles this stuff to avoid breakage…

$ dig @localhost a1.aa
;; ANSWER SECTION:
a1.aa.                  600     IN      A       0.10.1.1
a1.aa.                  600     IN      A       192.168.1.106

Hmm. We’ve discussed the first issue already, but look at the second record in the reply: it’s good, in spite of the white space in the content column!

How about an ANY query?

$ dig @localhost a1.aa any

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28149
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

Oops. The PowerDNS console (or log) shows the following correct and fair answer:

Jan 31 17:41:48 Exception building answer packet (DNSPacketWriter::xfrLabel() \
                found empty label in the middle of name) sending out servfail

Let’s see what the (relatively new) pdnssec utility says about the underlying data:

$ pdnssec check-zone a1.aa
[Error] Following record had a problem: a1.aa IN NS dns3..a1.aa
[Error] Error was: DNSPacketWriter::xfrLabel() found empty label in the middle of name
[Warning] The record a1.aa with type NS has a trailing dot in the content (dns4.a1.aa.). Your backend might not work well with this.
[Error] Following record had a problem: l6.a1.aa IN AAAA 192.168.69.14
[Error] Error was: Asked to encode '192.168.69.14' as an IPv6 address, but does not parse
[Error] Following record had a problem: l6.a1.aa IN AAAA 2001::de8:8412:1a90::12
[Error] Error was: Asked to encode '2001::de8:8412:1a90::12' as an IPv6 address, but does not parse
[Error] Following record had a problem: info..a1.aa IN TXT "hello world"
[Error] Error was: DNSPacketWriter::xfrLabel() found empty label in the middle of name
[Warning] The record info.a1.aa. with type TXT in zone a1.aa is out-of-zone.
Checked 12 records of 'a1.aa', 4 errors, 2 warnings.

All in all, what “looked” somewhat OK to us in the database is going to cause a number of problems. Thankfully, PowerDNS has been tightening up on the somewhat lax rules it used to apply (or rather: not apply) to its back-end data. This is necessary in the transition to DNSSEC.

pdnssec’s check-zone doesn’t find all errors – it wasn’t designed to do that. For example, CNAME and other records weren’t caught if their owner names differed in case. That has meanwhile been fixed.

Be that as it may, is there a solution for this?

Is there a cure?

Warning (insert guy shoveling pic)

I strongly believe the only solution is to enforce rules which guarantee that the data in the database tables used by PowerDNS is correct the moment it enters said database. I posted an idea to the PowerDNS mailing list and feedback has been “mixed”. I won’t say we’ve been having fist fights, but only because the good #powerdns people on IRC and I were geographically distant from one another. ;-)

There are basically two schools of thought:

  • Those that say: “make sure your data is good before it enters the database”
  • Those that say: “make sure your data can be stored in the database only if it’s good”. That’s me. ;-)

To cut a very long story short, I’ve been prototyping a couple of MySQL User Defined Functions which, together with a few triggers, should ensure that illegal data cannot be inserted into the database tables.

It’s early days, but the prototype looks pretty good, if I may say so myself.

Broadly speaking, there are two UDFs: one is called checkname, and the other is called checkrr. The former applies regular expressions (and spare me please; I know most of the jokes) to owner names, and the latter applies similar rules to the rdata (i.e. the content column). These rules are taken from a very lightweight and fast TinyCDB database on the fly, which is compiled from input such as this:

NS:name         ^([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
NS:content      ^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$

AAAA:name       ^(\*\.)?([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
AAAA:content    @IP

SOA:name        ^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$
SOA:content     @SOA

SOA:mname       ^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$
SOA:rname       ^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$
SOA:serial      ^\d+$
SOA:refresh     ^\d+$
SOA:retry       ^\d+$
SOA:expiry      ^\d+$
SOA:minimum     ^\d+$

SOA:refresh:min         600
SOA:refresh:max         7201
...

Each record type (NS, AAAA, etc.) can have any number of regular expressions applied to its name or content; if none of the rules match, the check fails.

For example, for the database record

+----+------------------+------+------------------------------------------+
| id | name             | type | content                                  |
+----+------------------+------+------------------------------------------+
| 37 | a1.aa            | NS   | 10.0.12.1                                |

the rulesets for NS:name are applied to the name column, and the rulesets for NS:content are applied to the content column. You can easily tell from the NS:content regular expression that the content check will fail for this value of content, because it’s an IP address and not a domain name.
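To make the rule lookup concrete, here is a minimal Python sketch of the same idea. It is not the UDF code (that is C, reading from TinyCDB); the function names `load_rules` and `check` are invented for this illustration, and treating a record type that has no rules at all as a failure is my assumption.

```python
import re
from collections import defaultdict

# A few rules copied verbatim from the rule file shown above.
RULES_TEXT = r"""
NS:name         ^([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
NS:content      ^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$
AAAA:name       ^(\*\.)?([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
"""


def load_rules(text):
    """Parse 'KEY <white space> REGEX' lines into {key: [compiled regex, ...]}."""
    rules = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, pattern = line.split(None, 1)
        rules[key].append(re.compile(pattern))
    return rules


def check(rules, rrtype, field, value):
    """True if at least one rule for, say, 'NS:content' matches the value.

    fullmatch() is used so that a trailing newline cannot slip past '$'.
    Assumption: a type/field pair without any rules fails the check.
    """
    patterns = rules.get(f"{rrtype}:{field}", [])
    return any(p.fullmatch(value) for p in patterns)
```

With these rules, `check(rules, "NS", "content", "10.0.12.1")` is false because the first label does not start with a letter, whereas `check(rules, "NS", "content", "ns.a1.aa")` is true.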

The special content rulesets @IP and @SOA perform inet_pton() and SOA checks respectively. The latter splits the SOA record into tokens and applies the SOA:mname, SOA:rname, … rules to the individual portions of the SOA record. As a special case, we can define, say, minimum and maximum values for the numeric portions of the SOA record. The example above specifies that the refresh timer must be at least 600 seconds and at most 7201 seconds. (Yes, I’ve also implemented an SOA:xxxx:equals rule.)
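The @SOA logic can be sketched in Python along the following lines. This is an illustration under assumptions, not the actual UDF: the name `check_soa` is made up, the content is assumed to carry all seven fields, only the refresh bounds from the rule file are applied, and `[0-9]+` stands in for `^\d+$` so that only ASCII digits are accepted.

```python
import re

HOSTNAME = re.compile(r"^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$")
FIELDS = ("mname", "rname", "serial", "refresh", "retry", "expiry", "minimum")
# Only the refresh bounds appear in the rule file above; the other
# timers are left unbounded in this sketch.
BOUNDS = {"refresh": (600, 7201)}


def check_soa(content):
    """Return (ok, message) for an SOA content string."""
    tokens = content.split()
    if len(tokens) != len(FIELDS):
        return False, f"SOA expects {len(FIELDS)} fields, got {len(tokens)}"
    soa = dict(zip(FIELDS, tokens))

    # mname and rname must look like host names.
    for field in ("mname", "rname"):
        if not HOSTNAME.fullmatch(soa[field]):
            return False, f"SOA {field} [{soa[field]}] fails regexp"

    # The remaining five fields must be numeric and within bounds, if any.
    for field in FIELDS[2:]:
        if not re.fullmatch(r"[0-9]+", soa[field]):
            return False, f"SOA {field} [{soa[field]}] is not numeric"
        value = int(soa[field])
        low, high = BOUNDS.get(field, (None, None))
        if low is not None and value < low:
            return False, f"SOA {field} doesn't satisfy min {low}: {value}"
        if high is not None and value > high:
            return False, f"SOA {field} doesn't satisfy max {high}: {value}"

    return True, "Y"
```

An SOA with a refresh of 60 seconds is rejected by the minimum bound, while the mixed-case mname and rname are accepted because the SOA patterns allow both cases.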

Let me show you what I have already. I’ll apply the UDFs to the records table from above.

SELECT id, type, name, SUBSTR(checkname(type, name), 1, 1) 
        FROM records 
        WHERE domain_id = (SELECT id FROM domains WHERE name = 'a1.aa');
+----+------+------------------+-------------------------------------+
| id | type | name             | SUBSTR(checkname(type, name), 1, 1) |
+----+------+------------------+-------------------------------------+
| 36 | SOA  | a1.aa            | Y                                   |
| 37 | NS   | a1.aa            | Y                                   |
| 38 | NS   | a1.aa            | Y                                   |
| 39 | NS   | a1.aa            | Y                                   |
| 40 | NS   | a1.aa            | Y                                   |
| 41 | A    | a1.aa            | Y                                   |
| 42 | A    | a1.aa            | Y                                   |
| 43 | A    | local host.a1.aa | N              space in name        |
| 44 | AAAA | l6.a1.aa         | Y                                   |
| 45 | AAAA | l6.a1.aa         | Y                                   |
| 46 | AAAA | l6.a1.aa         | Y                                   |
| 47 | TXT  | info..a1.aa      | N              empty label in name  |
| 48 | TXT  | info.a1.aa.      | N              trailing dot in name |
+----+------+------------------+-------------------------------------+
13 rows in set (0.00 sec)

The first character from the response of the checkname UDF contains a Y or N depending on whether the name is correct or not. (The rest of the string currently has debugging information in it, which is why I’m omitting that here; I’ve replaced that with manually added comments.)

Let’s now check the content column for the same records; this time I’ll leave the debug info in the column to save me typing it out:

SELECT id, type, content, checkrr(type, content)
      FROM records
      WHERE domain_id = (SELECT id FROM domains WHERE name = 'a1.aa');
+----+------+------------------------------------------+--------------------------------------------------------------------------------------+
| id | type | content                                  | checkrr(type, content)                                                               |
+----+------+------------------------------------------+--------------------------------------------------------------------------------------+
| 36 | SOA  | a1.NEt nOC.example.de 1 60 90 86400 3600 | N:SOA `refresh' doesn't satisfy min 600: 60                                          |
| 37 | NS   | 10.0.12.1                                | N:NS [10.0.12.1] fails regexp [^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$]   |
| 38 | NS   | dns3..a1.aa                              | N:NS [dns3..a1.aa] fails regexp [^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$] |
| 39 | NS   | ns.a1.aa                                 | Y                                                                                    |
| 40 | NS   | dns4.a1.aa.                              | N:NS [dns4.a1.aa.] fails regexp [^([a-zA-Z][a-zA-Z0-9\-]+(\.|\-*\.))+[a-zA-Z]{2,6}$] |
| 41 | A    | 10.1.1                                   | N:[10.1.1] is not a valid IP address for A                                           |
| 42 | A    |  192.168.1.106                           | N:[ 192.168.1.106] is not a valid IP address for A                                   |
| 43 | A    |  127.0.0.1                               | N:[ 127.0.0.1] is not a valid IP address for A                                       |
| 44 | AAAA | 192.168.69.14                            | N:[192.168.69.14] is not a valid IP address for AAAA                                 |
| 45 | AAAA | 2001::de8:8412:1a90::12                  | N:[2001::de8:8412:1a90::12] is not a valid IP address for AAAA                       |
| 46 | AAAA | fe80::a00:27ff:fe7a:63db                 | Y                                                                                    |
| 47 | TXT  | "hello world"                            | Y                                                                                    |
| 48 | TXT  | another bit of prose                     | N:TXT [another bit of prose] fails regexp [^".*"$]                                   |
+----+------+------------------------------------------+--------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

Pay particular attention to row 36, where the check has failed because the refresh timer isn’t within specified bounds. Row 44 is also bad: there’s an IPv4 address in an AAAA record. And so on, and so on.
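The address verdicts in the table can be approximated with any strict parser. The UDF uses inet_pton(); the sketch below swaps in Python's `ipaddress` module instead, which is similarly unforgiving about missing octets and stray white space. The function name `check_ip` is hypothetical.

```python
import ipaddress


def check_ip(rrtype, content):
    """Return (ok, message): A needs a strict IPv4 address, AAAA an IPv6 one."""
    parser = {"A": ipaddress.IPv4Address, "AAAA": ipaddress.IPv6Address}[rrtype]
    try:
        parser(content)
    except ValueError:
        # ipaddress.AddressValueError is a subclass of ValueError.
        return False, f"[{content}] is not a valid IP address for {rrtype}"
    return True, "Y"
```

Note that nothing is trimmed or padded on the way in: a leading space or a three-octet value is refused rather than quietly repaired.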

Adding these UDFs to a MySQL trigger then results in the following when I try to insert such a record:

mysql> INSERT INTO records (domain_id, name, type, content) VALUES (6, 'a..a1.aa', 'A', '1.1.1.1');
ERROR 1123 (HY000): Can't initialize function 'raise_error'; Invalid record NAME: A [a..a1.aa] fails regexp [^(\*\.)?([a-z][a-z0-9\-]+(\.|\-*

mysql> INSERT INTO records (domain_id, name, type, content) VALUES (6, 'a1.a1.aa', 'A', '2.2.3');
ERROR 1123 (HY000): Can't initialize function 'raise_error'; Invalid record CONTENT: [2.2.3] is not a valid IP address for A
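If you want to play with the trigger idea without building a MySQL UDF, the mechanism can be imitated with SQLite from Python. To be clear, this is an analogue and not the MySQL set-up described here: the `checkname` function below is a Python stand-in registered with SQLite, the trigger name is invented, and reusing the AAAA:name pattern for every record type is an assumption (the A pattern is truncated in the error message above).

```python
import re
import sqlite3

# Assumption: the AAAA:name pattern from the rule file, applied to all types.
NAME_RE = re.compile(r"^(\*\.)?([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$")


def checkname(rrtype, name):
    """Stand-in for the checkname UDF: 1 if the owner name is acceptable, else 0."""
    return 1 if NAME_RE.fullmatch(name or "") else 0


def open_db():
    """Create an in-memory database with a records table guarded by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("checkname", 2, checkname)
    conn.executescript("""
        CREATE TABLE records (
            id INTEGER PRIMARY KEY, domain_id INTEGER,
            name TEXT, type TEXT, content TEXT
        );
        CREATE TRIGGER records_check_name BEFORE INSERT ON records
        BEGIN
            SELECT RAISE(ABORT, 'Invalid record NAME')
            WHERE checkname(NEW.type, NEW.name) = 0;
        END;
    """)
    return conn


def try_insert(conn, name, rrtype, content):
    """Return True if the row was stored, False if the trigger rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO records (domain_id, name, type, content) "
                "VALUES (6, ?, ?, ?)",
                (name, rrtype, content),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

The point of the design is the same as with the MySQL triggers: the rejection happens inside the database, so every client, whether a Web interface or somebody typing SQL by hand, gets the same verdict.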

Interfaces

One of the very valid arguments on IRC was that all this would be useless if existing Web interfaces to PowerDNS (of which there are far too many) didn’t benefit. Now, I know that many people use them, while others have rolled their own interfaces or provision differently.

So the question is: how does, say, PowerAdmin, one of the more popular interfaces, react when the underlying database fails on an INSERT? The answer: it fails nicely. :-)

PowerAdmin interface with back-end error

I’m going to leave this here for a bit, and let it all sink in. I’m not yet convinced this is a good idea, mainly due to the heavy use of, yeah, regular expressions. But who knows: with a bit of work, maybe this could turn into something useful. On the other hand, the use of MySQL UDFs deprives PostgreSQL users of the benefits of however much work we put into this.

If you feel this is a Good Thing, tell me. If, on the other hand, you feel this is a Stupid Idea, then by all means, tell me. I’m awaiting your feedback. :-)

Update: I’ve put what I have in the way of code up for grabs. Fix it, make it good, and send me lots of pull requests. :)

PowerDNS, Database, and MySQL :: 31 Jan 2013