How I Did It The Oracle backend to LDAP Mark Adamson This is some of the notes on how I got the LDAP data into Oracle. Topics are in no particular order. Oracle ======= You need an Oracle database server, and mine was supplied by Dan Rossi on development machines. He set up a table space and a schema (login ID) for me first on skydiver.andrew and then on ldap1.andrew, and finally on ora1.andrew. All of my tables on skydiver are named with an "adamson" prefix, and "ldap" on the other two, since those are the schemas I was using. back-sql ========= The SQL backend to OpenLDAP needs to be compiled into the slapd server. I first compiled it in the openldap/015 collection, and then later downloaded new source and compiled it in the 019 area. The backend is enabled with the "-enable-sql" option to the configure script; this is done in the SMakefile. The backend needs the ODBC drivers; see below. The libtool will try to include the static version of the SASL2 library, libsasl2.a. This causes untold reams of grief with missing symbols, and libtool cannot get it into its head to pick the dynamic library. Therefore, make sure to delete the libsasl2.a and libsasl2.la before compiling. slapd.conf =========== The slapd config file has just a few additions that separate it from other backends. Indexes, SASL config, include files, etc all work the same. The special directives I used were: database sql suffix "dc=cmu,dc=edu" rootdn "cn=Manager,ou=AuthEntity,dc=cmu,dc=edu" rootpw dbname DevServer dbuser adamson dbpasswd upper_func UPPER children_cond "ldap_entries.dn=UPPER(?)" subtree_cond "ldap_entries.dn like UPPER(CONCAT('%',?))" insentry_query "INSERT INTO ldap_entries (id,dn,oc_map_id,parent,keyval) VALUES (ldap_entry_ids.nextval,UPPER(?),?,?,?)" use_subtree_shortcut yes oc_query "SELECT id,name,keytbl,keycol,create_proc,delete_proc,expect_return,create_hint FROM ldap_oc_mappings" id_query "SELECT id,keyval,oc_map_id FROM ldap_entries WHERE dn=UPPER(?)" Note the first 4 lines are the same for other DB's as well. The rootdn identity does not have to exist in the database. The next group of 3 tell how to authenticate to Oracle. The dbname matches the header name in square brackets in the odbc.ini file (below). The user/passwd directives then give the authentication pair to send across the wire to log in. The last directives are described in sections below. ODBC ========= The backend needs an ODBC driver to connect to an RDBM. Many are available for free, I picked the one from OpenlinkSW (www.openlinksw.com). The libiodbc.so or libodbc.so file needs to be accessible during the build and runtime, so I put it in /usr/local/lib. The ODBC driver is the glue part, but it will also need the RDBM-specific driver for your database, i.e. Oracle 8.1.x. These are not free, we bought one from the same company for around $400. It comes as a shared library named ora81_st_lt.so and a license file. There are several config files that need to be set up. First there is a file $HOME/.odbc.ini which tells where's the Oracle driver file. Mine looks like this: --BEGIN--------------------- ; ; odbc.ini ; [ODBC Data Sources] [LDAP] ServerType = Oracle 9.x Driver = /usr/openldap/openlink_ora.so Options = LDAP JetFix = No Description = The Main Database server ReadOnly = No FetchBufferSize = 60 NoLoginBox = Yes [DEVDB] ServerType = Oracle 9.x.dev Driver = /usr/openldap/openlink_ora.so Options = DEVDB JetFix = No Description = The Main Database server ReadOnly = No FetchBufferSize = 60 NoLoginBox = Yes --END----------------------- The "Driver" line tells where the shared library for Oracle is located. Next you have to set up a file $HOME/.tnsnames.ora which maps database names to host/port TCP/IP sockets. For the databases I used, I had this file: --BEGIN--------------------- LDAP = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = ora6.andrew.cmu.edu)(PORT = 1525)) ) (CONNECT_DATA = (SERVICE_NAME = ldap) ) ) DEVDB = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = ora-dev2.andrew.cmu.edu)(PORT = 1525)) ) (CONNECT_DATA = (SERVICE_NAME = DEVDB) ) ) --END----------------------- The OpenLinkSW people also give you a file for configuring the ODBC library. You place this in the "current" directory where OpenLDAP will run, in my case I put it in /usr/openldap/, under the filename openlink.ini. The file provided didn't need any modifications. Next to it I put the license file, ora8_lt.lic or ora9_lt.lic. OpenLink contacts me annually for renewal of this software. DATA DESIGN ============= There are three main types of objects in our LDAP directory: infrastructure, people, and accounts. Others will come up later, like groups, but they'll follow the same basic format as people or accounts. You will still need to create an LDAP schema file like those include'd in the slapd.conf file (e.g. core.schema) that tells slapd what attributes MAY and MUST be in an objectClass. The RDBM part of the data design is figuring out how to store the LDAP data for each objectClass in Oracle. The general idea is to have a table that has a column for each attribute that can appear in the objectClass. It doesn't matter to Oracle if the attr is a MAY or MUST, that's all handled in the slapd front end. Then each LDAP object of that type will have a row in the table. Since each cell in a table can hold only one value, you will be stuck when trying to work with LDAP multivalued attributes, since the second value inserted would overwrite and replace the first value. For LDAP multivalued attributes you need secondary tables. A secondary table has only two columns: a value and a key telling which object the value belongs to. So for a person who has two common names "cn"'s, there will be one row in the main person table for the entry, and then two rows in the secondary cn table. The two rows will each have one of the cn values, but both will have the same owner key indicated in the second column. This provides the one-to-many mapping from object to multivalued attribute. The main table for an objectClass must have a column that stores the ID number for each entry (row). I've always called that column "id", and it stores an SQL "number". I was tempted to use the GUID as the ID column for people, but found the ID column must be a number. That column can then be used for setting the owner of multivalued attributes; you put the ID number for a person into the second column of each of their cn's, for example. There will be multiple types of accounts in LDAP, from cmu.edu accounts, to system accounts, dept accounts, clubs, deleted accounts, etc. This would normally require a bevy of tables, one for each objectClass, but they all draw from the same pool of attributes, so I put them all into a single table which has a column for all possible (singlevalued) attributes. There are only a few multivalued attributes for accounts: cn, owner, mailLocalAddress (mLA), and mailForwardingAddres (mFA), so those secondary tables were also created. There is only one type of person, so far, thus there is a single table for the Person objectClass, plus a host of about 18 secondary tables for the multivalued attributes allowed in Person objects. The third type of objects, infrastructural objects, got put into their own tables, called Organization and OrganizationalUnit, and there's not much to them. Note that Oracle can put lots of constraints on the table data, so no one can ever put in bad data. For example, many ID numbers must be one to a person with no duplicates, so Oracle "unique" constraints are on cmuUnivID, cmuHRISid, and cmuSISid, etc The account names cannot be duplicated, so the pair of columns "uid" and "ou" are together given an Oracle unique constraint. Several columns have a finite set of allowed values, so Oracle "check" constraints are applied. Boolean values "TRUE" and "FALSE" are applied to many, while cmuStudentClass must be one of "Freshman", "Sophmore", etc. At the end of 2007 and into 2008, Mark Adamson and Brian Byrd worked on upgrading how the person records were stored. Information from many sources had come in over the years, such as from Tepper, Alumni, Sponsored people, etc, and this was generating a lot of duplicate looking attributes, like cmuHRIScat cmuNetGuestCat, cmuAlumCat. It seemed that despite the many sources of data, much of the data was the same type of information: address, phone number, affiliation, businessCategory, etc. Brian said a single table should hold these repeating attributes, with a row for each person for each source. Thus one person could have a row for SIS info, one for HRIS info, etc. The table will have one column (person_id) linking back to the cmuPerson.ID to say whose row it is, and another column (sourceName) to tell the name of the source from which the data comes (e.g. 'HRIS'). The other columns in the table include affiliation, sourceID, postalAddress, homePostalAddress, cmuUnivID, etc. The name of the table is personContactInfo. meta data ========== There's a lot of glue to map an LDAP object to the data held in the RDBM. The author based the idea mainly on objectClasses. There is one table that tells what objectClasses are available, a second table that tells what attributes are possible in each objectClass, and then there is a third table that gives the DN of each object. The first table is called ldap_oc_mappings, and it has one row for each known objectClass. It has four interesting columns: an ID number for each OC, the name of the OC, the "keytable" column names the main table for the OC, and "keycolumn" tells the name of which column in that main table contains the primary ID number of objects of the OC. This table gets populated once, when the LDAP db is created, and is very rarely ever touched again. The second table, which tells what attrs are in each OC, is called ldap_attr_mappings, and it lists one attribute per row. All attrs for all OC's are in this one table. It has about 7 interesting columns: the name of the attribute, which OC it is for, 3 columns for how to extract attribute values from the main and secondary OC tables, a column telling how to add new values, another telling how to delete values, and lastly a column telling if the add/delete functions have return codes. Since multiple OC's can include the same attribute, e.g. "cn" is in both cmuPerson and cmuAccount, ldap_attr_mappings can have multiple rows with the same "name" value. But the oc_map_id column tells which OC the row is for, so the name + oc_map_id will be unique for the ldap_attr_mappings table. It takes 3 bits of info to take attribute values out of the RDBM, and those bits are stored in the 3 columns sel_expr,from_tbls, and join_where. They make up the parts of an SQL select statement. For a singlevalued attribute, the from_tbls info tells which table holds the attribute. Normally, you hold the attribute value in the main table, so for all singlevalued attrs in the cmuPerson OC, the from_tbls column says "cmuperson", which is the name of the table that holds cmuPerson entries. The sel_expr tells what column in that table holds the attribute. The third bit, join_where, is normally unused for singlevalued attrs. For multivalued attrs, the from_tbls tells the names of the two tables needed to fetch the attribute values, and it usually consists of the main table and the secondary table, such as "cmuPerson,person_cn". The sel_expr tells the column name in the secondary table that holds the value(s). The join_where column now tells how to perform the SQL join to match rows in the secondary table with the entry they are for in the main table. So for a multivalued "cn" attribute, whose secondary table holds the two columns "cn" and "personID", with the "personID" value matching the value in the "id" column in the main cmuPerson table, the join_where for "cn" is "personID=id". Lots of other glue is added to these 3 bits of info to make the actual selection of which person is being queried, and all this is done by the back-sql backend. Each attribute is added or deleted to an entry by making a call to the PL/SQL code named in the add_proc or delete_proc column of ldap_attr_mappings. When an attribute is just added to a table, with no additional checks, the add_proc value can simply be an "update" or "insert" call. If additional checks are made or the data is processed further, a PL/SQL function should be written, and a call to it placed int the add_proc value. The functions are named whatever you'd like, and I stuck with names like add_person_cn() and del_account_billable(). The function can then do things like convert the case of the value being added, or check that it's within a numerical range, or convert an LDAP date to an Oracle number. For del_proc code, the SQL code sets the attribute to "null" for singlevalued attributes, or deletes the row from the secondary table for multivalued attributes, e.g. "delete from person_cn where personID=idvalue and cn=oldvalue;". The functions normally take 2 parameters: the ID number of the entry (e.g., the number in the "id" column of a Person entry), and the new value being added (or deleted for delete_proc). Some add/delete functions like to perform some logic to determine if the operation should succeed. For example, I made it illegal to delete the "username" (aka "uid") from an account object. For slapd to know these functions transfer the succeed/fail information back to it, these attributes set flags in the expect_return column in ldap_attr_mappings. add functions set bit=1 and delete functions set bit=2 in the flag in expect_return, (so for example if your add function doesn't return a value but your delete function does, set expect_return to 2). The matching add_proc and delete_proc functions that are to return a value will take an extra parameter at the beginning of the list. This is an OUT parameter, which returns 0 on success (allowed) and an LDAP error code (e.g.LDAP_CONSTRAINT_VIOLATION) on failure (denied). The third meta glue table is ldap_entries. It has five interesting columns for keeping track of the DIT and all entries' DN's. The first column, "id", is just an ID number for that row, and becomes relevant later. There is an "oc_map_id" column telling the OC of the object this row represents, and a "keyval" column telling which entry of that objectClass this row is for. So if you look in ldap_oc_mappings and see cmuPerson has an ID=2, then all cmuPerson objects will have oc_map_id=2 in ldap_entries; if you want the DN of the entry in the cmuperson main table where the column id=1256, look for the row in the ldap_entries table where oc_map_id=2 and keyval=1256. The "DN" column very simply gives the complete DN of the entry, and the "parent" column tells the ID number of the parent DN of this entry. It is the "parent" column that must match a value in that "id" column mentioned first. The root DN (dc=cmu,dc=edu) has parent=0, and usually id=1; all entries on the first level, e.g. ou=Person,dc=cmu,dc=edu, have id=2,3,4,5... and parent=1. Again, given a row in ldap_entries, you can look at oc_map_id to find out the OC of the object, and then look in ldap_oc_mappings for the row matching that oc_map_id to find the OC name and which main table it's in. The "keycolumn" info in ldap_oc_mappings tells the column in that main table to find the ID number that matches "keyval" in ldap_entries. So for example if ldap_entries has oc_map_id=2 and keyval=31415, you see in ldap_oc_mappings that id=2 is for the cmuPerson objectClass, the keytable for that OC is the "cmuperson" table, the keycolumn is called "id", so this ldap_entries row matches the cmuPerson with id=31415. Creating new entries ===================== When an LDAP ADD request comes in, slapd first examines the objectClass of the new entry and looks it up in ldap_oc_mappings. One of the (thus unmentioned) columns in that table is "create_proc", which gives the name of a PL/SQL function, e.g. 'call create_person(?)'. slapd will call that function, with the first parameter being an OUTPUT number. That function has to add a row to the correct main table, give it an ID number, and return the ID number. The ID numbers are usually just an Oracle "sequence", so the function may just call create_person(RC OUT number) BEGIN insert into cmuPerson (id) values (cmuPerson_ids.nextval); RC := cmuPerson_ids.curval; END If you've set up columns in the main table that have "not null" restrictions, it is up to that function to fill in those columns. Note that originally NO information about the LDAP entry is available to the function, so you have to dummy in values for "not null" columns. For cmuPerson entries, I had put "guid" as being not null, so the create_person() function copied the sequence number into the guid column. Later, I added a new column to ldap_oc_mappings called "create_Hint", and in it you are able to put a single attribute name. When slapd calls the create_proc() function it will find the value of the attribute named by create_Hint and pass that value to create_proc(). For cmuPerson, for example, I pass the "guid" attribute. The reason this was necessary was in the inner workings of Oracle. The cmuPerson table in Oracle was designed to actually be 16 tables, for faster lookups, and rows get put into one of the 16 tables based on a hash of the "guid". If the "guid" is not available at the time the row is created, and then later the guid is added, Oracle has to do weird operations to move the row around. The create_Hint allows the row to be created in just the right table. Similarly, the cmuAccount objects use a create_hint of "uid". Once the create function returns, slapd will fill in the attr=value pairs one at a time, calling the add_proc for each one. Any errors, such as a failed Oracle "check" constraint (e.g. setting cmuStudentClass=blef) will cause the entire entry add to fail. Any attribute with expect_return & 1 can return non-zero in the output parameter to the add_proc and cause a similar entire entry failure. If an LDAP add fails and you're not sure why, you can check the slapd output log on STDERR to see which attribute was being added when the failure occured. The information is availble when the parameter to the -d flag to slapd includes the 1 bit (e.g. "slapd -d 257"). After all values are filled in, the meta tables are updated for you. This means that the DN for the entry is NOT in ldap_entries during the add process, so none of the add functions can look up the DN of the entry being added. Attributes with expect_return != 0 ----------------------------------- NAME EXPECT_RETURN -------------------- ------------- billingFromDate 1 deleteDate 1 expires 1 guardian 1 holdplacedby 1 lastWarningDate 1 lastbillingDate 1 macOSXhomeDir 1 ou 1 owner 1 cmuCampus 1 cmuDepartment 3 cmuPrimaryCampus 1 destroyDate 1 The expires and *Date functions make sure the passed in date is of a legitimate LDAP format. The ou, macosxHomeDir, and uid functions are for attributes that cannot be changed, so they make sure that if you try to add a value, you are only replacing the existing value with the SAME VALUE, i.e. a no-op. And lastly, the owner functions make sure the passed in value is a legitimate DN that already exists in ldap_entries. Implied Attributes =================== Many attributes that we keep in LDAP are copies of data from elsewhere. For example the cmuAndrewID attribute in Person records is just the UID attribute copied in from all Andrew accounts owned by the person; the cmuAccount attribute in Person records is the DN of all accounts owned by the person. Rather than copy data around and have to worry about triggers and data consistency, the back-sql allows us to imply the values of some attributes. This means that the cmuAndrewID will NOT be stored in the Person record, it will be fetched dynamically from the Accounts table. If you look at the ldap_attr_mappings table for the cmuAccount attribute, you will see it is fetched from cmuAccount where the owner column equals the ID of the person record. This makes the Oracle tables lighter and more reliable. Client programs no longer have to update them either, making them more lightweight. So, for example, when the "owner" is set in an Andrew account, the LDAP client does not have to update the Person record to add cmuAccount, cmuAndrewID, and cmuActiveDN. On the downside, it means that triggers must watch for when these implied attributes would change, and inform the RO LDAP copies of the new data. So when an "owner" is set in an Account, a trigger will write three lines to the "slurpd" table, adding cmuAccount, cmuAndrewID, and cmuActiveDN for the replica sites. List of implied attributes -------------------------- Person records cmuAccount cmuActiveDN cmuEduID cmuAndrewID cmuCSID cmuECEID cmuSEIID cmuCITID cmuStatID cmuSISepa macosxHomeDir Account records macosxHomeDir Troubleshooting ================= Here are some error messages and what had to be done to solve them. slapd complains that SASL mechanism does not allow client to talk first ------------------------------------------------------------------------ The LDAP client is sending a 0 length string instead of a NULL to SASL2. Check through the LDAP binding functions for the sasl_client_start function and find shortly after where it sends the client cred to ldap_bind. If the cred is length 0, send a NULL, and NOT a 0 length string. client is hanging up on the server right after the bind. ------------------------------------------------------------ As soon as the server is sending something to the client, e.g. the first search result, the client disconnects and says server is unreachable. What is really going on is the client can't decrypt the server message, because the server didn't encrypt it right. The server hasn't sent the IP addresses of both client and server to SASL with sasl_setprop(). Look in the server SASL code for the _open() function, and send the IP addrs to SASL after the SASL session is started. SQL binding fails: ora-12154: cannot resolve service name ---------------------------------------------------------- The slapd ODBC needs a $HOME/.tnsnames.ora file. Ask someone who knows what one looks like.