sql – Avoiding duplicate identifiers in the database – Education Career Blog

NOTICE: I appreciate all the answers, but we already have a sequence, and we can't use UNIQUE constraints because some items need to have duplicates. I need to handle this in PL/SQL somehow, so that based on some criteria (using an if statement) I can ensure there are no duplicates for that case. And just to confirm, these identifiers are very customized, with different types of string. For each set of strings, we have a number that counts up only for that string (STR-STR-####); we have hundreds of these STR-STR combinations, and for each combination the #### counts up. On top of that, some STR-STR combinations are allowed to have duplicates. So we CAN'T use UNIQUE constraints, nor can we use a PRIMARY KEY, as the identifier is not a simple number, and we already have a PRIMARY KEY assigned to each item. These identifiers are for the users, not for database management.

When a user creates an item, based on some criteria, we assign a certain number to the item. In the save function, a function is called to prepare the initial string; then the table where the numbers are kept is scanned, and the next available 4-digit number for that particular string is assigned.

Now there are some issues with this: there are about 1000 lines of code between getting the next number and the commit to the database. The problem is that when 2 people create an item with the same criteria within seconds of each other, sometimes the same number is issued.

What I've done is that just before the commit, I check the database for the number, and if it already exists, I call the function to get the next available number again.

Even though this code decreased the chances of a duplicate, if I save 2 items at exactly the same time, I still get a duplicate number.

Does anyone know a way to avoid duplicates and drop the chance of a duplicate number to zero?

EDIT 1: We already have a primary key. This identifier is a special, very customized string, so it wouldn't be possible to just use a number that counts up from 0.

EDIT 2: There are also some cases where we need duplicates. This is very minor (maybe about 10 different item tracks use duplicates), so I wrap my pre-commit duplicate check in an if statement: if the item being saved belongs to one of the systems that is supposed to have duplicates, I skip the check.

EDIT 3: We are using PL/SQL here

EDIT 4: I guess this problem was very specific and I didn't quite convey it well. Although there were many answers, none of them really captured my problem. Regardless, I solved it and added the solution below as my answer.

,

Look up the CREATE SEQUENCE command. Oracle can handle the unique number generation for you. (Virtually) every database has a way of handling this issue, although they differ somewhat in how it's implemented.

In response to the new requirements edited into the question

You can still use a SEQUENCE to produce the counting-up part, then combine it with the prefix and save the result back to the database. And in those rare cases in which you need a duplicate ID, don't get a number from the SEQUENCE; just reuse the one you already have. A SEQUENCE solves your problem of creating a "well" from which you can draw a guaranteed-unique number when needed.
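A minimal sketch of that idea, with assumed names (item_seq, the 'ABC-XYZ' prefix, and the 12-character width are all illustrative, not from the question):

CREATE SEQUENCE item_seq;

DECLARE
  v_identifier VARCHAR2(12);
BEGIN
  -- Combine the fixed prefix with the sequence's guaranteed-unique number.
  -- (NEXTVAL in a PL/SQL expression needs 11g+; older versions must
  -- SELECT item_seq.NEXTVAL INTO a variable FROM dual instead.)
  v_identifier := 'ABC-XYZ-' || LPAD(item_seq.NEXTVAL, 4, '0');
END;
/

One caveat: a single sequence counts globally across all prefixes, so the numbers within any one STR-STR combination will have gaps; a per-combination counter needs the per-row approach described in another answer below.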

,

It sounds like you’ve denormalised 3 pieces of data into 1 field. This is what your string field currently contains:

  • StringField char (12): STR-STR-####

This is what you really should have (example field names only; it would help to give these meaningful names, but I don’t know what your data represents):

  • Str1 char (3): STR
  • Str2 char (3): STR
  • ID int: ####

You can now use a sequence in the ID field.

When you want to get your original string back, you concatenate the contents of the Str1, Str2 and ID fields (with dividing hyphens).
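For example (the table name items is a placeholder), the display string can be rebuilt in a single query:

-- Rebuilds STR-STR-#### from the three normalised fields.
SELECT str1 || '-' || str2 || '-' || LPAD(id, 4, '0') AS display_id
FROM   items;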

In short: Your database design is broken, and you’re now paying the price for it. My advice would be to fix the design by normalising your identity field into 3 separate fields. Alternatively, you’ll find that you spend hours recreating built-in functionality of the database and end up with a solution that is buggy and has horrible problems with race conditions.


Alternatively, if PL/SQL allows all of this functionality:

Create a table that contains the following fields:

  • Str1 char (3)
  • Str2 char (3)
  • CurrentID int

Then:

  • For each possible combination of STR-STR identifiers, add an entry into the database, with the “CurrentID” value set to 0.
  • Write a stored procedure for retrieving the next ID. It will lock the relevant row based on the STR-STR pair passed in, get the value in CurrentID, increase the value, and return the increased value; the lock is released automatically when the calling transaction commits.
  • Call the procedure whenever you need to generate a new ID.

You’ll have no concurrency problems, as each attempt to get an ID will have to wait for any other attempts to finish. Each STR-STR pair will have its own counter.
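The steps above can be sketched as a PL/SQL function (the table and function names are assumptions). An UPDATE ... RETURNING does the lock, increment, and read in one statement:

CREATE OR REPLACE FUNCTION next_id (p_str1 IN CHAR, p_str2 IN CHAR)
RETURN NUMBER
IS
  v_id NUMBER;
BEGIN
  -- The UPDATE locks the counter row; a concurrent caller for the same
  -- STR-STR pair blocks here until this transaction commits.
  UPDATE id_counters
  SET    current_id = current_id + 1
  WHERE  str1 = p_str1
  AND    str2 = p_str2
  RETURNING current_id INTO v_id;

  RETURN v_id;
END next_id;
/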

,

I almost hate to suggest this, but since the approach you settled on was pretty lousy, the one I'm going to show here is pretty good in comparison. Forgive the sarcasm, but when you have this many users telling you that your approach in general is wrong, you ought to take that into consideration.

Somewhere you are probably running a SQL like this:

SELECT MAX(DENORMALIZED_FIELD)
INTO   BADLY_NAMED_VARIABLE
FROM   POORLY_ORGANIZED_TABLE
WHERE  DENORMALIZED_FIELD LIKE 'ABC-XYZ%';

Then later you probably split the variable using SUBSTR, parse the second part into a NUMBER, increment it, and build a new variable with the new code.

What you can do is add the FOR UPDATE clause to the SELECT statement, placing a lock on the records in question. Sure, you're not actually updating them, but by your own definition you have to serialize the operation. This is a non-performant, unscalable, dirty way to get what you want, but it should work. Check the Oracle docs for the implications.
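One wrinkle: Oracle does not allow FOR UPDATE together with an aggregate like MAX(), so the locking SELECT has to be a separate statement. A sketch, reusing the placeholder names from the query above:

DECLARE
  BADLY_NAMED_VARIABLE VARCHAR2(12);
  CURSOR c_lock IS
    SELECT DENORMALIZED_FIELD
    FROM   POORLY_ORGANIZED_TABLE
    WHERE  DENORMALIZED_FIELD LIKE 'ABC-XYZ%'
    FOR UPDATE;
BEGIN
  OPEN c_lock;  -- opening the cursor acquires the row locks

  SELECT MAX(DENORMALIZED_FIELD)
  INTO   BADLY_NAMED_VARIABLE
  FROM   POORLY_ORGANIZED_TABLE
  WHERE  DENORMALIZED_FIELD LIKE 'ABC-XYZ%';

  CLOSE c_lock;  -- the locks themselves persist until COMMIT or ROLLBACK
END;
/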

,

Use a UNIQUE INDEX on that column, or a PRIMARY KEY!

,

Now there are some issues with this: there are about 1000 lines of code between getting the next number and the commit to the database. The issue is when 2 people create an item with the same criteria within seconds of each other, sometimes the same number is issued.

This is worrying. You should really be using a stored procedure for this and wrapping it all in a transaction. You have nothing guaranteeing that any two records will have different numbers, and that will cause problems. But I agree: you need to set a column as a unique identifier and use primary keys. It is a relational database, I assume!

,

You might need some sort of serialization process here. One approach I would recommend is leaving this field blank during the initial insert, waiting for the procedure to commit, and then having another process (a recurring job, for example) fill the column.

This other process would fill all rows according to your business rules. This job would be the only one able to touch this column and you would therefore avoid any concurrency problem.

You could set the job to run every X seconds; this implies a small delay during which the column would be empty. Or you could have the initial session launch the update process after the commit, with some sort of serialization (a preemptive lock on a row, so that no two update processes can run at the same time).

,

This could be done using a function-based unique index. First you’d need to add a column to the table to specify whether the identifier needs to be unique for each row:

alter table ... add (identifier_must_be_unique varchar2(1)
    check (identifier_must_be_unique='Y'));

Now create a unique FBI that only holds the identifiers that need to be unique:

create unique index xxx on yyy 
(case when identifier_must_be_unique='Y' then identifier end);

Finally, in your identifier-generation logic, whenever you need the identifier to be unique, set identifier_must_be_unique='Y'; otherwise leave it as null. The FBI will then enforce the conditional constraint.
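To illustrate the behaviour (the column name identifier and the sample values are assumptions, since the original DDL elides them):

-- Both inserts succeed: a NULL flag keeps the row out of the unique index.
INSERT INTO yyy (identifier, identifier_must_be_unique)
VALUES ('ABC-XYZ-0001', NULL);
INSERT INTO yyy (identifier, identifier_must_be_unique)
VALUES ('ABC-XYZ-0001', NULL);

-- The first of these succeeds; the second raises ORA-00001
-- because both rows are indexed with the flag set to 'Y'.
INSERT INTO yyy (identifier, identifier_must_be_unique)
VALUES ('ABC-XYZ-0002', 'Y');
INSERT INTO yyy (identifier, identifier_must_be_unique)
VALUES ('ABC-XYZ-0002', 'Y');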

,

Create a PRIMARY KEY column or a UNIQUE KEY column that is auto-incremented (either with an AUTOINCREMENT keyword or via a SEQUENCE, as Oracle does). That way, if 2 people add 2 identical data rows at exactly the same time, the database will add 2 identical values, but each with its own unique ID.

,

Your code should definitely be replaced with a SEQUENCE. Oracle manages this concurrency very well.

Also, this sequence should be saved in a column that is constrained as UNIQUE.

,

You could add a table to hold all the identifiers that have been used and need to be unique like this:

create table unique_identifiers (id varchar2(12) primary key);

Then whenever your 1000-line logic generates a 'STR-STR-####' value that needs to be unique, insert it into that table. If the insert fails, get another number and try again.
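A sketch of that claim-and-retry loop; next_candidate_id stands in for your existing number-generation logic and is purely hypothetical:

DECLARE
  v_id unique_identifiers.id%TYPE;
BEGIN
  LOOP
    v_id := next_candidate_id('ABC', 'XYZ');  -- hypothetical generator
    BEGIN
      INSERT INTO unique_identifiers (id) VALUES (v_id);
      EXIT;  -- the insert succeeded, so this identifier is now claimed
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        NULL;  -- another session claimed it first; try the next number
    END;
  END LOOP;
END;
/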

,

You have a broken data model, but I'm assuming fixing it isn't a viable option. If your biggest problem is the duplicate keys when you run two quick selects against the table that contains the sequence values (you should use sequences here, but you know that), then you can try SELECT ... FOR UPDATE, which locks the row for the session accessing it. Be careful: this logic may introduce latency into the application.

,

I may be missing something obvious, but I'd suggest something similar to what was suggested earlier: create a unique constraint on the column, so that persisting the record fails if some other process has already used that custom generated ID. You can then trap that exception in PL/SQL and implement retry logic until you are able to successfully insert a unique generated ID into the database.
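A sketch of that retry logic, assuming a UNIQUE constraint on items.identifier and a hypothetical generate_identifier function (neither name is from the question), with a bounded number of attempts:

DECLARE
  v_identifier items.identifier%TYPE;
BEGIN
  FOR attempt IN 1 .. 100 LOOP
    v_identifier := generate_identifier('ABC', 'XYZ');  -- hypothetical
    BEGIN
      INSERT INTO items (identifier) VALUES (v_identifier);
      EXIT;  -- the unique constraint accepted it; we're done
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        NULL;  -- collision with a concurrent session; retry
    END;
  END LOOP;
END;
/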

,

Well, none of the answers really solved the problem, mostly because I didn't convey the whole situation properly.

But basically, I put my pre-commit check into a loop: I update, commit, and check again within the same loop, and if the duplicate still exists, the loop runs again and the check happens once more. It's kind of hard to explain, but the chance of a duplicate is now extremely low (the loop has a limit of 100 iterations).
