Main Page | Report this Page
 
   
Science Forum Index  »  Cryptography Forum  »  Update on my Enc+Auth Mode
Page 1 of 3    Goto page 1, 2, 3  Next
Author Message
Tom St Denis
Posted: Thu Dec 25, 2003 2:40 pm
Guest
I took the mode I discussed earlier... that is

1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

And timed just step 2.1/2.2 [since they form the crux of the system]. Using
full RC6 as F1 and three rounds of RC6 as F2 I got a timing of about 28
cycles per byte [all in C on my Athlon]. RC6 itself takes 20 cycles/byte in
C [timed from LTC]. So really this mode has a trivial amount of overhead.

Some test code is at

http://iahu.ca:8080/tommac.c

It doesn't have key schedule, decrypt, terminate or verify functions [just
wanted proof of concept for encrypt].

It was pointed out that this mode is like OCB. Though my design is
conceptually simpler OCB may be a tad faster. However, my design has the
added bonus that it is not covered by patents!!! Wink At 28 cycles/byte this
mode is as fast as AES is in ECB mode using pure C code. In fact out of all
of the ciphers that LTC supports only three are faster (RC5, RC6, CAST5 in
that order) when in pure ECB mode.

I've asked someone in private if they want to co-author a paper on the
design, however, if he declines I'd appreciate anyone who is willing to
help. I'd like to submit the paper to Crypto'04 which means I only have a
month to write the paper.

Thanks,
Tom
Bartosz Zoltak
Posted: Fri Dec 26, 2003 6:12 am
Guest
Tom St Denis wrote:
Quote:
1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

And timed just step 2.1/2.2 [since they form the crux of the
system]. Using
full RC6 as F1 and three rounds of RC6 as F2 I got a timing of about
28
cycles per byte [all in C on my Athlon]. RC6 itself takes 20
cycles/byte in
C [timed from LTC]. So really this mode has a trivial amount of
overhead.

(...)
Quote:
I'd like to submit the paper to Crypto'04 which means I only have a
month to write the paper.

When thinking of submitting, you might want to think of picking
another fast cipher for F1 and F2 - RC6 has a patent issue, is that
right? I don't know exactly what consequences are risked but maybe it
would be safer to use an intellectal-rights-free algorithm?

Also a general thought on the scheme - if step 3 is executed only
once-a-message - you might want to try to compromise a small percent
of speed for better clarity and simply use a full version of a block
cipher as F1, even if it was not necessary for the security of the
scheme.

Hope it is helpful,

Bartosz


--
Bartosz Zoltak
http://www.vmpcfunction.com
QPbzoltak@vmpcfunction.com
without "QP"
Bartosz Zoltak
Posted: Fri Dec 26, 2003 6:15 am
Guest
Quote:
Also a general thought on the scheme - if step 3 is executed only
once-a-message - you might want to try to compromise a small percent
of speed for better clarity and simply use a full version of a block
cipher as F1, even if it was not necessary for the security of the
scheme.

Whoops, I mixed F1 with F2 and take back the above thought, sorry!

Bartosz
Tom St Denis
Posted: Fri Dec 26, 2003 8:45 am
Guest
"Bartosz Zoltak" <QPbzoltak@vmpcfunction.com*(without "QP")> wrote in
message news:bsh58d$r17$1@nemesis.news.tpi.pl...
Quote:
Tom St Denis wrote:
1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

And timed just step 2.1/2.2 [since they form the crux of the
system]. Using
full RC6 as F1 and three rounds of RC6 as F2 I got a timing of about
28
cycles per byte [all in C on my Athlon]. RC6 itself takes 20
cycles/byte in
C [timed from LTC]. So really this mode has a trivial amount of
overhead.
(...)
I'd like to submit the paper to Crypto'04 which means I only have a
month to write the paper.

When thinking of submitting, you might want to think of picking
another fast cipher for F1 and F2 - RC6 has a patent issue, is that
right? I don't know exactly what consequences are risked but maybe it
would be safer to use an intellectal-rights-free algorithm?

True. I was a bit greedy for speed. AES in C is a bit slow. By my rough
estimates we're looking at about 45 cycles/byte if I used AES [full for F1,
4 rounds for F2].

If I start a paper I'll do both AES and RC6.

The idea though was proof of concept. RC6 in C turns into close to the
assembly code [well closer than AES] which is nice. The fastest AES [iirc]
in C gets around 14-16 cycles/byte so my construction with AES might hit 22
cycles/byte which is quite acceptable...

The key benefits though of my design which will be the focus of my paper are

- Simple construction
- The encryption is provably as secure as the cipher [reduces to CTR]
- Blocks are independent of each other [e.g. any order, seekable, etc...]
- So far I can't figure out an efficient attack on the sheme
- No patents!! [unlike say OCB]

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 12:10 pm
Guest
Tom St Denis wrote:
Quote:

I took the mode I discussed earlier... that is

1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

I like to know what is the intention underlying step 3?
(Wouldn't 2.2 alone lead to a useful final value of Z?)

M. K. Shen
Tom St Denis
Posted: Fri Dec 26, 2003 12:17 pm
Guest
"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote in message
news:3FEC6B69.9C14DFA3@t-online.de...
Quote:


Tom St Denis wrote:

I took the mode I discussed earlier... that is

1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

I like to know what is the intention underlying step 3?
(Wouldn't 2.2 alone lead to a useful final value of Z?)

The idea is to prevent messages of different lengths that collide [sorta
like MD-strengthening].

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 12:26 pm
Guest
Tom St Denis wrote:
Quote:

"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote:


Tom St Denis wrote:

I took the mode I discussed earlier... that is

1. Z = 0
2. for i from 1 to m do
2.1 C[i] = Ek(P[i] xor F2(i))
2.2 Z = Z xor F2(P[i] xor C[i])
3. Z = F1(Z xor F2(m + 1))
4. Return C[1..m], Z

I like to know what is the intention underlying step 3?
(Wouldn't 2.2 alone lead to a useful final value of Z?)

The idea is to prevent messages of different lengths that collide [sorta
like MD-strengthening].

Sorry that my poor knowledge doesn't yet permit me to
understand what you meant. Could you elaborate a little
bit? Since 2.2 already employs encryption, I suppose
that could be good enough. If you employ just a non-linear
function (cf. the Latin square thread) to mix into Z the
entities P[i] and C[i], i.e. without protection by
encryption, then one would of course need at the final
stage an encryption of Z. (I think that such a scheme
could well function.)

M. K. Shen
Tom St Denis
Posted: Fri Dec 26, 2003 12:51 pm
Guest
"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote in message
news:3FEC6F51.7DD650DA@t-online.de...
Quote:
Sorry that my poor knowledge doesn't yet permit me to
understand what you meant. Could you elaborate a little
bit? Since 2.2 already employs encryption, I suppose
that could be good enough. If you employ just a non-linear
function (cf. the Latin square thread) to mix into Z the
entities P[i] and C[i], i.e. without protection by
encryption, then one would of course need at the final
stage an encryption of Z. (I think that such a scheme
could well function.)

The point of mixing the length is that while two different P' and P may
collide in the xorsum their different lengths will yield a different value.

This doesn't prevent collisions [obviously since it's a many to one
function] but it does make them harder to find.

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 1:34 pm
Guest
Tom St Denis wrote:
Quote:

"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote:

Sorry that my poor knowledge doesn't yet permit me to
understand what you meant. Could you elaborate a little
bit? Since 2.2 already employs encryption, I suppose
that could be good enough. If you employ just a non-linear
function (cf. the Latin square thread) to mix into Z the
entities P[i] and C[i], i.e. without protection by
encryption, then one would of course need at the final
stage an encryption of Z. (I think that such a scheme
could well function.)

The point of mixing the length is that while two different P' and P may
collide in the xorsum their different lengths will yield a different value.

This doesn't prevent collisions [obviously since it's a many to one
function] but it does make them harder to find.

O.k. Certainly, if you do one more processing operation,
it is likely to be better.

BTW, allow me to take this opportunity to elaborate what
I mentioned above (also shortly alluded to as non-linear
chaining in a post in the thread 'Non-linear combiniation').

Let the given plainatext blocks be P[1..m]. Define P[m+1]
be 0 or some agreed-upon constant and let an IV be given.
Let u(x,y) be a non-linear combination function of two
entities x and y, e.g. what is mentioned in the thread
on Latin squares (perferrably with additional rotations).
(To make computation simple, if the block consists of a
number of computer words, the function could operate on
units of words and not on the complete block as one
single unit, though in the notation below no distinction
is made in this respect).

Z=Ek(IV);
for i=1 to m+1 do
{ C[i]=Ek(P[i] xor Z);
Z=u(Z,P[i]); Z=u(Z,C[i]);
}
send IV,C[1..m+1];

M. K. Shen
Tom St Denis
Posted: Fri Dec 26, 2003 1:57 pm
Guest
"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote in message
news:3FEC7F35.3CC30383@t-online.de...
Quote:

Z=Ek(IV);
for i=1 to m+1 do
{ C[i]=Ek(P[i] xor Z);
Z=u(Z,P[i]); Z=u(Z,C[i]);
}
send IV,C[1..m+1];

This requires some algebraic latin square, re: slow. And your scheme does
have two "F2"'s they're the latin square. So even if you replaced your
latin square with a fast PRP you end up with the same speed. You also make
the proof of security harder.

In the

C[i] = F1(P[i] xor F2(i))

Case you can reduce the security of C[i] as far as privacy is concerned to
the security of F1 under the same conditions provided F2 is not a trivial
function (in theory any pair-wise decorrelated function will work as F2
perfectly).

Consider the case of F2(i) = i [e.g. identity]. Then you can tell if two
adjacent P's differ in the LSB if the ciphertexts are the same. However, if
F2 is pair-wise decorrelated function the difference between adjacent
F2(i)'s will be random (hence the decorrelation).

In fact thanks, I can now really add to my design. New requirement, F2 must
be a pair-wise decorrelated PRP. For example

F2(x) = ax + b in GF(2^128)[y]

will do. Which can be implemented several ways. One method is sixteen
8x128 tables [64KB] where you perform a ton of loads/xors. With 32-bit
registers you'd have to perform 4 lookups/xor per byte [so 64 loads/xors].
Of course with SSE2 or even MMX you can lower this.

You can optimize it a bit though... for instance if you limit "a" to say
n-bits you have about 1/2^n prob of guessing the difference [it is
decorrelated afterall]. I think this will lower the overall security [e.g.
no longer reducible to F1]

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 2:04 pm
Guest
Tom St Denis wrote:
Quote:


This requires some algebraic latin square, re: slow. ...
[snip]


Sorry, I don't yet capture this. The operation is
2*x*y+x+y mod 2^n (n = no. of bits in word) plus some
rotations. Could it be slower than the functions that
you use?

M. K. Shen
Tom St Denis
Posted: Fri Dec 26, 2003 2:20 pm
Guest
"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote in message
news:3FEC8654.2439580F@t-online.de...
Quote:


Tom St Denis wrote:


This requires some algebraic latin square, re: slow. ...
[snip]

Sorry, I don't yet capture this. The operation is
2*x*y+x+y mod 2^n (n = no. of bits in word) plus some
rotations. Could it be slower than the functions that
you use?

It has to be the size of the block, that is, n = 128 otherwise the scheme is
useless.

That's why yours is slower.

Also my F2() [the decorrelated one] won't work for the xorsum since the
F2(i) values will cancel out if you mix them in the wrong order. e.g.

you get F2(x) = ax [the b cancels out] so F2(C[i] xor P[i] xor F2(i')) =
a(C[i] + P[i] + ai' + b) now you take the other block F2(C[i'] xor P[i'] xor
F2(i)) = a(C[i'] + P[i'] + ai + b) and the sum of the two is

a(C[i] + P[i] + ai' + b) + a(C[i'] + P[i'] + ai + b) = a(C[i] + P[i] + ai+
b) + a(C[i'] + P[i'] + ai' + b)

So the operation that adds the F2(C[i] + P[i]) to Z should not commute. For
example, addition over the integers.

e.g. back to the drawing board...

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 2:24 pm
Guest
Tom St Denis wrote:
Quote:

"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote:


Tom St Denis wrote:


This requires some algebraic latin square, re: slow. ...
[snip]

Sorry, I don't yet capture this. The operation is
2*x*y+x+y mod 2^n (n = no. of bits in word) plus some
rotations. Could it be slower than the functions that
you use?

It has to be the size of the block, that is, n = 128 otherwise the scheme is
useless.

Why is that? Why couldn't the 4 words in an 128-bit
block be operated on separately? Analogy, S-boxes are
normally even smaller, e.g. 8 bits only.)

M. K. Shen
Tom St Denis
Posted: Fri Dec 26, 2003 2:34 pm
Guest
"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote in message
news:3FEC8AFF.4601677B@t-online.de...
Quote:
Why is that? Why couldn't the 4 words in an 128-bit
block be operated on separately? Analogy, S-boxes are
normally even smaller, e.g. 8 bits only.)

First off, the entire block has to be pair-wise decorrelated or you cannot
reduce the security to F1. That is anything you can learn about the
plaintext without inverting F1 shows the design cannot be reduced to F1 in
terms of security.

Second, if you did 4 words in parallel you still end up with a maximal
domain of order 2^32. That means there will be collisions within words in
<2^32 time. That is you can break the privacy in <2^128 time.

Really F2 has to be a perfectly pair-wise decorrelated non-linear function
over Z_{2^w} => Z_{2^w} for a w-bit block cipher for this to have any sense
of security.

Then I got thinking what about

F2(x) = a/x + b mod p(y)

in GF(2^128)[y]

That's nonlinear, decorrelated and certainly

a/(x1 + y1 + z2) + a/(x2 + y2 + z1) != a/(x1 + y1 + z1) + a/(x2 + y2 + z2)

Which means the xorsum would work again. The problem now though is that
inversion is hella slow. Maybe not so bad in an ONB but does one exist for
GF(2^128) ?

At this point CTR+OMAC would be faster though... ouch.

Tom
Mok-Kong Shen
Posted: Fri Dec 26, 2003 2:47 pm
Guest
Tom St Denis wrote:
Quote:

"Mok-Kong Shen" <mok-kong.shen@t-online.de> wrote:

Why is that? Why couldn't the 4 words in an 128-bit
block be operated on separately? Analogy, S-boxes are
normally even smaller, e.g. 8 bits only.)

First off, the entire block has to be pair-wise decorrelated or you cannot
reduce the security to F1. That is anything you can learn about the
plaintext without inverting F1 shows the design cannot be reduced to F1 in
terms of security.

The 4 words of P[i] and C[i] that get mixed into Z
and Z is xored with P[i+1] and passed as a whole
through the encryption algorithm (that has the effect
of mixing the 4 words contributed by Z in a sense
thoroughly). I don't think there is anything critical
practically, except when one is highly theory-oriented.
(But in that case, even the encryption algorithms
themselves don't have rigorous proofs, if I don't err.)

M. K. Shen
 
Page 1 of 3    Goto page 1, 2, 3  Next   All times are GMT - 5 Hours
The time now is Tue Oct 14, 2008 10:54 am