PDA

View Full Version : Skybuck's Universal Code 4 (memory efficient)


Skybuck Flying
May 26th 07, 08:01 AM
Skybuck's Universal Code 4:

The first version of Skybuck's Universal Code describes the basic idea but
it's memory inefficient for large ammounts of data/information.

For each data bit a marker bit was necessary.

Also the interleaving of bits could be cumbersome for software and maybe
even hardware ;)

Thus I present to you a somewhat more efficient universal code, which is
still variable bit-based but more efficient, though I am not sure if I like
the bitness of it... maybe I'll think up a byte version for
intel system which are octal/byte based :)

For now however the basic concept for version 4 is:

Length1 + Length2 + Data

This represent 3 fields stuck together in the information stream.

The first field describes the length of the second field. The second field
describes the length of the data.

The first field is like a marker for the second field, and the second field
is length for the data in binary encoding.

So I could also be described as:

Marker + Length + Data

But this could confuse people with version 1... where the term marker ment a
full marker...

So maybe describe it better as:

LengthMarker + Length + Data

or

LengthOfLength + Length + Data.


The first field consists out of zero's with a 1 terminator to describe the
length of the second field, the second field uses binary encoding, and the
data can be anything.


An example:

Suppose the following data has to be encoded/stored:

1234567890123
Hello World !


13 characters.

Length = 13 + 1 for null terminator = 14.

Bits needed to encoded 14 in binary: 0x1 + 1x2 + 1x4 + 1x8 = 4 bits.

Thus also 4 bits needed for the marker.

The information stream as stored as follows:

length marker:
0001

length:
0111

data:
xxxxxxetc.

00010111xxxxxxetc


the length field describes the data length in bits to make it bit flexible.


So now software can read the information stream as follows:

// read length marker:

LengthMarkerCount := 0;
repeat
if ReadBit( Bit ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
break; // error.
end;
until Bit = 1;

// allocate length field.

SetLengthInBits( Length, LengthMarkerCount );

// read length:
LengthCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Length, LengthCount, Bit );
LengthCount := LengthCount + 1;
end else
begin
break; // error, or retry, inform, etc.
end;
until LengthCount = LengthMarkerCount;

// allocate data field in bits.

SetLengthInBits( Data, Length );

// read data
DataCount := 0;
repeat
if ReadBit( Bit ) then
begin
SetBit( Data, DataCount, Bit );
DataCount := DataCount + 1;
end else
begin
break; // error or retry.
end;
until DataCount = Length;


// ofcourse the example would not be complete without a write example how to
encode information so let's do that too:

For example take the string as example.

DataCountInBits := Length( 'Hello World !'+#0 ) * 8;
LengthCountInBits := BitNeededFor( DataCountInBits ); // use log functions
to determine bits needed to encode the value in binary.
LengthMarkerCountInBits := LengthCountInBits; // same

// now first write the marker:

if LengthMarkerCountInBits = 0 then exit; // strange error which should not
happen.

LengthMarkerCount := 0;

repeat
if LengthMarkerCount = LengthMarkerCountInBits then
begin
// write terminating bit.
if WriteBit( 1 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
// write leading bit(s)
if WriteBit( 0 ) then
begin
LengthMarkerCount := LengthMarkerCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end;

until LengthMarkerCount = LengthMarkerCountInBits;

// now write the binary length field.

if LengthCountInBits = 0 then exit; // strange error which should not
happen.

LengthCount := 0;

repeat
if LengthCount = LengthCountInBits then
begin
if GetBit( Length, LengthCount, Bit ) then
begin
// write binary length bit.
if WriteBit( Bit ) then
begin
LengthCount := LengthCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until LengthCount = LengthCountInBits;

// now write the data field:

if DataCountInBits = 0 then exit; // strange error which should not happen.

DataCount := 0;

repeat
if DataCount = DataCountInBits then
begin
if GetBit( Data, DataCount, Bit ) then
begin
// write data bit.
if WriteBit( Bit ) then
begin
DataCount := DataCount + 1;
end else
begin
// error, or retry, inform, etc.
end;
end else
begin
break; // memory error.
end;
end;

until DataCount = DataCountInBits;


There, that should give you an idea of how to do it. Ofcourse this is only
concept code... not really tested it... but should work nicely.

Some random and some crappy thoughts about using it in software development
source codes (can be skipped for reading):

(*
I am not sure if I like working with bits like this.. so as I wrote
before... I might think up a byte version of the universal code... which
might be more easy to program and handle...

not sure yet... working with bits can be straight forward to... but then
again sometimes not... depends on the source/classes etc... don't know
really, haven't tried yet...

Extra classess, and methods definetly needed... would be handy to have some
code which can simply encoded any type of data really... like a pointer and
a size or so... probably
easy to make... but also everything has to be "serialized" and
"deserialized" which is a bit weird and such... also offsets go out the
window ;) <- can almost no longer be used
since you don't know where you were or going to etc.. or maybe it's still
possible... probably not though...

First you have to serialize the previous fields before you can add the next
fields... that could be a drawback... but then again.. it doesn't really
have to be like that if you work
with seperate fields in memory... and then finally those fields can be
encoded and decoded seperately or they could be copied if they already
serialized / deserialized... so this kinda
creates a problem... since many possible design decissions possible...

Ofcourse it would be handy if the software works directly on universal
codes... this would make the software scale better automatically without
much or even any code changes necessary.

Working with encoded fields should therefore be recommanded with special
algorithm which take adventage of it, to create more flexiblity... and
finally for network and storage purposes
the fields could simply be copied... the serialize them after each other as
a sort of simple compression...

since on current hardware the fields will have some unused bits because the
memory works with bytes... etc.
*)

Though this code is complexer I like it better since it's more memory
efficient and actually also more speed efficient... less marker bits to
process.

Even for small numbers like 32 bits or 64 bits it's not that bad: for
example:

64 bits require 1x1,1x2,1x4,1x8,1x16,1x32,1x64 = 7 bits.

So that's a total of 7 markers bits, 7 length bits, 64 data bits = 78 bits.

Another extreme example, the opposite a large file:

20 Gigabyte = 20 * 8 = 160 gigabits.

8 bits for marker, 8 bits for length field, 160 gigabits = 160 gigabits plus
a little.

Compare that with 320 gigabits for version 1.

So a major efficiency improvement and still very probably just as flexible !
;)

Well a bit long and messy post, but I am lazy nowadays =D

But thinking about starting to use this stuff to make my software more
future-ready =DDD

BYYYYYYYYYEEEEEE,
BYEEEEEEEEEEEEEEEEEEEEEE,
Skybuck.

P.S.: My you enjoy universal codes, and maybe the Skybuck's Power be with
you =D

MooseFET
May 26th 07, 02:49 PM
On May 26, 12:01 am, "Skybuck Flying" > wrote:
[...]
> Length1 + Length2 + Data
>
> This represent 3 fields stuck together in the information stream.
>
> The first field describes the length of the second field. The second field
> describes the length of the data.
>
> The first field is like a marker for the second field, and the second field
> is length for the data in binary encoding.


If you go back and look in the responces to the earlier posts you made
on this subject, you will find a couple far more efficient and less
limiting suggestions were made back then. They would have allowed you
to transmit 8 bit data in far less than the 14 bits your method is
suggesting.

This method you are now posting is nothing new. I was something like
it implemented in the 1980s.

BTW: You can do better if you leave the LSBs off the numbers.

Skybuck Flying
May 26th 07, 03:57 PM
"MooseFET" > wrote in message
ps.com...
> On May 26, 12:01 am, "Skybuck Flying" > wrote:
> [...]
>> Length1 + Length2 + Data
>>
>> This represent 3 fields stuck together in the information stream.
>>
>> The first field describes the length of the second field. The second
>> field
>> describes the length of the data.
>>
>> The first field is like a marker for the second field, and the second
>> field
>> is length for the data in binary encoding.
>
>
> If you go back and look in the responces to the earlier posts you made
> on this subject, you will find a couple far more efficient and less
> limiting suggestions were made back then.

I read all reactions and I can not remember seeing such a suggestion.

> They would have allowed you
> to transmit 8 bit data in far less than the 14 bits your method is
> suggesting.

It's funny to see you did not get the method correct or maybe you just made
a sloppy mistake.

For 8 bits the encoding will be:

8 bit for data.

Prefixed with:

The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

Prefixed with:

The marker for the length field which is: 4 bits as well.

For a total of 8 + 8 = 16 bits :)

> This method you are now posting is nothing new. I was something like
> it implemented in the 1980s.

Maybe, maybe not.

You made same claims without backing them up with evidence :)

> BTW: You can do better if you leave the LSBs off the numbers.

I do not understand what you are hinting at.

Leaving out the least significant bit leads to inprecision ?

I don't want that !

Thanks anyway, funny to see america wake up and see my post around this time
:)

Bye,
Skybuck.

MooseFET
May 26th 07, 08:40 PM
On May 26, 7:57 am, "Skybuck Flying" > wrote:
> "MooseFET" > wrote in message
[....]
> > On May 26, 12:01 am, "Skybuck Flying" > wrote:
> > [...]
> >> Length1 + Length2 + Data
>
> >> This represent 3 fields stuck together in the information stream.
>
> >> The first field describes the length of the second field. The second
> >> field
> >> describes the length of the data.
>
> >> The first field is like a marker for the second field, and the second
> >> field
> >> is length for the data in binary encoding.
>
> > If you go back and look in the responces to the earlier posts you made
> > on this subject, you will find a couple far more efficient and less
> > limiting suggestions were made back then.
>
> I read all reactions and I can not remember seeing such a suggestion.
>
> > They would have allowed you
> > to transmit 8 bit data in far less than the 14 bits your method is
> > suggesting.
>
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.

No, it appears I gave your credit for understanding something you
didn't.

>
> For 8 bits the encoding will be:
>
> 8 bit for data.
>
> Prefixed with:
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

That is in effiecient. You never need to code for a zero length so
the length field would only need 3 bits.

>
> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.

Again it is not needed to allow for the zero length case so this is
only 3 bits.


> For a total of 8 + 8 = 16 bits :)

So it is worse than I thought.

>
> > This method you are now posting is nothing new. I was something like
> > it implemented in the 1980s.
>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence :)

Oh well, you can decide not to believe me if you like. Its up to you.

>
> > BTW: You can do better if you leave the LSBs off the numbers.
>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?

There is no need to have an LSB on the length specifier. The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified. This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it. It is also limited in
length. There are far better options. Take a look at Huffman codes
you will see how they handle using differing numbers of bits. Any of
several systems based on those ideas would work better for you.

MooseFET
May 26th 07, 09:19 PM
On May 26, 7:57 am, "Skybuck Flying" > wrote:
> "MooseFET" > wrote in message
[....]
> > On May 26, 12:01 am, "Skybuck Flying" > wrote:
> > [...]
> >> Length1 + Length2 + Data
>
> >> This represent 3 fields stuck together in the information stream.
>
> >> The first field describes the length of the second field. The second
> >> field
> >> describes the length of the data.
>
> >> The first field is like a marker for the second field, and the second
> >> field
> >> is length for the data in binary encoding.
>
> > If you go back and look in the responces to the earlier posts you made
> > on this subject, you will find a couple far more efficient and less
> > limiting suggestions were made back then.
>
> I read all reactions and I can not remember seeing such a suggestion.
>
> > They would have allowed you
> > to transmit 8 bit data in far less than the 14 bits your method is
> > suggesting.
>
> It's funny to see you did not get the method correct or maybe you just made
> a sloppy mistake.

No, it appears I gave your credit for understanding something you
didn't.

>
> For 8 bits the encoding will be:
>
> 8 bit for data.
>
> Prefixed with:
>
> The length for the data which is: 0x1 + 0x2 + 0x4 + 1x8 = 4 bits.

That is in effiecient. You never need to code for a zero length so
the length field would only need 3 bits.

>
> Prefixed with:
>
> The marker for the length field which is: 4 bits as well.

Again it is not needed to allow for the zero length case so this is
only 3 bits.


> For a total of 8 + 8 = 16 bits :)

So it is worse than I thought.

>
> > This method you are now posting is nothing new. I was something like
> > it implemented in the 1980s.
>
> Maybe, maybe not.
>
> You made same claims without backing them up with evidence :)

Oh well, you can decide not to believe me if you like. Its up to you.

>
> > BTW: You can do better if you leave the LSBs off the numbers.
>
> I do not understand what you are hinting at.
>
> Leaving out the least significant bit leads to inprecision >?

There is no need to have an LSB on the length specifier. The number
can have an extra MSB in it for the case when needed to round it up to
the next length that can be specified. This reduces the length field
and the length of the length field.

It is still quite an inefficient way to do it. It is also limited in
length. There are far better options. Take a look at Huffman codes.



>
> I don't want that !
>
> Thanks anyway, funny to see america wake up and see my post around this time
> :)
>
> Bye,
> Skybuck.

Skybuck Flying
May 27th 07, 03:18 AM
Ok,

Now I understand what you're saying.

Marker:

1

Length

0

Data

Missing.

Encoding an empty universal code could be handy for software which uses
classess to implement universal codes and simply uses this encoding to
describe "no data".

You mention other systems, are they universal codes, meaning they describe
their own length and scalable indefinetly ?

I know static huffman does not scale, it's static afterall ;)

Nor will dynamic huffman probably. How will you encode the new character ?

Again you do not seem to crash the powerfull concept of a universal code =D
or you mistakenly think other systems are universal codes ;)

Nor will I modify the universal code just to save one bit and create a
mismatch in binary system... cause that what your suggestion would mean.

0 would no longer represent zero... but 1 <- not gonna happen ;) way to
confusing and I am lazy :)

Thanks anyway, it did show you have nice ideas ;)

Bye,
Skybuck.

MooseFET
May 27th 07, 04:07 AM
On May 26, 7:18 pm, "Skybuck Flying" > wrote:
> Ok,
>
> Now I understand what you're saying.
>
> Marker:
>
> 1
>
> Length
>
> 0
>
> Data
>
> Missing.

You've used more than 2 bits to say "no data" and 3 bits to say 1 or
zero.

Better coding based on the idea behind Huffman :

00 - Nodata
01 - True
10 - False
11 - All other cases

Now we have the three simple cases done now we can decide how
most efficient path goes from here. I'm lazy so I will only state a
possible path that stays more efficent than your suggestion:

At 2 bits, your method would take over 4 bits so we can step directly
to the 4 bit groups and always be better than your way:

1100 - 0 plus the following bit as a pair
1101 - 1 plus the following bit as a pair
1110 - Three bits follow
1111 - All other cases

This method can be extended without limit and will always be better.

The last time you posted this, someone pointed out a better pattern
than this but this one is good enough to prove the point.

Skybuck Flying
May 27th 07, 06:08 AM
"MooseFET" > wrote in message
oups.com...
> On May 26, 7:18 pm, "Skybuck Flying" > wrote:
>> Ok,
>>
>> Now I understand what you're saying.
>>
>> Marker:
>>
>> 1
>>
>> Length
>>
>> 0
>>
>> Data
>>
>> Missing.
>
> You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> zero.
>
> Better coding based on the idea behind Huffman :
>
> 00 - Nodata
> 01 - True
> 10 - False
> 11 - All other cases

Seems like you need 2 bits to describe true.

While my encoding only need 1 bit, sure there is overhead, but that overhead
is not part of the original data.

The original data can have any encoding that's the beauty of Skybuck's
universal code, existing encodings can be integrated into the universal
encoding.

No other special encodings are needed to represent data.

The data can stay in it's original form.

Only the marker and the length fields are prefixed to it.

Alternatively new encodings for the data section can be used as well for
example to represent larger numbers for arbitrary precision math.

> Now we have the three simple cases done now we can decide how
> most efficient path goes from here. I'm lazy so I will only state a
> possible path that stays more efficent than your suggestion:
>
> At 2 bits, your method would take over 4 bits so we can step directly
> to the 4 bit groups and always be better than your way:
>
> 1100 - 0 plus the following bit as a pair
> 1101 - 1 plus the following bit as a pair
> 1110 - Three bits follow
> 1111 - All other cases
>
> This method can be extended without limit and will always be better.
>
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.

I do not completely understand why you are making a big deal out of trying
to come up with a better encoding for the 1 bit of data case...

Also it seems you are trying to use a new encoding... that's not too
shabby... it can probably be integrated into the universal code data field
as well lol... but not really necessary.. that would just be double.

So for now I do not agree that "it's better" or even "more efficient" and I
definelty do not see how this would be "more flexible" or "self describing
length ?".

That's a good question: where is the length field in your scenerio ? ;)

Seems like you need the universal code after all ;)

The data of the universal code can use any other encoding... isn't that
beautifull ? =D

Also what did you describe: mangled encoding ?

Or did you describe a new encoding for one of the fields: Marker, Length,
Data ?

I like keeping those fields nicely seperated ;)

So I definetly do not want a mangled encoding. <- bleh .

Please clearify or try again and then come again =D

Bye,
Skybuck.

Skybuck Flying
May 27th 07, 06:19 AM
Also,

Your reasoning must be inheritly flawed.

Huffman compression is based on the idea that some characters or data will
occur more often then others.

Binary encoding is the most efficient "general" encoding where no occurence
statistics are known before hand.

Then there is also dynamic huffman compression which learns the occurence as
it views the data... however this requires describing new characters...

And a very important question is how do you describe/encode the new
characters ?

Fixed bits for new characters/data is ofcourse flawed, because it's not
variable bit, thus not an universal code :)

Some without completely understand what your little tiny encoding thingy was
about.. it's probably based on flawed reasoning in the first place :)

:::

Bye,
Skybuck ;)

P.S.: I like my universal encoding to be efficient for every possible
case/scenerio ;) =D not optimized for certain scenerio's and it definetly
should self describe the length of itself/scale/be flexible ofcourse
otherwise it wouldn't be a universal code ! DUH ;) =D

"MooseFET" > wrote in message
oups.com...
> On May 26, 7:18 pm, "Skybuck Flying" > wrote:
>> Ok,
>>
>> Now I understand what you're saying.
>>
>> Marker:
>>
>> 1
>>
>> Length
>>
>> 0
>>
>> Data
>>
>> Missing.
>
> You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> zero.
>
> Better coding based on the idea behind Huffman :
>
> 00 - Nodata
> 01 - True
> 10 - False
> 11 - All other cases
>
> Now we have the three simple cases done now we can decide how
> most efficient path goes from here. I'm lazy so I will only state a
> possible path that stays more efficent than your suggestion:
>
> At 2 bits, your method would take over 4 bits so we can step directly
> to the 4 bit groups and always be better than your way:
>
> 1100 - 0 plus the following bit as a pair
> 1101 - 1 plus the following bit as a pair
> 1110 - Three bits follow
> 1111 - All other cases
>
> This method can be extended without limit and will always be better.
>
> The last time you posted this, someone pointed out a better pattern
> than this but this one is good enough to prove the point.

Rudy Velthuis
May 27th 07, 11:09 AM
27.05.2007 07:19:03, Skybuck Flying wrote:

> Also,
>
> Your reasoning must be inheritly flawed.

Yours is. Read up on Huffman coding and be amazed.
--
Rudy Velthuis http://rvelthuis.de

"Sometimes a scream is better than a thesis."
-- Ralph Waldo Emerson (1803-1882)

MooseFET
May 27th 07, 03:26 PM
On May 26, 10:08 pm, "Skybuck Flying" > wrote:
[....]
> > You've used more than 2 bits to say "no data" and 3 bits to say 1 or
> > zero.
>
> > Better coding based on the idea behind Huffman :
>
> > 00 - Nodata
> > 01 - True
> > 10 - False
> > 11 - All other cases
>
> Seems like you need 2 bits to describe true.
>
> While my encoding only need 1 bit, sure there is overhead, but that overhead
> is not part of the original data.

You have more than one bit of overhead so you need more than two bits
to describe true. You aren't using your system until you add your
overhead.

I only need two bits to describe true so I've used fewer bits.


> The original data can have any encoding that's the beauty of Skybuck's
> universal code, existing encodings can be integrated into the universal
> encoding.

That is also true of what I suggested. Any number of bits can be
encoded and those bits can have any meaning.


> No other special encodings are needed to represent data.
>
> The data can stay in it's original form.

If you look arefully at what I did, you will see that that is also
true of my method. The only difference is that my method is much
better.


> Only the marker and the length fields are prefixed to it.
>
> Alternatively new encodings for the data section can be used as well for
> example to represent larger numbers for arbitrary precision math.

My method imposed no limits on the payload data. It just used less
bits in the overhead to get the job done.


> > Now we have the three simple cases done now we can decide how
> > most efficient path goes from here. I'm lazy so I will only state a
> > possible path that stays more efficent than your suggestion:
>
> > At 2 bits, your method would take over 4 bits so we can step directly
> > to the 4 bit groups and always be better than your way:
>
> > 1100 - 0 plus the following bit as a pair
> > 1101 - 1 plus the following bit as a pair
> > 1110 - Three bits follow
> > 1111 - All other cases
>
> > This method can be extended without limit and will always be better.
>
> > The last time you posted this, someone pointed out a better pattern
> > than this but this one is good enough to prove the point.
>
> I do not completely understand why you are making a big deal out of trying
> to come up with a better encoding for the 1 bit of data case...

I'm pointing out that from zero to an infinite number of bits that my
method is better. It always has less overhead than your method. I
did the 0, 1 , 2, and 3 bit cases to show how the pattern goes.

I figured that showing how the first cases go would be enough to show
the pattern to you.

[....]
> So for now I do not agree that "it's better" or even "more efficient" and I
> definelty do not see how this would be "more flexible" or "self describing
> length ?".

You don't agree only because you haven't understood the method. The
system I showed have every advantage you could ever imagine yours has
but uses far few bits to do it.



> That's a good question: where is the length field in your scenerio ? ;)

Go back ad reread what I wrote. What I did is straight forward
enough.

[...]
> Or did you describe a new encoding for one of the fields: Marker, Length,
> Data ?

I showed how to do everything you are doing but to use far fewer bits
to do it.

>

Skybuck Flying
May 27th 07, 04:20 PM
How would you encode a larger example with your system, for example the
following data bits:

111010101010100110100101101110011

Maybe that would clearify your idea a bit :)

Bye,
Skybuck.

MooseFET
May 27th 07, 05:37 PM
On May 27, 8:20 am, "Skybuck Flying" > wrote:
> How would you encode a larger example with your system, for example the
> following data bits:
>
> 111010101010100110100101101110011
33 Bits by my count.

>
> Maybe that would clearify your idea a bit :)

I stopped after 3 bits, so I'll pick up at 4 bits and again.

The first 4 bits of the prefix are already stated as being 1111 so we
only need to deal with the next 6 bits.

00XXXX - The 4 bit value XXXX
010XXX - Two more bits follow making a total of 5 bits
0110XX - Four more bits follow making 6 bits
01110X - Six bits follow making 7 bits
011110 - 8 bits follow
10NNNN - (9+NNNN) bits follow 9..24
110NNN - Two more of N follow then (25+N) bits *
1110NN - 4 more bits of N follow then (57+N) bits
11110N - 6 more bits of N follow then (121+N) bits
111111 - All other cases

* Your question is answered here.

Notice how my system is not limited and has a smaller overhead whereas
your system is limited to only handle lengths up to 32768 bits and
takes more.

Skybuck Flying
May 28th 07, 05:56 AM
"MooseFET" > wrote in message
oups.com...
> On May 27, 8:20 am, "Skybuck Flying" > wrote:
>> How would you encode a larger example with your system, for example the
>> following data bits:
>>
>> 111010101010100110100101101110011
> 33 Bits by my count.
>
>>
>> Maybe that would clearify your idea a bit :)
>
> I stopped after 3 bits, so I'll pick up at 4 bits and again.
>
> The first 4 bits of the prefix are already stated as being 1111 so we
> only need to deal with the next 6 bits.
>
> 00XXXX - The 4 bit value XXXX
> 010XXX - Two more bits follow making a total of 5 bits
> 0110XX - Four more bits follow making 6 bits
> 01110X - Six bits follow making 7 bits
> 011110 - 8 bits follow
> 10NNNN - (9+NNNN) bits follow 9..24
> 110NNN - Two more of N follow then (25+N) bits *
> 1110NN - 4 more bits of N follow then (57+N) bits
> 11110N - 6 more bits of N follow then (121+N) bits
> 111111 - All other cases
>
> * Your question is answered here.
>
> Notice how my system is not limited and has a smaller overhead whereas
> your system is limited to only handle lengths up to 32768 bits and
> takes more.

Why do you think my system is limited ? It's not limited at all.

I have a question for your system:

After you have encoded, how do you decode ?

How do you detect where the prefixes start and the data begins ?

Bye,
Skybuck.

Skybuck Flying
May 28th 07, 06:23 AM
"Skybuck Flying" > wrote in message
...
>
> "MooseFET" > wrote in message
> oups.com...
>> On May 27, 8:20 am, "Skybuck Flying" > wrote:
>>> How would you encode a larger example with your system, for example the
>>> following data bits:
>>>
>>> 111010101010100110100101101110011
>> 33 Bits by my count.
>>
>>>
>>> Maybe that would clearify your idea a bit :)
>>
>> I stopped after 3 bits, so I'll pick up at 4 bits and again.
>>
>> The first 4 bits of the prefix are already stated as being 1111 so we
>> only need to deal with the next 6 bits.

from previous post:

> > 1100 - 0 plus the following bit as a pair
> > 1101 - 1 plus the following bit as a pair
> > 1110 - Three bits follow
> > 1111 - All other cases

>> 00XXXX - The 4 bit value XXXX
>> 010XXX - Two more bits follow making a total of 5 bits
>> 0110XX - Four more bits follow making 6 bits
>> 01110X - Six bits follow making 7 bits
>> 011110 - 8 bits follow
>> 10NNNN - (9+NNNN) bits follow 9..24
>> 110NNN - Two more of N follow then (25+N) bits *
>> 1110NN - 4 more bits of N follow then (57+N) bits
>> 11110N - 6 more bits of N follow then (121+N) bits
>> 111111 - All other cases

Your encoding starts either with a zero, followed by ones, followed by a
terminating zero, followed by the data.

Or

Your encoding starts with a one, followed by a terminating zero, followed by
the data.

This way the length of the prefix can be determined.

Knowing the length of the prefix how to determine how many data bits follow
?

(Also the 33 bits I described are data only, no overhead bits included, the
overhead bits would be 12 bits, total: 45 bits).

Bye,
Skybuck.

MooseFET
May 28th 07, 02:44 PM
On May 27, 9:56 pm, "Skybuck Flying" > wrote:
> "MooseFET" > wrote in message
>
> oups.com...
>
>
>
> > On May 27, 8:20 am, "Skybuck Flying" > wrote:
> >> How would you encode a larger example with your system, for example the
> >> following data bits:
>
> >> 111010101010100110100101101110011
> > 33 Bits by my count.
>
> >> Maybe that would clearify your idea a bit :)
>
> > I stopped after 3 bits, so I'll pick up at 4 bits and again.
>
> > The first 4 bits of the prefix are already stated as being 1111 so we
> > only need to deal with the next 6 bits.
>
> > 00XXXX - The 4 bit value XXXX
> > 010XXX - Two more bits follow making a total of 5 bits
> > 0110XX - Four more bits follow making 6 bits
> > 01110X - Six bits follow making 7 bits
> > 011110 - 8 bits follow
> > 10NNNN - (9+NNNN) bits follow 9..24
> > 110NNN - Two more of N follow then (25+N) bits *
> > 1110NN - 4 more bits of N follow then (57+N) bits
> > 11110N - 6 more bits of N follow then (121+N) bits
> > 111111 - All other cases
>
> > * Your question is answered here.
>
> > Notice how my system is not limited and has a smaller overhead whereas
> > your system is limited to only handle lengths up to 32768 bits and
> > takes more.
>
> Why do you think my system is limited ? It's not limited at all.

How many bits are in the length of length field?
If that number is finite, the maximum length is finite.


> I have a question for your system:
>
> After you have encoded, how do you decode ?
>
> How do you detect where the prefixes start and the data begins ?

You know how long the one prefix+data group is so you know when to
look for the next.

>
> Bye,
> Skybuck.

Skybuck Flying
May 28th 07, 04:03 PM
"MooseFET" > wrote in message
ups.com...
> On May 27, 9:56 pm, "Skybuck Flying" > wrote:
>> "MooseFET" > wrote in message
>>
>> oups.com...
>>
>>
>>
>> > On May 27, 8:20 am, "Skybuck Flying" > wrote:
>> >> How would you encode a larger example with your system, for example
>> >> the
>> >> following data bits:
>>
>> >> 111010101010100110100101101110011
>> > 33 Bits by my count.
>>
>> >> Maybe that would clearify your idea a bit :)
>>
>> > I stopped after 3 bits, so I'll pick up at 4 bits and again.
>>
>> > The first 4 bits of the prefix are already stated as being 1111 so we
>> > only need to deal with the next 6 bits.
>>
>> > 00XXXX - The 4 bit value XXXX
>> > 010XXX - Two more bits follow making a total of 5 bits
>> > 0110XX - Four more bits follow making 6 bits
>> > 01110X - Six bits follow making 7 bits
>> > 011110 - 8 bits follow
>> > 10NNNN - (9+NNNN) bits follow 9..24
>> > 110NNN - Two more of N follow then (25+N) bits *
>> > 1110NN - 4 more bits of N follow then (57+N) bits
>> > 11110N - 6 more bits of N follow then (121+N) bits
>> > 111111 - All other cases
>>
>> > * Your question is answered here.
>>
>> > Notice how my system is not limited and has a smaller overhead whereas
>> > your system is limited to only handle lengths up to 32768 bits and
>> > takes more.
>>
>> Why do you think my system is limited ? It's not limited at all.
>
> How many bits are in the length of length field?

Ah !

That's the trick right there.

The encoding for the length of length is:

1. Variable.

2. Zero's followed by a terminating 1.

To figure out the length of length field:

Count the zero's and count the final 1.

And that's your length of length right there ! =D

See, unlimited, just like a null terminator for c strings.

> If that number is finite, the maximum length is finite.

Nope, it's not :)

>
>> I have a question for your system:
>>
>> After you have encoded, how do you decode ?
>>
>> How do you detect where the prefixes start and the data begins ?
>
> You know how long the one prefix+data group is so you know when to
> look for the next.

I provided some simple pseudo code for my encoding/decoding.

Can you provide some pseudo code for your encoding/decoding ?

Bye,
Skybuck.

MooseFET
May 28th 07, 06:45 PM
On May 28, 8:03 am, "Skybuck Flying" > wrote:
[...]
> > How many bits are in the length of length field?
>
> Ah !
>
> That's the trick right there.
>
> The encoding for the length of length is:
>
> 1. Variable.
>
> 2. Zero's followed by a terminating 1.
>
> To figure out the length of length field:

Oh, that's a hopelessly inefficient way to do it.

[....]
>
> I provided some simple pseudo code for my encoding/decoding.
>
> Can you provide some pseudo code for your encoding/decoding ?

Yes I can.
Am I so lazy that it is unlikely that I will>
Also yes.

Skybuck Flying
May 28th 07, 07:31 PM
"Jim Thompson" > wrote in
message ...
> On Mon, 28 May 2007 14:34:08 +0200, "Skybuck Flying"
> > wrote:
>
>>So finally you admit huffman is not suited to encode the world's
>>information.
>>
>>Imagine having to come up with a table to encode the world's information.
>>
>>You have no idea what the information is now <- way too large for anybody
>>to
>>grasp.
>>
>>And you have no idea with the information will be tomorrow, the day after
>>or
>>billion years from now.
>>
>>I have designed an encoding which is suitable for the world's information
>>needs.
>>
>>While huffman is only suited for particular cases.
>>
>>You say so yourself.
>>
>>Enough with this huffman **** =D
>>
>>Bye,
>> Skybuck.
>>
>
> Poor Baby! Too ignorant to study information theory before opening
> mouth in high gear.

I do not need to study it, I am living it.

Non the less, I took a quick look at shannon's document from long ago.

He describes the bit. He describe the binary digit, a number consisting out
of multiple bits.

However what shannon probably failed to recgonize is that people/the world
needs a way to seperate the binary digits from each other.

If possible in a flexible way, to prevent the problems that the world has
now.

Look at how people write down multiple numbers.

The multiple numbers do consist out of decimals.

However there is more information to it than just that.

There is also hidden information.

The spaces between the numbers is just as important.

Without this hidden information it would be impossible to tell where one
number starts and one number ends.

Shannon failed to recgonize this.

Therefore maybe the bit is a terrible mistake and should be replaced by a
digit that can have 4 states.

0,1,2 and 3.

0,1 same as a bit.

2 = more bits follows
3 = last bit.

:)

That's what people more or less do when they write down numbers like so:

5423523 52345234

The distance between the decimals is like the 2... "the sticky stuff"

The space is the 3 "the end of number" indication :P* =D

Bye,
Skybuck.

Rudy Velthuis
May 28th 07, 07:34 PM
28.05.2007 20:31:10, Skybuck Flying wrote:

> > Poor Baby! Too ignorant to study information theory before opening
> > mouth in high gear.
>
> I do not need to study it, I am living it.

<sarcasm>
Yeah, as you so eloquently proved already. <g>
</sarcasm>

--
Rudy Velthuis http://rvelthuis.de

"A scholar who cherishes the love of comfort is not fit to be
deemed a scholar." -- Lao-Tzu (570?-490? BC)

Skybuck Flying
May 28th 07, 08:18 PM
"MooseFET" > wrote in message
oups.com...
> On May 28, 8:03 am, "Skybuck Flying" > wrote:
> [...]
>> > How many bits are in the length of length field?
>>
>> Ah !
>>
>> That's the trick right there.
>>
>> The encoding for the length of length is:
>>
>> 1. Variable.
>>
>> 2. Zero's followed by a terminating 1.
>>
>> To figure out the length of length field:
>
> Oh, that's a hopelessly inefficient way to do it.

I got a newsflash for ya:

Let's say we want to encode

18.446.744.073.709.551.616 data bits.

The overhead will be:

64 bits for the length.
64 bits for the marker.

Overhead: 128 bits.

Minor overhead.

> [....]
>>
>> I provided some simple pseudo code for my encoding/decoding.
>>
>> Can you provide some pseudo code for your encoding/decoding ?
>
> Yes I can.
> Am I so lazy that it is unlikely that I will>
> Also yes.

My overhead requires few instructions, which is good, since one must
remember cpu usage plays a role as well.

Now what was your encoding scheme ? hmmmm ?

Bye,
Skybuck.

Rudy Velthuis
May 28th 07, 08:34 PM
28.05.2007 21:18:03, Skybuck Flying wrote:

> I got a newsflash for ya:
>
> Let's say we want to encode
>
> 18.446.744.073.709.551.616 data bits.

How often would you have to do that? What about an integer of, say, a
maximum of 32 bits (they are pretty common, and happen much more often
than your super-long integers). What is the overhead for an integer of
32 bits?


--
Rudy Velthuis http://rvelthuis.de

"Gigerenzer's Law of Indispensable Ignorance: The world cannot
function without partially ignorant people." -- Gerd Gigerenzer