Topic: show u another dark spot of c @枫下论坛 The Rolia Forum

show u another dark spot of c -blaise(blaise); 2001-6-21 {375} (#108408@0)
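int main(void)
{
    char c = 0;
    signed char sc = 0;
    sc = c;       // works

    int i = 0;
    signed int si = 0;
    si = i;       // works

    char *pc = 0;
    signed char *spc = 0;
    spc = pc;     // does not work: the compiler complains about incompatible pointer types

    int *pi = 0;
    signed int *spi = 0;
    spi = pi;     // works

    return 0;
}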

I can't see any reasonable logic here.

The compiler is just doing what the ANSI C standard has defined. -numnum(numnum); 2001-6-21 {256} (#108438@0)
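All data types (mostly the integral types, because float and double are always signed) default to "signed" except for char. So, from the compiler's point of view, "int" is the same type as "signed int", while "char" and "signed char" are two different types. Whether plain char behaves as signed or unsigned is up to the implementation, but either way it is a third type, distinct from both signed char and unsigned char.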
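A small sketch that makes the distinction visible, assuming a C11 compiler (it uses _Generic). _Generic selects on the exact type, so char, signed char and unsigned char can be three separate branches, while int and signed int cannot both be listed because they are one type. TYPE_NAME and plain_char_is_signed are hypothetical helper names.

```c
#include <limits.h>

/* Map an expression's type to a name. _Generic does not apply integer
   promotions, so a char argument stays char. char, signed char and
   unsigned char are three distinct types, so each gets its own branch;
   "signed int" is just another spelling of int, so only int is listed
   (listing both would be a duplicate-type error). */
#define TYPE_NAME(x) _Generic((x),          \
    char:          "char",                  \
    signed char:   "signed char",           \
    unsigned char: "unsigned char",         \
    int:           "int",                   \
    default:       "other")

/* Nonzero if plain char has a negative range on this implementation,
   i.e. behaves like signed char (implementation-defined). */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With `signed char sc` and `signed int si`, `TYPE_NAME(sc)` yields "signed char" while `TYPE_NAME(si)` yields "int".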

Could you tell me why the specification decided that it's OK for short/int/long but not for char? And why is assigning a value from char to signed char OK, while assigning the pointer is wrong? -blaise(blaise); 2001-6-21 (#108473@0)

This is usually taught in the first lecture for an introductory C or C++ course. -numnum(numnum); 2001-6-21 {455} (#108502@0)
You are right in that most people tend to overlook this small quirk. Frankly speaking, I can tell this off the top of my head because I taught CS 111 (Introduction to C++ Programming) for my first two quarters in graduate school. You know, you've gotta pay attention to details if you are a TA. The reason you can assign a char value to a signed char variable is that the compiler performs implicit conversions between all arithmetic types: the value is converted to the type of the destination variable.
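To illustrate the value-conversion rule, here is a minimal sketch (the function names are hypothetical): each assignment converts the value to the destination type, which is why it compiles even between different arithmetic types. Converting an out-of-range value to a signed type is implementation-defined, so the example sticks to values that fit.

```c
/* Assignment between arithmetic types converts the value to the
   destination type; no pointer or memory reinterpretation is involved. */
signed char char_to_signed_char(char c)
{
    signed char sc = c;   /* same value as long as it fits in signed char */
    return sc;
}

int double_to_int(double d)
{
    int i = d;            /* fractional part discarded (truncates toward zero) */
    return i;
}

long int_to_long(int i)
{
    long l = i;           /* widening: the value is always preserved */
    return l;
}
```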

What I asked is why the specification doesn't tell the compiler to do the implicit conversion for the pointer. -blaise(blaise); 2001-6-21 (#108515@0)

Memory alignment -numnum(numnum); 2001-6-21 {420} (#108519@0)
It's safe to do the implicit conversion for values, because you only make a copy of the value. Imagine you explicitly cast a char* into an int*: what will happen? On strict-alignment Unix machines such as SPARC you will end up with a bus error, although you are fine on a PC because x86 chips tolerate unaligned access (at some cost in speed). That tolerance is a property of the x86 design, not of its little endian byte order. On those Unix machines int data access requires 4-byte alignment, but your char* pointer might very likely be pointing to an odd memory address.
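A hedged sketch of the alignment point, assuming a C11 compiler; is_int_aligned and load_int are hypothetical helpers. Dereferencing a misaligned int* is undefined behavior in C (a bus error on strict-alignment CPUs), whereas memcpy works at any address. For char versus signed char specifically, size and alignment are identical, so for that pair the compiler's complaint is about type compatibility rather than alignment.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is suitably aligned for an int. The pointer-to-integer
   conversion is implementation-defined, but on common flat-memory
   systems the remainder test is meaningful. */
int is_int_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Read an int stored at any byte address. Writing *(const int *)p instead
   would be undefined behavior when p is misaligned; memcpy copies the
   bytes into a properly aligned local variable. */
int load_int(const unsigned char *p)
{
    int v;
    memcpy(&v, p, sizeof v);
    return v;
}
```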

good answer! -blaise(blaise); 2001-6-21 (#108521@0)
Excuse me, what does endian mean? -pazu(InTheSky); 2001-6-21 (#108530@0)

Big endian refers to a chip that stores an integer with its most significant byte at the lowest address, while a little endian machine stores the least significant byte there. -numnum(numnum); 2001-6-21 {266} (#108714@0)
For the 16-bit integer 0x234F, a big endian machine stores the bytes as 23 4F (in increasing address order), while a little endian machine stores them as 4F 23. Only the order of the bytes changes, not the hex digits within a byte. One advantage of the x86 family, which happens to be little endian, is that the CPU doesn't require memory to be aligned for data access.
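The 0x234F example can be checked directly. This sketch (helper names are hypothetical) writes the value both ways explicitly and compares against what the host actually stores; only the order of the two bytes differs, never the bits inside a byte.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if the host stores the least significant byte first. */
int is_little_endian(void)
{
    uint16_t v = 1;
    unsigned char first;
    memcpy(&first, &v, 1);   /* byte at the lowest address */
    return first == 1;
}

/* Store a 16-bit value most significant byte first (big endian). */
void store_u16_be(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xFF);
}

/* Store a 16-bit value least significant byte first (little endian). */
void store_u16_le(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)(v >> 8);
}
```

`store_u16_be(buf, 0x234F)` leaves 23 4F in `buf`, and `store_u16_le(buf, 0x234F)` leaves 4F 23.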

It is so good that you mentioned endian, let me ask a little more... -birdincage(birdincage); 2001-6-22 {753} (#108848@0)
Your explanation reminds me of what we learned long ago in OS or compiler-principles courses. I cannot remember more details :-( We know all languages are translated into machine code in the end, and loaded into memory when running. All data and code are in memory. Do you mean big and little endian machines use different chips, and therefore data (and code) are stored differently? There are a few things I am confused about.

1. Which one is of big endian, and which one is of little endian?
A char with hex value 2b, stored in memory as:
0010 1011, or 1101 0100, or 1011 0010 ?

2. I may have misunderstood you. Do you mean only integer data have such a difference?

3. Or is it because of differences between compilers?

Thank you :-)

1) A very good question. 00101011 is how that char is stored (sorry for using English here; it comes more naturally to me to discuss technical questions in English because of the terminology and jargon) -numnum(numnum); 2001-6-22 {1211} (#108866@0)
on both big endian and little endian machines. A char is a single byte, and endianness only changes the order of the bytes within a multibyte value, not the order of the bits inside a byte. So for the char type you usually don't care whether the machine is big endian or little endian. In the case of bit fields, however, such as the header structure of a DNS query, the order in which the compiler allocates the fields is implementation-defined and in practice follows the machine's byte order, so you need to consider the "endianness" of the machine and throw in conditional compilation flags. Check out /usr/include/arpa/nameser.h or ip.h (I forgot the path) for the definitions of network packet structures; they all have conditional compilation flags.
2) Mostly integral types. Float and double types are usually handled by a floating-point coprocessor, which has its own way of storing floats and doubles. In real-time work, people usually avoid floating-point arithmetic because it can slow the whole process down significantly, especially when there is no coprocessor and it has to be emulated in software.
3) Compilers play an important role in generating correct data for machines with different byte orders. In the case of the ELF (dynamically linked) file format, the toolchain makes sure that the byte order and alignment of the data agree with the target architecture. Of course, different compilers for the same machine generate different binary code, but the byte order of the data should stay the same.
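On the ELF point: an ELF file records its own byte order in the identification bytes at the start of the header (e_ident[EI_DATA]), which is how tools know how to read the rest of the file. A minimal sketch, assuming a Linux/glibc system that provides <elf.h>; elf_byte_order is a hypothetical helper that reuses the 1234/4321 codes of the BSD headers.

```c
#include <elf.h>
#include <string.h>

/* Inspect the first bytes of a file (e_ident) and report the byte order
   the ELF file was built for: 1234 = little endian, 4321 = big endian,
   0 = not an ELF file or unknown data encoding. */
int elf_byte_order(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return 1234;   /* two's complement, little endian */
    case ELFDATA2MSB: return 4321;   /* two's complement, big endian */
    default:          return 0;
    }
}
```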

Why do you say sorry? :-) I am here to learn both techniques and English. If you answered my questions in Chinese, I would probably beg you to re-write it in English :-):-) :-) Thank you very much, now I get a rather clear picture:-) -birdincage(birdincage); 2001-6-22 (#109096@0)

Big Endian and Little Endian. -netwind(网风); 2001-6-22 {1452} (#108880@0)
The term is used because of an analogy with the story Gulliver's Travels, in which Jonathan Swift imagined a never-ending fight between the kingdoms of the Big-Endians and the Little-Endians, whose only difference is in where they crack open a hard-boiled egg.

Big Endian:
A colorful way of describing the sequence in which MULTIBYTE numbers are stored in a computer's memory.

Stores the most significant byte at the lowest memory address, which is the address of the data. Since TCP/IP defines the byte order for network data (big endian, "network byte order"), end nodes must call a processor-specific conversion utility such as htons() or htonl() (which does nothing if the machine's native byte order is the same as the network's) on the TCP and IP header fields. In a TCP/IP packet, the most significant byte of each multibyte field is transmitted first.
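A sketch of that conversion step, assuming a POSIX system: htonl and ntohl from <arpa/inet.h> are the standard conversion utilities for 32-bit values (htons/ntohs for 16-bit), and they are no-ops on big endian hosts. put_u32_net and get_u32_net are hypothetical wrappers for a packet buffer.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit host value into a packet buffer in network byte order
   (most significant byte first). memcpy avoids any alignment
   assumption about buf. */
void put_u32_net(unsigned char *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 32-bit network-order value from a packet buffer into host order. */
uint32_t get_u32_net(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```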

Most UNIXes (for example, all System V) and the Internet are Big Endian. Motorola 680x0 microprocessors (and therefore Macintoshes), Hewlett-Packard PA-RISC, and Sun SuperSPARC processors are Big Endian. The Silicon Graphics MIPS and IBM/Motorola PowerPC processors are both Little and Big Endian (bi-endian).

Little Endian:
Specifies that the least significant byte is stored in the lowest-memory address, which is the address of the data.

The Intel 80X86 and Pentium and DEC Alpha RISC processors are Little Endian.

Windows NT and OSF/1 are Little Endian.

Little Endian is the less common UNIX implementation.

Woow, the 'history' of Big and Little Endian, and where they are now, woooow, :-) Thank you so much:-) -birdincage(birdincage); 2001-6-22 (#109098@0)
Thx...netwind. Following numnum's words, I went through /usr/include/arpa/nameser.h and found there is another endian -- PDP Endian. Is there any story about it? hoho...thx...and thanks numnum... -pazu(InTheSky); 2001-6-22 {1357} (#109252@0)
Below is the relevant content from /usr/include/arpa/nameser.h

#ifndef BYTE_ORDER
#if (BSD >= 199103)
# include <machine/endian.h>
#else
#define LITTLE_ENDIAN 1234 /* least-significant byte first (vax, pc) */
#define BIG_ENDIAN 4321 /* most-significant byte first (IBM, net) */
#define PDP_ENDIAN 3412 /* LSB first in word, MSW first in long (pdp)*/

#if defined(vax) || defined(ns32000) || defined(sun386) || defined(i386) || \
defined(MIPSEL) || defined(_MIPSEL) || defined(BIT_ZERO_ON_RIGHT) || \
defined(__alpha__) || defined(__alpha)
#define BYTE_ORDER LITTLE_ENDIAN
#endif

#if defined(sel) || defined(pyr) || defined(mc68000) || defined(sparc) || \
defined(is68k) || defined(tahoe) || defined(ibm032) || defined(ibm370) || \
defined(MIPSEB) || defined(_MIPSEB) || defined(_IBMR2) || \
defined(apollo) || defined(__convex__) || defined(__hppa) || \
defined(__hp9000) || defined(__hp9000s300) || defined(__hp9000s700) || \
defined (BIT_ZERO_ON_LEFT)
#define BYTE_ORDER BIG_ENDIAN
#endif

#endif /* BSD */
#endif /* BYTE_ORDER */

What's more, endianness is mentioned in the following header files too:
/usr/include/sys/fs/vx_port.h
/usr/include/sys/pci.h
/usr/include/sys/spinlock.h
/usr/include/arpa/nameser.h
/usr/include/machine/param.h

Thank you guys....
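The three codes in that excerpt describe where the bytes of a 32-bit value land in memory. A hedged sketch (byte_order_code is a hypothetical helper) that stores 0x01020304 and matches the resulting bytes against the three layouts; PDP order keeps each 16-bit word LSB first but puts the most significant word first, as the header comment says.

```c
#include <stdint.h>
#include <string.h>

/* Return the nameser.h-style code for the host's byte order:
   1234 = LITTLE_ENDIAN, 4321 = BIG_ENDIAN, 3412 = PDP_ENDIAN, 0 = other. */
int byte_order_code(void)
{
    uint32_t v = 0x01020304u;
    unsigned char b[4];
    memcpy(b, &v, sizeof b);   /* b[0] is the lowest address */

    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return 1234;           /* least significant byte first */
    if (b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04)
        return 4321;           /* most significant byte first */
    if (b[0] == 0x02 && b[1] == 0x01 && b[2] == 0x04 && b[3] == 0x03)
        return 3412;           /* high word first, each word LSB first */
    return 0;
}
```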

PDP endian stems from the legacy DEC PDP-11, the minicomputer on which early UNIX was developed, and in that sense the ancestor of all modern UNIX systems. -numnum(numnum); 2001-6-22 {357} (#109274@0)
Its byte order is a weird mix of big endian and little endian: 16-bit short integers (words) are little endian, but a 32-bit long integer stores its most significant word first. BTW, you should read /usr/include/arpa/nameser_compat.h instead, because on your system nameser.h is only a wrapper for nameser_compat.h, where the real DNS headers are defined. Pay attention to the HEADER structure.
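As an alternative to conditional bit fields like those in the HEADER structure, the flags can be decoded from the raw packet bytes with shifts and masks, which does not depend on how a compiler lays out bit fields. A minimal sketch, assuming the RFC 1035 layout of the first flags byte (QR, 4-bit OPCODE, AA, TC, RD, most significant bit first); dns_flags1 and decode_flags1 are hypothetical names, not the system header's API.

```c
#include <stdint.h>

/* Fields of the first flags byte of a DNS header (RFC 1035). */
struct dns_flags1 {
    unsigned qr;      /* 1 = response, 0 = query */
    unsigned opcode;  /* kind of query, 0 = standard query */
    unsigned aa;      /* authoritative answer */
    unsigned tc;      /* message truncated */
    unsigned rd;      /* recursion desired */
};

/* Decode with shifts and masks on the byte value; the result is the same
   on big endian and little endian hosts. */
struct dns_flags1 decode_flags1(uint8_t b)
{
    struct dns_flags1 f;
    f.qr     = (b >> 7) & 0x1u;
    f.opcode = (b >> 3) & 0xFu;
    f.aa     = (b >> 2) & 0x1u;
    f.tc     = (b >> 1) & 0x1u;
    f.rd     =  b       & 0x1u;
    return f;
}
```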

Help! numnum, I still don't understand why it will be ok on little endian chips. -pazu(InTheSky); 2001-6-22 (#109279@0)

because HIWORD is not garbage in this case. numnum, am I right? -mrviceroy(杀人者Daniel是也); 2001-6-22 (#109424@0)
I cannot see why it has anything to do with Big or Little Endian either. If the two types differ in size, the byte or bit order matters less. -birdincage(birdincage); 2001-6-23 {373} (#110143@0)
I don't know what "works" means in your program. Do you mean "not works" is a compiler error? In C it shouldn't be: many C compilers accept all your assignments, some of them only with warnings (strictly, spc=pc requires a diagnostic, and a C++ compiler rejects it outright). But assignments between pointer types may cause unpredictable errors, because the size of the data your pointer points to may be different. It is the same as doing conversions between different types.
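If you really do want to view char data through a signed char pointer, the usual way is to say so with an explicit cast. A sketch with a hypothetical helper; any object may be accessed through a character type, and char and signed char have the same size, so the cast is safe here.

```c
#include <stddef.h>

/* Count the bytes in s[0..n) whose value is negative when read as
   signed char. The explicit cast replaces the implicit conversion the
   compiler refused; it is well-defined because any object may be read
   through a character type and char and signed char have the same size. */
size_t count_negative_bytes(const char *s, size_t n)
{
    const signed char *p = (const signed char *)s;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] < 0)
            count++;
    return count;
}
```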

I bet most people don't know char is special until they hit the compile error and look it up in the docs; at least I didn't. -blaise(blaise); 2001-6-21 (#108470@0)
It looks strange that you try to assign a fixed value to a pointer.... -birdincage(birdincage); 2001-6-21 {703} (#108580@0)
1. It looks strange that you try to assign a fixed value to a pointer, whether it points to an int or a char. Since memory is such a complicated place, you'd better not access it so directly unless you have to.

2. C and C++ do have implicit data conversion, but try not to use it. Don't rely on the compiler; different compilers may have different criteria. Try to keep your program simple, unambiguous and portable. When you program in a big project, the more important thing is to make your program easy to read and understand. The fewer tricks you use, the better. Things might be different if you are learning C++ to take a test :-) Then you have to grasp every corner of it :-)

Sorry, I made a mistake. It is important to initialize a pointer to 0 (a null pointer). -birdincage(birdincage); 2001-6-23 (#110415@0)