CLK Cycle

Discussion in 'A+' started by Mof, Mar 25, 2008.

  1. Mof

    Mof Megabyte Poster

    I understand clock speeds, but why is there a minimum of two cycles per instruction?
     
    WIP: C++ and A+
  2. dmarsh

    dmarsh Petabyte Poster

    There isn't :biggrin

    What processor are you talking about? The number of cycles depends on the design of the processor, the relative complexity of the operation, and which storage areas, if any, the instruction references.


    With a superscalar design it can take time to fill the pipeline; an instruction may take one cycle to decode and one to execute, so a pipeline stall could cause a two-cycle duration. Normally there are stages like fetch, decode, execute and store in the pipeline. Each stage could potentially be subdivided, making a longer pipeline, or stages could be merged.

    http://en.wikipedia.org/wiki/Pipeline_%28computing%29
    http://en.wikipedia.org/wiki/Superscalar
    http://en.wikipedia.org/wiki/CPU_cache
    http://arstechnica.com/articles/paedia/cpu/p4andg4e.ars
    http://www.pcmech.com/article/pentium-4-calculation-controversy/
    http://www.hardwaresecrets.com/article/270/4
    http://softwarecommunity.intel.com/articles/eng/3089.htm

    The pipeline approach usually means one clock cycle per pipeline stage, so a longer pipeline takes longer to fill, and after a stall even a simple instruction can take many cycles to complete. With no stalls the processor can potentially process one or more instructions per cycle.

    With non-superscalar processors there's no reason something like a NOP or a register XOR can't take one cycle in absolute terms.
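    If you want to convince yourself, here's a rough C sketch (my own illustration, not from any book) that uses the x86 timestamp counter via the GCC/Clang __rdtsc() intrinsic to time a long chain of cheap register operations. It includes loop overhead, and on modern CPUs the TSC doesn't necessarily tick at core frequency, so treat the numbers as a rough indication only:

        #include <stdio.h>
        #include <stdint.h>
        #include <x86intrin.h>   /* GCC/Clang header providing __rdtsc() */

        int main(void)
        {
            const int n = 100000000;
            uint64_t x = 1;

            uint64_t start = __rdtsc();
            for (int i = 0; i < n; i++)
                x ^= (uint64_t)i;            /* a cheap register-to-register XOR */
            uint64_t end = __rdtsc();

            /* printing x stops the compiler optimising the loop away */
            printf("x=%llu, ~%.2f ticks per iteration\n",
                   (unsigned long long)x, (double)(end - start) / n);
            return 0;
        }

    On a recent out-of-order core you'll typically see something close to one tick per iteration, because the XOR overlaps with the loop housekeeping.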
     
  3. Mof

    Mof Megabyte Poster

    Reading Mike Meyers: the 'man in the box' analogy states there is a minimum of two cycles. I suppose it means one cycle to take the instruction off the EDB and a second to place the result back on the EDB.
     
    WIP: C++ and A+
  4. dmarsh

    dmarsh Petabyte Poster

    EDB ? :blink

    It depends on the design of the system; please bear in mind the PC is just a bunch of circuits somebody designed. It's not the only type of computer, and even PCs differ widely in architecture and processors.

    A processor can be designed so that all its operations take one clock cycle if desired.

    Processors, however, often have expensive and cheap operations. Various optimisations have been designed to get the most out of the processor; this normally means making expensive operations take multiple cycles while cheap operations take one cycle.

    Then pipelining, superscalar, branch prediction, caching, SMP, hyperthreading, dual core, etc. came along, further complicating the issue...

    The question is meaningless without some context: are you talking about best or worst conditions? When the pipeline and cache are fully loaded? What exact processor are you talking about?

    An instruction's result need never leave the processor if it is stored in a register, so the 'store' phase can be cheap.

    Any instruction involving memory will of course take many cycles, due to the inherent latency of slower memory and bus speeds.

    I imagine he's talking about the decode and execute stages of the pipeline, when there's no fetch or store and it's a simple instruction, but it's hard to be certain...
     
  5. Fergal1982

    Fergal1982 Petabyte Poster

    Doesn't it take one cycle to put the data into the processor and a second cycle to get it out again? So each cycle looks like this:

    Cycle: Previous instruction output / next instruction input

    If that's the case (and I seemed to think it was), then it would indeed take (or at least appear to take) a minimum of two cycles.
     
    Certifications: ITIL Foundation; MCTS: Visual Studio Team Foundation Server 2010, Administration
    WIP: None at present
  6. dmarsh

    dmarsh Petabyte Poster

    No Fergal, to my knowledge that's not correct.

    An instruction can take a value from a register, negate it and store it back in the same register for instance.

    Depending on the design of the processor this could take 1 to N cycles; it's all down to the design...

    There does not have to be a pipeline in the externally visible sense.

    If there is an externally visible pipeline, then the stages will typically take one cycle each.

    That's my understanding, but I've hardly coded any assembler in 13+ years and my memories of microprocessor architecture lectures are pretty fuzzy...

    Getting data in or out of the processor takes MANY cycles :-

    http://chip-architect.com/news/2000_10_13_Willamette_MPF.html

    http://www.intel.com/technology/itj/q12001/pdf/art_2.pdf

    (This is a quote from Pentium II days, before a lot of the advanced caching and new memory architectures!)

    This puts latency at 100-800 cycles; it really depends what you are measuring: fully cached best-case performance, general performance or worst-case performance. With today's processors we can only really give statistics, not absolutes.

    http://www.digit-life.com/articles2/cpu/rmma-p4-latency.html

    Your explanation is woefully oversimplified, and even my explanations miss all the advanced features of modern processors. The caching, instruction re-ordering and branch prediction logic is quite complex...
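    A rough way to see that memory latency for yourself is a pointer-chasing loop, where every load depends on the previous one so nothing can overlap. This is just a sketch of the classic technique (same __rdtsc() caveats as in my earlier snippet; Sattolo's shuffle makes the chain one big cycle the prefetcher can't easily guess, and rand() is crude but fine for a demo):

        #include <stdio.h>
        #include <stdlib.h>
        #include <stdint.h>
        #include <x86intrin.h>   /* __rdtsc() */

        int main(void)
        {
            const size_t n = 1 << 22;               /* ~4M entries (32 MB), far bigger than cache */
            size_t *next = malloc(n * sizeof *next);
            if (!next) return 1;

            /* Sattolo's algorithm: a random permutation forming one single cycle */
            for (size_t i = 0; i < n; i++) next[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;      /* j < i, never j == i */
                size_t t = next[i]; next[i] = next[j]; next[j] = t;
            }

            size_t p = 0;
            uint64_t start = __rdtsc();
            for (size_t i = 0; i < n; i++)
                p = next[p];                        /* each load depends on the last */
            uint64_t end = __rdtsc();

            printf("p=%zu, ~%.0f ticks per dependent load\n",
                   p, (double)(end - start) / n);
            free(next);
            return 0;
        }

    Once the array spills out of cache you should see tens to hundreds of ticks per load, which lines up with the 100-800 cycle figures above.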
     
  7. Fergal1982

    Fergal1982 Petabyte Poster

    Meh, I don't really know; my understanding of processors is fairly limited to be honest - I've always been much more software based. That's just how I always seemed to think of it.
     
    Certifications: ITIL Foundation; MCTS: Visual Studio Team Foundation Server 2010, Administration
    WIP: None at present
  8. dmarsh

    dmarsh Petabyte Poster

    It's all code, isn't it? :biggrin

    How would you write a JIT or an assembler without this knowledge?
     
  9. Fergal1982

    Fergal1982 Petabyte Poster

    Simple answer? I wouldn't. When I come to a point where I need to learn it, I'll learn it, but right now I have no reason to do so, and I haven't had need to up until this point.
     
    Certifications: ITIL Foundation; MCTS: Visual Studio Team Foundation Server 2010, Administration
    WIP: None at present
  10. Mof

    Mof Megabyte Poster

    I believe he's talking about the 8088.
     
    WIP: C++ and A+
  11. hbroomhall

    hbroomhall Petabyte Poster Gold Member

    I remember the 6502 had just one clock cycle for some instructions.

    The problem with the 8086 line of processors is that they have changed dramatically down the years, which is hardly surprising. So any general description of how they work has to be fairly simplistic!

    Harry.
     
    Certifications: ECDL A+ Network+ i-Net+
    WIP: Server+
  12. dmarsh

    dmarsh Petabyte Poster

    I guess you could have been describing the Fetch-Decode-Execute (FDX) cycle :-

    http://en.wikipedia.org/wiki/Instruction_cycle

    Here's a good summary at last :-

    http://en.kioskea.net/pc/processeur.php3

    The FDX cycle does not have to occur in one clock cycle in a pipelined design; the 'cycle' is used in the metaphorical sense, because it's a continuous process.
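    To make the FDX idea concrete, here's a toy interpreter in C: a made-up three-instruction machine with hypothetical opcodes, purely illustrative and nothing like any real processor's instruction set:

        #include <stdio.h>
        #include <stdint.h>

        enum { HALT = 0, LOAD = 1, ADD = 2 };      /* hypothetical opcodes */

        int main(void)
        {
            /* program: LOAD 40, ADD 2, HALT -> accumulator ends at 42 */
            uint8_t mem[] = { LOAD, 40, ADD, 2, HALT };
            int pc = 0, acc = 0, running = 1;

            while (running) {
                uint8_t op = mem[pc++];            /* fetch */
                switch (op) {                      /* decode */
                case LOAD: acc  = mem[pc++]; break;   /* execute */
                case ADD:  acc += mem[pc++]; break;
                case HALT: running = 0;      break;
                }
            }
            printf("acc = %d\n", acc);             /* prints acc = 42 */
            return 0;
        }

    In real hardware those fetch/decode/execute steps are overlapped by the pipeline rather than running strictly one after another.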

    Other design ideas like superscalar mean that under optimum conditions many instructions can be performed in parallel in one cycle.

    One thing the processor can be fairly certain of is that there will be more instructions, so there is a fetch buffer or instruction cache, and much work in recent years has gone into branch prediction, because a branch can make most of the instruction cache and the pipeline state irrelevant. Some instructions can also be performed in parallel or out of order to optimise performance; this is a sort of hardware parallelism where the processor determines the synchronisation points. That is necessary to make use of a superscalar design with multiple execution units.
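    The branch prediction point is easy to demonstrate with the old sorted-versus-unsorted trick; this rough sketch (not tied to any particular processor) sums only the 'large' elements of an array. With random data the branch goes either way about half the time and the predictor struggles; uncomment the qsort and the very same loop typically runs several times faster:

        #include <stdio.h>
        #include <stdlib.h>

        static int cmp(const void *a, const void *b)
        {
            return *(const int *)a - *(const int *)b;
        }

        int main(void)
        {
            enum { N = 1 << 20 };
            static int data[N];
            for (int i = 0; i < N; i++)
                data[i] = rand() % 256;

            /* qsort(data, N, sizeof data[0], cmp);   <- try with and without */

            long sum = 0;
            for (int pass = 0; pass < 100; pass++)
                for (int i = 0; i < N; i++)
                    if (data[i] >= 128)              /* ~50/50 branch on random data */
                        sum += data[i];

            printf("sum = %ld\n", sum);
            return 0;
        }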

    Processors have indeed changed a lot since the days of the 8086 and 68000, which were the processors in vogue when I was learning assembler. The RISC designs did take a different approach, making for easier-to-understand assembler in my mind.

    http://www.gamasutra.com/features/wyatts_world/19990528/pentium3_04.htm

    This shows that SIMD instructions can indeed be issued once per cycle under optimum conditions.

    http://homes.esat.kuleuven.be/~cosicart/pdf/AB-9600.pdf

     
  13. Mathematix

    Mathematix Megabyte Poster

    You guys do realise that the fetch-decode-execute (or fetch-execute) cycle is very different from the number of cycles taken to execute an instruction, right?

    Unless there are parallel processes going on under the hood, the fetch-execute cycle will always take more than one clock cycle on a serial architecture, whereas a single assembly instruction like incrementing a value can be executed in one cycle.

    Rather than going into the detail of Intel vs. AMD architectures, which obscures the important information, I'd research the following:

    1. Reduced instruction set computers (RISC) - an example being the UNIX-based Sun SPARCstation. My favourite computer at university! Did all my programming on it. :biggrin
    2. Complex instruction set computers (CISC) - examples being the humble PCs that we know and love.

    3-2-1 - research! :dry
     
    Certifications: BSc(Hons) Comp Sci, BCS Award of Merit
    WIP: Not doing certs. Computer geek.
  14. dmarsh

    dmarsh Petabyte Poster

    Well, I did my best to explain it, but like I said, it's been a while! :oops:

    My original links and the latest ones detail the superscalar, pipelining and CISC vs RISC arguments that I think are most important for understanding the subject.

    The Pentium series of processors are hybrids: a CISC instruction set on a RISC core.

    Yes, I thought I explained to Fergal, again in a roundabout way, that FDX is not a clock cycle! In some early processors, as well as small microcontrollers, it's quite possible that the instruction cycle and the FDX cycle were closely linked, but there will normally be at least a two-stage pipeline.

    Linking the two concepts completely would effectively halve the performance of the microprocessor while not reducing the complexity or transistor count by very much.

    In summary, to answer the original question: I don't know what Mike Meyers was trying to say. The 68000 had a minimum instruction time, whereby the simplest instruction's length in cycles is multiplied by the time for one cycle. Maybe he was trying to say the simplest instruction takes two cycles, but as far as I can tell this is not true; it could be one or more depending on the circumstances. As has also been said, the PC's architecture has changed so much over the years that any comment that doesn't specify a particular architecture is meaningless.

    So yes, I'd agree with Math and say learn the theory, but that's not what the question was about; it specified a two-cycle detail. If Mike was talking about theory, the fetch-execute cycle and the two-stage pipeline it implies, then that's what he should have said! Note the Pentium series does not have a two-stage pipeline! This is the problem with certs: sometimes you really do need the theory to understand. The 'let's skip the computer science bit' approach doesn't always cut it.
     
  15. Fergal1982

    Fergal1982 Petabyte Poster

    Perhaps the OP could quote the passage in the AIO book which mentions the two-cycle minimum?
     
    Certifications: ITIL Foundation; MCTS: Visual Studio Team Foundation Server 2010, Administration
    WIP: None at present
  16. Mathematix

    Mathematix Megabyte Poster

    One instruction per cycle. :biggrin
     
    Certifications: BSc(Hons) Comp Sci, BCS Award of Merit
    WIP: Not doing certs. Computer geek.
  17. Fergal1982

    Fergal1982 Petabyte Poster

    Ah, but what I mean is that Mof could point out the section he is talking about. Perhaps then we might be able to shed better light on exactly what is being discussed.
     
    Certifications: ITIL Foundation; MCTS: Visual Studio Team Foundation Server 2010, Administration
    WIP: None at present
  18. dmarsh
    Honorary Member 500 Likes Award

    dmarsh Petabyte Poster

    (Mike Meyers)

    Personally, I would have said: "In theory, one unit of work can be performed by the processor per cycle. A cycle is one pulse of the system clock. The system clock is normally driven by a quartz crystal used as a square-wave signal generator. The speed of the system clock is measured in cycles per second, or Hertz, and is also used as a rough performance rating of the processor. Processors should be run at the manufacturer's approved speed rating. The commands a processor can execute are called instructions, and an instruction can take one or more clock cycles to complete."

    I don't know where he gets the number two from, or why he thinks it's significant in this context...
     
  19. Mathematix

    Mathematix Megabyte Poster

    In its simplest terms, it says that a logic gate's state can change in one CPU clock cycle, which is perfectly reasonable. I'll try to present an example of how state can change in one cycle.

    You guys recall two's complement, or 'subtraction by addition', when subtracting a pair of binary numbers? A complex instruction set machine would have one instruction to perform this:

    sub b, a

    The above is pseudo-assembly that says to subtract 'a' from 'b' and store the result in 'b'.

    Now, a reduced instruction set computer would break the above into something like:

    str a
    str b
    not a (this 'not' can be performed in one clock cycle!)
    add a, 1 (and maybe this as well; 'a' now holds its own negation)
    add b, a ('b' now holds b - a)


    This is a very rough example that illustrates why RISC machines were built for execution speed at the expense of longer assembly code. (Do not take this particular example as a real-world implementation, because it isn't!)

    Of course, subtraction is included in real RISC instruction sets. Intensive instructions like multiplication and division are the ones replaced with additions, subtractions, bit shifts, etc.
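    If you want to sanity-check the two's complement identity itself, a few lines of C will do it (just an illustration of the arithmetic, nothing processor-specific):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            int32_t a = 13, b = 42;
            int32_t via_sub = b - a;               /* the CISC-style single subtract */
            int32_t via_add = b + (~a + 1);        /* invert a, add one, then add */
            printf("%d %d\n", via_sub, via_add);   /* prints 29 29 */
            return 0;
        }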
     
    Certifications: BSc(Hons) Comp Sci, BCS Award of Merit
    WIP: Not doing certs. Computer geek.
  20. dmarsh

    dmarsh Petabyte Poster

    While it's true that a transistor can only change state once per cycle, this does not limit the complexity of the processing that can be done in one clock cycle, given enough transistors.

    This is the old RISC vs CISC argument

    http://cse.stanford.edu/class/sophomore-college/projects-00/risc/risccisc/

    It is now widely accepted that it's not really the 'reduced instruction set' concept that's important; it's the resulting simplicity in design and transistor count that allows for other optimisations. In some situations, like FPUs and GPUs, the more complex instructions and the transistor counts they require are justified. Larger programs can in themselves cause slowdowns, as the instruction cache must be bigger.

    Here is a more up-to-date definition :-

    http://arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html
     
