The Nexus line is due for an update, with each product having been on the market for at least a year. They are devices which embody Google's vision… for their own platform. You can fall on either side of that debate, whether it guides OEM partners or is simply another shard in the fragmentation issue (if you even believe that fragmentation is bad), but they are easy to recommend and a good benchmark for Android.
We are expecting a few new entries in the coming months, one of which is the Nexus 9. Of note, it is expected to mark the return of HTC to the Nexus brand. They were the launch partner with the Nexus One and then promptly exited stage left as LG, Samsung, and ASUS performed the main acts.
We found this out because NVIDIA spilled the beans in their lawsuit filing against Qualcomm and Samsung. Apparently, "the HTC Nexus 9, expected in the third quarter of 2014, is also expected to use the Tegra K1". The filing has since been revised to remove the reference. While the K1 has a significant GPU to back it up, it will likely be driving a very high resolution display. The Nexus 6 is expected to launch at around the same time, along with Android 5.0 itself, and the 5.2-inch phone is rumored to have a 1440p display. It seems unlikely that a larger tablet display will be lower resolution than the phone it launches alongside, and there's not much room above it.
The Google Nexus 9 is expected for "Q3".
They need to start focusing on getting everybody over to 64 bit; everything else means nothing.
The Tegra K1 comes in two different SKUs: one with four 32-bit ARM reference cores, and one with two custom 64-bit “Denver” cores. So which SKU will the Nexus 9 have?
The GPU used in both Tegra K1 variants is the same, but are the 32-bit ARM reference cores as power efficient as the custom Denver cores? I’m reading online that the Denver cores are more efficient with power usage and have extra power-gating abilities (useful for mobile form factor devices).
Also, what will this K1 be clocked at to be able to run in the restricted thermal/power envelope of a smartphone? Hopefully the battery will be big; smartphones are light and thin enough. For sure the K1 will not have problems with graphics, and the ability to use the regular desktop version of OpenGL, etc. is unique to the Tegra K1’s graphics in the mobile arena.
The Apple A7 is a 6-wide superscalar design (6 IPC); the A8 is still being reviewed, but is at least 6-wide (6 IPC, maybe more?), while the K1’s custom Denver core is 7-wide (7 IPC). The Denver core has a more efficient branch pipeline, with a 13-cycle mispredict penalty compared to the 15-cycle mispredict penalty of the 32-bit ARM reference core K1 variant. The Denver core’s dynamic code optimization hardware allows the decoded instructions to be cached in system memory, in microcode format, so the instructions do not have to be continuously re-decoded (this feature removes the space restrictions on the out-of-order execution window). The Denver custom core also has a larger L1 cache, 128K+64K to the 32-bit ARM core’s 32K+32K, and the Denver core has floating point/NEON (SIMD) at 2×128, compared to the 32-bit core’s 2×64.
Even running 32-bit legacy code, Denver will be more efficient: it can fetch two 32-bit instructions in one memory fetch cycle, and it can execute more instructions per clock, over twice as many as the 32-bit ARM reference cores in the 32-bit K1 variant. The Nvidia Denver custom cores are very impressive if you go and read the Hot Chips symposium’s white papers on the Denver core’s microarchitecture, and they should keep Apple on its toes in the tablet market.
Note: don’t go there with the claim that 64 bit means more ability to address memory. The size of a CPU’s data bus and its standard word size (64, 32, or other size internal registers) has nothing to do with the amount of memory that can be accessed; that is determined by the width of the address bus. A 32-bit address bus can directly address 4 GB of memory without the help of hardware memory virtualization logic and page table management hardware.
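To put numbers on the address-width point, here is a minimal Python sketch. The function name and the extra example widths (40 and 64 bits) are assumptions chosen for illustration; only the 32-bit/4 GB figure comes from the comment itself.

```python
# Illustrative only: the widths below are example values,
# not claims about any particular chip.
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


for bits in (32, 40, 64):
    print(f"{bits}-bit addresses: {addressable_bytes(bits) // GIB} GiB")
```

The 32-bit case works out to exactly 4 GiB, which is the limit the comment refers to; each extra address bit doubles the reachable range, independent of register or data bus width.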
Having a 64-bit instruction width does, however, have some advantages for the size limit of a block of contiguous code, for immediate-type machine language opcodes that store an address or address offset as part of the opcode/immediate data instruction: 64-bit processors can have more than 4 gigs of effective addressability (near or far calls) compared to processors with 32-bit instruction lengths, without having to do extra memory fetches. Single-application code limits rarely, if ever, get anywhere near 4 gigs in total size; most software is in the kilobyte to low megabyte range for code size, and to save memory most code is made of assemblies of .DLL and .exe files.
The Cortex A15’s NEON SIMD pipeline is also 128 bit (4×32). It has 2 pipelines, the same as Denver/Cyclone/Cortex A57. Denver may be able to feed those pipes better and also has a higher clock, so SIMD performance is expected to be higher, but not because it has a more advanced NEON unit. In a highly threaded environment, i.e. 3 or 4 threads, the SIMD power of the quad A15s would beat the dual-core Denver and the dual-core Cyclone in the A7 and A8. Keep in mind a NEON pipeline is always 128 bits; however, it comes down to the implementation. Cortex A7 has a single pipe per core that’s clocked lower (fewer stages). Cortex A15/Denver/Cyclone/Krait have 2 pipelines that are clocked higher (more stages). They are all 128-bit pipelines.
The A15 is a 32-bit chip, with a 32-bit data bus and word size (standard registers, not specialized). Do you have any proof that the custom A7’s or the custom Denver’s implementations of ARMv8 NEON (SIMD) are exactly the same as ARM Holdings’ reference design microcode/microarchitecture, or any info about how the custom A7/A8 or Denver processors/SoCs handle these instructions? The only thing that Apple and Nvidia licensed from ARM Holdings was the ARMv8 ISA, not any ARM Holdings reference design microarchitecture. The A7/A8 and Denver are all completely custom microarchitectures that happen to be able to execute the ARMv8 ISA, so please link to any white papers. We do not have Anand Lal Shimpi to do the heavy lifting for us anymore, and short of any Hot Chips symposium presentations for the A8, there is little info on Apple’s product. The Nvidia K1 has a few white papers and presentations at Hot Chips that I have read. The Nvidia K1 Denver is a much different beast than the Apple A7/A8, and any custom implementation, whether the K1 (Denver) or the A7/A8, has very little in common with ARM Holdings’ reference design implementation of the ARMv7 or ARMv8 ISA, just as AMD’s x86 microarchitecture is different from Intel’s x86 microarchitecture, other than the ability to run the x86 16/32/64-bit ISA.
The Nvidia PDF/slides presentation at Hot Chips 2014 by Darrell Boggs, CPU Architecture (co-authors: Gary Brown, Bill Rozas, Nathan Tuck, K S Venkatraman), is the source for some of the Nvidia K1 info. Sorry for not having the web reference to the PDF I downloaded, but if you Google Hot Chips and the lead author’s name, there should be links to the presentation.
Any CPU’s ISA can be implemented in a microcode/microarchitecture that is different, sometimes radically different, from any other implementation of the same ISA. As long as the base rules of execution of the ISA are not violated, any standard ISA (ARM, x86, others) can be implemented in an unlimited variety of ways in custom microcode/microarchitecture. The software (compiled to the ISA’s opcodes) never sees the implementation in the hardware/microcode; it is only executed according to the ISA’s general rules of basic execution, which must be the same. Underneath is where differentiation between competing products happens: both Apple’s and Nvidia’s custom implementations are where the value is added, and soon AMD will be adding some custom value of its own to the ARMv8 ISA. And what does ARM Holdings care? ARM Holdings still gets a cut of every custom SoC/CPU that uses any ARM ISA, plus the initial licensing fees, for both reference designs and custom designs that only implement the licensed ARM ISAs (ARMv7, ARMv8, etc.).
Please note that Nvidia and Apple probably have some custom opcodes (extensions to the ISA) that only they use internally, but which CPUs/SoCs do not? That is fine as long as it does not affect the general-purpose application ecosystem and is reserved for OS or other specialized functionality that is transparent to the general-purpose code base.
P.S. I’m down on my knees praying (and I’m not religious) that Nvidia gets a Power8 license. If you want to read what the Power8 (RISC) processor can do with 8 threads per core (SMT) and 12 cores, just read some of the Hot Chips presentations on the Power8; the Power8 reference designs are up for anyone to license through OpenPower. It is no wonder that Google is evaluating the Power8 server chips, and with ARM-style licensing on the Power8, Nvidia could enter the PC CPU/SoC market with very little extra engineering needed on the Power8’s already powerful performance metrics. Both Apple’s (P.A. Semi) engineers and Nvidia’s SoC/GPU engineers could have a field day with the Power8 design for some derived laptop and PC SKUs. And think of Nvidia’s graphics on a PC-bound SKU: Nvidia is already working with IBM, the licenser of the Power8 reference designs, to implement its GPUs as accelerators for the IBM-made Power8s, so Nvidia could license its own Power8s and make Power8-based home gaming servers/systems. The engineering work is already being done with IBM, and what does IBM care? It gets revenue for licensing the Power8 to others. Just wait for 2015 and see the others that will be making Power8s.
Yes, Cortex A15 is a 32-bit processor in its integer cluster, but its NEON SIMD unit has 128-bit pipelines, the same as Cortex A5, Cortex A7, Cortex A8, Cortex A9, A12, A17, A53, A57, Swift, Krait, and Cyclone. I can judge beyond a shadow of a doubt that Cyclone in the A7 uses 128-bit pipes because it benchmarks like a dual core with 2 SIMD pipes, i.e. in floating-point-heavy workloads that are highly threaded it has less than half the performance of its quad-core cousins. Even in single-threaded benchmarks that are float heavy it has lower performance because of its low clock speed. It acts like a 1.3-1.4 GHz dual core with 2 128-bit SIMD pipes per core because that is exactly what it is. The general consensus of both AnandTech and Tom’s is that the “new Cyclone” is mostly just a die shrink on the new 20nm process. BTW I also love IBM Power!
Excuse me, I meant Cyclone performs like it has 3 128-bit NEON pipes/ALUs, in contrast to Denver’s 2 pipes/ALUs. However, Cyclone is clocked between 1.3 and 1.4 GHz, as opposed to Denver clocked at 2.5 GHz: 2.5 GHz x 8 flops/clock x 2 vs 1.4 GHz x 12 flops/clock x 2. You do the math on the SIMD results. Denver has the higher performance there, assuming their architecture can remain fed, which I hope it can because I absolutely love my Tegra 4 and plan to upgrade to the 64-bit Tegra K1 or to Erista.
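For anyone who would rather not do the math by hand, here is a rough Python sketch of that comparison. It assumes the final "x 2" in each product is the core count (both chips being dual core), and the function name is made up for illustration. These are theoretical peaks from the figures quoted above, not measured results.

```python
def peak_gflops(clock_ghz: float, flops_per_clock: int, cores: int) -> float:
    """Theoretical peak SIMD throughput in GFLOPS (clock x flops/clock x cores)."""
    return clock_ghz * flops_per_clock * cores


# Figures as quoted in the comment; the last factor is assumed to be core count.
denver = peak_gflops(2.5, 8, 2)
cyclone = peak_gflops(1.4, 12, 2)

print(f"Denver:  {denver:.1f} GFLOPS")
print(f"Cyclone: {cyclone:.1f} GFLOPS")
print(f"Ratio:   {denver / cyclone:.2f}x")
```

On those numbers Denver comes out at about 40 GFLOPS against roughly 33.6 for Cyclone, so the clock advantage outweighs the extra pipe, provided the core can actually keep its pipes fed.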
The 64-bit NVIDIA Tegra K1 processor is the most powerful one you can find on Android right now.