Apache Arrow

Apache Arrow
Developer	Apache Software Foundation
Initial release	10 October 2016
Stable release	7.0.0 / 8 February 2022
Repository	https://github.com/apache/arrow
Written in	C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type	Data format, algorithms
License	Apache License 2.0
Website	arrow.apache.org

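Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that can represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.^[2]^[3]^[4]^[5]^[6] This reduces or eliminates factors that limit the feasibility of working with large data sets, such as the cost, volatility, and physical constraints of dynamic random-access memory.^[7]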
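
To make the idea of a column-oriented memory format more concrete, the following is a simplified sketch in plain Python, not the Arrow library itself. It assumes the layout that the Arrow columnar format specification describes for a nullable fixed-width column: one contiguous buffer of values plus a validity bitmap in which bit i (least-significant bit first) is 1 when slot i holds a value. The helper names `build_int32_column` and `get` are hypothetical and exist only for this illustration.

```python
from array import array


def build_int32_column(values):
    """Pack a list of ints (or None) into a values buffer and a validity bitmap."""
    data = array("i")  # contiguous C ints (4 bytes on common platforms)
    bitmap = bytearray((len(values) + 7) // 8)  # one validity bit per slot
    for i, v in enumerate(values):
        if v is None:
            data.append(0)  # placeholder; the bitmap marks this slot as null
        else:
            data.append(v)
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
    return data, bitmap


def get(data, bitmap, i):
    """Return the value at slot i, or None if the slot is null."""
    is_valid = (bitmap[i // 8] >> (i % 8)) & 1
    return data[i] if is_valid else None


data, bitmap = build_int32_column([7, None, 3, 42])
column = [get(data, bitmap, i) for i in range(4)]

# An aggregate only has to scan one contiguous buffer and one bitmap.
total = sum(v for v in column if v is not None)
```

Because all values of a column sit next to each other, a scan or aggregate touches only the buffers of the columns it needs, which is the property that makes the layout attractive for analytic workloads.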

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas, and other data processing libraries. The project includes native software libraries written in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust. Arrow allows zero-copy reads and fast data access and interchange between these languages and systems without serialization overhead.^[2]

Applications

Arrow has been used in diverse domains, including analytics,^[8] genomics,^[9]^[7] and cloud computing.^[10]

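Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed to complement these formats for processing data in memory.^[11] The hardware resource engineering trade-offs for in-memory processing differ from those associated with on-disk storage.^[12] The Arrow and Parquet projects include libraries that allow data to be read and written between the two formats.^[13]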

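Governance

Apache Arrow was announced by The Apache Software Foundation on February 17, 2016,^[14] with development led by a coalition of developers from other open source data analytics projects.^[15]^[16]^[6]^[17]^[18] The initial codebase and Java library were seeded with code from Apache Drill.^[14]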

References

1. "Apache Arrow 7.0.0 Release". 8 February 2022. Retrieved 15 April 2022.
2. "Apache Arrow and Distributed Compute with Kubernetes". 13 December 2018.
3. Baer, Tony (17 February 2016). "Apache Arrow: Lining Up The Ducks In A Row... Or Column". Seeking Alpha.
4. Baer, Tony (25 February 2019). "Apache Arrow: The little data accelerator that could". ZDNet.
5. Hall, Susan (23 February 2016). "Apache Arrow's Columnar Layouts of Data Could Accelerate Hadoop, Spark". The New Stack.
6. Yegulalp, Serdar (27 February 2016). "Apache Arrow aims to speed access to big data". InfoWorld.
7. Tanveer Ahmad (2019). "ArrowSAM: In-Memory Genomics Data Processing through Apache Arrow Framework". bioRxiv: 741843. doi:10.1101/741843.
8. Dinsmore T.W. (2016). "In-Memory Analytics". In: Disruptive Analytics. Apress, Berkeley, CA. pp. 97–116. doi:10.1007/978-1-4842-1311-7_5. ISBN 978-1-4842-1312-4.
9. Versaci F, Pireddu L, Zanetti G (2016). "Scalable genomics: from raw data to aligned reads on Apache YARN" (PDF). IEEE International Conference on Big Data: 1232–1241.
10. Maas M, Asanović K, Kubiatowicz J (2017). "Return of the runtimes: rethinking the language runtime system for the cloud 3.0 era". Proceedings of the 16th Workshop on Hot Topics in Operating Systems (ACM): 138–143. doi:10.1145/3102980.3103003.
11. Le Dem, Julien. "Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory". KDnuggets.
12. "Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?". 31 October 2017.
13. "PyArrow: Reading and Writing the Apache Parquet Format".
14. "The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project". The Apache Software Foundation Blog. Archived from the original on 13 March 2016.
15. Martin, Alexander J. (17 February 2016). "Apache Foundation rushes out Apache Arrow as top-level project". The Register.
16. "Big data gets a new open-source project, Apache Arrow: It offers performance improvements of more than 100x on analytical workloads, the foundation says". 17 February 2016.
17. Le Dem, Julien (28 November 2016). "The first release of Apache Arrow". SD Times.
18. "Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow".

External links

Apache Arrow project website
Apache Arrow project source code on GitHub

